week12 - pandas (pulled forward a bit and covered in week 11)

haerangssa 2024. 5. 17. 11:07

1. PANDAS: panel data system (the name derives from "panel data")

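- Data alignment and integrated handling of missing data

- Reshaping, pivoting, slicing, indexing, and subsetting of data sets

- Inserting and deleting columns in data structures

- Split-apply-combine operations on data sets, plus merging and joining

- A range of time series functionality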
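
To make two of the features above concrete, here is a minimal sketch of label alignment with missing data and of a split-apply-combine step. All values and names (`a`, `b`, `df`, `group`, `value`) are made up for illustration and are not from the lecture.

```python
import numpy as np
import pandas as pd

# Two Series with partially overlapping labels (made-up values).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Arithmetic aligns on labels; "x" has no partner in b, so it becomes NaN.
total = a + b

# Missing data can then be filled (or dropped with dropna()).
filled = total.fillna(0.0)

# Split-apply-combine: split rows by "group", apply sum, combine the results.
df = pd.DataFrame({"group": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
sums = df.groupby("group")["value"].sum()

print(total)
print(filled)
print(sums)
```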

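2. Difference from NumPy

pandas data structures are labeled: rows and columns carry an index and names

NumPy arrays are not: elements are addressed by integer position only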
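
A small sketch of that difference, reusing the GDP numbers from the practice section below. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# The same GDP values as a plain NumPy array and as a labeled pandas Series.
values = np.array([24288, 26084, 26689, 26338, 28210])
gdp = pd.Series(values, index=[2006, 2007, 2008, 2009, 2010])

# NumPy: access by integer position only.
third_by_position = values[2]

# pandas: access by label (the year) with .loc, or by position with .iloc.
gdp_2008 = gdp.loc[2008]
third_via_iloc = gdp.iloc[2]

print(third_by_position, gdp_2008, third_via_iloc)
```

All three lookups return the same value; only the pandas version lets the year act as the key.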

3. pandas API

Series, DataFrame, Index, scalars, etc.
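
As a rough illustration of how these objects relate, the sketch below pulls each kind out of one small DataFrame. Treating a single cell value and `pd.Timestamp` as examples of "scalars" is my reading of the note, not something stated in the lecture.

```python
import pandas as pd

df = pd.DataFrame(
    {"GDP": [24288, 26084], "Poverty": [14.3, 14.8]},
    index=[2006, 2007],
)

column = df["GDP"]            # one column is a Series
labels = df.index             # the row labels are an Index
value = df.loc[2006, "GDP"]   # a single cell is a scalar value
ts = pd.Timestamp("2000-01-01")  # pandas also has its own scalar types

print(type(df).__name__)
print(type(column).__name__)
print(isinstance(labels, pd.Index))
print(value, ts)
```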

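4. Data selection (where to get data)

International: UN Statistics Division, OECD

Domestic (Korea): 공공데이터포털 (Public Data Portal), 기상자료개방포털 (Open MET Data Portal)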
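
Assuming the chosen data set is downloaded as a CSV file, loading it could look roughly like this. The CSV text is an inline stand-in that reuses this post's sample numbers so the sketch runs without a download; the column names are assumptions, and a real file from any of these sites will differ.

```python
import io
import pandas as pd

# Stand-in for a downloaded CSV file; the values reuse this post's sample data.
csv_text = """Year,GDP,Poverty
2006,24288,14.3
2007,26084,14.8
2008,26689,15.2
"""

# pd.read_csv accepts a file path or a file-like object.
data = pd.read_csv(io.StringIO(csv_text), index_col="Year")
print(data)
```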

5. Practice

# pandas!
# import NumPy and load pandas
import numpy as np
import pandas as pd

# Series from a list of scalar values, with the default integer index (0, 1, 2, ...)
gdp_s1 = pd.Series([24288, 26084, 26689, 26338, 28210])
print(gdp_s1)
# Series from a list of scalar values, with an explicit index (the years)
gdp_s2 = pd.Series([24288, 26084, 26689, 26338, 28210],
                   index=[2006, 2007, 2008, 2009, 2010])
poverty_s1 = pd.Series([14.3, 14.8, 15.2, 15.3, 14.9],
                       index=[2006, 2007, 2008, 2009, 2010])
print(gdp_s2)
print(poverty_s1)

# import NumPy and load pandas
import numpy as np
import pandas as pd

# Series from a Python dictionary (the keys become the index)
poverty_s2 = pd.Series({2006: 14.3, 2007: 14.8, 2008: 15.2, 2009: 15.3,
                        2010: 14.9})
print(poverty_s2)
# Series from an ndarray; index labels can also be strings
s3 = pd.Series(np.random.randn(4), index=['Jan', 'Feb', 'Mar', 'Apr'],
               name='series name')
print(s3)
# convert the Series values back to a NumPy array
print(poverty_s2.to_numpy())
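
Once a Series has an index, values can be looked up by label, by position, or with a boolean mask. A short sketch using the same poverty numbers (the variable names are mine):

```python
import pandas as pd

poverty = pd.Series({2006: 14.3, 2007: 14.8, 2008: 15.2, 2009: 15.3, 2010: 14.9})

by_label = poverty.loc[2008]        # value for the label 2008
by_position = poverty.iloc[0]       # first value by position
above_15 = poverty[poverty > 15.0]  # boolean mask keeps 2008 and 2009
peak_year = poverty.idxmax()        # label of the largest value

print(by_label, by_position, peak_year)
print(above_15)
```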


# import NumPy and load pandas
import numpy as np
import pandas as pd

# DataFrame from a dict of Series that all share the default integer index;
# the years are stored as an ordinary 'Year' column
index = pd.Series([2006, 2007, 2008, 2009, 2010])
gdp_s2 = pd.Series([24288, 26084, 26689, 26338, 28210])
poverty_s1 = pd.Series([14.3, 14.8, 15.2, 15.3, 14.9])
data1 = {'Year': index, 'GDP': gdp_s2, 'Poverty': poverty_s1}
data_f1 = pd.DataFrame(data1)
print(data_f1)


# DataFrame from a dict of Series; the shared year index becomes the row index
gdp_s2 = pd.Series([24288, 26084, 26689, 26338, 28210],
                   index=[2006, 2007, 2008, 2009, 2010])
poverty_s1 = pd.Series([14.3, 14.8, 15.2, 15.3, 14.9],
                       index=[2006, 2007, 2008, 2009, 2010])
data2 = {'GDP': gdp_s2, 'Poverty': poverty_s1}
data_f2 = pd.DataFrame(data2)
print(data_f2)
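
In the dict-of-Series form the Series do not need identical indexes. A minimal sketch with deliberately mismatched years (a shortened, shifted variant of the data above, chosen only to show the effect):

```python
import numpy as np
import pandas as pd

gdp = pd.Series([24288, 26084, 26689], index=[2006, 2007, 2008])
poverty = pd.Series([14.8, 15.2, 15.3], index=[2007, 2008, 2009])

# Rows are the union of both indexes; cells with no source value are NaN.
df = pd.DataFrame({"GDP": gdp, "Poverty": poverty})
print(df)
```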

print("----------------")
# import NumPy and load pandas
import numpy as np
import pandas as pd

# DataFrame from a dict of lists
data_f3 = pd.DataFrame({'Year': [2006, 2007, 2008, 2009, 2010],
                        'GDP': [24288, 26084, 26689, 26338, 28210],
                        'Poverty': [14.3, 14.8, 15.2, 15.3, 14.9]})
print(data_f3)


# DataFrame from an ndarray, with a daily date index (time series data)
index = pd.date_range('1/1/2000', periods=5)
data_f4 = pd.DataFrame(np.random.randn(5, 4), index=index,
                       columns=['Jan', 'Feb', 'Mar', 'Apr'])
print(data_f4)
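
A date index is what enables the time series features mentioned in section 1. The sketch below uses fixed values instead of random ones so the result is predictable; the column name `value` and the 2-day bin size are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2000-01-01", periods=5)  # daily frequency by default
ts = pd.DataFrame({"value": np.arange(5)}, index=index)

# .loc accepts date strings; a label slice includes both endpoints.
subset = ts.loc["2000-01-02":"2000-01-04"]
print(subset)

# Resample into 2-day bins and sum the values in each bin.
two_day = ts.resample("2D").sum()
print(two_day)
```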

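# sort_index and sort_values return new DataFrames; data_f4 itself is unchanged
print(data_f4.sort_index(axis=0, ascending=False))
print(data_f4.sort_values(by='Feb'))
# slice the first three rows by position
print(data_f4[0:3])
# add a column from a Series; values are aligned on the year index
data_f2['Weight'] = pd.Series([60, 65, 70, 63, 58],
                              index=[2006, 2007, 2008, 2009, 2010])
print(data_f2)
# add a column computed from existing columns
data_f2['Rate'] = data_f2['Poverty'] * data_f2['Weight']
print(data_f2)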

# import NumPy and load pandas
import numpy as np
import pandas as pd

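# delete a column in place
del data_f2['Rate']
print(data_f2)

# rebuild with a new row index and column list; labels that data_f2 lacks
# (row 2011, columns 'Year' and 'GNP') are filled with NaN
print(pd.DataFrame(data_f2, index=[2010, 2009, 2011],
                   columns=['Year', 'GDP', 'Poverty', 'GNP']))

# transpose: rows become columns
print(data_f1.T)

# summary statistics, first two rows, row index
print(data_f1.describe())
print(data_f1.head(2))
print(data_f1.index)
# attribute-style column access, same as data_f1['GDP']
print(data_f1.GDP)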
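
One point worth spelling out from the cells above: most DataFrame methods return a new object rather than changing the original, so the result has to be assigned or printed. A small sketch, with a shortened copy of the data and `reindex` used as what I understand to be the more common spelling of the index/columns rebuild:

```python
import pandas as pd

df = pd.DataFrame(
    {"GDP": [24288, 26084, 26689], "Poverty": [14.3, 14.8, 15.2]},
    index=[2006, 2007, 2008],
)

# sort_index returns a new DataFrame; df keeps its original order.
newest_first = df.sort_index(ascending=False)

# drop(columns=...) also returns a copy, unlike `del df["Poverty"]`.
gdp_only = df.drop(columns=["Poverty"])

# reindex adds NaN rows/columns for labels that do not exist yet.
extended = df.reindex(index=[2008, 2007, 2009], columns=["GDP", "GNP"])

print(newest_first)
print(gdp_only)
print(extended)
```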
