Python Pandas : astype (changing the data type of DataFrame columns) & dtype (getting the data type of a Series)

달나라 노트

CosmosProject 2021. 1. 6. 13:21

Pandas provides the astype method for changing the data type of the columns in a DataFrame.

import pandas as pd

dict_test = {
    'col1': [1, 2, 3, 4, 5],
    'col2': [1.0, 2.0, 3.0, 4.0, 5.0],
    'col3': ['1.2', '2.3', '3.4', '4.5', '5.6'],
}

df_test = pd.DataFrame(dict_test)
print(df_test)
print(type(df_test))
print('col1 dtype :', df_test['col1'].dtype)
print('col2 dtype :', df_test['col2'].dtype)
print('col3 dtype :', df_test['col3'].dtype)


- Output
   col1  col2 col3
0     1   1.0  1.2
1     2   2.0  2.3
2     3   3.0  3.4
3     4   4.0  4.5
4     5   5.0  5.6
<class 'pandas.core.frame.DataFrame'>
col1 dtype : int64
col2 dtype : float64
col3 dtype : object

First, let's create a DataFrame for testing.

Let's also check the data type of each column in the DataFrame created above.

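- Note

A pandas Series has a dtype attribute (an attribute, not a method, so it is used without parentheses) that returns the data type of the elements stored in that Series.

So, to find the data type of each column in the DataFrame above, we pulled each column out of the DataFrame as a Series and read its dtype.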
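Reading dtype column by column works, but a DataFrame can also report all column types at once. The sketch below rebuilds the same test data and uses DataFrame.dtypes, a direct dtype comparison, and the pandas.api.types helper functions; which check you use is a matter of taste, and the helpers are handy when you care about a kind of type (any integer, any string) rather than one exact name.

```python
import pandas as pd

dict_test = {
    'col1': [1, 2, 3, 4, 5],
    'col2': [1.0, 2.0, 3.0, 4.0, 5.0],
    'col3': ['1.2', '2.3', '3.4', '4.5', '5.6'],
}
df_test = pd.DataFrame(dict_test)

# DataFrame.dtypes is a Series that maps each column name to its dtype
print(df_test.dtypes)

# A single column's dtype can be compared with a dtype name string
print(df_test['col1'].dtype == 'int64')

# pandas.api.types helpers check a family of dtypes instead of one exact name
print(pd.api.types.is_integer_dtype(df_test['col1']))
print(pd.api.types.is_float_dtype(df_test['col2']))
print(pd.api.types.is_string_dtype(df_test['col3']))
```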

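col1 contains the values 1, 2, 3, 4, 5, which are integers, so its data type is int64.

col2 contains the values 1.0, 2.0, 3.0, 4.0, 5.0, which are decimal (floating-point) numbers, so its data type is float64.

col3 also contains numbers, but each one was wrapped in quotes when the DataFrame was created, so they are strings. That is why col3's data type is object, the dtype pandas reports here for a column of Python strings.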
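Because col3 holds strings, the target type you choose matters when converting it. A minimal sketch, using a standalone Series with the same values (the names s, as_float, as_int and direct_int_failed are illustrative only): 'float' can parse decimal strings such as '1.2' directly, while 'int' cannot, so one workaround is to convert to float first and then to int.

```python
import pandas as pd

s = pd.Series(['1.2', '2.3', '3.4', '4.5', '5.6'])

# 'float' parses the decimal strings directly
as_float = s.astype('float')
print(as_float.tolist())  # [1.2, 2.3, 3.4, 4.5, 5.6]

# 'int' cannot parse a string such as '1.2', so this conversion raises
direct_int_failed = False
try:
    s.astype('int')
except (ValueError, TypeError) as exc:
    direct_int_failed = True
    print('direct str -> int failed:', exc)

# Going through float first works; float -> int truncates toward zero
as_int = s.astype('float').astype('int')
print(as_int.tolist())  # [1, 2, 3, 4, 5]
```

Note that the float-to-int step drops the fractional part (4.5 becomes 4, 5.6 becomes 5); it does not round.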

df_test['col1'] = df_test['col1'].astype('str')
df_test['col2'] = df_test['col2'].astype('int')
df_test['col3'] = df_test['col3'].astype('float')
print(df_test)
print(type(df_test))

print('col1 dtype :', df_test['col1'].dtype)
print('col2 dtype :', df_test['col2'].dtype)
print('col3 dtype :', df_test['col3'].dtype)


- Output
  col1  col2  col3
0    1     1   1.2
1    2     2   2.3
2    3     3   3.4
3    4     4   4.5
4    5     5   5.6
<class 'pandas.core.frame.DataFrame'>
col1 dtype : object
col2 dtype : int64
col3 dtype : float64

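Now let's use astype to change the data type of each column.

Each column is pulled out with bracket indexing (for example df_test['col1']), its values are converted with astype, and the result is assigned back to the original column.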

'str' means string (text).

'int' means integer.

'float' means floating-point (real) number.

The output of the example above shows that each column's data type changed as intended.
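Converting three columns one line at a time works, but DataFrame.astype also accepts a dict that maps column names to target types, which does the same conversions in one call. A minimal sketch on a fresh copy of the test data (df_converted is an illustrative name). Note that astype returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

df_test = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5],
    'col2': [1.0, 2.0, 3.0, 4.0, 5.0],
    'col3': ['1.2', '2.3', '3.4', '4.5', '5.6'],
})

# Dict form: column name -> target type; unlisted columns keep their dtype
df_converted = df_test.astype({'col1': 'str', 'col2': 'int', 'col3': 'float'})
print(df_converted.dtypes)

# astype returns a new DataFrame, so df_test still has its original dtypes
print(df_test.dtypes)
```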

df_test.loc[:, 'col1'] = df_test.loc[:, 'col1'].astype('str')
df_test.loc[:, 'col2'] = df_test.loc[:, 'col2'].astype('int')
df_test.loc[:, 'col3'] = df_test.loc[:, 'col3'].astype('float')
print(df_test)
print(type(df_test))

print('col1 dtype :', df_test['col1'].dtype)
print('col2 dtype :', df_test['col2'].dtype)
print('col3 dtype :', df_test['col3'].dtype)


- Output
  col1  col2  col3
0    1     1   1.2
1    2     2   2.3
2    3     3   3.4
3    4     4   4.5
4    5     5   5.6
<class 'pandas.core.frame.DataFrame'>
col1 dtype : object
col2 dtype : int64
col3 dtype : float64

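Using astype together with loc, as above, also produces the expected types here. However, loc assignment does not always change a column's type: starting with pandas 2.0, df.loc[:, 'col'] = ... first tries to write the new values into the existing column in place and only changes the dtype when the values do not fit (for example, a float64 column stays float64 after integer values are assigned to it). For this reason, the first approach (plain bracket assignment) is recommended.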
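The difference between the two assignment styles can be checked directly. A minimal sketch with a one-column float64 DataFrame (df, df2 and df3 are illustrative names): bracket assignment and DataFrame.assign both replace the whole column, so the new integer dtype sticks, while the loc version writes values into the existing column. The dtype the loc version ends up with depends on the installed pandas version, so the example just prints it.

```python
import pandas as pd

# Bracket assignment replaces the whole column, so the new dtype is kept
df = pd.DataFrame({'col2': [1.0, 2.0, 3.0]})
df['col2'] = df['col2'].astype('int')
print('bracket:', df['col2'].dtype)

# DataFrame.assign returns a new DataFrame with the column replaced
df2 = pd.DataFrame({'col2': [1.0, 2.0, 3.0]})
df2 = df2.assign(col2=df2['col2'].astype('int'))
print('assign :', df2['col2'].dtype)

# loc assignment writes into the existing column; in newer pandas
# versions the column may keep its original float64 dtype
df3 = pd.DataFrame({'col2': [1.0, 2.0, 3.0]})
df3.loc[:, 'col2'] = df3['col2'].astype('int')
print('loc    :', df3['col2'].dtype)
```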
