Python Pandas : contains (checking whether a string contains a substring)

달나라 노트

CosmosProject 2021. 6. 30. 19:00

The pandas str.contains method can be applied to a Series. For each value in the Series, it returns True if the value contains the given string and False if it does not.

Syntax

Series.str.contains(string/pattern, case=True/False, regex=True/False)

string/pattern : the string or pattern to search for.

case : if True, the match is case sensitive; if False, it is case insensitive. Defaults to True.

regex : if True, string/pattern is treated as a regular expression pattern; if False, it is treated as a literal string. Defaults to True.
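As a quick check of the parameters described above, the sketch below calls str.contains with only the pattern and compares it with a call that spells out case=True and regex=True. The Series contents and the variable names (default_result, explicit_result) are made up for illustration.

```python
import pandas as pd

# Illustrative data, not taken from the examples in this post.
s = pd.Series(['apple', 'Ppa', '123'])

# Only the pattern is given, so the defaults for case and regex apply.
default_result = s.str.contains('pp')

# The same call with both parameters written out explicitly.
explicit_result = s.str.contains('pp', case=True, regex=True)

print(default_result.tolist())
print(default_result.equals(explicit_result))
```

Both calls should give the same Series, and 'Ppa' is not matched because the default comparison is case sensitive.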

import pandas as pd

dict_test = {
    'col1': [1, 2, 3, 4, 5, 6],
    'col2': ['apple', 'abcde', 'lelele', 'Ppa', 'xyzab', '123']
}

df_test = pd.DataFrame(dict_test)

s = df_test.loc[:, 'col2'] # 1. Create a Series
s = s.str.contains('pp', case=False, regex=False) # 2. Check whether each value in the Series contains the text pp.
print(s)


-- Result
0     True
1    False
2    False
3     True
4    False
5    False
Name: col2, dtype: bool

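Let's walk through the example above.

1. loc is used to pull only the col2 data out of the DataFrame as a Series.

2. str.contains returns whether each value in the Series contains the text pp.

As the result shows, the return value is a Series. The values at index 0 and 3 (apple, Ppa) contain pp, so they are True. The remaining values do not contain pp, so they are False.

Because case=False here, the match is not case sensitive.

So apple contains pp and is True,

and Ppa also counts as containing pp (its Pp matches when case is ignored), so it is True as well.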
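One way to picture what case=False does is to lowercase every value first and then run a case-sensitive search. The sketch below compares the two approaches on the same data; it is only an illustration for plain ASCII text like this, not a description of how pandas implements the option internally. The names with_flag and lowered are made up for this example.

```python
import pandas as pd

s = pd.Series(['apple', 'abcde', 'lelele', 'Ppa', 'xyzab', '123'])

# Case-insensitive search using the case parameter.
with_flag = s.str.contains('pp', case=False, regex=False)

# Lowercase the values first, then search with the default case=True.
lowered = s.str.lower().str.contains('pp', regex=False)

print(with_flag.tolist())
print(with_flag.equals(lowered))
```

For this data both approaches flag apple and Ppa.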

import pandas as pd

dict_test = {
    'col1': [1, 2, 3, 4, 5, 6],
    'col2': ['apple', 'abcde', 'lelele', 'Ppa', 'xyzab', '123']
}

df_test = pd.DataFrame(dict_test)

s = df_test.loc[:, 'col2'] # 1. Create a Series
s = s.str.contains('pp', case=True, regex=False) # 2. Look for the text pp in each value of the Series (case sensitive).
print(s)


-- Result
0     True
1    False
2    False
3    False
4    False
5    False
Name: col2, dtype: bool

Changing to case=True makes the match case sensitive.

Ppa is therefore no longer considered to contain pp, and the result for the row at index=3 becomes False.
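With case=True and regex=False, the check behaves like Python's own in operator applied to each value. The sketch below builds the same answer with a list comprehension and compares it with the pandas result; the names values, manual, and from_pandas are chosen just for this illustration.

```python
import pandas as pd

values = ['apple', 'abcde', 'lelele', 'Ppa', 'xyzab', '123']

# Plain Python: a case-sensitive substring test on each string.
manual = ['pp' in v for v in values]

# The same test through pandas.
from_pandas = pd.Series(values).str.contains('pp', case=True, regex=False).tolist()

print(manual)
print(manual == from_pandas)
```

Only apple passes, since the Pp in Ppa differs from pp in case.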

import pandas as pd

dict_test = {
    'col1': [1, 2, 3, 4, 5, 6],
    'col2': ['apple', 'abcde', 'lelele', 'Ppa', 'xyzab', '123']
}

df_test = pd.DataFrame(dict_test)

s = df_test.loc[:, 'col2'] # Create a Series
s = s.str.contains('a+', case=False, regex=True) # Return True for values that contain one or more a's
print(s)


-- Result
0     True
1     True
2    False
3     True
4     True
5    False
Name: col2, dtype: bool

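With regex=True, a+ is treated as a regular expression pattern rather than as literal text.

a+ means "one or more occurrences of a", so True is returned only for the strings in the Series that contain at least one a. Since case=False, an uppercase A would match as well.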
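To make the difference between regex=True and regex=False more concrete, the sketch below searches for a+ both ways on a small made-up Series that includes the literal text a+b. It also shows re.escape from the standard library as one possible way to search for literal text while leaving regex=True; the variable names are illustrative.

```python
import re
import pandas as pd

# Illustrative data: one value contains the literal characters "a+".
s = pd.Series(['a+b', 'aaa', 'xyz'])

# Regular expression: one or more 'a' characters.
as_regex = s.str.contains('a+', regex=True)

# Literal text: the two characters 'a' followed by '+'.
as_literal = s.str.contains('a+', regex=False)

# Escaping the pattern makes the regex search behave like the literal one.
escaped = s.str.contains(re.escape('a+'), regex=True)

print(as_regex.tolist())
print(as_literal.tolist())
print(escaped.tolist())
```

The regex search matches both a+b and aaa, while the literal and escaped searches match only a+b.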

import pandas as pd

dict_test = {
    'col1': [1, 2, 3, 4, 5, 6],
    'col2': ['apple', 'abcde', 'lelele', 'Ppa', 'xyzab', '123']
}

df_test = pd.DataFrame(dict_test)

s = df_test.loc[:, 'col2'] # Create a Series
s = s.str.contains('a+', case=False, regex=True) # Return True for values that contain one or more a's

df_test = df_test.loc[s, :]
print(df_test)


-- Result
   col1   col2
0     1  apple
1     2  abcde
3     4    Ppa
4     5  xyzab

As in the example above, contains can be combined with loc to extract only the rows whose values contain a particular string.
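The same boolean Series can also be inverted with the ~ operator to keep only the rows that do not contain the text. The sketch below is a small variation on the example above; the names mask, with_a, and without_a are made up for illustration.

```python
import pandas as pd

df_test = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6],
    'col2': ['apple', 'abcde', 'lelele', 'Ppa', 'xyzab', '123'],
})

# True for rows whose col2 value contains the letter a, ignoring case.
mask = df_test['col2'].str.contains('a', case=False, regex=False)

with_a = df_test.loc[mask, :]      # rows that contain a
without_a = df_test.loc[~mask, :]  # rows that do not contain a

print(with_a['col2'].tolist())
print(without_a['col2'].tolist())
```

If the column could hold missing values, the mask may contain missing entries too; passing na=False to str.contains is one option to treat those rows as non-matching.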
