Python & Pandas: How to query if a list-type column contains something?






I have a dataframe that contains info about movies. It has a column called genre, which holds a list of the genres each movie belongs to. For example:


df['genre']

## returns

0 ['comedy', 'sci-fi']
1 ['action', 'romance', 'comedy']
2 ['documentary']
3 ['crime','horror']
...



How can I query the df so that it returns the movies that belong to a certain genre?



For example, something like df['genre'].contains('comedy') that returns rows 0 and 1.



I know for a list, I can do things like


'comedy' in ['comedy', 'sci-fi']



In pandas, though, I didn't find anything similar. The only thing I know is df['genre'].str.contains(), but it doesn't work for a list-type column.




4 Answers



You can use apply to create a boolean mask and then filter with boolean indexing:


mask = df.genre.apply(lambda x: 'comedy' in x)
df1 = df[mask]
print (df1)
genre
0 [comedy, sci-fi]
1 [action, romance, comedy]



Using sets:


df.genre.map(set(['comedy']).issubset)

0 True
1 True
2 False
3 False
dtype: bool


df.genre[df.genre.map(set(['comedy']).issubset)]

0 [comedy, sci-fi]
1 [action, romance, comedy]
dtype: object



The same thing, presented in a way I like better:


comedy = set(['comedy'])
iscomedy = comedy.issubset
df[df.genre.map(iscomedy)]



More efficient, using a list comprehension instead of map:


comedy = set(['comedy'])
iscomedy = comedy.issubset
df[[iscomedy(l) for l in df.genre.values.tolist()]]
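
The set-based approach also extends to several genres at once. Below is a minimal sketch, assuming the sample data from the question; the name `wanted` and the choice of genres are illustrative only. `issubset` keeps rows that contain all of the wanted genres, while `isdisjoint` (negated) keeps rows that contain at least one.

```python
import pandas as pd

# Sample data mirroring the question's `genre` column.
df = pd.DataFrame({
    "genre": [
        ["comedy", "sci-fi"],
        ["action", "romance", "comedy"],
        ["documentary"],
        ["crime", "horror"],
    ]
})

wanted = {"comedy", "romance"}  # hypothetical set of genres to look for

# Rows whose list contains ALL of the wanted genres.
has_all = df["genre"].map(wanted.issubset)

# Rows whose list contains AT LEAST ONE of the wanted genres.
has_any = df["genre"].map(lambda genres: not wanted.isdisjoint(genres))

all_rows = df[has_all].index.tolist()
any_rows = df[has_any].index.tolist()
print(all_rows)
print(any_rows)
```

Both set methods accept any iterable, so the lists in the column do not need to be converted to sets first.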



Using str in two passes.
Slow, and not perfectly accurate, because it matches substrings rather than whole list elements:


df[df.genre.str.join(' ').str.contains('comedy')]
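
To illustrate the accuracy problem, here is a small sketch comparing the joined-string search with a per-list membership test. The genre "tragicomedy" is made up for the illustration; any genre whose name merely contains the search term would behave the same way.

```python
import pandas as pd

df = pd.DataFrame({
    "genre": [
        ["comedy", "sci-fi"],
        ["tragicomedy"],   # hypothetical genre that merely contains "comedy"
        ["documentary"],
    ]
})

# Substring search on the joined text: also matches "tragicomedy".
by_substring = df["genre"].str.join(" ").str.contains("comedy")

# Exact membership test on each list: matches only the element "comedy".
by_membership = df["genre"].apply(lambda genres: "comedy" in genres)

substring_rows = df[by_substring].index.tolist()
membership_rows = df[by_membership].index.tolist()
print(substring_rows)
print(membership_rows)
```

The substring version returns rows 0 and 1, while the membership version returns only row 0.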



According to the source code, you can use .str.contains(..., regex=False).







That was my initial thought which unfortunately doesn't work as it returns True even for partial string matches.
– Nickil Maveli
Jan 7 '17 at 8:29





A complete example:


import pandas as pd

data = pd.DataFrame([[['foo', 'bar']],
[['bar', 'baz']]], columns=['list_column'])
print(data)
list_column
0 [foo, bar]
1 [bar, baz]

filtered_data = data.loc[
lambda df: df.list_column.apply(
lambda l: 'foo' in l
)
]
print(filtered_data)
list_column
0 [foo, bar]
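
As a possible alternative that none of the answers above use, the list column can be flattened with `Series.explode` and then compared element by element. This is only a sketch, and it assumes pandas 0.25 or later, where `explode` is available. It reuses the `list_column` data from the example above.

```python
import pandas as pd

data = pd.DataFrame({"list_column": [["foo", "bar"], ["bar", "baz"]]})

# One row per list element; the original row index is repeated.
exploded = data["list_column"].explode()

# Compare each element, then collapse back to one boolean per original row.
mask = exploded.eq("foo").groupby(level=0).any()

filtered = data[mask]
filtered_rows = filtered.index.tolist()
print(filtered_rows)
```

Because the comparison is done on whole elements, this matches exactly, like the `in` test, and not on substrings.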





