Python & Pandas: How to query if a list-type column contains something?

I have a DataFrame containing information about movies. It has a column called genre, which holds the list of genres each movie belongs to. For example:
df['genre']
## returns
0 ['comedy', 'sci-fi']
1 ['action', 'romance', 'comedy']
2 ['documentary']
3 ['crime','horror']
...
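To make the question reproducible, the frame can be rebuilt as below. Only the four rows shown above are included; whatever sits behind the "..." is not reconstructed.

```python
import pandas as pd

# Rebuild the example frame from the question (first four rows only)
df = pd.DataFrame({
    'genre': [
        ['comedy', 'sci-fi'],
        ['action', 'romance', 'comedy'],
        ['documentary'],
        ['crime', 'horror'],
    ]
})

# Each cell holds a plain Python list, so the column has object dtype
print(df['genre'])
```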
How can I query the df so that it returns the movies belonging to a certain genre?
For example, something like df['genre'].contains('comedy') that returns rows 0 and 1.
I know for a list, I can do things like
'comedy' in ['comedy', 'sci-fi']
but in pandas I couldn't find anything similar. The only thing I know of is df['genre'].str.contains(), but it didn't work for list-type values.
4 Answers
You can use apply to create a mask and then use boolean indexing:
mask = df.genre.apply(lambda x: 'comedy' in x)
df1 = df[mask]
print (df1)
genre
0 [comedy, sci-fi]
1 [action, romance, comedy]
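A self-contained version of the apply approach, wrapped in a small function so the genre can be varied. The helper name rows_with_genre is hypothetical, not part of pandas; it rebuilds the question's four example rows.

```python
import pandas as pd

df = pd.DataFrame({'genre': [
    ['comedy', 'sci-fi'],
    ['action', 'romance', 'comedy'],
    ['documentary'],
    ['crime', 'horror'],
]})


def rows_with_genre(frame, genre):
    """Hypothetical helper: rows whose 'genre' list contains `genre` as an item."""
    # Python's `in` on a list compares whole elements, not substrings
    mask = frame['genre'].apply(lambda genres: genre in genres)
    return frame[mask]


print(rows_with_genre(df, 'comedy').index.tolist())  # [0, 1]
print(rows_with_genre(df, 'horror').index.tolist())  # [3]
```

If some cells are NaN instead of a list, genre in genres raises a TypeError, so such cells would need to be replaced (for example with empty lists) first.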
using sets
df.genre.map(set(['comedy']).issubset)
0 True
1 True
2 False
3 False
dtype: bool
df.genre[df.genre.map(set(['comedy']).issubset)]
0 [comedy, sci-fi]
1 [action, romance, comedy]
dtype: object
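The set approach extends naturally to several genres at once. A sketch, assuming "all of these genres" maps to issubset and "any of these genres" maps to not isdisjoint; both set methods accept any iterable, so the list cells can be passed directly.

```python
import pandas as pd

df = pd.DataFrame({'genre': [
    ['comedy', 'sci-fi'],
    ['action', 'romance', 'comedy'],
    ['documentary'],
    ['crime', 'horror'],
]})

# Every wanted genre must appear in the row's list
wanted_all = {'comedy', 'sci-fi'}
has_all = df['genre'].map(wanted_all.issubset)

# At least one wanted genre must appear in the row's list
wanted_any = {'comedy', 'horror'}
has_any = df['genre'].map(lambda genres: not wanted_any.isdisjoint(genres))

print(df[has_all].index.tolist())  # [0]
print(df[has_any].index.tolist())  # [0, 1, 3]
```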
presented in a way I like better
comedy = set(['comedy'])
iscomedy = comedy.issubset
df[df.genre.map(iscomedy)]
more efficient
comedy = set(['comedy'])
iscomedy = comedy.issubset
df[[iscomedy(l) for l in df.genre.values.tolist()]]
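A runnable version of the list-comprehension filter that checks it selects the same rows as the map version above. No timing numbers are claimed here; the speed difference depends on the data and pandas version.

```python
import pandas as pd

df = pd.DataFrame({'genre': [
    ['comedy', 'sci-fi'],
    ['action', 'romance', 'comedy'],
    ['documentary'],
    ['crime', 'horror'],
]})

iscomedy = {'comedy'}.issubset

# Plain Python loop over the underlying lists, producing a list of booleans
mask = [iscomedy(genres) for genres in df['genre'].tolist()]
by_comprehension = df[mask]

# Same test through Series.map, for comparison
by_map = df[df['genre'].map(iscomedy)]

print(by_comprehension.index.tolist())  # [0, 1]
```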
using str in two passes
slow! and not perfectly accurate!
df[df.genre.str.join(' ').str.contains('comedy')]
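Why the joined-string approach is "not perfectly accurate": it searches for a substring, so a genre that merely contains "comedy" also matches. The 'dark-comedy' genre below is a made-up value used only to show the effect.

```python
import pandas as pd

# Hypothetical data: 'dark-comedy' is invented for this illustration
df = pd.DataFrame({'genre': [['comedy', 'sci-fi'], ['dark-comedy'], ['documentary']]})

# Join each list into one string, then search it for the substring 'comedy'
substring_hits = df[df['genre'].str.join(' ').str.contains('comedy')]

# Exact list membership only matches the literal item 'comedy'
exact_hits = df[df['genre'].apply(lambda genres: 'comedy' in genres)]

print(substring_hits.index.tolist())  # [0, 1]
print(exact_hits.index.tolist())      # [0]
```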
According to the source code, you can use .str.contains(..., regex=False): with regex=False, pandas evaluates pat in x for each element, which for a list value is a membership test.
A complete example:
import pandas as pd
data = pd.DataFrame([[['foo', 'bar']],
[['bar', 'baz']]], columns=['list_column'])
print(data)
list_column
0 [foo, bar]
1 [bar, baz]
filtered_data = data.loc[
lambda df: df.list_column.apply(
lambda l: 'foo' in l
)
]
print(filtered_data)
list_column
0 [foo, bar]
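A different technique for the same filter, named plainly: Series.explode (available since pandas 0.25) puts each list item on its own row while repeating the original index label, so the matching labels can be collected and used to select the original rows. This is a sketch, assuming the frame has a unique index.

```python
import pandas as pd

data = pd.DataFrame([[['foo', 'bar']],
                     [['bar', 'baz']]], columns=['list_column'])

# One row per list item; index labels repeat (0, 0, 1, 1)
exploded = data['list_column'].explode()

# Labels of original rows where at least one item equals 'foo'
matching_labels = exploded[exploded == 'foo'].index.unique()

filtered = data.loc[matching_labels]
print(filtered)
```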
That was my initial thought, which unfortunately doesn't work, as it returns True even for partial string matches. – Nickil Maveli, Jan 7 '17 at 8:29