Python & Pandas: How to query if a list-type column contains something?
I have a dataframe which contains info about movies. It has a column called genre, which contains a list of the genres each movie belongs to. For example:
df['genre']
## returns
0 ['comedy', 'sci-fi']
1 ['action', 'romance', 'comedy']
2 ['documentary']
3 ['crime','horror']
...
How can I query the df so that it returns the movies belonging to a certain genre?
For example, something like df['genre'].contains('comedy') that returns 0, 1.
I know that for a list I can do something like
'comedy' in ['comedy', 'sci-fi']
but in pandas I didn't find anything similar. The only thing I know is df['genre'].str.contains(), but it didn't work for the list type.
4 Answers
You can use apply to create a mask and then boolean indexing:
mask = df.genre.apply(lambda x: 'comedy' in x)
df1 = df[mask]
print (df1)
genre
0 [comedy, sci-fi]
1 [action, romance, comedy]
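The snippet above assumes df already exists. A minimal self-contained sketch of the same apply + boolean indexing approach, rebuilding the question's sample data (assuming each cell is a real Python list, not a string that looks like one):

```python
import pandas as pd

# Sample data from the question: each cell holds a Python list of genres.
df = pd.DataFrame({'genre': [['comedy', 'sci-fi'],
                             ['action', 'romance', 'comedy'],
                             ['documentary'],
                             ['crime', 'horror']]})

# Boolean Series: True where the cell's list contains 'comedy'.
mask = df['genre'].apply(lambda genres: 'comedy' in genres)

# Boolean indexing keeps only the rows where the mask is True.
df1 = df[mask]
print(df1.index.tolist())  # [0, 1]
```

The `in` test here is exact element membership, so 'comedy' would not match a hypothetical genre like 'dark-comedy'.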
using sets
df.genre.map(set(['comedy']).issubset)
0 True
1 True
2 False
3 False
dtype: bool
df.genre[df.genre.map(set(['comedy']).issubset)]
0 [comedy, sci-fi]
1 [action, romance, comedy]
dtype: object
presented in a way I like better
comedy = set(['comedy'])
iscomedy = comedy.issubset
df[df.genre.map(iscomedy)]
more efficient
comedy = set(['comedy'])
iscomedy = comedy.issubset
df[[iscomedy(l) for l in df.genre.values.tolist()]]
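The set approach extends naturally to several genres at once. A sketch, assuming the same sample data: issubset keeps movies that have all of the wanted genres, while "not isdisjoint" keeps movies that have at least one of them. Both set methods accept any iterable, so each list cell can be passed directly.

```python
import pandas as pd

df = pd.DataFrame({'genre': [['comedy', 'sci-fi'],
                             ['action', 'romance', 'comedy'],
                             ['documentary'],
                             ['crime', 'horror']]})

# "All of": every wanted genre must appear in the row's list.
wanted_all = {'comedy', 'sci-fi'}
all_mask = df['genre'].map(wanted_all.issubset)

# "Any of": the wanted set and the row's list share at least one element.
wanted_any = {'comedy', 'horror'}
any_mask = df['genre'].map(lambda genres: not wanted_any.isdisjoint(genres))

print(df[all_mask].index.tolist())  # [0]
print(df[any_mask].index.tolist())  # [0, 1, 3]
```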
using str in two passes
slow! and not perfectly accurate: it searches for a substring, so 'comedy' would also match a genre such as 'dark-comedy'.
df[df.genre.str.join(' ').str.contains('comedy')]
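To see the accuracy problem concretely, here is a sketch with hypothetical data in which one genre name merely contains the text 'comedy'. The joined-string search flags it; list membership does not.

```python
import pandas as pd

# Hypothetical data: 'dark-comedy' is a genre whose name contains 'comedy'.
df = pd.DataFrame({'genre': [['comedy', 'sci-fi'],
                             ['dark-comedy'],
                             ['documentary']]})

# String approach: join each list into one string, then search for a substring.
by_string = df['genre'].str.join(' ').str.contains('comedy')

# Membership approach: exact element match inside each list.
by_member = df['genre'].map(lambda genres: 'comedy' in genres)

print(by_string.tolist())  # [True, True, False]
print(by_member.tolist())  # [True, False, False]
```

A word-boundary pattern such as r'\bcomedy\b' would not fully fix this either, because '-' counts as a word boundary in regular expressions.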
According to the source code, you can use .str.contains(..., regex=False).
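As I read pandas' object-dtype string methods, regex=False (with the default case=True) reduces to a per-cell `pat in cell` test; this is an assumption about internals that may differ between pandas versions. What that test means depends on the cell type, which is where partial matches can creep in. A pure-Python sketch, where cell_contains is a hypothetical helper and not pandas code:

```python
# Hypothetical helper mirroring a per-cell `pat in cell` test (not pandas code).
def cell_contains(cell, pat):
    return pat in cell

# For a list cell, `in` is exact element membership.
print(cell_contains(['comedy', 'sci-fi'], 'comedy'))  # True
print(cell_contains(['comedy', 'sci-fi'], 'com'))     # False

# For a string cell, `in` is substring search, so partial matches count.
print(cell_contains('comedy sci-fi', 'com'))          # True
```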
A complete example:
import pandas as pd
data = pd.DataFrame([[['foo', 'bar']],
[['bar', 'baz']]], columns=['list_column'])
print(data)
list_column
0 [foo, bar]
1 [bar, baz]
filtered_data = data.loc[
lambda df: df.list_column.apply(
lambda l: 'foo' in l
)
]
print(filtered_data)
list_column
0 [foo, bar]
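Because .loc accepts a callable that receives the DataFrame and returns a boolean mask, the filter above can be wrapped in a small factory and chained. A sketch, where has_item is a hypothetical helper name:

```python
import pandas as pd


def has_item(column, item):
    """Hypothetical helper: build a callable usable inside .loc[...]."""
    return lambda frame: frame[column].apply(lambda values: item in values)


data = pd.DataFrame([[['foo', 'bar']],
                     [['bar', 'baz']]], columns=['list_column'])

# Each .loc step receives the result of the previous one, so filters chain
# without naming intermediate DataFrames.
result = (data
          .loc[has_item('list_column', 'bar')]
          .loc[has_item('list_column', 'baz')])
print(result.index.tolist())  # [1]
```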
That was my initial thought, which unfortunately doesn't work, as it returns True even for partial string matches. – Nickil Maveli
Jan 7 '17 at 8:29