Perform cumulative count on a pandas column taking order into account

Multi tool use


Perform cumulative count on a pandas column taking order into account
I have a pandas data frame
pd.DataFrame(columns=["A", "B"], data=[['id1','a'],['id1','a'], ['id1','a'], ['id1','b'], ['id1','b'], ['id1','a'], ['id1','a'], ['id2','c'], ['id2','c'], ['id2','a'], ['id2','c']])
A B
0 id1 a
1 id1 a
2 id1 a
3 id1 b
4 id1 b
5 id1 a
6 id1 a
7 id2 c
8 id2 c
9 id2 a
10 id2 c
I want to do a group by that gives me cumulative sum of occurrences in B column considering the order
A B C
0 id1 a 3
1 b 2
2 a 2
3 id2 c 2
4 a 1
5 c 1
1 Answer
1
You can use itertools.groupby
followed by a list comprehension. This works because, unlike Pandas GroupBy
, the itertools
version does not aggregate like elements which are non-adjacent.
itertools.groupby
GroupBy
itertools
from itertools import groupby
grouper = groupby(zip(df['A'], df['B']))
res = [(i1, i2, len(list(j))) for (i1, i2), j in grouper]
df = pd.DataFrame(res, columns=['A', 'B', 'C'])
print(df)
A B C
0 id1 a 3
1 id1 b 2
2 id1 a 2
3 id2 c 2
4 id2 a 1
5 id2 c 1
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.