Perform cumulative count on a pandas column taking order into account

Perform cumulative count on a pandas column taking order into account

I have a pandas data frame

pd.DataFrame(columns=["A", "B"], data=[['id1','a'],['id1','a'], ['id1','a'], ['id1','b'], ['id1','b'], ['id1','a'], ['id1','a'], ['id2','c'], ['id2','c'], ['id2','a'], ['id2','c']]) A B 0 id1 a 1 id1 a 2 id1 a 3 id1 b 4 id1 b 5 id1 a 6 id1 a 7 id2 c 8 id2 c 9 id2 a 10 id2 c

I want to do a group by that gives me cumulative sum of occurrences in B column considering the order

A B C 0 id1 a 3 1 b 2 2 a 2 3 id2 c 2 4 a 1 5 c 1

1 Answer
1

You can use itertools.groupby followed by a list comprehension. This works because, unlike Pandas GroupBy, the itertools version does not aggregate like elements which are non-adjacent.

itertools.groupby

GroupBy

itertools

from itertools import groupby grouper = groupby(zip(df['A'], df['B'])) res = [(i1, i2, len(list(j))) for (i1, i2), j in grouper] df = pd.DataFrame(res, columns=['A', 'B', 'C']) print(df) A B C 0 id1 a 3 1 id1 b 2 2 id1 a 2 3 id2 c 2 4 id2 a 1 5 id2 c 1

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

搜尋此網誌

Ciugk