Aggregate unique values in list to new list






I'm trying to find the best way to aggregate values (key/value pairs) from a list in Python, counting how many times each unique combination occurs.


foo = [
{'color': 'yellow', 'type': 'foo'},
{'color': 'yellow', 'type': 'bar'},
{'color': 'red', 'type': 'foo'},
{'color': 'red', 'type': 'foo'},
{'color': 'green', 'type': 'foo'},
{'color': 'red', 'type': 'bar'}
]



The end goal is something like:


newFoo = [
{'color': 'yellow', 'type': 'foo', 'count': 1},
{'color': 'yellow', 'type': 'bar', 'count': 1},
{'color': 'red', 'type': 'foo', 'count': 2},
{'color': 'red', 'type': 'bar', 'count': 1},
{'color': 'green', 'type': 'foo', 'count': 1}
]



I'm not very good with Python, but I have been trying to accomplish this, and this is about as far as I can get:


def loop(ar):
    dik = []
    for line in ar:
        blah = []

        for k, v in line.items():
            blah.append({k, v})
        blah.append({'count': '1'})
        dik.append(blah)
    print(dik)



Any help is appreciated.





In Python, list and array are not interchangeable. You are talking about lists, not arrays.
– Mad Physicist
36 mins ago




4 Answers



You can use Counter from collections:




from collections import Counter
from pprint import pprint

foo = [
{'color': 'yellow', 'type': 'foo'},
{'color': 'yellow', 'type': 'bar'},
{'color': 'red', 'type': 'foo'},
{'color': 'red', 'type': 'foo'},
{'color': 'green', 'type': 'foo'},
{'color': 'red', 'type': 'bar'}
]

c = Counter((i['color'], i['type']) for i in foo)
pprint([{'color': k[0], 'type': k[1], 'count': v} for k, v in c.items()])



Output:


[{'color': 'yellow', 'count': 1, 'type': 'foo'},
{'color': 'yellow', 'count': 1, 'type': 'bar'},
{'color': 'red', 'count': 2, 'type': 'foo'},
{'color': 'green', 'count': 1, 'type': 'foo'},
{'color': 'red', 'count': 1, 'type': 'bar'}]



Edit:



If you want to sort the new list (assuming it is stored in newFoo), you can do something like this:


l = sorted(newFoo, key=lambda v: (v['color'], v['type']), reverse=True)
pprint(l)



This will print:


[{'color': 'yellow', 'count': 1, 'type': 'foo'},
{'color': 'yellow', 'count': 1, 'type': 'bar'},
{'color': 'red', 'count': 2, 'type': 'foo'},
{'color': 'red', 'count': 1, 'type': 'bar'},
{'color': 'green', 'count': 1, 'type': 'foo'}]
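The same sort can also be written with operator.itemgetter instead of a lambda. A minimal sketch, assuming newFoo holds the counted list from above (hard-coded here so the snippet runs on its own):

```python
from operator import itemgetter

# Counted list as produced by the Counter answer (hard-coded for a self-contained example)
newFoo = [
    {'color': 'yellow', 'type': 'foo', 'count': 1},
    {'color': 'yellow', 'type': 'bar', 'count': 1},
    {'color': 'red', 'type': 'foo', 'count': 2},
    {'color': 'green', 'type': 'foo', 'count': 1},
    {'color': 'red', 'type': 'bar', 'count': 1},
]

# itemgetter('color', 'type') returns (d['color'], d['type']) for each dict d,
# i.e. the same sort key as the lambda version; reverse=True sorts descending
l = sorted(newFoo, key=itemgetter('color', 'type'), reverse=True)
for row in l:
    print(row)
```

Both forms produce the same ordering; itemgetter just avoids writing the lambda by hand.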



Edit:



Thanks to @MadPhysicist, you can generalize the above example:


c = Counter(tuple(item for item in i.items()) for i in foo)
pprint([{**dict(k), 'count': v} for k, v in c.items()])
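One caveat with tuple(i.items()): the tuple follows each dict's insertion order, so {'color': 'red', 'type': 'foo'} and {'type': 'foo', 'color': 'red'} would be counted as different entries. A minimal sketch, assuming the keys are mutually comparable (e.g. all strings) and the values are hashable, that sorts the items first so key order no longer matters:

```python
from collections import Counter

foo = [
    {'color': 'red', 'type': 'foo'},
    {'type': 'foo', 'color': 'red'},  # same pair, different key order
    {'color': 'green', 'type': 'foo'},
]

# sorted(i.items()) gives a canonical, hashable key regardless of key insertion order
c = Counter(tuple(sorted(i.items())) for i in foo)
newFoo = [{**dict(k), 'count': v} for k, v in c.items()]
print(newFoo)
```

Since keys within one dict are unique, sorting the (key, value) pairs only ever compares keys, so the values themselves do not need to be orderable.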





In a situation like this, where do I handle a KeyError?
– erp
19 mins ago







@erp Do you have a KeyError? You can always use i.get('color', 'default color') instead of i['color'].
– Andrej Kesely
18 mins ago
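A minimal sketch of that suggestion applied to the Counter answer, assuming some records may lack a key and that a placeholder such as 'unknown' (a hypothetical default; pick whatever fits your data) is acceptable:

```python
from collections import Counter

foo = [
    {'color': 'yellow', 'type': 'foo'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'red'},  # no 'type' key: i['type'] would raise KeyError here
]

# dict.get returns the default instead of raising KeyError when the key is missing
c = Counter((i.get('color', 'unknown'), i.get('type', 'unknown')) for i in foo)
newFoo = [{'color': k[0], 'type': k[1], 'count': v} for k, v in c.items()]
print(newFoo)
```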









It's a list, not an array. Aside from that, very nice.
– Mad Physicist
6 mins ago





@MadPhysicist corrected :)
– Andrej Kesely
4 mins ago





{'color': k[0], 'type': k[1], 'count': v} would look nicer as {**dict(k), 'count': v}. Similarly, (i['color'], i['type']) could be generalized to tuple(item for item in i.items())
– Mad Physicist
3 mins ago





Here's an easy option if you don't mind duplicates. If you want only one record per color/type pair, Andrej's answer with Counter is great.




newFoo = [dict(d, **{'count': foo.count(d)}) for d in foo]
>>> newFoo

[{'color': 'yellow', 'type': 'foo', 'count': 1},
{'color': 'yellow', 'type': 'bar', 'count': 1},
{'color': 'red', 'type': 'foo', 'count': 2},
{'color': 'red', 'type': 'foo', 'count': 2},
{'color': 'green', 'type': 'foo', 'count': 1},
{'color': 'red', 'type': 'bar', 'count': 1}]





You can always filter the values through set()
– Andrej Kesely
14 mins ago







dicts aren't hashable
– bphi
9 mins ago





What about putting them inside a set in tuple form and then converting them back to dicts? I like your answer in any case :)
– Andrej Kesely
7 mins ago
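A minimal sketch of that idea: convert each dict to a tuple of its items so it becomes hashable, de-duplicate, then convert back. It uses dict.fromkeys instead of set() (a swap on my part) because dict.fromkeys keeps the first-seen order, while a set may reorder the rows. This assumes all dict values are hashable.

```python
foo = [
    {'color': 'yellow', 'type': 'foo'},
    {'color': 'yellow', 'type': 'bar'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'green', 'type': 'foo'},
    {'color': 'red', 'type': 'bar'},
]

# Same idea as the answer above: list.count compares dicts by equality
with_counts = [dict(d, count=foo.count(d)) for d in foo]

# Dicts aren't hashable, but tuples of their items are. dict.fromkeys keeps
# only the first occurrence of each key and preserves insertion order (Python 3.7+).
unique = [dict(t) for t in dict.fromkeys(tuple(d.items()) for d in with_counts)]

for row in unique:
    print(row)
```

Note that foo.count(d) scans the whole list for every element, so this is quadratic; for large lists the Counter approach scales better.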



Haha, this took me longer than I want to admit, and there are a lot of better answers, but I did this the old-fashioned way, and maybe it helps you understand how to achieve it without fancy libraries.


# Copy the list (and each dict in it) before making any checks, so the
# inner loop has items to match against and foo itself is not modified.
# Note: new_foo = foo would NOT clone; it only gives the same list a second name.
new_foo = [dict(d) for d in foo]

for old in foo:  # for each item in the old list
    for new in new_foo:  # look for that item in the new one
        if old['type'] == new['type'] and old['color'] == new['color']:  # if those 2 keys match
            if 'count' not in new:  # check whether the count key exists yet
                new['count'] = 1  # add it if it wasn't found
            else:
                new['count'] = new['count'] + 1  # add 1 if it was found
            break  # then stop looking: break out of the inner loop



That should add a count to every item we want to count. However, it leaves the repeated ones without a count key. Because we copied the whole list first, those duplicates still exist in our new list, so we can use the missing key to filter them out.


# Rebuild the list instead of calling remove() while iterating over it,
# which can silently skip elements.
new_foo = [item for item in new_foo if 'count' in item]



Result:


{'color': 'yellow', 'type': 'foo', 'count': 1}
{'color': 'yellow', 'type': 'bar', 'count': 1}
{'color': 'red', 'type': 'foo', 'count': 2}
{'color': 'green', 'type': 'foo', 'count': 1}
{'color': 'red', 'type': 'bar', 'count': 1}



I am aware that there are better answers, but I think understanding the basics is important before dealing with advanced techniques.



You need to check whether the pair of 'color' and 'type' is already in your list. If so, just increment its 'count' value by one instead of appending a new entry.
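A minimal loop-only sketch of that description, using Python's for/else so a new entry is appended only when no matching pair was found. It does a linear search per item, so it is quadratic in the worst case, which is fine for small lists:

```python
foo = [
    {'color': 'yellow', 'type': 'foo'},
    {'color': 'yellow', 'type': 'bar'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'green', 'type': 'foo'},
    {'color': 'red', 'type': 'bar'},
]

newFoo = []
for item in foo:
    # Look for an existing entry with the same color/type pair
    for entry in newFoo:
        if entry['color'] == item['color'] and entry['type'] == item['type']:
            entry['count'] += 1  # already seen: increment instead of appending
            break
    else:
        # for/else: this branch runs only if the inner loop did not break
        newFoo.append({'color': item['color'], 'type': item['type'], 'count': 1})

print(newFoo)
```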





This is just a restatement of the question, and adds no new information. It should not be posted as an answer.
– Mad Physicist
7 mins ago





