Aggregate unique values in list to new list

I'm trying to find the best way to aggregate values (value pairs) from a list in Python.

    foo = [
        {'color': 'yellow', 'type': 'foo'},
        {'color': 'yellow', 'type': 'bar'},
        {'color': 'red', 'type': 'foo'},
        {'color': 'red', 'type': 'foo'},
        {'color': 'green', 'type': 'foo'},
        {'color': 'red', 'type': 'bar'}
    ]

The end goal is something like:

    newFoo = [
        {'color': 'yellow', 'type': 'foo', 'count': 1},
        {'color': 'yellow', 'type': 'bar', 'count': 1},
        {'color': 'red', 'type': 'foo', 'count': 2},
        {'color': 'red', 'type': 'bar', 'count': 1},
        {'color': 'green', 'type': 'foo', 'count': 1}
    ]

I'm not very good with Python. I have been trying to accomplish this, but this is about as far as I can get:

    def loop(ar):
        dik = []
        for line in ar:
            blah = []
            for k, v in line.items():
                blah.append({k, v})
            blah.append({'count': '1'})
            dik.append(blah)
        print(dik)

Any help appreciated.

In Python, list and array are not interchangeable. You are talking about lists, not arrays.
– Mad Physicist
36 mins ago

4 Answers
4

You can use Counter from collections:

    from collections import Counter
    from pprint import pprint

    foo = [
        {'color': 'yellow', 'type': 'foo'},
        {'color': 'yellow', 'type': 'bar'},
        {'color': 'red', 'type': 'foo'},
        {'color': 'red', 'type': 'foo'},
        {'color': 'green', 'type': 'foo'},
        {'color': 'red', 'type': 'bar'}
    ]

    # Count each (color, type) pair
    c = Counter(tuple((i['color'], i['type']) for i in foo))

    pprint([{'color': k[0], 'type': k[1], 'count': v} for k, v in c.items()])

Output:

    [{'color': 'yellow', 'count': 1, 'type': 'foo'},
     {'color': 'yellow', 'count': 1, 'type': 'bar'},
     {'color': 'red', 'count': 2, 'type': 'foo'},
     {'color': 'green', 'count': 1, 'type': 'foo'},
     {'color': 'red', 'count': 1, 'type': 'bar'}]

Edit:

If you want to sort the new list, you can do something like this:

    # newFoo is the list of counted dicts built above
    l = sorted(newFoo, key=lambda v: (v['color'], v['type']), reverse=True)
    pprint(l)

This will print:

    [{'color': 'yellow', 'count': 1, 'type': 'foo'},
     {'color': 'yellow', 'count': 1, 'type': 'bar'},
     {'color': 'red', 'count': 2, 'type': 'foo'},
     {'color': 'red', 'count': 1, 'type': 'bar'},
     {'color': 'green', 'count': 1, 'type': 'foo'}]

Edit:

Thanks to @MadPhysicist, you can generalize the above example:

    c = Counter(tuple(item for item in i.items()) for i in foo)
    pprint([{**dict(k), 'count': v} for k, v in c.items()])
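One caveat with the generalized version: a tuple of `i.items()` depends on key order, so two dicts holding the same pairs in a different order would be counted separately. Here is a minimal sketch of an order-insensitive variant that uses `frozenset` as the key instead. It assumes every value is hashable, and the helper name `count_records` is made up for illustration.

```python
from collections import Counter

def count_records(records):
    """Count identical dicts regardless of key order (values must be hashable)."""
    counts = Counter(frozenset(r.items()) for r in records)
    # Rebuild each dict from its (key, value) pairs and attach the count
    return [{**dict(key), 'count': n} for key, n in counts.items()]

foo = [
    {'color': 'red', 'type': 'foo'},
    {'type': 'foo', 'color': 'red'},    # same pairs, different key order
    {'color': 'green', 'type': 'foo'},
]
result = count_records(foo)
```

The trade-off is that the key order inside each rebuilt dict is no longer guaranteed to match the input.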

In a situation like this, where do I handle a KeyError?
– erp
19 mins ago

@erp Do you have a KeyError? You can always do i.get('color', 'default color'), instead of i['color']
– Andrej Kesely
18 mins ago
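To make the `.get()` suggestion concrete, here is a small sketch in which some records lack a key. The placeholder `'unknown'` is an arbitrary choice for this example, not something from the original answer.

```python
from collections import Counter

foo = [
    {'color': 'red', 'type': 'foo'},
    {'color': 'red'},                   # 'type' is missing
    {'type': 'foo'},                    # 'color' is missing
    {'color': 'red', 'type': 'foo'},
]

# .get() returns the placeholder instead of raising KeyError
c = Counter((i.get('color', 'unknown'), i.get('type', 'unknown')) for i in foo)

new_foo = [{'color': color, 'type': kind, 'count': n}
           for (color, kind), n in c.items()]
```

Records with a missing key are grouped under the placeholder value rather than being dropped.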

It's a list, not an array. Aside from that, very nice.
– Mad Physicist
6 mins ago

@MadPhysicist corrected :)
– Andrej Kesely
4 mins ago

{'color': k[0], 'type': k[1], 'count': v} would look nicer as {**dict(k), 'count': v}. Similarly, (i['color'], i['type']) could be generalized to tuple(item for item in i.items())
– Mad Physicist
3 mins ago

Here's an easy option if you don't mind duplicates. If you want only one record, Andrej's answer with Counter is great.

    newFoo = [dict(d, **{'count': foo.count(d)}) for d in foo]

    >>> newFoo
    [{'color': 'yellow', 'type': 'foo', 'count': 1},
     {'color': 'yellow', 'type': 'bar', 'count': 1},
     {'color': 'red', 'type': 'foo', 'count': 2},
     {'color': 'red', 'type': 'foo', 'count': 2},
     {'color': 'green', 'type': 'foo', 'count': 1},
     {'color': 'red', 'type': 'bar', 'count': 1}]

You can always filter the values through set()
– Andrej Kesely
14 mins ago

dicts aren't hashable
– bphi
9 mins ago

What about putting them inside a set in tuple form and then converting them back to dicts? I like your answer in any case :)
– Andrej Kesely
7 mins ago
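The tuple-in-a-set idea from the comment above could look roughly like this. It is only a sketch: it assumes all values are hashable, and because a set is unordered, the order of the resulting list is not guaranteed.

```python
foo = [
    {'color': 'yellow', 'type': 'foo'},
    {'color': 'yellow', 'type': 'bar'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'green', 'type': 'foo'},
    {'color': 'red', 'type': 'bar'},
]

# Attach the count to every record (duplicates included)
with_counts = [dict(d, count=foo.count(d)) for d in foo]

# Dicts are not hashable, so convert each one to a sorted tuple of pairs,
# let the set drop the duplicates, then turn the tuples back into dicts
unique = {tuple(sorted(d.items())) for d in with_counts}
new_foo = [dict(t) for t in unique]
```

Note that `foo.count(d)` scans the whole list for every element, so this is quadratic in the length of `foo`; for large inputs the `Counter` approach is the better fit.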

Haha, this took me longer than I want to admit, and there are a lot of better answers, but I did this the old-fashioned way, and maybe it helps you understand how to achieve it without fancy libraries.

    from copy import deepcopy

    # Copy the list before making any checks, because the inner loop
    # needs items to compare against. deepcopy keeps foo itself untouched.
    new_foo = deepcopy(foo)

    for old in foo:              # for each item in the old list
        for new in new_foo:      # look for that item in the new one
            if old['type'] == new['type'] and old['color'] == new['color']:  # both keys match
                if 'count' not in new:    # no count key yet
                    new['count'] = 1      # add it
                else:
                    new['count'] += 1     # add 1 if it was already there
                break            # stop looking; leave the inner loop

That adds a count to the first occurrence of every item we want to count. However, it leaves the repeated occurrences without a count key. Because we copied the whole list at the start, those repeats still exist in our new list, so let's use the missing key to filter them out.

    # Build a new list instead of calling remove() while iterating,
    # which can silently skip elements.
    new_foo = [item for item in new_foo if 'count' in item]

Result:

    {'color': 'yellow', 'type': 'foo', 'count': 1}
    {'color': 'yellow', 'type': 'bar', 'count': 1}
    {'color': 'red', 'type': 'foo', 'count': 2}
    {'color': 'green', 'type': 'foo', 'count': 1}
    {'color': 'red', 'type': 'bar', 'count': 1}

I am aware that there are better answers, but I think understanding the basics is important before dealing with advanced techniques.

You need to check whether the pair of 'color' and 'type' is already in your list. If so, you just increment its 'count' value by one instead of appending a new entry.
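A minimal sketch of that idea, using a plain dict keyed by the (color, type) pair so the "already seen" check does not need a second loop. The variable names are illustrative only.

```python
foo = [
    {'color': 'yellow', 'type': 'foo'},
    {'color': 'yellow', 'type': 'bar'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'green', 'type': 'foo'},
    {'color': 'red', 'type': 'bar'},
]

counts = {}
for item in foo:
    key = (item['color'], item['type'])
    if key in counts:
        counts[key]['count'] += 1            # pair seen before: increment
    else:
        counts[key] = {**item, 'count': 1}   # first time: copy item, start at 1

new_foo = list(counts.values())
```

Since dicts keep insertion order (Python 3.7+), the entries come out in the order each pair was first seen.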

This is just a restatement of the question, and adds no new information. It should not be posted as an answer.
– Mad Physicist
7 mins ago
