Aggregate unique values in list to new list
I'm trying to find the best way to aggregate values (value pairs) from a list in Python.
foo = [
{'color': 'yellow', 'type': 'foo'},
{'color': 'yellow', 'type': 'bar'},
{'color': 'red', 'type': 'foo'},
{'color': 'red', 'type': 'foo'},
{'color': 'green', 'type': 'foo'},
{'color': 'red', 'type': 'bar'}
]
The end goal is something like:
newFoo = [
{'color': 'yellow', 'type': 'foo', 'count': 1},
{'color': 'yellow', 'type': 'bar', 'count': 1},
{'color': 'red', 'type': 'foo', 'count': 2},
{'color': 'red', 'type': 'bar', 'count': 1},
{'color': 'green', 'type': 'foo', 'count': 1}
]
I'm not very good with Python and have been trying to accomplish this, but this is about as far as I can get:
def loop(ar):
    dik = []
    for line in ar:
        blah = []
        for k, v in line.items():
            blah.append({k, v})
        blah.append({'count': '1'})
        dik.append(blah)
    print(dik)
Any help appreciated.
4 Answers
You can use Counter from collections:
from collections import Counter
from pprint import pprint
foo = [
{'color': 'yellow', 'type': 'foo'},
{'color': 'yellow', 'type': 'bar'},
{'color': 'red', 'type': 'foo'},
{'color': 'red', 'type': 'foo'},
{'color': 'green', 'type': 'foo'},
{'color': 'red', 'type': 'bar'}
]
c = Counter((i['color'], i['type']) for i in foo)
pprint([{'color': k[0], 'type': k[1], 'count': v} for k, v in c.items()])
Output:
[{'color': 'yellow', 'count': 1, 'type': 'foo'},
{'color': 'yellow', 'count': 1, 'type': 'bar'},
{'color': 'red', 'count': 2, 'type': 'foo'},
{'color': 'green', 'count': 1, 'type': 'foo'},
{'color': 'red', 'count': 1, 'type': 'bar'}]
Edit:
If you want to sort the new list (assuming the list built above is stored in newFoo), you can do something like this:
l = sorted(newFoo, key=lambda v: (v['color'], v['type']), reverse=True)
pprint(l)
Will print:
[{'color': 'yellow', 'count': 1, 'type': 'foo'},
{'color': 'yellow', 'count': 1, 'type': 'bar'},
{'color': 'red', 'count': 2, 'type': 'foo'},
{'color': 'red', 'count': 1, 'type': 'bar'},
{'color': 'green', 'count': 1, 'type': 'foo'}]
Edit:
Thanks to @MadPhysicist, you can generalize the above example:
c = Counter(tuple(item for item in i.items()) for i in foo)
pprint([{**dict(k), 'count': v} for k, v in c.items()])
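One caveat with the generalized version: a tuple of i.items() depends on the order in which each dict's keys were inserted, so {'color': 'red', 'type': 'foo'} and {'type': 'foo', 'color': 'red'} would be counted separately. The sketch below sorts the items first; it assumes all values are hashable and that key order should not matter. The helper name count_records is made up for illustration.

```python
from collections import Counter

def count_records(records):
    """Count identical dicts, ignoring key order (values must be hashable)."""
    # sorted() puts each dict's (key, value) pairs into a canonical order
    counts = Counter(tuple(sorted(r.items())) for r in records)
    return [{**dict(key), 'count': n} for key, n in counts.items()]

foo = [
    {'color': 'red', 'type': 'foo'},
    {'type': 'foo', 'color': 'red'},  # same record, different key order
    {'color': 'green', 'type': 'foo'},
]
result = count_records(foo)
print(result)
```

If the key order is known to be consistent across all records, the plain version above is enough.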
In a situation like this, where do I handle KeyError?
– erp
19 mins ago
@erp Do you have a KeyError? You can always use i.get('color', 'default color') instead of i['color'].
– Andrej Kesely
18 mins ago
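To make the .get() suggestion concrete, here is a small sketch. The fallback string 'unknown' is an arbitrary choice for illustration; records that lack a key are grouped under that placeholder instead of raising KeyError.

```python
from collections import Counter

foo = [
    {'color': 'yellow', 'type': 'foo'},
    {'color': 'red'},   # 'type' is missing
    {'color': 'red'},   # 'type' is missing
    {'type': 'bar'},    # 'color' is missing
]

# dict.get returns the fallback value instead of raising KeyError
c = Counter((i.get('color', 'unknown'), i.get('type', 'unknown')) for i in foo)
new_foo = [{'color': color, 'type': kind, 'count': n}
           for (color, kind), n in c.items()]
print(new_foo)
```

Whether a placeholder is appropriate, or incomplete records should be skipped instead, depends on the data.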
It's a list, not an array. Aside from that, very nice.
– Mad Physicist
6 mins ago
@MadPhysicist corrected :)
– Andrej Kesely
4 mins ago
{'color': k[0], 'type': k[1], 'count': v} would look nicer as {**dict(k), 'count': v}. Similarly, (i['color'], i['type']) could be generalized to tuple(item for item in i.items()).
– Mad Physicist
3 mins ago
Here's an easy option if you don't mind duplicates. If you want only one record per pair, Andrej's answer with Counter is great.
newFoo = [dict(d, **{'count': foo.count(d)}) for d in foo]
>>> newFoo
[{'color': 'yellow', 'type': 'foo', 'count': 1},
{'color': 'yellow', 'type': 'bar', 'count': 1},
{'color': 'red', 'type': 'foo', 'count': 2},
{'color': 'red', 'type': 'foo', 'count': 2},
{'color': 'green', 'type': 'foo', 'count': 1},
{'color': 'red', 'type': 'bar', 'count': 1}]
You can always filter the values through set().
– Andrej Kesely
14 mins ago
dicts aren't hashable
– bphi
9 mins ago
What about putting them inside a set in tuple form and then converting them back to dicts? I like your answer in any case :)
– Andrej Kesely
7 mins ago
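The tuple-form idea from the comment above could look like the following sketch. It assumes every dict has the same keys in the same order and that all values are hashable. Note that a set does not preserve the original order of the records.

```python
foo = [
    {'color': 'yellow', 'type': 'foo'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'red', 'type': 'foo'},
]

# Attach the count first (this still yields one entry per input record)
with_counts = [dict(d, count=foo.count(d)) for d in foo]

# dicts are not hashable, so convert each one to a tuple of its items,
# deduplicate through a set, then convert the tuples back to dicts
unique = [dict(t) for t in {tuple(d.items()) for d in with_counts}]
print(unique)
```

Because foo.count(d) scans the whole list for every element, this is quadratic in the length of the list, which is fine for small inputs.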
Haha, this took me longer than I want to admit, and there are a lot of better answers, but I did this the old-fashioned way, and maybe it helps you understand how to achieve it without fancy libraries.
# Copy every dict into a new list before making any checks, so the
# original list stays untouched and there is something to compare against.
new_foo = [dict(item) for item in foo]
for old in foo:  # for each item in the old list
    for new in new_foo:  # look for the first matching item in the new one
        if old['type'] == new['type'] and old['color'] == new['color']:  # both keys match
            if 'count' not in new:  # no count key yet
                new['count'] = 1  # add it
            else:
                new['count'] += 1  # otherwise add 1 to it
            break  # stop looking and leave the inner loop
That adds a count to the first occurrence of every item. The repeated ones are left without a count key, and because we copied the whole list at the start they are still in the new list, so we can use the missing key to filter them out.
# Build a new list instead of calling remove() while iterating,
# which can skip elements.
new_foo = [item for item in new_foo if 'count' in item]
Result:
{'color': 'yellow', 'type': 'foo', 'count': 1}
{'color': 'yellow', 'type': 'bar', 'count': 1}
{'color': 'red', 'type': 'foo', 'count': 2}
{'color': 'green', 'type': 'foo', 'count': 1}
{'color': 'red', 'type': 'bar', 'count': 1}
I am aware that there are better answers, but I think understanding the basics is important before dealing with advanced techniques.
You need to check whether the pair of 'color' and 'type' is already in your list. If so, just increment its 'count' value by one instead of appending a new entry.
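A minimal sketch of that idea, using a plain dict keyed by the (color, type) pair so that the check for an existing entry is a single lookup. The function name aggregate is made up for illustration.

```python
def aggregate(records):
    """Return one dict per unique (color, type) pair with a 'count' key."""
    seen = {}  # maps (color, type) -> the output dict for that pair
    for rec in records:
        key = (rec['color'], rec['type'])
        if key in seen:
            seen[key]['count'] += 1          # pair already present: increment
        else:
            seen[key] = {**rec, 'count': 1}  # first occurrence: add a new entry
    return list(seen.values())

foo = [
    {'color': 'yellow', 'type': 'foo'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'red', 'type': 'foo'},
    {'color': 'red', 'type': 'bar'},
]
new_foo = aggregate(foo)
print(new_foo)
```

The input dicts are copied with {**rec, ...}, so the original list is not modified.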
This is just a restatement of the question, and adds no new information. It should not be posted as an answer.
– Mad Physicist
7 mins ago
In Python, list and array are not interchangeable. You are talking about lists, not arrays.
– Mad Physicist
36 mins ago