How to split string into substrings of identical letters?

How do I split a string consisting of lowercase English alphabetical letters into substrings consisting of identical letters - so for an input:
"aaaabbcccdd"
The function should output a list:
["aaaa", "bb", "ccc", "dd"]
Should é and e be classed as identical? What have you tried?
– Sayse
51 mins ago
The string consists of only English alphabetical characters. I'll amend the post.
– Ukendar Vadivel
3 mins ago
3 Answers
3
The following list comprehension using itertools.groupby and str.join will work:
from itertools import groupby
s = "aaaabbcccdd"
[''.join(g) for _, g in groupby(s)]
# ["aaaa", "bb", "ccc", "dd"]
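To make the grouping explicit, here is a minimal sketch of the same idea written as a plain loop: a new run starts whenever the current character differs from the last one seen. The helper name split_runs is hypothetical and not part of the answer above.

```python
def split_runs(s):
    """Split s into maximal runs of identical consecutive characters."""
    runs = []
    for ch in s:
        if runs and runs[-1][-1] == ch:
            # Same letter as the end of the current run: extend it.
            runs[-1] += ch
        else:
            # Different letter (or first character): start a new run.
            runs.append(ch)
    return runs

print(split_runs("aaaabbcccdd"))
# ['aaaa', 'bb', 'ccc', 'dd']
```

Like groupby, this only merges adjacent letters, so every element of the result is a real substring of the input.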
Thanks, that's a great solution!
– Ukendar Vadivel
55 mins ago
You can use regex with a back reference:
import re
from operator import itemgetter
print(list(map(itemgetter(0), re.findall(r'((.)\2*)', "aaaabbcccdd"))))
This outputs:
['aaaa', 'bb', 'ccc', 'dd']
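A variant of the same back-reference idea, sketched with re.finditer so that only one capture group is needed and itemgetter can be dropped. In the pattern, (.) captures a single character and \1* matches zero or more further copies of it. The function name split_runs_regex is hypothetical.

```python
import re

def split_runs_regex(s):
    # m.group(0) is the whole match: the captured character plus its repeats.
    return [m.group(0) for m in re.finditer(r'(.)\1*', s)]

print(split_runs_regex("aaaabbcccdd"))
# ['aaaa', 'bb', 'ccc', 'dd']
```

Note that . does not match a newline by default, which is fine for input made up of lowercase English letters only.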
A Counter solution:
from collections import Counter
# Python 3: Counter.items() replaces the Python 2-only iteritems()
[i*j for i, j in Counter("aaaabbcccdd").items()]
# Output: ['aaaa', 'bb', 'ccc', 'dd']
Did you try this on less orderly input?
– schwobaseggl
53 mins ago
Input: "aaaabbcccddasdffas"
Output: ['aaaaaa', 'ccc', 'bb', 'ddd', 'ff', 'ss']
Order should not matter, since Counter is going to count the letters.
– ThatBird
51 mins ago
You do notice that 'aaaaaa' is not a substring of the input in that case?
– schwobaseggl
51 mins ago
"lowercase alphabetic letters into substrings consisting of identical letters": the phrase "substrings consisting of identical letters" tells me that all the "a"s should be grouped together.
– ThatBird
48 mins ago
Well, I guess the OP could be clearer or provide a more general example :) It sounds to me like the goal is to keep only actual "substrings" (which your output is not!) together while maintaining order. Just wanted to point out the difference.
– schwobaseggl
45 mins ago
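The difference discussed in these comments can be seen by running both approaches on the less orderly input. This is a small sketch assuming Python 3; the variable names runs and by_letter are chosen here for illustration.

```python
from collections import Counter
from itertools import groupby

s = "aaaabbcccddasdffas"

# Consecutive runs: every element is a real substring of s, in order.
runs = [''.join(g) for _, g in groupby(s)]

# Per-letter totals: all occurrences of a letter are merged, wherever they appear.
by_letter = [ch * n for ch, n in Counter(s).items()]

print(runs)
# ['aaaa', 'bb', 'ccc', 'dd', 'a', 's', 'd', 'ff', 'a', 's']
print(by_letter)
print('aaaaaa' in s)
# False
```

Which result is wanted depends on how the question is read: groupby keeps order and adjacency, while Counter groups every occurrence of a letter regardless of position.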
Check out itertools.groupby
– Chris_Rands
1 hour ago