How to split string into substrings of identical letters?




How do I split a string consisting of lowercase English letters into substrings of identical letters? For example, given the input:


"aaaabbcccdd"



The function should output a list:


["aaaa", "bb", "ccc", "dd"]





Check out itertools.groupby.
– Chris_Rands
1 hour ago







Should é and e be classed as identical? What have you tried?
– Sayse
51 mins ago







The string consists of only English alphabetical characters. I'll amend the post.
– Ukendar Vadivel
3 mins ago




3 Answers



The following list comprehension using itertools.groupby and str.join will work:




from itertools import groupby

s = "aaaabbcccdd"
[''.join(g) for _, g in groupby(s)]
# ["aaaa", "bb", "ccc", "dd"]
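As a minimal sketch, the same comprehension can be wrapped in a function (the name split_runs is only illustrative, not from the answer) and tried on input where a letter appears in more than one place:

```python
from itertools import groupby

def split_runs(s):
    """Split s into maximal runs of consecutive identical characters."""
    # groupby starts a new group every time the character changes,
    # so a letter that reappears later produces a separate run.
    return ["".join(group) for _, group in groupby(s)]

print(split_runs("aaaabbcccdd"))
# ['aaaa', 'bb', 'ccc', 'dd']
print(split_runs("aaaabbcccddasdffas"))
# ['aaaa', 'bb', 'ccc', 'dd', 'a', 's', 'd', 'ff', 'a', 's']
```

Joining the pieces back together always reproduces the original string, so every piece is a genuine substring of the input.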





Thanks, that's a great solution!
– Ukendar Vadivel
55 mins ago



You can use regex with a back reference:


import re
from operator import itemgetter
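# (.) captures one character as group 2; \2* matches further repeats of it.
# Group 1 is the whole run, which itemgetter(0) picks out of each tuple.
print(list(map(itemgetter(0), re.findall(r'((.)\2*)', "aaaabbcccdd"))))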



This outputs:


['aaaa', 'bb', 'ccc', 'dd']
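A possible variant of the same idea, offered as a sketch: with re.finditer the whole match is already the run, so a single capture group is enough and itemgetter is not needed. The function name split_runs_regex is made up for illustration. Note that "." does not match a newline by default, which does not matter for the lowercase-letter input in the question.

```python
import re

def split_runs_regex(s):
    """Return the maximal runs of a repeated character in s."""
    # (.) captures one character; \1* matches any further repeats of it.
    # m.group(0) is the full match, i.e. the whole run.
    return [m.group(0) for m in re.finditer(r"(.)\1*", s)]

print(split_runs_regex("aaaabbcccdd"))
# ['aaaa', 'bb', 'ccc', 'dd']
```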



A Counter solution -


from collections import Counter

# On Python 3 use .items(); .iteritems() exists only on Python 2.
[i*j for i, j in Counter("aaaabbcccdd").items()]
# Output: ["aaaa", "bb", "ccc", "dd"]





Did you try this on less orderly input?
– schwobaseggl
53 mins ago





Input "aaaabbcccddasdffas" gives output ['aaaaaa', 'ccc', 'bb', 'ddd', 'ff', 'ss']. Order should not matter, since Counter just counts the letters.
– ThatBird
51 mins ago





You do notice that 'aaaaaa' is not a substring of the input in that case?
– schwobaseggl
51 mins ago









"lowercase alphabetic letters into substrings consisting of identical letters": the phrase "substrings consisting of identical letters" tells me that all a's should be grouped together.
– ThatBird
48 mins ago









Well, I guess the OP could be clearer or provide a more general example :) It sounds to me like the goal is to keep only actual substrings (which your output is not!) together while maintaining order. Just wanted to point out the difference.
– schwobaseggl
45 mins ago
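To make the difference discussed in these comments concrete, here is a small sketch comparing the two readings on the less orderly input. The variable names by_count and by_run are only illustrative. The Counter version is written with .items(), assuming Python 3, and the order of its output depends on the Python version.

```python
from collections import Counter
from itertools import groupby

s = "aaaabbcccddasdffas"

# Counter reading: one piece per distinct letter, repeated by its total count.
by_count = [ch * n for ch, n in Counter(s).items()]

# groupby reading: one piece per run of consecutive identical letters.
by_run = ["".join(g) for _, g in groupby(s)]

print(sorted(by_count))
# ['aaaaaa', 'bb', 'ccc', 'ddd', 'ff', 'ss']
print(by_run)
# ['aaaa', 'bb', 'ccc', 'dd', 'a', 's', 'd', 'ff', 'a', 's']

# 'aaaaaa' never occurs contiguously in s, so it is not a substring of it.
print("aaaaaa" in s)
# False
```

Which reading is wanted depends on the question; the groupby pieces are always substrings of the input and keep their original order.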





