Python Regex: Match Only Alphanumeric, Hyphen and White Space




Though there are several similar topics, I couldn't arrive at a solution, so I am creating a new question.



I have a flat file saved in UTF-8 format, which I am trying to parse and read data from. I need to remove all other Unicode characters from the string and keep only alphanumeric characters, hyphens, and white space. Below is the regex I use:


formatted_text = re.sub(r'[^a-zA-Z0-9\s-]', '', original_string)



But this does not remove the Unicode characters. If I remove \s from the regex pattern, the Unicode characters are removed, but so is the white space.
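
One possible explanation, offered as a sketch rather than a confirmed diagnosis: in Python 3, \s in a str pattern also matches Unicode whitespace such as the non-breaking space (U+00A0), so those characters survive the substitution. Assuming that is what the file contains, passing re.ASCII limits \s to ASCII whitespace. The helper name keep_basic_chars and the sample string below are made up for illustration; they are not from the original data.

```python
import re

def keep_basic_chars(text):
    """Keep only ASCII letters, digits, hyphens, and ASCII whitespace.

    Hypothetical helper for illustration. re.ASCII makes \\s match only
    [ \\t\\n\\r\\f\\v], so Unicode whitespace (e.g. U+00A0) is removed too.
    """
    return re.sub(r'[^a-zA-Z0-9\s-]', '', text, flags=re.ASCII)

# Made-up sample: an accented letter followed by a non-breaking space.
sample = "caf\u00e9\u00a0bar-1 baz"

# Without re.ASCII the non-breaking space is treated as whitespace and kept.
unicode_aware = re.sub(r'[^a-zA-Z0-9\s-]', '', sample)

# With re.ASCII both the accented letter and the non-breaking space are dropped.
ascii_only = keep_basic_chars(sample)

print(repr(unicode_aware))
print(repr(ascii_only))
```

If only the plain space character should be kept (no tabs or newlines), replacing \s with a literal space in the character class is an alternative that does not depend on the flag.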



Here is a screenshot of the data. Since I couldn't copy it in the same format, I am attaching the screenshot.



Screen Shot





Please, don't post images with text. Edit your question and put the data in text format.
– Andrej Kesely
25 mins ago







Hi, sorry about that. When I try to put the data in text format, it doesn't let me copy those special characters. Shall I consolidate the images into one rather than having three? Or do you still want me to remove the images?
– Muthu
17 mins ago





You can leave the images, but it's easier for others to test your code/input/output when it's in text form. Not many people will copy the characters by hand. You should also add the text then (with proper formatting).
– Andrej Kesely
14 mins ago





I tried your example and it works. Here is an online regex demo: regex101.com/r/OWBivH/1
– Andrej Kesely
1 min ago








