Python Regex: Match Only Alphanumeric, Hyphen and White Space
Though there are several similar topics, I couldn't arrive at a solution, hence this new question.
I have a flat file saved in UTF-8 format, which I am trying to parse and read data from. I need to drop all non-ASCII (Unicode) characters from the string and keep only alphanumeric characters, hyphens and white space. Below is the regex I use:
formatted_text = re.sub(r'[^a-zA-Z0-9\s-]', '', original_string)
But this does not remove the Unicode characters. If I remove \s from the regex pattern, it strips the Unicode characters but also the white space.
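For reference, a minimal sketch of the substitution described above, assuming the character class is meant to contain `\s` (whitespace) rather than a literal `s`. The sample string is hypothetical, since the real data was only posted as a screenshot. One possible cause of the behaviour, not confirmed for this file, is that in Python 3 `\s` also matches Unicode whitespace such as the no-break space (U+00A0), so those characters survive the substitution unless the `re.ASCII` flag is set.

```python
import re

# Hypothetical sample input (the real data is only available as a screenshot).
# It contains an accented letter, an en dash, a no-break space and a symbol.
original_string = "Caf\u00e9 \u2013 item-42\u00a0ok \u2603"

# Default (Unicode) mode: \s also matches U+00A0, so the no-break space is kept.
without_flag = re.sub(r'[^a-zA-Z0-9\s-]', '', original_string)

# re.ASCII restricts \s to ASCII whitespace, so U+00A0 is removed as well.
pattern = re.compile(r'[^a-zA-Z0-9\s-]', flags=re.ASCII)
formatted_text = pattern.sub('', original_string)

# repr() makes any remaining invisible characters visible.
print(repr(without_flag))
print(repr(formatted_text))
```

Printing with `repr()` is a quick way to check whether the "Unicode characters" left in the output are actually whitespace-like code points rather than visible symbols.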
Here is a screenshot of the data. Since I couldn't copy it in the same format, I am attaching it as an image.
Screen Shot
Hi, sorry for that. When I try to put the data in text format, it's not letting me copy those special characters. Shall I consolidate the images into one rather than having three? Or do you still want me to remove the images?
– Muthu
17 mins ago
You can leave the images, but it's easier for others to test your code/input/output when it's in text form. Not many people will copy the characters by hand. You should also add the text then (with proper formatting).
– Andrej Kesely
14 mins ago
I tried your example and it works; here is an online regex demo: regex101.com/r/OWBivH/1
– Andrej Kesely
1 min ago
Please don't post images of text. Edit your question and put the data in text format.
– Andrej Kesely
25 mins ago