Read all words from a given text file and print a count for each






Test.txt contains the following sentence: "How much wood would a woodchuck chuck if a woodchuck could chuck wood."



This program is supposed to read all words from a given text file (until EOF) and print out a count for each word. Words should be processed case-insensitively (converted to all capitals), punctuation should be removed, and the output should be sorted by frequency.



However, I've run into a simple problem: it's counting lines, not words. Help a brother out.



Make a translation table for getting rid of non-word characters


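# no space in this string, so split() can still separate the words afterwards
dropChars = "!@#$%^&*()_+-={}|\\:;\"'<>,.?/1234567890"
dropDict = {c: '' for c in dropChars}  # map each dropped character to ''
dropTable = str.maketrans(dropDict)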
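For comparison, here is a minimal sketch of the same table built with the three-argument form of `str.maketrans`, whose third argument lists characters to delete. The character set (`string.punctuation` plus `string.digits`, ASCII only) and the helper name `normalize` are assumptions for illustration; whitespace is deliberately kept so that `split()` still has something to split on.

```python
import string

# Assumption: drop ASCII punctuation and digits, but keep whitespace so that
# split() can still separate the words afterwards.
DROP_CHARS = string.punctuation + string.digits
DROP_TABLE = str.maketrans('', '', DROP_CHARS)  # 3rd argument: characters to delete

def normalize(line):
    """Uppercase a line, strip punctuation/digits and split it into words."""
    return line.upper().translate(DROP_TABLE).split()

print(normalize("How much wood would a woodchuck chuck if a woodchuck could"))
```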



Read a file and build the table.


f = open("Test.txt")
testList = list()
lineNum = 0
table = {}  # dictionary: words -> set of line numbers
for line in f:
    testList.append(line)
for line in testList:
    lineNum += 1
    words = line.upper().translate(dropTable).split()
    for word in words:
        if word in table:
            table[word].add(lineNum)
        else:
            table[word] = {lineNum}
f.close()
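The program reports line numbers because `table` maps each word to the *set of line numbers* it appears on, and a set records each line only once. To count occurrences, the value can instead be an integer that is incremented for every word seen. A minimal sketch: the translation table is rebuilt here (with `string.punctuation` plus digits, an assumption) so the block runs on its own, `count_words` is a hypothetical helper name, and `io.StringIO` stands in for `open("Test.txt")`.

```python
import io
import string

# Rebuilt here so the sketch is self-contained: delete punctuation and digits.
DROP_TABLE = str.maketrans('', '', string.punctuation + string.digits)

def count_words(f):
    """Return a dict mapping each upper-cased word to its number of occurrences."""
    table = {}  # dictionary: word -> count (instead of word -> set of line numbers)
    for line in f:  # iterating over a file object reads it line by line until EOF
        for word in line.upper().translate(DROP_TABLE).split():
            table[word] = table.get(word, 0) + 1
    return table

# io.StringIO stands in for open("Test.txt") in this example.
sample = io.StringIO("How much wood would a woodchuck chuck if a woodchuck could\nchuck wood.\n")
counts = count_words(sample)
print(counts["WOOD"], counts["CHUCK"], counts["HOW"])
```

Because `count_words` accepts any iterable of lines, the same function works unchanged on a real file opened with `open("Test.txt")`.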



Print the table


for word in sorted(table.keys()):
    print(word, end=": ")
    for lineNum in sorted(table[word]):
        print(lineNum, end=" ")
    print()
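The specification also asks for the output to be sorted by frequency rather than alphabetically. Assuming the table maps each word to a count (not to a set of line numbers), one way is to sort the items by descending count and break ties alphabetically so the order is reproducible. The helper name `by_frequency` and the small hard-coded table are hypothetical.

```python
def by_frequency(table):
    """Return (word, count) pairs, most frequent first, ties in alphabetical order."""
    # Negating the count sorts high counts first while words stay ascending.
    return sorted(table.items(), key=lambda kv: (-kv[1], kv[0]))

# Example table (word -> count), hard-coded here for illustration.
table = {"HOW": 1, "WOOD": 2, "A": 2, "CHUCK": 2, "IF": 1}
for word, count in by_frequency(table):
    print(word, count, sep=": ")
```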





Why don't you just split on spaces and create a set?
– Hearner







I don't see how a set could be used to count the frequency of a word. Could you demonstrate?
– NationzGG
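One possible reading of the set suggestion: the set supplies the distinct words, and `list.count` then gives each word's frequency. A minimal sketch, assuming that stripping a few punctuation characters from the ends of each word is enough for this input; `count_with_set` is a hypothetical name. Note that `count` rescans the whole list once per distinct word, so this is quadratic in the worst case, whereas a dictionary or `collections.Counter` counts in a single pass.

```python
def count_with_set(text):
    """Count words by taking the set of distinct words, then list.count on each."""
    # Assumption: only leading/trailing punctuation matters, so strip() is enough here.
    words = [w.strip(".,!?;:").upper() for w in text.split()]
    return {word: words.count(word) for word in set(words)}

counts = count_with_set("How much wood would a woodchuck chuck if a woodchuck could chuck wood.")
print(counts["WOODCHUCK"], counts["COULD"])
```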




2 Answers



f = open('Test.txt')

cnt = 0
for word in f.read().split():  # split the whole file on whitespace
    print(word)
    cnt += 1
print(cnt)  # total number of words (not a count per word)
f.close()



This might help you, brother... although I am also a newbie in Python.



This code:


from collections import Counter

with open('Test.txt') as f:
    data = f.read()  # read the file
data = ''.join(ch.upper() if ch.isalpha() else ' ' for ch in data)  # remove the punctuation
c = Counter(data.split())  # count the words
print(c.most_common())  # (word, count) pairs, most frequent first



prints:


[('A', 2), ('CHUCK', 2), ('WOODCHUCK', 2), ('WOOD', 2), ('WOULD', 1), ('COULD', 1), ('HOW', 1), ('MUCH', 1), ('IF', 1)]



I wonder if the code is too short? =)





