Removing words from a file
Posted
by
user1765792
on Stack Overflow
See other posts from Stack Overflow
or by user1765792
Published on 2012-10-22T16:46:14Z
Indexed on
2012/10/22
17:01 UTC
Read the original article
Hit count: 120
I'm trying to take a regular text file and remove words identified in a separate file (stopwords) containing the words to be removed separated by carriage returns ("\n").
Right now I'm converting both files into lists so that the elements of each list can be compared. I got this function to work, but it doesn't remove all of the words I have specified in the stopwords file. Any help is greatly appreciated.
def elimstops(file_str): #takes as input a string for the stopwords file location
stop_f = open(file_str, 'r')
stopw = stop_f.read()
stopw = stopw.split('\n')
text_file = open('sample.txt') #Opens the file whose stop words will be eliminated
prime = text_file.read()
prime = prime.split(' ') #Splits the string into a list separated by a space
tot_str = "" #total string
i = 0
while i < (len(stopw)):
if stopw[i] in prime:
prime.remove(stopw[i]) #removes the stopword from the text
else:
pass
i += 1
# Creates a new string from the compilation of list elements
# with the stop words removed
for v in prime:
tot_str = tot_str + str(v) + " "
return tot_str
© Stack Overflow or respective owner