Removing words from a file
- by user1765792
I'm trying to take a regular text file and remove the words listed in a separate stopwords file, which contains the words to be removed, one per line (separated by newlines, "\n").
Right now I'm converting both files into lists so that the elements of each list can be compared. I got this function to work, but it doesn't remove all of the words I have specified in the stopwords file. Any help is greatly appreciated.
def elimstops(file_str):  # takes as input a string for the stopwords file location
    stop_f = open(file_str, 'r')
    stopw = stop_f.read()
    stopw = stopw.split('\n')
    text_file = open('sample.txt')  # opens the file whose stop words will be eliminated
    prime = text_file.read()
    prime = prime.split(' ')  # splits the string into a list separated by a space
    tot_str = ""  # total string
    i = 0
    while i < (len(stopw)):
        if stopw[i] in prime:
            prime.remove(stopw[i])  # removes the stopword from the text
        else:
            pass
        i += 1
    # Creates a new string from the compilation of list elements
    # with the stop words removed
    for v in prime:
        tot_str = tot_str + str(v) + " "
    return tot_str
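
One likely reason some words survive: list.remove() deletes only the first matching element, so a stopword that occurs several times in the text is removed once and the later copies stay. Splitting on a single space also leaves words glued to newlines (for example "the\ncat"), and those never match a stopword. Below is a minimal sketch of one way around both, not a definitive fix. The names remove_stopwords and elimstops_fixed and the text_path parameter are made up for illustration. Unlike the original, it splits on any whitespace and returns the result without a trailing space. Matching is still case-sensitive, and punctuation attached to a word (such as "the,") is not handled.

```python
def remove_stopwords(text, stopwords):
    """Return text with every occurrence of each stopword dropped."""
    # A set gives fast membership tests; blank entries (e.g. from a
    # trailing newline in the stopwords file) are ignored.
    stop_set = {w.strip() for w in stopwords if w.strip()}
    # split() with no argument splits on any run of whitespace,
    # so words separated by newlines or tabs are handled too.
    kept = [word for word in text.split() if word not in stop_set]
    return " ".join(kept)


def elimstops_fixed(stop_path, text_path="sample.txt"):
    """Hypothetical file-based wrapper around remove_stopwords."""
    # 'with' closes each file automatically when the block ends.
    with open(stop_path) as stop_f:
        stopwords = stop_f.read().splitlines()
    with open(text_path) as text_f:
        text = text_f.read()
    return remove_stopwords(text, stopwords)


print(remove_stopwords("the cat and the dog", ["the", "and"]))  # cat dog
```

Because the list comprehension checks every word in the text against the set, repeated stopwords are all filtered out in a single pass, and nothing is removed from a list while it is being walked.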