Removing words from a file

Posted by user1765792 on Stack Overflow See other posts from Stack Overflow or by user1765792
Published on 2012-10-22T16:46:14Z Indexed on 2012/10/22 17:01 UTC
Read the original article Hit count: 121

Filed under:
|
|
|
|

I'm trying to take a regular text file and remove words identified in a separate file (stopwords) containing the words to be removed separated by carriage returns ("\n").

Right now I'm converting both files into lists so that the elements of each list can be compared. I got this function to work, but it doesn't remove all of the words I have specified in the stopwords file. Any help is greatly appreciated.

def elimstops(file_str): #takes as input a string for the stopwords file location
  stop_f = open(file_str, 'r')
  stopw = stop_f.read()
  stopw = stopw.split('\n')
  text_file = open('sample.txt') #Opens the file whose stop words will be eliminated
  prime = text_file.read()
  prime = prime.split(' ') #Splits the string into a list separated by a space
  tot_str = "" #total string
  i = 0
  while i < (len(stopw)):
    if stopw[i] in prime:
      prime.remove(stopw[i]) #removes the stopword from the text
    else:
      pass
    i += 1
  # Creates a new string from the compilation of list elements 
  # with the stop words removed
  for v in prime:
    tot_str = tot_str + str(v) + " " 
  return tot_str

© Stack Overflow or respective owner

Related posts about python

Related posts about string