Splitting a 25 MB .txt file into smaller files using a text delimiter
Posted by user574141 on Stack Overflow
Published on 2011-01-13T11:45:39Z
Indexed on 2011/01/13 11:53 UTC
Regards, SO
I am new to Python and Perl. I have been trying to solve a simple problem and keep getting tied in knots with the syntax. I hope someone has the time and patience to help. I have a 25 MB file in ".txt" format which contains news-wire articles going back to 1970. Each news story is concatenated to the next, with only the "Copyright" statement to delimit them. Each news story starts with "Item XX of XXX DOCUMENTS". Certain metadata are repeated throughout; I will use these for tagging later on.
I wish to split this 25 MB file into separate .txt files, each containing one news story (i.e. the text between "DOCUMENTS" and "Copyright"), saving each under a different name.
I am trying to 1) open the file, 2) iterate over the lines in the file, checking for the delimiter and, if it is not present, appending the line to a list, and 3) write that list to a separate small file.
I'm having big problems with changing the filename using the counter. Also, how do I make Python start from where I left off? Is the "seek" function appropriate?
So far I have been trying this approach, completely unsuccessfully:
myfile = open ("myfile.txt", 'r')
filenumber = 0
for line in myfile.readline():
    filenumber += 1
    w=0
    while myfile.readline() != '\s+DOCUMENTS\s*\n'
        ### read my line into a list
        mysmallfile()['w'] = [myfile.readline()]
        w += 1
    output = open('C:\\Users\\dunner7\\Documents\###how do I change the filename each iteration???', 'w')
    output.writelines(mysmallfile)
    ###go back to start.
Thank you for your time and patience.
RD
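One possible way to do the split, offered as a minimal sketch and not a definitive answer. It assumes every story begins with a line ending in "DOCUMENTS" (as in "Item XX of XXX DOCUMENTS") and that everything up to the next such line, including the "Copyright" statement, belongs to the same story. The function name split_stories, the "story" filename prefix, and the four-digit numbering are illustrative choices, not anything taken from the post. Iterating over the file object reads one line at a time and keeps its place, so seek() should not be needed, and the counter is turned into a filename with string formatting.

```python
import os
import re
import tempfile

# Assumption: every story begins with a line ending in "DOCUMENTS",
# e.g. "Item 3 of 250 DOCUMENTS". Adjust the pattern to the real file.
HEADER = re.compile(r"DOCUMENTS\s*$")


def split_stories(source_path, out_dir, prefix="story"):
    """Write each story in source_path to its own numbered file in out_dir.

    Returns the list of file paths that were written.
    """
    written = []
    output = None
    with open(source_path, "r") as source:
        # The file object remembers its position between iterations.
        for line in source:
            if HEADER.search(line):
                # A new story starts here: finish the previous file first.
                if output is not None:
                    output.close()
                # The counter becomes part of the name: story0001.txt, ...
                name = "%s%04d.txt" % (prefix, len(written) + 1)
                path = os.path.join(out_dir, name)
                output = open(path, "w")
                written.append(path)
            # Lines before the first header are skipped.
            if output is not None:
                output.write(line)
    if output is not None:
        output.close()
    return written


if __name__ == "__main__":
    # Tiny made-up sample standing in for the 25 MB file.
    sample = (
        "Item 1 of 2 DOCUMENTS\nFirst story.\nCopyright 1970\n"
        "Item 2 of 2 DOCUMENTS\nSecond story.\nCopyright 1971\n"
    )
    work_dir = tempfile.mkdtemp()
    source_path = os.path.join(work_dir, "myfile.txt")
    with open(source_path, "w") as f:
        f.write(sample)
    for path in split_stories(source_path, work_dir):
        print(path)
```

Because only one line and one output file are held at a time, the size of the input should not matter much. If a line inside a story body can also end in "DOCUMENTS", the pattern would need tightening, for example to require the leading "Item ... of ..." part.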
© Stack Overflow or respective owner