Splitting a 25 MB .txt file into smaller files using a text delimiter

Posted by user574141 on Stack Overflow
Published on 2011-01-13T11:45:39Z Indexed on 2011/01/13 11:53 UTC

Regards, SO

I am new to Python and Perl. I have been trying to solve a simple problem and keep getting tied in knots with the syntax; I hope someone has the time and patience to help. I have a 25 MB file in ".txt" format that contains news-wire articles going back to 1970. Each news story is concatenated to the next, with only the "Copyright" statement as a delimiter. Each news story starts with "Item XX of XXX DOCUMENTS". Certain metadata fields are repeated throughout; I will use these for tagging later on.

I wish to split this 25 MB file into separate .txt files, each containing one news story (i.e. the text between "DOCUMENTS" and "Copyright"), saving each under a different name.

I am trying to: 1) open the file; 2) iterate over the lines in the file, checking for the end-of-story delimiter and, if it is not present, appending the line to a list; 3) write that list to a separate small file.
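
The three steps above could be sketched roughly as follows. This is only a minimal sketch: the two patterns are assumptions based on the description (a header line ending in "DOCUMENTS" and a line beginning with "Copyright"), and `split_stories` and the sample lines are made up for illustration.

```python
import re

# Assumed patterns: the post only says stories start with
# "Item XX of XXX DOCUMENTS" and end at a "Copyright" statement.
START = re.compile(r"DOCUMENTS\s*$")
END = re.compile(r"^\s*Copyright")

def split_stories(lines):
    """Yield each story as a list of the lines between a header and a Copyright line."""
    story = None                  # None means we are currently outside a story
    for line in lines:
        if START.search(line):
            story = []            # header seen: start collecting (header itself is dropped)
        elif story is not None and END.search(line):
            yield story           # the Copyright line closes the story
            story = None
        elif story is not None:
            story.append(line)

# Invented sample data standing in for the real news-wire file.
sample = [
    "Item 1 of 2 DOCUMENTS\n",
    "First story.\n",
    "Copyright 1970 Example Wire\n",
    "Item 2 of 2 DOCUMENTS\n",
    "Second story, line one.\n",
    "Second story, line two.\n",
    "Copyright 1971 Example Wire\n",
]
stories = list(split_stories(sample))
```

Because `split_stories` accepts any iterable of lines, an open file object could be passed in directly, so only one story needs to be held in memory at a time. A story whose body contains a line starting with "Copyright" would be cut short under these assumed patterns.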

I'm having big problems with changing the filename using the counter. Also, how do I make Python start from where it left off? Is the "seek" function appropriate?
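
On both points, a small sketch may help. The naming scheme (`story_0001.txt` and so on) and the helper `story_filename` are hypothetical, and `io.StringIO` stands in for a real file. The second half illustrates that a file object keeps track of its own position, so reading simply continues from where the last read stopped.

```python
import io

def story_filename(number):
    # Hypothetical naming scheme: story_0001.txt, story_0002.txt, ...
    return "story_{:04d}.txt".format(number)

names = [story_filename(n) for n in range(1, 4)]

# A file object remembers its position; io.StringIO stands in for a real file here.
f = io.StringIO("a\nb\nc\n")
first = f.readline()             # consumes "a\n"
rest = [line for line in f]      # continues from "b\n" without any seek() call
```

Zero-padding the counter keeps the output files in order when they are sorted by name. Under this approach `seek` would only be needed to jump backwards or to an arbitrary offset, which a single pass over the file does not seem to require.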

So far I have been trying this approach, completely unsuccessfully:

myfile = open ("myfile.txt", 'r')
filenumber = 0
for line in myfile.readline():  
    filenumber += 1    
    w=0  
    while myfile.readline() != '\s+DOCUMENTS\s*\n'  
    ### read my line into a list  
    mysmallfile()['w'] = [myfile.readline()]  
    w += 1  
    output = open('C:\\Users\\dunner7\\Documents\###how do I change the filename each iteration???', 'w')  
    output.writelines(mysmallfile)   
    ###go back to start.   
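
For comparison, here is one way the attempt above might be reworked. It is a sketch under assumptions, not a definitive fix. A few things in the attempt cannot work as written: `for line in myfile.readline()` loops over the characters of the first line rather than over the lines of the file, the `while` statement lacks a colon and compares the line to a regex string with `!=` instead of matching it, and `mysmallfile` is never defined. The version below writes each line straight to the current output file rather than building a list first. The header and "Copyright" tests, the function name `split_file`, and the output naming are all assumed for illustration.

```python
import os
import re
import tempfile

def split_file(source_path, out_dir):
    """Write each story in source_path to its own numbered file in out_dir."""
    filenumber = 0
    output = None                          # currently open story file, if any
    with open(source_path, "r") as source:
        for line in source:                # the loop itself advances through the file
            if re.search(r"DOCUMENTS\s*$", line):        # assumed header pattern
                if output is not None:
                    output.close()
                filenumber += 1
                name = "story_{:04d}.txt".format(filenumber)  # hypothetical naming
                output = open(os.path.join(out_dir, name), "w")
            elif line.lstrip().startswith("Copyright"):   # assumed end-of-story marker
                if output is not None:
                    output.close()
                    output = None
            elif output is not None:
                output.write(line)         # ordinary story line: copy it across
    if output is not None:
        output.close()                     # last story had no Copyright line
    return filenumber

# Small demonstration on a temporary file standing in for the 25 MB original.
work_dir = tempfile.mkdtemp()
source_path = os.path.join(work_dir, "myfile.txt")
with open(source_path, "w") as f:
    f.write("Item 1 of 2 DOCUMENTS\nFirst story.\nCopyright 1970\n"
            "Item 2 of 2 DOCUMENTS\nSecond story.\nCopyright 1971\n")
count = split_file(source_path, work_dir)
```

In real use, `source_path` would point at the actual file and `out_dir` at an existing folder, for example somewhere under the Documents directory mentioned in the attempt.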

Thank you for your time and patience.

RD

© Stack Overflow or respective owner
