Search Results

  • Python: Most efficient way to concatenate and rearrange files

    - by user300890
    Hi, I am reading from several files. Each file is divided into two parts: a header section of a few thousand lines, followed by a body of a few thousand lines. I need to concatenate these files into one file in which all the headers come first, followed by all the bodies.

    Currently I use two loops: one pulls out all the headers and writes them, and the second writes the body of each file. (I also use a tmp_count variable to limit the number of lines loaded into memory before dumping them to the file.) This is pretty slow: about 6 min for a 13 GB file. Can anyone tell me how to optimize this, or whether there is a faster way to do it in Python? Thanks!

    Here is my code:

        import os

        def cat_files_sam(final_file_name, work_directory_master, file_count):
            final_file = open(final_file_name, "w")

            if len(file_count) > 1:
                file_count = sort_output_files(file_count)

            # First pass: only the @ header lines of each file
            for bowtie_file in file_count:
                #print bowtie_file
                tmp_list = []
                tmp_count = 0
                for line in open(os.path.join(work_directory_master, bowtie_file)):
                    if line.startswith("@"):
                        # Flush the buffer every 1,000,000 lines
                        if tmp_count == 1000000:
                            final_file.writelines(tmp_list)
                            tmp_list = []
                            tmp_count = 0
                        tmp_list.append(line)
                        tmp_count += 1
                    else:
                        # First non-header line: write what is buffered and stop
                        final_file.writelines(tmp_list)
                        break

            # Second pass: everything that is not a header line
            for bowtie_file in file_count:
                #print bowtie_file
                tmp_list = []
                tmp_count = 0
                for line in open(os.path.join(work_directory_master, bowtie_file)):
                    if line.startswith("@"):
                        continue
                    if tmp_count == 1000000:
                        final_file.writelines(tmp_list)
                        tmp_list = []
                        tmp_count = 0
                    tmp_list.append(line)
                    tmp_count += 1
                final_file.writelines(tmp_list)

            final_file.close()
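One possible direction, offered only as a sketch and not as a measured improvement: open the files in binary mode, record the byte offset where each header ends during the first pass, and in the second pass seek to that offset and copy the remainder in large chunks with `shutil.copyfileobj` instead of testing every body line. The function name `cat_files_headers_first` and the `chunk_size` parameter are hypothetical, and the sketch assumes that all `@` lines sit contiguously at the top of each file, as the question describes. It also assumes the file list has already been sorted by the caller.

```python
import os
import shutil


def cat_files_headers_first(final_file_name, work_directory, file_names,
                            header_prefix=b"@", chunk_size=16 * 1024 * 1024):
    """Write every file's header lines first, then every file's body.

    Assumes header lines (starting with header_prefix) form one
    contiguous block at the top of each input file.
    """
    body_starts = []  # (path, byte offset of the first body line)

    with open(final_file_name, "wb") as out:
        # Pass 1: copy header lines and remember where each body begins.
        for name in file_names:
            path = os.path.join(work_directory, name)
            offset = 0
            with open(path, "rb") as src:
                for line in src:
                    if not line.startswith(header_prefix):
                        break
                    out.write(line)
                    offset += len(line)
            body_starts.append((path, offset))

        # Pass 2: skip the header by seeking, then bulk-copy the body.
        for path, offset in body_starts:
            with open(path, "rb") as src:
                src.seek(offset)
                shutil.copyfileobj(src, out, chunk_size)
```

Because the body is copied in fixed-size chunks, memory use is bounded by `chunk_size` and no manual line counter is needed. Whether this is actually faster on a 13 GB input depends on the disk and would need to be timed.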
