Hi, I have a large text file I need to sort in Java. The format is:
word [tab] frequency [new line]
The algorithm for sorting is:
Read some of the file, filtering for purlely alphabetic words.
Once you have X number of alphabetic words, call Collections.sort and write the result to a file.
Repeat until you have finished reading the file.
Start reading two sorted files, comparing line by line for the word with higher frequency, and writing at the same time to a new file as to not load much into your memory
Repeat until all files are merged into one large file
Right now I've divided the large file into smaller ones (sorted by descending frequency) with 10,000 lines each. I know I need to somehow merge these files back together, but I'm not sure how to go about this.
I've created a LinkedList to keep track of all the files created. The algorithm says to compare each line in the two files, except I've tried a case where , say file1 = 8,6,5,3,1 and file2 = 9,8,8,8,8. Then if I compare them line by line I would get file3 = 9,8,8,6,8,5,8,3,8,1 which is incorrectly sorted (they should be in decreasing order).
I think I'm misunderstanding some part of the algorithm. If someone could point out what I should do instead, I'd greatly appreciate it. Thanks.
edit: Yes this is an assignment. We aren't allowed to increase memory unfortunately :(