- by CC
Hi all,
I'm working on a piece of code to split files.
I want to split flat files (that part is working fine) and XML files.
The idea is to split based on a number of output files:
I have a file, and I want to split it into x files (x is a parameter).
I do the split by taking the size of the file and dividing it by the number of files to produce.
Then, my solution was to use a BufferedReader, like this:
while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
    // write the n characters just read to the current output file
}
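As a rough sketch of the size computation described above, the target size per output file can be obtained with a ceiling division, so that x parts always cover the whole file. The class and method names are only illustrative. Note that a file size is in bytes while a Reader counts characters, so the two can differ for multi-byte encodings.

```java
class SplitSizes {
    /** Target size of each part, rounded up so that `parts` parts cover all of `fileSize`. */
    static long targetPartSize(long fileSize, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        return (fileSize + parts - 1) / parts; // ceiling division
    }
}
```

For example, a size of 10 split into 3 parts gives a target of 4 per part.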
The main problem is that I cannot cut an XML file at an arbitrary position. I have to split it on blocks delimited by a start XML tag and an end XML tag:
<start tag>
bla bla xml stuff
</end tag>
So I cannot cut a block in the middle. If I am halfway through a block when the size of my new file becomes greater than my maximum, I have to keep reading until the end tag, and only then start the next file.
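A minimal sketch of that rule, under some assumptions: it reads one character at a time (still buffered if the Reader is a BufferedReader), it collects the parts in memory instead of writing files, and it assumes the first character of the end tag ('<') appears nowhere else in the tag, which keeps the matching trivial. `BlockSplitter`, `split` and `maxChars` are hypothetical names. XML headers and any enclosing root element are not handled here.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class BlockSplitter {
    /**
     * Cuts the input only right after a complete endTag, and only once the
     * current part has reached maxChars characters. The last part may be shorter.
     */
    static List<String> split(Reader in, String endTag, long maxChars) throws IOException {
        if (endTag.isEmpty()) {
            throw new IllegalArgumentException("endTag must not be empty");
        }
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int matched = 0; // number of endTag characters matched so far
        int c;
        while ((c = in.read()) != -1) {
            current.append((char) c);
            if (c == endTag.charAt(matched)) {
                matched++;
            } else {
                // Assumption: endTag's first character occurs only at its start.
                matched = (c == endTag.charAt(0)) ? 1 : 0;
            }
            if (matched == endTag.length()) {
                matched = 0;
                if (current.length() >= maxChars) {
                    parts.add(current.toString()); // a real splitter would close the file here
                    current.setLength(0);
                }
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }
}
```

In a real splitter each part would be streamed to its own Writer rather than kept in a StringBuilder; the in-memory list only keeps the sketch short.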
The problem is that there are all sorts of cases, and it is a bit difficult to search for the end tag:
- the read buffer stops in the middle of the end tag
- the read buffer stops exactly at the end of the end tag, with no other character after it
- etc.
and at the same time I have to keep looping to read the next buffer.
Sometimes the end tag only appears when the end of one buffer is concatenated with the start of the next one.
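One possible way to cover these boundary cases is to keep the match state between reads instead of concatenating buffers. The sketch below uses a Knuth-Morris-Pratt style matcher. `EndTagMatcher` and `indexAfterTag` are hypothetical names, and this is only an illustration of the technique, not a complete splitter.

```java
class EndTagMatcher {
    private final char[] tag;
    private final int[] fail;  // KMP failure table for the tag
    private int matched = 0;   // tag characters matched so far; kept between calls

    EndTagMatcher(String endTag) {
        if (endTag.isEmpty()) {
            throw new IllegalArgumentException("endTag must not be empty");
        }
        tag = endTag.toCharArray();
        fail = new int[tag.length];
        for (int i = 1, k = 0; i < tag.length; i++) {
            while (k > 0 && tag[i] != tag[k]) {
                k = fail[k - 1];
            }
            if (tag[i] == tag[k]) {
                k++;
            }
            fail[i] = k;
        }
    }

    /**
     * Scans buf[from, to) and returns the index just past the first end tag
     * that completes in this range, or -1 if none does. A tag that started
     * in a previous buffer is still recognised because `matched` is kept.
     */
    int indexAfterTag(char[] buf, int from, int to) {
        for (int i = from; i < to; i++) {
            while (matched > 0 && buf[i] != tag[matched]) {
                matched = fail[matched - 1];
            }
            if (buf[i] == tag[matched]) {
                matched++;
            }
            if (matched == tag.length) {
                matched = 0; // end tags do not overlap, so restart from scratch
                return i + 1;
            }
        }
        return -1;
    }
}
```

In the read loop, once the current file has passed its size limit, calling `indexAfterTag(buffer, 0, n)` on each buffer gives the cut position k: characters before k go to the current file, and the characters from k up to n start the next one. A result equal to n is the case where the buffer ends exactly on the end tag.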
I hope you get the idea.
My question is: does anyone have an algorithm that does this more accurately and handles all the special cases?
The idea is to split the file as quickly as possible.
Thanks a lot.