An algorithm for filtering raw .txt files
- by Roman Luštrik
Imagine you have a .txt file of the following structure:
>>> header
>>> header
>>> header
K L M
200 0.1 1
201 0.8 1
202 0.01 3
...
800 0.4 2
>>> end of file
50 0.1 1
75 0.78 5
...
I would like to read all the data except the lines starting with >>> and everything below the >>> end of file line.
So far I've solved this using read.table(comment.char = ">", skip = x, nrows = y), where x and y are currently hard-coded. This reads the data between the header and the >>> end of file line.
However, I would like to make my function more flexible regarding the number of rows. The first column (K) may run past 800, in which case the file has more rows.
I could scan or readLines the file, find which row holds the >>> end of file line, and calculate the number of lines to read from that. What approach would you use?
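For reference, here is a minimal sketch of the readLines idea, not a finished solution. The function name read_until_marker and its marker argument are made up for illustration. It assumes the marker line starts with exactly ">>> end of file", that every header line starts with ">>>", and that the file is small enough to hold in memory.

```r
# Hypothetical helper: read the table that sits between the ">>>" header
# lines and the ">>> end of file" marker, however many rows it has.
read_until_marker <- function(path, marker = ">>> end of file") {
  lines <- readLines(path)

  # Index of the first line that starts with the marker text
  end <- which(startsWith(lines, marker))[1]

  # Assumption: if the marker is missing, keep every line
  if (is.na(end)) end <- length(lines) + 1

  # Keep only the lines above the marker, then drop the ">>>" header lines
  body <- lines[seq_len(end - 1)]
  body <- body[!startsWith(body, ">>>")]

  # Parse what is left; the first remaining line (K L M) becomes the header
  read.table(text = body, header = TRUE)
}
```

Calling read_until_marker("myfile.txt") (file name is a placeholder) would then return a data frame with columns K, L and M, without needing fixed skip or nrows values. The trade-off is that the whole file is read once as text before parsing.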