An algorithm for filtering out raw txt files
Posted by Roman Luštrik on Stack Overflow
Published on 2011-01-07T18:50:04Z
Imagine you have a .txt file with the following structure:
>>> header
>>> header
>>> header
K L M
200 0.1 1
201 0.8 1
202 0.01 3
...
800 0.4 2
>>> end of file
50 0.1 1
75 0.78 5
...
I would like to read all the data except the lines marked with >>> and the lines below the >>> end of file line.
So far I've solved this using read.table(comment.char = ">", skip = x, nrow = y) (x and y are currently fixed). This reads the data between the header and >>> end of file.
However, I would like to make my function a bit more flexible regarding the number of rows. The data may contain values larger than 800, and consequently more rows.
I could scan or readLines the file, find which row corresponds to >>> end of file, and calculate the number of lines to read. What approach would you use?
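For illustration, the readLines idea might be sketched roughly as follows. This is only one possible approach, not a definitive solution: the function name read_until_marker and its marker argument are hypothetical, and the sketch assumes the marker line starts with the literal text ">>> end of file" and that the first non-">>>" line holds the column names. Instead of computing skip and nrow, it passes only the lines above the marker to read.table through its text argument.

```r
# Hypothetical helper (not from the original post): read the table that sits
# between the ">>>" header lines and the ">>> end of file" marker,
# regardless of how many data rows there are.
read_until_marker <- function(path, marker = ">>> end of file") {
  lines <- readLines(path)

  # Position of the first line that starts with the marker text.
  end_idx <- which(startsWith(lines, marker))[1]

  # Assumption: if no marker is found, keep the whole file.
  if (is.na(end_idx)) end_idx <- length(lines) + 1

  # Keep only the lines above the marker, then drop the ">>>" header lines.
  body <- lines[seq_len(end_idx - 1)]
  body <- body[!startsWith(body, ">>>")]

  # Parse the remaining lines; the first one ("K L M") becomes the header.
  read.table(text = body, header = TRUE)
}

# Small demonstration on a temporary file shaped like the example above
# (without the "..." elisions).
tmp <- tempfile(fileext = ".txt")
writeLines(c(">>> header", ">>> header", ">>> header",
             "K L M",
             "200 0.1 1", "201 0.8 1", "202 0.01 3", "800 0.4 2",
             ">>> end of file",
             "50 0.1 1", "75 0.78 5"), tmp)
dat <- read_until_marker(tmp)
print(dat)
```

Because the lines are filtered before parsing, the number of data rows no longer needs to be known in advance. The cost is that the whole file is read into memory once, which may matter for very large files.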