I need to save some information about a few files. Nothing too fancy, so I thought I would go with a simple one-line-per-item text file. Something like this:
# write (needs require 'digest/sha1')
io.print "%i %s %s\n" % [File.mtime(fname), fname, Digest::SHA1.file(fname).hexdigest]
# read (String#scanf needs require 'scanf')
io.each do |line|
  mtime, name, hash = line.scanf "%i %s %s"
end
Of course this doesn't work because a file name can contain spaces (breaking scanf) and line breaks (breaking IO#each).
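To make the two failure modes concrete, here is a small sketch. The record contents are made up, and String#split stands in for scanf's %s (both stop at whitespace), so this is an illustration rather than the exact code above.

```ruby
require 'stringio'

# Hypothetical record for a file named "my file\n.txt" (a space and a line break).
record = "1700000000 my file\n.txt da39a3ee\n"

# Line-based iteration sees the embedded line break as a record boundary.
lines = StringIO.new(record).each_line.to_a
lines.length   # => 2

# Whitespace splitting (stand-in for scanf's %s) cuts the name at the space.
fields = lines.first.split(" ")
fields         # => ["1700000000", "my", "file"]
```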
The line break problem can be avoided by dropping each and reading every field with gets(' ') instead:
until io.eof?
  mtime = Time.at(io.gets(" ").to_i)
  name = io.gets(" ").chomp(" ")  # gets keeps the separator, so strip it
  hash = io.gets("\n").chomp
end
Dealing with spaces in the names is another matter. Now we need to do some escaping.
Note: I like space as a delimiter, but I'd have no issue changing it for one that is easier to use. In the case of file names though, the only one that could help is ASCII NUL ("\0"), and a NUL-delimited file isn't really a text file anymore...
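For comparison, a NUL-delimited variant could look roughly like the sketch below. The helper names write_record and read_record are hypothetical, and the sketch relies on the fact that a file name cannot contain "\0".

```ruby
require 'stringio'

# Hypothetical helpers: every field is terminated by NUL.
def write_record(io, mtime, name, hash)
  io << mtime.to_i.to_s << "\0" << name << "\0" << hash << "\0"
end

def read_record(io)
  # gets("\0") returns the field including the trailing NUL, so chomp it off.
  mtime, name, hash = Array.new(3) { io.gets("\0").chomp("\0") }
  [Time.at(mtime.to_i), name, hash]
end

io = StringIO.new
write_record(io, Time.at(1_700_000_000), "my file\n.txt", "da39a3ee")
io.rewind
mtime, name, hash = read_record(io)
```

No escaping is needed here, at the cost of a file that most text editors will not display nicely.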
I initially had a wall of text detailing the iterations of my struggle to make a correct escaping function and its reciprocal but it was just boring and not really useful. I'll just give you the final result:
def write_name(io, val)
  io << val.gsub(/([\\ ])/, "\\\\\\1") # yes, that's 6 backslashes!
end
def read_name(io)
  name, continued = "", true
  while continued
    continued = false
    name += io.gets(' ').gsub(/\\(.)/) do |c|
      if c == "\\\\"
        "\\"
      elsif c == "\\ "
        continued = true
        " "
      else
        raise "unexpected backslash escape : %p (%s %i)" % [c, io.path, io.pos]
      end
    end
  end
  return name.chomp(' ')
end
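To show what the six backslashes in write_name actually produce, here is the same substitution applied to plain strings. The helper name escape_name is only for illustration.

```ruby
# Same gsub as in write_name, without the IO (hypothetical helper name).
def escape_name(val)
  val.gsub(/([\\ ])/, "\\\\\\1")
end

# The literal "\\\\\\1" is only 4 characters: three backslashes and a "1".
# gsub reads the first two as one literal backslash and "\1" as the captured character.
replacement_length = "\\\\\\1".length   # => 4

escaped_space     = escape_name("a b")   # a\ b  (space gets a backslash prefix)
escaped_backslash = escape_name("a\\b")  # a\\b  (backslash gets doubled)
```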
I'm not happy at all with read_name. It is way too long and awkward; I feel it shouldn't be that hard.
While trying to make this work, I also considered other approaches:
The BitTorrent bencode / PHP serialize way: prefix the file name with its length, then just io.read(name_len.to_i). It works, but it's a real pain to edit the file by hand. At this point we're halfway to a binary format.
String#inspect: this one looks expressly made for that purpose! Except it seems like the only way to get the value back is through eval. I hate the idea of eval-ing a string I didn't generate from trusted data.
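For reference, the length-prefixed idea from the first alternative could be sketched as below. The helper names and the "<byte length>:<name>" layout are assumptions modelled on bencode strings, and the sketch assumes names are UTF-8.

```ruby
require 'stringio'

# Hypothetical helpers: the name is stored as "<byte length>:<name>".
def write_name_prefixed(io, name)
  io << name.bytesize.to_s << ":" << name
end

def read_name_prefixed(io)
  len = io.gets(":").chomp(":").to_i
  # read(len) returns a binary string, so restore the assumed UTF-8 encoding.
  io.read(len).force_encoding(Encoding::UTF_8)
end

io = StringIO.new
write_name_prefixed(io, "my file\n.txt")
io << " da39a3ee\n"
io.rewind
name = read_name_prefixed(io)   # the full name, space and line break included
rest = io.gets                  # whatever follows the name on the line
```

Reading is trivial, but anyone editing a name by hand also has to update the byte count in front of it.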
So, opinions? Isn't there some lib that can do all this? Am I missing something obvious? How would you do it?