Working with a large data object between ruby processes
Posted
by Gdeglin
on Stack Overflow
See other posts from Stack Overflow
or by Gdeglin
Published on 2010-05-26T03:15:47Z
Indexed on
2010/06/02
20:34 UTC
Read the original article
Hit count: 614
I have a Ruby hash that reaches approximately 10 megabytes if written to a file using Marshal.dump. After gzip compression it is approximately 500 kilobytes.
Iterating through and altering this hash is very fast in ruby (fractions of a millisecond). Even copying it is extremely fast.
The problem is that I need to share the data in this hash between Ruby on Rails processes. In order to do this using the Rails cache (file_store or memcached) I need to Marshal.dump the file first, however this incurs a 1000 millisecond delay when serializing the file and a 400 millisecond delay when serializing it.
Ideally I would want to be able to save and load this hash from each process in under 100 milliseconds.
One idea is to spawn a new Ruby process to hold this hash that provides an API to the other processes to modify or process the data within it, but I want to avoid doing this unless I'm certain that there are no other ways to share this object quickly.
Is there a way I can more directly share this hash between processes without needing to serialize or deserialize it?
Here is the code I'm using to generate a hash similar to the one I'm working with:
@a = []
0.upto(500) do |r|
@a[r] = []
0.upto(10_000) do |c|
if rand(10) == 0
@a[r][c] = 1 # 10% chance of being 1
else
@a[r][c] = 0
end
end
end
@c = Marshal.dump(@a) # 1000 milliseconds
Marshal.load(@c) # 400 milliseconds
Update:
Since my original question did not receive many responses, I'm assuming there's no solution as easy as I would have hoped.
Presently I'm considering two options:
- Create a Sinatra application to store this hash with an API to modify/access it.
- Create a C application to do the same as #1, but a lot faster.
The scope of my problem has increased such that the hash may be larger than my original example. So #2 may be necessary. But I have no idea where to start in terms of writing a C application that exposes an appropriate API.
A good walkthrough through how best to implement #1 or #2 may receive best answer credit.
© Stack Overflow or respective owner