Working with a large data object between Ruby processes

Posted by Gdeglin on Stack Overflow
Published on 2010-05-26T03:15:47Z

I have a Ruby hash that reaches approximately 10 megabytes if written to a file using Marshal.dump. After gzip compression it is approximately 500 kilobytes.

Iterating through and altering this hash in memory is very fast in Ruby (fractions of a millisecond). Even copying it is extremely fast.

The problem is that I need to share the data in this hash between Ruby on Rails processes. To do this using the Rails cache (file_store or memcached), I need to Marshal.dump the hash first. However, this incurs a 1000 millisecond delay when serializing the hash and a 400 millisecond delay when deserializing it.

Ideally I would want to be able to save and load this hash from each process in under 100 milliseconds.

One idea is to spawn a new Ruby process to hold this hash that provides an API to the other processes to modify or process the data within it, but I want to avoid doing this unless I'm certain that there are no other ways to share this object quickly.
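A minimal sketch of that idea, assuming DRb (the distributed-object library that ships with Ruby) as the transport. The `GridStore` class and its `get`, `set`, and `count_ones` methods are hypothetical names chosen for illustration, not part of any existing API. For brevity both halves run in one script here; in practice the server part would live in its own long-running process and each Rails process would connect with the known URI.

```ruby
require 'drb/drb'

# Hypothetical wrapper: the data lives in one process, and callers send
# small requests instead of copying the whole structure.
class GridStore
  def initialize(rows, cols)
    @grid = Array.new(rows) { Array.new(cols, 0) }
  end

  def get(row, col)
    @grid[row][col]
  end

  def set(row, col, value)
    @grid[row][col] = value
  end

  # Processing happens next to the data, so only one integer is returned.
  def count_ones(row)
    @grid[row].count(1)
  end
end

# Server side: port 0 asks the OS for any free port, and DRb.uri
# reports the URI that was actually bound.
DRb.start_service('druby://localhost:0', GridStore.new(501, 10_001))
uri = DRb.uri

# Client side: a proxy object that forwards method calls to the server.
store = DRbObject.new_with_uri(uri)
store.set(3, 7, 1)
value = store.get(3, 7)
ones = store.count_ones(3)
puts "cell: #{value}, ones in row 3: #{ones}"
```

The point of the design is that each call serializes only its arguments and return value, so the cost of a request depends on the size of the message rather than the size of the whole object.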

Is there a way I can more directly share this hash between processes without needing to serialize or deserialize it?

Here is the code I'm using to generate a hash similar to the one I'm working with:

@a = []
0.upto(500) do |r|
  @a[r] = []
  0.upto(10_000) do |c|
    if rand(10) == 0 
      @a[r][c] = 1 # 10% chance of being 1
    else
      @a[r][c] = 0
    end
  end
end

@c = Marshal.dump(@a) # serializing takes about 1000 milliseconds
Marshal.load(@c) # deserializing takes about 400 milliseconds
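The timings and sizes quoted above can be reproduced with something like the following sketch, which uses the standard `benchmark` and `zlib` libraries. The `rows` and `cols` values are deliberately smaller than in the generator above so that it runs quickly; they are illustrative assumptions, and the measured numbers will vary by machine and Ruby version.

```ruby
require 'benchmark'
require 'zlib'

# Illustrative dimensions, smaller than the 501 x 10_001 structure above.
rows = 50
cols = 1_000

# Same distribution as the generator above: roughly 10% ones, the rest zeros.
grid = Array.new(rows) { Array.new(cols) { rand(10) == 0 ? 1 : 0 } }

dumped = nil
dump_time = Benchmark.realtime { dumped = Marshal.dump(grid) }

loaded = nil
load_time = Benchmark.realtime { loaded = Marshal.load(dumped) }

# Gzip the marshaled bytes to compare on-disk sizes.
compressed = Zlib.gzip(dumped)

puts format('dump: %.1f ms, load: %.1f ms', dump_time * 1000, load_time * 1000)
puts "marshaled: #{dumped.bytesize} bytes, gzipped: #{compressed.bytesize} bytes"
```

Scaling `rows` and `cols` back up to the original dimensions should show whether the 1000 ms and 400 ms figures hold on a given setup.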

Update:

Since my original question did not receive many responses, I'm assuming there's no solution as easy as I would have hoped.

Presently I'm considering two options:

  1. Create a Sinatra application to store this hash with an API to modify/access it.
  2. Create a C application to do the same as #1, but a lot faster.

The scope of my problem has increased such that the hash may be larger than my original example. So #2 may be necessary. But I have no idea where to start in terms of writing a C application that exposes an appropriate API.

A good walkthrough of how best to implement #1 or #2 may receive best answer credit.

© Stack Overflow or respective owner
