Working with a large data object between ruby processes

Posted by Gdeglin on Stack Overflow
Published on 2010-05-26T03:15:47Z Indexed on 2010/06/02 20:34 UTC

Filed under: serialization

I have a Ruby hash that reaches approximately 10 megabytes if written to a file using Marshal.dump. After gzip compression it is approximately 500 kilobytes.

Iterating through and altering this hash is very fast in Ruby (fractions of a millisecond). Even copying it is extremely fast.

The problem is that I need to share the data in this hash between Ruby on Rails processes. To do this using the Rails cache (file_store or memcached) I need to Marshal.dump the hash first. However, this incurs a 1000 millisecond delay when serializing the hash and a 400 millisecond delay when deserializing it.

Ideally I would want to be able to save and load this hash from each process in under 100 milliseconds.

One idea is to spawn a new Ruby process to hold this hash that provides an API to the other processes to modify or process the data within it, but I want to avoid doing this unless I'm certain that there are no other ways to share this object quickly.
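A minimal sketch of that idea, assuming DRb (Ruby's distributed-object library) as the transport. The GridStore class, its method names, and the URI are hypothetical and only for illustration. The point is that the large object stays in one process and only small arguments and results are serialized per call. For brevity the server and client run in the same process here; in practice the server part would be its own long-lived process listening on a fixed URI.

```ruby
require 'drb/drb'

# Hypothetical wrapper around the shared data; names are illustrative only.
class GridStore
  def initialize(rows, cols)
    @grid = Array.new(rows) { Array.new(cols, 0) }
  end

  # Read one cell; only a single integer crosses the process boundary.
  def get(r, c)
    @grid[r][c]
  end

  # Write one cell in place.
  def set(r, c, value)
    @grid[r][c] = value
  end

  # Process the data where it lives and return only a small result.
  def count_ones
    @grid.sum { |row| row.count(1) }
  end
end

# Server side. Port 0 asks the OS for any free port (a choice made for this demo).
store  = GridStore.new(501, 10_001)
server = DRb.start_service('druby://127.0.0.1:0', store)

# Client side: each Rails process would hold a proxy like this one.
remote = DRbObject.new_with_uri(server.uri)
remote.set(3, 7, 1)
puts remote.get(3, 7)
puts remote.count_ones
```

The trade-off is that every call becomes a round trip, so this fits best when the API exposes coarse operations (like count_ones above) rather than millions of single-cell reads.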

Is there a way I can more directly share this hash between processes without needing to serialize or deserialize it?

Here is the code I'm using to generate a hash similar to the one I'm working with:

@a = []
0.upto(500) do |r|
  @a[r] = []
  0.upto(10_000) do |c|
    if rand(10) == 0 
      @a[r][c] = 1 # 10% chance of being 1
    else
      @a[r][c] = 0
    end
  end
end

@c = Marshal.dump(@a) # serialize: about 1000 milliseconds
Marshal.load(@c)      # deserialize: about 400 milliseconds
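One possible way to reduce the Marshal cost for data shaped like this example, offered as a sketch rather than a confirmed fix: since every cell is 0 or 1, each row can be packed into a binary string before dumping, so Marshal handles a few hundred strings instead of about five million integers. The helper names pack_grid and unpack_grid are hypothetical, and the approach assumes every value fits in one byte (0..255). Timings are printed rather than promised, since they depend on the Ruby version and machine.

```ruby
require 'benchmark'

# Same shape as the example data: 501 rows of 10_001 cells, each 0 or 1.
grid = Array.new(501) { Array.new(10_001) { rand(10) == 0 ? 1 : 0 } }

# Pack each row into a binary string, one byte per cell ('C' = unsigned 8-bit).
# Only valid while every value is in 0..255 (an assumption of this sketch).
def pack_grid(grid)
  grid.map { |row| row.pack('C*') }
end

# Reverse of pack_grid: turn each byte string back into an array of integers.
def unpack_grid(packed)
  packed.map { |row| row.unpack('C*') }
end

payload  = nil
restored = nil

plain_dump  = Benchmark.realtime { Marshal.dump(grid) }
packed_dump = Benchmark.realtime { payload = Marshal.dump(pack_grid(grid)) }
packed_load = Benchmark.realtime { restored = unpack_grid(Marshal.load(payload)) }

puts "plain dump:  #{(plain_dump * 1000).round} ms"
puts "packed dump: #{(packed_dump * 1000).round} ms"
puts "packed load: #{(packed_load * 1000).round} ms"
puts "round trip intact: #{restored == grid}"
```

If the real hash holds richer values than single bytes, this exact packing does not apply, but the same idea (flattening the structure into a few large strings before Marshal.dump) may still be worth measuring.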

Update:

Since my original question did not receive many responses, I'm assuming there's no solution as easy as I would have hoped.

Presently I'm considering two options:

1. Create a Sinatra application to store this hash with an API to modify/access it.
2. Create a C application to do the same as #1, but a lot faster.

The scope of my problem has increased such that the hash may be larger than my original example. So #2 may be necessary. But I have no idea where to start in terms of writing a C application that exposes an appropriate API.

A good walkthrough of how best to implement #1 or #2 may receive best answer credit.
