stiivi - Developer IT

Search Results

Search found 2 results on 1 pages for 'stiivi'.

Page 1/1 | 1

Why is curl in Ruby slower than command-line curl?

- by Stiivi

I am trying to download more than 1m pages (URLs ending by a sequence ID). I have implemented kind of multi-purpose download manager with configurable number of download threads and one processing thread. The downloader downloads files in batches: curl = Curl::Easy.new batch_urls.each { |url_info| curl.url = url_info[:url] curl.perform file = File.new(url_info[:file], "wb") file << curl.body_str file.close # ... some other stuff } I have tried to download 8000 pages sample. When using the code above, I get 1000 in 2 minutes. When I write all URLs into a file and do in shell: cat list | xargs curl I gen all 8000 pages in two minutes. Thing is, I need it to have it in ruby code, because there is other monitoring and processing code. I have tried: Curl::Multi - it is somehow faster, but misses 50-90% of files (does not download them and gives no reason/code) multiple threads with Curl::Easy - around the same speed as single threaded Why is reused Curl::Easy slower than subsequent command line curl calls and how can I make it faster? Or what I am doing wrong? I would prefer to fix my download manager code than to make downloading for this case in a different way. Before this, I was calling command-line wget which I provided with a file with list of URLs. Howerver, not all errors were handled, also it was not possible to specify output file for each URL separately when using URL list. Now it seems to me that the best way would be to use multiple threads with system call to 'curl' command. But why when I can use directly Curl in Ruby? Code for the download manager is here, if it might help: Download Manager (I have played with timeouts, from not-setting it to various values, it did not seem help) Any hints appreciated.

Read the article

Database abstraction/adapters for ruby

- by Stiivi

What are the database abstractions/adapters you are using in Ruby? I am mainly interested in data oriented features, not in those with object mapping (like active record or data mapper). I am currently using Sequel. Are there any other options? I am mostly interested in: simple, clean and non-ambiguous API data selection (obviously), filtering and aggregation raw value selection without field mapping: SELECT col1, col2, col3 = [val1, val2, val3] not hash of { :col1 = val1 ...} API takes into account table schemas 'some_schema.some_table' in a consistent (and working) way; also reflection for this (get schema from table) database reflection: get list of table columns, their database storage types and perhaps adaptor's abstracted types table creation, deletion be able to work with other tables (insert, update) in a loop enumerating selection from another table without requiring to fetch all records from table being enumerated Purpose is to manipulate data with unknown structure at the time of writing code, which is the opposite to object mapping where structure or most of the structure is usually well known. I do not need the object mapping overhead. What are the options, including back-ends for object-mapping libraries?