Building simple Reddit scraper

Posted by Bazant Fundator on Stack Overflow
Published on 2013-06-25T10:19:23Z. Indexed on 2013/06/25 10:21 UTC.

Let's say I would like to build a collection of images from reddit for my own amusement. I ran the code below in my development environment, and it never gets past the first page of posts (anything beyond that requires the after string from the JSON). Additionally, when I turn on the validation, the whole loop breaks if an item fails it, not just the current iteration. I would be glad if you helped me understand the mistakes I made.

class Link
    include Mongoid::Document
    include Mongoid::Timestamps

    field :author, type: String
    field :url, type: String

    validates :url, uniqueness: true # no duplicates

end


def fetch (count, after)
    count_s = count.to_s # convert count to string
    link = "http://reddit.com/r/aww/.json?count="+count_s+"&after="+after #so it can be used there
    res = HTTParty.get(link) # GET req. to the reddit server
    json = JSON.parse(res.body) # Parse the response

    if json['kind'] == "Listing" then   # check if the retrieved item is a Listing
        for i in 1...(count) do # for each list item
            datum = json['data']['children'][i]['data'] #i-th element properties
            if datum['domain'].in?(["imgur.com", "i.imgur.com"]) then # fetch only imgur links 
                Link.create!(author: datum['author'], url: datum['url']) # save to db 
            end 
        end
        count += 25
        fetch(count, json['data']['after']) # if it retrieved the right kind of object, move on to the next page
    end 

end

fetch(25," ") # run it
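
Both symptoms can probably be traced to a few lines in fetch. Link.create! raises an exception when validation fails, so a single duplicate URL aborts the whole method instead of skipping one item. The range 1...(count) skips index 0 and, once count reaches 50, indexes past the end of a 25-item page, so json['data']['children'][i] is nil and the second page crashes. The first request also sends a literal space as the after value. Below is a minimal sketch of one way to restructure it, not a definitive fix. The names listing_url, imgur_links, scrape, get_page, save and max_pages are hypothetical and not part of the original code, and the sketch assumes reddit's limit and after query parameters are enough to page through a listing.

```ruby
require 'json'
require 'uri'

IMGUR_DOMAINS = %w[imgur.com i.imgur.com].freeze

# Build the listing URL. `after` is left out on the first page
# rather than being sent as " ". (Hypothetical helper.)
def listing_url(after = nil, limit = 25)
  params = { limit: limit }
  params[:after] = after if after
  "http://www.reddit.com/r/aww/.json?#{URI.encode_www_form(params)}"
end

# Extract author/url pairs for imgur posts from one parsed Listing.
# Iterating over the children array avoids hand-built index ranges.
def imgur_links(listing)
  return [] unless listing['kind'] == 'Listing'

  listing['data']['children']
    .map { |child| child['data'] }
    .select { |d| IMGUR_DOMAINS.include?(d['domain']) }
    .map { |d| { author: d['author'], url: d['url'] } }
end

# Walk the pages in a loop instead of recursing without a stop condition.
# get_page: hypothetical callable, takes a URL and returns parsed JSON.
# save:     hypothetical callable, returns true if the record was stored,
#           false if it was rejected (for example by a uniqueness check).
def scrape(get_page:, save:, max_pages: 5)
  after = nil
  saved = 0
  max_pages.times do
    listing = get_page.call(listing_url(after))
    # A rejected item only skips this iteration; the loop keeps going.
    imgur_links(listing).each { |attrs| saved += 1 if save.call(attrs) }
    after = listing.dig('data', 'after')
    break if after.nil? # no further pages
  end
  saved
end
```

With the gems from the question this could be wired up as scrape(get_page: ->(url) { JSON.parse(HTTParty.get(url).body) }, save: ->(attrs) { Link.new(attrs).save }). The non-bang save returns false on a failed validation instead of raising, which is what lets the loop continue.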


Filed under: ruby, JSON, sinatra, mongoid, httparty
