Search Results

Search found 132 results on 6 pages for 'nokogiri'.

Page 2/6 | < Previous Page | 1 2 3 4 5 6 | Next Page >

How to search an XML when parsing it using SAX in nokogiri

- by ralph

I have a simple but huge XML file like the one below. I want to parse it using SAX and print only the text inside the title tag. <root> <site>some site</site> <title>good title</title> </root> I have the following code: require 'rubygems' require 'nokogiri' include Nokogiri class PostCallbacks < XML::SAX::Document def start_element(element, attributes) if element == 'title' puts "found title" end end def characters(text) puts text end end parser = XML::SAX::Parser.new(PostCallbacks.new) parser.parse_file("myfile.xml") The problem is that it prints the text between all the tags. How can I print only the text inside the title tag?

Nokogiri Truncating XML Input

- by bdorry

I am having issues with a colleague's machine truncating XML while using Nokogiri to parse a Media RSS feed. The feed is a standard Media RSS feed, and the XML is not malformed. It looks like it simply stops at a certain point in the XML and closes any tags that would have been open at that point in the document. (Unfortunately I do not have the XML available to me right now, but I will update this question with the actual XML when I have it.) My confusion comes from it working fine on my machine (OSX 10.6, Nokogiri 1.4.4) while it behaves incorrectly on his machine using the same setup; however, his machine is a few years older. I imagine that there is a difference somewhere but unfortunately I don't know what to look for. Any thoughts or direction would be greatly appreciated.

Nokogiri extract data from xml

- by Awea

Hi guys, I am trying to extract data from an XML document in a Rails application with the Nokogiri gem. The XML: <item> <description> <img src="something" title="anothething"> <p>text, bla bla...</p> </description> </item> Currently I do something like this to extract data from the XML: def test_content @return = Array.new site = 'http://www.les-encens.com/modules/feeder/rss.php?id_category=0' @doc = Nokogiri::XML(open(site, "UserAgent" => "Ruby-OpenURI")) @doc.xpath("//item").each do |n| @return << [ n.xpath('description') ] end end Could you show me how to extract just the src attribute from the img tag?

Sax Parsing strange element with nokogiri

- by SHUMAcupcake

I want to SAX-parse with Nokogiri, but when it comes to parsing an XML element that has a long and crazy element name or an attribute on it, everything goes crazy. For instance, if I would like to parse this XML file and grab all the title elements, how do I do that with Nokogiri SAX? <titles> <title xml:lang="sv">Arkivvetenskap</title> <title xml:lang="en">Archival science</title> </titles>

RVM 1.9.1 & nokogiri

- by scaney

Having trouble installing the nokogiri gem under rvm ruby 1.9.1. gem install nokogiri I'm getting ... /usr/include/libxml2... no libxml2 is missing. try 'port install libxml2' or 'yum install libxml2-devel' *** extconf.rb failed *** But I checked: sudo apt-get install libxml2 and I got: Reading state information... Done libxml2 is already the newest version. Is this a root thing perhaps? RVM runs everything in userspace.

Nokogiri HttParty Xpath Ruby on Rails

- by Brian

I am working with an MMORPG (Eve Online) request that returns XML. I am using httparty for the request and I am trying to use nokogiri to obtain attribute values for a specific element. Here's an example of the response: <eveapi version="2"><currentTime>2012-10-19 22:41:56</currentTime><result><rowset name="transactions" key="refID" columns="date,refID,refTypeID,ownerName1,ownerID1,ownerName2,ownerID2,argName1,argID1,amount,balance,reason,taxReceiverID,taxAmount"><row date="2012-10-18 23:41:50" refID="232323" refTypeID="9" ownerName1="University of Caille" ownerID1="32232" ownerName2="name" ownerID2="34343" argName1="" argID1="0" amount="5000.00" balance="5000.00" reason="Starter fund" taxReceiverID="" taxAmount=""/></rowset></result><cachedUntil>2012-10-19 23:03:40</cachedUntil></eveapi> I only need to access attributes for the element "row" and there can be many rows returned. I have read about XPath and, from what I understand, if I do the following it should return all rows: doc.xpath('row') However, it does not return anything. Here's what I have so far: options = {:keyID => 111111, :vCode => 'fddfdfdfdf'} response = HTTParty.post('https://api.eveonline.com/char/WalletJournal.xml.aspx', :body => options) doc = Nokogiri::XML(response.body) doc.xpath('row').each do |r| end The loop is never executed. What am I doing wrong? I need to return all row elements and gain access to each of the row's attributes. Thanks.

How to get Nokogiri to ignore HTML elements that don't exist

- by user296507

Any idea how I can get the code below to produce this output? 1 - 2 - B I'm getting this error "undefined method `text' for nil:NilClass (NoMethodError)", because I think table 1 does not have the element 'td class=r2' in it. require 'rubygems' require 'nokogiri' require 'open-uri' doc = Nokogiri::HTML.parse(<<-eohtml) <table class="t1"> <tbody> <tr> <td class="r1">1</td> </tr> </tbody> </table> <table class="t2"> <tbody> <tr> <td class="r1">2</td> <td class="r2">B</td> </tr> </tbody> </table> eohtml doc.css('tbody > tr').each do |n| r1 = n.at_css(".r1").text r2 = n.at_css(".r2").text puts "#{r1} - #{r2}" end

Ruby - Nokogiri - Need to put node.value to an array

- by r3nrut

What I'm trying to do is read the value for all the nodes in this XML and put them into an array. This should be simple but for some reason it's driving me nuts. XML <ArrayOfAddress> <Address> <AddressId>297424fe-cfff-4ee1-8faa-162971d2645f</AddressId> <FirstName>George</FirstName> <LastName>Washington</LastName> <Address1>123 Main St</Address1> <Address2>Apt #611</Address2> <City>New York</City> <State>NY</State> <PostalCode>10110</PostalCode> <CountryCode>US</CountryCode> <EmailAddress>[email protected]</EmailAddress> <PhoneNumber>5555551234</PhoneNumber> <AddressType>CustomerAddress</AddressType> </Address> </ArrayOfAddress> Code class MassageRepsone def parse_resp @@get_address.url_builder #URL passed through HTTPClient - @@resp is the xml above doc = Nokogiri::XML::Reader(@@resp) @@values = doc.each do |node| node.value end end @@get_address.parse_resp obj = [@@values] Array(obj) p obj end The code snippet from above returns the following: 297424fe-cfff-4ee1-8faa-162971d2645f George Washington 123 Main St Apt #622 New York NY 10110 US test.test.com 5555551234 CustomerAddress I tried putting @@values to a string and applying chomp but that just prints the newlines as nil and puts quotes around the values. Not sure what the next step is or if I need to approach this differently with Nokogiri.

RUBY Nokogiri CSS HTML Parsing

- by user296507

I'm having some problems trying to get the code below to output the data in the format that I want. What I'm after is the following: CCC1-$5.00 CCC1-$10.00 CCC1-$15.00 CCC2-$7.00 where $7 belongs to CCC2 and the others to CCC1, but I can only manage to get the data in this format: CCC1-$5.00 CCC1-$10.00 CCC1-$15.00 CCC1-$7.00 CCC2-$5.00 CCC2-$10.00 CCC2-$15.00 CCC2-$7.00 Any help would be appreciated. require 'rubygems' require 'nokogiri' require 'open-uri' doc = Nokogiri::HTML.parse(<<-eohtml) <div class="AAA"> <table cellspacing="0" cellpadding="0" border="0" summary="sum"> <tbody> <tr> <td class="BBB"> <span class="CCC">CCC1</span> </td> <td class="DDD"> <table cellspacing="0" cellpadding="0" border="0"> <tbody> <tr><td class="FFF">$5.00</td></tr> <tr><td class="FFF">$10.00</td></tr> <tr><td class="FFF">$15.00</td></tr> </tbody> </table> </td> </tr> </tbody> </table> <table cellspacing="0" cellpadding="0" border="0" summary="sum"> <tbody> <tr> <td class="BBB"> <span class="CCC">CCC2</span> </td> <td class="DDD"> <table cellspacing="0" cellpadding="0" border="0"> <tbody> <tr><td class="FFF">$7.00</td></tr> </tbody> </table> </td> </tr> </tbody> </table> </div> eohtml doc.css('td.BBB > span.CCC').each do |something| doc.css('tr > td.EEE, tr > td.FFF').each do |something_more| puts something.content + '-'+ something_more.content end end

Nokogiri Doc Element Not Returning Correctly

- by TenJack

I am trying to scrape a wiktionary entry: uri = URI.parse("http://en.wiktionary.org/wiki/" + CGI.escape('abjure')) doc = Nokogiri::HTML(open(uri, 'User-Agent' => 'ruby')) but the doc shows no elements for this word. The other words work fine and this word used to work. I have no idea what changed. Anyone see anything wrong with this?

Format form fields for bootstrap using rails+nokogiri

- by user1116573

I have the following in an initializer in a rails app that uses Twitter bootstrap so that it removes the div.field_with_errors that rails applies when validation fails on a field but also the initializer adds the help/validation text after the erroneous input field: require 'nokogiri' ActionView::Base.field_error_proc = Proc.new do |html_tag, instance| html = %(<div class="field_with_errors">#{html_tag}</div>).html_safe form_fields = [ 'textarea', 'input', 'select' ] elements = Nokogiri::HTML::DocumentFragment.parse(html_tag).css("label, " + form_fields.join(', ')) elements.each do |e| if e.node_name.eql? 'label' html = %(#{e}).html_safe elsif form_fields.include? e.node_name if instance.error_message.kind_of?(Array) html = %(#{e}<span class="help-inline"> #{instance.error_message.join(',')}</span>).html_safe else html = %(#{e}<span class="help-inline"> #{instance.error_message}</span>).html_safe end end end html end This works fine but I also need to apply the .error class to the surrounding div.control-group for each error. My initializer currently gives the following output: <div class="control-group"> <label class="control-label" for="post_message">Message</label> <div class="controls"> <input id="post_message" name="post[message]" required="required" size="30" type="text" value="" /><span class="help-inline"> can't be blank</span> </div> </div> but I need something adding to my initializer so that it adds the .error class to the div.control-group like so: <div class="control-group error"> <label class="control-label" for="post_message">Message</label> <div class="controls"> <input id="post_message" name="post[message]" required="required" size="30" type="text" value="" /><span class="help-inline"> can't be blank</span> </div> </div> The solution will probably need to allow for the fact that each validation error could have more than one label and input that are all within the same div.control-group (eg radio buttons / checkboxes / 2 text fields side by side). 
I assume it needs some sort of e.at_xpath() to find the div.control-group parent and add the .error class to it but I'm not sure how to do this. Can anyone help? PS This may all be possible using the formtastic or simple_form gems but I'd rather just use my own html if possible. EDIT If I put e['class'] = 'foo' in the if e.node_name.eql? 'label' section then it applies the class to the label so I think I just need to find the parent tag of e and then apply an .error class to it but I can't figure out what the xpath would be to get from label to its div.control-group parent; no combination of dots, slashes or whatever seems to work but xpath isn't my strong point.

Scraping &#151 character (long dash) error in Nokogiri

- by DavidP6

I am having trouble scraping a certain long dash that is encoded as &#151; on the Time magazine site. It looks like this: —. It works fine when this dash is encoded as &mdash;, but when the problem dash is scraped, it is returned as unknown characters. I am using Nokogiri and am wondering if I have to use some sort of special encoding? The page says it is encoded with UTF-8.

Nokogiri find text in paragraphs

- by astropanic

I want to replace the inner_text in all paragraphs in my XHTML document. I know I can get all text with Nokogiri like this: doc.xpath("//text()") But I want to operate only on text in paragraphs. How can I select all text in paragraphs without affecting any anchor text in links? For example: <p>some text <a href="/">This should not be changed</a> another one</p>

Parsing XML feed into Ruby object using nokogiri?

- by Galen King

Hi all, I am pretty green with coding in Ruby but am trying to pull an XML feed into a Ruby object as follows (ignore the ugly code please): <% doc = Nokogiri::XML(open("http://api.workflowmax.com/job.api/current?apiKey=#{@feed.service.api_key}&accountKey=#{@feed.service.account_key}")) %> <% doc.xpath('//Jobs/Job').each do |node| %> <h2><%= node['name'].text %></h2> <p><%= node['description'].text %></p> <% end %> Basically I want to iterate through each Job and output the name, description etc. What am I missing? Many thanks, Galen

How should I loop a nokogiri search in ruby?

- by kim

I have the following code, which retrieves the title of each URL from an array that contains a list of URLs. require 'rubygems' require 'nokogiri' require 'open-uri' @urls = ["http://google.com", "http://yahoo.com", "http://rubyonrails.org"] @found_titles = Array.new @found_titles[0] = Nokogiri::HTML(open("#{@urls[0]}")).search("title").inner_html #this can go on forever...but #@found_titles[1] = Nokogiri::HTML(open("#{@urls[1]}")).search("title").inner_html #@found_titles[2] = Nokogiri::HTML(open("#{@urls[2]}")).search("title").inner_html puts "#{@found_titles[0]}" How should I write a loop for this so I can get the titles even when the list in the @urls array gets longer?

error when trying to install nokogiri

- by sam

I'm trying to install nokogiri to use in a Ruby on Rails application to read XML files. I've been following the instructions on their page for Homebrew 0.9. When I try to install libiconv from source as below: wget http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.13.1.tar.gz tar xvfz libiconv-1.13.1.tar.gz cd libiconv-1.13.1 ./configure --prefix=/usr/local/Cellar/libiconv/1.13.1 make sudo make install I get the following error: `make: *** No rule to make target `install'. Stop.` Any idea why that might be? Sorry if the answer is a really simple one; I'm pretty new to RoR and the terminal and I've been going round in circles with this for almost a day. Any help is much appreciated!

Contents of a node in Nokogiri

- by Styggentorsken

Is there a way to select all the contents of a node in Nokogiri? <root> <element>this is <hi>the content</hi> of my æøå element</element> </root> The result of getting the content of /root/element should be this is <hi>the content</hi> of my æøå element Edit: It seems like the solution is simply to use myElement.inner_html(). The problem I had was in fact that I was relying on an old version of libxml2, which escaped all the special characters.

how to use nokogiri methods .xpath & .at_xpath

- by Radek

I'm learning how to use nokogiri and a few questions came to me based on the code below: require 'rubygems' require 'mechanize' post_agent = WWW::Mechanize.new post_page = post_agent.get('http://www.vbulletin.org/forum/showthread.php?t=230708') puts "\nabsolute path with tbody gives nil" puts post_page.parser.xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div[2]').xpath('text()').to_s.strip.inspect puts "\n.at_xpath gives an empty string" puts post_page.parser.at_xpath("//div[@id='posts']/div/table/tr/td/div[2]").at_xpath('text()').to_s.strip.inspect puts "\ntwo lines solution with .at_xpath gives an empty string" rows = post_page.parser.xpath("//div[@id='posts']/div/table/tr/td/div[2]") puts rows[0].at_xpath('text()').to_s.strip.inspect puts puts "two lines working code" rows = post_page.parser.xpath("//div[@id='posts']/div/table/tr/td/div[2]") puts rows[0].xpath('text()').to_s.strip puts "\none line working code" puts post_page.parser.xpath("//div[@id='posts']/div/table/tr/td/div[2]")[0].xpath('text()').to_s.strip puts "\nanother one line code" puts post_page.parser.at_xpath("//div[@id='posts']/div/table/tr/td/div[2]").xpath('text()').to_s.strip puts "\none line code with full path" puts post_page.parser.xpath("/html/body/div/div/div/div/div/table/tr/td/div[2]")[0].xpath('text()').to_s.strip Is it better to use // or / in XPath? @AnthonyWJones says that 'the use of an unprefixed //' is not such a good idea. I had to remove tbody from any working XPath, otherwise I got a 'nil' result. How is it possible to remove an element from the XPath and still get things to work? Do I have to use .xpath twice to extract data if not using the full XPath? Why can't I make .at_xpath work to extract data? It works nicely here; what is the difference?

How to factorize common tags with nokogiri builder ?

- by plafoucriere

Hi, I'd like to create several builders, with common tags, in order to have xml docs like : <xml version="1.0"?> <a_kind_of_root>  <event_date>20100514</event_date> <event_id>123</event_id> <event_type>Conference</event_type>  <my_tag>some text</my_tag> </a_kind_of_root> </xml> <xml version="1.0"?> <another_kind_of_root>  <event_date>20100514</event_date> <event_id>123</event_id> <event_type>Conference</event_type>  <my_other_tag>some integer</my_other_tag> </another_kind_of_root> </xml> I don't know how to put the common part inside a Nokogiri::XML::Builder Thanks

How to make Nokogiri transparently return un/encoded Html entities untouched?

- by svenfuchs

How can I use Nokogiri while leaving HTML entities (like German umlauts) untouched? I.e.: # this is fine node = Nokogiri::HTML.fragment('<p>ö</p>') node.to_s # => '<p>ö</p>' # this is not node = Nokogiri::HTML.fragment('<p>&ouml;</p>') node.to_s # => '<p>ö</p>' # this is what I need node = Nokogiri::HTML.fragment('<p>&ouml;</p>') node.to_s # => '<p>&ouml;</p>' I've tried to mess with both PARSE_OPTIONS and :save_with options but could not come up with a way to have Nokogiri just transparently behave like above. Any pointers?

rails bundler error installing nokogiri (1.5.5), and Bundler cannot continue

- by Michael Durrant

An error occurred while installing nokogiri (1.5.5), and Bundler cannot continue How to fix and get past the error? Installing nokogiri (1.5.5) with native extensions Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension. /usr/bin/ruby1.8 extconf.rb checking for libxml/parser.h... yes checking for libxslt/xslt.h... no ----- libxslt is missing. please visit http://nokogiri.org/tutorials/installing_nokogiri.html for help with installing dependencies.

NameError: uninitialized constant Nokogiri::HTML::DocumentFragment

- by Mike Sutton

About three hours ago I started seeing the above error on my production server. It comes from a call to the sanitize gem: vendor/rails/activerecord/lib/../../activesupport/lib/active_support/dependencies.rb:276:in 'load_missing_constant' vendor/rails/activerecord/lib/../../activesupport/lib/active_support/dependencies.rb:468:in `const_missing' vendor/gems/sanitize-1.2.0/lib/sanitize.rb:91:in `clean!' vendor/gems/sanitize-1.2.0/lib/sanitize.rb:84:in `clean' vendor/gems/sanitize-1.2.0/lib/sanitize.rb:49:in `clean' app/helpers/application_helper.rb:28:in `display_none' app/views/main/_blogs.html.erb:13:in `_run_erb_47app47views47main47_blogs46html46erb' The error only occurs on the production server (linux), not my development machine (windows). I tried rolling back my latest deployment but it didn't fix it. I've updated to sanitize 1.2.0 (which was the latest version brought down by gem update sanitize), though I note my host is running 1.3.6. Can anyone provide any clues to help fix this?

how to translate this hpricot code to nokogiri ?

- by wefwgeweg

Hpricot(html).inner_text.gsub("\r"," ").gsub("\n"," ").split(" ").join(" ") hpricot = Hpricot(html) hpricot.search("script").remove hpricot.search("link").remove hpricot.search("meta").remove hpricot.search("style").remove found it on http://www.savedmyday.com/2008/04/25/how-to-extract-text-from-html-using-rubyhpricot/

nokogiri: wrap <tbody> around <table>'s child

- by wefwgeweg

How can I do this? I need to place tbody after table tags, basically to emulate Firefox's behavior. Thanks.

How Do I Select for Multiple Classes Using Nokogiri and Ruby

- by Russ Bradberry

From a table element, I would like to select all rows that have the class even or the class odd. I tried the jQuery syntax: report.css("table.data tr[class~=odd even]").each{|line| parse_line_item(line)} but it threw an error, any help is appreciated, thanks.

