Extract paragraphs from Wikipedia API using PHP cURL
Posted by Kane on Stack Overflow
Published on 2010-05-21T06:25:26Z
Indexed on 2010/05/21 7:20 UTC
Here's what I'm trying to do using the Wikipedia (MediaWiki) API (http://en.wikipedia.org/w/api.php):
1. Do a GET on http://en.wikipedia.org/w/api.php?format=xml&action=opensearch&search=[keyword] to retrieve a list of suggested pages for the keyword
2. Loop through each suggested page using a GET on http://en.wikipedia.org/w/api.php?format=json&action=query&export&titles=[page title]
3. Extract any paragraphs found on the page into an array
4. Do something with the array
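For reference, the first two steps in the list above might be sketched roughly as follows. This is only a sketch under some assumptions: the helper names (buildApiUrl, httpGet, titlesFromOpenSearch), the User-Agent string and the sample response body are made up for illustration, and it requests format=json for the opensearch call instead of format=xml so the result can be read with json_decode(). The demonstration at the bottom runs offline; the network call is left as a comment.

```php
<?php
// Base endpoint taken from the post.
const API_URL = 'http://en.wikipedia.org/w/api.php';

// Hypothetical helper: build an API URL from an array of query parameters.
function buildApiUrl(array $params): string
{
    return API_URL . '?' . http_build_query($params, '', '&');
}

// Hypothetical helper: GET a URL with cURL and return the response body.
function httpGet(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects (e.g. http to https)
        CURLOPT_USERAGENT      => 'ParagraphFetcher/0.1 (placeholder contact)', // made-up value
        CURLOPT_TIMEOUT        => 10,
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body;
}

// Pull the title list out of an opensearch JSON body: [query, [titles], ...].
function titlesFromOpenSearch(string $json): array
{
    $data = json_decode($json, true);
    return (is_array($data) && isset($data[1]) && is_array($data[1])) ? $data[1] : [];
}

// Offline demonstration with a made-up response body (no network needed).
$searchUrl = buildApiUrl(['format' => 'json', 'action' => 'opensearch', 'search' => 'PHP cURL']);
$sample = '["php", ["PHP", "PHP License"], ["", ""], ["", ""]]';
$titles = titlesFromOpenSearch($sample);

// With network access: $titles = titlesFromOpenSearch(httpGet($searchUrl));
```

Each title in $titles could then be passed to buildApiUrl() again with the action=query parameters from step 2, and the result fetched with httpGet().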
I'm stuck on #3. The JSON response shows "\n\n" between paragraphs, but for some reason splitting it with the PHP explode() function doesn't work.
Essentially I just want to grab the "meat" of each Wikipedia page (not titles or any formatting, just the content) and break it by paragraph into an array.
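One possible cause, offered as a guess and not a confirmed diagnosis: in the raw response body the "\n\n" is an escaped sequence of literal backslash and n characters, not real newlines, so explode("\n\n", ...) finds nothing to split on until the body has been passed through json_decode(). The sketch below illustrates this with a made-up sample. The splitParagraphs() helper and the "text" key are hypothetical; the real response nests the page text more deeply, and the exact path depends on the request parameters. The text returned by export is also wiki markup, so further cleanup would likely be needed.

```php
<?php
// Hypothetical helper: split already-decoded text into paragraphs on blank lines.
function splitParagraphs(string $text): array
{
    // Normalise Windows line endings, then split on two or more newlines.
    $text = str_replace("\r\n", "\n", $text);
    $parts = preg_split('/\n{2,}/', $text);
    // Trim each piece and drop empty ones; array_values reindexes from 0.
    return array_values(array_filter(array_map('trim', $parts), 'strlen'));
}

// Made-up sample standing in for a raw API response body.
// Single quotes keep \n as the two characters backslash + n, as in raw JSON.
$raw = '{"text":"First paragraph.\n\nSecond paragraph.\n\n\nThird."}';

// On the raw body the separator is the four characters \ n \ n, not two
// newlines, so explode() returns the whole string as a single element.
$naive = explode("\n\n", $raw);

// After decoding, the escapes have become real newlines and splitting works.
$data = json_decode($raw, true);
$paragraphs = splitParagraphs($data['text']);

print_r($paragraphs);
```

Using preg_split() with \n{2,} instead of explode("\n\n", ...) is a design choice here: it avoids empty entries when three or more newlines appear in a row.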
Any ideas? Thanks!
© Stack Overflow or respective owner