Extract paragraphs from Wikipedia API using PHP cURL
- by Kane
Here's what I'm trying to do using the Wikipedia (MediaWiki) API - http://en.wikipedia.org/w/api.php
Do a GET on http://en.wikipedia.org/w/api.php?format=xml&action=opensearch&search=[keyword] to retrieve a list of suggested pages for the keyword
Loop through each suggested page using a GET on http://en.wikipedia.org/w/api.php?format=json&action=query&export&titles=[page title]
Extract any paragraphs found on the page into an array
Do something with the array
I'm stuck on #3. I can see a bunch of JSON data that includes "\n\n" between paragraphs, but for some reason the PHP explode() function doesn't work.
Essentially I just want to grab the "meat" of each Wikipedia page (not titles or any formatting, just the content) and break it by paragraph into an array.
Any ideas? Thanks!