Wikipedia API: list=alllinks confusion

Posted by Chris Salij on Stack Overflow
Published on 2010-06-09T10:20:01Z

I'm doing a research project for the summer and I've got to get some data from Wikipedia, store it and then do some analysis on it. I'm using the Wikipedia API to gather the data and I've got that down pretty well.

My question is about the list=alllinks option in the API docs. After reading the description, both there and in the API itself (it's down a bit and I can't link directly to the section), I think I understand what it's supposed to return. However, when I ran a query it gave me back something I didn't expect.

Here's the query I ran:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=google&rvprop=ids|timestamp|user|comment|content&rvlimit=1&list=alllinks&alunique&allimit=40&format=xml

Which in essence says: get the last revision of the Google page, include the id, timestamp, user, comment and content of that revision, and return it in XML format. The alllinks part (I thought) should give me back a list of Wikipedia pages which point to the Google page (in this case, the first 40 unique ones).
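For anyone who wants to reproduce the request, here is a minimal sketch of how the same URL could be assembled in Python. The names `API_ENDPOINT`, `params` and `build_query_url` are my own for illustration and are not part of the API. Note that `urlencode` writes the flag as `alunique=` rather than the bare `alunique` in my URL above; I'm assuming the API treats the two the same.

```python
from urllib.parse import urlencode

# Endpoint and parameter values copied from the query above.
API_ENDPOINT = "http://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "prop": "revisions",
    "titles": "google",
    "rvprop": "ids|timestamp|user|comment|content",
    "rvlimit": 1,
    "list": "alllinks",
    "alunique": "",   # flag parameter; encoded as "alunique=" (assumed equivalent)
    "allimit": 40,
    "format": "xml",
}


def build_query_url(endpoint, query_params):
    """Join the endpoint and the encoded parameters into one URL.

    safe="|" keeps the pipe separators in rvprop readable instead of
    percent-encoding them.
    """
    return endpoint + "?" + urlencode(query_params, safe="|")


url = build_query_url(API_ENDPOINT, params)
print(url)
```

Keeping the parameters in a dict makes it easier to see that the `rv*` options belong to prop=revisions while the `al*` options belong to list=alllinks.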

I'm not sure what the policy is on swears, but this is the result I got back exactly:

<?xml version="1.0"?>
<api>
    <query>
        <normalized>
            <n from="google" to="Google" />
        </normalized>
        <pages>
            <page pageid="1092923" ns="0" title="Google">
                <revisions>
                    <rev revid="366826294" parentid="366673948" user="Citation bot" timestamp="2010-06-08T17:18:31Z" comment="Citations: [161]Tweaked: url. [[User:Mono|Mono]]" xml:space="preserve">
                        <!-- The page content, I've replaced this cos its not of interest -->
                    </rev>
                </revisions>
            </page>
        </pages>
        <alllinks>
            <l ns="0" title="!" />
            <l ns="0" title="!!" />
            <l ns="0" title="!!!" />
            <l ns="0" title="!!!!" />
            <l ns="0" title="!!!!!!!!!!!!!!!!!!!!!" />
            <l ns="0" title="!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" />
            <l ns="0" title="!!!!!!!!!!!!!!!!!!!!*was up all u hater just stopingby to show u some love*!!!!!!!!!!!!!!!!!!!!!!!!!!!" />
            <l ns="0" title="!!!!!!!!!!!!&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;********(( )))))F/W///CHRYSLER/FUCKING/FUCKING/FUCKING/I HATE THE QUEEN!!!/I AM HORRID HENRY/Chrysler Cirrus/php" />
            <l ns="0" title="!!!!!Hephaestos IS A FUCKING WHINY GUY!!!!!!" />
            <l ns="0" title="!!!!Do you really want to see this article on your default search?" />
            <l ns="0" title="!!!!Legal!!!!" />
            <l ns="0" title="!!!!YOU ARE A COCKSUCKING WHINY GREASER!!!!" />
            <l ns="0" title="!!!BESQUERKAN!!!" />
            <l ns="0" title="!!!Fuck You!!!" />
            <l ns="0" title="!!!Fuck You!!! And Then Some" />
            <l ns="0" title="!!!Fuck You!!! And Then some" />
            <l ns="0" title="!!!Fuck You!!! And then Some" />
            <l ns="0" title="!!!Fuck You!!! and Then Some" />
            <l ns="0" title="!!!Three !!! Amigos!!!" />
            <l ns="0" title="!!! (album)" />
            <l ns="0" title="!!! (band)" />
            <l ns="0" title="!!1" />
            <l ns="0" title="!!BOSS!!" />
            <l ns="0" title="!!Destroy-Oh-Boy!!" />
            <l ns="0" title="!!Fuck you!!" />
            <l ns="0" title="!!M" />
            <l ns="0" title="!!Que Corra La Voz!!" />
            <l ns="0" title="!! (chess)" />
            <l ns="0" title="!! (disambiguation)" />
            <l ns="0" title="!! 6- -.4rtist.com" />
            <l ns="0" title="!!m" />
            <l ns="0" title="!!suck my balls!!" />
            <l ns="0" title="!!~~YOU WIN~~!!" />
            <l ns="0" title="!&#039;O-!khung language" />
            <l ns="0" title="!(1)Full Name:(2)Age:(3)Sex:(4)Occupation:(5)Phone Number: (6)Delivery Address:(7)Country of Residence:. Dr.John Aboh" />
            <l ns="0" title="!-" />
            <l ns="0" title="!-My Degrassi Top 10 Episodes" />
            <l ns="0" title="!10 Show" />
            <l ns="0" title="!2005" />
            <l ns="0" title="!2006" />
        </alllinks>
    </query>
    <query-continue>
        <revisions rvstartid="366673948" />
        <alllinks alfrom="!2009" />
    </query-continue>
</api>
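To look at the returned titles on their own, the response can be parsed with Python's standard `xml.etree.ElementTree`. This is only a sketch: `SAMPLE_RESPONSE` is a trimmed copy of the output above (three of the forty entries), and the helper names `extract_link_titles` and `extract_continue_from` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response above: only the <alllinks> list and the
# continuation marker are kept.
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<api>
    <query>
        <alllinks>
            <l ns="0" title="!" />
            <l ns="0" title="!!" />
            <l ns="0" title="!!! (band)" />
        </alllinks>
    </query>
    <query-continue>
        <alllinks alfrom="!2009" />
    </query-continue>
</api>"""


def extract_link_titles(xml_text):
    """Return the title attribute of every <l> element under <alllinks>."""
    root = ET.fromstring(xml_text)
    return [link.get("title") for link in root.findall("./query/alllinks/l")]


def extract_continue_from(xml_text):
    """Return the alfrom value from <query-continue>, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find("./query-continue/alllinks")
    return node.get("alfrom") if node is not None else None


titles = extract_link_titles(SAMPLE_RESPONSE)
print(titles)
print(extract_continue_from(SAMPLE_RESPONSE))
```

The `alfrom` value is the title the response names as the starting point for the next batch of results.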

As you can see if you look at the <alllinks> part, it's just a load of random gobbledygook. Not nearly what I thought I'd get. I've done a fair bit of searching but I can't seem to find a direct answer to my question.

  1. What should the list=alllinks option return?
  2. Why am I getting this crap in there?

Thanks for your help

© Stack Overflow or respective owner
