Resolving html entities with NSXMLParse on iPhone

Posted by Roberto on Stack Overflow See other posts from Stack Overflow or by Roberto
Published on 2010-03-03T11:43:54Z Indexed on 2010/03/15 0:39 UTC
Read the original article Hit count: 964

Hi all, i think i read every single web page relating to this problem but i still cannot find a solution to it, so here i am.

I have an HTML web page wich is not under my control and i need to parse it from my iPhone application. Here it is a sample of the web page i'm talking about:

<HTML>
  <HEAD>
    <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
  </HEAD>
  <BODY>
    <LI class="bye bye" rel="hello 1">
      <H5 class="onlytext">
        <A name="morning_part">morning</A>
      </H5>
      <DIV class="mydiv">
        <SPAN class="myclass">something about you</SPAN> 
        <SPAN class="anotherclass">
          <A href="http://www.google.it">Bye Bye &egrave; un saluto</A>
        </SPAN>
      </DIV>
    </LI>
  </BODY>
</HTML>

I'm using NSXMLParser and it is going well till it find the è html entity. It calls foundCharacters: for "Bye Bye" and then it calls resolveExternalEntityName:systemID:: with an entityName of "egrave". In this method i'm just returning the character "è" trasformed in an NSData, the foundCharacters is called again adding the string "è" to the previous one "Bye Bye " and then the parser raise the NSXMLParserUndeclaredEntityError error.

I have no DTD and i cannot change the html file i'm parsing. Do you have any ideas on this problem? Thanks in advance to all of you, Rob.

Update (12/03/2010). After the suggestion of Griffo i ended up with something like this:

data = [self replaceHtmlEntities:data];
NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
[parser setDelegate:self];
[parser parse];

where replaceHtmlEntities:(NSData *) is something like this:

- (NSData *)replaceHtmlEntities:(NSData *)data {

    NSString *htmlCode = [[NSString alloc] initWithData:data encoding:NSISOLatin1StringEncoding];
    NSMutableString *temp = [NSMutableString stringWithString:htmlCode];

    [temp replaceOccurrencesOfString:@"&amp;" withString:@"&" options:NSLiteralSearch range:NSMakeRange(0, [temp length])];
    [temp replaceOccurrencesOfString:@"&nbsp;" withString:@" " options:NSLiteralSearch range:NSMakeRange(0, [temp length])];
    ...
    [temp replaceOccurrencesOfString:@"&Agrave;" withString:@"À" options:NSLiteralSearch range:NSMakeRange(0, [temp length])];

    NSData *finalData = [temp dataUsingEncoding:NSISOLatin1StringEncoding];
    return finalData;

}

But i am still looking the best way to solve this problem. I will try TouchXml in the next days but i still think that there should be a way to do this using NSXMLParser API, so if you know how, feel free to write it here :)

© Stack Overflow or respective owner

Related posts about nsxmlparser

Related posts about html-entities