What are the common techniques to handle user-generated HTML modified differently by different browsers?

Posted by Jakie on Programmers See other posts from Programmers or by Jakie
Published on 2011-10-07T01:41:00Z Indexed on 2011/11/13 2:07 UTC
Read the original article Hit count: 239

Filed under:
|
|

I am developing a website updater. The front end uses HTML, CSS and JavaScript, and the backend uses Python.

The way it works is that <p/>, <b/> and some other HTML elements can be updated by the user. To enable this, I load the webpage and, with JQuery, convert all those elements to <textarea/> elements. Once they the content of the text area is changed, I apply the change to the original elements and send it to a Python script to store the new content.

The problem is that I'm finding that different browsers change the original HTML.

  • How do you get around this issue?
  • What Python libraries do you use?
  • What techniques or application designs do you use to avoid or overcome this issue?

The problems I found are:

  • IE removes the quotes around class and id attributes. For example, <img class='abc'/> becomes <img class=abc/>.
  • Firefox removes the backslash from the line breaks: <br \> becomes <br>.
  • Some websites have very specific display technicalities, so an insertion of a simple "\n"(which IE does) can affect the display of a website. Example: changing <img class='headingpic' /><div id="maincontent"> to <img class='headingpic'/>\n <div id="maincontent"> inserts a vertical gap in IE.

The things I have unsuccessfully tried to overcome these issues:

  • Using either JQuery or Python to remove all >\n< occurences, <br> etc. But this fails because I get different patterns in IE, sometimes a ·\n, sometimes a \n···.
  • In a Python, parse the new HTML, extract the new text/content, insert it into the old HTML so the elements and format never change, just the content. This is very difficult and seems to be overkill.

© Programmers or respective owner

Related posts about python

Related posts about algorithms