What are the common techniques to handle user-generated HTML modified differently by different browsers?
Posted by Jakie on Programmers
Published on 2011-10-07T01:41:00Z
I am developing a website updater. The front end uses HTML, CSS and JavaScript, and the backend uses Python.
The way it works is that <p/>, <b/> and some other HTML elements can be updated by the user. To enable this, I load the webpage and, with jQuery, convert all those elements to <textarea/> elements. Once the content of a text area is changed, I apply the change to the original element and send the resulting HTML to a Python script, which stores the new content.
The problem is that I'm finding that different browsers change the original HTML.
- How do you get around this issue?
- What Python libraries do you use?
- What techniques or application designs do you use to avoid or overcome this issue?
The problems I found are:
- IE removes the quotes around class and id attributes. For example, <img class='abc'/> becomes <img class=abc/>.
- Firefox removes the trailing slash from line breaks: <br /> becomes <br>.
- Some websites have very specific display technicalities, so the insertion of a simple "\n" (which IE does) can affect the display of a website. Example: changing <img class='headingpic' /><div id="maincontent"> to <img class='headingpic'/>\n <div id="maincontent"> inserts a vertical gap in IE.
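To check whether a browser changed only the serialization (quoting, self-closing slashes, whitespace between tags) and not the content, the two versions can be compared after parsing instead of as raw strings. The following is a minimal sketch using Python's standard html.parser module; the names EventRecorder, parsed_events and same_markup are hypothetical, and treating whitespace-only text as insignificant is an assumption that does not hold for pages where that whitespace affects layout, as in the last item above.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; "<br>" and "<br />" mean the same thing.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class EventRecorder(HTMLParser):
    """Records a normalized stream of parse events for later comparison."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased and values arrive unquoted,
        # so class='abc' and class=abc produce the same event.
        ordered = tuple(sorted(attrs, key=lambda pair: pair[0]))
        self.events.append(("start", tag, ordered))

    def handle_endtag(self, tag):
        # A self-closing "<br />" triggers a start and an end event; dropping
        # the end event for void elements makes it equal to a bare "<br>".
        if tag not in VOID:
            self.events.append(("end", tag))

    def handle_data(self, data):
        # Assumption: whitespace-only text between tags is not content.
        if data.strip():
            self.events.append(("text", " ".join(data.split())))


def parsed_events(html):
    recorder = EventRecorder()
    recorder.feed(html)
    recorder.close()
    return recorder.events


def same_markup(a, b):
    """True if both strings parse to the same normalized event stream."""
    return parsed_events(a) == parsed_events(b)
```

For instance, same_markup("<br />", "<br>") is true, while same_markup("<p>old</p>", "<p>new</p>") is false, so only genuine content edits would need to be stored.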
The things I have unsuccessfully tried to overcome these issues:
- Using either jQuery or Python to remove all >\n< occurrences, <br>, etc. But this fails because I get different patterns in IE, sometimes a ·\n, sometimes a \n···.
- In Python, parsing the new HTML, extracting the new text/content, and inserting it into the old HTML so that the elements and format never change, just the content. This is very difficult and seems to be overkill.
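The two attempts above can be sketched roughly as follows. This is an illustration under stated assumptions, not a complete solution: the function names are hypothetical, the tag-splitting regular expression assumes no attribute value contains a ">" character, and the splice only works when the old and new HTML contain the same number of non-empty text runs. The first function matches any run of whitespace between tags, which covers the varying space and newline patterns; the second keeps the stored markup byte-for-byte and swaps in only the text.

```python
import re
from html import escape
from html.parser import HTMLParser

# Splits markup into alternating text and tag pieces, keeping the tags.
# Assumption: no attribute value contains a literal ">".
TAG_SPLIT = re.compile(r"(<[^>]+>)")


def collapse_inter_tag_whitespace(html):
    """Remove any mix of spaces, tabs and newlines sitting between two tags."""
    return re.sub(r">\s+<", "><", html)


class TextCollector(HTMLParser):
    """Collects the non-empty text runs of a document, in order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.texts = []

    def handle_data(self, data):
        if data.strip():
            self.texts.append(data.strip())


def splice_text(old_html, new_html):
    """Copy the text of new_html into the untouched markup of old_html."""
    collector = TextCollector()
    collector.feed(new_html)
    collector.close()

    parts = TAG_SPLIT.split(old_html)
    # Indices of the pieces of old_html that hold visible text.
    slots = [i for i, part in enumerate(parts)
             if not part.startswith("<") and part.strip()]
    if len(slots) != len(collector.texts):
        raise ValueError("old and new HTML have different numbers of text runs")

    for i, text in zip(slots, collector.texts):
        old = parts[i]
        lead = old[:len(old) - len(old.lstrip())]   # original leading whitespace
        trail = old[len(old.rstrip()):]             # original trailing whitespace
        parts[i] = lead + escape(text, quote=False) + trail
    return "".join(parts)
```

With old_html set to "<p class='x'>Hello</p><br /><b>World</b>" and a browser-mangled new_html of "<P class=x>Hi there</P>\n<BR><B>World!</B>", splice_text returns the original markup with only the two text runs replaced. Edits that add or remove elements would raise the ValueError and need separate handling.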
© Programmers or respective owner