In-document schema declarations and lxml

Posted by shylent on Stack Overflow
Published on 2010-06-05T09:35:52Z

According to the official lxml documentation, if one wants to validate an XML document against an XML Schema document, one has to:

  1. construct the XMLSchema object (basically, parse the schema document)
  2. construct the XMLParser, passing the XMLSchema object as its schema argument
  3. parse the actual xml document (instance document) using the constructed parser
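
For concreteness, the three steps above can be sketched roughly as follows. The tiny schema, the `note` element and the `is_valid` helper are made up for illustration; only `etree.XMLSchema`, `etree.XMLParser(schema=...)` and `etree.fromstring` come from lxml itself.

```python
from lxml import etree

# Hypothetical schema, just for this sketch: a single <note> element
# holding a string.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note" type="xs:string"/>
</xs:schema>"""

# 1. Parse the schema document and construct the XMLSchema object.
schema = etree.XMLSchema(etree.XML(XSD))

# 2. Construct the parser, passing the schema as its schema argument.
parser = etree.XMLParser(schema=schema)


def is_valid(instance: bytes) -> bool:
    """Hypothetical helper: True if the instance parses and validates."""
    try:
        # 3. Parse the instance document with the validating parser.
        etree.fromstring(instance, parser)
    except etree.XMLSyntaxError:
        # A validating parser reports schema violations as parse errors.
        return False
    return True


print(is_valid(b"<note>hello</note>"))
print(is_valid(b"<memo>hello</memo>"))
```

Note that nothing in the instance document tells the parser which schema to use; the association is made entirely in the calling code.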

There can be variations, but the essence is pretty much the same no matter how you do it: the schema is specified 'externally' (as opposed to being specified inside the actual XML document).

If you follow this procedure, validation does occur, but, if I understand it correctly, it completely ignores the whole idea of the xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes.

This introduces a whole bunch of limitations. To start with, you have to manage the instance<->schema relation all by yourself (either store it externally or write some hack to retrieve the schema location from the root element of the instance document). You also cannot validate a document against multiple schemata (say, when each schema governs its own namespace), and so on.
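
The "hack" I have in mind would look something like the sketch below. It only reads the hints from the root element and does no validation itself. It uses the standard library's ElementTree to stay self-contained, but the same `get()` calls with `{namespace}name` keys work on lxml elements too. The `schema_hints` name, the `urn:a`/`urn:b` namespaces and the `.xsd` file names are all invented for the example.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_hints(xml_text):
    """Hypothetical helper: map namespace -> schema location from the root.

    The None key holds the xsi:noNamespaceSchemaLocation value, if present.
    """
    root = ET.fromstring(xml_text)
    hints = {}

    # xsi:schemaLocation is a whitespace-separated list of
    # "namespace location" pairs.
    tokens = root.get("{%s}schemaLocation" % XSI, "").split()
    for namespace, location in zip(tokens[0::2], tokens[1::2]):
        hints[namespace] = location

    no_ns = root.get("{%s}noNamespaceSchemaLocation" % XSI)
    if no_ns is not None:
        hints[None] = no_ns
    return hints


doc = (
    '<r xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="urn:a a.xsd urn:b b.xsd" '
    'xsi:noNamespaceSchemaLocation="plain.xsd"/>'
)
print(schema_hints(doc))
```

Even with something like this, loading each schema and deciding how to combine them is still left to the caller, which is exactly the part I would expect the library to handle.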

So the question is: maybe I am missing something completely trivial or doing it wrong? Or are my statements about lxml's limitations regarding schema validation true?

To recap, I'd like to be able to:

  • have the parser use the schema location declarations in the instance document at parse/validation time
  • use multiple schemata to validate an XML document
  • declare schema locations on non-root elements (not of extreme importance)

Maybe I should look for a different library? That would be a real shame, though: lxml is the de facto XML processing library for Python and is widely regarded as the best one in terms of performance/features/convenience (and rightfully so, to a certain extent).

© Stack Overflow or respective owner
