In-document schema declarations and lxml
- by shylent
As per the official documentation of lxml, if one wants to validate a xml document against a xml schema document, one has to
construct the XMLSchema object (basically, parse the schema document)
construct the XMLParser, passing the XMLSchema object as its schema argument
parse the actual xml document (instance document) using the constructed parser
There can be variations, but the essense is pretty much the same no matter how you do it, - the schema is specified 'externally' (as opposed to specifying it inside the actual xml document).
If you follow this procedure, the validation occurs, sure enough, but if I understand it correctly, that completely ignores the whole idea of the schemaLocation and noNamespaceSchemaLocation attributes from xsi.
This introduces a whole bunch of limitations, starting with the fact, that you have to deal with instance<-schema relation all by yourself (either store it externally or write some hack to retrieve the schema location from the root element of the instance document), you can not validate the document using multiple schemata (say, when each schema governs its own namespace) and so on.
So the question is: maybe I am missing something completely trivial or doing it wrong? Or are my statements about lxml's limitations regarding schema validation true?
To recap, I'd like to be able to:
have the parser use the schema location declarations in the instance document at parse/validation time
use multiple schemata to validate a xml document
declare schema locations on non-root elements (not of extreme importance)
Maybe I should look for a different library? Although, that'd be a real shame, - lxml is a de-facto xml processing library for python and is regarded by everyone as the best one in terms of performace/features/convenience (and rightfully so, to a certain extent)