Over Christmas I got to play a bit with the W3C RIF PRD and came across a few issues which I thought I would record for posterity. Specifically, I was working on a grammar for the presentation syntax using a GLR parser tool: the current CTP of ‘M’ (MGrammar) together with Intellipad. (I do so hope the MS guys don’t kill off M and Intellipad now that they have dropped the other parts of SQL Server Modeling.) I realise that the presentation syntax is non-normative and that any issues with it do not therefore compromise the standard. However, the presentation syntax is useful in its own right, and it would be great to iron out any issues in a future revision of the standard.
The main issues are actually not to do with the grammar at all, but rather with the ‘running example’ in the RIF PRD recommendation. I started with the code provided in Example 9.1. There are several discrepancies when compared with the EBNF rules documented in the standard. Broadly the problems can be categorised as follows:
· Parenthesis mismatch – the wrong number of parentheses is used in various places. For example, in GoldRule, the RHS of the rule (the ‘Then’) is nested in the LHS (the ‘If’). In NewCustomerAndWidgetRule, the RHS is orphaned from the LHS. Together with additional incorrect parentheses, this leads to the orphaning of UnknownStatusRule from the entire Document.
· Invalid use of parentheses in ‘Forall’ constructs. Parentheses should not be used to enclose formulae.
Removal of the invalid parentheses gave me a feeling of inconsistency when comparing formulae in Forall to formulae in If. The use of parentheses is not actually inconsistent in these two contexts, but in an If construct it ‘feels’ as if you are enclosing formulae in parentheses in a LISP-like fashion. In reality, the parentheses are simply being used to group subordinate syntax elements. The fact that an If construct can contain only a single formula as an immediate child adds to this feeling of inconsistency.
· Invalid representation of compact URIs (CURIEs) in the context of Frame productions. In several places the URIs are not qualified with a namespace prefix (‘ex1:’). This conflicts with the definition of CURIEs in the RIF Datatypes and Built-Ins 1.0 document. Here are the productions:
CURIE         ::= PNAME_LN
                | PNAME_NS
PNAME_LN      ::= PNAME_NS PN_LOCAL
PNAME_NS      ::= PN_PREFIX? ':'
PN_LOCAL      ::= ( PN_CHARS_U | [0-9] ) ((PN_CHARS|'.')* PN_CHARS)?
PN_CHARS      ::= PN_CHARS_U
                | '-' | [0-9] | #x00B7
                | [#x0300-#x036F] | [#x203F-#x2040]
PN_CHARS_U    ::= PN_CHARS_BASE
                | '_'
PN_CHARS_BASE ::= [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6]
                | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF]
                | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
                | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
                | [#x10000-#xEFFFF]
PN_PREFIX     ::= PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)?
The more I look at CURIEs, the more my head hurts! The RIF specification allows prefixes and colons without local names, which surprised me. The CURIE Syntax 1.0 Working Group Note specifically states that this form is supported… and then promptly provides a syntactic definition that seems to preclude it! On (much) deeper inspection, it appears that ‘ex1:’ (for example) is allowed, but would really represent a ‘fragment’ of the ‘reference’, rather than a prefix! Ouch! This is so ambiguous that it surely calls into question the whole CURIE specification. In any case, RIF does not allow a bare local name; under the productions above, at least the colon is required.
· Missing ‘External’ specifiers for built-in functions and predicates.
The EBNF specification enforces this for terms within frames, but does not appear to enforce (what I believe is) the correct use of External on built-in predicates. In any case, the running example specifies ‘External’ only once, on the predicate in UnknownStatusRule. External() is required in several other places.
· The List used on the LHS of UnknownStatusRule is comma-delimited. This is not supported by the EBNF definition. Similarly, the argument list of pred:list-contains is illegally comma-delimited.
· Unnecessary use of conjunction around a single formula in DiscountRule. This is strictly legal in the EBNF, but redundant.
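To make the CURIE point above concrete, here is a rough Python sketch of the productions quoted earlier. It is only an approximation: I have assumed ASCII-only character classes (all of the non-ASCII ranges are left out), and the names `CURIE` and `is_curie` are my own inventions for illustration, not anything defined by RIF or SPARQL.

```python
import re

# ASSUMPTION: ASCII-only stand-ins for the character classes in the
# productions; the non-ASCII ranges are deliberately omitted.
PN_CHARS_BASE = r"[A-Za-z]"
PN_CHARS_U = r"[A-Za-z_]"
PN_CHARS = r"[A-Za-z_0-9-]"

# PN_PREFIX ::= PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)?
PN_PREFIX = rf"{PN_CHARS_BASE}(?:(?:{PN_CHARS}|\.)*{PN_CHARS})?"
# PN_LOCAL ::= ( PN_CHARS_U | [0-9] ) ((PN_CHARS|'.')* PN_CHARS)?
PN_LOCAL = rf"(?:{PN_CHARS_U}|[0-9])(?:(?:{PN_CHARS}|\.)*{PN_CHARS})?"
# PNAME_NS ::= PN_PREFIX? ':'
PNAME_NS = rf"(?:{PN_PREFIX})?:"
# PNAME_LN ::= PNAME_NS PN_LOCAL
PNAME_LN = rf"{PNAME_NS}{PN_LOCAL}"
# CURIE ::= PNAME_LN | PNAME_NS
CURIE = re.compile(rf"(?:{PNAME_LN}|{PNAME_NS})")


def is_curie(text):
    """Return True if the whole string matches the simplified CURIE production."""
    return CURIE.fullmatch(text) is not None


for candidate in ["ex1:status", "ex1:", ":status", "status"]:
    print(candidate, is_curie(candidate))
```

Under these assumptions, ‘ex1:status’, ‘ex1:’ and ‘:status’ are all accepted, while the unqualified ‘status’ used in places in the running example is rejected because it contains no colon.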
All the above issues concern the presentation syntax used in the running example. There are a few minor issues with the grammar itself. Note that Michael Kifer stated in his paper “Rule Interchange Format: The Framework” that:
“The presentation syntax of RIF … is an abstract syntax and, as such, it omits certain details that might be important for unambiguous parsing.”
· The grammar cannot differentiate unambiguously between strategies and priorities on groups. A processor is forced to resolve this by detecting the use of IRIs and integers.
This could easily be fixed in the grammar.
· The grammar cannot unambiguously parse the ‘->’ operator in frames. Specifically, ‘-’ characters are allowed in PN_LOCAL names and hence a parser cannot determine if ‘status->’ is (‘status’ ‘->’) or (‘status-’ ‘>’). One way to fix this is to amend the PN_LOCAL production as follows:
PN_LOCAL ::= ( PN_CHARS_U | [0-9] ) ((PN_CHARS|'.')* ((PN_CHARS)-('-')))?
However, unilaterally changing the definition of this production, which is defined in the SPARQL Query Language for RDF specification, makes me uncomfortable.
· I assume that the presentation syntax is case-sensitive. I couldn’t find this stated anywhere in the documentation, but function/predicate names do appear to be documented as being case-sensitive.
· The EBNF does not specify whitespace handling. A couple of productions (RULE and ACTION_BLOCK) are crafted to enforce the use of whitespace. This is not necessary. It seems inconsistent with the rest of the specification and can cause parsing issues. In addition, the Const production exhibits whitespace issues. The intention may have been to disallow the use of whitespace around ‘^^’, but any direct implementation of the EBNF will probably allow whitespace between ‘^^’ and the SYMSPACE.
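The ‘->’ ambiguity described above is easy to reproduce outside of any particular parser tool. The following Python sketch compares the original PN_LOCAL production with the amended one, using regular expressions as a stand-in for a greedy lexer. As before, this is a simplification: the character classes are ASCII-only by assumption, and the names `ORIGINAL`, `AMENDED` and `longest_local_name` are hypothetical helpers of mine.

```python
import re

# ASSUMPTION: ASCII-only stand-ins for the SPARQL character classes.
PN_CHARS_U = r"[A-Za-z_]"
PN_CHARS = r"[A-Za-z_0-9-]"
PN_CHARS_NO_HYPHEN = r"[A-Za-z_0-9]"  # (PN_CHARS) - ('-')

# Original: PN_LOCAL ::= ( PN_CHARS_U | [0-9] ) ((PN_CHARS|'.')* PN_CHARS)?
ORIGINAL = re.compile(
    rf"(?:{PN_CHARS_U}|[0-9])(?:(?:{PN_CHARS}|\.)*{PN_CHARS})?"
)
# Amended: the final character of the name may not be '-'
AMENDED = re.compile(
    rf"(?:{PN_CHARS_U}|[0-9])(?:(?:{PN_CHARS}|\.)*{PN_CHARS_NO_HYPHEN})?"
)


def longest_local_name(pattern, text):
    """Return the longest prefix of text that the pattern accepts, or None."""
    match = pattern.match(text)
    return match.group(0) if match else None


print(longest_local_name(ORIGINAL, "status->"))  # status-
print(longest_local_name(AMENDED, "status->"))   # status
```

With the original production the longest match swallows the hyphen, leaving a stray ‘>’ behind. With the amended production the name stops at ‘status’ and the ‘->’ operator survives intact, while names with an interior hyphen such as ‘gold-customer’ are still accepted.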
Of course, I am nit-picking a little about all this. On the whole, the EBNF translated very smoothly and directly to ‘M’ (MGrammar) and proved to be fairly complete. I have encountered far worse issues when translating other EBNF specifications into usable grammars. I can’t imagine there would be any difficulty in implementing the same grammar in ANTLR, Coco/R, gppg, Xtext, Bison, etc.
A general observation, which repeats a point made above, is that the use of parentheses in the presentation syntax can feel inconsistent and unintuitive. It isn’t actually inconsistent, but I think the presentation syntax could be improved by adopting braces, rather than parentheses, to delimit subordinate syntax elements, in a similar way to so many programming languages. The familiarity of braces would communicate the structure of the syntax more clearly to people like me.
If braces were adopted, parentheses could be retained around the ‘var (frame | new())’ constructs in action blocks. This use of parentheses feels very LISP-like, and I think that this is my issue. It’s as if the presentation syntax represents the deformed love-child of LISP and C. In some places (specifically, action blocks), parentheses are used in a LISP-like fashion. In other places they are used like braces in C. I find this quite confusing.
Here is a corrected version of the running example (Example 9.1) in compliant presentation syntax:
Document(
  Prefix( ex1 <http://example.com/2009/prd2> )
  (* ex1:CheckoutRuleset *)
  Group rif:forwardChaining (
    (* ex1:GoldRule *)
    Group 10 (
      Forall ?customer such that And(?customer # ex1:Customer
                                     ?customer[ex1:status->"Silver"])
        (Forall ?shoppingCart such that ?customer[ex1:shoppingCart->?shoppingCart]
          (If Exists ?value (And(?shoppingCart[ex1:value->?value]
                                 External(pred:numeric-greater-than-or-equal(?value 2000))))
           Then Do(Modify(?customer[ex1:status->"Gold"])))))
    (* ex1:DiscountRule *)
    Group (
      Forall ?customer such that ?customer # ex1:Customer
        (If Or( ?customer[ex1:status->"Silver"]
                ?customer[ex1:status->"Gold"])
         Then Do ((?s ?customer[ex1:shoppingCart-> ?s])
                  (?v ?s[ex1:value->?v])
                  Modify(?s [ex1:value->External(func:numeric-multiply (?v 0.95))]))))
    (* ex1:NewCustomerAndWidgetRule *)
    Group (
      Forall ?customer such that And(?customer # ex1:Customer
                                     ?customer[ex1:status->"New"] )
        (If Exists ?shoppingCart ?item
              (And(?customer[ex1:shoppingCart->?shoppingCart]
                   ?shoppingCart[ex1:containsItem->?item]
                   ?item # ex1:Widget ) )
         Then Do( (?s ?customer[ex1:shoppingCart->?s])
                  (?val ?s[ex1:value->?val])
                  (?voucher ?customer[ex1:voucher->?voucher])
                  Retract(?customer[ex1:voucher->?voucher])
                  Retract(?voucher)
                  Modify(?s[ex1:value->External(func:numeric-multiply(?val 0.90))]))))
    (* ex1:UnknownStatusRule *)
    Group (
      Forall ?customer such that ?customer # ex1:Customer
        (If Not(Exists ?status
                  (And(?customer[ex1:status->?status]
                       External(pred:list-contains(List("New" "Bronze" "Silver" "Gold") ?status)) )))
         Then Do( Execute(act:print(External(func:concat("New customer: " ?customer))))
                  Assert(?customer[ex1:status->"New"]))))
  )
)
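Since most of the problems in the original example were mismatched parentheses, a quick mechanical check is worth having before feeding a document to a real parser. The following Python sketch simply tracks nesting depth. It assumes that string literals are delimited by plain double quotes with no escape sequences, and `paren_depths` and `is_balanced` are hypothetical helper names of my own. It says nothing about whether the nesting is what the grammar expects, only whether every parenthesis has a partner.

```python
def paren_depths(text):
    """Return (final_depth, lowest_depth) for round brackets in text.

    Characters inside double-quoted strings are skipped.
    ASSUMPTION: strings contain no escaped quotes.
    """
    depth = 0
    lowest = 0
    in_string = False
    for ch in text:
        if ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                lowest = min(lowest, depth)
    return depth, lowest


def is_balanced(text):
    """True if every ')' closes an earlier '(' and nothing is left open."""
    final, lowest = paren_depths(text)
    return final == 0 and lowest == 0


# A fragment taken from the GoldRule above
fragment = 'Then Do(Modify(?customer[ex1:status->"Gold"]))'
print(is_balanced(fragment))        # True
print(is_balanced(fragment + ")"))  # False
```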
I hope that helps someone out there :-)