Automatic transformation of XML namespaces/RDF resource format

RDF resource format
This chapter describes the RDF resource format, which I call an asset.

The RDF file is valid when it conforms to the grammar forest with both  and   roots.

rdfs:seeAlso predicates
When reading an RDF file, the processor should handle triples of the following forms: :transform rdfs:seeAlso (IRI1 IRI2 ...) . :validate rdfs:seeAlso (IRI1 IRI2 ...) . Doing so adds the IRIs to the list of RDF files to be downloaded (in the order of recursive retrieval described elsewhere in this specification).
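The queueing step can be sketched in Python as follows. This is only a minimal sketch, assuming the RDF file has already been parsed and every rdfs:seeAlso collection is available as a plain list; the function name enqueue_see_also, the tuple layout, and the skipping of already queued IRIs are assumptions made for illustration, and the real retrieval order is the one defined elsewhere in this specification.

```python
from collections import deque

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def enqueue_see_also(statements, subject, queue, seen):
    """Append the IRIs listed under `subject rdfs:seeAlso (...)` to the queue.

    statements: iterable of (subject, predicate, list_of_iris) tuples
    (a hypothetical, already parsed representation of the asset).
    """
    for subj, pred, iris in statements:
        if subj != subject or pred != RDFS_SEE_ALSO:
            continue
        for iri in iris:
            if iri not in seen:  # assumption: never queue the same file twice
                seen.add(iri)
                queue.append(iri)


# Hypothetical parsed statements of one asset.
statements = [
    (":transform", RDFS_SEE_ALSO,
     ["http://example.org/a.ttl", "http://example.org/b.ttl"]),
    (":validate", RDFS_SEE_ALSO, ["http://example.org/v.ttl"]),
    (":transform", RDFS_SEE_ALSO, ["http://example.org/a.ttl"]),
]
queue, seen = deque(), set()
enqueue_see_also(statements, ":transform", queue, seen)
```

Keeping a separate queue per subject (:transform or :validate) would let an implementation drop the validation queue without touching the transformation one.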

The first form is for transformation and the second is for validation.

WARNING: Recursive downloading (and thus ) for validation may be removed in a future version.

TODO: Various forms of seeAlso for processing either after or before the current asset.

Scripts
A script is something which accepts an input (some XML text, in this specification) and generates an output (a text and/or a program exit status). (A script may be a Unix command, Web service, etc.)

A script is represented as an RDF node with certain properties.

This specification provides the following classes of scripts:


 * command line
 * script in a specified programming language
 * A Web service

Validator kind (see below) is either entire document or by parts.

A script should not have both :transformerKind and :validatorKind (see below).

It is up to the implementation what to do if a single node has several types (such as both  and  ). Rationale: ease of programming and efficiency.

Script in a specified programming language
Script for a named programming language (see below): Either  or   (but not both) should be provided. Not every programming language may support both. can be present only if there is.
 * :Command
 * {1..1} :language (IRI) (programming language)
 * {0..1} :minVersion (minimum version)
 * {0..1} :maxVersion (maximum version)
 * {0..1} :scriptURL (URL of the script)
 * {0..1} :commandString (command string)
 * {0..1} :params (as in the example below)
 * {0..1} :okResult (result denoting OK)
 * {0..1} :preservance (preservance, float 0..1, 1.0 by default)
 * {0..1} :stability (stability, float 0..1, 1.0 by default)
 * {0..1} :preference (preference, float 0..1, 1.0 by default)
 * {0..1} :transformerKind (transformer kind)
 * {0..1} :validatorKind (validator kind)

Validation is considered passed if the exit status of the command is success and the output is equal to the result denoting OK (when such a predicate is present). Remark: There is  in Turtle; don't forget to use it at the end of the output when needed. (TODO: What about different newline indicators in different OSes?)
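The check described above can be sketched in Python. This is a minimal sketch, not a normative algorithm: the function name validation_passed is hypothetical, success is assumed to mean exit status 0 (as on Unix), and the output is compared exactly, so a trailing newline matters.

```python
def validation_passed(exit_status, output, ok_result=None):
    """Return True when the command succeeded and the output matches.

    ok_result is the declared "result denoting OK"; None means that the
    script node declares no such result.
    """
    if exit_status != 0:  # assumption: 0 means success, as on Unix
        return False
    if ok_result is None:  # nothing to compare: the exit status decides
        return True
    return output == ok_result  # exact comparison, newlines included
```

For example, validation_passed(0, "OK\n", "OK") is false under this sketch, which is why the expected result may need an explicit newline at its end.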

A Web service

 * :WebService
 * {1..1} :action (IRI) (request IRI)
 * {1..1} :method (HTTP method)
 * {1..1} :xmlField [xsd:string] (field for XML)
 * {0..1} :okResult (result denoting OK)
 * {0..1} :transformerKind (transformer kind)
 * {0..1} :validatorKind (validator kind)
 * {0..1} :preservance (preservance)
 * {0..1} :stability (stability)
 * {0..1} :preference (preference)

Validity constraint:  must be present only for validators. must be present only for transformers. must be present for validators. must be present for transformers.

RDF describing a namespace
Namespaces are described as instances of :Namespace class.

Their format tree:


 * :Namespace
 * {0..*} :validator (validator)
 * _ (script node)

Example: (TODO: Explain that :attribute node gets an attribute from source XML. Also say that it does not work for :entire and simple sequential transformations.)

  a :Namespace ;
   dc:description  ; # Other Dublin Core metadata.
   :link [ :url  ; :role  ; :nature  ; :purpose  ] ;
   :validator [
     a :Command ;
     :language lang:Python ;
     :minVersion "2.1" ;
     :maxVersion "3.2" ;
     :scriptURL  ;
     :params ([ :name "name1" ; :value "value1" ]
              [ :name "name2" ; :value "value2" ]
              [ :name "lexer" ;
                :value [ :attribute [ :NS <http://portonvictor.org/ns/comment> ; :name "format" ] ]
              ]) ;
     :okResult "OK" ;
     :preservance 0.9 ;
     :stability 0.9 ;
     :preference 0.9 ;
     :validatorKind :entire
   ] ;

A  is specified in the same way as   (see below), except that   parameter is ignored. The validator may have  to specify what output of the validator signifies a valid document. In absence of  for a named script and   valid document is signified by successful command return value (0 on Unix) and for   the value of   defaults to empty string.

In  the property :language may also refer to the namespace URL of some XML schema language (such as http://www.w3.org/2001/XMLSchema). In this case  is ignored.

A human-readable description of a namespace should be specified with Dublin Core parameters.

The  nodes are like resources in RDDL (but with our namespace instead of RDDL namespace).

A namespace description may provide  parameter to specify how to validate the documents whose root element is of our namespace. The :validate parameter has a subparameter  which should be understood according to the RDDL specification.

There may be multiple  parameters, to allow the use of schemas of different natures.


 * link parameter with subparameters  and   is backward compatible with RDDL and should be understood in accordance with the RDDL specification.

Also a namespace may be a member of the following classes:,  ,. See grouping examples.

RDF describing a transformer
Note: Transformers should be run in a secure sandbox, so that they are unable to damage or read the user's files. The time of the entire operation should also be limited. (Rationale: If we limited particular parts of the process rather than the process as a whole, we would be unable to limit parts of the operations done by the sandboxed application, and the whole arrangement would make no sense.) We may also limit the total amount of data transferred through the network, if the operating system supports it. (We can't limit a specific operation inside the sandbox.)

Implementation note: Such sandboxing can be implemented, for example, with SELinux on Linux. It is tempting to use the Java security manager, but as of early 2014 Java security is too buggy and therefore should not be used.

Their formal tree:


 * :Transformer
 * {1..*} :sourceNamespace (source namespaces) (not every transformer is associated with a namespace)
 * {0..*} :sourceNamespace (source namespaces)
 * {0..*} :targetNamespace (target namespaces)
 * {0..1} :universal [xsd:boolean, default false] (ignore target) [TODO: better name]
 * {0..1} :inward [xsd:boolean, default true] (process XML from outward to inward or inward to outward)
 * {0..1} :precedence (precedence)
 * {0..*} :script (script)
 * _ (script node)

If there is  option, then the target namespace is ignored for the purpose of figuring out the next transformation script. (In this case it is also recommended to skip  option and give a warning if it is present?)

Rationale: Consider converting XInclude to some other "inclusion" framework. Which of the transformations apply can be decided by the order of loading RDF files. This is the simplest way. TODO: Another option: "black list" some transformers and/or scripts.

Here is an example of an XSLT transformer:

 <...> a :Transformer ;
   dc:description  ; # Other Dublin Core metadata.
   :sourceNamespace <...> ;
   :targetNamespace <...> ;
   :precedence <...> ;
   :script [
     a :Command ;
     :language lang:XSLT ;
     :minVersion "2.0" ;
     :scriptURL  ;
     :transformerKind :entire ;
     :argument [ :name "debug" ; :value false ] ;
     :argument [ :name "other" ; :value 123 ] ;
     #:initial-context-node ... ; # See XSLT 2.0 spec.
     :initial-template "first" ;
     :initial-mode "first" ;
     :preservance 0.9 ;
     :stability 0.9 ;
     :preference 0.9
   ] .

Both  and   parameters are not required.

It is recommended but not required that objects of predicates  and   are of   class.

A transformer may have no target NS. Example: XInclude. In this case every NS in consideration can act as the target.

We need to define precedences for different kinds of transformers. For example, we would probably have the precedence “include” for XInclude and other cross-document facilities, “macro” for macros, and “formatting” for a transformer generating XSL formatting objects or SVG.

Common arguments
All transformers are subclasses of the class. All transformers accept the following parameters:


 * may be,  ,  ,  ,  . It is used according to the section “Order kinds of document transformers”.
 * ,  and   specify a number 0..1.0.   describes how much of the XML meaning is preserved (that is, not lost during conversion). :stability describes how reliable the transformer is (that is, whether it is likely to crash or produce meaningless results); :preference denotes other factors for calculating priority (see below).

Priority of a chain of transformations is calculated using the preservance, stability, and preference of the links of the chain. The recommended algorithm is to multiply all preservances, stabilities, and preferences of all links and then sum them.
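The calculation can be sketched in Python. The sketch follows one literal reading of the recommendation, namely multiplying the three values within each link and then summing the per-link products over the chain; multiplying across the whole chain would be another possible reading. The names Link, link_score and chain_priority are hypothetical, and the 1.0 defaults are taken from the property lists above.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One transformation in a chain; every value defaults to 1.0."""
    preservance: float = 1.0
    stability: float = 1.0
    preference: float = 1.0


def link_score(link):
    """Product of the three values of a single link."""
    return link.preservance * link.stability * link.preference


def chain_priority(links):
    """Sum of the per-link products (assumed reading of the text)."""
    return sum(link_score(link) for link in links)


# A two-link chain: one link with defaults, one lossy and less stable link.
chain = [Link(), Link(preservance=0.5, stability=0.5)]
priority = chain_priority(chain)
```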

All validators are subclasses of the class :validator. All validators accept the following parameters:


 * may be :entire or :parts. It is used as described in the Validation chapter.

Particular types of transformers
A language transformer (as below) has either  or   predicate (but not both).

XSLT, Java, Python, Ruby, et al
 [ a :Command ; :language lang:Python ; :minVersion "2.1" ; :maxVersion "3.2" ; :scriptURL <http://example.org/script.py> ]

This example means that the script http://example.org/script.py is run by a Python interpreter of version at least 2.1 and at most 3.2.

Max version may be of the form  to denote all subversions of X. TODO: Describe the grammar and comparison order of versions. http://www.dmitry-kazakov.de/ada/strings_edit.htm#11 and https://groups.google.com/forum/#!topic/comp.lang.ada/GRM4ZDi4H6M


 * named script
 * {0..1} :minVersion "2.1" (xsd:string) (minimum version)
 * {0..1} :maxVersion "3.2" (xsd:string) (maximum version)
 * {1..1} :scriptURL (script URL)
 * {0..1} :arguments (only for XSLT) (script arguments)
 * {0..1} :initialTemplate (only for XSLT) (the initial template for XSLT)
 * {0..1} :initialMode (only for XSLT) (specifies the initial mode for XSLT)

Recommendation: If several suitable versions of the interpreter are available, use the maximal allowed version.
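The recommendation can be sketched in Python. Since the grammar and comparison order of versions are still a TODO in this specification, the sketch assumes purely numeric, dot-separated versions compared component by component, and it does not handle the special maximum-version form for "all subversions". The names parse_version and pick_interpreter are hypothetical.

```python
def parse_version(text):
    """Turn "3.10" into (3, 10); assumes numeric dot-separated parts."""
    return tuple(int(part) for part in text.split("."))


def pick_interpreter(available, min_version=None, max_version=None):
    """Return the highest available version within the bounds, or None."""
    low = parse_version(min_version) if min_version else None
    high = parse_version(max_version) if max_version else None
    suitable = [
        version for version in available
        if (low is None or parse_version(version) >= low)
        and (high is None or parse_version(version) <= high)
    ]
    if not suitable:
        return None
    return max(suitable, key=parse_version)


# Interpreter versions found on a hypothetical machine.
chosen = pick_interpreter(["2.0", "2.7", "3.1", "3.3"], "2.1", "3.2")
```

Comparing tuples of integers rather than strings keeps "3.10" above "3.9", which a plain string comparison would get wrong.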

The following languages should be available:


 * XSLT
 * Python
 * Java
 * Ruby
 * Perl
 * TODO

Web service
 a :WebService ;
   :form <http://example.org/form> ;
   :method "post" ; # or "get"
   :xmlField "text" .

This sends a POST request to http://example.org/form, which should return an XML document.
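How such a request might be assembled can be sketched with the Python standard library. This is a sketch under assumptions: the specification does not say how the XML field is encoded, so ordinary form encoding (application/x-www-form-urlencoded) is assumed, and the helper name build_request is hypothetical. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(action, method, xml_field, xml_text):
    """Build an HTTP request carrying the XML text in the given field."""
    payload = urlencode({xml_field: xml_text})
    if method.lower() == "get":
        # For GET the field travels in the query string.
        separator = "&" if "?" in action else "?"
        return Request(action + separator + payload, method="GET")
    # For POST the field travels in a form-encoded body (an assumption).
    return Request(
        action,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_request("http://example.org/form", "post", "text", "<doc/>")
```

Passing the result to urllib.request.urlopen would perform the call; the response body is then expected to be the resulting XML document.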

Describing precedences
is an RDF-S class, whose members are RDF-S classes.

It is required that precedences are members of  class.

and  for precedences work only if both the left and the right side are declared as precedences  in the same RDF file.

  a :Precedence ; rdfs:subClassOf <...> ; :higherThan <...> ; :higherThan <...> ; :lowerThan <...> .

The predicate  can apply to precedences.

The "subclass" relation is the smallest partial order for the given  relations of all loaded assets. It is an error if there is no such partial order (that is, if there are cycles).

The following rules (see also https://math.stackexchange.com/q/2593701/4876) are used to deduce which entities have “higher than” precedence relative to another entity:


 * Every precedence is higher than itself.
 * If  parameter is specified inside a   description then the described entity is of higher precedence than the referred to entity.
 * If a class A has higher precedence than an entity B and the entity B has higher precedence than a class C, then the class A has higher precedence than the class C.
 * If A has strictly higher precedence than B, then the same holds for all of their respective subclasses A1 and B1.

The entities are related by the “higher than” relation if and only if this relation can be deduced from the above rules (for all currently loaded RDF resources). In other words, “higher than” is the smallest partial order conforming to the above.

If a cycle of precedences is encountered, this is a fatal error.
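The deduction and the cycle check can be sketched in Python. This is a simplified sketch, not the full rule set: it computes the reflexive and transitive closure of explicitly declared "higher than" pairs and rejects cycles, but it leaves out the propagation to subclasses. A "lower than" declaration can be fed in as the reversed pair. The function name higher_than_closure and the ordering used in the example are assumptions made for illustration.

```python
def higher_than_closure(declared):
    """Return the reflexive-transitive closure of (higher, lower) pairs.

    Raises ValueError when two distinct entities end up above each other,
    which means the declarations contain a cycle.
    """
    entities = {entity for pair in declared for entity in pair}
    closure = set(declared) | {(entity, entity) for entity in entities}
    changed = True
    while changed:  # repeat until no new pair can be deduced
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    for a, b in closure:
        if a != b and (b, a) in closure:
            raise ValueError(f"cycle of precedences involving {a} and {b}")
    return closure


# Illustrative ordering only: "include" above "macro" above "formatting".
closure = higher_than_closure({("include", "macro"), ("macro", "formatting")})
```

The quadratic loop is adequate for the handful of precedences an asset is likely to declare; a larger set would call for a graph-based algorithm.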

A precedence is a singleton when either it is declared to be a member of  class, as in the following example, or it is a direct or indirect subclass of a singleton:  a :Singleton. See https://math.stackexchange.com/a/2606958/4876 about calculating the "higher than" relation.

Implementation notes