XML Redirecting URLs Using Resolvers
This article describes how you can set up your XML parser so that validation can occur while offline.
The problem of remote DTDs
The first problem we ran into was that, whenever we were disconnected from the internet, our XML parser (Saxon) would still try to connect to the internet to fetch the DTD file for validation.
A DocBook document has the following DOCTYPE declaration:
    <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "http://www.docbook.org/xml/4.2/docbookx.dtd">
As you can see, there is a URL, http://www.docbook.org/xml/4.2/docbookx.dtd, which is requested every time we validate the document. If we weren't connected, the parser would complain that it couldn't reach the internet:
    Error reported by XML parser: Cannot read from http://www.docbook.org/xml/4.2/docbookx.dtd (www.docbook.org)
    Transformation failed: Run-time errors were reported
The public identifier in this case is -//OASIS//DTD DocBook XML V4.2//EN
Download the DTDs locally
To rectify this, download the DTDs and copy them onto your machine.
A good directory to choose might be somewhere in /usr/share/ if you are on UNIX.
On my Windows box, I had Cygwin installed, so I added the DocBook XML 4.2 distribution package to it.
However, don’t go ahead and change the URL inside your XML file just yet - what happens if you send this file to another person, or try to open the same document on another machine? Distributing this kind of XML document would lead to problems.
Download resolver.jar
Visit the Apache xml-commons site and download the xml-commons-resolver-1.1 release.
Unpack this, and add the file resolver.jar to your CLASSPATH.
Look at the documentation inside xml-commons-resolver-1.1/docs/ for further information.
Creating your catalog file
Create a catalog file similar to the one below, but replace the uri with the location of your local copy of docbookx.dtd. I happened to create my file as D:\cygwin\usr\share\catalog\docbook.catalog
    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
      <public publicId="-//OASIS//DTD DocBook XML V4.2//EN"
              uri="file:///D:/cygwin/usr/share/docbook-xml42/docbookx.dtd"/>
    </catalog>
This tells the resolver that when it sees the public identifier -//OASIS//DTD DocBook XML V4.2//EN inside the XML document, it should look it up in this catalog and go to the URI specified there instead. In this case it will open the file D:\cygwin\usr\share\docbook-xml42\docbookx.dtd.
You can also specify the uri as a relative path, in which case it is resolved relative to the catalog file.
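The lookup described above can be sketched in plain Java. This is a hand-rolled stand-in for illustration only, not the Apache resolver's implementation: the class name PublicIdMapResolver and its map method are hypothetical, and a real catalog supports far more than a single public-to-URI table.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: maps public identifiers to local URIs,
// mimicking what a <public> catalog entry asks the resolver to do.
public class PublicIdMapResolver implements EntityResolver {
    private final Map<String, String> publicIdToUri = new HashMap<>();

    // Register one public identifier and the local URI to use instead.
    public PublicIdMapResolver map(String publicId, String localUri) {
        publicIdToUri.put(publicId, localUri);
        return this;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String localUri = publicIdToUri.get(publicId);
        if (localUri == null) {
            return null; // unknown identifier: let the parser use the system identifier
        }
        InputSource source = new InputSource(localUri);
        source.setPublicId(publicId);
        return source;
    }

    public static void main(String[] args) {
        PublicIdMapResolver resolver = new PublicIdMapResolver()
            .map("-//OASIS//DTD DocBook XML V4.2//EN",
                 "file:///D:/cygwin/usr/share/docbook-xml42/docbookx.dtd");
        InputSource hit = resolver.resolveEntity(
            "-//OASIS//DTD DocBook XML V4.2//EN",
            "http://www.docbook.org/xml/4.2/docbookx.dtd");
        System.out.println(hit.getSystemId());
    }
}
```

Returning null for an unmapped identifier leaves the parser's default behaviour in place, so documents that use other DTDs are unaffected.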
Check your catalog file
Use the resolver program that comes with the Apache distribution.
Make sure that the catalog will work by using the following:
    java org.apache.xml.resolver.apps.resolver -d 2 \
      -c 'D:\cygwin\usr\share\catalog\docbook.catalog' \
      -p "-//OASIS//DTD DocBook XML V4.2//EN" public
    Cannot find CatalogManager.properties
    Loading catalog: ./xcatalog
    Loading catalog: D:\cygwin\usr\share\catalog\docbook.catalog
    Resolve PUBLIC (publicid, systemid):
    public id: -//OASIS//DTD DocBook XML V4.2//EN
    Result: file:/D:/cygwin/usr/share/docbook-xml42/docbookx.dtd
Note that it says that it couldn't find CatalogManager.properties, so we will need to rectify that.
Creating your catalog manager properties file
In order for the resolver to work, you must create a CatalogManager.properties file.
This file must be placed somewhere in your CLASSPATH.
Add the property catalogs, which specifies the location of the catalog file.
If you have more than one catalog, you can separate them with semicolons.
The contents of my one-line CatalogManager.properties file are below:
    catalogs=D:\\cygwin\\usr\\share\\catalog\\docbook.catalog
Testing the whole setup
When I tested this setup with Saxon, I received an error:
    $ java org.apache.xml.resolver.apps.xparse -d 2 magicmonster.xml
    Loading catalog: D:\cygwin\usr\share\catalog\docbook.catalog
    javax.xml.parsers.ParserConfigurationException: AElfred parser is non-validating
A validating parser is needed to test this. I removed Saxon from my CLASSPATH, then installed Xerces 2.6.2.
I needed to add xercesImpl.jar and xmlParserAPIs.jar to my CLASSPATH. I am using Java 1.2.
An alternative to Xerces is to use Sun's Java 1.4, which is distributed with the Crimson parser.
    java org.apache.xml.resolver.apps.xparse -d 2 magicmonster.xml
    Loading catalog: D:\cygwin\usr\share\catalog\docbook.catalog
    Attempting validating, namespace-aware parse
    Resolved public: -//OASIS//DTD DocBook XML V4.2//EN file:/D:/cygwin/usr/share/docbook-xml42/docbookx.dtd
    Parse succeeded (1.382) with no errors and no warnings.
The above worked out well. It successfully resolved the public identifier to the local file and validated my XML document.
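The same kind of check can be done from Java with only the JDK's JAXP classes. The sketch below is an illustration under stated assumptions, not what xparse does internally: the public identifier -//EXAMPLE//DTD Note//EN, the example.invalid URL, the one-line DTD, and the class name OfflineValidation are all made up, and the DTD is served from a string rather than from a catalog so that the example needs no files and no network.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class OfflineValidation {
    // Hypothetical identifiers and DTD, used only for this illustration.
    static final String PUBLIC_ID = "-//EXAMPLE//DTD Note//EN";
    static final String DOCTYPE =
        "<!DOCTYPE note PUBLIC \"" + PUBLIC_ID + "\" \"http://example.invalid/note.dtd\">";
    static final String LOCAL_DTD = "<!ELEMENT note (#PCDATA)>";

    // Returns true if the document parses and validates,
    // false on a validity or well-formedness error.
    static boolean validate(String xml) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);     // a validating parser is required
        factory.setNamespaceAware(true);
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                if (PUBLIC_ID.equals(publicId)) {
                    // Serve the local copy instead of fetching the remote URL.
                    InputSource local = new InputSource(new StringReader(LOCAL_DTD));
                    local.setSystemId(systemId);
                    return local;
                }
                return null; // fall back to the parser's default behaviour
            }

            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e; // validity errors are only reported, so rethrow them
            }
        };
        try {
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validate(DOCTYPE + "<note>hello</note>"));
        System.out.println(validate(DOCTYPE + "<note><extra/></note>"));
    }
}
```

The error method is overridden because DefaultHandler ignores validity errors by default, which would make an invalid document look like a successful parse.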
Incorporating all this into Saxon
I still needed the Xerces parser in my CLASSPATH.
Make sure saxon.jar is included after both of the Xerces jar files, or specify which parser to use with the following option:
    -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl
If you are using the Crimson parser that comes with Sun's Java 1.4, use the following option:
    -Djavax.xml.parsers.SAXParserFactory=org.apache.crimson.jaxp.SAXParserFactoryImpl
After this, you can add the following options to the Saxon stylesheet command line (-x sets the parser for source documents, -y the parser for stylesheets, and -r the URI resolver):
    -x org.apache.xml.resolver.tools.ResolvingXMLReader
    -y org.apache.xml.resolver.tools.ResolvingXMLReader
    -r org.apache.xml.resolver.tools.CatalogResolver
JAXP URI Resolver
JAXP also provides an interface you can use when programmatically transforming documents using XSL.
See javax.xml.transform.URIResolver.resolve
    /**
     * Called by the processor when it encounters
     * an xsl:include, xsl:import, or document() function.
     *
     * @param href An href attribute, which may be relative or absolute.
     * @param base The base URI against which the first argument will be made
     * absolute if the absolute URI is required.
     *
     * @return A Source object, or null if the href cannot be resolved,
     * and the processor should try to resolve the URI itself.
     *
     * @throws TransformerException if an error occurs when trying to
     * resolve the URI.
     */
    public Source resolve(String href, String base) throws TransformerException;
We have used this to write some classpath URI resolvers.
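Those resolvers are not shown here, so the following is only one possible shape for such a class, not the code we actually use. It assumes a made-up classpath: prefix on the href, and the class name ClasspathUriResolver is hypothetical.

```java
import java.io.InputStream;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical resolver: serves hrefs such as "classpath:xsl/common.xsl"
// from the classpath and leaves every other href to the processor.
public class ClasspathUriResolver implements URIResolver {
    static final String PREFIX = "classpath:"; // assumed convention, not a standard URI scheme

    @Override
    public Source resolve(String href, String base) {
        if (href == null || !href.startsWith(PREFIX)) {
            return null; // not ours: the processor resolves the URI itself
        }
        String resource = href.substring(PREFIX.length());
        InputStream in = ClasspathUriResolver.class.getClassLoader()
            .getResourceAsStream(resource);
        if (in == null) {
            return null; // not on the classpath: fall back to the processor
        }
        // Keep the original href as the system identifier for error messages.
        return new StreamSource(in, href);
    }
}
```

A resolver like this would be registered with TransformerFactory.setURIResolver so that it is consulted for xsl:include and xsl:import, or with Transformer.setURIResolver for document() calls at run time.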
Bibliography and Related Links
XML Entity and URI Resolvers by Norman Walsh