Thursday, March 19, 2015

Validating XML Against XSD(s) in Java

There are numerous tools available for validating an XML document against an XSD, including operating system scripts and tools such as xmllint, XML editors and IDEs, and even online validators. Because of limitations or issues with those approaches, I have found it useful to have my own easy-to-use XML validation tool. Java makes it easy to write such a tool, and this post demonstrates how to develop a simple XML validation tool in Java.

The Java tool developed in this post requires JDK 8. However, the simple Java application can be modified fairly easily to work with JDK 7 or even with a version of Java as old as JDK 5. In most cases, I have tried to comment the code that requires JDK 7 or JDK 8 to identify these dependencies and provide alternative approaches in earlier versions of Java. I have done this so that the tool can be adapted to work even in environments with older versions of Java.

The complete code listing for the Java-based XML validation tool discussed in this post is included at the end of the post. The most significant lines of code from that application for validating XML against one or more XSDs are shown next.

Essence of Validating XML Against XSD with Java
final Schema schema = schemaFactory.newSchema(xsdSources);
final Validator validator = schema.newValidator();
validator.validate(new StreamSource(new File(xmlFilePathAndName)));

The previous code listing shows the straightforward approach available in the standard JDK for validating XML against XSDs. An instance of javax.xml.validation.Schema is instantiated with a call to javax.xml.validation.SchemaFactory.newSchema(Source[]) (where the array of javax.xml.transform.Source objects represents one or more XSDs). An instance of javax.xml.validation.Validator is obtained from the Schema instance via Schema's newValidator() method. The XML to be validated can be passed to that Validator's validate(Source) method to perform the validation of the XML against the XSD or XSDs originally provided to the Schema object created with SchemaFactory.newSchema(Source[]).
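To make those three steps concrete without any files on disk, here is a minimal sketch that compiles an in-memory XSD and validates in-memory documents against it. The class name InMemorySchemaValidation, its isValid helper, and the tiny "note" schema are hypothetical illustrations and not part of XmlValidator; only the javax.xml.validation calls are the standard JDK API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/**
 * Sketch of the Schema/Validator steps using in-memory documents.
 * The "note" schema below is a hypothetical example, not a real XSD.
 */
public class InMemorySchemaValidation
{
   // Hypothetical schema: a single <note> element containing text.
   public static final String NOTE_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"note\" type=\"xs:string\"/>"
      + "</xs:schema>";

   /** Returns true if xml is valid against xsd, false if the Validator rejects it. */
   public static boolean isValid(final String xml, final String xsd)
      throws SAXException, IOException
   {
      final SchemaFactory schemaFactory =
         SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      // Step 1: compile the XSD(s); a broken XSD throws SAXException here.
      final Schema schema = schemaFactory.newSchema(
         new StreamSource[] { new StreamSource(new StringReader(xsd)) });
      // Step 2: obtain a Validator from the Schema.
      final Validator validator = schema.newValidator();
      try
      {
         // Step 3: validate; by default the first error is thrown as SAXException.
         validator.validate(new StreamSource(new StringReader(xml)));
         return true;
      }
      catch (final SAXException invalid)
      {
         return false;
      }
   }

   public static void main(final String[] arguments) throws SAXException, IOException
   {
      System.out.println(isValid("<note>hi</note>", NOTE_XSD));   // true
      System.out.println(isValid("<title>hi</title>", NOTE_XSD)); // false
   }
}
```

Compiling the schema is deliberately kept outside the try block in this sketch so that a broken XSD surfaces as an exception rather than being mistaken for an invalid XML document.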

The next code listing includes the code just highlighted but represents the entire method in which that code resides.

validateXmlAgainstXsds(String, String[])
/**
 * Validate provided XML against the provided XSD schema files.
 *
 * @param xmlFilePathAndName Path/name of XML file to be validated;
 *    should not be null or empty.
 * @param xsdFilesPathsAndNames XSDs against which to validate the XML;
 *    should not be null or empty.
 */
public static void validateXmlAgainstXsds(
   final String xmlFilePathAndName, final String[] xsdFilesPathsAndNames)
{
   if (xmlFilePathAndName == null || xmlFilePathAndName.isEmpty())
   {
      out.println("ERROR: Path/name of XML to be validated cannot be null or empty.");
      return;
   }
   if (xsdFilesPathsAndNames == null || xsdFilesPathsAndNames.length < 1)
   {
      out.println("ERROR: At least one XSD must be provided to validate XML against.");
      return;
   }
   final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

   final StreamSource[] xsdSources = generateStreamSourcesFromXsdPathsJdk8(xsdFilesPathsAndNames);

   try
   {
      final Schema schema = schemaFactory.newSchema(xsdSources);
      final Validator validator = schema.newValidator();
      out.println(  "Validating " + xmlFilePathAndName + " against XSDs "
                  + Arrays.toString(xsdFilesPathsAndNames) + "...");
      validator.validate(new StreamSource(new File(xmlFilePathAndName)));
   }
   catch (IOException | SAXException exception)  // JDK 7 multi-exception catch
   {
      out.println(
           "ERROR: Unable to validate " + xmlFilePathAndName
         + " against XSDs " + Arrays.toString(xsdFilesPathsAndNames)
         + " - " + exception);
   }
   out.println("Validation process completed.");
}

The code listing for the validateXmlAgainstXsds(String, String[]) method shows how a SchemaFactory instance can be obtained for the specified type of schema (XMLConstants.W3C_XML_SCHEMA_NS_URI). This method also handles the checked exceptions that might be thrown during the validation process: IOException if the XML file cannot be read and SAXException if an XSD cannot be compiled or the XML is malformed or not valid. As the comment in the code states, the method uses the JDK 7 language change that supports catching multiple exceptions in a single catch clause; for code bases on versions of Java earlier than JDK 7, this could be replaced with separate catch clauses or by catching a single, more general exception.
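For code bases earlier than JDK 7, a minimal sketch of the separate-catch-clause approach follows. It also catches org.xml.sax.SAXParseException first (it extends SAXException) so that the line and column of the offending markup can be reported. The class SeparateCatchClauses, its describe and validatorFor methods, the "note" schema, and the message formats are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/**
 * Sketch of pre-JDK 7 exception handling: one catch clause per exception
 * type instead of a multi-catch. Names and messages are hypothetical.
 */
public class SeparateCatchClauses
{
   // Hypothetical schema: a single <note> element containing text.
   public static final String NOTE_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"note\" type=\"xs:string\"/></xs:schema>";

   /** Builds a Validator for a single in-memory XSD. */
   public static Validator validatorFor(final String xsd) throws SAXException
   {
      return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                          .newSchema(new StreamSource(new StringReader(xsd)))
                          .newValidator();
   }

   /** Validates xml and describes the outcome instead of throwing. */
   public static String describe(final Validator validator, final Source xml)
   {
      try
      {
         validator.validate(xml);
         return "VALID";
      }
      catch (SAXParseException parseException)
      {
         // Most specific type first: it carries the location of the problem.
         return "INVALID at line " + parseException.getLineNumber()
            + ", column " + parseException.getColumnNumber()
            + ": " + parseException.getMessage();
      }
      catch (SAXException saxException)
      {
         return "INVALID: " + saxException.getMessage();
      }
      catch (IOException ioException)
      {
         return "I/O ERROR: " + ioException.getMessage();
      }
   }

   public static void main(final String[] arguments) throws SAXException
   {
      final Validator validator = validatorFor(NOTE_XSD);
      System.out.println(describe(validator, new StreamSource(new StringReader("<note>ok</note>"))));
      System.out.println(describe(validator, new StreamSource(new StringReader("<title>no</title>"))));
   }
}
```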

The method just shown calls generateStreamSourcesFromXsdPathsJdk8(String[]), which is shown in the next listing.

generateStreamSourcesFromXsdPathsJdk8(String[])
/**
 * Generates an array of StreamSource instances representing the XSDs
 * associated with the provided file paths/names, using the JDK 8
 * Stream API.
 *
 * This method can be commented out if using a version of
 * Java prior to JDK 8.
 *
 * @param xsdFilesPaths String representations of paths/names
 *    of XSD files.
 * @return StreamSource instances representing XSDs.
 */
private static StreamSource[] generateStreamSourcesFromXsdPathsJdk8(
   final String[] xsdFilesPaths)
{
   return Arrays.stream(xsdFilesPaths)
                .map(StreamSource::new)
                .collect(Collectors.toList())
                .toArray(new StreamSource[xsdFilesPaths.length]);
}

The method just shown uses JDK 8 stream support to convert the array of Strings representing paths/names of XSD files into StreamSource instances, each using one of those Strings as its system ID; the XSD contents themselves are read later, when SchemaFactory.newSchema(Source[]) compiles the schema. The class's complete code listing also includes a deprecated method, generateStreamSourcesFromXsdPathsJdk7(final String[]), that can be used instead of this method in code bases on a version of Java earlier than JDK 8.
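A slightly more direct JDK 8 variation, sketched below on the assumption that only the resulting array is needed, passes an array constructor reference to Stream.toArray and so skips the intermediate List built by Collectors.toList(). The class StreamSourceArrays, the toStreamSources method, and the file names are hypothetical.

```java
import java.util.Arrays;
import javax.xml.transform.stream.StreamSource;

/**
 * Sketch of an alternative to generateStreamSourcesFromXsdPathsJdk8 that
 * avoids the intermediate List. Names used here are hypothetical.
 */
public class StreamSourceArrays
{
   public static StreamSource[] toStreamSources(final String[] xsdFilesPaths)
   {
      return Arrays.stream(xsdFilesPaths)
                   .map(StreamSource::new)         // StreamSource(String systemId)
                   .toArray(StreamSource[]::new);  // array sized to the stream
   }

   public static void main(final String[] arguments)
   {
      final StreamSource[] sources =
         toStreamSources(new String[] {"first.xsd", "second.xsd"});
      for (final StreamSource source : sources)
      {
         System.out.println(source.getSystemId());  // first.xsd, then second.xsd
      }
   }
}
```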

This single-class Java application is most useful when it's executed from the command line. To enable this, a main function is defined as shown in the next code listing.

Executable main(String[]) Function
/**
 * Validates provided XML against provided XSD.
 *
 * @param arguments XML file to be validated (first argument) and
 *    XSD against which it should be validated (second and later
 *    arguments).
 */
public static void main(final String[] arguments)
{
   if (arguments.length < 2)
   {
      out.println("\nUSAGE: java XmlValidator <xmlFile> <xsdFile1> ... <xsdFileN>\n");
      out.println("\tOrder of XSDs can be significant (place XSDs that are");
      out.println("\tdependent on other XSDs after those they depend on)");
      System.exit(-1);
   }
   // Arrays.copyOfRange requires JDK 6; see
   // http://stackoverflow.com/questions/7970486/porting-arrays-copyofrange-from-java-6-to-java-5
   // for additional details for versions of Java prior to JDK 6.
   final String[] schemas = Arrays.copyOfRange(arguments, 1, arguments.length);
   validateXmlAgainstXsds(arguments[0], schemas);
}

The executable main(String[]) function prints a usage statement if fewer than two command line arguments are passed to it because it expects at least the name/path of the XML file to be validated and the name/path of an XSD to validate the XML against.

The main function treats the first command-line argument as the XML file's path/name and treats all remaining command-line arguments as the paths/names of one or more XSDs.
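The comment in main points to alternatives for versions of Java before JDK 6, where Arrays.copyOfRange is unavailable. One such alternative, sketched here with hypothetical names (ArgumentSplitter, xsdArguments), copies the same subarray with System.arraycopy, which has been part of Java since JDK 1.0.

```java
import java.util.Arrays;

/**
 * Sketch of splitting the command-line arguments without
 * Arrays.copyOfRange (JDK 6+). Names used here are hypothetical.
 */
public class ArgumentSplitter
{
   /**
    * Returns every argument after the first (the XSD paths/names).
    * Assumes at least one argument, as main's usage check guarantees.
    */
   public static String[] xsdArguments(final String[] arguments)
   {
      final String[] schemas = new String[arguments.length - 1];
      System.arraycopy(arguments, 1, schemas, 0, schemas.length);
      return schemas;
   }

   public static void main(final String[] arguments)
   {
      final String[] sample = {"web.xml", "a.xsd", "b.xsd"};
      System.out.println("XML: " + sample[0]);                                   // XML: web.xml
      System.out.println("XSDs: " + Arrays.toString(xsdArguments(sample)));     // XSDs: [a.xsd, b.xsd]
   }
}
```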

The simple Java tool for validating XML against one or more XSDs has now been shown (the complete code listing is at the bottom of the post). With it in place, we can run it against an example XML file and associated XSDs. For this demonstration, I'm using a very simple Servlet 2.5 web.xml deployment descriptor.

Sample Valid Servlet 2.5 web.xml
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
         version="2.5"> 

    <display-name>Sample Java Servlet 2.5 Web Application</display-name>
</web-app>

The simple web.xml file just shown is valid per the Servlet 2.5 XSDs, and the output of running this simple Java-based XSD validation tool proves that by not reporting any validation errors.

An XSD-valid XML file does not lead to very interesting results with this tool. The next code listing shows an intentionally invalid web.xml file that has a "title" element not specified in the associated Servlet 2.5 XSD. The output with the most significant portions of the error message highlighted is shown after the code listing.

Sample Invalid Servlet 2.5 web.xml (web-invalid.xml)
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
         version="2.5">

    <display-name>Java Servlet 2.5 Web Application</display-name>
    <title>A handy example</title>
</web-app>

As the last output shows, things are more interesting in terms of output when the provided XML is not XSD valid.
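Because a Validator with no ErrorHandler set throws at the first validation error, an invalid file with several problems reports only one of them per run. The following is a minimal sketch of one way to see every violation in a single pass, assuming it is acceptable to register a custom org.xml.sax.ErrorHandler through Validator.setErrorHandler. The class CollectingErrorHandler, its collectProblems method, and the "list"/"count" schema are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/**
 * Hypothetical ErrorHandler that records warnings and errors instead of
 * throwing, so one validation run reports every violation it finds.
 */
public class CollectingErrorHandler implements ErrorHandler
{
   private final List<String> problems = new ArrayList<>();

   @Override
   public void warning(final SAXParseException exception)
   {
      problems.add("WARNING (line " + exception.getLineNumber() + "): " + exception.getMessage());
   }

   @Override
   public void error(final SAXParseException exception)
   {
      // Record the violation and let validation continue.
      problems.add("ERROR (line " + exception.getLineNumber() + "): " + exception.getMessage());
   }

   @Override
   public void fatalError(final SAXParseException exception) throws SAXException
   {
      // XML that is not well-formed cannot be processed further.
      throw exception;
   }

   /** Validates xml against xsd and returns every recorded warning/error. */
   public static List<String> collectProblems(final String xml, final String xsd)
      throws SAXException, IOException
   {
      final Validator validator = SchemaFactory
         .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
         .newSchema(new StreamSource(new StringReader(xsd)))
         .newValidator();
      final CollectingErrorHandler handler = new CollectingErrorHandler();
      validator.setErrorHandler(handler);
      validator.validate(new StreamSource(new StringReader(xml)));
      return handler.problems;
   }

   public static void main(final String[] arguments) throws SAXException, IOException
   {
      // Hypothetical schema: <list> holds one or more integer <count> elements.
      final String xsd =
           "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
         + "<xs:element name=\"list\"><xs:complexType><xs:sequence>"
         + "<xs:element name=\"count\" type=\"xs:int\" maxOccurs=\"unbounded\"/>"
         + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
      final String xml = "<list><count>1</count><count>x</count><count>y</count></list>";
      for (final String problem : collectProblems(xml, xsd))
      {
         System.out.println(problem);
      }
   }
}
```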

There is one important caveat I wish to emphasize here. The XSDs provided to this Java-based tool sometimes need to be specified in a particular order. In particular, XSDs with "include" dependencies on other XSDs should be listed on the command line AFTER the XSD they include. In other words, XSDs with no "include" dependencies will generally be provided on the command line before those XSDs that include them.

The next code listing is for the complete XmlValidator class.

XmlValidator.java (Complete Class Listing)
package dustin.examples.xmlvalidation;

import org.xml.sax.SAXException;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import static java.lang.System.out;

/**
 * Validate provided XML against the provided XSDs.
 */
public class XmlValidator
{
   /**
    * Validate provided XML against the provided XSD schema files.
    *
    * @param xmlFilePathAndName Path/name of XML file to be validated;
    *    should not be null or empty.
    * @param xsdFilesPathsAndNames XSDs against which to validate the XML;
    *    should not be null or empty.
    */
   public static void validateXmlAgainstXsds(
      final String xmlFilePathAndName, final String[] xsdFilesPathsAndNames)
   {
      if (xmlFilePathAndName == null || xmlFilePathAndName.isEmpty())
      {
         out.println("ERROR: Path/name of XML to be validated cannot be null or empty.");
         return;
      }
      if (xsdFilesPathsAndNames == null || xsdFilesPathsAndNames.length < 1)
      {
         out.println("ERROR: At least one XSD must be provided to validate XML against.");
         return;
      }
      final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

      final StreamSource[] xsdSources = generateStreamSourcesFromXsdPathsJdk8(xsdFilesPathsAndNames);

      try
      {
         final Schema schema = schemaFactory.newSchema(xsdSources);
         final Validator validator = schema.newValidator();
         out.println("Validating " + xmlFilePathAndName + " against XSDs "
            + Arrays.toString(xsdFilesPathsAndNames) + "...");
         validator.validate(new StreamSource(new File(xmlFilePathAndName)));
      }
      catch (IOException | SAXException exception)  // JDK 7 multi-exception catch
      {
         out.println(
            "ERROR: Unable to validate " + xmlFilePathAndName
            + " against XSDs " + Arrays.toString(xsdFilesPathsAndNames)
            + " - " + exception);
      }
      out.println("Validation process completed.");
   }

   /**
    * Generates an array of StreamSource instances representing the XSDs
    * associated with the provided file paths/names, using the JDK 8
    * Stream API.
    *
    * This method can be commented out if using a version of
    * Java prior to JDK 8.
    *
    * @param xsdFilesPaths String representations of paths/names
    *    of XSD files.
    * @return StreamSource instances representing XSDs.
    */
   private static StreamSource[] generateStreamSourcesFromXsdPathsJdk8(
      final String[] xsdFilesPaths)
   {
      return Arrays.stream(xsdFilesPaths)
                   .map(StreamSource::new)
                   .collect(Collectors.toList())
                   .toArray(new StreamSource[xsdFilesPaths.length]);
   }

   /**
    * Generates array of StreamSource instances representing XSDs
    * associated with the file paths/names provided and uses
    * pre-JDK 8 Java APIs.
    *
    * This method can be commented out (or better yet, removed
    * altogether) if using JDK 8 or later.
    *
    * @param xsdFilesPaths String representations of paths/names
    *    of XSD files.
    * @return StreamSource instances representing XSDs.
    * @deprecated Use generateStreamSourcesFromXsdPathsJdk8 instead
    *    when JDK 8 or later is available.
    */
   @Deprecated
   private static StreamSource[] generateStreamSourcesFromXsdPathsJdk7(
      final String[] xsdFilesPaths)
   {
      // Diamond operator used here requires JDK 7; add type of
      // StreamSource to generic specification of ArrayList for
      // JDK 5 or JDK 6
      final List<StreamSource> streamSources = new ArrayList<>();
      for (final String xsdPath : xsdFilesPaths)
      {
         streamSources.add(new StreamSource(xsdPath));
      }
      return streamSources.toArray(new StreamSource[xsdFilesPaths.length]);
   }

   /**
    * Validates provided XML against provided XSD.
    *
    * @param arguments XML file to be validated (first argument) and
    *    XSD against which it should be validated (second and later
    *    arguments).
    */
   public static void main(final String[] arguments)
   {
      if (arguments.length < 2)
      {
         out.println("\nUSAGE: java XmlValidator <xmlFile> <xsdFile1> ... <xsdFileN>\n");
         out.println("\tOrder of XSDs can be significant (place XSDs that are");
         out.println("\tdependent on other XSDs after those they depend on)");
         System.exit(-1);
      }
      // Arrays.copyOfRange requires JDK 6; see
      // http://stackoverflow.com/questions/7970486/porting-arrays-copyofrange-from-java-6-to-java-5
      // for additional details for versions of Java prior to JDK 6.
      final String[] schemas = Arrays.copyOfRange(arguments, 1, arguments.length);
      validateXmlAgainstXsds(arguments[0], schemas);
   }
}

Despite what the length of this post might initially suggest, using Java to validate XML against an XSD is fairly straightforward. The sample application shown and explained here attempts to demonstrate that and is a useful tool for simple command line validation of XML documents against specified XSDs. One could easily port this to Groovy to be even more script-friendly. As mentioned earlier, this simple tool requires JDK 8 as currently written but could be easily adapted to work on JDK 5, JDK 6, or JDK 7.

UPDATE (20 March 2015): I have pushed the Java class shown in this post (XmlValidator.java) onto the GitHub repository dustinmarx/xmlutilities.
