Channel: Eric White's Blog

Validate Open XML Documents using the Open XML SDK 2.0


This blog is inactive.  New blog: EricWhite.com/blog

Open XML developers produce documents in a variety of ways: by transforming an existing document into a new one, or by programmatically altering an existing document and saving it back to disk.  In either case, you can use the Open XML SDK 2.0 to determine whether the new or altered document, spreadsheet, or presentation contains invalid markup.

This was particularly useful when I was writing the code to accept tracked revisions and the Open XML WordprocessingML markup simplifier.  I wrote a small program that iterates through all documents in a directory tree, programmatically alters or transforms each document, and then validates the result.  This allowed me to run the code on thousands of documents and confirm that it did not create invalid documents.

The use of the validator is simple:

  • Open your document/spreadsheet/presentation as usual using the Open XML SDK.
  • Instantiate an OpenXmlValidator object (from the DocumentFormat.OpenXml.Validation namespace).
  • Call the OpenXmlValidator.Validate method, passing the open document.  This method returns a collection of ValidationErrorInfo objects.  If the collection is empty, then the document is valid.  You can validate before and after modifying the document.

Here is the simplest code to validate a document.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;
using DocumentFormat.OpenXml.Wordprocessing;

class Program
{
    static void Main(string[] args)
    {
        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Open("Test.docx", false))
        {
            OpenXmlValidator validator = new OpenXmlValidator();
            var errors = validator.Validate(wordDoc);
            if (errors.Count() == 0)
                Console.WriteLine("Document is valid");
            else
                Console.WriteLine("Document is not valid");
        }
    }
}
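
The same pattern applies to spreadsheets and presentations, because the Validate method accepts any Open XML package.  The following is a minimal sketch rather than part of the original sample: the class name AnyDocumentValidator and the helpers OpenReadOnly and CountErrors are hypothetical, and choosing the package type from the file extension is an assumption made for illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

static class AnyDocumentValidator
{
    // Hypothetical helper: picks the package class from the file extension
    // and opens the file read-only.
    static OpenXmlPackage OpenReadOnly(string path)
    {
        switch (Path.GetExtension(path).ToLowerInvariant())
        {
            case ".docx": return WordprocessingDocument.Open(path, false);
            case ".xlsx": return SpreadsheetDocument.Open(path, false);
            case ".pptx": return PresentationDocument.Open(path, false);
            default:
                throw new NotSupportedException("Unsupported extension: " + path);
        }
    }

    // Counts the validation errors while the package is still open.
    static int CountErrors(string path)
    {
        using (OpenXmlPackage package = OpenReadOnly(path))
        {
            OpenXmlValidator validator = new OpenXmlValidator();
            return validator.Validate(package).Count();
        }
    }

    static void Main(string[] args)
    {
        // Falls back to Test.docx when no path is given on the command line.
        string path = args.Length > 0 ? args[0] : "Test.docx";
        int errorCount = CountErrors(path);
        if (errorCount == 0)
            Console.WriteLine("Document is valid");
        else
            Console.WriteLine("Document is not valid ({0} errors)", errorCount);
    }
}
```

The sketch returns only a count so that no error object is used after its package has been closed.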

While debugging your code, it is helpful to know exactly where each error is.  You can iterate through the errors, printing:

  • The content type for the part that contains the error.
  • An XPath expression that identifies the element that caused the error.
  • An error message.

Here is code to do that:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;
using DocumentFormat.OpenXml.Wordprocessing;

class Program
{
    static void Main(string[] args)
    {
        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Open("Test.docx", false))
        {
            OpenXmlValidator validator = new OpenXmlValidator();
            var errors = validator.Validate(wordDoc);
            if (errors.Count() == 0)
                Console.WriteLine("Document is valid");
            else
                Console.WriteLine("Document is not valid");
            Console.WriteLine();
            foreach (var error in errors)
            {
                Console.WriteLine("Error description: {0}", error.Description);
                Console.WriteLine("Content type of part with error: {0}",
                    error.Part.ContentType);
                Console.WriteLine("Location of error: {0}", error.Path.XPath);
            }
        }
    }
}
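
When a document has many errors, it can help to see them grouped by part.  The following is a sketch under a few assumptions: the grouping key (the part URI) and the output layout are my own choices for illustration, and the null checks are defensive guesses for errors that are not tied to a specific part or element.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

class Program
{
    static void Main(string[] args)
    {
        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Open("Test.docx", false))
        {
            OpenXmlValidator validator = new OpenXmlValidator();

            // Materialize the results once, in case the returned sequence
            // is evaluated lazily (an assumption, not verified here).
            var errors = validator.Validate(wordDoc).ToList();

            // Group by the URI of the part that contains each error.
            var groups = errors.GroupBy(
                e => e.Part != null ? e.Part.Uri.ToString() : "(no part)");

            foreach (var group in groups)
            {
                Console.WriteLine("{0}: {1} error(s)", group.Key, group.Count());
                foreach (var error in group)
                {
                    string xpath = error.Path != null ? error.Path.XPath : "(no path)";
                    Console.WriteLine("  [{0}] {1}", error.ErrorType, xpath);
                    Console.WriteLine("      {0}", error.Description);
                }
            }
        }
    }
}
```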

As a developer, you will want to open a document, modify it in some fashion, and then validate that your modifications were correct.  The following example opens a document for writing, modifies it to make it invalid, and then validates.  To make an invalid document, it adds a text element (w:t) as a child element of a paragraph (w:p) instead of a run (w:r).

This approach to document validation works if you are using the Open XML SDK strongly-typed object model.  It also works if you are using another XML programming technology, such as LINQ to XML.  The following example shows the document modification code written using two approaches.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;
using DocumentFormat.OpenXml.Wordprocessing;

public static class MyExtensions
{
    // Returns the XML of the part as an XDocument.  The XDocument is cached as
    // an annotation on the part, so repeated calls return the same instance.
    public static XDocument GetXDocument(this OpenXmlPart part)
    {
        XDocument partXDocument = part.Annotation<XDocument>();
        if (partXDocument != null)
            return partXDocument;
        using (Stream partStream = part.GetStream())
        using (XmlReader partXmlReader = XmlReader.Create(partStream))
            partXDocument = XDocument.Load(partXmlReader);
        part.AddAnnotation(partXDocument);
        return partXDocument;
    }

    // Writes the cached XDocument back to the part, replacing its content.
    public static void PutXDocument(this OpenXmlPart part)
    {
        XDocument partXDocument = part.GetXDocument();
        if (partXDocument != null)
        {
            using (Stream partStream = part.GetStream(FileMode.Create, FileAccess.Write))
            using (XmlWriter partXmlWriter = XmlWriter.Create(partStream))
                partXDocument.Save(partXmlWriter);
        }
    }
}

class Program
{
    static void Main(string[] args)
    {
        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Open("Test.docx", true))
        {
            // Open XML SDK strongly-typed object model code that modifies a document,
            // making it invalid.
            wordDoc.MainDocumentPart.Document.Body.InsertAt(
                new Paragraph(
                    new Text("Test")), 0);

            // LINQ to XML code that modifies a document, making it invalid.
            XDocument d = wordDoc.MainDocumentPart.GetXDocument();
            XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            d.Descendants(w + "body").First().AddFirst(
                new XElement(w + "p",
                    new XElement(w + "t", "Test")));
            wordDoc.MainDocumentPart.PutXDocument();

            OpenXmlValidator validator = new OpenXmlValidator();
            var errors = validator.Validate(wordDoc);
            if (errors.Count() == 0)
                Console.WriteLine("Document is valid");
            else
                Console.WriteLine("Document is not valid");
            Console.WriteLine();
            foreach (var error in errors)
            {
                Console.WriteLine("Error description: {0}", error.Description);
                Console.WriteLine("Content type of part with error: {0}",
                    error.Part.ContentType);
                Console.WriteLine("Location of error: {0}", error.Path.XPath);
            }
        }
    }
}

When you run this example, it produces the following output:


Document is not valid

Error description: The element has invalid child element
  'http://schemas.openxmlformats.org/wordprocessingml/2006/main:t'.
  List of possible elements expected:
    <http://schemas.openxmlformats.org/wordprocessingml/2006/main:pPr>.
Content type of part with error:
  application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml
Location of error: /w:document[1]/w:body[1]/w:p[1]
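
The directory-tree run mentioned at the start of this post could be sketched roughly as follows.  This is not the program I originally used: the class name BatchValidate, the command-line argument, and the summary counters are hypothetical, and the transformation step is left as a comment because it depends on the code under test.

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

class BatchValidate
{
    static void Main(string[] args)
    {
        // Root of the directory tree; defaults to the current directory.
        string root = args.Length > 0 ? args[0] : ".";
        OpenXmlValidator validator = new OpenXmlValidator();
        int total = 0, invalid = 0, unreadable = 0;

        foreach (string path in
            Directory.GetFiles(root, "*.docx", SearchOption.AllDirectories))
        {
            total++;
            try
            {
                using (WordprocessingDocument doc =
                    WordprocessingDocument.Open(path, false))
                {
                    // A real run would first alter or transform a copy of the
                    // document (opened for writing) and then validate that copy.
                    int errorCount = validator.Validate(doc).Count();
                    if (errorCount > 0)
                    {
                        invalid++;
                        Console.WriteLine("{0}: {1} error(s)", path, errorCount);
                    }
                }
            }
            catch (Exception ex)
            {
                // Files that cannot be opened as packages (corrupt or locked
                // files, for example) are counted separately.
                unreadable++;
                Console.WriteLine("{0}: could not open ({1})", path, ex.Message);
            }
        }

        Console.WriteLine();
        Console.WriteLine("Checked {0} file(s): {1} invalid, {2} unreadable",
            total, invalid, unreadable);
    }
}
```

Opening each file read-only keeps the source documents untouched, which matters when the same corpus is reused across many test runs.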

