15th September, 2008

Quick and easy XML creation

Generating XML is easy to get wrong. Unescaped entities, mismatched tags, Unicode characters create a minefield for a developer. XHTML is a subset of XML so the same rules apply, however I've noticed not many "web frameworks" generate guaranteed pure XHTML.

Looking around at templating solutions, I've seen content injected into XML directly. For example:

output = "<input name=\"" + name + "\" value=\"" + value + "\" />"

This breaks one of the fundamental rules of XML: you should generate XML using an XML parser - this way character encoding and escaping is done automagically.

I was envious of the C# XmlTextWriter object where you can generate XML in a linear fashion, one element at a time. Nobody seems to have a simple Python answer so I created the following while cooking some chicken.

import os
import libxml2
import libxslt

class Writer(object):
    "Simple class for easily building XML documents in a linear fashion."

    def __init__(self):
        self._libxml = None
        self._selectednode = None

    def StartDocument(self, root_element):
        "Start an XML document."
        self._libxml = libxml2.readDoc("<?xml version=\"1.0\" encoding=\"utf-8\" ?><%s/>" % root_element, None, None, 0)
        self._selectednode = self._libxml.getRootElement()

    def StartElement(self, element_name):
        "Create an element"
        new_node = libxml2.newNode(element_name)
        self._selectednode.addChild(new_node)
        self._selectednode = new_node

    def WriteElement(self, element_name, element_value):
        "Write an element, set it's contents to element_value and close it."
        if element_value == None:
            element_value = ""
        new_node = libxml2.newNode(element_name)
        new_node.addContent(element_value)
        self._selectednode.addChild(new_node)

    def WriteAttribute(self, attribute_name, attribute_value):
        """Create an attribute on the current working node, set its value to [value].
            If the attribute exists, append the value to existing value"""
        if attribute_value == None:
            attribute_value = ""
        if self._selectednode.prop(attribute_name) == None:
            self._selectednode.setProp(
                attribute_name,
                attribute_value
            )
        else:
            self._selectednode.setProp(
                attribute_name,
                self.setProp(
                    self._selectednode.prop(attribute_name),
                    self._selectednode.prop(attribute_name) + attribute_value
                )
            )

    def WriteAttributes(self, attributes):
        """Write the attributes and values passed in as parameters to this function.
        Pass either as parameters or a dictionary. """
        for loop_argument in attributes:
            value = attributes[loop_argument]
            if value == None:
                value = ""
            self._selectednode.setProp(
                loop_argument,
                value
            )

    def WriteContent(self, value):
        "Append the contents of value to the current working node"
        if value == None:
            value = ""
        self._selectednode.addContent(value)

    def EndElement(self):
        "Close the current working element"
        self._selectednode = self._selectednode.parent

    def WriteRaw(self, xml):
        "Write raw XML into the dom"
        if type(xml) == unicode():
            xml.encode("utf-8")
        xmlDoc = libxml2.parseDoc("<?xml version=\"1.0\" encoding=\"utf-8\" ?>" + xml)
        self._selectednode.addChild(xmlDoc.getRootElement())

    def Output(self):
        "Return the document XML"
        return self._libxml.getRootElement().serialize()

    def Save(self, file_name):
        "Save the document to the filesystem"
        self._libxml.saveFile(file_name)

    def Transform(self, transform_filename):
        "Transform the document using the filesystem and returns the result as a string"
        style = None
        result = None
        styledoc = libxml2.parseFile(transform_filename)
        style = libxslt.parseStylesheetDoc(styledoc)
        transformed = style.applyStylesheet(self._libxml, None)
        result = style.saveResultToString(transformed)

        if style != None: style.freeStylesheet()            
        if transformed != None: transformed.freeDoc()       
        return result

    def __del__(self):
        if self._libxml != None:
            self._libxml.freeDoc()

Usage:

Create a new writer

xmlwriter = Writer()

Start a new document with a root element:

xmlwriter.StartDocument("root")

Write a one-off element complete with content:

xmlwriter.WriteElement("legend", "This is the title here")

Start a new element and prepare to write into it.

xmlwriter.StartElement("newnode")

We can now either write attributes one at a time:

xmlwriter.WriteAttribute("attribute1", "some old text")

or write them all using WriteAttributes (plural!)

xmlwriter.WriteAttributes(
    attribute1 = "some old text", 
    attribute2 = "stuff", 
    it_doesnt_matter = "what they're called"
)

To end an open element simply:

xmlwriter.EndElement()

When we're done with generating the XML, get the entire document as a string using:

result = xmlwriter.Output()

Save it to disk with:

xmlwriter.Save("/path_to/filename.xml")

or transform it using:

xmlwriter.Transform("/path_to/transform.xslt")

(this will return a string containing the transformed XML)

This method will guarantee you have a perfectly valid XML (or XHTML) document at the end. All entities will be escaped and any nastyness fixed up.

 

The opinions expressed here are my own and not those of my employer.