Convert XML to JSON using XSLT

With separate services increasingly working on the same data, the need for portable data formats arose. XML was one of the first to be widely used, but lately JSON is booming.

I don’t have a particular bias here; both serve well in the appropriate environment, although JSON carrying the same data can be about 20% smaller.

So are they interchangeable? Just recently, I needed to convert XML data into JSON for easier consumption on the client.

The fastest and (sometimes) easiest way to convert XML to another format is XSLT.

The full XSLT code is at the bottom of this post and on GitHub.

Performing the transformation

Using XSLT to transform XML to another format is pretty easy, as it’s meant to be. :)

Depending on your environment, there are different ways to perform the transformation. Here are some of them.

What’s important to note here is that the same XSLT is used in all of these methods.

Specifying the stylesheet in XML

The easiest way would be to just add a stylesheet to your XML document, like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="xml2json.xsl"?>
<!-- ... XML ... -->

Just open such an XML file in a browser and the JSON will be there. Here’s an example page: you will see JSON when you open it, but if you view the source, you will see only XML. The browser applies the transformation.

Use an executable to transform XML to JSON

Microsoft provides msxsl.exe, a free command-line utility to perform transformations, but it works only with the MSXML 4.0 libraries, so it’s not really usable on Windows 7, for example.

I created a similar, but .NET-based, command-line utility: xsltr.exe, which you can download.

C# code excerpt

It boils down to this…

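// requires: using System.Xml; using System.Xml.XPath; using System.Xml.Xsl;
XPathDocument doc = new XPathDocument(xmlfile);
XslCompiledTransform transform = new XslCompiledTransform(true); // true enables XSLT debugging
transform.Load(xslfile);
// a null encoding writes UTF-8; the using block flushes and closes the output file
using (XmlTextWriter writer = new XmlTextWriter(outfile, null))
{
    transform.Transform(doc, null, writer);
}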

Compiling the XSLT

But if you need the performance, here is a command-line utility together with the compiled XSLT.

I used xsltc.exe to compile the XSLT source into an IL assembly, which performs the transformation much faster.

Transform XML to JSON in a browser using JavaScript

To work with XML, DOMParser can be used in all modern browsers: Firefox, Opera, Chrome, Safari… Of course, Internet Explorer has its own Microsoft.XMLDOM class.

Here’s a demo page that performs the transformation. There are a couple of XML files that you can transform, but you can also enter arbitrary XML and transform it.

If you prefer to work with libraries, I tried jsxml and it worked flawlessly.

The pure JavaScript code boils down to these pieces.

Load a string into an XML DOM (JavaScript code excerpt)

// code for regular browsers
if (window.DOMParser) {
    var parser = new DOMParser();
    demo.xml = parser.parseFromString(xmlString, "application/xml");
}
// code for IE
if (window.ActiveXObject) {
    demo.xml = new ActiveXObject("Microsoft.XMLDOM");
    demo.xml.async = false;
    demo.xml.loadXML(xmlString);
}

Apply the XSLT (JavaScript code excerpt)

// code for regular browsers
if (document.implementation && document.implementation.createDocument)
{
    var xsltProcessor = new XSLTProcessor();
    xsltProcessor.importStylesheet(demo.xslt);
    result = xsltProcessor.transformToFragment(demo.xml, document);
}
else if (window.ActiveXObject) {
    // code for IE
    result = demo.xml.transformNode(demo.xslt);
}

You can see this in action on the demo page.
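
Putting the two excerpts together, a minimal sketch for modern browsers might look like the following. The function name xmlToJson and the env parameter are hypothetical names introduced for this sketch (env only defaults to the browser globals so that stand-ins can be passed in), and xsltDoc is assumed to be an already parsed stylesheet document like demo.xslt above. Reading the result through textContent is also an assumption about how the browser returns text output. Since the stylesheet writes text values as they are, input containing characters such as double quotes can yield invalid JSON, so the sketch runs the result through JSON.parse to surface that early.

```javascript
// Sketch only, not the demo page's code.
// `env` is a hypothetical parameter; it defaults to the browser globals.
function xmlToJson(xmlString, xsltDoc, env = globalThis) {
    const xml = new env.DOMParser().parseFromString(xmlString, "application/xml");

    // DOMParser signals malformed XML with a <parsererror> element, not an exception.
    const errors = xml.getElementsByTagName("parsererror");
    if (errors.length > 0) {
        throw new Error("Invalid XML: " + errors[0].textContent);
    }

    const processor = new env.XSLTProcessor();
    processor.importStylesheet(xsltDoc);
    const fragment = processor.transformToFragment(xml, env.document);
    if (!fragment) {
        // Some browsers return null instead of reporting a stylesheet error.
        throw new Error("XSLT transformation produced no output");
    }

    // JSON.parse throws a SyntaxError when the stylesheet output is not valid JSON.
    return JSON.parse(fragment.textContent);
}
```

In a page this would be called as xmlToJson(xmlString, demo.xslt), wrapped in try/catch to report either kind of failure.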

XSLT Code

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="utf-8"/>

    <xsl:template match="/*[node()]">
        <xsl:text>{</xsl:text>
        <xsl:apply-templates select="." mode="detect" />
        <xsl:text>}</xsl:text>
    </xsl:template>

    <xsl:template match="*" mode="detect">
        <xsl:choose>
            <xsl:when test="name(preceding-sibling::*[1]) = name(current()) and name(following-sibling::*[1]) != name(current())">
                    <xsl:apply-templates select="." mode="obj-content" />
                <xsl:text>]</xsl:text>
                <xsl:if test="count(following-sibling::*[name() != name(current())]) &gt; 0">, </xsl:if>
            </xsl:when>
            <xsl:when test="name(preceding-sibling::*[1]) = name(current())">
                    <xsl:apply-templates select="." mode="obj-content" />
                    <xsl:if test="name(following-sibling::*) = name(current())">, </xsl:if>
            </xsl:when>
            <xsl:when test="following-sibling::*[1][name() = name(current())]">
                <xsl:text>"</xsl:text><xsl:value-of select="name()"/><xsl:text>" : [</xsl:text>
                    <xsl:apply-templates select="." mode="obj-content" /><xsl:text>, </xsl:text>
            </xsl:when>
            <xsl:when test="count(./child::*) > 0 or count(@*) > 0">
                <xsl:text>"</xsl:text><xsl:value-of select="name()"/>" : <xsl:apply-templates select="." mode="obj-content" />
                <xsl:if test="count(following-sibling::*) &gt; 0">, </xsl:if>
            </xsl:when>
            <xsl:when test="count(./child::*) = 0">
                <xsl:text>"</xsl:text><xsl:value-of select="name()"/>" : "<xsl:apply-templates select="."/><xsl:text>"</xsl:text>
                <xsl:if test="count(following-sibling::*) &gt; 0">, </xsl:if>
            </xsl:when>
        </xsl:choose>
    </xsl:template>

    <xsl:template match="*" mode="obj-content">
        <xsl:text>{</xsl:text>
            <xsl:apply-templates select="@*" mode="attr" />
            <xsl:if test="count(@*) &gt; 0 and (count(child::*) &gt; 0 or text())">, </xsl:if>
            <xsl:apply-templates select="./*" mode="detect" />
            <xsl:if test="count(child::*) = 0 and text() and not(@*)">
                <xsl:text>"</xsl:text><xsl:value-of select="name()"/>" : "<xsl:value-of select="text()"/><xsl:text>"</xsl:text>
            </xsl:if>
            <xsl:if test="count(child::*) = 0 and text() and @*">
                <xsl:text>"text" : "</xsl:text><xsl:value-of select="text()"/><xsl:text>"</xsl:text>
            </xsl:if>
        <xsl:text>}</xsl:text>
        <xsl:if test="position() &lt; last()">, </xsl:if>
    </xsl:template>

    <xsl:template match="@*" mode="attr">
        <xsl:text>"</xsl:text><xsl:value-of select="name()"/>" : "<xsl:value-of select="."/><xsl:text>"</xsl:text>
        <xsl:if test="position() &lt; last()">,</xsl:if>
    </xsl:template>

    <xsl:template match="node/@TEXT | text()" name="removeBreaks">
        <xsl:param name="pText" select="normalize-space(.)"/>
        <xsl:choose>
            <xsl:when test="not(contains($pText, '&#xA;'))"><xsl:copy-of select="$pText"/></xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="concat(substring-before($pText, '&#xD;&#xA;'), ' ')"/>
                <xsl:call-template name="removeBreaks">
                    <xsl:with-param name="pText" select="substring-after($pText, '&#xD;&#xA;')"/>
                </xsl:call-template>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

</xsl:stylesheet>

The XSLT code turned out to be more complicated than I thought. I imagined that the transformation would be more natural and not so case-based, but that just isn’t possible (or I don’t see the way).
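
Most of the cases in the detect template exist to decide one thing: whether an element belongs to a run of same-named siblings, which then becomes a JSON array. As a rough illustration only, here is that rule in JavaScript rather than XSLT. The input shape (an ordered list of text-only leaf elements as {name, text} objects) and the name groupSiblings are invented for this sketch; attributes, nested elements and empty elements are left out.

```javascript
// Illustration of the sibling-run rule, not a port of the stylesheet.
// Hypothetical input: ordered leaf elements, e.g. [{ name: "a", text: "x" }, ...]
function groupSiblings(children) {
    const out = {};
    let i = 0;
    while (i < children.length) {
        const name = children[i].name;
        // Collect the run of immediately adjacent siblings sharing this name.
        const run = [];
        while (i < children.length && children[i].name === name) {
            run.push(children[i]);
            i++;
        }
        if (run.length > 1) {
            // A run becomes an array of single-key objects.
            out[name] = run.map(function (c) { return { [name]: c.text }; });
        } else {
            // A lone element becomes a plain string value.
            out[name] = run[0].text;
        }
    }
    return out;
}

const fruit = groupSiblings([
    { name: "Apple", text: "xx" },
    { name: "Orange", text: "aa" },
    { name: "Orange", text: "bb" }
]);
console.log(JSON.stringify(fruit));
// {"Apple":"xx","Orange":[{"Orange":"aa"},{"Orange":"bb"}]}
```

One difference worth noting: same-named elements that are not adjacent would overwrite each other in this sketch, whereas the stylesheet would emit the key twice.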

Bojan Bjelić
Working hard on bunch of stuff, positive future above all. I'm blogging mostly about software, productivity and digital world.

46 Comments

  1. Great script !
    I made a modification : On the line 48 :

    +

    Because my xml contained carriage returns in the CDATA : ex:

    1. Oups my code got stripped !
      I added a normalize-space around text() on line 48.

      My CDATA was :
      ![CDATA[ObjDesCadres_Type3]]

      1. Hey Jean-Phillpe, glad to see you found good use of it. The removeBreaks template should be taking care of the line breaks…
        If you send me the XML you’re processing to my email, I’ll take a look.
        Best, Bojan

  2. Thank you for this Bojan, but it looks like your dropbox file or folder has disappeared. I have a lot of xml files that I’d like to convert to json (on the command line), and the .NET utility that you refer to cannot be found. ie, from this, above:

    I created a similar, but .NET based command line utility and, here is xsltr.exe that you can download.

    Or am I missing something? Thanks!

    ( As far as I can tell yours would be the only command line utility “out there”… )

    1. Hi Michael, the xsltr.exe is back in the dropbox – sorry about that.

      If you can use the XSLT as-is and need to get it done quickly, I suggest you use xml2json.zip; the transformation is compiled there.

  3. I am totally new to this, and thus don’t know how to fix it when my XML contains text such as: including our “Ready4Retail” standard: a 90-plus-point inspection. Any ideas?
    It’s not parsing properly and gives an error:
    Error: Parse error on line 1:
    …ards including our “Ready4Retail” standa
    ———————–^
    Expecting ‘EOF’, ‘}’, ‘:’, ‘,’, ‘]’

    Thanks

  4. Bojan,

    This script is awesome! Thanks for publishing it.

    Is there a way to strip the root node from the JSON output? Appreciate your input on this.

    1. Hi Vivek,
      You’re very welcome.
      I didn’t try this, but to skip the root, the solution should be to select whatever node comes first under the root. To achieve this, the value of the match="/*[node()]" attribute on line 5 should be replaced by another XPath. Let me know if this works out.
      Cheers, Bojan

  5. Hello,
    nice template, I’m trying to modify it to work this way:

    xml:

    <Fruit>
    <Apple>xx</Apple>
    <Orange>aa</Orange>
    <Orange>bb</Orange>
    </Fruit>

    to be converted in:

    {
    Fruit: {
    "Apple":"xx",
    "Orange":["aa","bb"]
    }
    }

    I’m having quite a bit of difficulty with this (very basic XSLT knowledge). Any help or tips?

    1. Hi Ernest,
      I tried to compose the XSLT in a very generic fashion, so it fits different markup and use cases.

      The change you want would require checking whether the nodes of the same name have no attributes, and then rendering just the text content.

      This case starts on line 21:

      <xsl:when test="following-sibling::*[1][name() = name(current())]">

      So you could start working from there. Hope this helps.

  6. Managed to get it to work in the meantime. The actual changes were made in the obj-content template; here is the changed version:

    <?xml version="1.0" encoding="UTF-8" ?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="utf-8"/>

    <xsl:template match="/*[node()]">
    <xsl:text>{</xsl:text>
    <xsl:apply-templates select="." mode="detect" />
    <xsl:text>}</xsl:text>
    </xsl:template>

    <xsl:template match="*" mode="detect">
    <xsl:choose>
    <xsl:when test="name(preceding-sibling::*[1]) = name(current()) and name(following-sibling::*[1]) != name(current())">
    <xsl:apply-templates select="." mode="obj-content" />
    <xsl:text>]</xsl:text>
    <xsl:if test="count(following-sibling::*[name() != name(current())]) &gt; 0">, </xsl:if>
    </xsl:when>
    <xsl:when test="name(preceding-sibling::*[1]) = name(current())">
    <xsl:apply-templates select="." mode="obj-content" />
    <xsl:if test="name(following-sibling::*) = name(current())">, </xsl:if>
    </xsl:when>
    <xsl:when test="following-sibling::*[1][name() = name(current())]">
    <xsl:text>"</xsl:text><xsl:value-of select="name()"/><xsl:text>" : [</xsl:text>
    <xsl:apply-templates select="." mode="obj-content" /><xsl:text>, </xsl:text>
    </xsl:when>
    <xsl:when test="count(./child::*) > 0 or count(@*) > 0">
    <xsl:text>"</xsl:text><xsl:value-of select="name()"/>" : <xsl:apply-templates select="." mode="obj-content" />
    <xsl:if test="count(following-sibling::*) &gt; 0">, </xsl:if>
    </xsl:when>
    <xsl:when test="count(./child::*) = 0">
    <xsl:text>"</xsl:text><xsl:value-of select="name()"/>" : "<xsl:apply-templates select="."/><xsl:text>"</xsl:text>
    <xsl:if test="count(following-sibling::*) &gt; 0">, </xsl:if>
    </xsl:when>
    </xsl:choose>
    </xsl:template>

    <xsl:template match="*" mode="obj-content">
    <xsl:apply-templates select="@*" mode="attr" />
    <xsl:if test="count(@*) &gt; 0 and (count(child::*) &gt; 0 or text())">, </xsl:if>
    <xsl:apply-templates select="./*" mode="detect" />
    <xsl:if test="count(child::*) = 0 and text() and not(@*) and (name(preceding-sibling::*[1]) = name(current()) or name(following-sibling::*[1]) = name(current()))">
    <xsl:text>"</xsl:text><xsl:value-of select="text()"/><xsl:text>"</xsl:text>
    </xsl:if>
    <xsl:if test="count(child::*) = 0 and text() and not(@*) and name(preceding-sibling::*[1]) != name(current()) and name(following-sibling::*[1]) != name(current())">
    <xsl:text>{"</xsl:text><xsl:value-of select="name()"/>" : "<xsl:value-of select="text()"/><xsl:text>"}</xsl:text>
    </xsl:if>
    <xsl:if test="count(child::*) = 0 and text() and @*">
    <xsl:text>{"text" : "</xsl:text><xsl:value-of select="text()"/><xsl:text>"}</xsl:text>
    </xsl:if>
    <xsl:if test="position() &lt; last()">, </xsl:if>
    </xsl:template>

    <xsl:template match="@*" mode="attr">
    <xsl:text>"</xsl:text><xsl:value-of select="name()"/>" : "<xsl:value-of select="."/><xsl:text>"</xsl:text>
    <xsl:if test="position() &lt; last()">,</xsl:if>
    </xsl:template>

    <xsl:template match="node/@TEXT | text()" name="removeBreaks">
    <xsl:param name="pText" select="normalize-space(.)"/>
    <xsl:choose>
    <xsl:when test="not(contains($pText, '&#xA;'))"><xsl:copy-of select="$pText"/></xsl:when>
    <xsl:otherwise>
    <xsl:value-of select="concat(substring-before($pText, '&#xD;&#xA;'), ' ')"/>
    <xsl:call-template name="removeBreaks">
    <xsl:with-param name="pText" select="substring-after($pText, '&#xD;&#xA;')"/>
    </xsl:call-template>
    </xsl:otherwise>
    </xsl:choose>
    </xsl:template>

    </xsl:stylesheet>

  7. Can anybody help me achieve this?
    I want similar logic for converting XML to a text file rather than JSON, so
    when input:

    RNILF04
    true
    479

    output
    should be:

    scores:
    score:
    sourcedFrom: SCO
    scoreLabel : RNILF04
    positive : true
    value : 479

  8. This ends up with a stack overflow for xml containing a single element, any ideas how this might be fixed?

    i.e.

    There is an error

  9. This converter doesn’t seem to capture mixed text+tag content. E.g. if I feed it

    <root><node>Text A <subnode>Text Asub</subnode> Text A2 <subnode> Text A2sub </subnode></node></root>

    the result is

    {"root" : {"node" : {"subnode" : [{"subnode" : "Text Asub"}
    , {"subnode" : " Text A2sub "}
    ]}
    }
    }

    and the “Text A” / “Text A2” content is lost. Any thoughts on this? At first this looked like a good solution, but this limitation means loss of information from my actual XML content.

    Also, I’m a bit curious on a couple of choices you made which seem to make inverting the transformation more difficult: first, XML attributes and nested tags aren’t distinguished in the output (aside from the attributes being emitted first), and second, when multiple tags of the same name occur in order, they’re converted to a single JSON array. I understand these are essentially arbitrary choices but am curious what the rationale was. In particular, emitting the attributes inside an “xml-attrib”: array or something like that seems like a reasonable option.

    I probably can remind myself of enough XSL to make the last two changes, but I’m not sure I’ve ever been familiar enough with XSL to address the mixed text/tag content issue :-(

  10. Namespaces are not getting populated in the JSON when using the above XSLT. Do you have any other templates that will work with namespaces? Thanks, Mahesh

  11. Hi,
    I get the following transformation error

    LPX-00118: Warning: undefined entity “egrave”
    Error at line 1

    if a node contain the following text:

    “This is a html formatted text with è character”

    Maybe it is raised by the removeBreaks template.

    BR
    Costantino

      1. Hi Bojan,
        this is what I wrote but that character has been re-rendered by your web site.
        If you put what you wrote exactly in an xml you can get the error I mentioned.

        BR
        Costantino

        1. I know – seems the site rendered it in my comment as well :)
          &egrave; is an unknown XML entity, the XML is not valid with that.
          You would need to escape it, like so &amp;egrave;

  12. Hello Bojan,
    I do need often to convert xml2json at work but I didn’t find a good answer to a question: how to preserve arrays?
    In the second example below, I would like to preserve the structure of the produced json having an array node: suggestions?

    Example – 2 'b' nodes produce a JSON with '"b": [ … ]'

    <root>
    <a>1st a node</a>
    <b>1st b node</b>
    <b>2nd b node</b>
    </root>

    {
    "root": {
    "a": "1st a node",
    "b": [{ "b": "1st b node" }, { "b": "2nd b node" }]
    }
    }

    Example – 1 'b' node produces a JSON with '"b": { … }'

    <root>
    <a>1st a node</a>
    <b>1st b node</b>
    </root>

    {
    "root": {
    "a": "1st a node",
    "b": "1st b node"
    }
    }

    Regards,
    Gas

    1. Sorry Gas,
      The xslt in the current form treats each XML separately. The arrays appear only when there is more than one element with the same name as immediate siblings.
      If you want the output to treat “b” always as array I would suggest modifying the XSLT.

  13. I’m dealing with an existing XSLT stylesheet that outputs (line breaks I presume) in the XML to JSON transformation. Whitespace is allowed in JSON, and the stylesheet and file input work okay in xmlsh. I am using Chrome and trying to get this to work:

    if (document.implementation && document.implementation.createDocument) {
    // code for regular browsers
    var xsltProcessor = new XSLTProcessor();
    xsltProcessor.importStylesheet(xslt);
    result = xsltProcessor.transformToFragment(xml, document);
    } else if (window.ActiveXObject) {
    // code for IE
    result = xml.transformNode(xslt);
    }

    but the result only contains the first line of the stylesheet output. I presume I’ll have to add a parameter to the stylesheet so that line breaks are not added to the JSON, possibly enabling something similar to your removeBreaks template–does removeBreaks work on output JSON? I am not that familiar with XSLT. Alternatively, could you try to put line breaks in your output JSON (say right after the beginning of a JSON object–I can try it too) and see what happens? Is there different output line break we should be using to process this stylesheet in a browser? #xA?

    1. I was able to use your stylesheet with my code to generate JSON, so the problem appears to be with our stylesheet. Do you know a way for browsers to notify you of errors? I can only assume that the errors are caused by the stylesheet, and that it needs to be fixed so the browser can recognize it. Is there a special incantation not listed on this page for XSLT 2.0?

      1. I used SaxonCE available here: http://saxonica.com/ce/index.xml The code is very similar. I am not sure if this works with IE, you may have to track that down:

        function onSaxonLoad() {
        $.get("X3dToJson.xslt", function(xslt) {
        var xmlString = document.getElementById('xml').value;
        console.log("VAL", xmlString);
        var demo = { xslt: xslt};

        // code for regular browsers
        if (window.DOMParser) {
        var parser = new DOMParser();
        demo.xml = parser.parseFromString(xmlString, "application/xml");
        }
        // code for IE
        if (window.ActiveXObject) {
        demo.xml = new ActiveXObject("Microsoft.XMLDOM");
        demo.xml.async = false;
        demo.xml.loadXML(xmlString);
        }
        console.log(“PARSED XML”, demo.xml);

        // code for regular browsers
        if (document.implementation && document.implementation.createDocument)
        {
        var xsltProcessor = Saxon.newXSLT20Processor();
        xsltProcessor.importStylesheet(demo.xslt);
        result = xsltProcessor.transformToFragment(demo.xml, document);
        }
        else if (window.ActiveXObject) {
        // code for IE
        result = demo.xml.transformNode(demo.xslt);
        }

        console.log('JSON', result);
        document.getElementById('json').value = getXmlString(result);
        }, "xml");
        };

  14. Hi,
    I need to pass an XML document to your published XSLT so that it is converted into JSON.

    Let me know how I can pass it to your XSLT, and what changes the XSLT needs in order to convert the passed XML to JSON.

    Appreciating your efforts

  15. http://www.bjelic.net/2012/08/01/coding/convert-xml-to-json-using-xslt/
    https://github.com/bojanbjelic/xml2json/blob/master/xml2json.xsl

    Hi Bojan,
    This XSLT is awesome. I’m not great with XSLT but I was wondering, what if my XML had some actual HTML in it that I didn’t want converted to JSON? How would I just do a xsl:copy-of on certain element(s) while still removing new lines, like all “HTMLContent” elements? My sample XML is in this Google doc:
    https://docs.google.com/document/d/1bYTakrCV58SRgQlt6lIToPkcjw0IDwp6Un5PcFN-_FM/edit?usp=sharing

    Also, for some reason this Sample XML isn’t getting all the new lines removed when transformed to JSON.

    PS: Do you have a JSON to XML transform too?

  16. Hi Dwayne,
    You could extend the xslt to recognize particular element and then apply a different template, for example:

        <xsl:template match="*" mode="detect">
            <xsl:choose>
                <xsl:when test="name()='HTMLContent'">
                    <xsl:value-of select="current()"/>
                </xsl:when>
                ...
    

    Unfortunately, I don’t have a solution that would work the other way around (JSON to XML).
