LINQ to XML

It seems that LINQ to XML does not get nearly as much attention as LINQ to SQL, which is a shame, since there is a lot going on here too. The biggest improvements are in XML document navigation, working with namespaces, and document construction.

I recommend downloading an RSS feed and playing around with these examples. If you like learning by example as much as I do, that's the way to go. I often use LINQPad to quickly test code snippets without starting a heavyweight Visual Studio project. In this case, another plus is the excellent output formatting LINQPad provides through its Dump extension method.

Please note that although the examples here work, they are simplified for readability and are not production material.

Start

Here's a basic example to start with. It loads an RSS document, selects the channel titles, and uses the LINQPad Dump extension method to show the output.

XDocument feed = XDocument.Load(@"c:\demo\rss.xml");
// Selects the <title> element of each channel.
IEnumerable<XElement> titles = 
            feed.Element("rss").Elements("channel").Elements("title");
titles.Dump(); // Dump() is LINQPad's extension method
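If you want to try this outside LINQPad, here is a minimal sketch of the same query as a console program. It assumes .NET 6 or later with top-level statements; the inline feed is a made-up sample standing in for c:\demo\rss.xml, and Console.WriteLine takes the place of Dump.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Made-up sample feed; XDocument.Parse stands in for XDocument.Load(@"c:\demo\rss.xml").
XDocument feed = XDocument.Parse(@"
<rss version=""2.0"">
  <channel>
    <title>bjelic.net</title>
    <item><title>LINQ to XML</title></item>
  </channel>
</rss>");

// Same chain as above: rss -> channel -> title (direct children only).
IEnumerable<XElement> titles =
    feed.Element("rss").Elements("channel").Elements("title");

// Print each element's text instead of calling LINQPad's Dump().
foreach (XElement title in titles)
    Console.WriteLine(title.Value); // prints "bjelic.net"
```

Only the channel title is printed, because Elements("title") here looks at the direct children of channel, not at the titles nested inside each item.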

I will skip the loading and dumping parts in the other examples, but you get the idea.

XML document navigation

First of all, navigating XML is much easier, since the calls can be chained.

IEnumerable<XElement> items = 
                    feed.Element("rss").Element("channel").Elements("item");
// or, more quickly, search the whole document for <item> elements at any depth
IEnumerable<XElement> items2 = feed.Descendants("item");
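To see the difference between the two forms: Element returns the first matching child (or null), Elements returns all matching direct children, and Descendants searches the whole subtree at any depth. A small sketch on a hypothetical two-item feed, again assuming .NET 6+ top-level statements:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Made-up two-item feed.
XDocument feed = XDocument.Parse(@"
<rss version=""2.0"">
  <channel>
    <item><title>First post</title></item>
    <item><title>Second post</title></item>
  </channel>
</rss>");

// Explicit path: only items that are direct children of rss/channel.
var byPath = feed.Element("rss").Element("channel").Elements("item").ToList();

// Descendants: every <item> element anywhere in the document.
var byDescendants = feed.Descendants("item").ToList();

// Element() returns null instead of throwing when nothing matches.
XElement missing = feed.Element("rss").Element("no-such-element");

Console.WriteLine($"{byPath.Count} {byDescendants.Count} {missing == null}"); // 2 2 True
```

On a regular RSS document both forms find the same items; Descendants is shorter, but it would also pick up item elements nested anywhere else in the document.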


The next big thing is that we are no longer limited to XPath: we can write the queries in C# itself. After all, that's what LINQ (Language-Integrated Query) is about, right? :)

XElement channel = 
            feed.Element("rss")
                .Elements("channel")
                .First(c => c.Element("title").Value == "bjelic.net");
// or, if you prefer query syntax
XElement channel2 = (
            from c in feed.Element("rss").Elements("channel") 
            where c.Element("title").Value == "bjelic.net"
            select c
            ).First();
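One thing to watch in both forms: First throws an InvalidOperationException when no channel matches, and c.Element("title").Value throws a NullReferenceException for a channel without a title. Here is a sketch of a more forgiving variant, using FirstOrDefault and the explicit (string) conversion on XElement, which returns null for a missing element. The two-channel feed is a made-up example, and .NET 6+ top-level statements are assumed.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Made-up feed: the second channel has no <title>.
XDocument feed = XDocument.Parse(@"
<rss version=""2.0"">
  <channel><title>bjelic.net</title></channel>
  <channel></channel>
</rss>");

// (string) on a missing element yields null instead of throwing.
XElement match = feed.Element("rss").Elements("channel")
    .FirstOrDefault(c => (string)c.Element("title") == "bjelic.net");

// Nothing matches, so FirstOrDefault returns null rather than throwing,
// and the untitled channel is checked without a NullReferenceException.
XElement none = feed.Element("rss").Elements("channel")
    .FirstOrDefault(c => (string)c.Element("title") == "example.org");

Console.WriteLine(match != null); // True
Console.WriteLine(none == null);  // True
```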

However, XPath is still available, through extension methods such as XPathEvaluate in the System.Xml.XPath namespace.

object channel = feed.XPathEvaluate("rss/channel[title = 'bjelic.net']");

XPathEvaluate returns an object, though, so we have to do a bit of casting if we want XElements as the output.

XElement channel = (
            (IEnumerable)feed.
            XPathEvaluate("rss/channel[title = 'bjelic.net']")
            ).Cast<XElement>().First();
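If you only need elements, XPathSelectElement (also in System.Xml.XPath) returns an XElement directly, so the cast isn't necessary. XPathEvaluate remains handy for expressions that don't return nodes, such as count(). A minimal sketch on a made-up feed, assuming .NET 6+ top-level statements:

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath; // XPathSelectElement / XPathEvaluate extension methods

// Made-up single-channel feed.
XDocument feed = XDocument.Parse(@"
<rss version=""2.0"">
  <channel><title>bjelic.net</title></channel>
</rss>");

// Returns the first matching XElement (or null), so no cast is needed.
XElement channel = feed.XPathSelectElement("rss/channel[title = 'bjelic.net']");

// XPathEvaluate is still useful for non-node results: count() comes back as a double.
double channelCount = (double)feed.XPathEvaluate("count(rss/channel)");

Console.WriteLine(channel.Element("title").Value); // bjelic.net
Console.WriteLine(channelCount);                   // 1
```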

It's also easy now to work with proper types, DateTime for example.

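IEnumerable<XElement> items = 
    feed.Element("rss").Elements("channel").Elements("item").
            Where(i => 
                    {
                    var dateTime = DateTime.Parse(i.Element("pubDate").Value);
                    // Published on or after 1 November 2010. Testing Year and Month
                    // separately would wrongly skip e.g. January 2011.
                    return dateTime >= new DateTime(2010, 11, 1);
                    }
                );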
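Comparing the whole date, rather than Year and Month separately, matters here: a Year/Month test keeps November and December of every year but drops, say, January 2011. This sketch runs both filters on a made-up three-item feed. It uses DateTimeOffset.Parse with the invariant culture, a choice made for this sketch so that the GMT designator in RFC 822 style pubDate values is handled independently of the machine's culture settings.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Made-up feed with items from October 2010, November 2010 and January 2011.
XDocument feed = XDocument.Parse(@"
<rss version=""2.0"">
  <channel>
    <item><title>A</title><pubDate>Sat, 30 Oct 2010 10:00:00 GMT</pubDate></item>
    <item><title>B</title><pubDate>Mon, 15 Nov 2010 10:00:00 GMT</pubDate></item>
    <item><title>C</title><pubDate>Mon, 10 Jan 2011 10:00:00 GMT</pubDate></item>
  </channel>
</rss>");

DateTimeOffset Published(XElement item) =>
    DateTimeOffset.Parse(item.Element("pubDate").Value, CultureInfo.InvariantCulture);

var cutoff = new DateTimeOffset(2010, 11, 1, 0, 0, 0, TimeSpan.Zero);

// Whole-date comparison: November 2010 and everything after it.
var recent = feed.Descendants("item")
    .Where(i => Published(i) >= cutoff)
    .Select(i => i.Element("title").Value)
    .ToList();

// Naive Year/Month test, for comparison: it misses January 2011.
var yearMonth = feed.Descendants("item")
    .Where(i => Published(i).Year >= 2010 && Published(i).Month >= 11)
    .Select(i => i.Element("title").Value)
    .ToList();

Console.WriteLine(string.Join(",", recent));    // B,C
Console.WriteLine(string.Join(",", yearMonth)); // B
```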

A whole lot of useful stuff is packed in here, especially when mixed with other C# features. For example, if you do a lot of checking whether an element exists, then whether its attribute exists and so on, you can shorten that with an extension method so the chaining doesn't fail.

public static class XElementExtensions
{
    public static XElement SafeElement(this XElement parent, string elementName)
    {
        return parent.Element(elementName) ?? new XElement(elementName);
    }
}
// ...
// Suppose that none of the items has an <x> child element.

// NullReferenceException: Object reference not set to an instance of an object.
foreach(XElement item in items)
    item.Element("x").Attribute("y").Dump();

// But this won't fail.
foreach(XElement item in items)
    item.SafeElement("x").Attribute("y").Dump();

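So even if the element named x does not exist, SafeElement returns an empty stand-in element, Attribute("y") on it simply returns null, and the loop completes without an exception.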
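With C# 6 or later there is also a built-in alternative to a helper like SafeElement: the null-conditional operator ?. stops the chain at the first null, and the explicit (string) conversion on XAttribute turns a missing attribute into null. A sketch on a made-up feed where only the second item has the x element, assuming .NET 6+ top-level statements:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Made-up feed: the first item has no <x>, the second has <x y="42" />.
XDocument feed = XDocument.Parse(@"
<rss version=""2.0"">
  <channel>
    <item><title>No x here</title></item>
    <item><x y=""42"" /></item>
  </channel>
</rss>");

// ?. yields null when <x> is missing; (string) on a null XAttribute yields null.
var values = feed.Descendants("item")
    .Select(item => (string)(item.Element("x")?.Attribute("y")))
    .ToList();

Console.WriteLine(values[0] == null); // True
Console.WriteLine(values[1]);         // 42
```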

Working with namespaces

If you are working with XML namespaces, you have to include the namespace when selecting the nodes that belong to it, but creating and using a namespace is really easy now.

XNamespace dc = feed.Element("rss").GetNamespaceOfPrefix("dc");
IEnumerable<XElement> creators = feed.Descendants(dc + "creator");

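That's really simple and elegant compared to the old System.Xml approach, where the prefix had to be registered with an XmlNamespaceManager before running an XPath query.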
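Here is a self-contained version of the namespace lookup, run against a made-up feed that declares the Dublin Core prefix dc (.NET 6+ top-level statements assumed). It also shows that asking for the local name alone finds nothing. If you already know the URI, XNamespace dc = "http://purl.org/dc/elements/1.1/"; works too, via the implicit conversion from string.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Made-up feed declaring the dc prefix on the root element.
XDocument feed = XDocument.Parse(@"
<rss version=""2.0"" xmlns:dc=""http://purl.org/dc/elements/1.1/"">
  <channel>
    <item><dc:creator>Author One</dc:creator></item>
    <item><dc:creator>Author Two</dc:creator></item>
  </channel>
</rss>");

// Look up the namespace bound to the "dc" prefix.
XNamespace dc = feed.Element("rss").GetNamespaceOfPrefix("dc");

// XNamespace + string builds an XName in that namespace.
var creators = feed.Descendants(dc + "creator").Select(c => c.Value).ToList();

// Without the namespace, the local name alone matches nothing.
int unqualified = feed.Descendants("creator").Count();

Console.WriteLine(dc.NamespaceName);            // http://purl.org/dc/elements/1.1/
Console.WriteLine(string.Join(", ", creators)); // Author One, Author Two
Console.WriteLine(unqualified);                 // 0
```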

Creating objects from XML

Creating objects from XML works the same way as in LINQ to SQL: project each element into a new object, here an anonymous type.

XNamespace dcNs = feed.Element("rss").GetNamespaceOfPrefix("dc");
var items = from i in feed.Descendants("item")
            select new {
                        Title = i.Element("title").Value,
                        Published = DateTime.Parse(i.Element("pubDate").Value),
                        Creator = i.Element(dcNs + "creator").Value,
                        Permalink = i.Element("link").Value
                        };
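Not every feed item has every element, and .Value on a missing element throws. Here is a sketch of the same projection that uses the explicit (string) conversion instead, so a missing dc:creator becomes null. The feed content and example.com links are made up, .NET 6+ top-level statements are assumed, and the ToList() call simply runs the query once.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Made-up feed: the second item has no dc:creator.
XDocument feed = XDocument.Parse(@"
<rss version=""2.0"" xmlns:dc=""http://purl.org/dc/elements/1.1/"">
  <channel>
    <item>
      <title>LINQ to XML</title>
      <link>http://example.com/post-1/</link>
      <pubDate>Mon, 15 Nov 2010 10:00:00 GMT</pubDate>
      <dc:creator>Author One</dc:creator>
    </item>
    <item>
      <title>No creator</title>
      <link>http://example.com/post-2/</link>
      <pubDate>Mon, 10 Jan 2011 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>");

XNamespace dcNs = feed.Element("rss").GetNamespaceOfPrefix("dc");

var items = (from i in feed.Descendants("item")
             select new
             {
                 Title = (string)i.Element("title"),
                 Published = DateTimeOffset.Parse((string)i.Element("pubDate"),
                                                  CultureInfo.InvariantCulture),
                 // (string) yields null where .Value would throw NullReferenceException.
                 Creator = (string)i.Element(dcNs + "creator"),
                 Permalink = (string)i.Element("link")
             }).ToList();

foreach (var item in items)
    Console.WriteLine(item.Published.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
                      + " " + item.Title + " by " + (item.Creator ?? "(unknown)"));
```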

And you can go further by selecting each post together with its categories as an IEnumerable<string> Categories property, and then keeping only the tutorials.

XNamespace dcNs = feed.Element("rss").GetNamespaceOfPrefix("dc");
var items = from i in feed.Descendants("item")
            select new {
                        Title = i.Element("title").Value,
                        Published = DateTime.Parse(i.Element("pubDate").Value),
                        Creator = i.Element(dcNs + "creator").Value,
                        Permalink = i.Element("link").Value,
                        Categories = i.Elements("category").Select(c => c.Value)
                        };
var tutorials = from i in items where i.Categories.Contains("Tutorials") select i;
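Contains(value) compares strings case-sensitively, so a feed that spells the category as both "Tutorials" and "tutorials" (a hypothetical inconsistency) would be only partly matched. Passing StringComparer.OrdinalIgnoreCase to the Contains overload that takes a comparer covers both. A sketch on a made-up feed, assuming .NET 6+ top-level statements:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Made-up feed with inconsistent category casing.
XDocument feed = XDocument.Parse(@"
<rss version=""2.0"">
  <channel>
    <item><title>Post A</title><category>Tutorials</category><category>CSharp</category></item>
    <item><title>Post B</title><category>tutorials</category></item>
    <item><title>Post C</title><category>News</category></item>
  </channel>
</rss>");

var items = from i in feed.Descendants("item")
            select new
            {
                Title = (string)i.Element("title"),
                Categories = i.Elements("category").Select(c => c.Value)
            };

// Default comparer is case-sensitive: matches only Post A.
var exact = items.Where(i => i.Categories.Contains("Tutorials"))
                 .Select(i => i.Title)
                 .ToList();

// With a case-insensitive comparer, Post B matches too.
var ignoreCase = items.Where(i => i.Categories.Contains("Tutorials", StringComparer.OrdinalIgnoreCase))
                      .Select(i => i.Title)
                      .ToList();

Console.WriteLine(string.Join(",", exact));      // Post A
Console.WriteLine(string.Join(",", ignoreCase)); // Post A,Post B
```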

Pushing it a bit further, let's do a category analysis. This example extracts every category and the titles of the posts in each one, and orders the categories by their number of posts. The second from clause matters here: grouping on i.Element("category") alone would only look at each post's first category.

var categories = from i in feed.Descendants("item")
                from c in i.Elements("category")
                group i by c.Value into grouped
                orderby grouped.Count() descending
                select new {
                            Category = grouped.Key,
                            NumberOfPosts = grouped.Count(),
                            Posts = grouped.Select(g => g.Element("title").Value)
                            };
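A runnable sketch of the grouping on a made-up feed where one post carries two categories, assuming .NET 6+ top-level statements. The second from clause pairs each post with each of its categories, so Post A is counted under both; ties are ordered by category name, which is an addition for stable output.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Made-up feed: Post A has two categories.
XDocument feed = XDocument.Parse(@"
<rss version=""2.0"">
  <channel>
    <item><title>Post A</title><category>Tutorials</category><category>CSharp</category></item>
    <item><title>Post B</title><category>Tutorials</category></item>
    <item><title>Post C</title><category>News</category></item>
  </channel>
</rss>");

// Flatten into (post, category) pairs, then group post titles by category.
var categories = (from i in feed.Descendants("item")
                  from c in i.Elements("category")
                  group i.Element("title").Value by c.Value into grouped
                  orderby grouped.Count() descending, grouped.Key
                  select new
                  {
                      Category = grouped.Key,
                      NumberOfPosts = grouped.Count(),
                      Posts = grouped.ToList()
                  }).ToList();

// Tutorials (2): Post A, Post B
// CSharp (1): Post A
// News (1): Post C
foreach (var category in categories)
    Console.WriteLine($"{category.Category} ({category.NumberOfPosts}): {string.Join(", ", category.Posts)}");
```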

Creating XML

What is also really cool is that we can build XML structures with the new functional construction syntax, by nesting XElement and XAttribute constructors.

XNamespace atomNs = "http://www.w3.org/2005/Atom";
XNamespace syNs = "http://purl.org/rss/1.0/modules/syndication/";
     
XElement doc = new XElement(
    "rss",
    new XAttribute("version", "2.0"),
    new XAttribute(XNamespace.Xmlns + "sy", syNs.NamespaceName),
    new XAttribute(XNamespace.Xmlns + "atom", atomNs.NamespaceName),
    new XElement(
        "channel",
        new XElement("title", "bjelic.net"),
        new XElement(
            atomNs + "link",
            new XAttribute("href", "http://www.bjelic.net/feed/"),
            new XAttribute("rel", "self"),
            new XAttribute("type", "application/rss+xml")
            ),
        new XElement("link", "http://www.bjelic.net/feed/"),
        new XElement("lastBuildDate", DateTime.UtcNow.ToString("R")), // "R" always appends GMT, so use UTC
        new XElement(syNs + "updatePeriod", "hourly")
        )
    );
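A trimmed-down version you can run and inspect, assuming .NET 6+ top-level statements: declaring xmlns:atom on the root element is what makes the atom-namespaced link serialize with the atom prefix. It also formats lastBuildDate from DateTime.UtcNow, because the "R" (RFC 1123) format always appends GMT without converting local time.

```csharp
using System;
using System.Xml.Linq;

XNamespace atomNs = "http://www.w3.org/2005/Atom";

XElement rss = new XElement(
    "rss",
    new XAttribute("version", "2.0"),
    // Declares the "atom" prefix so atom elements below are written as atom:...
    new XAttribute(XNamespace.Xmlns + "atom", atomNs.NamespaceName),
    new XElement(
        "channel",
        new XElement("title", "bjelic.net"),
        new XElement(atomNs + "link", new XAttribute("rel", "self")),
        // "R" does not convert time zones, so format a UTC value.
        new XElement("lastBuildDate", DateTime.UtcNow.ToString("R"))));

string xml = rss.ToString();
Console.WriteLine(xml);
```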

Further reading

There is an MSDN LINQ to XML Programming Guide, and I have learned a lot about LINQ from 101 LINQ Samples.

