XML has been a pervasive tool in software development for over a decade. It provides a way to communicate data in a manner that is simple to understand and free of platform dependencies. Also pervasive in software development is what I consider to be the anti-pattern of using string manipulation to create XML. This usually starts as a “quick and dirty” approach because you need an XML document right now, and it looks like this (for all of the examples here, we’ll assume we’re writing the body of a method intended to take a Contact object and return an XML string):
return string.Format("<Contact><BusinessName>{0}</BusinessName></Contact>", contact.BusinessName);
In the code example, I created (or at least believe I created) an XML document representing a simple contact object in one line of code with very little overhead. Work’s done, right? No, it’s not. You see, what I didn’t realize was that this code would be used in the real world instead of my fantasy world where I own all the data and can prevent any of it from containing problematic values. If I use this code to create a contact record for the business “Sanford & Son”, a conforming XML parser will reject the result, because the ampersand is special in XML and should have been encoded as &amp;.
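To make the failure concrete, here is a minimal sketch that feeds the one-liner’s output to a parser. The Contact class and the NaiveXmlDemo, BuildContactXml, and IsWellFormed names are assumptions invented for this illustration; XDocument.Parse comes from System.Xml.Linq (part of .NET Framework 3.5 and later) and throws XmlException when the input is not well-formed.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical contact type, assumed for illustration only.
public class Contact
{
    public string BusinessName { get; set; }
}

public static class NaiveXmlDemo
{
    // The "quick and dirty" one-liner: the value is pasted in unescaped.
    public static string BuildContactXml(Contact contact)
    {
        return string.Format(
            "<Contact><BusinessName>{0}</BusinessName></Contact>",
            contact.BusinessName);
    }

    // Returns true if the text parses as a well-formed XML document.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string ok = BuildContactXml(new Contact { BusinessName = "Acme" });
        string broken = BuildContactXml(new Contact { BusinessName = "Sanford & Son" });

        Console.WriteLine(IsWellFormed(ok));      // True
        Console.WriteLine(IsWellFormed(broken));  // False: the bare & is not valid XML
    }
}
```

The same string-building code works for one business name and fails for the other, which is why this kind of bug tends to surface only after real data arrives.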
Following the pattern that I have seen many times over, my next step as a developer is going to be to do what any developer in his right mind would do – instruct the user that ampersands are “bad” and they cannot be used without breaking computers. This may work in many cases and is often accompanied by logic at the UI layer of applications to block these “bad” characters, but sooner or later someone is going to figure out that other applications allow for them and will want the same. This often leads to the creation of “cleaner” functions that perform a replace on the strings for every special character that the person writing the function can think of. The cleaner function will usually grow over time as support requests reveal characters that were missed in the initial cut. Sooner or later you end up writing your own somewhat functional XML engine.
I have never been told by anyone paying me to write code that they would like to buy a somewhat functional XML engine. My employer/customer’s needs have always been for something that may use XML, but ultimately is functionality that drives business value. I’m not going to build an XML engine.
So how can I generate XML that is always well-formed without writing my own engine? Easy: use one of the ones provided to you for free! If you’re in a shop that still supports VB6 applications, you can use the MSXML DOMDocument or MXXMLWriter object (of the two I prefer MXXMLWriter, but I’m not going to fully describe either here). For .NET Framework applications prior to version 3.5, the code is a little more verbose than I would like, but easy once you understand which pieces are required:
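// Requires: using System.IO; using System.Xml;
using (StringWriter sw = new StringWriter())
{
    using (XmlTextWriter writer = new XmlTextWriter(sw))
    {
        writer.WriteStartDocument(); // emits the XML declaration
        writer.WriteStartElement("Contact");
        // WriteElementString escapes special characters such as & and < in the value
        writer.WriteElementString("BusinessName", contact.BusinessName);
        writer.WriteEndElement(); // end Contact element
        writer.WriteEndDocument();
        writer.Flush(); // push any buffered output into the StringWriter
        return sw.ToString();
    }
}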
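As a hedged variation on the writer-based approach, the same document can be produced through the XmlWriter.Create factory method, which has been available since .NET Framework 2.0 and takes an XmlWriterSettings object. This sketch sets OmitXmlDeclaration so that only the element itself is returned. The Contact class and the ContactXmlWriter and ToXml names are assumptions made up for the example, not part of any library.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical contact type, assumed for illustration only.
public class Contact
{
    public string BusinessName { get; set; }
}

public static class ContactXmlWriter
{
    public static string ToXml(Contact contact)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true; // return just the <Contact> element

        using (StringWriter sw = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(sw, settings))
            {
                writer.WriteStartElement("Contact");
                // The writer escapes the value, so & is written as &amp;
                writer.WriteElementString("BusinessName", contact.BusinessName);
                writer.WriteEndElement(); // end Contact element
            } // disposing the writer flushes its output into the StringWriter

            return sw.ToString();
        }
    }

    public static void Main()
    {
        Contact contact = new Contact();
        contact.BusinessName = "Sanford & Son";

        Console.WriteLine(ToXml(contact));
        // <Contact><BusinessName>Sanford &amp; Son</BusinessName></Contact>
    }
}
```

Reading the string only after the inner using block has closed avoids depending on an explicit Flush call.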
Looking at that code, it’s easy to understand why people are drawn to the initial one-liner. Luckily for us, .NET Framework 3.5 added the System.Xml.Linq.XElement class. This class takes away a lot of the complexity present in the XmlTextWriter approach and allows us to generate the document as follows:
return new XElement("Contact", new XElement("BusinessName", contact.BusinessName)).ToString();
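To check that this handles the “Sanford & Son” case, here is a small self-contained sketch that generates the XML and then parses it back. The Contact class and the XElementDemo and ToXml names are hypothetical. Note that a plain ToString() call indents the output across several lines; the sketch passes SaveOptions.DisableFormatting only so the result fits on one line.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical contact type, assumed for illustration only.
public class Contact
{
    public string BusinessName { get; set; }
}

public static class XElementDemo
{
    public static string ToXml(Contact contact)
    {
        // XElement escapes the text content when it is serialized.
        return new XElement("Contact",
                   new XElement("BusinessName", contact.BusinessName))
               .ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        string xml = ToXml(new Contact { BusinessName = "Sanford & Son" });
        Console.WriteLine(xml);
        // <Contact><BusinessName>Sanford &amp; Son</BusinessName></Contact>

        // Round trip: parsing the output recovers the original value.
        XElement parsed = XElement.Parse(xml);
        Console.WriteLine((string)parsed.Element("BusinessName"));
        // Sanford & Son
    }
}
```

The calling code never deals with escaping at all; it hands over the raw business name and gets the raw business name back after parsing.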
While it is very common for people to use string manipulation to create XML, I’ve discussed here the reasons not to use this method and introduced powerful APIs that are built into the .NET Framework as an alternative. I’ve given a very simple example here to highlight the most basic XML generation task. For more information on the XmlTextWriter and XElement APIs, check out the MSDN library.