[Note from Pinal]: This is a 125th episode of Notes from the Field series. Handing XML documents is not an easy task. Trust me till today I see lots of experts struggling when they have to deal with XML. In my early career I used to be scared with the XML as well. Well, now I have proper understanding of the XML documents and I am no longer worried about this subject.
In this episode of the Notes from the Field series database expert Stuart Ainsworth explains about how to work with XML documents.
Recently, I was working with a client that was dealing with importing XML documents into a staging table for a reporting project; I’ve previously explored the XQuery methods for SQL Server at my blog (http://codegumbo.com), so I thought I could give him relatively straightforward advice on how to import his data. The XQuery methods are very powerful, and allow you to do a lot of complex transformations (including the use of FLWOR in each of the methods).
However, when he provided me with some sample data and the queries he was trying, his XML wasn’t really that complex. It was a single document that he was importing daily, with several values and multiple hierarchical nodes; a simplified example looked something like the following:
<root> <Doc> <ID>77</ID> <Ratings> <Rating> <Attribute0>B</Attribute0> <Attribute1>A</Attribute1> <Attribute2>C</Attribute2> </Rating> <Rating> <Attribute1>X</Attribute1> <Attribute2>Y</Attribute2> </Rating> </Ratings> </Doc> <Doc> <ID>78</ID> <Ratings> <Rating> <Attribute0>K</Attribute0> <Attribute1>L</Attribute1> <Attribute2>M</Attribute2> </Rating> </Ratings> </Doc> </root>
In his real data file, there were several <Attribute> nodes defined under each <Rating> node (in some cases as many as 100). One of the limitations of the XQuery methods is that they only return one value per query, so he was writing a SQL statement like the following (@x is the variable holding the XML):
SELECT ID = t.c.value('../../ID', 'int') , Attribute0 = t.c.value ('(./Attribute0)', 'char(1)') , Attribute1 = t.c.value ('(./Attribute1)', 'char(1)') , Attribute2 = t.c.value ('(./Attribute2)', 'char(1)') FROM @x.nodes('/root/Doc/Ratings/Rating') t(c)
It was manageable, but ugly. It also generated an ugly execution plan:
Performance wasn’t great, particularly when the XML document was large. The client was starting to stress because tuning the query was painful with all of the method statements; I thought about it for a bit, and then proposed that he actually use an older statement for handling XML documents in SQL Server: OPENXML.
OPENXML has actually been a part of SQL Server since SQL Server 2000; it’s different than the methods in that it’s designed to be a rowset provider against a single document. While there’s variations of syntax for it, a version that provides the same output as the client’s attempt is below:
DECLARE @idoc INT; EXEC sys.sp_xml_preparedocument @idoc OUTPUT, @x; SELECT * FROM OPENXML(@idoc, '/root/Doc/Ratings/Rating',2) WITH (ID int '../../ID' , Attribute0 char(1) './Attribute0' , Attribute1 char(1) './Attribute1' , Attribute2 char(1) './Attribute2' ) EXEC sys.sp_xml_removedocument @idoc;
Note the simplicity of column definitions: Name, type, path. While it is syntactic sugar, it does make maintenance a bit easier (in my opinion). However, the real benefit comes from the performance aspect; after several tests of various file sizes, it became clear that the OPENXML method performed much better over large data sets than the xquery methods. For small data sets, the performance of the xquery methods was slightly better than OPENXML, with most of the time being consumed by the preparation of the document (sys.sp_xml_preparedocument). The end result was that the older, simpler syntax was easier to maintain and performed better. So why use one instead of the other?
OPENXML vs XQuery methods
I like rules of thumb; they help me make quick decisions about ways to go. While guidelines are not always 100% accurate, they can help speed up development by suggesting appropriate methods of problem resolution. So, here’s my guidelines for choosing between OPENXML and the newer XQuery methods:
|XML handling||Processes one document at a time.||Can be used against every row in a table with an xml column.|
|XQuery support||Supports XPath syntax.||Supports complete XQuery syntax, including XPath and FLWOR.|
|Parsing||Single-pass parsing of complete document.||Every outputted column requires a method call.|
|Performance||Document size does not impact performance. Small documents are parsed in about the same time as a large document.||Large documents take much more time to parse than small documents.|
SQL Server is a complex product, with lots of different methods to solve development problems. In this specific case, solving a performance problem was as simple as choosing a different method to parse data. In the end, the client was able to reduce his load times significantly, and found a stable method in which to handle a single document.
If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.
Reference: Pinal Dave (https://blog.sqlauthority.com)
The biggest issue here that XML document does not have Schema that allow such data for node one – what happened if you got 2 identical attributes in s single documents under different Rating nodes, especially if you Sql table has a unique key? As a result we are not addressing a root cause of problem, just have to be very inventive with performance issues triggered by structure of original data.
XSD’s are great, but they’re not always available (I usually consider myself lucky if they are); the xquery methods can validate and parse in a single pass, but there are methods to validate the XML schema prior to using the OPENXML method.
Agree – there are multiple ways to skin the same cat. After all of hype about XML/XQUERY it still a marginal technnology, Now with SQL 2016 JSON is a KING!
Hi, in this code, have a little problem.
DECLARE @idoc INT;
EXEC sys.sp_xml_preparedocument @idoc OUTPUT, @x;
FROM OPENXML(@idoc, ‘/root/Doc/Ratings/Rating’,2)
WITH (ID int ‘../../ID’
, Attribute0 char(1) ‘./Attribute0’
, Attribute1 char(1) ‘./Attribute1’
, Attribute2 char(1) ‘./Attribute2’
EXEC sys.sp_xml_removedocument @idoc;
You need, declar the @x to varchar(max) in my test.