Parsing XML with XMLParse and XPath
If you need to extract data from XML using Macro Scheduler you can use the built in XMLParse function.
XMLParse allows you to use XPath syntax to specify which piece of information from the XML you want to retrieve.
To demonstrate how it works lets start with a simple XML example. Consider this XML:
<?xml version="1.0" encoding="UTF-8"?> <bookstore> <book category="COOKING"> <title lang="en">Everyday Italian</title> <author>Giada De Laurentiis</author> <year>2005</year> <price>30.00</price> </book> <book category="CHILDREN"> <title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book> <book category="WEB"> <title lang="en">XQuery Kick Start</title> <author>James McGovern</author> <author>Per Bothner</author> <author>Kurt Cagle</author> <author>James Linn</author> <author>Vaidyanathan Nagarajan</author> <year>2003</year> <price>49.99</price> </book> <book category="WEB"> <title lang="en">Learning XML</title> <author>Erik T. Ray</author> <year>2003</year> <price>39.95</price> </book> </bookstore>
XMLParse takes this syntax:
XMLParse>XML,XMLPath,result_string,num_items
XMLParse takes an XPath specification to identify the element we want to get information from, a result variable to put the result in, and another variable to store the number of items found. A good place to find out how XPath works is here. I'm not going to replicate XPath documentation here. Instead we'll look at some examples which should help understand it, which you can look it while also referring to the w3schools link above.
So let's see how we could retrieve all the book titles in the above XML. Before we do anything of course we need to have retrieved this XML from somewhere. It may be the result of an HTTPRequest call, but for now let's assume it's in a file and we've retrieved it to a string variable called strXML using the ReadFile command.
First we need to get the number of books:
XMLParse>strXML,/bookstore/book,val,numBooks
Notice how we construct a path to the node (or nodes) we are interested in starting at the root with /. Our "book" elements are within the "bookstore" node. So to get to them we use /bookstore/book. For more info on XPath Syntax look here.
numBooks now contains the number of elements that match. Once we have that we can loop through each one:
Let>k=0 Repeat>k Let>k=k+1 XMLParse>strXML,/bookstore/book[%k%]/title/text(),val,len MessageModal>val Until>k=numBooks
Notice we specify the index within [ and ] after the node and use text() to tell XPath to retrieve the text of the element - that's the bit in between the opening and closing tags. If we just wanted to retrieve the first book we could have used /bookstore/book[1] but here we've constructed a loop for each book so we are specifying our loop counter k: /bookstore/book[%k%]
Now, this is all very well, but what if we want to specify a bit more detail. Let's say we only want to retrieve only books in the "WEB" category (category="WEB"). To do this we can use the attribute specifier:
XMLParse>strXML,/bookstore/book[@category='WEB'],val,numBooks
Notice how we use the [@attribute='value'] syntax after the node name. So here we are looking for the number of book elements with the "category" attribute set to "WEB". In other words, all the books in the web category. Note that the value (WEB) needs to be in single quotes.
So we modify our loop in a similar way:
Let>k=0 Repeat>k Let>k=k+1 XMLParse>strXML,/bookstore/book[@category='WEB'][%k%]/title/text(),val,len MessageModal>val Until>k=numBooks
We're still using a loop counter - after the [@category='WEB'] identifier we specify the index of the item to retrieve.
We can use the @attribute syntax in a similar way if we want to retrieve the value of an attribute, instead of the text() component as above. E.g. look at the title nodes in the XML. Notice they have a "lang" attribute for the language of the book. Let's say we want to retrieve this value to see what language our books are, as well as the title. We could add:
XMLParse>strXML,/bookstore/book[@category='WEB'][%k%]/title/@lang,sLang,len
So instead of text() we have @lang. Our result variable sLang will now contain the value of this attribute (in our case always "en").
So our full code becomes:
XMLParse>strXML,/bookstore/book[@category='WEB'],val,numBooks Let>k=0 Repeat>k Let>k=k+1 XMLParse>strXML,/bookstore/book[@category='WEB'][%k%]/title/text(),val,len XMLParse>strXML,/bookstore/book[@category='WEB'][%k%]/title/@lang,sLang,len MessageModal>%val% (%sLang%) Until>k=numBooks
For another example using this kind of syntax take a look at this recent forum post.
