Friday, February 15, 2013

Integrating XML - Past, Present and Future


In my entry on IBM i 7.1 TR6, I called out querying the web with new HTTP functions and using XML to do more with DB2 for i. It just so happens that the very cool example on developerWorks came from Nick Lawrence.

Nick is one of our hotshot software engineers working on DB2 for i in Rochester Minnesota.  He has been involved with putting good things in your tool bag since 1999.  His most recent set of goodies include full text search (think OmniFind), SQL/XML and XMLTABLE.

I asked Nick if he can share some additional insight and give us some guidance on XML and DB2 integration - past, present and future.

Ladies and gentlemen, Mr. Nick Lawrence...

____


Integrating SQL and XML


How do you process XML data?  In particular, how do you publish your relational data in an XML format?  If you’re like most DB2 for i developers, you’ve found working with XML both a necessity and a challenge.  The conversion from rows and columns into an XML document can be a daunting task, but one that cannot be avoided.  While rows and columns are optimal for the transactional and data warehouse workloads that characteristically run on IBM i, XML is the predominate standard for information exchange between web-based applications.  The desire to have web-accessible applications makes publishing relational data as XML data a business imperative.  Development communities can tackle the issue one of two ways.

    1.    Purchase or develop non-integrated and non-standardized tools
    2.    Utilize the integrated SQL/XML publishing functions available in DB2 for i 7.1

For customers not yet running on IBM i 7.1, the database has no native understanding of XML.  In other words, those folks are forced to take option one. 

For some customers, option one means purchasing the DB2 XML Extender portion of the IBM DB2 Extenders product (5761DE1 and 5770DE1).  The DB2 XML Extender option defines a set of user-defined types to represent XML, as well as many user-defined functions and stored procedures for performing common XML tasks. 

For other customers, option one means writing an in-house solution to process XML.   Unfortunately, this set of customers typically ends up reinventing a subset of the capabilities that are already available in the XML Extender option.  Worse yet, these XML solutions are usually not reusable and will need to be reinvented (again) when new XML requirements emerge.

Customers that have moved up to IBM i 7.1 have a built-in and standardized solution available to them for working with XML data using SQL/XML.  It shouldn’t be a surprise that IBM recommends the DB2 for i SQL/XML solution over in-house solutions and the XML Extender option.
 

But why is it better to have an integrated and standardized solution?


To answer the question, let’s look at a specific scenario where relational data is to be published as XML.  A comparison can then be made between a solution that uses SQL/XML and a similar solution that uses the DB2 XML Extender.  Because DB2 XML Extender is a chargeable product, we’d expect it to have more impressive features and capabilities than an average in-house solution.

Figure 1 shows a simplistic and normalized SQL schema that contains customer orders.  Each order has exactly one customer associated with it, and one or more kinds of parts that have been ordered.

Figure 1 - Sample schema





Figure 2 shows a query that produces one row for each type of part in an order. 

Figure 2 - Inner Join

        SELECT           order_id,
                                order_time,
                                customer_name,
                                customer_address,
                                part_code,
                                quantity
        FROM              order
        INNER JOIN    customer
             USING (cust_id)
        INNER JOIN    order_part
             USING (order_id);


The output of the join from Figure 2 (using a set of sample data) is shown in Figure 3.

Figure 3 - Output of inner join




From this data, assume that we wish to produce an XML document that represents order number two, as shown in Figure 4.

Figure 4 - Order document for order number 2

<?xml version="1.0" encoding="UTF-8"?>
<order>
    <id>2</id>
    <time>2012-12-16T10:00:00.000000</time>
    <customer>
        <name>Wally</name>
        <address>Minneapolis</address>
    </customer>
    <part_list>
        <part_list>
            <part>
                <code>Soda</code>
                <quantity>200</quantity>
            </part>
            <part>
                <code>Lunch</code>
                <quantity>10</quantity>
            </part>
            <part>
                <code>XML Book 123</code>
                <quantity>1</quantity>
            </part>
        </part_list>
    </part_list>
</order>


With SQL/XML, this can be accomplished with the query shown in Figure 5.  In this query, the common table expression (order_part_list) is used to aggregate the repeating part elements of an order into a single XML value.  The rest of the XML document is then built around that value using a join.  The join is very similar to what was used in Figure 2, except the join is to order_part_list (the aggregation of order_part), instead of the order_part table.

Figure 5 -SQL/XML query

        WITH order_part_list AS (
         SELECT order_id,
                XMLGROUP(part_code   AS "code",
                         quantity    AS "quantity"
                         ORDER BY quantity DESC
                         OPTION ROW "part"
                                ROOT "part_list"
                         )
                      AS part_list
         FROM        order_part
         GROUP BY    order_id
        )

        SELECT XMLDOCUMENT( 
                XMLELEMENT(NAME "order",
                XMLFOREST(order_id                   AS "id",
                           order_time                 AS "time",
                           XMLFOREST(customer_name    AS "name",
                                     customer_address AS "address")    
                           AS "customer",
                           opl.part_list     AS "part_list"
                          )
                )
               ) AS order_doc
        FROM               order
        INNER JOIN    customer
                    USING (cust_id)
        INNER JOIN    order_part_list opl
                    USING (order_id)
        WHERE           order_id = 2;


The XML Extender approach will encounter many obstacles in performing the same task.  The only way to build an XML document similar to Figure 4 with the XML Extender is to use a document access definition (DAD) file that defines an SQL mapping.  A DAD file that uses an RDB mapping appears to be simpler at first glance, but the RDB mapping’s lack of support for predicates (other than the join condition) makes the RDB mapping approach impossible for this scenario (order_id = 2 is not part of a join condition).  The SQL mapping maps the columns of an SQL statement to elements and attributes in the XML document.  In addition, the condition is allowed to include additional predicates in the WHERE clause.  A DAD file that contains an SQL mapping is shown in Figure 6.

Figure 6 - DAD file with SQL mapping

<?xml version="1.0" encoding="UTF-8"?>
<DAD>
    <validation>NO</validation>
    <Xcollection>
        <SQL_stmt>
         SELECT        order_id,
                        order_time,
                        customer_name,
                        customer_address,
                        part_code,
                        quantity
         FROM        order
         INNER JOIN    customer
             USING (cust_id)
         INNER JOIN    order_part
             USING (order_id)
         WHERE        order_id = 2
         ORDER BY    order_id,
                order_time,
                customer_name,
                customer_address,
                part_code,
                quantity
        </SQL_stmt>
        <prolog>?xml version="1.0"?</prolog>
        <root_node>
            <element_node name="order">
                <element_node name="id">
                    <text_node>
                        <column name="order_id" />
                    </text_node>
                </element_node>

                <element_node name="time">
                    <text_node>
                        <column name="order_time" />
                    </text_node>
                </element_node>

                <element_node name="customer">
                    <element_node name="name">
                        <text_node>
                            <column name="customer_name" />
                        </text_node>
                    </element_node>

                    <element_node name="address">
                        <text_node>
                          <column name="customer_address"/>
                        </text_node>
                    </element_node>

                </element_node>
                <element_node name="part_list">
                    <element_node name="part"
                                  multi_occurrence="YES">
                        <element_node name="code">
                          <text_node>
                                <column name="part_code" />
                           </text_node>
                        </element_node>
                        <element_node name="quantity">
                          <text_node>
                                <column name="quantity" />
                          </text_node>
                        </element_node>
                    </element_node>
                </element_node>

            </element_node>

        </root_node>

    </Xcollection>
</DAD>


One thing to keep in mind is that, when using the SQL mapping support in XML Extender, the columns in the SELECT list and ORDER BY clause of the SQL_stmt element must appear in the exact order they are presented in Figure 6.  The XML Extender product depends on these orderings to correctly generate the XML document.  An SQL/XML solution follows the normal SQL rules and does not have these less apparent restrictions.

The DAD file is considerably longer and more complicated than the SQL query that was shown in Figure 5.  In addition, it hasn’t simplified anything.  The join between the tables still needs to be coded explicitly in the condition element.  Additionally, the multi_occurrence attribute is needed when defining the part element; this option constructs an aggregation that is similar to the aggregation used by the common table expression in Figure 5. 

The "simple" mapping offered by the XML Extender has very quickly turned into a complicated solution!

An XML document can be published by passing the DAD file shown in Figure 6  as a parameter to the dxxGenXMLClob stored procedure.  For comparison, the XML document that is generated from the DAD is shown in Figure 7.

Figure 7 - XML Document created by dxxGenXMLClob

<?xml version="1.0"?>
<order>
  <id>2</id>
  <time>2012-12-16-10.00.00.000000</time>
  <customer>
    <name>Wally</name>
    <address>Minneapolis</address>
  </customer>
  <part_list>
    <part>
      <code>Lunch</code>
      <quantity>10</quantity>
    </part>
    <part>
      <code>Soda</code>
      <quantity>200</quantity>
    </part>
    <part>
      <code>XML Book 123</code>
      <quantity>1</quantity>
    </part>
  </part_list>
</order>


There are a few important differences between the document in Figure 7 and the document that was shown in Figure 4.

The order of the part elements is not the same.  In the SQL/XML query shown in Figure 5, the ORDER BY clause is used within the XMLGROUP function to guarantee a particular order of part elements, however the DAD file solution is not capable of specifying this ordering.

In Figure 7, the content of the time element is not stored in the industry standard xs:dateTime data type.  When SQL/XML publishing functions are used, the conversion happens automatically (as can be seen in Figure 4).

There are some other limitations with the dxxGenXMLClob that are important to mention.

The dxxGenXMLClob stored procedure is capable of returning only a single XML document.  An SQL/XML query returns a result set that can contain many rows and columns.  This means that the dxxGenXMLClob may be inadequate if it became necessary to generate one row for each of the order documents, instead of a single XML document for a specific order.

There is no way to use a parameter marker or host variable when using a DAD file and the dxxGenXMLClob stored procedure.  When using a solution based on SQL/XML, host variables and parameter markers work in a very similar way as any other SQL query.

The DB2 XML Extender has NOT been modernized to support XML namespaces.  In contrast, SQL/XML fully supports namespaces.

Using a DAD file and the dxxGenXMLClob procedure has an advantage over a writing an in-house RPG procedure to publish XML document; the DAD file can be changed to produce a structurally different XML document without recompiling any programs or procedures.  The SQL/XML query in Figure 5 offers the same advantage if the query is encapsulated within a view.  A view allows the query (and the resulting XML document) to be changed without the need to modify the programs that are using the view.  Another idea to keep in mind is that data produced by a view can be reused in many different queries, which is something an XML extender solution cannot accomplish as easily.

A performance comparison between the XML Extender solution and the SQL/XML query is not something that can be easily assessed, given that every use case has unique characteristics.  However, the dxxGenXMLClob procedure needs to run multiple SQL queries to perform the required aggregation(s), and will construct the XML document using the results of the queries, rather than as a part of the query.  The SQL/XML solution in Figure 5 is a single SQL query that does everything.  Figure 8 shows the SQL statements that were logged in the SQL Performance Monitor during a single call to the dxxGenXMLClob procedure.

Figure 8 - SQL statements used by dxxGenXMLClob

SELECT "ORDER_ID", "ORDER_TIME", "CUSTOMER_NAME" FROM (SELECT * FROM (SELECT order_id, order_time, customer_name, customer_address, part_code, quantity FROM order INNER JOIN customer USING (cust_id) INNER JOIN order_part USING (order_id) WHERE order_id = ?) dxx_tab_exp WHERE ORDER_ID=?) dxx_tab_exp ORDER BY "CUSTOMER_NAME"

SELECT "ORDER_ID", "ORDER_TIME", "CUSTOMER_NAME", "CUSTOMER_ADDRESS" FROM (SELECT * FROM (SELECT order_id, order_time, customer_name, customer_address, part_code, quantity FROM order INNER JOIN customer USING (cust_id) INNER JOIN order_part USING (order_id) WHERE order_id = ?) dxx_tab_exp WHERE ORDER_ID=?) dxx_tab_exp ORDER BY "CUSTOMER_ADDRESS"

SELECT "ORDER_ID", "ORDER_TIME", "CUSTOMER_NAME", "CUSTOMER_ADDRESS", "PART_CODE" FROM (SELECT * FROM (SELECT order_id, order_time, customer_name, customer_address, part_code, quantity FROM order INNER JOIN customer USING (cust_id) INNER JOIN order_part USING (order_id) WHERE order_id = ?) dxx_tab_exp WHERE ORDER_ID=?) dxx_tab_exp ORDER BY "PART_CODE"

SELECT "ORDER_ID" FROM (SELECT order_id, order_time, customer_name, customer_address, part_code, quantity FROM order INNER JOIN customer USING (cust_id) INNER JOIN order_part USING (order_id) WHERE order_id = ?) dxx_tab_exp ORDER BY "ORDER_ID"

SELECT "ORDER_ID", "ORDER_TIME" FROM (SELECT * FROM (SELECT order_id, order_time, customer_name, customer_address, part_code, quantity FROM order INNER JOIN customer USING (cust_id) INNER JOIN order_part USING (order_id) WHERE order_id = ?) dxx_tab_exp WHERE ORDER_ID=?) dxx_tab_exp ORDER BY "ORDER_TIME"



Hopefully, a comparison between the query in Figure 5 and the ones in Figure 8 provides convincing evidence that the SQL/XML solution (Figure 5) is going to be less work for DB2 for i to optimize and evaluate. 


This blog entry has only very briefly touched on some of the advantages of publishing XML using SQL/XML functions; there’s also support in DB2 for i 7.1 for decomposing XML, validating XML, querying XML, transforming XML, and full text search of XML data.  Similar to the publishing functions, there are significant advantages to using an integrated and standardized solution for each of these tasks.

The SQL/XML function in DB2 for i 7.1 is standardized, has more capabilities, and is simpler to code and maintain.  There really isn’t any convincing motivation to choose a non-integrated solution over the built in support in 7.1.  In fact, DB2 for z/OS and DB2 for LUW have already discontinued their support for the XML Extender option; instructing customers to modify their applications to use an SQL/XML solution as a replacement. 

As with any new function, learning to use SQL/XML effectively will require some investment of time and education, but in this case the return on investment is worth it.

The really great news is that because the SQL/XML functionality is standardized across the industry and DB2 family, there is a significant amount of support and educational opportunities available.


More Sources of Information


We’ve recently published a whitepaper here that describes the differences between XML Extender and the built-in XML data type and support in great detail.  The paper discusses many practical use cases for XML and includes a number of RPG examples.  This is well worth a read by anyone interested in SQL and XML technology.


Be sure to also check out the SQL/XML programmer’s guide in the IBM i information center here.  This document has a number of tutorials included in it.


See also several articles in developerWorks and IBM Systems Magazine...

XML Overview
http://www.ibmsystemsmag.com/ibmi/developer/general/xml_db2_part1/
http://www.ibmsystemsmag.com/ibmi/developer/general/xml_db2_part2/

Using XML and RPG
http://www.ibm.com/developerworks/ibmi/library/i-using-rpg/index.html

i Can Use XML in a Relational World
http://ibmsystemsmag.blogs.com/i_can/2012/05/i-can-use-xml-in-a-relational-world.html

If you are currently using IBM DB2 XML Extender option, you may also find the manual in the IBM Information Center for this product helpful here.

Finally, the IBM STG Lab Services DB2 for i Center of Excellence team provides education and consulting on a wide range of database topics.  This team works closely with the DB2 for i Development team and is a great resource.

____


Fantastic, thanks Nick!  This really helps to put the XML integration options in perspective.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.