XML and Java
Most web developers are intimately familiar with HTML, a language for presenting information on-screen so that it can be read by a human. A new markup language is rapidly gaining attention, however. XML allows information to be presented in a form that can be read by a computer program. The future of web development is likely to include a growing number of programs that make intelligent use of the data on XML-based web pages, and Java is a very good language for creating those programs.
There has been a close relationship between Java and XML since the earliest mention of XML. Jon Bosak of Sun Microsystems, Chair of the XML Working Group, has said that “XML gives Java something to do” (Web Techniques, Pg. 43). Since a decision has been made to provide a standard Java API for manipulating XML (WT – Pg. 43), the use of Java to manipulate XML documents is likely to continue and to increase over time.
What is XML?
XML stands for eXtensible Markup Language. It looks a lot like HTML. In fact, both trace back to SGML (Standard Generalized Markup Language): HTML is defined as an application of SGML, and XML is a simplified subset of it. SGML is very complicated, a fact that has led to its failure to gain widespread usage. HTML, its greatly simplified descendant, has been a resounding success, but it is beginning to demonstrate some significant limitations. XML sits between these markup languages in terms of complexity: it is more complex than HTML, but still significantly less complex than SGML (Dynamic Web Publishing Unleashed – Pg. 744-745). It is essentially an attempt to define a common ground between HTML and SGML.
Like SGML, XML is a metalanguage for defining markup languages. XML allows you to define your own markup language consisting of new ‘tags’ which you can use to encode the information in your web documents far more precisely than can be done with HTML. XML is not a replacement for HTML. It is, instead, a supplement to HTML. While HTML will continue to be used for standard web pages, XML will be useful for applications that need more intelligent documents and more processing ability (DWP – Pg. 745).
The main limitations of HTML are lack of extensibility, structure, and validation (http://metalab.unc.edu/pub/sun-info/standards/xml/why/xmlapps.html).
Extensibility. HTML has a fixed number of tags. While the W3C and browser developers can (and frequently do) add tags, users cannot create their own tags to more accurately describe their data.
Structure. HTML does not support the deeply nested structures that would be needed to describe and represent databases or object hierarchies.
Validation. HTML does not support document validation. It has no means of allowing an application to check the data for validity, or to ensure that the markup is correct and well formed.
XML differs from HTML in all three of these major areas:
It allows developers to define new tags and attributes as needed.
It allows document tags to be nested as deeply as needed.
Any XML document can include or make reference to a description of its grammar and syntax for use by applications that need to validate the structure of a document.
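The first two points can be illustrated with a small sketch. The order vocabulary below (order, customer, items, item, quantity) is invented for this example and is not taken from any of the cited sources, and the parsing calls assume the javax.xml.parsers API that ships with current Java releases. The program reads a document built from these made-up, nested tags and adds up the quantities.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OrderReader {
    // Hypothetical vocabulary: these tag names are invented for this sketch.
    static final String ORDER_XML =
          "<order id=\"1001\">"
        + "<customer><name>Acme Corp</name></customer>"
        + "<items>"
        + "<item sku=\"A-1\"><quantity>2</quantity></item>"
        + "<item sku=\"B-7\"><quantity>5</quantity></item>"
        + "</items>"
        + "</order>";

    /** Parses the document and returns the sum of all quantity elements, at any depth. */
    static int totalQuantity(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList quantities = doc.getElementsByTagName("quantity");
            int total = 0;
            for (int i = 0; i < quantities.getLength(); i++) {
                total += Integer.parseInt(quantities.item(i).getTextContent().trim());
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse order", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Total quantity: " + totalQuantity(ORDER_XML)); // prints "Total quantity: 7"
    }
}
```

Nothing in the parser knows about orders or items; the same code would read any well-formed document, which is the sense in which the tag set is extensible.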
XML will be most widely used in applications that cannot be accomplished within the limitations of HTML. According to Jon Bosak of Sun Microsystems (XML, Java, and the Future of the Web):
“These applications can be divided into four broad categories:
Applications that require the Web client to mediate between two or more heterogeneous databases
Applications that attempt to distribute a significant proportion of the processing load from the Web server to the Web client.
Applications that require the Web client to present different views of the same data to different users.
Applications in which intelligent Web agents attempt to tailor information discovery to the needs of individual users.”
One more quote from Mr. Bosak: “XML can do for data what Java has done for programs, which is to make the data both platform-independent and vendor-independent”.
It can be argued that Java is an ideal language for creating the applications listed above. At the most obvious level, both Java and XML have been promoted almost exclusively for use in Web environments. But several features of the Java language make it particularly well suited for working with XML.
One feature is support for Unicode. Most commonly used languages still favor use of ASCII to represent strings. This choice, however, makes it difficult to represent the character sets of many non-English languages. Unicode, on the other hand, has some 39,000 characters, and has plenty of room for expansion. Java supports Unicode from the bottom up (WT – Pg. 44).
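As a rough illustration of what this means in practice, the sketch below stores Greek and Japanese text in ordinary Java strings and compares the number of characters with the number of bytes needed to encode them as UTF-8. The class and method names are our own, and the calls assume the String API of current Java releases.

```java
import java.nio.charset.StandardCharsets;

public class UnicodeDemo {
    /** Number of Unicode characters (code points) in the string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Number of bytes needed to encode the string as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        String greek = "\u03B1\u03B2\u03B3";     // alpha, beta, gamma
        String japanese = "\u65E5\u672C\u8A9E";  // the word for the Japanese language
        System.out.println(codePoints(greek) + " characters, " + utf8Bytes(greek) + " bytes");       // 3 characters, 6 bytes
        System.out.println(codePoints(japanese) + " characters, " + utf8Bytes(japanese) + " bytes"); // 3 characters, 9 bytes
    }
}
```

No special library or wide-character type is needed; every Java string can hold this text, which is why an XML parser written in Java can accept documents in any language without extra work.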
Other features of Java make it very easy to share the code needed to build XML processing applications. The most important of these are packages, dynamic class loading, and JavaBeans (WT – Pg. 44). Packages and package naming allow code to be shared across the Internet without name clashes. Dynamic class loading allows applications to ship with a minimum configuration and retrieve additional components as needed. And JavaBeans are useful for expressing XML because they can have a straightforward data model and can be subclassed to exhibit specific behaviors (WT – Pg. 44).
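Dynamic class loading is perhaps the least familiar of these features, so a minimal sketch may help. Here the name of the class to use is supplied as a string at run time, as it might be if it came from a configuration file or an XML document. The PluginLoader class and its newList method are hypothetical names chosen for this example; the reflection calls are those of current Java releases.

```java
import java.util.List;

public class PluginLoader {
    /** Loads a List implementation by name at run time and creates an instance of it. */
    @SuppressWarnings("unchecked")
    static List<String> newList(String className) {
        try {
            Class<?> cls = Class.forName(className);  // locate and load the class by name
            return (List<String>) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name is ordinary data; nothing here refers to LinkedList at compile time.
        List<String> list = newList("java.util.LinkedList");
        list.add("loaded on demand");
        System.out.println(list.getClass().getName() + " " + list);
    }
}
```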
Since the early days, Java has been used to create XML parsers. One of the first such efforts, originating on the XML-Dev mailing list, used Java to define the parser interface now known as SAX (Simple API for XML). Because the developers were from all over the world, they needed a language that made it easy for people using very different systems to share their work. Java is much better suited to this than languages such as C or C++, since compiled Java code is machine independent and does not need to be recompiled to run on different platforms (XML and Java Technologies).
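To give a feel for the SAX style, the sketch below registers a handler that is called back each time the parser meets a start tag, and simply records the tag names. It assumes the org.xml.sax and javax.xml.parsers packages found in current Java releases; the SaxElementLister class and the invoice document are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxElementLister {
    /** Returns the element names in the order their start tags are encountered. */
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName);  // called by the parser once per start tag
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<invoice><line/><line/></invoice>"));
        // prints [invoice, line, line]
    }
}
```

Because the parser pushes events to the handler instead of building the whole document in memory, this style suits large documents and simple extraction tasks.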
Sun has promoted the concept of portable data and portable code, and is adding an XML extension to the Java platform to further promote its use for this type of application. The proposed Sun extension will allow developers to use standard API functions to read, manipulate, and generate XML text. This will make it much easier for developers to use XML technologies in Java. It will also provide a standard that helps ensure compatible and consistent implementations (XML & Java Technologies).
Since business forms and other documents are likely to use XML in the future in order to be more portable (independent of machine, platform, and application), it makes sense that the language used to process these documents should also be platform independent. If purchase orders, insurance forms, invoices, and other documents can be sent as XML documents, then virtually any user on any platform can create them. And the organizations that receive and process these documents will need platform-independent, standards-based tools for validating and processing them.
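As a sketch of what generating XML through a standard API can look like, the example below builds a small invoice document in memory and writes it out as text. It uses the javax.xml.parsers and javax.xml.transform packages of current Java releases, which we assume here as a stand-in for the extension described above; the invoice vocabulary and the InvoiceWriter class are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class InvoiceWriter {
    /** Builds a one-element invoice document and returns it as XML text. */
    static String buildInvoice(String number, String amount) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

            // Assemble the tree: an invoice element with one attribute and one child.
            Element invoice = doc.createElement("invoice");
            invoice.setAttribute("number", number);
            Element total = doc.createElement("total");
            total.setTextContent(amount);
            invoice.appendChild(total);
            doc.appendChild(invoice);

            // Serialize the tree to text, leaving out the XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not generate invoice", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildInvoice("INV-42", "199.00"));
    }
}
```

The program never concatenates angle brackets by hand, so escaping and well-formedness are left to the library.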
What kinds of applications are suited to XML and Java?
According to JP Morgenthal, Director of Research at NC.Focus, XML and Java are particularly well suited to the following applications. We will discuss each of these in some detail to expand on his thoughts:
Electronic Data Interchange and E-Commerce
Electronic Data Interchange (EDI)
Enterprise Application Integration (EAI)
Electronic Data Interchange and E-Commerce – The Internet has triggered a virtual explosion of web-based business transactions. Consumers are increasingly buying products from e-stores, and businesses are rapidly implementing purchasing and supply-chain systems using Internet standards. Today, much of this data is moving in the form of proprietary data files and web-based forms, which are closely tied to back-end programs such as CGI/Perl scripts or Java Servlets.
It has been argued that these systems would be much more flexible if they shared data using a standard XML format. This would allow for more flexible formatting of data, easier maintenance as requirements change, and the separation of the data from the programs and screens which create it, and the programs which process it. When the data is in an XML format, it describes itself. This allows programs to interpret this description, and act accordingly.
Without XML, each electronic transaction requires the creation of proprietary parsers for each data format. In addition, it is necessary to implement elaborate validation routines to ensure that all required data is provided, is consistent, and meets the rules of the transaction. This can be a significant effort, requiring custom programming to create the data and to process it at its destination.
XML alleviates this problem by providing a standard way to encode data, validate it, and parse it. Validation requires the availability of a DTD (Document Type Definition). While DTDs are not required for all XML documents, when one is present a standard parser can refer to it to determine whether or not an XML document is, in fact, valid. If the application that creates the XML data uses the same DTD as the receiving application, we can be virtually guaranteed that all interested parties will be honoring the same ‘rules of the game’, and no custom programming or proprietary data formats will be needed. This is all based on standards and on widely available Java-based XML parsers and generators.
Another benefit is that this content and format validation can be separated from the processing application. It can even be run on a completely different machine. This reduces the requirements for the application that does the final processing of the XML data, and may significantly speed up such applications, since they no longer need to include elaborate validation logic.
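A minimal sketch of DTD validation follows. The invoice DTD is invented for this example and is embedded in the document itself so that the program is self-contained; in the scenario described above it would normally live in a shared external file. The DtdValidator class is hypothetical, and the parser calls assume the javax.xml.parsers API of current Java releases.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdValidator {
    // Hypothetical rules: an invoice must contain a number followed by a total.
    static final String DTD =
          "<!DOCTYPE invoice ["
        + "<!ELEMENT invoice (number, total)>"
        + "<!ELEMENT number (#PCDATA)>"
        + "<!ELEMENT total (#PCDATA)>"
        + "]>";

    /** Returns true if the document is well formed and obeys its DTD. */
    static boolean isValid(String xmlWithDoctype) {
        final boolean[] valid = {true};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);  // ask the parser to check the DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { valid[0] = false; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xmlWithDoctype)));
            return valid[0];
        } catch (SAXException e) {
            return false;  // not even well formed
        } catch (Exception e) {
            throw new IllegalStateException("Parser could not be set up or read", e);
        }
    }

    public static void main(String[] args) {
        String good = DTD + "<invoice><number>1</number><total>5</total></invoice>";
        String bad = DTD + "<invoice><number>1</number></invoice>";  // total is missing
        System.out.println(isValid(good) + " " + isValid(bad));  // prints "true false"
    }
}
```

The receiving application contains no invoice-specific checks; the rule that a total is required lives entirely in the DTD.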
While theoretically XML parsers can be created with any language, Java is well suited to this partly because of its support of Unicode. These types of business transactions are likely to be international in many cases, and it is well known that ASCII data doesn’t support enough characters to represent many foreign languages. Unicode is much better suited to representing all languages. And to repeat an earlier point, Java has supported Unicode from the start.
It should be noted that this type of processing would work best when DTD ‘vocabularies’ are developed and shared by all. If there is a common definition of an invoice, then many companies can easily send and receive these ‘e-invoices’. Unfortunately, there has not yet been rapid progress in creating shared ‘DTD repositories’. According to Jeff Walsh, writing in InfoWorld (July 19, 1999, Pg. 48):
“‘While it is true that it would be nice to have an agreed-upon set of vocabularies which everybody can adhere to, this is not the case yet and probably will not be the case in the near future,’ said Norbert H. Mikula, chief technology officer of both DataChannel, in Seattle, and the Organization for the Advancement of Structured Information Standards (OASIS) industry consortium, which acts as a repository for XML schemas.
Mikula said many of the proposals need to be put through their paces and then revised before they are worthwhile, which does not quite match today’s business models.”
Some headway has been made, however. According to the same article in InfoWorld (Pg. 48), several vendors and groups are currently developing these specifications, including:
CommerceNet (eCo Framework Working Group): eCo is teaming with the OBI Consortium, RosettaNet, and other industry-specific groups to come up with an umbrella framework that lets the various XML specifications coexist and interoperate.
cXML.org (commerceXML): This standard specifies HTTP-based protocols for information exchanges and defines a DTD for various documents such as purchase orders, order acknowledgments, and catalogs, specifically targeting MRO (maintenance, repair and operations) purchases.
The OBI (Open Buying on the Internet) Consortium: OBI relies heavily upon existing standards; buying and selling organizations establish an OBI “trading web” or extranet using the Internet for communicating MRO purchasing transactions.
Microsoft (BizTalk Framework): BizTalk’s charter is to provide guidelines for creating XML schemas so that developers can create DTDs and XML vocabularies in standardized and interoperable ways.
RosettaNet (eConcert): eConcert outlines a set of Partner Interface Process (PIP) specifications. Central to PIP is an XML document based on specifically developed framework DTDs that specify PIP services, transactions and messages. When used with the standardized data dictionary, you can create catalog entries and e-commerce documents describing products.
XML/EDI Group (Various): This group promotes XML for Electronic Data Interchange applications, creating dictionaries, frameworks, and implementation guidelines for vertical industries.
Electronic Data Interchange (EDI) – EDI is a fairly widely used system for conducting business electronically. It nearly always uses a VAN (Value Added Network) for transmission, and uses either the X12 or the EDIFACT standard to define the encoding of data. EDI transactions have traditionally been reserved for larger companies and large transaction volumes because of the costs involved in setting up EDI relationships and operating the network. EDI transactions often involve customization, especially when one or both parties want to extend or modify the data that is being transferred between them.
It is thought that XML-based EDI can reduce the difficulty of dealing with these customizations. It is also likely to be much less costly than traditional EDI, because it can use the Internet rather than an expensive VAN. It should also be much easier to validate and parse XML data, since there are standards for this process and easily shared Java code that implements them. And of course, this Java code runs on virtually any platform without modification. XML and Java together provide portable data and portable code to process it.
Enterprise Application Integration (EAI) – One other application that has been proposed for XML is the passing of data between applications in a large enterprise. It has been suggested that XML is a better solution than proprietary data formats for the movement of this data between systems.
In his article, JP Morgenthal suggested that XML could be used to send information about sales orders from the sales department to the accounting department, and information about invoices from accounting to sales and collections. He proposed that such an XML-based system could generate all the necessary communications for a transaction using this standards-based data format.
Perhaps we don’t fully understand his vision, but we feel that this may not be an appropriate use of the technology in most cases. Current shared database technology provides a ‘two-phase commit’ process, which ensures that either all parts of a transaction are properly recorded or a complete rollback occurs, preventing inconsistent data.
While the idea of XML for passing data around the enterprise sounds interesting, we don’t feel that it is going to be widely used until techniques are in place to ensure that this same capability is available.
The introduction of XML has allowed for the creation of Internet-based documents that can be understood by software, and not only by humans. XML allows the information in a web document to be described in terms of a hierarchy of descriptive tags, making it well suited to the sharing of data between individuals and organizations. But for this information to be processed, programs will need to be able to handle this data in a standard way. If these programs can be developed and run on a wide variety of computer platforms, they will be available for use worldwide, on virtually any type of computer system.
This is leading to great interest in the standard, and will result in the development of many programs that create and process XML data. For many of the reasons we have discussed, Java is likely to be the language used to create many of these systems. It will be very interesting to see which of the many ideas out there become viable business solutions. The next ‘Microsoft’ is probably just getting started right now.
1. Bosak, Jon (1997). XML, Java, and the future of the Web, http://metalab.unc.edu/pub/sun-info/standards/xml/why/xmlapps.html
2. Fuchs, Matthew (1999). Why XML Is Meant for Java: Exploring the XML/Java Connection, Web Techniques, June 1999.
3. Morgenthal, JP. Portable Data/Portable Code: XML & JavaTM Technologies