Companies increasingly rely on the Internet for internal and external communication, so they try to make data transfer as efficient and as easy to use as possible. Lately, enterprises want to create a network between their suppliers, customers and themselves, giving partners a medium for exchanging data and information with little effort.
The problem is that each company formats its data differently and uses different types of databases and DBMSs. These differing formats make it very complex to exchange data between partners. One way that companies got around this was to write custom programs that transform the data into a format agreed upon by the partners exchanging it. This solution incurs high maintenance costs and is therefore not ideal.
A solution to this problem is XML. It sends not only the “raw” data to be transported but also the information about that data (metadata). Everything the receiver needs arrives in one package, and the receiver can convert it into whatever format it requires.
This works because of XML’s loosely coupled architecture. XML supplies the data and its description without setting requirements on how the data will be handled or displayed. As a result, it can be transported freely between partners with diverse platforms, and any application that can “speak” XML can open the data and work with it in whichever format it chooses.
The World Wide Web Consortium (W3C) recommended XML (Extensible Markup Language) in 1998 as a format for exchanging data over the web, and it has since come into use all over the world. XML is popular because it provides a common method for identifying data. Sending data in an XML format makes the data easy to retrieve, view, and use.
XML is a simplified subset of SGML (Standard Generalized Markup Language). SGML is the standard for defining descriptions of the structure of electronic documents, but it is very complex and hard to use. XML keeps enough of SGML’s functionality to be useful without the complexity that comes with SGML. XML was created to make it easier to program for a web environment. It also makes programmers less dependent on HTML, which has been pushed to (and past) the limits of what it can do on the web.
A markup language allows one to identify structures in a document. XML goes a step further: it is a “metalanguage”, a language for defining markup languages, so authors define their own tags to describe the data in a document. It is an open standard for describing documents containing structured information. (In this paper, “structured information” refers to relational database content such as records and files, and “documents” refers either to regular text documents or to XML documents, which can contain such structured information.)
Overview of XML
XML documents fall into two types: “data-centric” documents and “document-centric” documents. We will mostly be referring to data-centric documents, which use XML as a way to transport data. For the most part, the data in these documents comes from databases. Examples are sales orders and invoices. A document-centric document is one that is written by hand in XML format and does not usually originate in a database. An example of a document-centric document is an email.
XML looks like HTML because of the tags that are used, but the tags in XML are not pre-defined as they are in HTML. Also, HTML defines how elements are presented, whereas XML defines what the elements contain. For example, the <p> tag in HTML means paragraph, but in XML it could mean paragraph, person, product, etc.; it depends on the author of that specific document. XML is also stricter than HTML: if a tag is opened, it must be closed, otherwise an XML parser will reject the document. Tags can be defined for any kind of data, so XML can easily be used to exchange information from databases.
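The strictness described above can be seen with any XML parser. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are our own illustrations, not part of any standard.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Illustrative inputs (assumed for this sketch)
closed = "<p>Advanced Database Book</p>"
unclosed = "<p>Advanced Database Book"
html_style = "<p>Large, useful<br></p>"   # <br> is opened but never closed

print(is_well_formed(closed))      # True
print(is_well_formed(unclosed))    # False
print(is_well_formed(html_style))  # False
```

An HTML browser would typically display all three strings, whereas the XML parser accepts only the first.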
Below is an example of XML adapted from “XML and Databases”, by Ronald Bourret. It is a “data-centric” sales order document written in XML.
<CustName>St. Francis Xavier University</CustName>
<Street>St. FX Street.</Street>
<p><b>Advanced Database Book</b><br/>
Large, useful, lifetime guarantee.</p>
As one can see from the example above, XML looks very much like HTML, except that XML tags can correspond to field names in a database, and the text that the tags enclose is the data. For example, in the line <CustName>St. Francis Xavier University</CustName>, CustName is the field, and the data that it holds is “St. Francis Xavier University”.
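To make the tag-to-field correspondence concrete, here is a small sketch, again assuming Python’s standard xml.etree.ElementTree module. The <SalesOrder> wrapper element is our assumption, added only because a parser needs a single root element; the two fields come from the example above.

```python
import xml.etree.ElementTree as ET

# <SalesOrder> is an assumed root element added for this sketch.
order_xml = """
<SalesOrder>
  <CustName>St. Francis Xavier University</CustName>
  <Street>St. FX Street.</Street>
</SalesOrder>
"""

root = ET.fromstring(order_xml)

# Each child tag becomes a field name; the enclosed text becomes its value.
record = {child.tag: child.text for child in root}

print(record["CustName"])  # St. Francis Xavier University
print(record["Street"])    # St. FX Street.
```

The resulting dictionary has the same shape as a database row, which is why data-centric XML maps so naturally onto tables.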
We also noticed that an XML document can use tag names borrowed from HTML, like <p> and <br>, but unlike HTML, XML requires every tag to be properly closed (an empty element can be written as <br/>).
The difference between “data-centric” documents and “document-centric” documents is that the latter look more like HTML than the example above does. Document-centric documents also have tags that define what the data is, but the data is written out in paragraphs, unlike the example above where each tag holds only “one field”. Document-centric documents look more like web pages or emails than like fields out of a database.
A key feature of XML is that it is extensible. This means that a programmer does not have to use “CustName” to define a customer name; other variations of the tag can be used. For example, the tag can be called “CustomerName” or “Cname”. XML does not pre-define its tags, and as a result programmers are allowed to choose their own definitions, depending on the DTD and/or schemas that they are using.
Schemas are descriptions of how a document type should be structured: which elements and attributes may appear and what kinds of data they hold. They are relatively new and have not been around long enough to become standard, so for now an alternative, the Document Type Definition (DTD), is being used. A DTD provides applications and people with information about what names and structures are defined in a particular document type. Using a DTD means you can be certain that all documents belonging to a particular type will be constructed and named in the same manner. DTDs are not as powerful as schemas.
XML also has a validation feature that gives it an error-checking capability. Along with the data, an XML document can supply a description of the grammar that it uses. Applications such as DBMSs use this description to validate the information before inserting it into a database, for example by checking that a value has the correct data type and falls within the specified domain range. Only schemas can express data-type and range checks of this kind; DTDs can validate a document’s structure but not the types or ranges of its values.
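The kind of check described above can be approximated by hand. The sketch below is not an XML Schema validator; it is a hand-written stand-in in Python that applies the same two tests (data type and domain range) to parsed XML. The element names Item, Quantity and Price and the limits in RULES are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: field name -> (type converter, minimum, maximum)
RULES = {
    "Quantity": (int, 1, 1000),
    "Price": (float, 0.0, 10000.0),
}

def validate(xml_text):
    """Return a list of error messages; an empty list means the data passed."""
    errors = []
    root = ET.fromstring(xml_text)
    for tag, (convert, low, high) in RULES.items():
        node = root.find(tag)
        if node is None or node.text is None:
            errors.append(f"{tag}: missing")
            continue
        try:
            value = convert(node.text.strip())   # data-type check
        except ValueError:
            errors.append(f"{tag}: not a valid {convert.__name__}")
            continue
        if not low <= value <= high:             # domain-range check
            errors.append(f"{tag}: {value} is outside {low}..{high}")
    return errors

good = "<Item><Quantity>3</Quantity><Price>19.95</Price></Item>"
bad = "<Item><Quantity>many</Quantity><Price>-5</Price></Item>"

print(validate(good))  # []
print(validate(bad))   # one error for Quantity, one for Price
```

With a real schema, these rules would live in the schema document rather than in application code, so every partner could run the same checks.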
Benefits and Typical Applications
XML was originally created to overcome the problem of sending richly structured documents over the web. HTML and SGML, the possible alternatives, were not able to fulfill this objective. HTML does not provide an arbitrary format; it comes with a specific collection of semantics. SGML does provide a standard format, but as mentioned before it is too difficult and complex to implement over the web. XML, as mentioned earlier, is a cut-down version of SGML, so it can be used without having to deal with the complexity of SGML. Of course, XML does not have all the functionality of SGML, but for the purposes XML serves, the features it leaves out will probably not be needed.
XML overcomes the problem of using richly structured documents over the web because the data is self-contained. That is, it does not simply transmit the data that needs to be transferred; it also includes metadata (data formats) about the information that it is sending. Because there are no pre-defined tags, the semantics are identified by the applications or the people that use the data, who can convert it from any format to any other format. This allows users to view the data without the program that created it, making it independent of both vendors and systems. In addition, XML is platform independent, is Unicode-compliant (which makes it portable), and can describe data in tree or graph structures.
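As a rough illustration of self-contained data, the sketch below turns an in-memory record into XML using Python’s standard xml.etree.ElementTree module. The function name rows_to_xml and the element names Customers and Customer are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# A record as it might come out of a database query (illustrative data)
rows = [
    {"CustName": "St. Francis Xavier University", "Street": "St. FX Street."},
]

def rows_to_xml(rows, root_tag="Customers", row_tag="Customer"):
    """Wrap each row in its own element and each field in a tag named after it."""
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for field, value in row.items():
            ET.SubElement(item, field).text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_text = rows_to_xml(rows)
print(xml_text)

# The receiver recovers the field names from the document itself.
received = ET.fromstring(xml_text)
print([child.tag for child in received[0]])  # ['CustName', 'Street']
```

The receiver needs no prior knowledge of the sender’s table layout, because the field names travel with the values.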
Another benefit of XML documents is that, because of their structure, any user can pick one up and read it, whether or not they have the correct software. That is, someone without a browser that supports XML should still be able to open the text in a text editor, like Notepad, and comprehend what the content means.
Even though the initial reason for creating XML was to make structured documents available over the web, businesses have started using XML for data transfer. Enterprise application integration, electronic data interchange, enterprise portals and web services have all begun using XML. These applications have adopted XML because of its extensibility, structure, validation feature and loosely coupled architecture, all of which are described above.
Real World Case
The real world case that we chose was the Ford Motor Company’s Internal Corporate Portal. It involved the implementation of an internal corporate portal using Plumtree. Plumtree uses technology that exports portal resources to other devices and platforms, using syndicated tools to circulate data to the portals of suppliers and specialists around the world. Ford used an XML-based technology that takes unstructured data in a text index created by the portal and normalizes the metadata from the documents generated by different applications. This offers a shared foundation for increasingly accurate classification and search. The use of XML helped the company become more efficient: employees can create documents on any platform using any application, and through the portal other employees can access and use these documents, whether or not they have the program that created them. With 200,000 employees worldwide, this becomes a very powerful tool.
When considering whether to make use of XML in a company, there are some issues to consider. One of these issues is that XML files are normally bigger than those created using binary formats. This is because XML is verbose: it is a text format, and it uses tags to delimit the data.
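The size difference is easy to demonstrate. The following sketch, assuming Python’s standard struct and xml.etree.ElementTree modules, encodes the same three values once in a fixed binary layout and once as XML. The field names and values are invented for illustration.

```python
import struct
import xml.etree.ElementTree as ET

# Illustrative record (assumed values)
order_id, quantity, price = 1001, 3, 19.95

# Binary: two 4-byte integers and one 8-byte float, little-endian, no padding
binary = struct.pack("<iid", order_id, quantity, price)

# XML: the same values, each delimited by an opening and closing tag
item = ET.Element("Item")
ET.SubElement(item, "OrderId").text = str(order_id)
ET.SubElement(item, "Quantity").text = str(quantity)
ET.SubElement(item, "Price").text = str(price)
xml_bytes = ET.tostring(item, encoding="unicode").encode("utf-8")

print(len(binary))     # 16
print(len(xml_bytes))  # several times larger than the binary form
```

The binary form is smaller, but it can only be read by a program that already knows the layout; the XML form pays for its tags with extra bytes.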
Another major disadvantage of XML is that the applications used to process XML data have to be able to “speak” the XML language, and there have not been enough tools for creating things in XML. A couple of years ago this was a big problem, but companies are moving quickly to solve it. Companies like SoftQuad and Extensibility Inc. are in the process of creating tools to use with XML.
Browsers also have to be able to read XML, since it is mostly used on the web. Microsoft has been on board with XML since the beginning, so its Internet Explorer has been XML compliant for a while. Netscape is a bit behind, but Netscape 6 does have some XML capabilities. We predict that in the next couple of years there will be plenty of XML-compliant tools on the web and elsewhere.
XML does not define how information will be structured or what it can mean. This is both its strength and its weakness. There is a lot of room to do what you like, and therefore there are many different combinations. Anyone and everyone can create their own way to structure the information in an XML document, and almost every industry that uses the web for data transfers has its own structure. So the question for regular users is: which one do I use?
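One practical consequence is that partners with different vocabularies still need a translation step. The sketch below, assuming Python’s standard xml.etree.ElementTree module, renames tags from one partner’s vocabulary to another’s. The mapping in TAG_MAP, the function name translate, and the <Order> element are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from one partner's tag names to another's
TAG_MAP = {"CustName": "CustomerName", "Street": "StreetAddress"}

def translate(xml_text, tag_map):
    """Rename every element whose tag appears in tag_map; leave the rest as-is."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = tag_map.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")

source = (
    "<Order>"
    "<CustName>St. Francis Xavier University</CustName>"
    "<Street>St. FX Street.</Street>"
    "</Order>"
)

translated = translate(source, TAG_MAP)
print(translated)
```

A mapping table like this is far cheaper to maintain than a custom conversion program, but it shows that agreeing on a vocabulary remains the partners’ responsibility.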
XML has proved to be a good solution to the problem that arises when companies want to exchange information with partners such as suppliers and customers. XML offers them a way to take structured data, format it in the way that they need, and then transport it to whoever needs it. Any user who “speaks” XML can retrieve, use and view the XML documents easily and efficiently. This solution saves the cost of writing custom programs or adopting a traditional EDI system, and it is easier to use than SGML and more flexible than HTML. XML is a relatively new technology, but it is picking up quickly in the computer world. The Internet is becoming the backbone for communication, and XML is making it easier.