What is XML | XML Tutorial and XML Elements, Attributes & Documents

  1. XML Introduction
  2. History of XML
  3. Features of XML
  4. XML Syntax
  5. Advantages of XML/ Disadvantages of XML
  6. XML Tree
  7. XML Documents
  8. XML Declaration
  9. XML Tags
  10. XML Elements
  11. XML Attributes
  12. XML Comments
  13. XML Character Entities
  14. XML CDATA Sections
  15. XML White Spaces
  16. XML Processing
  17. XML Encoding
  18. XML Validation
  19. XML Namespaces
  20. XML Parser
  21. XML DTD
  22. XML Schema
  23. XML DOM
  24. XML Database
  25. XML Example
  26. XML CSS
  27. DTD vs XSD
  28. CDATA vs PCDATA
  29. SAX XML
  30. XML Data Binding
  31. XML Editors
  32. XML Viewer
  33. XML Processors
  34. XML Vs HTML
  35. JSON vs XML

XML Introduction

What is Markup?

XML, as a markup language, defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. So what exactly is a markup language? Markup is information added to a document that enhances its meaning by identifying the parts of the document and how they relate to one another. More specifically, a markup language is a set of symbols that can be placed in the text of a document to demarcate and label the parts of that document.

Consider the following example of XML markup applied to a bit of text:

<message> 
<text>Hello, world!</text>  
</message>

This snippet includes the markup symbols, or tags, <message>…</message> and <text>…</text>. The tags <message> and </message> mark the start and the end of the XML fragment. The tags <text> and </text> surround the text Hello, world!.
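As a minimal sketch of how a program sees this markup, the snippet below parses the fragment with Python's standard xml.etree.ElementTree module. Python is not part of the original tutorial; it is used here only for illustration, and the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

# The XML fragment from the example above, written on one line.
doc = "<message><text>Hello, world!</text></message>"

root = ET.fromstring(doc)          # parse the string into an element tree
text_element = root.find("text")   # first child element named "text"

print(root.tag)            # name of the outermost element
print(text_element.text)   # character data between <text> and </text>
```

The tags themselves never show up in the extracted text; they only tell the parser where each part of the document starts and ends.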

The development of XML started in the 1990s with the aim of making it possible to define new text elements. XML was developed by the XML Working Group (originally known as the SGML Editorial Review Board), formed under the auspices of the World Wide Web Consortium (W3C) in 1996. The group was chaired by Jon Bosak of Sun Microsystems, with the active participation of an XML Special Interest Group (previously known as the SGML Working Group) also organized by the W3C. Dan Connolly served as the Working Group's contact with the W3C.

Extensible Markup Language (XML) is a document formatting language used by some websites. Its extensibility is what makes it both important and extremely useful. More precisely, XML is a simplified form of Standard Generalized Markup Language (SGML) intended for documents that are published on the Internet. Like SGML, XML uses document type definitions (DTDs) to define document types and the meanings of the tags used in them. XML supports more kinds of hypertext links than HTML, for example, bidirectional links and links relative to a document subsection. XML also adopts conventions that make its text elements easy to parse and interpret. For example, document elements are delimited by start and end tags, <BEGIN>…</BEGIN>.

The design goals for XML are:

  • XML shall be straightforwardly usable over the Internet.
  • XML shall support a wide variety of applications.
  • XML shall be compatible with SGML.
  • It shall be easy to write programs which process XML documents.
  • The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
  • XML documents should be human-legible and reasonably clear.
  • The XML design should be prepared quickly.
  • XML documents shall be easy to create.

XML is immensely significant. Dr. Charles Goldfarb, who was personally involved in its invention, called it "the holy grail of computing, solving the problem of universal data interchange between dissimilar systems." It is also a handy format for everything from configuration files to data and documents of almost any type. XML is user-friendly and is often generated automatically, so you do not need to be familiar with every detail of the specification to benefit from it. What matters is understanding the core ideas behind XML and what it does, so that you can see how to use it in your own projects.

History of XML

Here are some significant XML historical milestones:

  • GML was invented in 1970 by Charles Goldfarb, Ed Mosher, and Ray Lorie.
  • XML was derived from SGML.
  • XML stands for Extensible Markup Language.
  • Development of XML began in 1996 in a W3C working group chaired by Jon Bosak of Sun Microsystems.
  • In February 1998, XML version 1.0 was released.
  • IETF Proposed Standard: XML Media Types, January 2001

Features of XML

  • XML stands for Extensible Markup Language.
  • It was designed to be self-descriptive.
  • There are no predefined XML tags. You must define your own tags.
  • XML was created to transport data, not to display it.
  • XML markup is simple for a human to understand.
  • The structured format is also simple for programs to read and write.
  • XML, like HTML, is a markup language; unlike HTML, it is extensible.

HTML and XML

  • XML is extensible, while HTML is not.
  • Both XML and HTML are markup languages.
  • XML was designed to store and transport data, while HTML is meant for displaying data.
  • HTML tags are predefined, while XML tags are defined by the author of the document.

Advantages of XML/ Disadvantages of XML 

Here are some key advantages of utilizing XML:

  • Documents can be moved between systems and applications. With the help of XML, data can be exchanged quickly between platforms.
  • XML separates data from HTML.
  • XML simplifies the process of changing platforms.
  • User-defined (custom) tags can be created.

Here are some disadvantages of utilizing XML:

  • XML requires a processing application.
  • XML syntax is similar to that of other text-based data formats, which can be confusing at times.
  • XML has no intrinsic data type support.
  • XML syntax is verbose and redundant.

XML Tree

XML documents are formed as element trees.

An XML tree begins with a root element and branches to child elements.

All elements can have child elements (sub-elements):

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

The terms parent, child, and sibling are used to describe the relationships between elements.

Parents have children. Children have parents. Siblings are children on the same level (brothers and sisters).

All elements can have text content (Harry Potter) and attributes (category="cooking").
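The parent and child relationships can be made visible by walking the tree. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the small bookstore document and the describe helper are made up for illustration and are not part of any library.

```python
import xml.etree.ElementTree as ET

# A small, hypothetical document: one root, one child, two sub-children.
doc = """
<bookstore>
  <book category="cooking">
    <title>Harry Potter</title>
    <price>295.99</price>
  </book>
</bookstore>
"""

def describe(element, depth=0):
    """Return one indented line per element: tag, attributes, and text content."""
    text = (element.text or "").strip()
    lines = ["  " * depth + f"{element.tag} {element.attrib} {text!r}"]
    for child in element:              # iterating an element yields its children
        lines.extend(describe(child, depth + 1))
    return lines

root = ET.fromstring(doc)
tree_lines = describe(root)
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one step from parent to child; <title> and <price> are siblings because they share the parent <book>.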

XML Syntax

The syntax rules of XML are very simple and logical. The rules are easy to learn and easy to use. In this section, we look at the basic syntax rules for writing a simple XML document.

Consider the following example of a complete XML document:

<?xml version = "1.0"?>
<contact-info>
   <name>Anil Kumar</name>
   <company>GreatLearning</company>
   <phone>(91) 987-3679</phone>
</contact-info>

You can see there are two kinds of information in the above example:

  • Markup, such as <contact-info>
  • The text, or character data, such as GreatLearning and (91) 987-3679.

Self-Describing Syntax:

XML has a very self-descriptive syntax.

A prolog specifies the XML version as well as the character encoding:

<?xml version="1.0" encoding="UTF-8"?>

The next line is the root element of the document:

<bookstore>

The next line begins a <book> element:

<book category="novel">

The <book> element has four child elements: <title>, <author>, <year>, and <price>:

<title lang="en">Two States</title>
<author>Chetan Bhagath</author>
<year>2005</year>
<price>300.00</price>

The next line closes the <book> element:

</book>

Example of an XML document:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="novel">
    <title lang="en">Two States</title>
    <author>Chetan Bhagath</author>
    <year>2005</year>
    <price>300.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J. K. Rowling</author>
    <year>2005</year>
    <price>295.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>339.95</price>
  </book>
</bookstore>

XML syntax refers to the rules that determine how an XML document can be written. The XML syntax is very straightforward, which makes XML easy to learn. The following are the main points to remember when writing XML documents.

  • XML elements must have an end tag.
  • XML tags are case-sensitive.
  • All XML elements must be properly nested.
  • All XML documents must have a root element.
  • Attribute values must always be quoted.
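The rules above can be tried out against a real parser. The sketch below, which assumes Python and its standard xml.etree.ElementTree module, feeds one snippet per rule to the parser and records whether it is accepted. The is_well_formed helper and the sample snippets are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One sample per rule; only the first one follows every rule.
samples = {
    "well-formed":        '<note id="1"><to>Anuj</to></note>',
    "missing end tag":    '<note><to>Anuj</note>',
    "case mismatch":      '<note><to>Anuj</To></note>',
    "improper nesting":   '<note><b><i>text</b></i></note>',
    "two root elements":  '<note/><note/>',
    "unquoted attribute": '<note id=1/>',
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

Only the first sample is accepted; every other sample breaks exactly one of the five rules and is rejected by the parser.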

XML Documents

As defined in the XML specification, a data object is an XML document if it is well-formed. A well-formed XML document may, in addition, be valid if it meets certain further constraints. Every XML document has both a logical and a physical structure. Physically, the document is composed of units called entities. An entity may refer to other entities to cause their inclusion in the document. A document begins in a "root" or document entity. Logically, the document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup. The logical and physical structures must nest properly.

Every XML document must have a single tag pair that defines the root element. All other elements must be inside this root element. All elements can have sub-elements, and sub-elements must be properly nested inside their parent element.

Now, take a look at this example:

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

XML Document Rules

If you have looked at HTML documents, you are familiar with the basic concept of using tags to mark up the text of a document. This section discusses the differences between HTML documents and XML documents. It goes over the basic rules of XML documents and discusses the terminology used to describe them.

One important point about XML documents: the XML specification requires a parser to reject any XML document that does not follow the basic rules. Most HTML parsers will accept sloppy markup, making a guess as to what the writer of the document intended. To avoid the loosely structured mess found in the average HTML document, the creators of XML decided to enforce document structure from the beginning.

Note: A parser is a piece of code that attempts to read a document and interpret its contents.

There are three types of XML documents: 

1. Valid documents follow both the XML syntax rules and the rules defined in their DTD or schema.

2. Invalid documents do not follow the syntax rules defined by the XML specification. If a developer has defined rules for what a document can contain in a DTD or schema, and the document does not follow those rules, that document is invalid as well.

3. Well-formed documents follow the XML syntax rules but do not have a DTD or schema.

The root component 

An XML document must be contained in a single element. That single element is called the root element, and it contains all the text and any other elements in the document. In the following example, the XML document is contained in a single element, the <greeting> element. Notice that the document has a comment that is outside the root element; that is perfectly legal.

Here is an example:

<?xml version="1.0"?>
<!-- A well-formed document -->
<greeting>
Hello, World!
</greeting>

Here is a document that does not contain a single root element:

<?xml version="1.0"?>
<!-- An invalid document -->
<greeting>
Hello, World!
</greeting>
<greeting>
Namaste, Duniya!
</greeting>

An XML parser is required to reject this document, regardless of the information it might contain.
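To see this behaviour in practice, the following sketch passes both greeting documents to Python's standard xml.etree.ElementTree parser. The choice of Python and the variable names are assumptions made for illustration; any conforming XML parser should behave the same way.

```python
import xml.etree.ElementTree as ET

single_root = b"""<?xml version="1.0"?>
<!-- A well-formed document -->
<greeting>
Hello, World!
</greeting>
"""

two_roots = b"""<?xml version="1.0"?>
<!-- Not accepted: two top-level elements -->
<greeting>
Hello, World!
</greeting>
<greeting>
Namaste, Duniya!
</greeting>
"""

# The comment outside the root element does not prevent parsing.
root = ET.fromstring(single_root)
print(root.tag, repr(root.text.strip()))

error_line = None
try:
    ET.fromstring(two_roots)
except ET.ParseError as err:
    error_line, error_column = err.position   # (line, column) where parsing stopped
    print("rejected:", err)
```

The parser stops as soon as it finds content after the end of the first root element, so the second greeting is never turned into data.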

Well-Formed XML Documents

A textual object is a well-formed XML document if:

i. Taken as a whole, it matches the production labeled document.

ii. It meets all the well-formedness constraints given in the XML specification.

iii. Each of the parsed entities referenced directly or indirectly within the document is well-formed.

The document production is:

 document ::= prolog element Misc*

Matching the document production implies that:

i. It contains one or more elements.

ii. There is exactly one element, called the root, or document element, no part of which appears in the content of any other element. For all other elements, if the start-tag is in the content of another element, the end-tag is in the content of the same element. More simply stated, the elements, delimited by start- and end-tags, nest properly within each other.

As a consequence of this, for each non-root element X in the document, there is one other element Y in the document such that X is in the content of Y, but is not in the content of any other element that is in the content of Y. Y is referred to as the parent of X, and X as a child of Y.

The basic unit of XML data is the XML document, which is made up of elements and other markup in an orderly package. An XML document can contain a wide variety of data, for instance, a database of numbers, numbers representing a molecular structure, or a mathematical equation.

An XML document has the following sections:

1. Document prolog section: The prolog comes at the top of the document, before the document element (root element). It contains the XML declaration and the document type declaration.

2. Document elements section: Document elements are the building blocks of XML. They divide the document into a hierarchy of sections, each serving a specific purpose. You can separate a document into multiple sections so that they can be rendered differently or used by a search engine. Elements can be containers holding a combination of text and other elements.

XML Document Example

<?xml version = "1.0"?>              <!-- document prolog -->
<contact-info>                       <!-- document elements start here -->
   <name>Anil Kumar</name>
   <company>GreatLearning</company>
   <phone>(91) 987-3679</phone>
</contact-info>

XML Declaration

The XML declaration indicates that the document is written in XML and specifies which version of XML is used. The XML declaration, if included, must be the very first thing in the document. The XML declaration can also specify the character encoding for the document (optional) and whether the document refers to external entities (optional). For example, we can specify that the document uses UTF-8 encoding (although we do not strictly need to, as UTF-8 is the default), and indicate that the document refers to external entities by using standalone="no". Such a document is not standalone because it depends on an external resource (for example, a DTD). Although the XML declaration is optional, the W3C recommends that you include it in your XML documents.

Most XML documents begin with an XML declaration that provides basic information about the document to the parser. An XML declaration is recommended, but not required. If there is one, it must be the first thing in the document. The declaration can contain up to three name-value pairs (many people call them attributes, although technically they are not). The version is the version of XML used; in practice this value is almost always 1.0, since XML 1.1 exists but is rarely used. The encoding is the character set used in the document. The ISO-8859-1 character set referenced in the declaration below includes all of the characters used by most Western European languages. If no encoding is specified, the XML parser assumes that the characters are in the UTF-8 set, a Unicode encoding that supports virtually every character and ideograph from the world's languages.

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?> 

Finally, standalone, which can be either yes or no, defines whether this document can be processed without reading any other files. For instance, if the XML document does not reference any other files, you would specify standalone="yes". If the XML document references other files that describe what the document can contain (more about those files in a moment), you could specify standalone="no". Because standalone="no" is the default, you rarely see standalone in XML declarations.

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>

Rules governing the XML declaration

An XML declaration should follow these rules:

i. If the XML declaration is present, it must be placed on the first line of the XML document.

ii. If the XML declaration is included, it must contain the version number attribute.

iii. The parameter names and values are case-sensitive, and the declaration must begin with "<?xml", where "xml" is written in lower case.

iv. The parameter names are always in lower case.

v. The order of the parameters is significant. The correct order is version, encoding, and standalone.

vi. Either single or double quotes may be used.

vii. The XML declaration has no closing tag, i.e. </?xml>

viii. The HTTP protocol can override the value of encoding that you put in the XML declaration.
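Several of these rules can be checked directly against a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the accepts helper and the sample documents are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET

def accepts(document_bytes):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document_bytes)
        return True
    except ET.ParseError:
        return False

checks = {
    "version only":          b'<?xml version="1.0"?><a/>',
    "single quotes":         b"<?xml version='1.0' encoding='UTF-8'?><a/>",
    "no declaration at all": b'<a/>',
    "missing version":       b'<?xml encoding="UTF-8"?><a/>',
    "wrong parameter order": b'<?xml encoding="UTF-8" version="1.0"?><a/>',
    "not at the very start": b'\n<?xml version="1.0"?><a/>',
}
outcome = {label: accepts(doc) for label, doc in checks.items()}
for label, ok in outcome.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")

# The encoding parameter tells the parser how to turn bytes into characters.
latin1 = '<?xml version="1.0" encoding="ISO-8859-1"?><name>Jos\u00e9</name>'.encode("iso-8859-1")
decoded_name = ET.fromstring(latin1).text
print(decoded_name)
```

The first three documents are accepted and the last three are rejected, which matches rules i, ii, v, and vi. The final lines show the encoding parameter at work: the bytes are stored as ISO-8859-1, and the parser decodes them correctly because the declaration says so.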

XML Declaration Examples 

1. XML declaration with no parameters (shown for completeness only; a parser rejects it because version is required):

<?xml ?>

2. XML declaration with the version defined:

<?xml version="1.0"?>

3. XML declaration with all parameters defined:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

4. XML declaration with all parameters defined in single quotes:

<?xml version='1.0' encoding='iso-8859-1' standalone='no' ?>

Root Element

Every XML document must contain exactly one root element. All other elements are located within that root element.

For instance:

<root>
  <child>Data</child>
  <child>More Data</child>
</root>

XML declaration parameters

As stated above, an XML declaration appears as the first line of an XML document, and its use is optional. In the declaration, version indicates the XML version, encoding indicates how the individual bytes correspond to characters, and standalone indicates whether an external type definition must be consulted in order to process the document correctly.

An XML document can optionally begin with a declaration such as:

<?xml version = "1.0" encoding = "UTF-8"?>

Note: In the above, version is the XML version and encoding specifies the character encoding used in the document.

XML Tags

XML tags are one of the most important parts of XML. Tags form the foundation of XML. They define the scope of an element. They can also be used to insert comments, declare settings required for parsing, and insert special instructions.

XML tags can be broadly classified as follows:

  1. Start tag: The beginning of every non-empty XML element is marked by a start tag. For example:
 <address>
  2. End tag: Every element that has a start tag must have an end tag. For example:
</address>

Note: An end tag includes a solidus ("/") immediately before the name of the element.

  3. Empty tag: The text that appears between a start tag and an end tag is called content. An element that has no content is called empty. An empty element can be written in either of the following ways:
  • A start tag immediately followed by an end tag: <hr></hr>
  • A complete empty-element tag: <hr />

Empty-element tags may be used for any element that has no content.

Elements/Tags 

Elements are delimited by < and >. As noted, element names are case-sensitive and cannot contain spaces (the full set of allowed characters can be found in the specification). Attributes can be added as space-separated name/value pairs, with values enclosed in quotes (either single or double).

<sometag attrname="attrvalue"> 

The structure of XML

In addition to text, elements may also contain different elements. 

• A start-tag starts with "<" and ends with ">".

• An end-tag starts with "</" and ends with ">".

• Empty elements (elements with no content, where the start-tag is immediately followed by an end-tag) can alternatively be represented by a single tag. These empty-element tags start with "<" and end with "/>". In other words, empty-element tags are shorthand. For instance, <br></br> is equivalent to <br/>. This means that, when converting HTML to XHTML, every <br> tag must be written in one of the two allowed forms for empty elements.

• Every start-tag must have a matching end-tag, and tags must be properly nested. For instance, the following is not well-formed, because it is not properly nested:

<x><a>mmm<b>mmm</a>mmm</b></x>

The following is well-formed:

<x><a>mmm<b>mmm</b></a><b>mmm</b></x> 

To Do:

Most current HTML browsers can handle improperly nested documents. Is this part of the HTML specification? Try to find out more about the similarities and differences between XML and HTML tags.

End tags cannot be left out. In the example below, the markup is not legal because there are no end paragraph (</p>) tags. While this is acceptable in HTML (and, in some cases, SGML), an XML parser will reject it.

<!-- NOT legal XML markup -->
<p>Yada yada yada…
<p>Yada yada yada…
<p>…

If an element contains no markup at all, it is called an empty element; the HTML break (<br>) and image (<img>) elements are two examples. In empty elements in XML documents, you can put the closing slash in the start tag. The two break elements and the two image elements below mean the same thing to an XML parser:

<!-- Two equivalent break elements -->
<br></br>
<br />
<!-- Two equivalent image elements -->
<img src="../img/c.gif"></img>
<img src="../img/c.gif" />
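The equivalence of the two forms can be confirmed by parsing both and comparing the results. This is a small sketch using Python's standard xml.etree.ElementTree module; Python and the variable names are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring('<img src="../img/c.gif"></img>')
short_form = ET.fromstring('<img src="../img/c.gif" />')

# Both spellings yield an element with the same name and the same attributes.
same_tag = long_form.tag == short_form.tag
same_attributes = long_form.attrib == short_form.attrib

# Neither element has child elements or text content.
both_empty = (
    len(long_form) == 0 and len(short_form) == 0
    and not long_form.text and not short_form.text
)

# Serializing either element produces identical bytes.
same_serialization = ET.tostring(long_form) == ET.tostring(short_form)

print(same_tag, same_attributes, both_empty, same_serialization)
```

After parsing, nothing records which of the two spellings the document used, so the choice between them is purely a matter of how the file is written.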

XML Elements

An XML document is structured by several XML elements, also called XML nodes or XML tags. The names of XML elements are enclosed in angle brackets < > as shown below:

<element>

Syntax Rules for Elements and Tags

Element syntax: Each XML element must be closed, either with a matching end tag as shown below

<element>....</element>

or, in simple cases, with an empty-element tag:

<element/>

Nesting of elements: An XML element can contain multiple XML elements as its children, but the children must not overlap. That is, the end tag of an element must have the same name as the most recent unmatched start tag.

The following example shows incorrectly nested tags:

<?xml version = "1.0"?>
<contact-info>
<company>GreatLearning
</contact-info>
</company>


The following example shows correctly nested tags:

<?xml version = "1.0"?>
<contact-info>
<company>GreatLearning</company>
</contact-info>

Root Element 

An XML document can have only one root element. For instance, the following is not a correct XML document, because both the a and b elements occur at the top level without a root element.

<a>...</a>
<b>...</b>

The correct syntax is as follows 

<root>
   <a>...</a>
   <b>...</b>
</root>

Case Sensitivity: The names of XML elements are case-sensitive. That means the name in the start tag and the name in the end tag must be in exactly the same case.

For instance, <contact-info> is not the same as <Contact-Info>.

For instance:

This is correct:

<from>Deepak</from>

This is incorrect, because the start tag begins with a lowercase letter while the end tag begins with an uppercase letter:

<from>Deepak</From>

A root element is mandatory in XML: An XML document must contain a root element. A root element can contain child elements, which can in turn contain sub-child elements.

For instance: In the following XML document, <message> is the root element and <to>, <from>, <subject> and <text> are child elements.

<?xml version="1.0" encoding="UTF-8"?>
<message>
   <to>Anuj</to>
   <from>Deepak</from>
   <subject>Message from teacher to Student</subject>
   <text>You have an exam tomorrow at 8:00 AM</text>
</message>


The following XML document is not well-formed, because it has no root element.

<?xml version="1.0" encoding="UTF-8"?>
<to>Anuj</to>
<from>Deepak</from>
<subject>Message from teacher to Student</subject>
<text>You have an exam tomorrow at 8:00 AM</text>

XML elements must have an end tag.

<text classification = "message">hello</text> -> correct
<text classification = "message">hello -> wrong

It is illegal to omit the end tag when writing XML.

Invalid syntax:

<body>See Spot run. 
<body>See Spot get the ball. 

Valid syntax:
 
<body>See Spot run.</body> 
<body>See Spot get the ball.</body>

Element Type Declarations

The element structure of an XML document can, for validation purposes, be constrained using element type and attribute-list declarations. An element type declaration constrains the element's content. Element type declarations often constrain which element types can appear as children of the element. At user option, an XML processor MAY issue a warning when a declaration mentions an element type for which no declaration is provided, but this is not an error.

An element type declaration takes the form:

elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
contentspec ::= 'EMPTY' | 'ANY' | Mixed | children

Examples of element type declarations:

<!ELEMENT br EMPTY>
<!ELEMENT p (#PCDATA|emph)* >
<!ELEMENT %name.para; %content.para; >
<!ELEMENT container ANY>

An element type must not be declared more than once.

Element Content

An element type has element content when elements of that type MUST contain only child elements (no character data), optionally separated by white space (characters matching the nonterminal S). In this case, the constraint includes a content model, a simple grammar governing the allowed types of the child elements and the order in which they are allowed to appear.

The grammar is built on content particles (cps), which consist of names, choice lists of content particles, or sequence lists of content particles:

Element-content Models

  • children   ::=   (choice | seq) ('?' | '*' | '+')?
  • cp   ::=   (Name | choice | seq) ('?' | '*' | '+')?
  • choice   ::=   '(' S? cp ( S? '|' S? cp )+ S? ')'   [VC: Proper Group/PE Nesting]
  • seq   ::=   '(' S? cp ( S? ',' S? cp )* S? ')'   [VC: Proper Group/PE Nesting]

Here each Name is the type of an element which may appear as a child. Any content particle in a choice list may appear in the element content at the location where the choice list appears in the grammar; content particles occurring in a sequence list MUST each appear in the element content in the order given in the list. The optional character following a name or list governs whether the element or the content particles in the list may occur one or more (+), zero or more (*), or zero or one times (?). The absence of such an operator means that the element or content particle MUST appear exactly once. This syntax and meaning are identical to those used in the productions in the XML specification. The content of an element matches a content model if and only if it is possible to trace out a path through the content model, obeying the sequence, choice, and repetition operators and matching each element in the content against an element type in the content model. For compatibility, it is an error if the content model allows an element to match more than one occurrence of an element type in the content model.
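Because the operators of a content model (sequence, choice, ?, * and +) behave like those of a regular expression, one way to get a feel for the matching rule is to translate a content model into a regex over the names of the child elements. The sketch below does this in Python. It is an illustration under simplifying assumptions, not how a validating parser is implemented: it handles element content only (no #PCDATA, no parameter entities), it uses a simplified pattern for names, and the functions content_model_to_regex and matches are hypothetical helpers.

```python
import re

def content_model_to_regex(model):
    """Translate an element-content model such as "(title, author+, year?)"
    into a compiled regex over space-terminated child element names."""
    compact = re.sub(r"\s+", "", model)                 # white space is not significant
    pattern = re.sub(
        r"[A-Za-z_][\w.\-]*",                           # simplified Name pattern (an assumption)
        lambda m: "(?:" + re.escape(m.group()) + " )",  # each Name matches "name "
        compact,
    )
    pattern = pattern.replace(",", "")                  # a sequence is plain concatenation
    return re.compile(pattern)                          # | ? * + keep their meaning

def matches(model, child_names):
    """Return True if the ordered child element names satisfy the content model."""
    sequence = "".join(name + " " for name in child_names)
    return content_model_to_regex(model).fullmatch(sequence) is not None

book_model = "(title, author+, year?)"   # hypothetical model for a <book> element
print(matches(book_model, ["title", "author", "author"]))   # order and counts fit
print(matches(book_model, ["author", "title"]))             # wrong order
print(matches("(a | b)*", ["a", "b", "a"]))                 # any mix of a and b
```

The first and third calls report a match and the second does not, because a sequence list requires its particles in the order given.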

XML Attributes

An attribute specifies a single property of an element, using a name/value pair. An XML element can have one or more attributes. For instance:

<a href = "http://www.greatlearning.com/">Greatlearning!</a> 

Here href is the attribute name and http://www.greatlearning.com/ is the attribute value.

Attribute names are written without quotes, whereas attribute values must always appear in quotes. The following example shows invalid XML syntax:

<a b = x>....</a>

In the above syntax, the attribute value is not enclosed in quotes.

Now let us look at the syntax of attributes. A start tag in XML can have attributes, which are name/value pairs.

  • Attribute names are case-sensitive and must not be in quotes.
  • Attribute values must be in single or double quotes.
<text grouping = "message">You have a test tomorrow at 8:00 AM</text>

Here grouping is the attribute name and message is the attribute value.

Let us look at a few more examples of valid and invalid attributes.

A tag can have one or more name/value pairs, but no two attribute names in the same tag can be the same.

  1. <text class = message>hello</text> -> wrong
  2. <text "class" = message>hello</text> -> wrong
  3. <text class = "message">hello</text> -> correct
  4. <text class = "message" reason = "greet">hello</text> -> correct
  5. <text class = "message" class = "greet">hello</text> -> wrong

XML Attributes Syntax Rules

  • Unlike in HTML, attribute names in XML are case-sensitive; that is, HREF and href are considered two different XML attributes.
  • The same attribute cannot appear twice in one tag. The following example shows invalid syntax, because the attribute b is specified twice:
<a b = "x" c = "y" b = "z">....</a>
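Both attribute rules can be observed with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the parses helper and the second HREF attribute are made up for illustration.

```python
import xml.etree.ElementTree as ET

# href and HREF differ only in case, so XML treats them as two separate attributes.
link = ET.fromstring('<a href="http://www.greatlearning.com/" HREF="other">Greatlearning!</a>')
print(link.attrib)
print(link.get("href"))

def parses(xml_text):
    """Return True if the parser accepts xml_text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

duplicate_ok = parses('<a b="x" c="y" b="z">....</a>')   # attribute b given twice
single_quotes_ok = parses("<a b='x'>....</a>")            # single quotes are allowed
print(duplicate_ok, single_quotes_ok)
```

The element with both href and HREF parses without complaint and exposes two attributes, while the tag that repeats b is rejected outright.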

Attribute names are written without quotes, whereas attribute values must always appear in quotes. The following example shows incorrect XML syntax:

<a b = x>....</a>

In the above syntax, the attribute value is not enclosed in quotes.

Rule: Attribute values should always be quoted

It is illegal to omit the quotation marks around attribute values. XML elements can have attributes in name/value pairs; however, the attribute value must always be quoted.

Invalid syntax:

<?xml version="1.0" encoding="ISO-8859-1"?>
<note date=02/02/02>
<to>Deepak</to>
<from>Spoorthi</from>
</note>

Valid syntax:

<?xml version="1.0" encoding="ISO-8859-1"?>
<note date="02/02/02">
<to>Deepak</to>
<from>Spoorthi</from>
</note>

The first version is not well-formed, because the date attribute of the note element is not quoted.

Declarations of Attribute-List 

Attributes are used to associate name-value pairs with elements. Attribute specifications must not appear outside of start-tags and empty-element tags; consequently, the productions used to recognize them are part of the grammar for start-tags, end-tags, and empty-element tags.

Attribute-list declarations may be used:

• To define the set of attributes pertaining to a given element type.

• To establish type constraints for these attributes.

• To provide default values for attributes.

Attribute-list declarations specify the name, data type, and default value (if any) of each attribute associated with a given element type.

Attribute List Declaration Example

  • AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
  • AttDef ::= S Name S AttType S DefaultDecl

The Name in the AttlistDecl rule is the type of an element. At user option, an XML processor may issue a warning if attributes are declared for an element type that is not itself declared, but this is not an error. The Name in the AttDef rule is the name of the attribute.

When more than one AttlistDecl is provided for a given element type, the contents of all those provided are merged. When more than one definition is provided for the same attribute of a given element type, the first declaration is binding and later declarations are ignored. For interoperability, writers of DTDs may choose to provide at most one attribute-list declaration for a given element type, at most one attribute definition for a given attribute name in an attribute-list declaration, and at least one attribute definition in each attribute-list declaration. Also for interoperability, an XML processor may at user option issue a warning when more than one attribute-list declaration is provided for a given element type, or more than one attribute definition is provided for a given attribute, but this is not an error.

Types of Attributes 

XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types. The validity constraints noted in the grammar are applied after the attribute value has been normalized, as described in the section on attribute value normalization below.

  1. String type: may take any literal string as a value.
  2. Tokenized types: more constrained than the string type; the value must consist of one or more tokens of a particular form.
  3. Enumerated types: the declaration lists the allowed values, and the attribute must take one of them.

AttType   ::=   StringType | TokenizedType | EnumeratedType

StringType   ::=   'CDATA'

TokenizedType   ::=   'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'

An attribute declaration states whether the attribute's presence is required and, if not, how an XML processor is to react when a declared attribute is absent from a document.

Attribute Defaults

  • DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)

In an attribute declaration, #REQUIRED means that the attribute must always be provided, and #IMPLIED means that no default value is provided. If the declaration is neither #REQUIRED nor #IMPLIED, then the AttValue value contains the declared default value; the #FIXED keyword states that the attribute must always have the default value. When an XML processor encounters an element without a specification for an attribute for which it has read a default value declaration, it must report the attribute with the declared default value to the application.

Value Normalization of Attribute

Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.

  1. All line breaks must have been normalized on input to #xA.
  2. Begin with a normalized value consisting of the empty string.
  3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:

– For a character reference, append the referenced character to the normalized value.

– For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.

– For a white-space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.

– For any other character, append the character to the normalized value.

If the attribute type is not CDATA, then the XML processor must further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters with a single space (#x20) character.

Note that if the unnormalized attribute value contains a character reference to a white-space character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a white-space character (not a reference), which is replaced with a space character (#x20) in the normalized value. It also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a white-space character; being recursively processed, the white-space character is replaced with a space character (#x20) in the normalized value.

All attributes for which no declaration has been read should be treated by a non-validating XML processor as if declared CDATA. It is an error if an attribute value contains a reference to an entity for which no declaration has been read. The following are examples of attribute normalization. Given the following declarations:

<!ENTITY d "&#xD;">

<!ENTITY a "&#xA;">

<!ENTITY da "&#xD;&#xA;">

the attribute specifications in the left column below would be normalized to the character sequences of the middle column if the attribute a is declared NMTOKENS, and to those of the right column if a is declared CDATA.

Attribute specification | a is NMTOKENS | a is CDATA
a="(two line breaks)xyz" | x y z | #x20 #x20 x y z
a="&d;&d;A&a;&#x20;&a;B&da;" | A #x20 B | #x20 #x20 A #x20 #x20 #x20 B #x20 #x20
a="&#xd;&#xd;A&#xa;&#xa;B&#xd;&#xa;" | #xD #xD A #xA #xA B #xD #xA | #xD #xD A #xA #xA B #xD #xA

Note that the last example is invalid (but well-formed) if a is declared to be of type NMTOKENS.
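The difference between literal white space and character references in an attribute value can be observed with a short experiment. This is a sketch using Python's xml.etree.ElementTree (a non-validating parser); the element name e is made up for the example, and because the attribute a is not declared anywhere it is treated as CDATA.

```python
import xml.etree.ElementTree as ET

# The value holds a literal tab and a literal line break, followed by
# character references to the same two characters.
doc = '<e a="x\ty\nz&#x9;&#xA;"/>'

value = ET.fromstring(doc).get("a")

# Literal white space is replaced by a space (#x20);
# the referenced characters are kept as they are.
print(repr(value))
```

The literal tab and line break come back as spaces, while the tab and line feed written as character references survive normalization.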

Special Attributes

An element tag may indicate additional properties of its content. For instance, xml:space is used to indicate whether white space is significant. In general, it is assumed that all white space outside of the tag structure is significant.

Another special attribute is xml:lang, which can be used to indicate the language of the content. For instance:

<p xml:lang="en">I do not speak Hindi</p>

<p xml:lang="hi">Main Hindi nahin bolata</p>

Attributes must have quoted values

There are two rules for attributes in XML documents:

• Attributes must have values.

• Those values must be enclosed in quotation marks.

Compare the two examples below. The markup at the top is legal in HTML, but not in XML. To do the equivalent in XML, you have to give the attribute a value, and you have to enclose it in quotes.

Example 1:
<!-- NOT legal XML markup -->
<ol compact>

Example 2:
<!-- legal XML markup -->
<ol compact="yes">

You can use either single or double quotes, as long as you are consistent. If the attribute value contains a single or double quote, you can use the other kind of quote to surround the value (as in name="Deepak's vehicle"), or use the entities &quot; for a double quote and &apos; for a single quote. An entity is a symbol, such as &quot;, that the XML parser replaces with other text, such as ".

We have not yet covered DTDs and how they work in detail, but there is one more essential topic to cover here: defining attributes. You can define attributes for the elements that will appear in your XML document. Using a DTD, you can also:

• Define which attributes are required.

• Define default values for attributes.

• List all of the valid values for a given attribute.

Suppose that you want to change the DTD to make state an attribute of the <city> element. Here is how to do that:

<!ELEMENT city (#PCDATA)>

<!ATTLIST city state CDATA #REQUIRED>

This defines the <city> element as before, but the revised example also uses an ATTLIST declaration to list the attributes of the element. The name city inside the attribute list tells the parser that these attributes are defined for the <city> element. The name state is the name of the attribute, and the keywords CDATA and #REQUIRED tell the parser that the state attribute contains text and is required (if it is optional, CDATA #IMPLIED will work).

To define multiple attributes for an element, write the ATTLIST like this:

<!ELEMENT city (#PCDATA)>

<!ATTLIST city state CDATA #REQUIRED
               postal-code CDATA #REQUIRED>

The above example defines both state and postal-code as attributes of the <city> element.

Finally, DTDs allow you to define default values for attributes and enumerate all of the valid values for an attribute:

<!ELEMENT city (#PCDATA)>

<!ATTLIST city state (AZ|CA|NV|OR|UT|WA) "CA">

This example indicates that it only supports addresses from the states of Arizona (AZ), California (CA), Nevada (NV), Oregon (OR), Utah (UT), and Washington (WA), and that the default state is California. In this way, you can do a limited form of data validation. While this is a useful function, it is a small subset of what you can do with XML schemas.
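To see a declared default in action, the declarations can be placed in the internal DTD subset of a small document. The sketch below uses Python's xml.etree.ElementTree; the city name Sacramento is sample data invented for the example. Note that this parser is non-validating: it reports the default value, but it would not reject a state outside the enumerated list.

```python
import xml.etree.ElementTree as ET

# The DTD is embedded in the document (internal subset).
# The state attribute is enumerated and defaults to "CA".
doc = """<!DOCTYPE city [
  <!ELEMENT city (#PCDATA)>
  <!ATTLIST city state (AZ|CA|NV|OR|UT|WA) "CA">
]>
<city>Sacramento</city>"""

# No state attribute is written on the element, so the parser
# reports the declared default to the application.
default_state = ET.fromstring(doc).get("state")

# A value given explicitly in the document takes precedence.
explicit_doc = doc.replace("<city>", '<city state="NV">')
explicit_state = ET.fromstring(explicit_doc).get("state")

print(default_state, explicit_state)
```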

XML Comments

Comments may appear anywhere in a document outside other markup; in addition, they may appear within the document type declaration at places allowed by the grammar. They are not part of the document's character data; an XML processor may, but need not, make it possible for an application to retrieve the text of comments. For compatibility, the string "--" (double hyphen) must not occur within comments. Parameter entity references must not be recognized within comments.

Comment   ::=   '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

This is how a comment looks in an XML document:

<!-- This is just a comment -->

An example of a comment:

  • <!-- declarations for <head> & <body> -->
  • Note that the grammar does not allow a comment ending in --->. The following example is not well-formed.

<!-- B+, B, or B--->

Comments can appear anywhere in the document; they can even appear before or after the root element. A comment starts with <!-- and ends with -->. A comment cannot contain a double hyphen (--) except as part of the closing delimiter; with that exception, a comment can contain anything. Most importantly, any markup inside a comment is ignored; if you want to remove a large section of an XML document, simply wrap that section in a comment. (To restore the commented-out section, simply remove the comment delimiters.)

Here is some markup that contains a comment:

<!-- Here's a PI for Cocoon: -->

<?cocoon-process type="sql"?>
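The comment rules can be tried out with a parser. The following sketch uses Python's xml.etree.ElementTree; the helper is_well_formed and the element names grades and doc are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A single hyphen followed by a space before the closing delimiter is fine.
good = "<grades><!-- B+, B, or B- --></grades>"
# A comment must not end in --->.
bad_ending = "<grades><!-- B+, B, or B---></grades>"
# A double hyphen in the middle of a comment is not allowed either.
bad_middle = "<grades><!-- a -- b --></grades>"
# Markup and bare ampersands inside a comment are ignored by the parser.
markup_inside = "<doc><!-- <unclosed> & raw --></doc>"

for text in (good, bad_ending, bad_middle, markup_inside):
    print(is_well_formed(text), text)
```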

XML Character Entities

Entities

An entity is a name that stands for replacement text. If a DTD declares an entity named dw with the replacement text developerWorks, then anywhere the XML processor finds the string &dw;, it replaces the entity with the string developerWorks. The XML specification also defines five entities you can use in place of various special characters.

An entity reference must not contain the name of an unparsed entity. Unparsed entities may be referred to only in attribute values declared to be of type ENTITY or ENTITIES.

The five predefined entities are:

• &lt; for the less than sign 

• &gt; for the greater than sign 

• &quot; for a double-quote 

• &apos; for a single quote (or apostrophe) 

• &amp; for an ampersand.
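When XML is generated by a program, these entities are usually produced by an escaping helper instead of being typed by hand. The sketch below uses escape and quoteattr from Python's standard xml.sax.saxutils module; the element name title, the attribute name note, and the sample string are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw = 'Tom & Jerry <"friends">'

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw)

# quoteattr() escapes the value and wraps it in suitable quotes,
# so it can be dropped straight into a start tag.
attr = quoteattr(raw)

doc = "<title note=" + attr + ">" + escaped + "</title>"
root = ET.fromstring(doc)

# After parsing, the application sees the original characters again.
print(escaped)
print(root.text == raw, root.get("note") == raw)
```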

Character and Entity References

A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.

Character Reference

   CharRef   ::=   '&#' [0-9]+ ';'
               | '&#x' [0-9a-fA-F]+ ';'

Well-formedness constraint: Legal Character

Characters referred to using character references must match the production for Char.

If the character reference begins with "&#x", the digits and letters up to the terminating ; provide a hexadecimal representation of the character's code point in ISO/IEC 10646. If it begins just with "&#", the digits up to the terminating ; provide a decimal representation of the character's code point.
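Decimal and hexadecimal references to the same code point yield the same character. A small sketch with Python's xml.etree.ElementTree, using a made-up element name t:

```python
import xml.etree.ElementTree as ET

# &#60; and &#x3C; both refer to "<"; &#169; and &#xA9; both refer to
# the copyright sign (code point 169).
root = ET.fromstring("<t>&#60;&#x3C;&#169;&#xA9;</t>")
code_points = [ord(ch) for ch in root.text]
print(code_points)

# A reference to a code point that does not match the Char production
# (here 0) makes the document not well-formed.
try:
    ET.fromstring("<t>&#0;</t>")
    illegal_accepted = True
except ET.ParseError:
    illegal_accepted = False
print(illegal_accepted)
```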

Entity reference: An entity reference refers to the content of a named entity. References to parsed general entities use ampersand (&) and semicolon (;) as delimiters. Parameter-entity references use percent sign (%) and semicolon (;) as delimiters.

 Entity Reference
Reference    ::=    EntityRef | CharRef

EntityRef    ::=    '&' Name ';'
[WFC: Entity Declared]
[VC: Entity Declared]
[WFC: Parsed Entity]
[WFC: No Recursion]
PEReference    ::=    '%' Name ';'
[VC: Entity Declared]
[WFC: No Recursion]
[WFC: In DTD]
Case 1:  Character and entity references example
Type <key>less-than</key> (&#x3C;) to save options.
This document was prepared on &docdate; and is classified &security-level;.
Case 2: Parameter-entity reference example
<!-- declare the parameter entity "ISOLat2"... -->
<!ENTITY % ISOLat2
         SYSTEM "http://www.xml.com/iso/isolat2-xml.entities" >
<!-- ... now reference it. -->
%ISOLat2;

 

CDATA Sections

CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>".

CDATA Sections:

  1. CDSect   ::=   CDStart CData CDEnd
  2. CDStart   ::=   '<![CDATA['
  3. CData   ::=   (Char* - (Char* ']]>' Char*))
  4. CDEnd   ::=   ']]>'

Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using ” &lt; ” and ” &amp; “. CDATA sections cannot nest.

In the following example of a CDATA section, "<greeting>" and "</greeting>" are recognized as character data, not markup:

<![CDATA[<greeting>Hello, world!</greeting>]]>

The CDATASection object

The CDATASection object represents a CDATA section in a document. A CDATA section contains text that will not be parsed by a parser. Tags inside a CDATA section are not treated as markup, and entities are not expanded. The primary purpose is to include material such as XML fragments without needing to escape all the delimiters.

The only delimiter that is recognized in a CDATA section is "]]>", which indicates the end of the CDATA section. CDATA sections cannot be nested.
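A parser hands the content of a CDATA section to the application as plain text. The following sketch wraps the greeting example in a hypothetical message element and parses it with Python's xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

doc = "<message><![CDATA[<greeting>Hello, world!</greeting>]]></message>"
root = ET.fromstring(doc)

# The tags inside the CDATA section arrive as character data...
print(root.text)

# ...so the message element has no child elements at all.
print(len(root))
```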

Processing XML

Processing instructions start with <? and end with ?>. They are instructions for the application that processes the XML. The XML Recommendation defines only their syntax, not their meaning; they are processor-dependent, so not all processors understand all processing instructions. A typical processing instruction that many processors understand is xml-stylesheet, which tells the processor to use an external style sheet.

Processing Instructions: Processing instructions (PIs) allow documents to contain instructions for applications.

Processing Instructions Example

  • PI   ::=   '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
  • PITarget   ::=   Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

Processing instructions (PIs) are not part of the document's character data, but must be passed through to the application. The PI begins with a target (PITarget) used to identify the application to which the instruction is directed. The target names "XML", "xml", and so on are reserved for standardization in this or future versions of the specification. The XML Notation mechanism may be used for formal declaration of PI targets. Parameter entity references must not be recognized within processing instructions.
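A DOM parser exposes each processing instruction as a node with a target and a data string. The sketch below uses Python's xml.dom.minidom; the style sheet name style.css and the element name note are placeholders invented for the example.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml-stylesheet type="text/css" href="style.css"?><note/>'
)

# The PI comes before the root element, so it is the first child
# of the document node.
pi = doc.firstChild

print(pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE)
print(pi.target)  # the PITarget: which application the PI is meant for
print(pi.data)    # everything after the target, passed through unparsed
```

The parser does not interpret the data part; it is up to the application to parse the pseudo-attributes type and href.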

White Space XML

White space is simply blank space created by carriage returns, line feeds, tabs, and spaces. White space between markup that is used only for layout usually does not affect the processing of the document, so you can choose to include it or not. The XML Recommendation also specifies that line endings are normalized to the UNIX convention before parsing: every line ending is passed to the application as a single linefeed character (ASCII code 10).

Speaking of white space, there is a special attribute (xml:space) that you can use to preserve white space inside your elements; it is described below.

White space is preserved in XML

Unlike HTML, which collapses runs of white space, an XML processor passes all white space in the document through to the application.

White Space Handling

In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines) to set apart the markup for greater readability. Such white space is typically not intended for inclusion in the delivered version of the document. On the other hand, "significant" white space that should be preserved in the delivered version is common, for example in poetry and source code.

An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content. A special attribute named xml:space may be attached to an element to signal an intention that in that element, white space should be preserved by applications. In valid documents, this attribute, like any other, must be declared if it is used. When declared, it must be given as an enumerated type whose values are one or both of "default" and "preserve".

For instance: 

<!ATTLIST poem  xml:space (default|preserve) ‘preserve’>

<!ATTLIST pre xml:space (preserve) #FIXED ‘preserve’>

The value "default" signals that applications' default white-space processing modes are acceptable for this element; the value "preserve" indicates the intent that applications preserve all the white space. This declared intent is considered to apply to all elements within the content of the element where it is specified, unless overridden with another instance of the xml:space attribute. The specification does not give meaning to any value of xml:space other than "default" and "preserve". It is an error for other values to be specified; the XML processor may report the error or may recover by ignoring the attribute specification or by reporting the (erroneous) value to the application. Applications may ignore or reject erroneous values.
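Both behaviours, passing white space through and normalizing line endings, can be observed directly. The sketch below uses Python's xml.etree.ElementTree on a two-line poem element; the verse is sample data made up for the example.

```python
import xml.etree.ElementTree as ET

# The source text uses a Windows line ending (\r\n) and four spaces
# of indentation on the second line.
doc = '<poem xml:space="preserve">Roses are red,\r\n    Violets are blue</poem>'
poem = ET.fromstring(doc)

# The indentation is passed through untouched; the line ending
# arrives as a single linefeed.
print(repr(poem.text))

# xml:space is an ordinary attribute in the reserved xml namespace;
# acting on it is left to the application.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"
space_mode = poem.get(XML_NS + "space")
print(space_mode)
```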

Encoding XML

Encoding is the process of converting Unicode characters into their equivalent binary representation. When the XML processor reads an XML document, it decodes the document according to its encoding. Hence, we need to specify the encoding in the XML declaration.

Types of encoding

Every XML processor must support two encodings:

  1. UTF-8
  2. UTF-16

UTF stands for UCS Transformation Format, and UCS itself means Universal Character Set. The number 8 or 16 refers to the number of bits in one code unit of the encoding: a character occupies 1 to 4 bytes in UTF-8 and 2 or 4 bytes in UTF-16. For documents without encoding information, UTF-8 is assumed by default.
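The byte counts are easy to verify, and a parser can be fed the same document in either encoding. This is a sketch in Python; the sample characters and the city element are chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Bytes per character in UTF-8 and UTF-16 for four sample characters:
# "A", e with acute accent, the euro sign, and an emoji outside the
# Basic Multilingual Plane.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
sizes = {
    ch: (len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))
    for ch in samples
}
for ch, (utf8_len, utf16_len) in sizes.items():
    print(ascii(ch), utf8_len, utf16_len)

# The same document stored in two encodings decodes to the same text.
body = "<city>Z\u00fcrich</city>"
as_utf8 = body.encode("utf-8")  # no declaration: UTF-8 is assumed
as_utf16 = ('<?xml version="1.0" encoding="UTF-16"?>' + body).encode("utf-16")

text_from_utf8 = ET.fromstring(as_utf8).text
text_from_utf16 = ET.fromstring(as_utf16).text
print(text_from_utf8 == text_from_utf16)
```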

Validation in XML

Validation is the process of checking an XML document against a set of rules. An XML document is said to be valid if its contents match the elements, attributes, and associated document type declaration (DTD), and if the document complies with the constraints expressed in it. An XML parser checks a document at two levels.

They are:

  1. Well-formed XML document
  2. Valid XML document

Well-formed XML Document: An XML document is said to be well-formed if it adheres to the following rules:

  • Non-DTD XML files must use only the predefined character entities for amp(&), apos(single quote), gt(>), lt(<), quot(double quote); any other entity must be declared.
  • It must follow the ordering of the tags, i.e., the inner tag must be closed before the outer tag is closed.
  • Each of its start tags must have an end tag, or it must be a self-closing tag (<title>….</title> or <title/>).
  • An attribute name may appear only once in a start tag, and its value must be quoted.

Example:

The following is an example of a well-formed XML document:
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address
[
   <!ELEMENT address (name,company,phone)>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT company (#PCDATA)>
   <!ELEMENT phone (#PCDATA)>
]>
<address>
   <name>Deepak Kumar</name>
   <company>GreatLearning</company>
   <phone>91 123-4567</phone>
</address>

The above example is said to be well-formed as:

  • It declares the type of document: the DOCTYPE names address as the root element type.
  • It includes a root element named address.
  • Each of the child elements name, company and phone is enclosed in its own self-explanatory tag.
  • The order of the tags is maintained.

Valid XML Document:

If an XML document is well-formed and also conforms to the rules of its associated Document Type Declaration (DTD), then it is said to be a valid XML document.
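The difference between the two levels shows up when a document breaks a DTD rule without breaking any syntax rule. In the sketch below, the phone element required by the DTD is left out. Python's xml.etree.ElementTree is a non-validating parser, so it accepts the document; a validating parser would be needed to report the missing element.

```python
import xml.etree.ElementTree as ET

# The DTD requires name, company and phone, but phone is missing.
doc = """<!DOCTYPE address [
  <!ELEMENT address (name,company,phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<address>
  <name>Deepak Kumar</name>
  <company>GreatLearning</company>
</address>"""

# Parsing succeeds: the document is well-formed (though not valid).
root = ET.fromstring(doc)
children = [child.tag for child in root]
print(children)
print(root.find("phone"))
```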

XML Namespaces

In XML, the names of the tags used are defined by the developer. While mixing the XML documents from different XML applications, this naming might result in conflicts. So, XML namespaces provide a method to avoid this issue of element name conflicts.

Name Conflict Example:

The following XML code carries HTML table information:

<table>
   <tr>
     <td>Table</td>
     <td>Chair</td>
   </tr>
</table>

The following XML code carries the information about a table (Shape):

<table>
   <name>Rectangle</name>
   <length>100</length>
   <width>60</width>
</table>

If the above XML code fragments were added together, there would be a name conflict: both contain a <table> element, but the content and meaning of the two elements are different.
An XML application or a user would not know how to handle these differences.

Using Prefix to Solve Name Conflict

Name prefix can be used in XML to avoid name conflicts.

The following code carries the data of both HTML Table and Shape Table:

<t:table>
   <t:tr>
     <t:td>Table</t:td>
     <t:td>Chair</t:td>
   </t:tr>
</t:table>
 
 <s:table>
   <s:name>Rectangle</s:name>
   <s:length>100</s:length>
   <s:width>60</s:width>
</s:table>

The example given above will have no conflict as both the <table> elements have different names.
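For a parser to accept prefixed names, each prefix has to be bound to a namespace name with an xmlns:prefix attribute. The sketch below wraps the two tables in a hypothetical root element and binds the prefixes to made-up example.com URIs; it then reads both tables with Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# The two URIs are placeholders chosen for this example.
NS = {"t": "http://example.com/html-table", "s": "http://example.com/shape"}

doc = """<root xmlns:t="http://example.com/html-table"
      xmlns:s="http://example.com/shape">
  <t:table><t:tr><t:td>Table</t:td><t:td>Chair</t:td></t:tr></t:table>
  <s:table><s:name>Rectangle</s:name><s:length>100</s:length><s:width>60</s:width></s:table>
</root>"""

root = ET.fromstring(doc)

# The parser expands each prefix, so the two table elements
# end up with different qualified names.
tags = [child.tag for child in root]
print(tags)

cells = [td.text for td in root.findall("t:table/t:tr/t:td", NS)]
shape = root.find("s:table/s:name", NS).text
print(cells, shape)

# A prefix that is never bound to a namespace is a parse error.
try:
    ET.fromstring("<t:table><t:td>Table</t:td></t:table>")
    unbound_accepted = True
except ET.ParseError:
    unbound_accepted = False
print(unbound_accepted)
```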

XML Parser

An XML parser is a package or software library that provides an interface for client applications to work with XML documents. It checks that the XML document is properly formatted and may also validate it. Programs use XML with the help of an XML parser.

Types of parsers:

  1. DOM Parser
  2. SAX Parser
  3. JDOM Parser
  4. StAX Parser
  5. XPath Parser
  6. DOM4J Parser

DOM Parser

The Document Object Model (DOM) parser loads the document’s complete contents and creates its entire hierarchical tree in the memory to parse a document. DOM parser is officially recommended by the World Wide Web Consortium (W3C).

Make use of a DOM parser when :

  • A lot of information regarding the structure of a document is required.
  • Movement of parts of an XML document is required.
  • Data in an XML document is to be used more than once

Advantages:

  • API is simple to use.
  • DOM Parser supports both read and write operations.
  • When random access to widely separated parts of a document is required, DOM Parser is preferred.

Disadvantages:

  • As the whole XML document has to be loaded into memory, a DOM parser consumes a lot of memory; hence, it is memory inefficient.
  • It is slower in comparison to other parsers.
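As an illustration of the tree-based approach, the sketch below uses xml.dom.minidom, the small DOM implementation in Python's standard library, to load a short document fully into memory, read one node, and change another. The mall data is sample content.

```python
from xml.dom import minidom

text = (
    "<mall><shop>"
    "<name>Everyday Items</name>"
    "<item>bucket</item>"
    "<price>50</price>"
    "</shop></mall>"
)

# The whole document is parsed into an in-memory tree.
dom = minidom.parseString(text)

# Read: any node can be reached at any time, in any order.
name = dom.getElementsByTagName("name")[0].firstChild.data

# Write: the tree can be modified and serialized again.
dom.getElementsByTagName("price")[0].firstChild.data = "45"
updated = dom.documentElement.toxml()

print(name)
print(updated)
```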

SAX Parser

Simple API for XML (SAX) does not load the complete document into memory; instead, it parses the document using event-based triggers. A SAX parser creates no parse tree. SAX is a streaming interface for XML: as the parser processes each element and attribute, applications using SAX receive event notifications one at a time in document order, starting from the beginning of the XML document and ending with the closing of the root element.

  • SAX Parser recognizes the tokens that make up a well-formed XML document by reading the XML document from top to bottom.
  • Tokens are processed in the same order in which they appear in the document.
  • An “event” handler is provided by the application program that must be registered with the parser.
  • Callback methods in the handler are invoked as the tokens are identified with the relevant information.

Use a SAX Parser when:

  • The XML document is not deeply nested.
  • The XML document can be processed linearly from top to bottom.
  • A massive XML document is being processed whose DOM tree would consume too much memory. (As a rough rule of thumb, a DOM implementation uses about ten bytes of memory to represent one byte of XML.)
  • Only a part of the XML document is needed to solve the problem.
  • The XML document arrives over a stream, since the data becomes available to the application as soon as the parser sees it.

Advantages:

  • SAX Parser is simple to use and memory efficient.
  • It works well for huge documents.
  • It works very fast.

Disadvantages:

  • Its API is less intuitive as it is event-based.
  • As the data is broken into pieces, the client never knows the complete information.
  • You need to write the code and store the data on your own to keep track of data the parser has seen or change the items’ order.
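The event-based style can be sketched with Python's standard xml.sax module. The handler class EventRecorder is a hypothetical name for this example; it simply records each callback so that the order of events becomes visible.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Records start, text and end events in the order they arrive."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._buffer = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so it is collected first.
        self._buffer.append(content)

    def endElement(self, name):
        text = "".join(self._buffer).strip()
        if text:
            self.events.append(("text", text))
        self.events.append(("end", name))
        self._buffer = []


handler = EventRecorder()
xml.sax.parseString(
    b"<shop><name>Everyday Items</name><price>50</price></shop>", handler
)

for event in handler.events:
    print(event)
```

The handler has to keep its own state (here, the text buffer), which is the bookkeeping cost mentioned in the disadvantages above.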

JDOM Parser

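JDOM is a Java-optimised, developer-friendly API that uses Java collections such as Lists and Arrays. It works along with the DOM and SAX APIs, combining the best of the two. It aims for a low memory footprint and speed close to that of SAX.

StAX Parser

Like SAX, the Streaming API for XML (StAX) reads the document as a stream without building a full tree. The difference is that StAX is a pull parser: the application asks the parser for the next event when it is ready, instead of having events pushed to its callbacks.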
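StAX itself is a Java API. To keep the examples in one language, the sketch below shows the same pull style with XMLPullParser from Python's xml.etree.ElementTree, which is a comparable API and not a StAX implementation: the application feeds data in chunks and then pulls the events it wants, with no callbacks involved.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(events=("start", "end"))

# Data can be fed in arbitrary chunks, for example as it arrives
# from a network stream.
parser.feed("<shop><name>Everyday ")
parser.feed("Items</name><price>50</price></shop>")
parser.close()

# The application pulls the queued events when it is ready for them.
seen = []
price = None
for event, elem in parser.read_events():
    seen.append((event, elem.tag))
    if event == "end" and elem.tag == "price":
        price = elem.text

print(seen)
print(price)
```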

XPath Parser

It evaluates XPath expressions to select parts of an XML document. It is used extensively in conjunction with XSLT.
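A few typical path expressions are sketched below. Python's xml.etree.ElementTree supports only a limited subset of XPath, which is enough for simple selections; the second shop and the floor attribute are sample data invented for the example.

```python
import xml.etree.ElementTree as ET

mall = ET.fromstring(
    """<mall>
  <shop floor="1"><name>Everyday Items</name><item>bucket</item><price>50</price></shop>
  <shop floor="2"><name>Book Nook</name><item>atlas</item><price>120</price></shop>
</mall>"""
)

# All shop names, selected by path.
names = [n.text for n in mall.findall("./shop/name")]

# The item sold by the shop whose floor attribute is "2".
second_floor_item = mall.find("./shop[@floor='2']/item").text

# The name of the shop that has an item child with the text "bucket".
bucket_shop = mall.find(".//shop[item='bucket']/name").text

print(names, second_floor_item, bucket_shop)
```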

DOM4J Parser

It is a Java library that uses the Java Collections Framework to parse XML and work with XPath and XSLT. DOM4J also provides support for DOM, SAX and JAXP.

Text String Parsing:

<!DOCTYPE html>
<html>
<body>

<p id="example"></p>

<script>
var text, parser, xmlDoc;

// define the text string
text = "<mall><shop>" +
"<name>Everyday Items</name>" +
"<item>bucket</item>" +
"<price>50</price>" +
"</shop></mall>";

// create an XML DOM parser
parser = new DOMParser();

// the parser creates a new XML DOM object from the text string
xmlDoc = parser.parseFromString(text,"text/xml");

document.getElementById("example").innerHTML =
xmlDoc.getElementsByTagName("name")[0].childNodes[0].nodeValue;
</script>

</body>
</html>

XML DTD

Document Type Definition (DTD) defines the legal elements and attributes, along with the structure, of an XML document. An XML document is well-formed if its syntax is correct; an XML document that has been validated against a DTD is both well-formed and valid.

Valid XML Documents
A valid XML document is not only well-formed but also conforms to the rules of a DTD.
Example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
<to>Chanchal</to>
<from>Harshit</from>
<heading>Message</heading>
<body>Hi! How are you doing?</body>
</note>

The DOCTYPE declaration above contains a reference to the DTD file, whose content is shown and explained below.

XML DTD

Note.dtd:

<!-- note must contain the elements to, from, heading and body, in that order -->
<!ELEMENT note (to,from,heading,body)>
<!-- each of the following elements is of type #PCDATA (parsed character data) -->
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

Because Note.dtd is an external file, it contains only the declarations. The <!DOCTYPE note [ ... ]> wrapper is used only when the same declarations are written inside the XML document itself.

XML Schema

XML Schema, also known as XML Schema Definition (XSD), is used to describe and validate the structure and content of XML data. It defines attributes, elements and data types. It is similar to DTD but provides more control over the XML structure.

Having the correct syntax makes an XML document well-formed. Being validated against a schema means that the XML document is both well-formed and valid.

XML Schema as an alternative to DTD:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

 <xs:element name="note"> <!-- defines the element "note" -->

 <xs:complexType> <!-- element note is a complex type -->
   <xs:sequence> <!-- complex type is a sequence of elements -->
     <xs:element name="to" type="xs:string"/> <!-- element "to" is of type string (text) -->
     <xs:element name="from" type="xs:string"/> <!-- element "from" is of type string -->
     <xs:element name="heading" type="xs:string"/> <!-- element "heading" is of type string -->
     <xs:element name="body" type="xs:string"/> <!-- element "body" is of type string -->
   </xs:sequence>
 </xs:complexType>

 </xs:element>

</xs:schema>

XML Schema Data Types

XML schemas have two types of data types: 

  1. simpleType: It describes text-only content. An element of a simple type contains only text and cannot have attributes or child elements.
  2. complexType: It can hold multiple elements and attributes. An element of a complex type can be empty or can contain sub-elements, text, or both.

Why are XML Schemas more potent than DTD?

  • XML Schemas are written in XML.
  • XML Schemas are extensible to future additions.
  • Data Types are supported by XML Schemas.
  • Namespaces are supported by XML Schemas.

XML DOM

The Document Object Model (DOM) is the foundation for working with XML programmatically. XML documents contain a hierarchy of informational units known as nodes; the DOM defines those nodes and their relationships.

A DOM document is a hierarchical collection of nodes, or pieces of information. This hierarchy enables a developer to search the tree for specific information. The DOM is said to be tree-based because it is based on a hierarchy of information.

The XML DOM also includes an API that allows a developer to add, modify, move, or remove nodes in the tree at any time.
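The four operations can be sketched with xml.dom.minidom from Python's standard library, which implements the core DOM calls. The e-mail address and the replacement company name are placeholder values invented for the example.

```python
from xml.dom import minidom

dom = minidom.parseString(
    "<contact-info>"
    "<name>Karuna</name>"
    "<company>Cerner</company>"
    "<phone>(91)8364682929</phone>"
    "</contact-info>"
)
root = dom.documentElement

# Add: create a new element with a text node and append it.
email = dom.createElement("email")
email.appendChild(dom.createTextNode("karuna@example.com"))
root.appendChild(email)

# Modify: change the text inside an existing element.
company = root.getElementsByTagName("company")[0]
company.firstChild.data = "Example Corp"
modified_company = company.firstChild.data

# Move: inserting an existing node elsewhere detaches it first.
phone = root.getElementsByTagName("phone")[0]
root.insertBefore(phone, root.firstChild)

# Remove: detach a node from its parent.
root.removeChild(company)

order = [node.tagName for node in root.childNodes]
print(modified_company)
print(order)
print(root.toxml())
```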

Example of XML DOM

The sample.html example parses an XML document (“address.xml”) into an XML DOM object and then extracts some information from it using JavaScript.

Contents of sample.html

<!DOCTYPE html>
<html>
   <body>
      <h1>Example for DOM </h1>
      <div>
         <b>Name:</b> <span id = "name"></span><br>
         <b>Company:</b> <span id = "company"></span><br>
         <b>Phone:</b> <span id = "phone"></span>
      </div>
      <script>
         var xmlhttp;
         if (window.XMLHttpRequest) 
         { // modern browsers: IE7+, Firefox, Chrome, Opera, Safari
            xmlhttp = new XMLHttpRequest();
         }
         else
         { // legacy browsers: IE6, IE5
            xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
         }
         // Synchronous GET (third argument false): send() blocks until the file arrives
         xmlhttp.open("GET", "/xml/address.xml", false);
         xmlhttp.send();
         var xmlDoc = xmlhttp.responseXML; // the parsed XML DOM object
 
         // Copy the text of the first matching element into each <span>
         document.getElementById("name").innerHTML =
            xmlDoc.getElementsByTagName("name")[0].childNodes[0].nodeValue;
         document.getElementById("company").innerHTML =
            xmlDoc.getElementsByTagName("company")[0].childNodes[0].nodeValue;
         document.getElementById("phone").innerHTML =
            xmlDoc.getElementsByTagName("phone")[0].childNodes[0].nodeValue;
      </script>
   </body>
</html>

The contents of address.xml are as follows:

<?xml version = "1.0"?>
<contact-info>
   <name>Karuna </name>
   <company>Cerner</company>
   <phone>(91)8364682929</phone>
</contact-info>

Now, keep these two files, sample.html and address.xml, in the same directory, /xml (the path that the script requests), and open sample.html in any browser. This should produce the output shown below.

Output:

Example for DOM

Name: Karuna
Company: Cerner
Phone: (91)8364682929

XML Database

The XML Database is used to store large amounts of data in the XML format. As the use of XML expands in all fields, it is necessary to have a secure location to store XML documents. The data in the database can be queried with XQuery, serialized, and exported in any format required.

Types of XML Database:

There are two different types of XML databases −

  • XML-enabled
  • Native XML (NXD)

XML-Enabled Database

An XML-enabled database is a relational database with an extension for converting XML documents to and from its own storage format. It stores data in tables made up of rows and columns. Each table holds a collection of records, which are made up of fields.

Native XML Database (NXD)

A native XML database is built around containers rather than tables. A container can hold a large number of XML documents and data. A native XML database is queried with XPath expressions.

Native XML databases have a competitive edge over XML-enabled databases: they are more capable of storing, querying, and maintaining XML documents.

Example for XML Database:

<?xml version = "1.0"?>
<contact-info>
   <contact1>
      <name>Tanmay </name>
      <company>Accenture</company>
      <phone>(91) 9926253728</phone>
   </contact1>
   <contact2>
      <name>Monisha</name>
      <company>IBM</company>
      <phone>(91) 9373628930</phone>
   </contact2>
</contact-info>
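
To give a feel for path-based querying, the sketch below runs XPath-style expressions against the document above. It is only an approximation: it uses Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath, instead of a real native XML database. The function name find_company is hypothetical.

```python
import xml.etree.ElementTree as ET

CONTACTS_XML = """<?xml version="1.0"?>
<contact-info>
   <contact1>
      <name>Tanmay </name>
      <company>Accenture</company>
      <phone>(91) 9926253728</phone>
   </contact1>
   <contact2>
      <name>Monisha</name>
      <company>IBM</company>
      <phone>(91) 9373628930</phone>
   </contact2>
</contact-info>"""


def find_company(root, person):
    """Return the company of the first contact whose <name> matches person."""
    # "./*" selects every direct child of the root (contact1, contact2, ...).
    for contact in root.findall("./*"):
        name = contact.findtext("name", default="").strip()
        if name == person:
            return contact.findtext("company")
    return None


root = ET.fromstring(CONTACTS_XML)

# Path expression: the <name> of every contact, whatever the contact tag is.
all_names = [n.text.strip() for n in root.findall("./*/name")]

# Predicate: only contacts whose <company> child has the text "IBM".
ibm_names = [n.text.strip() for n in root.findall("./*[company='IBM']/name")]

print(all_names)
print(ibm_names)
print(find_company(root, "Tanmay"))
```

Because each contact uses a different tag name (contact1, contact2), the queries rely on the * wildcard; giving every record the same tag, such as contact, would make the paths simpler.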

XML Example

Let’s have a look at another XML example. We have the information for a few people in this XML document. 

The root element is <students>, the child element is <student>, and the sub-child elements are <name>, <age>, <subject>, and <gender>.

<?xml version="1.0" encoding="UTF-8"?>
<students>
 <student>
   <name>Jimmy</name>
   <age>22</age>
   <subject>Computer Science</subject>
   <gender>Male</gender>
 </student>
 <student>
   <name>Darla</name>
   <age>21</age>
   <subject>Physics</subject>
   <gender>Female</gender>
 </student>
 <student>
   <name>Manju</name>
   <age>26</age>
   <subject>Chemistry</subject>
   <gender>Male</gender>
 </student>
</students>

XML CSS

What is the purpose of CSS in XML? 

CSS (Cascading Style Sheets) can be used to style and display an XML document. It has the ability to format the entire XML document.

To link XML with CSS

Add the following line to the XML document:

<?xml-stylesheet type="text/css" href="filename.css"?>

Example for XML CSS

employee.css file contents

employee  
{  
background-color: pink;  
}  
firstname,lastname,email  
{  
font-size:25px;  
display:block;  
color: blue;  
margin-left: 50px;  
}   

employee.dtd file contents

<!ELEMENT employee (firstname,lastname,email)>  

<!ELEMENT firstname (#PCDATA)>  

<!ELEMENT lastname (#PCDATA)>  

<!ELEMENT email (#PCDATA)>

employee.xml file contents

<?xml version="1.0"?>  
<?xml-stylesheet type="text/css" href="employee.css"?>  
<!DOCTYPE employee SYSTEM "employee.dtd">  
<employee>  
  <firstname>Ram</firstname>  
  <lastname>Prasad</lastname>  
  <email>ram@gmail.com </email>  
</employee> 

Output:

Ram Prasad ram@gmail.com

DTD vs XSD 

There are numerous distinctions between DTD (Document Type Definition) and XSD (XML Schema Definition). In short, DTD allows for less control over XML structure, but XSD (XML schema) allows for more.

DTD | XSD
DTD stands for Document Type Definition. | XSD stands for XML Schema Definition.
DTDs are derived from SGML syntax. | XSDs are written in XML.
Datatypes are not supported. | Datatypes for elements and attributes are supported.
It doesn't support namespaces. | It supports namespaces.
It offers only limited control over the order and number of child elements. | It offers precise control over the order and number of child elements.
It is not extensible. | It is extensible.
It is less simple to learn, as it has its own syntax. | It is simple to learn, as it does not require learning a new language.
It provides less control over XML structure. | It provides more control over XML structure.

CDATA and PCDATA 

CDATA: (Unparsed Character Data): CDATA contains content that is not further parsed in an XML document. Tags contained within the CDATA text are not considered as markup, and entities are not expanded.

Example:

<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<![CDATA[
  <firstname>sneha</firstname>
  <lastname>shiv</lastname>
  <email>snehashiv@gmail.com</email>
]]>
</employee>

In the preceding example, the CDATA section appears directly inside the employee element, so its contents are left unparsed and become the text value of employee:

Output:

<firstname>sneha</firstname><lastname>shiv</lastname><email>snehashiv@gmail.com </email>

PCDATA: (Parsed Character Data): PCDATA is the text that an XML parser will parse. Tags within PCDATA are treated as markup, and entities are expanded.

In other words, the XML parser examines parsed character data for markup and entity references, and replaces any entity reference it finds with the entity's value.

Example:

<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
  <firstname>sneha</firstname>
  <lastname>vasanth</lastname>
  <email>snehavasanth@gmail.com</email>
</employee>

In the above example, the employee element contains three further elements: firstname, lastname, and email. The parser therefore parses further to obtain the text of firstname, lastname, and email, and delivers the value of employee as:

Output:

sneha vasanth snehavasanth@gmail.com
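
The difference can be observed by parsing both kinds of content and inspecting the resulting nodes. The sketch below is a minimal illustration using Python's standard xml.dom.minidom module (an assumption; any DOM parser that preserves CDATA sections would do). The two sample strings are shortened variants of the examples above, and the "&amp; co" text is added here only to show entity expansion.

```python
from xml.dom import Node, minidom

# The tags inside the CDATA section are plain characters, not markup.
CDATA_DOC = "<employee><![CDATA[<firstname>sneha</firstname>]]></employee>"
# The same tags as PCDATA are parsed as markup; &amp; is an entity reference.
PCDATA_DOC = "<employee><firstname>sneha &amp; co</firstname></employee>"

KINDS = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.CDATA_SECTION_NODE: "cdata",
}


def describe_children(xml_text):
    """Return (node kind, value) pairs for the children of the root element."""
    root = minidom.parseString(xml_text).documentElement
    result = []
    for child in root.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            value = child.tagName
        else:
            value = child.data
        result.append((KINDS[child.nodeType], value))
    return result


def first_text(xml_text, tag):
    """Return the text of the first <tag> element in the document."""
    element = minidom.parseString(xml_text).getElementsByTagName(tag)[0]
    return element.firstChild.data


print(describe_children(CDATA_DOC))
print(describe_children(PCDATA_DOC))
print(first_text(PCDATA_DOC, "firstname"))
```

In the CDATA document the parser reports a single character-data node and no firstname element at all, whereas in the PCDATA document firstname becomes an element and &amp; is expanded to &.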

SAX XML

What is an XML Parser? 

An XML parser is a software library or package that provides interfaces for client programs to work with an XML document. It reads XML and provides a means for programs to use it.

The XML parser also checks the document and ensures that it is well-formed.

Let’s look at the diagram below to see how an XML parser works:

A SAX parser is a program that implements SAX, the Simple API for XML. This API is event-based, which makes it less user-friendly than a tree-based API such as the DOM.

Features of SAX Parser:

  • It doesn’t build any internal structure.
  • Clients do not call the parser's methods directly; they override the API's callback methods and insert their own code within them.
  • It is an event-based parser that operates similarly to an event handler in Java.

Advantages:

1) It is simple and uses little memory.

2) It is very fast and can handle large documents.

Disadvantages:

1) Because it is event-based, its API is less user-friendly.

2) Because the data is fragmented, clients never have access to the entire picture.
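
The event-based style described above can be sketched with Python's standard xml.sax module (an assumption for this illustration; the tutorial itself refers to Java). The handler class NameCollector and the helper collect_names are hypothetical names, and the sample is a shortened version of the students document used elsewhere in this tutorial.

```python
import xml.sax

STUDENTS_XML = """<students>
 <student><name>Jimmy</name><age>22</age></student>
 <student><name>Darla</name><age>21</age></student>
</students>"""


class NameCollector(xml.sax.ContentHandler):
    """Collect the text of every <name> element as parsing events arrive."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None  # holds text chunks while inside a <name> element

    def startElement(self, tag, attrs):
        # Called by the parser each time an opening tag is read.
        if tag == "name":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, tag):
        # Called by the parser each time a closing tag is read.
        if tag == "name":
            self.names.append("".join(self._buffer).strip())
            self._buffer = None


def collect_names(xml_text):
    """Run the SAX parser over xml_text and return the collected names."""
    handler = NameCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.names


print(collect_names(STUDENTS_XML))
```

The handler never sees the whole document at once; it keeps only the state it needs (here, a small text buffer), which is what keeps memory use low.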

XML Data Binding

XML data binding (deserialization) is the method of representing the information in an XML document as an object in computer memory. It allows applications to access XML data directly from an object rather than using the Document Object Model (DOM) to retrieve it from an XML file. You can integrate XML data into a business rule application using XML binding.

XML data binding is performed in two stages in the Decision Server:

  1. A schema driver processes the XML Schema Definition (XSD) at build time. This produces an Execution Object Model (XOM), which describes the structure of the XML document and your objects. The XSD is the data model, and the XML document is an instance of that model.
  2. An XML data driver processes the XML document at run time to build an XML object.
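
The general idea of binding can be sketched in a few lines of Python. This is not the Decision Server mechanism described above: in this simplified illustration a hand-written class stands in for the model that a schema driver would generate from an XSD, and a small function plays the role of the data driver. The names Student and bind_students are hypothetical, and the sample reuses part of the students document from this tutorial.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

STUDENTS_XML = """<students>
 <student>
   <name>Jimmy</name>
   <age>22</age>
   <subject>Computer Science</subject>
   <gender>Male</gender>
 </student>
 <student>
   <name>Darla</name>
   <age>21</age>
   <subject>Physics</subject>
   <gender>Female</gender>
 </student>
</students>"""


@dataclass
class Student:
    """Hand-written stand-in for a class generated from a schema."""
    name: str
    age: int
    subject: str
    gender: str


def bind_students(xml_text):
    """Build one Student object per <student> element in xml_text."""
    root = ET.fromstring(xml_text)
    students = []
    for node in root.findall("student"):
        students.append(
            Student(
                name=node.findtext("name", default="").strip(),
                age=int(node.findtext("age", default="0")),  # text becomes int
                subject=node.findtext("subject", default="").strip(),
                gender=node.findtext("gender", default="").strip(),
            )
        )
    return students


students = bind_students(STUDENTS_XML)
print(students[0].name, students[0].age)
```

Once the objects exist, application code reads students[0].age as a typed attribute and no longer needs to navigate the XML tree.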

XML Editors

An XML editor is a markup language editor. XML documents can be created or edited using standard text editors such as Notepad, WordPad, or any other equivalent text editor. A professional XML editor offers more advanced editing features, such as:

  • It automatically closes any tags that have not been closed.
  • It tightly enforces syntax checks.
  • It uses color to highlight XML syntax for easier reading.
  • It aids in the creation of correct XML code.
  • It validates XML documents against DTDs and Schemas automatically.

Open-Source XML Editors

The following are some free and open-source XML editors:

  • Online XML Editor: This is a simple XML editor that you can use online.
  • Xerlin: Xerlin is an Apache-licensed open-source XML editor for the Java 2 platform. It is a Java-based XML modeling program that allows you to create and edit XML files simply.
  • CAM (Content Assembly Mechanism): The CAM XML Editor utility includes the Oracle-sponsored XML+JSON+SQL Open-XDX.

XML Viewer

A simple text editor or any browser can be used to view an XML document. The majority of major browsers support XML. XML files can be opened in the browser by double-clicking the XML document (if it is a local file) or by typing the URL path in the address bar (if the file is on the server), just like any other file. XML files are saved with the extension “.xml.”

Text Editors

As seen below, any simple text editor such as Notepad, TextPad, or TextEdit can be used to create or view an XML document.

Firefox Browser

Double-click the XML file above to open it in Firefox. Firefox uses color to display the code, making it more readable. Each XML element that has children displays a plus (+) or minus (-) sign on its left side. Clicking the minus sign (-) hides the element's contents, and clicking the plus sign (+) expands them again. The result in Firefox is displayed below.

Chrome Browser

Open the XML file above in Chrome. The code is displayed as shown below.

XML Processors

When a software application reads an XML document and takes appropriate actions, this is referred to as processing the XML. An XML processor is any program that can read and process XML documents. An XML processor reads an XML file and converts it into in-memory structures that the remainder of the application can use.

The most basic XML processor reads an XML document and translates it into an internal representation used by other programs or subroutines. This is known as a parser, and it is an essential component of any XML processing application.

Types:

XML processors are classed as validating or non-validating based on whether or not they validate XML documents. When a processor discovers a validity error, it must report it, but it may continue with normal processing.

A few validating parsers are xml4c (IBM, in C++), xml4j (IBM, in Java), MSXML (Microsoft, in Java), TclXML (TCL), xmlproc (Python), and XML::Parser (Perl).

XML Vs HTML

XML | HTML
XML stands for Extensible Markup Language. | HTML stands for Hypertext Markup Language.
The main focus of XML is on data transfer. | The main focus of HTML is on data presentation.
It is content-driven. | It is format-driven.
It provides support for namespaces. | It does not provide support for namespaces.
Closing tags are compulsory. | Closing tags are not compulsory.
XML tags are not predefined. | HTML tags are predefined.
XML has extensible tags. | HTML has limited tags.

JSON vs XML

What is JSON? 

JSON stands for JavaScript Object Notation. It is a file format that stores and transmits data objects made up of attribute-value pairs and arrays, using human-readable text. JSON is a standard for storing information in an ordered way that is quick to access.

What is XML? 

XML is designed to store data and is widely used for data transfer. It is case-sensitive. It uses tags similar to those of HTML, but whereas HTML tags are predefined, in XML the user defines custom tags. XML files use the extension .xml, and the basic component of the XML language is the element. 

Some basic differences between the two are listed below: 

JSON | XML
A JSON object has a definite type | The type of XML data is unpredictable
String, Array, Number, and Boolean are the data types of JSON | Any XML data should be considered a String
JSON objects can be accessed directly as data | Data in XML needs parsing
Many browsers support JSON | Cross-browser XML parsing can be tricky
JSON has no display capability | XML can be displayed, as it is a markup language
JSON supports only text and number data | XML supports many kinds of data, including text, numbers, images, charts, and graphs
Value retrieval is easy | Value retrieval is difficult
Supported by Ajax toolkits | Not completely supported by Ajax toolkits
Serialization and deserialization are fully automated in JavaScript | Serialization and deserialization require hand-written JavaScript code
Only UTF-8 encoding is supported | XML supports various other encodings
JSON has no comment feature | XML has a comment feature
JSON files are easy for humans to read | XML documents are more difficult to read and interpret than JSON
There is no support for namespaces | There is support for namespaces
It is less secure compared with XML | It is more secure compared with JSON

Samples of JSON and XML Codes

JSON:

{
  "employee": [ 
	
     { 
        "empid":"01", 
        "empname": "Ram", 
        "emplastname": "Sharma" 
     }, 
	
     { 
        "empid":"02", 
        "empname": "Lakshman", 
        "emplastname": "Varma" 
     } 
  ]   
}

XML: 

<?xml version="1.0" encoding="UTF-8"?>
<root>
	<employee>
		<id>01</id>
		<name> Tarun </name>
		<lastname>Rajesh</lastname>
	</employee>
	<employee>
		<id>02</id>
		<name>Girish</name>
		<lastname>Kumar</lastname>
	</employee>
</root>
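
The practical difference in value retrieval can be seen by reading the employee names out of both formats. The sketch below uses Python's standard json and xml.etree.ElementTree modules (an assumption; the comparison above is framed in terms of JavaScript). The data are shortened versions of the two samples above, and the function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

JSON_TEXT = """{
  "employee": [
     {"empid": "01", "empname": "Ram", "emplastname": "Sharma"},
     {"empid": "02", "empname": "Lakshman", "emplastname": "Varma"}
  ]
}"""

XML_TEXT = """<root>
  <employee><id>01</id><name> Tarun </name><lastname>Rajesh</lastname></employee>
  <employee><id>02</id><name>Girish</name><lastname>Kumar</lastname></employee>
</root>"""


def names_from_json(text):
    """json.loads maps JSON objects and arrays directly onto dicts and lists."""
    data = json.loads(text)
    return [employee["empname"] for employee in data["employee"]]


def names_from_xml(text):
    """XML yields a tree of elements; each value is text that must be
    located by tag name and cleaned up explicitly."""
    root = ET.fromstring(text)
    return [
        employee.findtext("name", default="").strip()
        for employee in root.findall("employee")
    ]


print(names_from_json(JSON_TEXT))
print(names_from_xml(XML_TEXT))
```

In the JSON case the parsed result is already a native data structure, while in the XML case the code has to walk the element tree and strip surrounding whitespace from the text.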

XML Tutorial FAQs

Q: What is XML used for? 

A: An XML file is an Extensible Markup Language file that is used to structure data so that it can be stored and transported. XML is a markup language in which tags are used to describe the components of a file.

Q: How do I start learning XML? 

A: Learning XML is not that difficult even for beginners. You can start learning it through the online platforms that offer short courses and even tutorials on XML. You can even consider Great Learning to learn XML.

Q: How do I write an XML document? 

A: To create an XML document, you have to follow the process mentioned below:

  • Click File > New > Other. Now a window will open in which you have to select a wizard
  • Expand XML, select XML Schema File, click Next. Now the Create XML Schema wizard will open
  • At this step, you have to select a parent folder which will be followed by entering a file name for your XML schema file
  • Now simply click Finish.

Q: Is XML easy to learn? 

A: Learning XML is not that tough and it takes one month at the maximum. Even when you practice it, you will find it fairly easy to learn. There can be some confusing problems in the namespaces, and you must learn namespaces to use XML.

Q: What is XML syntax? 

A: XML syntax denotes the rules that define how an XML application has to be written. The XML syntax is very straightforward, which ultimately makes learning XML very easy. The rules that have to be followed while creating XML syntax are:

  • All XML elements should come with a closing tag
  • XML tags are case sensitive
  • All XML documents should come with a root element
  • Attribute values should always be quoted
  • All XML elements must be properly nested
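
These rules can be checked mechanically, because a parser rejects any document that breaks them. The sketch below is a minimal well-formedness check using Python's standard xml.etree.ElementTree module (an assumption for this illustration); the helper name is_well_formed and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses without a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One correct document, then one violation of each rule listed above.
SAMPLES = {
    "well-formed": '<note id="1"><to>Asha</to></note>',
    "missing closing tag": "<note><to>Asha</note>",
    "case mismatch": "<note><To>Asha</to></note>",
    "two root elements": "<note/><note/>",
    "unquoted attribute": "<note id=1/>",
    "improper nesting": "<note><b><i>text</b></i></note>",
}

for label, text in SAMPLES.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted. Note that this checks syntax only; validating a document against a DTD or schema is a separate step.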

Q: What is an XML example? 

A: In this example of XML, there are some details of a few students. In this XML document, <students> is the root element, <student> is the child element and the sub-child elements are name, age, subject and gender.

<students>
 <student>
   <name>Rick Grimes</name>
   <age>35</age>
   <subject>Maths</subject>
   <gender>Male</gender>
 </student>
 <student>
   <name>Daryl Dixon</name>
   <age>33</age>
   <subject>Science</subject>
   <gender>Male</gender>
 </student>
 <student>
   <name>Maggie</name>
   <age>36</age>
   <subject>Arts</subject>
   <gender>Female</gender>
 </student>
</students>

Q: How do I convert an XML file to PDF? 

A: If you are converting an XML file to PDF on Mac, you can follow these steps:

  • Launch an XML-to-PDF converter, such as PDFelement Pro, and open your XML document
  • Now edit your XML File, although this is optional
  • Now you can perform the XML to PDF conversion.

Q: How to view an XML document? 

A: XML files are stored as plain text, so you can open them in any text editor and read them easily. Right-click the XML file and select "Open With" to see a list of programs that can open the file, and then select "Notepad" (Windows) or "TextEdit" (Mac).

Q: Why is XML required? 

A: XML is required for applications such as web publishing, web searching and automating web tasks, general applications, pervasive computing, e-business applications, and metadata applications.

Q: What is the difference between HTML and XML?

A: HTML defines the structure of a web page and displays its information, while XML is used for structuring, storing, and transferring information and for describing what the data is. 

This brings us to the end of the blog on the XML Tutorial. We hope that you found this helpful and were able to learn more about the concepts.

Great Learning Team