Learning XML from scratch. XML Basics for Beginners. XML will be used everywhere

XML is a very popular and flexible format these days. Every programmer should understand it; it is simply a must-have. Many technologies in active use today, including modern ones, rely on it.

Introduction

Hello, dear readers. I want to say right away that this is only the first article in a series of three. The main goal of the whole series is to introduce every reader to XML and to give, if not a complete explanation and understanding, then at least a good push towards it by explaining the main points. The whole series is submitted for one nomination, "Attention to Details", and it is split into three articles in order to fit the character limit for posts and to divide a large amount of material into smaller portions that are easier to digest. The first article focuses on XML itself and on one of the ways to describe the structure of XML files, DTD. To begin with, a small preface for those who are not yet familiar with XML at all: there is no need to be scared. XML is not very complex, and any programmer should understand it, because it is a flexible, efficient and popular format for storing all sorts of information. XML is used in Ant, Maven and Spring. Now that you have gathered strength and motivation, let's start studying. I will try to lay out the material as simply as possible, collecting only the most important points and not going into the weeds.

XML

For a clearer explanation, it is best to look at XML through an example.

    <?xml version="1.0" encoding="UTF-8"?>
    <company>
        <name>IT Heaven</name>
        <offices>
            <office floor="1" room="1">
                <employees>
                    <employee>
                        <name>Maksim</name>
                        <job>Middle Software Developer</job>
                    </employee>
                    <employee>
                        <name>Ivan</name>
                        <job>Junior Software Developer</job>
                    </employee>
                    <employee>
                        <name>Franklin</name>
                        <job>Junior Software Developer</job>
                    </employee>
                </employees>
            </office>
            <office floor="1" room="2">
                <employees>
                    <employee>
                        <name>Herald</name>
                        <job>Middle Software Developer</job>
                    </employee>
                    <employee>
                        <name>Adam</name>
                        <job>Middle Software Developer</job>
                    </employee>
                    <employee>
                        <name>Leroy</name>
                        <job>Junior Software Developer</job>
                    </employee>
                </employees>
            </office>
        </offices>
    </company>

HTML and XML are similar in syntax because they share a common parent, SGML. However, HTML has only a fixed set of tags defined by its standard, while in XML you can create your own tags and attributes and generally store data the way you want. In fact, an XML file like this one can be read by anyone who knows English.

You can depict this example as a tree. The root of the tree is company. It is also the root element, from which all other elements descend. Each XML file can have only one root element. It must come after the XML declaration (the first line in the example) and contain all other elements.

A little about the declaration: it is optional in XML 1.0, but it is strongly recommended, since it identifies the document as XML. It has three pseudo-attributes (special predefined attributes): version (required inside the declaration; 1.0 according to the standard), encoding (the character encoding of the file) and standalone (if it is set to yes, the document declares that it does not depend on external markup declarations such as an external DTD; the value no is assumed when it is omitted).

Elements are entities that store data using other elements and attributes.
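To make the tree structure described above more tangible, here is a minimal sketch in Python (chosen only for brevity; the article itself does not prescribe a language) that parses a shortened version of the example with the standard xml.etree.ElementTree module. The shortened document and all variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A shortened version of the company example (one office, two employees).
# The XML declaration is omitted: it would have to be the very first
# characters of the string, and it is optional anyway.
COMPANY_XML = """
<company>
    <name>IT Heaven</name>
    <offices>
        <office floor="1" room="1">
            <employees>
                <employee>
                    <name>Maksim</name>
                    <job>Middle Software Developer</job>
                </employee>
                <employee>
                    <name>Ivan</name>
                    <job>Junior Software Developer</job>
                </employee>
            </employees>
        </office>
    </offices>
</company>
"""

root = ET.fromstring(COMPANY_XML)

# The root element contains every other element.
root_tag = root.tag
company_name = root.findtext("name")

# Attributes are exposed as a dictionary of strings on each element.
office = root.find("offices/office")
office_attrs = dict(office.attrib)

# Child elements are reached by walking down the tree.
employees = [
    (employee.findtext("name"), employee.findtext("job"))
    for employee in office.iter("employee")
]

print(root_tag, company_name, office_attrs, employees)
```

Note that attribute values always come back as strings, even when they look like numbers.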
Attributes are additional information about an element, specified in its opening tag. To put the explanation in OOP terms: we have a car; each car has characteristics (color, capacity, brand and so on), and these are attributes; and there are entities inside the car (doors, windows, the engine, the steering wheel), and these are other elements. You can store properties either as separate elements or as attributes, whichever you prefer. After all, XML is an extremely flexible format for storing information.

After these explanations, it is enough to walk through the example above for everything to fall into place. In the example, we described a simple company structure: there is a company that has a name and offices, and there are employees in the offices. The employees and offices elements are wrapper elements: they collect elements of the same type into one set so that they are convenient to process. floor and room deserve special attention. These are attributes of an office (its floor and room number), in other words, its properties. If we had an "image" element, we could give its dimensions as attributes in the same way. You may notice that the company does not have a name attribute but does have a name element. That is simply because you can describe structures however you want. Nobody obliges you to write all the properties of an element as attributes; you can use plain elements and write the data inside them. For example, we can record the name and position of our employees as attributes:

    <?xml version="1.0" encoding="UTF-8"?>
    <company>
        <name>IT Heaven</name>
        <offices>
            <office floor="1" room="1">
                <employees>
                    <employee name="Maksim" job="Middle Software Developer"></employee>
                    <employee name="Ivan" job="Junior Software Developer"></employee>
                    <employee name="Franklin" job="Junior Software Developer"></employee>
                </employees>
            </office>
            <office floor="1" room="2">
                <employees>
                    <employee name="Herald" job="Middle Software Developer"></employee>
                    <employee name="Adam" job="Middle Software Developer"></employee>
                    <employee name="Leroy" job="Junior Software Developer"></employee>
                </employees>
            </office>
        </offices>
    </company>

As you can see, the name and position of each employee are now attributes. You can also see that there is nothing inside the employee tag: all employee elements are empty. In that case you can make employee an empty element, that is, close it immediately after the attributes. This is done quite simply, by putting a slash before the closing angle bracket:

    <?xml version="1.0" encoding="UTF-8"?>
    <company>
        <name>IT Heaven</name>
        <offices>
            <office floor="1" room="1">
                <employees>
                    <employee name="Maksim" job="Middle Software Developer"/>
                    <employee name="Ivan" job="Junior Software Developer"/>
                    <employee name="Franklin" job="Junior Software Developer"/>
                </employees>
            </office>
            <office floor="1" room="2">
                <employees>
                    <employee name="Herald" job="Middle Software Developer"/>
                    <employee name="Adam" job="Middle Software Developer"/>
                    <employee name="Leroy" job="Junior Software Developer"/>
                </employees>
            </office>
        </offices>
    </company>

As you can see, by closing the empty elements we kept all the information intact and shortened the document considerably, making it more concise and readable.

To add a comment (text that is skipped when the file is parsed) in XML, there is the following syntax:

    <!-- Ivan resigned recently; he only has a week left to work.
         Don't forget to remove him from the list afterwards. -->

The last construct is CDATA, which stands for "character data". With this construct, you can write text that will not be interpreted as XML markup. This is useful if an element inside the XML file has to store XML markup as its content. Example:

    <?xml version="1.0" encoding="UTF-8"?>
    <bean>
        <information>
            <![CDATA[<name>Ivan</name><age>26</age>]]>
        </information>
    </bean>

The beauty of XML is that you can extend it however you want: use your own elements and your own attributes, and structure the document as you see fit. You can use both attributes and elements to store data (as shown in the examples earlier). However, although you can invent your own elements and attributes on the go, what happens if you work on a project where another programmer decides to move the name element into an attribute, while all of your program logic expects name to be an element? How do you define your own rules for which elements may exist, which attributes they have and so on, so that you can validate XML files and be sure that the rules become a standard in your project and nobody violates them? There are special tools for writing down all the rules of your own XML markup. The best known are DTD and XML Schema. This article focuses only on the first.
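As a quick check of the points above, the following Python sketch (standard library only; the variable names are illustrative assumptions) shows that an empty element written with a slash is parsed the same way as one with an explicit end tag, that comments are skipped, and that the content of a CDATA section comes back as plain text rather than as child elements.

```python
import xml.etree.ElementTree as ET

# The same employee written with an explicit end tag and as an empty element.
long_form = ET.fromstring(
    '<employee name="Ivan" job="Junior Software Developer"></employee>'
)
short_form = ET.fromstring(
    '<employee name="Ivan" job="Junior Software Developer"/>'
)
same_attributes = long_form.attrib == short_form.attrib
both_empty = len(long_form) == 0 and len(short_form) == 0

# A comment is skipped by the default parser; a CDATA section is returned
# as plain text, so the markup inside it is not turned into elements.
bean = ET.fromstring(
    "<bean>"
    "<!-- this comment is ignored -->"
    "<information><![CDATA[<name>Ivan</name><age>26</age>]]></information>"
    "</bean>"
)
information = bean.find("information")
cdata_text = information.text
child_count = len(information)

print(same_attributes, both_empty, cdata_text, child_count)
```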

DTD

DTD (Document Type Definition) is designed to describe document types. DTD is now considered dated and has largely been superseded by XML Schema, but there are still many XML files that use DTDs, so it is useful to understand them. DTD is a technology for validating XML documents. A DTD declares specific rules for a document type: its elements, which elements can appear inside an element, the attributes, whether they are required or not, how many times elements can be repeated, and entities. As with XML, a DTD is easier to explain with an example.

    <!-- Declaration of the allowed elements -->
    <!ELEMENT employee EMPTY>
    <!ELEMENT employees (employee+)>
    <!ELEMENT office (employees)>
    <!ELEMENT offices (office+)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT company (name, offices)>

    <!-- Adding attributes for the employee and office elements -->
    <!ATTLIST employee
        name CDATA #REQUIRED
        job CDATA #REQUIRED
    >
    <!ATTLIST office
        floor CDATA #REQUIRED
        room CDATA #REQUIRED
    >

    <!-- Adding entities -->
    <!ENTITY M "Maksim">
    <!ENTITY I "Ivan">
    <!ENTITY F "Franklin">

Here is a simple example. In it, we have declared the entire hierarchy from the XML example: employee, employees, office, offices, name, company. DTD files use 3 main constructs to describe any XML file: ELEMENT (for describing elements), ATTLIST (for describing the attributes of elements) and ENTITY (for replacing text with abbreviated forms).

ELEMENT

Used to describe an element. The elements that can be used inside the described element are listed in parentheses. You can use quantifiers to specify how many are allowed (they are similar to regular expression quantifiers):

  • + means 1 or more
  • * means 0 or more
  • ? means 0 or 1

If no quantifier is given, exactly 1 element is expected. If we needed one element out of a group, we could write it like this:

    <!ELEMENT company ((name | offices))>

Then exactly one of the elements would be allowed, name or offices; if both of them were inside company at once, validation would fail. You can also see that employee has the word EMPTY, which means that the element must be empty. There is also ANY, which allows any declared elements and text. #PCDATA means text data.

ATTLIST

Used to add attributes to elements. ATTLIST is followed by the name of the element, then by a list of entries of the form "attribute name, attribute type", and at the end of each entry you can add #IMPLIED (optional) or #REQUIRED (required). CDATA means text data. There are other types as well, such as ID, IDREF, NMTOKEN and enumerations.

ENTITY

ENTITY declares an abbreviation and the text that it stands for. In the XML, instead of the full text, we can then write just the name of the entity with & in front of it and ; after it. For example, to distinguish markup in HTML from plain characters, the left angle bracket is often escaped as &lt; (the & before lt is required). The result is then not markup but just the < character.

As you can see, everything is quite simple: you declare the elements, state which elements the declared elements may contain, add attributes to those elements and, if you wish, add entities to shorten some of the text. And here you should be asking: how do we use our rules in our XML file? After all, we have only declared the rules; we have not used them in the XML. There are two ways to use them in XML:

1. Embedding - the DTD rules are written inside the XML file itself: write the name of the root element after the DOCTYPE keyword and enclose the DTD declarations in square brackets.

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE company [
        <!-- Declaration of the allowed elements -->
        <!ELEMENT employee EMPTY>
        <!ELEMENT employees (employee+)>
        <!ELEMENT office (employees)>
        <!ELEMENT offices (office+)>
        <!ELEMENT name (#PCDATA)>
        <!ELEMENT company (name, offices)>

        <!-- Adding attributes for the employee and office elements -->
        <!ATTLIST employee
            name CDATA #REQUIRED
            job CDATA #REQUIRED
        >
        <!ATTLIST office
            floor CDATA #REQUIRED
            room CDATA #REQUIRED
        >

        <!-- Adding entities -->
        <!ENTITY M "Maksim">
        <!ENTITY I "Ivan">
        <!ENTITY F "Franklin">
    ]>
    <company>
        <name>IT Heaven</name>
        <!-- Ivan resigned recently; he only has a week left to work.
             Don't forget to remove him from the list afterwards. -->
        <offices>
            <office floor="1" room="1">
                <employees>
                    <employee name="&M;" job="Middle Software Developer"/>
                    <employee name="&I;" job="Junior Software Developer"/>
                    <employee name="&F;" job="Junior Software Developer"/>
                </employees>
            </office>
            <office floor="1" room="2">
                <employees>
                    <employee name="Herald" job="Middle Software Developer"/>
                    <employee name="Adam" job="Middle Software Developer"/>
                    <employee name="Leroy" job="Junior Software Developer"/>
                </employees>
            </office>
        </offices>
    </company>

2. Import - all the rules are written in a separate DTD file, and the XML file uses the same DOCTYPE construct as in the first method, except that instead of the square brackets you write SYSTEM followed by an absolute path to the DTD file or a path relative to the location of the XML file.

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE company SYSTEM "dtd_example1.dtd">
    <company>
        <name>IT Heaven</name>
        <!-- Ivan resigned recently; he only has a week left to work.
             Don't forget to remove him from the list afterwards. -->
        <offices>
            <office floor="1" room="1">
                <employees>
                    <employee name="&M;" job="Middle Software Developer"/>
                    <employee name="&I;" job="Junior Software Developer"/>
                    <employee name="&F;" job="Junior Software Developer"/>
                </employees>
            </office>
            <office floor="1" room="2">
                <employees>
                    <employee name="Herald" job="Middle Software Developer"/>
                    <employee name="Adam" job="Middle Software Developer"/>
                    <employee name="Leroy" job="Junior Software Developer"/>
                </employees>
            </office>
        </offices>
    </company>

It is also possible to use the PUBLIC keyword instead of SYSTEM, but it is unlikely to be useful to you. If you are interested, it is worth reading about it (and about SYSTEM too) in more detail separately.

Now we cannot use other elements without declaring them in the DTD, and the whole XML document is subject to our rules. You can try writing this code in IntelliJ IDEA in a separate .xml file, then add some new elements or remove an element declaration from the DTD, and notice how the IDE points out the error.

However, DTDs have their downsides:
  • DTD has its own syntax, different from XML syntax.
  • DTDs have no data type checking: values can only be treated as strings.
  • DTDs have no support for namespaces.
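The second downside can be seen in a small Python sketch. It is only an illustration under stated assumptions: the standard xml.etree.ElementTree parser does not validate against the DTD at all, the owner attribute and the positive_int helper are hypothetical, and the point is merely that a CDATA attribute such as floor accepts any string, so a numeric rule has to be enforced in application code. As a side effect, the sketch also shows an internal entity being expanded.

```python
import xml.etree.ElementTree as ET

# An internal DTD subset with one entity. floor and room are declared as
# CDATA, so the DTD cannot require them to be numbers.
# The owner attribute is hypothetical and exists only for this sketch.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE office [
    <!ELEMENT office EMPTY>
    <!ATTLIST office
        floor CDATA #REQUIRED
        room  CDATA #REQUIRED
        owner CDATA #IMPLIED>
    <!ENTITY M "Maksim">
]>
<office floor="first" room="2" owner="&M;"/>
"""


def positive_int(value):
    """Hypothetical helper: return value as a positive int, or None."""
    if value.isdecimal() and int(value) > 0:
        return int(value)
    return None


office = ET.fromstring(DOCUMENT)

# The non-validating parser still expands the internal entity &M;.
owner = office.get("owner")

# The numeric rule has to be checked by hand.
floor = positive_int(office.get("floor"))
room = positive_int(office.get("room"))

print(owner, floor, room)
```

Here floor="first" satisfies the CDATA declaration just as well as room="2" does; only the hand-written check tells them apart.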
About the problem of its own syntax: you have to understand two syntaxes at once, that of XML and that of DTD. They are different, and this can be confusing. It also makes it harder to track down errors in huge XML files combined with equally large DTD schemas. If something doesn't work, you have to check a huge amount of text in two different syntaxes. It's like reading two books at the same time, one in Russian and one in English: if you know one of the languages less well, that text will be correspondingly harder to understand.

About the data type checking issue: attributes in DTDs do have different types, but all of them are, at their core, string representations of something, lists or references. You cannot require that a value be a number, let alone a positive or negative one. And you can forget about object types altogether.

The last problem will be discussed in the next article, which is devoted to namespaces and XML schemas, since it is pointless to discuss it here.

Thank you all for your attention. I have done a great deal of work and continue to do it in order to complete the entire series of articles on time. In fact, I just need to finish understanding XML Schemas and find clearer words to explain them in order to finish the 2nd article. Half of it is already done, so you can expect it soon. The last article will be completely devoted to working with XML files using Java. Good luck to everyone and success in programming :)

1. Introduction

If any of you have ever tried to learn XML on your own, you may have come across many of the confusing concepts that hit me in the past: DTD, XML Schema, namespaces, XPath, XPointers, XSL, XSLT, DOM, SAX, SOAP... That's it, I give up. I will only add that most of the available material is built around implementations whose code may contain errors. There are probably millions of ways to implement and use XML, and they can all be quite complex. But you know, XML can be very simple if we forget about DTDs, XML Schemas, namespaces and so on.
In an effort to get you up and running with XML as quickly as possible, I will ignore as much of the information that you can read in the relevant literature as possible. And the first thing I'm going to ignore is namespaces and schemas. This may seem strange to you, since most books begin with these concepts, but try to think of XML as a tool for a specific task, like a hammer. Is it necessary to know how to build a house in order to use a hammer? What if all I need is just to drive in a nail to hang a picture on it? It's the same with XML, it can be very complex, versatile enough to be used in hundreds if not thousands of applications, and very simple if you don't pay attention to a few things. In this article, I will concentrate on solving specific problems using XML.
So what exactly is the problem? Let's say I want to describe a simple object, like a glass, using XML. Why am I going to use XML for this? Well, first of all, that's exactly what XML is for. XML describes data, and in my example the glass is the data. In real life, data can be Word documents, spreadsheets, images, a book, a database record, or even C++ or Visual Basic classes. Second, XML is extensible. XML allows me to create as many features as I need to describe the data, and those features can be whatever I want. And finally, because XML is fast becoming a standard. If there is life on Mars, then you can be sure that they will be able to understand my XML file there.

What are the main properties that make it possible to describe a glass?

How would the same look in XML format?

glass 6 16 ice cube straw water yes

Note that the first line of the file, the XML declaration, has a special look; for now, just remember that it should be there. The beauty of the XML format is that anyone can understand what it says just by looking at it. It is also clear that this is not the only possible XML description of the glass. If I ask 10 people to develop an XML description of a glass with the same properties, they will probably all create different but correct descriptions. This is where the problem lies. Maybe not for us humans, but when a computer reads an XML file, it would be a great idea to let it know what the file is about. This is where namespaces and schemas come in. Simply put, schemas are used to define a valid structure for an XML file.
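One possible XML description of the glass is sketched below as a Python string and parsed with the standard library. The element names CUP, CONTENTS, SOLID, OTHER and LID and the qty attribute are the ones this article refers to later; MATERIAL, HEIGHT, VOLUME and LIQUID, as well as every quantity except the first, are assumed for illustration only.

```python
import xml.etree.ElementTree as ET

CUP_XML = """<?xml version="1.0"?>
<CUP>
    <MATERIAL>glass</MATERIAL>  <!-- assumed element name -->
    <HEIGHT>6</HEIGHT>          <!-- assumed element name -->
    <VOLUME>16</VOLUME>         <!-- assumed element name -->
    <CONTENTS>
        <SOLID qty="2">ice cube</SOLID>
        <SOLID qty="1">straw</SOLID>    <!-- assumed quantity -->
        <LIQUID qty="8">water</LIQUID>  <!-- assumed name and quantity -->
        <OTHER qty="0"/>                <!-- assumed quantity -->
    </CONTENTS>
    <LID>yes</LID>
</CUP>
"""

cup = ET.fromstring(CUP_XML)

# Collect (element name, quantity, text) for everything inside the glass.
contents = [
    (item.tag, item.get("qty"), item.text)
    for item in cup.find("CONTENTS")
]
has_lid = cup.findtext("LID") == "yes"

print(contents, has_lid)
```

Another author could just as well store the height as an attribute or rename every element, and the result would still be a correct description, which is exactly the point made above.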
Now it's time to talk about a few simple XML rules to follow:

XML Rule #1: A valid XML file must exactly match its schema. But to keep the material easy to understand, none of my examples will use schemas. Thus, strictly speaking, none of my examples are "valid". But honestly, I don't care. I'm not going to build a house, I just need to hang a picture. I'll talk more about this later when we discuss the XML Document Object Model.

XML Rule #2: If you're programming in VB, remember: XML is case sensitive. XML is case sensitive. XML is case sensitive. XML is case sensitive. Write this sentence 1000 times and never forget.

XML Rule #3: Tags are usually called elements, and each opening tag must have a corresponding closing tag. By following this rule, you will have a well-formed XML file. This is very important because until the XML file is well-formed, it will not be parsed and loaded into the Document Object Model. Note that if an element contains no value and no other (nested) elements, it can be written as a single tag of the form <ELEMENT/> instead of the more cumbersome <ELEMENT></ELEMENT>. You can see this approach in the previous example.

XML Rule #4: Elements can contain attributes, and attribute values must be enclosed in quotation marks (single or double).

XML Rule #5: You can reuse attribute names on different elements, but an element name should mean the same thing throughout the file. In the previous example, the qty attribute had a different meaning depending on which element it was used on. The meaning of an attribute depends on the context in which it is used, whereas the meaning of an element is always the same, no matter where in the file the element is located. For example, the element that describes the height of the glass is always used for that purpose throughout our document.

XML Rule #6: There are several special characters in XML that cannot be used directly because they are reserved in the XML syntax. To use such a character, you have to write a reserved construct that starts with the & character followed by a special code: the & character must be written as &amp;, the < character as &lt;, the > character as &gt;, the " character as &quot; and the ' character as &apos;. Alternatively, you can use the <![CDATA[....]]> construct, where "...." can be any sequence of characters except "]]>". Such a construct can occur anywhere character data is allowed, but it cannot be nested.
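Rule #6 can be tried out with a short Python sketch that uses only the standard library. The NOTE element and the sample strings are assumptions made for illustration; escape and quoteattr are helpers from xml.sax.saxutils.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw_text = 'size < 16 & lid = "yes"'

# escape() replaces &, < and > with entity references.
escaped = escape(raw_text)

# quoteattr() escapes a value and wraps it in suitable quotes so that it
# can be used as an attribute value.
attribute = quoteattr('6" tall')

# The parser turns the five predefined entities back into characters.
note = ET.fromstring("<NOTE>&lt; &gt; &amp; &quot; &apos;</NOTE>")
decoded = note.text

# Inside a CDATA section the characters can be written as they are.
cdata = ET.fromstring("<NOTE><![CDATA[a < b && c > d]]></NOTE>")
cdata_text = cdata.text

print(escaped, attribute, decoded, cdata_text)
```

In practice it is usually safer to let a library do the escaping when generating XML than to replace the characters by hand.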

2. XML Document Object Model

The XML Document Object Model allows programmers to load the contents of an XML file into memory. Once an XML file is loaded in this way, it can be manipulated using the properties, methods, and events of the Document Object Model. This is where the usefulness of XML comes in. The Document Object Model greatly facilitates the retrieval and processing of information in an XML file. I'm not going to talk about all the features of the Document Object Model here; I'll just cover the main ones that help achieve the goal of this article. I'm going to take the XML file I just created, load it into the Document Object Model, and do a few things with it. I'll save the rest of the features and capabilities of the Document Object Model for the next article, on client-side XML. Note that although the Document Object Model is very nice and developer-friendly, it requires quite a significant amount of system resources. Therefore, there is another method for parsing XML files, known as SAX. My article does not pretend to be an exhaustive source of information on this subject, so it would be useful to consult the XML SDK as well.

Let's look at an example that uses Microsoft's XML parser version 3.0 (msxml3.dll) to see how it all works. If you don't have the parser, the latest version can be downloaded from the Microsoft website.
Let's say I saved the example cup description in XML format to the file "http://web_server/xml/cup.xml" (local path C:\inetpub\wwwroot\xml\cup.xml) and now I want to load it into the Document Object Model. The following code assumes that the parser is already installed.

Code in Visual Basic 6.0 (with a reference to Microsoft XML, v3.0):

    Dim xmlDoc As MSXML2.DOMDocument30
    Set xmlDoc = New DOMDocument30
    xmlDoc.async = False
    xmlDoc.validateOnParse = False
    xmlDoc.load ("c:\inetpub\wwwroot\xml\cup.xml")
    MsgBox xmlDoc.xml

ASP Server-Side code in Visual Basic:

    Dim xmlDoc
    Set xmlDoc = Server.CreateObject("Msxml2.DOMDocument.3.0")
    xmlDoc.async = False
    xmlDoc.validateOnParse = False
    xmlDoc.load "/xml/cup.xml"

ASP Server-Side JavaScript code:

    var xmlDoc = Server.CreateObject("Msxml2.DOMDocument.3.0");
    xmlDoc.async = false;
    xmlDoc.validateOnParse = false;
    xmlDoc.load("/xml/cup.xml");

Explanation of the above code - let's go through the code in VB6

Line 1: Dim xmlDoc as MSXML2.DOMDocument30

This first line relies on the reference to "Microsoft XML, v3.0". It declares the xmlDoc variable as a reference to an XML document. MSXML2 is the library (use that name; don't try to write MSXML3, it won't work). DOMDocument30 denotes an XML document object that conforms to version 3.0. You may also see this code: Dim xmlDoc As MSXML2.DOMDocument. This construct is commonly used when you don't want to tie the code to a specific version of the XML document object. In this case, the parser registered by default in the system will be used. The only problem is that the version of the parser registered by default may differ from computer to computer. If you want to be sure that the code you write will work with any version of the parser, do not use constructs that are specific to a particular version, because there is no guarantee that the user who runs your code has the same version of the parser that you wrote it for. Another advantage of writing code that is independent of the parser version is that when a newer version of the parser is released, it should be backward compatible with previous versions, and you won't have to recompile your code.

Line 2: Set xmlDoc = new DOMDocument30

This line initializes the xmlDoc variable as a new instance of the XML document object version 3.0.

Line 3: xmlDoc.async = False

XML files can be loaded either synchronously or asynchronously. If xmlDoc.async = False, then the contents of the XML file will be loaded, and only after that control will be transferred to the calling process. If xmlDoc.async = True, then control will be transferred to the calling process immediately, without waiting for the contents of the XML file to be fully loaded.

Line 4: xmlDoc.validateOnParse = False

This code tells the parser not to check the loaded XML file against its schema (validateOnParse = False). To enable schema validation, write validateOnParse = True.

Line 5: xmlDoc.load("C:\inetpub\wwwroot\xml\cup.xml")

This line calls the method that loads the specified XML file. There are two loading methods. The first one, shown on line 5, loads a file into the Document Object Model, and it must be given the full path to the XML file. The second one takes a string containing XML as its parameter. It could be called, for example, like this: xmlDoc.loadXML("valid xml string"). I'll show you how to use this method later.

Line 6: MsgBox xmlDoc.xml

This line displays the contents of the loaded XML file. As a result, we should get the original XML file that we created earlier.
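For readers without Visual Basic at hand, the same steps can be sketched in Python with the standard xml.dom.minidom module. This is only a rough counterpart under stated assumptions: minidom always parses synchronously and does not validate, the shortened cup document is illustrative, and a temporary file stands in for cup.xml.

```python
import os
import tempfile
from xml.dom import minidom

CUP_XML = '<CUP><CONTENTS><SOLID qty="2">ice cube</SOLID></CONTENTS><LID>yes</LID></CUP>'

# Write the document to a temporary file so that there is a path to load.
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as handle:
    handle.write(CUP_XML)
    path = handle.name

try:
    # Rough counterpart of xmlDoc.load(path): parse a file into a DOM tree.
    doc_from_file = minidom.parse(path)
finally:
    os.remove(path)

# Rough counterpart of xmlDoc.loadXML(text): parse a string that holds XML.
doc_from_string = minidom.parseString(CUP_XML)

# Rough counterpart of xmlDoc.xml: serialize the tree back to text.
serialized = doc_from_file.documentElement.toxml()
serialized_from_string = doc_from_string.documentElement.toxml()

print(serialized)
```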

2.2. Examining the XML Document Object Model

Create a new Standard EXE project in Visual Basic. Paste the code above into the Load event handler of your project's main form. Make sure you add a reference to "Microsoft XML, v3.0". To do this, click Project-->References, then scroll down the list that appears and find the desired reference in it. Note that version 3.0 of the parser must be installed on your computer, otherwise it will not be in the list. Set a breakpoint on the last line of code (MsgBox xmlDoc.xml). Run the application in debug mode. When execution reaches the breakpoint, open the "Locals" window and look at the Document Object Model. You can learn a lot by viewing what is displayed in this window. The "Locals" window should look like the one shown in the picture below. Here are some interesting properties of the Document Object Model.

The XML Document Object Model always contains two top-level nodes:

  • Item1 is the root of the document's item branch (ignore it)
  • Item2 is actually the first element of the document (remember this)

nodeName or baseName - can be used when looking up the name of an element or attribute.
nodeType - use to get the type of the current node.
nodeValue - use to find out the value of node data.
childNodes is a collection of child nodes. They can be element nodes, text nodes, and CDATA nodes. There may be other types of nodes that I won't go into right now, but you can learn all about them in the XML SDK.
attributes is a collection of attribute nodes for the current element.
length - used to determine the number of nodes in the tree directly belonging to the current one.
xml - this property is present in all nodes and can be used to represent the current position in the document. The XML string starts at the current node and goes down to the end of the tree. This is a very useful property. Experiment with it and see what happens.
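The properties listed above have close counterparts in Python's standard xml.dom.minidom module, which makes them easy to explore without a debugger. The sketch below is an approximation, not the MSXML object model itself: minidom has no baseName or xml property, so toxml() is used in place of xml, and the shortened cup document is an illustrative assumption.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<CUP><CONTENTS><SOLID qty="2">ice cube</SOLID><OTHER qty="0"/></CONTENTS></CUP>'
)

solid = doc.getElementsByTagName("SOLID")[0]

node_name = solid.nodeName                            # "SOLID"
is_element = solid.nodeType == solid.ELEMENT_NODE     # True
node_value = solid.nodeValue                          # None for element nodes
child_count = solid.childNodes.length                 # 1, the text node
attribute_count = solid.attributes.length             # 1, the qty attribute
parent_children = solid.parentNode.childNodes.length  # 2, SOLID and OTHER
fragment = solid.toxml()                              # XML from this node down

print(node_name, is_element, node_value, child_count, attribute_count, fragment)
```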

2.2.2. Element nodes

An element node can contain element, attribute, text, or CDATA descendant nodes. The figure below shows the following information about the "SOLID" node:

nodeType - Current node type = NODE_ELEMENT - i.e. the current node is an element.
nodeName or baseName or tagName - Name of the current node (element) = SOLID.
Its parent element, CONTENTS, has 4 children.
As the following figure shows, SOLID itself has one child, which is a text node.
text - "ice cube"; this is a shorthand property for getting the value of the current node without moving to its child text node.

2.2.3. Attribute nodes

Attribute nodes can only consist of text or CDATA child nodes. The following figure shows what information can be obtained about the "qty" node:

nodeType - Current node type = NODE_ATTRIBUTE - current node is an attribute.
nodeName or baseName - Name of the current node (Attributes) = qty

From the following figure, it is also clear that qty has one child, which has a text data type.
text or value - "2" is a shorthand method that allows you to get the value of the current node without moving to the child's text node.

2.2.4. Text nodes and CDATA nodes

Text and CDATA nodes do not contain children. Text nodes contain the processed text data of their parent node. CDATA nodes contain the raw text data of their parent node. CDATA nodes are created when data in an XML file is wrapped in a <![CDATA[ ... ]]> section. The CDATA marker tells the parser not to parse the data and to accept the characters inside the section as data. The CDATA section is especially useful when you need to insert code inside an XML file. The following figure shows what information can be obtained from the current text node:

nodeType - Current node type = NODE_TEXT - current node contains text data.
nodeName - Name of the current node (text) = #text - all text nodes are named #text
data or text or value - "2" is the node's current data.
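The difference between text nodes and CDATA nodes can also be observed with Python's standard xml.dom.minidom module. The sketch below is illustrative only: the NOTES, PLAIN and RAW element names are assumptions, and minidom is a different DOM implementation from the parser used in this article, although it reports the same standard node types.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<NOTES><PLAIN>a &lt; b</PLAIN><RAW><![CDATA[a < b]]></RAW></NOTES>"
)

plain_child = doc.getElementsByTagName("PLAIN")[0].firstChild
raw_child = doc.getElementsByTagName("RAW")[0].firstChild

# Escaped text becomes a text node, a CDATA section becomes a CDATA node.
plain_is_text = plain_child.nodeType == plain_child.TEXT_NODE
raw_is_cdata = raw_child.nodeType == raw_child.CDATA_SECTION_NODE

# Both kinds of node have fixed names and carry the same character data.
names = (plain_child.nodeName, raw_child.nodeName)
values = (plain_child.data, raw_child.data)

# Neither kind of node can have children.
no_children = not plain_child.hasChildNodes() and not raw_child.hasChildNodes()

print(plain_is_text, raw_is_cdata, names, values, no_children)
```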

2.2.5. Errors when loading a document

The parseError section of the Document Object Model can be useful in identifying problems that occur when loading an XML document. If I remove the end tag from OTHER in our example file and try to run the program again, I get the following result. The first piece of useful information is that our nextSibling now contains a value of Nothing. Now, if you look at childNodes, you can see that the length field is now 0. Both of these signs indicate that our XML document has not been loaded. To understand why, I open the parseError node and get all the error information.
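A comparable experiment in Python is sketched below, under the assumption that xml.etree.ElementTree is used instead of MSXML. There is no parseError object in that module; a document that is not well-formed raises an ET.ParseError exception instead, which carries the position and the reason. The shortened document is illustrative.

```python
import xml.etree.ElementTree as ET

# The end tag of OTHER is missing, so the document is not well-formed.
BROKEN_XML = '<CUP>\n<CONTENTS>\n<OTHER qty="0">\n</CONTENTS>\n</CUP>'

try:
    ET.fromstring(BROKEN_XML)
    error_position = None
    error_message = ""
except ET.ParseError as error:
    # position is a (line, column) tuple; str(error) holds the reason.
    error_position = error.position
    error_message = str(error)

print(error_position, error_message)
```

As with parseError, the reported line is the place where the parser noticed the problem (the unexpected closing tag), not necessarily the line where the mistake was made.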

So I've shown you how to load an XML file into the Document Object Model, but what do you do with it there? One of the main features you will be able to use is running various queries against an XML document. You can of course walk through the entire document until you find the information you are looking for, but the preferred way is to use one of two methods of the DOMDocument class. In our previous example these would be xmlDoc.SelectSingleNode(patternString), which returns the node we are looking for, and xmlDoc.SelectNodes(patternString), which returns a list of the nodes we are looking for. The patternString parameter is the query itself. It can be written in one of two ways: either as an XSL pattern query or as an XPath query. The newer and preferred way to query an XML document is XPath. The query language must be set in advance, before the first call to either of the two query methods; otherwise the XSL way of querying is used by default. To set the query language, use setProperty("SelectionLanguage", "format"). To make the queries in our example use XPath, I'll add the following command: setProperty("SelectionLanguage","XPath"). In my opinion, XPath is the most important XML technology to learn. I will give some simple XPath queries. The Microsoft XML SDK is a good place to start learning about this technology. Another way to learn it is to write a simple application in Visual Basic that lets you enter queries and displays the result. You may find some free applications that do the same, but XPath is fairly new and may not be fully supported by these applications.
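The idea of querying a loaded document can be sketched in Python as well. The example below uses xml.etree.ElementTree, whose find and findall methods play roughly the roles of SelectSingleNode and SelectNodes but support only a limited subset of XPath. The cup document is a shortened, partly assumed version of the example file (LIQUID and most quantities are illustrative).

```python
import xml.etree.ElementTree as ET

cup = ET.fromstring(
    "<CUP>"
    "<CONTENTS>"
    '<SOLID qty="2">ice cube</SOLID>'
    '<SOLID qty="1">straw</SOLID>'
    '<LIQUID qty="8">water</LIQUID>'
    '<OTHER qty="0"/>'
    "</CONTENTS>"
    "<LID>yes</LID>"
    "</CUP>"
)

# Rough counterpart of SelectNodes: every direct child of any CONTENTS element.
candidates = cup.findall(".//CONTENTS/*")

# ElementTree has no numeric predicates such as [@qty>0], so the
# comparison is done in Python instead.
in_the_cup = [node.text for node in candidates if int(node.get("qty", "0")) > 0]

# Rough counterpart of SelectSingleNode: the first LID element, or None.
lid = cup.find("LID")
has_lid = lid is not None and lid.text == "yes"

print(in_the_cup, has_lid)
```

A full XPath engine would evaluate the numeric comparison inside the query itself; the filter in Python is a workaround for the limited XPath subset in the standard library.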

2.3.1. Using XPATH to query the Document Object Model

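Let's add some code to the end of our previous example to return the contents of our glass:

Dim objNode As IXMLDOMNode
Dim objListOfNodes As IXMLDOMNodeList
xmlDoc.setProperty "SelectionLanguage", "XPath"
MsgBox "Your cup contains the following items:"
Set objListOfNodes = xmlDoc.selectNodes("//CONTENTS/*[@qty>0]")
For Each objNode In objListOfNodes
    MsgBox objNode.Text
Next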

GREAT! Let's now add another query that will allow us to determine if the glass has a lid or not. Add the following code to the end of the previous one:

Set objNode = xmlDoc.selectSingleNode("/CUP/LID")
If objNode.Text = "yes" Then
    MsgBox "We have a lid"
Else
    MsgBox "No lid on this cup"
End If

Let's go through the code line by line:

Line 1: Dim objNode As IXMLDOMNode

This line defines the variable objNode of type XML Document Node. It is important to understand that an XML document node is an object, not a value. It consists of itself, as well as its attributes and descendants (childNodes). In this way, you can cut off unnecessary branches of the tree, choosing only the ones you need.

Line 2: Dim objListOfNodes As IXMLDOMNodeList

This line defines the objListOfNodes variable, which has the type of a list of nodes in an XML document (a group of nodes).

Line 3: xmlDoc.setProperty "SelectionLanguage", "XPath"

This line sets how patternString is formed as XPath.

Line 4: MsgBox "Your cup contains the following items:"

Line 5: Set objListOfNodes = xmlDoc.selectNodes("//CONTENTS/*[@qty>0]")

This line executes an XPath query that will return a group of nodes and store them in the objListOfNodes variable. The request is divided into the following parts:

  • //CONTENTS - get all the CONTENTS elements in the XML document. Note: // is shorthand for searching at any depth, so the element is found wherever it sits in the document.
  • /* - take all (* matches any element) child elements of the CONTENTS elements found. This reduces the result to four element nodes, which sit directly under the CONTENTS node.
  • [@qty>0] - check each child element to ensure that its qty attribute (@ means attribute) is greater than 0. A predicate inside square brackets evaluates to either True or False. If the result is True, the node is kept; if it is False, the node is discarded. After that, our result is reduced to three nodes.

Line 6-8: For Each objNode In objListOfNodes / MsgBox objNode.Text / Next

These lines display the value of each element node that matches the query ("ice cube", "straw", "water").

Line 9: Set objNode = xmlDoc.selectSingleNode("/CUP/LID")

This line returns the first LID element that belongs to the CUP element, which, in turn, sits at the root of the tree (when a query starts with /, it means that you need to start at the root). This is very similar to the path to a file or folder. In our example, this query returns the LID element, which contains the value "yes". The important thing here is that I told the query to start at the root of the XML document. Queries don't always start at the root; they usually start at the current node. In our example this does not matter, since the current node (xmlDoc) is the top of the XML document, but that is not always the case.

Line 10-15: If objNode.Text = "yes" Then / MsgBox "We have a lid" / Else / MsgBox "No lid on this cup" / End If

These lines display the message "We have a lid" because the LID element's text property is "yes".
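For readers without MSXML at hand, the two queries above can be approximated in Python. This is only a sketch under stated assumptions: the CUP_XML sample is a hypothetical stand-in for CUP.XML (the child element names under CONTENTS are my guess), and because ElementTree supports only a subset of XPath, the qty comparison is done in ordinary Python code rather than inside the query.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for CUP.XML; element names under CONTENTS are assumed.
CUP_XML = """
<CUP>
  <CONTENTS>
    <SOLID qty="2">ice cube</SOLID>
    <SOLID qty="1">straw</SOLID>
    <LIQUID qty="3">water</LIQUID>
    <OTHER qty="0"/>
  </CONTENTS>
  <LID>yes</LID>
</CUP>
"""

root = ET.fromstring(CUP_XML)  # root is the CUP element itself

# ElementTree's XPath subset has no numeric comparison, so the
# [@qty>0] predicate is applied in Python instead.
items = [
    node.text
    for node in root.findall(".//CONTENTS/*")
    if int(node.get("qty", "0")) > 0
]
print("Your cup contains:", items)

# Counterpart of /CUP/LID: the path is relative to the CUP root element.
lid = root.find("LID")
has_lid = lid is not None and lid.text == "yes"
print("We have a lid" if has_lid else "No lid on this cup")
```

The structure mirrors the Visual Basic code: one query that returns a list of nodes, and one that returns a single node.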

3. Convert ADO to XML

Now that you understand the basics of XML, let's create an ActiveX control that will convert an ADO dataset into XML format. The goal is to get the titles of the books from the Titles table in the Pubs database and return them in XML format. I will use the result in my next article. You might say that ADO has its own methods for saving the result in XML format, right? Yes, but if I trust ADO to do this, I end up with an XML file in a format that is very awkward to work with. First, ADO will create an XML file that uses a namespace, which I don't need at all right now. Second, ADO will create an XML file in attribute form. In other words, each entry will become an element and each field an attribute:

And I would like to receive an XML file in element form, where each entry would be contained in its own tag, and each field would be an element inside that tag. The syntax of my XML string would be:

data from table data from table data from table data from table data from table datafrom table datafromtable

By the way, what I just did was create a schema for my XML string. Now, if I need to validate the structure of an XML document against a schema, all I have to do is convert the schema to the correct format, that is, to DTD or XDR syntax. Notice that I have added some attributes to each element. One of the reasons for this is that this information can be used by the client. The prettyname attribute can be used for data labels. The datatype attribute could be used to validate data on the client side. But to be honest, the real reason these attributes are there is that they have a special purpose in the XSL template that I often use to build the where section of SQL queries. Maybe I'll post an article demonstrating this approach soon. The template is actually very useful. When the XML structure is applied to data from the Titles table, the result will look like this:

The Busy Executive's Database Guide BU1032 The Busy Executive's Database Guide business 19.99 4095 An overview of available database systems with emphasis on common business applications. illustrated. 6/12/1991 Cooking with Computers: Surprising Balance Sheets BU1111 Cooking with Computers: Surprising Balance Sheets business 11.95 3876 Helpful hints on how to use your electronic resources to the best advantage. 6/9/1991

Now I have something to work with!
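The element-based layout described above can be sketched in a few lines of Python. This is an illustration, not the control from the listings: the tag names books and book, the COLUMNS table and the rows_to_xml helper are all hypothetical, and only the prettyname and datatype attributes and the sample values come from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical column metadata, mirroring the prettyname/datatype idea.
COLUMNS = {
    "title_id": {"prettyname": "Title Identification Number", "datatype": "number"},
    "title": {"prettyname": "Title of the Book", "datatype": "text"},
    "price": {"prettyname": "Price of the Book", "datatype": "number"},
}


def rows_to_xml(rows, root_tag="books", row_tag="book"):
    """Build element-centric XML: one element per row, one child per field."""
    root = ET.Element(root_tag)
    for row in rows:
        record = ET.SubElement(root, row_tag)
        for name, meta in COLUMNS.items():
            field = ET.SubElement(record, name, meta)  # meta becomes attributes
            field.text = str(row[name])
    return ET.tostring(root, encoding="unicode")


rows = [
    {"title_id": "BU1032", "title": "The Busy Executive's Database Guide", "price": 19.99},
    {"title_id": "BU1111", "title": "Cooking with Computers: Surprising Balance Sheets", "price": 11.95},
]
xml_text = rows_to_xml(rows)
print(xml_text)
```

Each record becomes an element and each field a child element, which is the opposite of the attribute-based output that ADO produces by default.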

Listing 1 - CUP.XML

glass 6 16 ice cube straw water yes

Dim xmlDoc As MSXML2.DOMDocument30
Set xmlDoc = New DOMDocument30
xmlDoc.async = False
xmlDoc.validateOnParse = False
xmlDoc.Load ("c:\inetpub\wwwroot\xml\cup.xml")
MsgBox xmlDoc.xml

Dim objNode As IXMLDOMNode
Dim objListOfNodes As IXMLDOMNodeList
xmlDoc.setProperty "SelectionLanguage", "XPath"
MsgBox "Your cup contains the following items"
Set objListOfNodes = xmlDoc.selectNodes("//CONTENTS/*[@qty>0]")
For Each objNode In objListOfNodes
    MsgBox objNode.Text
Next

Set objNode = xmlDoc.selectSingleNode("/CUP/LID")
If objNode.Text = "yes" Then
    MsgBox "We have a lid"
Else
    MsgBox "No lid on this cup"
End If

Listing 3 - ActiveX Control: ADO to XML (WebClass.dll)(xmlControl.cls)

Option Explicit

'Declare Database variables
Private m_dbConnection As New ADODB.Connection
Private m_dbCommand As ADODB.Command
Private m_adoRs As ADODB.Recordset
Private m_adoErrors As ADODB.Errors
Private m_adoErr As Error
Public nCommandTimeOut As Variant
Public nConnectionTimeOut As Variant
Public strConnect As Variant
Public strAppName As String
Public strLogPath As String
Public strDatabase As String
Public strUser As String
Public strPassword As String
Public strServer As String
Public strVersion As String
Public lMSADO As Boolean

'Private Global Variables
Private gnErrNum As Variant
Private gstrErrDesc As Variant
Private gstrErrSrc As Variant
Private gstrDB As String
Private gstrADOError As String
Private Const adLeonNoRecordset As Integer = 129
Private gtableName(6) As String
Private gcolumnName(6) As String
Private gprettyName(6) As String
Private gdatatype(6) As String
Private gfilter(6) As String

Private Function OpenDatabase()
    If Len(strConnect) = 0 Then
        'set defaults
        If Len(strDatabase) = 0 Then
            strDatabase = "pubs"
        End If
        If nConnectionTimeOut = 0 Then
            nConnectionTimeOut = 600
        End If
        If nCommandTimeOut = 0 Then
            nCommandTimeOut = 600
        End If
        If Len(strAppName) = 0 Then
            strAppName = "xmlControl"
        End If
        If Len(strUser) = 0 Then
            strUser = "sa"
        End If
        If Len(strPassword) = 0 Then
            strPassword = ""
        End If
        strConnect = "Provider=SQLOLEDB.1; " & _
            "Application Name=" & strAppName & _
            "; Data Source=" & strServer & "; Initial Catalog=" & strDatabase & "; " & _
            " User ID=" & strUser & "; Password=" & strPassword & ";"
    End If
    'connect to SQL Server and open database
    On Error GoTo SQLErr 'enable error handler
    With m_dbConnection
        .ConnectionTimeout = nConnectionTimeOut
        .CommandTimeout = nCommandTimeOut
        .Open strConnect 'open database using connection string
    End With
    On Error GoTo 0 'disable error handler
    OpenDatabase = True 'database opened successfully
    Exit Function
SQLErr:
    Call logerror("OPEN")
    OpenDatabase = False
End Function

Private Function BuildSQLwhere(tmpWhere) As String
    'This is for the future
End Function

Public Function GetTitlesXML(Optional xmlWhere As Variant) As String
    Dim whereClause As String
    Dim strSQL As String
    Call OpenDatabase 'open database pubs
    If IsMissing(xmlWhere) Then 'no query was passed in
        whereClause = ""
    Else
        whereClause = BuildSQLwhere(xmlWhere) 'convert query to correct sql
    End If
    'initialize a sql statement that will query book titles
    strSQL = "select title_id,title,type,price,ytd_sales,notes,pubdate from titles" & whereClause
    Call NewRecordSet 'create a dataset
    'set cursorlocation
    m_adoRs.CursorLocation = adUseClient
    'open recordset
    m_adoRs.Open strSQL, m_dbConnection, adOpenForwardOnly, adLockReadOnly, adCmdText
    'disconnect from dataset
    Set m_adoRs.ActiveConnection = Nothing
    On Error GoTo 0 'disable error handler
    'close database and release connection
    Call CloseDatabase
    If m_adoRs.EOF Then
        GetTitlesXML = "" 'the query did not return any values
    Else
        If lMSADO Then
            GetTitlesXML = msado(m_adoRs) 'convert the dataset to Microsoft ado-->xml
        Else
            GetTitlesXML = ADOtoXML(m_adoRs, True) 'convert the ado recordset to custom xml
        End If
    End If
    'close the data set
    Call CloseRecordset
    Exit Function
SQLErr:
    Call logerror(strSQL)
End Function

Private Function ADOtoXML(tmprs As ADODB.Recordset, tmpMP As Boolean) As String
    Dim adoFields As ADODB.Fields 'declare a collection to store fields
    Dim adoField As ADODB.Field 'is used to get each field from the collection
    Dim xmlDoc As msxml2.DOMDocument30
    Dim tmpLine As String 'stores the xml representation of each book
    Dim tmpXML As String 'is used to concatenate xml strings
    Dim i As Integer
    If tmprs.EOF Then 'query did not return any records
        ADOtoXML = ""
        Exit Function
    Else
        Set adoFields = tmprs.Fields 'create a collection of fields
    End If
    tmpXML = " " 'all books will be enclosed in a tag
    Do Until tmprs.EOF 'loop through each line in the dataset
        i = 0 'i is the ado index of the field, which starts at 0 - the first field will be field(0)
        tmpLine = " " & tmprs("title") & vbCrLf
        For Each adoField In adoFields 'loop through all fields
            'build xml tag and its attributes for the current field
            tmpLine = tmpLine & " " & adoField.Value
            tmpLine = tmpLine & "" & vbCrLf
            i = i + 1 'go to the next field
        Next
        tmpXML = tmpXML & tmpLine & "" & vbCrLf 'end tag after last field
        tmprs.MoveNext 'next title
    Loop
    Set adoField = Nothing 'destroy field object
    Set adoFields = Nothing 'destroy field collection object
    tmpXML = tmpXML & "" & vbCrLf 'closing tag
    Set xmlDoc = New msxml2.DOMDocument30 'create xmlDOM
    xmlDoc.async = False 'wait for document to load
    xmlDoc.validateOnParse = False 'do not check against schema
    xmlDoc.loadXML (tmpXML) 'load string into Document Object Model
    On Error Resume Next 'if file does not exist, handle this error
    Kill ("c:\temp\custom.xml") 'delete the file if it exists
    On Error GoTo 0 'tell the error handler to abort when an error is encountered
    xmlDoc.save ("c:\temp\custom.xml") 'save xml to file
    ADOtoXML = xmlDoc.xml 'returns xml string
    Set xmlDoc = Nothing 'destroy Document Object Model
End Function

Private Function msado(tmprs As ADODB.Recordset) As String
    Dim xmlDoc As msxml2.DOMDocument30
    On Error Resume Next 'if the file doesn't exist, Kill raises an error
    Kill ("c:\temp\msado.xml") 'erase the file if it exists
    On Error GoTo 0 'tell the error handler to abort on error
    tmprs.save "c:\temp\msado.xml", adPersistXML 'save xml to file
    Set xmlDoc = New msxml2.DOMDocument30 'create xml document object model
    xmlDoc.async = False 'wait for xml document to load
    xmlDoc.validateOnParse = False 'do not check against schema
    xmlDoc.Load ("C:\temp\msado.xml") 'load file into Document Object Model
    msado = xmlDoc.xml 'return xml string
    Set xmlDoc = Nothing 'destroy Document Object Model
End Function

Private Sub CloseRecordset()
    'close dataset
    m_adoRs.Close
    Set m_adoRs = Nothing
End Sub

Private Sub NewRecordSet()
    Set m_adoRs = Nothing
    Set m_adoRs = New ADODB.Recordset
End Sub

Private Sub CloseDatabase()
    m_dbConnection.Close
    Set m_dbConnection = Nothing
End Sub

Private Sub logerror(errSQL As String)
    Dim hFile As Integer
    Dim expFile As String
    On Error GoTo 0
    gnErrNum = Err.Number
    gstrErrDesc = Err.Description
    gstrErrSrc = Err.Source
    Set m_adoErrors = m_dbConnection.Errors
    For Each m_adoErr In m_adoErrors
        gstrADOError = m_adoErr.Description & "," & CStr(m_adoErr.NativeError) _
            & ", " & CStr(m_adoErr.Number) & "," & m_adoErr.Source _
            & "," & CStr(m_adoErr.SQLState)
    Next
    hFile = FreeFile
    If Len(strLogPath) = 0 Then
        strLogPath = "C:\temp\"
    End If
    expFile = strLogPath & strAppName & ".err"
    Open expFile For Append As #hFile
    Print #hFile, "**********************************"
    Print #hFile, Now()
    Print #hFile, "**********************************"
    Print #hFile, "Subroutine: " & tmpPro
    Print #hFile, "Error Number:" & gnErrNum
    Print #hFile, "Error Description: " & gstrErrDesc
    Print #hFile, "Error Source:" & gstrErrSrc
    Print #hFile, "Ado error String: " & gstrADOError
    Print #hFile, "Bad SQL: " & errSQL
    Close #hFile
End Sub

Private Sub Class_Initialize()
    strVersion = "xmlControl Version 1.1"
    'title_id,title,type,price,ytd_sales,notes,pubdate
    gtableName(0) = "titles"
    gcolumnName(0) = "title_id"
    gprettyName(0) = "Title Identification Number"
    gdatatype(0) = "number"
    gfilter(0) = ""
    gtableName(1) = "titles"
    gcolumnName(1) = "title"
    gprettyName(1) = "Title of the Book"
    gdatatype(1) = "text"
    gfilter(1) = ""
    gtableName(2) = "titles"
    gcolumnName(2) = "type"
    gprettyName(2) = "Type of Book"
    gdatatype(2) = "text"
    gfilter(2) = ""
    gtableName(3) = "titles"
    gcolumnName(3) = "price"
    gprettyName(3) = "Price of the Book"
    gdatatype(3) = "number"
    gfilter(3) = ""
    gtableName(4) = "titles"
    gcolumnName(4) = "ytd_sales"
    gprettyName(4) = "Year to date sales"
    gdatatype(4) = "number"
    gfilter(4) = ""
    gtableName(5) = "titles"
    gcolumnName(5) = "notes"
    gprettyName(5) = "Notes about the book"
    gdatatype(5) = "memo"
    gfilter(5) = ""
    gtableName(6) = "titles"
    gcolumnName(6) = "pubdate"
    gprettyName(6) = "Date Published"
    gdatatype(6) = "date"
    gfilter(6) = ""
End Sub

Listing 4 - VB test application to test WebClass

Private Sub Command1_Click()
    Dim objWC As xmlControl
    Dim xml As String
    Set objWC = New xmlControl
    objWC.strDatabase = "pubs"
    objWC.strServer = "ltweb"
    objWC.strUser = "sa"
    objWC.strPassword = ""
    objWC.lMSADO = Option2.Value
    objWC.strAppName = "Article1"
    Text1.Text = objWC.GetTitlesXML
End Sub

Listing 5 - ASP for testing WebClass

To date, it has become obvious to all specialists in the field of web technologies that the existing standards for data transmission over the Internet are not enough. The HTML format, having once become a breakthrough in the field of displaying the content of Internet nodes, no longer meets all the requirements that are currently necessary. It allows you to describe how the data should be displayed on the screen of the end user, but does not provide any means for effectively describing and managing the transmitted data.

In addition, a stumbling block for many companies involved in software development is the need to use various components together, to ensure that they interact, and to exchange data between them.

Until recently, there was no standard that provides tools for intelligent information retrieval, data exchange, adaptive processing of the received data.

The solution to all the problems described above was the XML language, approved in 1998 by the international organization W3C. XML (eXtensible Markup Language) is an extensible markup language for describing structured data in text form. This text-based format, in many ways similar to HTML, is designed specifically for storing and transmitting data.

XML allows you to describe and communicate structured data such as:

  • separate documents;
  • metadata describing the content of any Internet site;
  • objects containing data and methods for working with them (for example, ActiveX controls or Java objects);
  • individual records (for example, the results of executing database queries);
  • all kinds of web links to information and human resources of the Internet (email addresses, hypertext links, etc.).

Creating XML Documents

Data described in XML is called an XML document. The XML language is easy to read and easy enough to understand. If you are familiar with HTML, then learning how to write XML documents will not be difficult for you.

The source text of an XML document consists of a set of XML elements, each containing a start and end tag. Each pair of tags represents a piece of data. That is, like HTML, XML uses tags to describe data. But, unlike HTML, XML allows an unlimited set of pairs of tags, each representing not how the data it contains should look like, but what it means.

Good morning NEWS Series Gentle Poison Field of Wonders (repeat) M. f. Health NEWS Enjoy Your Bath! M. f. Together NEWS finest hour NEWS Weather GOOD night kids TIME Sight

This text can be created in plain text format and saved as an XML file.

Any element of an XML document can have attributes that specify its characteristics. An attribute is a name="value" pair that is specified in the start tag when the element is defined. In the example above, one element has a date="December 25" attribute, and another has a name="ORT" attribute.

The extensibility principle of the XML language is the ability to use an unlimited number of tag pairs, defined by the creator of the XML document. For example, the above description of the TV guide can be extended to include information about the broadcast region and the program guide of the RTR channel. In this case, the XML description will take the form:

Russia Saint Petersburg Good morning NEWS Series Gentle Poison Field of Wonders (repeat) M. f. Health NEWS Enjoy Your Bath! M. f. Together NEWS finest hour NEWS Weather GOOD night kids TIME Sight M. f. Weather RTR Post Good morning Country! director himself Purple Haze GOLDEN KEY Federation Secret agents Boyarsky Dvor My family Full house NEWS ASTEROID (USA) DINNER AT FRED'S (USA) Weather

Now, from this XML description, you can extract the program guide of the ORT and RTR channels for December 25 in the city of St. Petersburg, Russia.
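To make this kind of extraction concrete, here is a minimal Python sketch. The element and attribute names (TVGuide, Channel, Day, Item, country, city) are hypothetical, since the original markup is not reproduced above; only the name and date attributes and the program titles follow the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for a shortened TV guide.
GUIDE_XML = """
<TVGuide country="Russia" city="Saint Petersburg">
  <Channel name="ORT">
    <Day date="December 25">
      <Item>Good morning</Item>
      <Item>NEWS</Item>
    </Day>
  </Channel>
  <Channel name="RTR">
    <Day date="December 25">
      <Item>Good morning Country!</Item>
      <Item>GOLDEN KEY</Item>
    </Day>
  </Channel>
</TVGuide>
"""


def programs(guide, channel, date):
    """Return the program titles for one channel on one date."""
    path = f"Channel[@name='{channel}']/Day[@date='{date}']/Item"
    return [item.text for item in guide.findall(path)]


guide = ET.fromstring(GUIDE_XML)
print(programs(guide, "ORT", "December 25"))
print(programs(guide, "RTR", "December 25"))
```

Because the tags say what the data means, the query can pick a channel and a date without knowing anything about how the guide will be displayed.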

The principle of independence of the definition of the internal structure of the document from the ways of presenting this information is to separate the data from the process of their processing and display. Thus, the received data can be used in accordance with the needs of the client, that is, choose the desired design, apply the necessary processing methods.

You can control the display of elements in a client program window (for example, in a browser window) using special instructions: XSL style sheets (eXtensible Stylesheet Language). These style sheets allow you to define the appearance of an element based on its location within the document, meaning that two elements with the same name can have different formatting rules applied to them. In addition, the underlying language of XSL is XML, which means that XSL style sheets are more versatile, and DTDs or data schemas, discussed below, can be used to control the correctness of such style sheets.

The XML format, compared to HTML, has a small set of simple parsing rules that allow XML documents to be parsed without resorting to any external descriptions of the XML elements used. In general, XML documents must meet the following requirements:

  • Each opening tag that defines some part of the data in the document must be followed by a closing tag, that is, unlike HTML, closing tags cannot be omitted.
  • Nesting of tags in XML is strictly controlled, so the order of opening and closing tags must be monitored.
  • XML is case sensitive.
  • All information located between the start and end tags is treated as data in XML, and therefore all formatting characters are taken into account (that is, spaces, newlines, tabs are not ignored, as in HTML).
  • XML has a set of reserved characters that must be written in an XML document only in a special way. These characters and the character sequences that represent them are:
    < &lt;
    & &amp;
    > &gt;
    " &quot;
    ' &apos;
  • Each XML document must have a single root element that contains all other elements. In our example, this is the outermost element of the TV guide.
  • All attribute values used in tag definitions must be enclosed in quotation marks.

If an XML document does not violate the above rules, then it is called formally correct (well-formed).
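The rule about reserved characters is the one most often broken when XML is assembled from plain strings. A minimal Python sketch, assuming the standard library helpers escape and quoteattr from xml.sax.saxutils; the show element and its title attribute are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw_text = 'Tom & Jerry <"live">'

# escape() replaces &, < and > in element content.
safe_text = escape(raw_text)
# quoteattr() escapes the value and wraps it in suitable quotes.
safe_attr = quoteattr(raw_text)

document = f"<show title={safe_attr}>{safe_text}</show>"
print(document)

# A conforming parser turns the entities back into the original characters.
parsed = ET.fromstring(document)
print(parsed.text)
```

Without the escaping step, the ampersand and the angle brackets in the text would make the document not well-formed.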

Today, there are two ways to control the correctness of an XML document: DTDs (Document Type Definitions) and data schemas (Semantic Schema). If an XML document is checked against a DTD or a schema and conforms to it, then it is called valid.

A schema is a way to create rules for constructing XML documents, that is, to specify the valid names, types, attributes, and relationships of elements in an XML document. Schemas are an alternative to DTDs. Compared to DTDs, schemas are more powerful for defining complex data structures, provide a clearer way to describe the grammar of a language, and can be easily upgraded and extended. The undoubted advantage of schemas is that they allow you to describe the rules for an XML document using XML itself. From this point of view, the XML language can be called self-describing.

Since the XML elements used in the same document may come from different XML schemas, element name conflicts can occur. Namespaces solve this problem. Namespaces allow you to distinguish between elements with the same name but different meanings. However, they do not specify how such elements are handled; this is what the XML parsers discussed below do.
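A small Python sketch of such a name conflict, assuming two made-up namespace URIs (urn:example:tv and urn:example:book) that both define a title element:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; both vocabularies define a <title> element.
DOC = """
<listing xmlns:tv="urn:example:tv" xmlns:book="urn:example:book">
  <tv:title>NEWS</tv:title>
  <book:title>The Busy Executive's Database Guide</book:title>
</listing>
"""

NS = {"tv": "urn:example:tv", "book": "urn:example:book"}

root = ET.fromstring(DOC)
tv_title = root.find("tv:title", NS).text
book_title = root.find("book:title", NS).text
print(tv_title, "|", book_title)
```

The two elements share the local name title, yet the parser keeps them apart because each name is qualified by its namespace URI.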

To better understand the purpose and uses of XML Schemas, let's look at the schema for the TV guide example discussed above.

This XML schema must be saved in the TV-ProgramSchema.XML file. The root element of this XML file is the Schema element, whose attributes are the name of the schema (TV-ProgramSchema) and a reference to the namespace that defines the built-in data types used in this schema: xmlns="urn:schemas-microsoft-com:xml-data". The minOccurs and maxOccurs attributes of an element declaration define the minimum and maximum possible number of such elements, respectively. For example, a declaration with minOccurs="0" and maxOccurs="*" means that the number of items of type item (that is, the TV shows themselves) can be from 0 to infinity.

If the above scheme is used to control the correctness of the XML description of the TV program guide, then the scheme used must be indicated in the header of the XML document. Then the XML description of the TV program of the ORT channel will look like this:

Russia Saint Petersburg Good morning NEWS Series Gentle Poison Field of Wonders (repeat) M. f. Health NEWS Enjoy Your Bath! M. f. Together NEWS finest hour NEWS Weather GOOD night kids TIME Sight

Now the root element of this XML description has the xmlns="x-schema:TV-ProgramSchema.xml" attribute, which is a reference to the XML schema used.

Parsing XML Documents

Extracting data from an XML document and checking its correctness are handled by XML parsers (analyzers). If an XML document is formally correct, then all parsers designed to parse XML documents will be able to work with it correctly.

Since the use of DTDs in XML is optional, any formally correct document can be recognized and parsed by a program designed to parse XML documents. For example, any XML description given in this document is formally correct, so it will be recognized correctly by any XML parser.

If the XML parser receives an XML document that uses an XML schema as input, then it will be parsed, checked for correctness, and compliance with the schema. For example, an XML description of the TV program guide of the RTR channel using the TV-ProgramSchema.xml scheme will be recognized as formally correct and valid.

XML parsers allow, if the language constructs specified in the document are syntactically correct, to correctly extract the document elements defined by them and pass them to the application program that performs the necessary display actions. That is, after parsing the XML document, in most cases, the application program is provided with an object model that displays the contents of the resulting XML document, and the tools necessary to work with it (walk through the element tree).

Since XML, unlike HTML, does not in any way define how the document elements described with it are displayed and used, the XML parser is given the opportunity to choose the desired design.

As already mentioned, XSL tables can be used to define the appearance of XML elements. The principle of processing XML documents using style sheets is as follows: when parsing an XSL document, the parser program processes the instructions of this language and assigns a set of tags to each element found in the XML tree that determines the formatting of this element. In other words, with the help of XSL tables, a formatting template for XML elements is specified, and this template itself can have the structure of the corresponding fragment of an XML document. XSL statements define the exact location of an XML element in the tree, so it is possible to apply different styling to the same element, depending on the context in which it is used.

In some parsers, the way document structure is represented is based on the Document Object Model (DOM) specification, which allows a strict hierarchical DOM to be used when creating XML documents.
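As an illustration of working with such an object model, here is a minimal Python sketch that uses xml.dom.minidom, a DOM-style parser from the standard library. The sample document and the walk helper are hypothetical.

```python
from xml.dom import minidom

DOC = "<company><office floor='1'><employee>Maxim</employee></office></company>"


def walk(node, depth=0, out=None):
    """Collect (depth, element name) pairs by walking the DOM tree."""
    out = [] if out is None else out
    if node.nodeType == node.ELEMENT_NODE:
        out.append((depth, node.tagName))
    for child in node.childNodes:  # text nodes have no children
        walk(child, depth + 1, out)
    return out


dom = minidom.parseString(DOC)
tree = walk(dom.documentElement)
for depth, name in tree:
    print("  " * depth + name)
```

After parsing, the application never touches the raw text again; it only walks the tree of node objects that the parser built.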

An example of an XML parser is MSXML, the XML parser built into Microsoft Internet Explorer 5.0. It allows you to read data from an XML file, process it, generate an element tree, display the data using XSL style sheets, and represent all data elements as objects using the DOM.

Using XML

Many people think of XML as a new technology for integrating software components. The main advantages of using XML are:

  • Integration of data from various sources. XML can be used to combine heterogeneous structured data at the middle level of three-tier web systems, databases.
  • Local data processing. The received data in XML format can be parsed, processed and displayed directly on the client without additional calls to the server.
  • Viewing and manipulating data in various sections. The received data can be processed and viewed by the client in various ways, depending on the needs of the end user.
  • Possibility of partial updating of data. With XML, you can only update the part of the structured data that has changed, rather than the entire structure.

All these advantages make XML an indispensable tool for developing flexible database searches, powerful three-tier web applications, and applications that support transactions. In other words, with the help of XML it is possible to form queries to databases of various structures, which allows you to search for information in numerous databases that are incompatible with each other. The use of XML in the middle layer of three-tier web applications enables efficient data exchange between clients and servers of e-commerce systems.

In addition, the XML language can be used as a tool to describe the grammar of other languages and control the correctness of documents.

Tools for processing data received in XML format can be developed in the Visual Basic, Java or C++ environment.

This section is about working with XML. It will include both theoretical and practical material. Basic operations with XML files will be considered, as well as interaction with LINQ and much more.

Creating an XML file

XML (Extensible Markup Language) is an extensible markup language. It is used to create databases and web pages and to exchange information between programs, it is used in technologies such as Ajax and SOAP, and it is also the basis of the XAML language, which you can meet when working with WPF.

To create an xml file, we just need to enter

XML file structure

Any XML file begins with a declaration.

Declaration

The XML declaration includes:

Version (version) - the version number of the XML language, 1.0 and 1.1

If you are using version 1.0, then the declaration line can be omitted; if you are using version 1.1, then this line must be specified.

Encoding (encoding) - specifies the encoding of the file

With this entry, you do not set the encoding for the physical file! But you just make it clear to the program that will process this file, in what encoding, the data inside the file is contained. In doing so, you must ensure that the encoding of the document and the encoding specified in the declaration line match.
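One way to keep the declared encoding and the real encoding in step is to let a library write both. A minimal Python sketch, assuming Python 3.8 or later (the xml_declaration parameter of ElementTree.tostring was added in that version); the greeting element is hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.Element("greeting")
root.text = "Привет"  # non-ASCII text, so the encoding matters

# The declaration names the encoding, and the bytes are produced in that
# same encoding, so the two cannot drift apart.
data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
print(data)

# Reading the bytes back recovers the original text.
restored = ET.fromstring(data).text
print(restored)
```

If the bytes were saved in one encoding while the declaration named another, a parser would either fail or silently produce the wrong characters.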

To set the encoding of the document itself, you can use, for example, the Notepad++ program.

xml file elements

The XML language is made up of elements.

An element is a string that contains the start and end tags, as well as the data placed between them.

  • meaning- element

One file can contain any number of elements.

tags

As mentioned earlier, an element is made up of tags.

  • - tag

Tag names can start with a letter, underscore, or colon, followed by letters, digits, hyphens, underscores, periods, or colons. Spaces are not allowed in tag names.

Tags are: paired and single.

  • - double
  • - single

A single tag can be used when there is no content between the start and end tags: instead of writing a paired tag with nothing inside, you write a single tag, which can be replaced with a paired tag at any time. A single tag must be closed with a slash before the closing angle bracket!

When building an XML document, it is very important to observe the correct nesting of tags:

  • Wrong
  • Right

XML is a case-sensitive language

  • error!
  • Right
  • Right
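The nesting and case rules are easy to check with any parser. A minimal Python sketch; the is_well_formed helper and the sample strings are hypothetical illustrations, not the examples from the original article.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if an XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples for the nesting and case rules.
print(is_well_formed("<a><b></a></b>"))   # tags overlap
print(is_well_formed("<a><b></b></a>"))   # tags nest properly
print(is_well_formed("<Name></name>"))    # case differs
print(is_well_formed("<name></name>"))    # case matches
print(is_well_formed("<empty/>"))         # single tag, closed with a slash
```

Overlapping tags and tags that differ only in case are both rejected, because the parser matches each end tag against the most recently opened start tag, character for character.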

Comments

Comments in an XML document use the same syntax as in HTML.

After declaring the declaration and getting acquainted with the main components of the XML language, we proceed to filling our file.

Root element

The root element is always listed first, there can only be one root element per XML document!

In this example, two root elements are created

  • wrong
  • Right

In the second example, one root element "Root" is created, which contains a regular "Admin" element.
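The single-root rule can be checked the same way. A minimal Python sketch that reuses the Root and Admin names from the text; the text value inside Admin is a placeholder.

```python
import xml.etree.ElementTree as ET

# One root element that wraps everything else: accepted.
single = ET.fromstring("<Root><Admin>Anton</Admin></Root>")
print(single.tag, [child.tag for child in single])

# Two top-level elements: rejected by the parser.
try:
    ET.fromstring("<Root></Root><Admin></Admin>")
    two_roots_accepted = True
except ET.ParseError as err:
    two_roots_accepted = False
    print("Rejected:", err)
```

Wrapping the second element inside the first, as in the accepted sample, is the usual fix.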

After declaring the root element, you can add any number of elements to your document. All added elements must be placed between the tags of the root element.

"library" is the root element containing the book element, which contains nested elements: title, author, year.

xml file attributes

Attributes are set in the opening tag of any element.

Syntax: name = "value" enclosed in double quotes.

There can be any number of attributes, but they must not be repeated, and their names must not contain spaces.

  • wrong
  • wrong

Error, there are two duplicate "id" attributes, and there is a space between id and number.

  • Right
  • Right
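A short Python sketch of how attributes look from the program's side; the user element and its attribute values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical element; attribute names and values are placeholders.
user = ET.fromstring('<user id="1" role="admin">Anton</user>')
print(user.attrib)      # all attributes as a dict
print(user.get("id"))   # a single attribute value (always a string)

try:
    ET.fromstring('<user id="1" id="2"/>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False  # duplicate attribute names are rejected
print(duplicate_accepted)
```

Attribute values always arrive as strings, so a number such as the id has to be converted explicitly if it is needed as a number.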

After the XML document is created, it must be saved; do not forget to change the file extension to .xml.

  • filename.xml

XML was created to describe data, with the focus on what the data is.

HTML was created to display data, with the focus on how the data looks.

What is XML?

  • XML stands for Extensible Markup Language
  • XML is a markup language, similar to HTML
  • XML was created to describe data
  • XML tags are not predefined. You can use your own tags
  • XML uses a Document Type Definition (DTD) or an XML Schema to describe data
  • XML is a W3C Recommendation

The main difference between XML and HTML

XML was designed for data transfer.

XML is not a replacement for HTML.

XML and HTML were developed with different purposes:

  • XML was created to describe data and the focus is on what data is being passed.
  • HTML was designed to display data with the focus on displaying data
  • So HTML is more about displaying information, while XML is more about describing information.

XML does nothing

XML was not created to perform any action.

It may not be easy to understand, but XML doesn't do anything. This markup language was created to structure, store and communicate information. The following example is a note from Anton to Ira, shown here as the fields it carries in XML:

  • Recipient: Ira
  • Sender: Anton
  • Header: Reminder
  • Content: Don't forget to meet this week!

As you can see, the XML language is very concise.

The note consists of a header and content. It also contains the sender (the tag that says who the letter is from) and the recipient (the tag that says who it is addressed to). But this letter does nothing. This is pure information wrapped in tags. In order to send, receive, and display this information, someone has to write a program.
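To make this concrete, here is a minimal Python sketch that builds such a note and then reads it back with the standard xml.etree.ElementTree module. The tag names note, to, from, heading and body are assumptions chosen for illustration; only the field values come from the text.

```python
import xml.etree.ElementTree as ET

# Tag names are assumed for this sketch; the values come from the note.
note = ET.Element("note")
ET.SubElement(note, "to").text = "Ira"
ET.SubElement(note, "from").text = "Anton"
ET.SubElement(note, "heading").text = "Reminder"
ET.SubElement(note, "body").text = "Don't forget to meet this week!"

# The XML itself is only text; it takes a program to do anything with it.
xml_text = ET.tostring(note, encoding="unicode")
print(xml_text)

# A program on the receiving side parses the text and picks out the fields.
received = ET.fromstring(xml_text)
sender = received.findtext("from")
message = received.findtext("body")
print(sender + " writes: " + message)
```

The document is plain data. The two print calls are the "program" that displays it.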

XML is a free extensible markup language

XML tags are not predefined. You can define your own tags.

The tags and document structure in HTML are predefined. The creator of an HTML document can only use the tags defined by the standards.

XML allows the author of an XML document to define their own tags and document structure. The tags shown in the example are not defined by the XML standard. These tags are introduced by the author of the document.

XML is the complement of HTML

XML is not a replacement for HTML.

It is important to understand that XML is not a replacement for HTML. In the future, web developers will use XML to describe data, while HTML will be used to format and display that data.

My best definition of XML is this: XML is a cross-platform, software- and hardware-independent communication tool.

Note: cross-platform means suitable for any operating system and any hardware.

As you may know, there are operating systems other than the familiar Windows, such as Linux, Mac and others.

As for the hardware, it can be ordinary PCs, laptops, PDAs, etc.

XML in the future of web development

XML will be used everywhere.

We have been witnessing the development of XML since its inception. It was amazing to see how quickly the XML standard was developed and how quickly a large number of software vendors adopted the standard. We strongly believe that XML will be as important to the future of the Internet as HTML, which is the backbone of the Internet, and that XML will be the most widely used tool for all data manipulation and communication.

 