Like HTML, XML is a markup language; the name stands for Extensible Markup Language. It was developed by the World Wide Web Consortium (W3C) to define a syntax for encoding documents that both humans and machines can read. An XML file consists of markup tags, but the tags play a different role than in HTML: in HTML, the tags describe the structure of the page, whereas in XML they describe the meaning of the data contained in the file. In R, we can read XML files by installing the "XML" package into the R environment with the familiar install.packages() command.
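As a minimal sketch of the installation step, the snippet below installs the package only when it is missing and then attaches it. The CRAN mirror URL is an assumption made for illustration; any mirror should work.

```r
# Install the XML package from CRAN only if it is missing, then attach it.
# The mirror URL is an assumption; any CRAN mirror can be used instead.
if (!requireNamespace("XML", quietly = TRUE)) {
  install.packages("XML", repos = "https://cloud.r-project.org")
}
library("XML")
```

Checking with requireNamespace() first avoids reinstalling the package every time the script runs.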
Creating XML File
We will create an XML file by saving the following data in a text file with the .xml extension. Because XML tags describe the meaning of the data, the tag names themselves explain what each value represents.
Example: xml_data.xml
<records>
  <employee_info>
    <id>1</id>
    <name>Shubham</name>
    <salary>623</salary>
    <date>1/1/2012</date>
    <dept>IT</dept>
  </employee_info>
  <employee_info>
    <id>2</id>
    <name>Nishka</name>
    <salary>552</salary>
    <date>1/1/2012</date>
    <dept>IT</dept>
  </employee_info>
  <employee_info>
    <id>1</id>
    <name>Gunjan</name>
    <salary>669</salary>
    <date>1/1/2012</date>
    <dept>IT</dept>
  </employee_info>
  <employee_info>
    <id>1</id>
    <name>Sumit</name>
    <salary>825</salary>
    <date>1/1/2012</date>
    <dept>IT</dept>
  </employee_info>
  <employee_info>
    <id>1</id>
    <name>Arpita</name>
    <salary>762</salary>
    <date>1/1/2012</date>
    <dept>IT</dept>
  </employee_info>
  <employee_info>
    <id>1</id>
    <name>Vaishali</name>
    <salary>882</salary>
    <date>1/1/2012</date>
    <dept>IT</dept>
  </employee_info>
  <employee_info>
    <id>1</id>
    <name>Anisha</name>
    <salary>783</salary>
    <date>1/1/2012</date>
    <dept>IT</dept>
  </employee_info>
  <employee_info>
    <id>1</id>
    <name>Ginni</name>
    <salary>964</salary>
    <date>1/1/2012</date>
    <dept>IT</dept>
  </employee_info>
</records>
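Instead of typing the file by hand, the same structure can also be generated from R. The sketch below is one possible approach using newXMLDoc(), newXMLNode() and saveXML() from the XML package. The two-row `employees` data frame and the output file name `xml_sample.xml` are hypothetical, chosen only to keep the example short.

```r
library("XML")

# Hypothetical two-row sample; the file shown above contains eight records.
employees <- data.frame(
  id     = c(1, 2),
  name   = c("Shubham", "Nishka"),
  salary = c(623, 552),
  date   = c("1/1/2012", "1/1/2012"),
  dept   = c("IT", "IT"),
  stringsAsFactors = FALSE
)

# Create an empty document and attach <records> as its root node.
doc  <- newXMLDoc()
root <- newXMLNode("records", doc = doc)

# Add one <employee_info> node per row, with one child node per column.
for (i in seq_len(nrow(employees))) {
  record <- newXMLNode("employee_info", parent = root)
  for (field in names(employees)) {
    newXMLNode(field, as.character(employees[i, field]), parent = record)
  }
}

# Write the document to a temporary location (an assumed path).
out_file <- file.path(tempdir(), "xml_sample.xml")
saveXML(doc, file = out_file)
```

Writing to tempdir() keeps the sketch from overwriting an existing xml_data.xml in the working directory.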
Reading XML File
In R, we can read an XML file with the xmlParse() function. It returns the parsed document as an R object, which the xmlToList() function can then convert into a list. To use these functions, we first need to load the XML package with the library() function. The examples also load the methods package, which the XML package depends on.
Let's see an example to understand how xmlParse() works by reading our xml_data.xml file.
Example: Reading XML data in the form of a list.
# Loading the package required to read XML files.
library("XML")

# Also loading the other required package.
library("methods")

# Giving the input file name to the function.
result <- xmlParse(file = "xml_data.xml")

# Converting the parsed document into a list and printing it.
xml_data <- xmlToList(result)
print(xml_data)
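To make the shape of the resulting list more concrete, here is a small sketch that parses an inline two-record sample instead of the file. The variable `xml_text` and the use of `asText = TRUE` are assumptions made so that the snippet runs without xml_data.xml on disk.

```r
library("XML")

# Inline sample standing in for xml_data.xml (an assumption for this sketch).
xml_text <- "<records>
  <employee_info><id>1</id><name>Shubham</name><salary>623</salary></employee_info>
  <employee_info><id>2</id><name>Nishka</name><salary>552</salary></employee_info>
</records>"

# asText = TRUE tells xmlParse() that the argument is XML content, not a path.
doc <- xmlParse(xml_text, asText = TRUE)

# xmlToList() gives one list element per <employee_info> record.
records <- xmlToList(doc)

# Individual fields are reached with ordinary list indexing.
first_name <- records[[1]]$name
all_names  <- vapply(records, function(rec) rec$name, character(1))
print(all_names)
```

Each record becomes a nested list whose elements are named after the child tags, so the usual `[[` and `$` operators apply.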
Example: Getting the number of nodes present in the XML file.
# Loading the package required to read XML files.
library("XML")

# Also loading the other required package.
library("methods")

# Giving the input file name to the function.
result <- xmlParse(file = "xml_data.xml")

# Converting the data into a list.
xml_data <- xmlToList(result)

# Printing the data.
print(xml_data)

# Extracting the root node from the XML file.
root_node <- xmlRoot(result)

# Finding the number of nodes in the root.
root_size <- xmlSize(root_node)

# Printing the result.
print(root_size)
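The same counting idea can be applied one level down. The sketch below, which again uses an assumed inline sample rather than the file, counts the records under the root and then the fields inside each record with xmlSApply().

```r
library("XML")

# Inline sample standing in for xml_data.xml (an assumption for this sketch).
xml_text <- "<records>
  <employee_info><id>1</id><name>Shubham</name><salary>623</salary></employee_info>
  <employee_info><id>2</id><name>Nishka</name><salary>552</salary></employee_info>
</records>"

doc  <- xmlParse(xml_text, asText = TRUE)
root <- xmlRoot(doc)

# Number of child nodes directly under <records>.
n_records <- xmlSize(root)

# Number of child nodes inside each <employee_info> record.
fields_per_record <- xmlSApply(root, xmlSize)

print(n_records)
print(fields_per_record)
```

Comparing the per-record counts is a quick way to spot records with missing fields before converting the data.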
Example: Getting details of the first node in the XML file.
# Loading the package required to read XML files.
library("XML")

# Also loading the other required package.
library("methods")

# Giving the input file name to the function.
result <- xmlParse(file = "xml_data.xml")

# Extracting the root node from the XML file.
root_node <- xmlRoot(result)

# Printing the first node.
print(root_node[1])
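Single and double brackets behave differently on a node, much as they do on an ordinary R list. The following sketch illustrates the difference on an assumed inline sample; the variable names are hypothetical.

```r
library("XML")

# Inline sample standing in for xml_data.xml (an assumption for this sketch).
xml_text <- "<records>
  <employee_info><id>1</id><name>Shubham</name><salary>623</salary></employee_info>
  <employee_info><id>2</id><name>Nishka</name><salary>552</salary></employee_info>
</records>"

root <- xmlRoot(xmlParse(xml_text, asText = TRUE))

# Single brackets return a list of nodes, here of length one.
first_as_list <- root[1]

# Double brackets return the node itself.
first_node <- root[[1]]

print(length(first_as_list))
print(xmlName(first_node))
```

Functions such as xmlName() and xmlValue() expect a node, so `[[` is usually the more convenient form when working with a single record.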
Example: Getting details of different elements of a node.
# Loading the package required to read XML files.
library("XML")

# Also loading the other required package.
library("methods")

# Giving the input file name to the function.
result <- xmlParse(file = "xml_data.xml")

# Extracting the root node from the XML file.
root_node <- xmlRoot(result)

# Getting the first element of the first node.
print(root_node[[1]][[1]])

# Getting the fourth element of the first node.
print(root_node[[1]][[4]])

# Getting the third element of the third node.
print(root_node[[3]][[3]])
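Positional indexing returns whole nodes, tags included. When only the text inside a tag is needed, xmlValue() can be combined with indexing by child name, as in this sketch on an assumed inline sample.

```r
library("XML")

# Inline sample standing in for xml_data.xml (an assumption for this sketch).
xml_text <- "<records>
  <employee_info><id>1</id><name>Shubham</name><salary>623</salary></employee_info>
  <employee_info><id>2</id><name>Nishka</name><salary>552</salary></employee_info>
</records>"

root  <- xmlRoot(xmlParse(xml_text, asText = TRUE))
first <- root[[1]]

# Child nodes can be selected by tag name as well as by position.
name_value <- xmlValue(first[["name"]])

# xmlValue() always returns text, so numbers need an explicit conversion.
salary_value <- as.numeric(xmlValue(first[["salary"]]))

print(name_value)
print(salary_value)
```

Selecting by tag name keeps the code working even if the order of the fields in the file changes.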
How to convert XML data into a data frame
Large XML files are hard to handle effectively as nested lists. For this reason, we usually read the data in the XML file into a data frame, which a data analyst can then process with R's usual tools. The XML package provides the xmlToDataFrame() function to extract the information in the form of a data frame.
Let's see an example to understand how this function is used:
Example
# Loading the package required to read XML files.
library("XML")

# Also loading the other required package.
library("methods")

# Giving the input file name to the function xmlToDataFrame.
data_frame <- xmlToDataFrame("xml_data.xml")

# Printing the result.
print(data_frame)
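xmlToDataFrame() reads every value as text, so a typical next step is to convert the columns to proper types before analysis. The sketch below shows one way to do this on an assumed inline sample. The month/day/year reading of the date column and the salary threshold of 600 are assumptions for illustration.

```r
library("XML")

# Inline sample standing in for xml_data.xml (an assumption for this sketch).
xml_text <- "<records>
  <employee_info><id>1</id><name>Shubham</name><salary>623</salary><date>1/1/2012</date></employee_info>
  <employee_info><id>2</id><name>Nishka</name><salary>552</salary><date>1/1/2012</date></employee_info>
</records>"

doc <- xmlParse(xml_text, asText = TRUE)

# Keep text columns as character vectors rather than factors.
employees <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Convert the text columns to suitable types.
employees$id     <- as.integer(employees$id)
employees$salary <- as.numeric(employees$salary)
# Assumes the dates are written as month/day/year.
employees$date   <- as.Date(employees$date, format = "%m/%d/%Y")

# Ordinary data frame operations now work, for example filtering by salary.
high_paid <- employees[employees$salary > 600, ]
print(high_paid)
```

Passing stringsAsFactors = FALSE explicitly keeps the behaviour the same across R versions, since the default for factors changed in R 4.0.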