Saturday, November 28, 2020

What is XML File in R?

Like HTML, XML is a markup language; the name stands for Extensible Markup Language. It was developed by the World Wide Web Consortium (W3C) to define a syntax for encoding documents that both humans and machines can read. An XML file contains markup tags, but they play a different role than in HTML: HTML tags describe the structure of a page, while XML tags describe the meaning of the data contained in the file. In R, we can read XML files after installing the "XML" package into the R environment with the familiar install.packages() command.

    install.packages("XML")
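In a script that may run where the package is already present, the install step can be guarded so that it only runs when needed. The following is a minimal sketch, assuming a CRAN mirror is already configured for the session; requireNamespace() is base R and returns FALSE, rather than raising an error, when the package is missing.

```r
# Install the XML package only if it is not already available.
# Assumption: a CRAN mirror is configured, so install.packages() can run
# without prompting.
if (!requireNamespace("XML", quietly = TRUE)) {
  install.packages("XML")
}

# Attach the package; the name is case-sensitive ("XML", not "xml").
library("XML")
```

The guard avoids re-downloading the package on every run of the script.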



Creating XML File

We create an XML file by saving the data below in a plain-text file with the .xml extension. Because XML tags describe the meaning of the data, each value is labeled by the tag that encloses it: for example, the text inside the name tag is an employee's name.

Example: xml_data.xml

    <records>
      <employee_info>
        <id>1</id>
        <name>Shubham</name>
        <salary>623</salary>
        <date>1/1/2012</date>
        <dept>IT</dept>
      </employee_info>

      <employee_info>
        <id>2</id>
        <name>Nishka</name>
        <salary>552</salary>
        <date>1/1/2012</date>
        <dept>IT</dept>
      </employee_info>

      <employee_info>
        <id>3</id>
        <name>Gunjan</name>
        <salary>669</salary>
        <date>1/1/2012</date>
        <dept>IT</dept>
      </employee_info>

      <employee_info>
        <id>4</id>
        <name>Sumit</name>
        <salary>825</salary>
        <date>1/1/2012</date>
        <dept>IT</dept>
      </employee_info>

      <employee_info>
        <id>5</id>
        <name>Arpita</name>
        <salary>762</salary>
        <date>1/1/2012</date>
        <dept>IT</dept>
      </employee_info>

      <employee_info>
        <id>6</id>
        <name>Vaishali</name>
        <salary>882</salary>
        <date>1/1/2012</date>
        <dept>IT</dept>
      </employee_info>

      <employee_info>
        <id>7</id>
        <name>Anisha</name>
        <salary>783</salary>
        <date>1/1/2012</date>
        <dept>IT</dept>
      </employee_info>

      <employee_info>
        <id>8</id>
        <name>Ginni</name>
        <salary>964</salary>
        <date>1/1/2012</date>
        <dept>IT</dept>
      </employee_info>
    </records>
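The file can also be written from R itself rather than typed into an editor. The sketch below is one way to do that with base R's writeLines(). It writes only the first two records to keep the example short, and it uses a temporary path so that nothing in the working directory is overwritten; xml_lines and xml_path are names chosen here for illustration.

```r
# A shortened copy of the data above (first two records only).
xml_lines <- c(
  "<records>",
  "  <employee_info>",
  "    <id>1</id>",
  "    <name>Shubham</name>",
  "    <salary>623</salary>",
  "    <date>1/1/2012</date>",
  "    <dept>IT</dept>",
  "  </employee_info>",
  "  <employee_info>",
  "    <id>2</id>",
  "    <name>Nishka</name>",
  "    <salary>552</salary>",
  "    <date>1/1/2012</date>",
  "    <dept>IT</dept>",
  "  </employee_info>",
  "</records>"
)

# Illustrative path inside the session's temporary directory.
xml_path <- file.path(tempdir(), "xml_data.xml")

# Save the text with the .xml extension.
writeLines(xml_lines, xml_path)

# Confirm that the file exists and show what was written.
print(file.exists(xml_path))
cat(readLines(xml_path), sep = "\n")
```

To create the file used in the rest of this post, the same call can be pointed at "xml_data.xml" in the working directory with all eight records included.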

Reading XML File

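In R, we can read an XML file with the xmlParse() function. It returns a parsed document object rather than a list; the xmlToList() function then converts that document into an R list. To use these functions, we first need to load the XML package with the library() function (the package name is case-sensitive). The examples also load the methods package; recent versions of R normally attach it by default, so the explicit call is harmless but often optional.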

Let's see an example of how xmlParse() works by reading our xml_data.xml file.


Example: Reading xml data in the form of a list.

    # Loading the package required to read XML files.
    library("XML")

    # Also loading the other required package.
    library("methods")

    # Parsing the input file into an XML document object.
    result <- xmlParse(file = "xml_data.xml")

    # Converting the parsed document into a list and printing it.
    xml_data <- xmlToList(result)
    print(xml_data)
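To make the distinction between the parsed document and the list concrete, here is a small self-contained sketch. It parses a two-record excerpt of the data from a string, using the asText argument of xmlParse(), instead of reading xml_data.xml, so it runs without the file. The name sample_xml is introduced here for illustration only.

```r
library("XML")

# Two-record excerpt of xml_data.xml, built as a single string.
sample_xml <- paste0(
  "<records>",
  "<employee_info><id>1</id><name>Shubham</name><salary>623</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "<employee_info><id>2</id><name>Nishka</name><salary>552</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "</records>"
)

# xmlParse() returns a document object, not a list.
result <- xmlParse(sample_xml, asText = TRUE)
print(class(result))

# xmlToList() turns the document into a nested R list.
xml_data <- xmlToList(result)
print(is.list(xml_data))
print(length(xml_data))

# Each record holds its values as text, indexed by tag name.
first_record <- xml_data[[1]]
print(first_record[["name"]])
print(first_record[["salary"]])
```

Note that the values come back as character strings, so a field such as salary needs as.numeric() before any arithmetic.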


Example: Getting number of nodes present in xml file.

    # Loading the package required to read XML files.
    library("XML")

    # Also loading the other required package.
    library("methods")

    # Parsing the input file into an XML document object.
    result <- xmlParse(file = "xml_data.xml")

    # Converting the data into a list.
    xml_data <- xmlToList(result)

    # Printing the data.
    print(xml_data)

    # Extracting the root node from the XML document.
    root_node <- xmlRoot(result)

    # Finding the number of child nodes under the root.
    root_size <- xmlSize(root_node)

    # Printing the result.
    print(root_size)
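The same function also works one level down: applied to a single record, xmlSize() counts that record's fields, and xmlName() reports a node's tag name. A minimal sketch, parsing a two-record excerpt from a string (sample_xml is an illustrative name) rather than the full file:

```r
library("XML")

# Two-record excerpt of xml_data.xml, built as a single string.
sample_xml <- paste0(
  "<records>",
  "<employee_info><id>1</id><name>Shubham</name><salary>623</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "<employee_info><id>2</id><name>Nishka</name><salary>552</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "</records>"
)

doc <- xmlParse(sample_xml, asText = TRUE)
root_node <- xmlRoot(doc)

# Tag name of the root node.
print(xmlName(root_node))

# Number of records directly under the root.
print(xmlSize(root_node))

# Number of fields inside the first record.
print(xmlSize(root_node[[1]]))
```

With the full xml_data.xml file, the first count would be the number of employee_info records in that file.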


Example: Getting details of the first node in xml.

    # Loading the package required to read XML files.
    library("XML")

    # Also loading the other required package.
    library("methods")

    # Parsing the input file into an XML document object.
    result <- xmlParse(file = "xml_data.xml")

    # Extracting the root node from the XML document.
    root_node <- xmlRoot(result)

    # Printing the first child node of the root.
    print(root_node[1])
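Single and double brackets behave differently here, much as they do for ordinary R lists: root_node[1] gives a one-element container of nodes, while root_node[[1]] gives the node itself. A short sketch, assuming a two-record excerpt parsed from the illustrative sample_xml string instead of the file:

```r
library("XML")

# Two-record excerpt of xml_data.xml, built as a single string.
sample_xml <- paste0(
  "<records>",
  "<employee_info><id>1</id><name>Shubham</name><salary>623</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "<employee_info><id>2</id><name>Nishka</name><salary>552</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "</records>"
)

doc <- xmlParse(sample_xml, asText = TRUE)
root_node <- xmlRoot(doc)

# Single brackets: a list-like container holding the first record.
first_as_list <- root_node[1]
print(length(first_as_list))

# Double brackets: the first record node itself.
first_node <- root_node[[1]]
print(xmlName(first_node))

# xmlChildren() returns the node's children as a named list,
# so its names are the field tags of the record.
print(names(xmlChildren(first_node)))
```

The double-bracket form is the one to use when the node is passed on to functions such as xmlName() or xmlValue().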


Example: Getting details of different elements of a node.

    # Loading the package required to read XML files.
    library("XML")

    # Also loading the other required package.
    library("methods")

    # Parsing the input file into an XML document object.
    result <- xmlParse(file = "xml_data.xml")

    # Extracting the root node from the XML document.
    root_node <- xmlRoot(result)

    # Getting the first element (id) of the first node.
    print(root_node[[1]][[1]])

    # Getting the fourth element (date) of the first node.
    print(root_node[[1]][[4]])

    # Getting the third element (salary) of the third node.
    print(root_node[[3]][[3]])
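Printing a node shows it together with its tags. To get just the text, or to look a field up by tag name instead of by position, xmlValue() and name-based indexing can be used, and xpathSApply() collects one field across all records. The sketch below assumes a two-record excerpt parsed from a string; sample_xml and the other variable names are illustrative.

```r
library("XML")

# Two-record excerpt of xml_data.xml, built as a single string.
sample_xml <- paste0(
  "<records>",
  "<employee_info><id>1</id><name>Shubham</name><salary>623</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "<employee_info><id>2</id><name>Nishka</name><salary>552</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "</records>"
)

doc <- xmlParse(sample_xml, asText = TRUE)
root_node <- xmlRoot(doc)

# Look up a field of the first record by tag name and extract its text.
name_node <- root_node[[1]][["name"]]
print(xmlValue(name_node))

# The text is a character string; convert it before doing arithmetic.
salary <- as.numeric(xmlValue(root_node[[1]][["salary"]]))
print(salary)

# Collect the name field from every record with an XPath expression.
all_names <- xpathSApply(doc, "//employee_info/name", xmlValue)
print(all_names)
```

Indexing by tag name keeps the code working even if the order of fields inside a record changes.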


How to convert xml data into a data frame

Large XML files are hard to work with node by node. For analysis, it is usually more convenient to read the XML data into a data frame, which a data analyst can then process with R's usual tools. The XML package provides the xmlToDataFrame() function to extract the information in the form of a data frame.

Let's see an example to understand how this function is used and processed:

Example

    # Loading the package required to read XML files.
    library("XML")

    # Also loading the other required package.
    library("methods")

    # Giving the input file name to the function xmlToDataFrame.
    data_frame <- xmlToDataFrame("xml_data.xml")

    # Printing the result.
    print(data_frame)
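Unless column classes are supplied, xmlToDataFrame() generally returns the fields as text, so numeric columns need converting before analysis. A minimal sketch of that step, assuming a two-record excerpt parsed from a string (sample_xml, employees and high_paid are illustrative names) and an arbitrary salary threshold of 600:

```r
library("XML")

# Two-record excerpt of xml_data.xml, built as a single string.
sample_xml <- paste0(
  "<records>",
  "<employee_info><id>1</id><name>Shubham</name><salary>623</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "<employee_info><id>2</id><name>Nishka</name><salary>552</salary>",
  "<date>1/1/2012</date><dept>IT</dept></employee_info>",
  "</records>"
)

doc <- xmlParse(sample_xml, asText = TRUE)

# One row per employee_info record, one column per field.
employees <- xmlToDataFrame(doc, stringsAsFactors = FALSE)
str(employees)

# Convert the text columns that hold numbers.
employees$id <- as.integer(employees$id)
employees$salary <- as.numeric(employees$salary)

# Ordinary data frame operations now work as expected.
high_paid <- employees[employees$salary > 600, ]
print(high_paid)
print(mean(employees$salary))
```

The date column is left as text here because the source does not say whether 1/1/2012 is in month/day or day/month order.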

