Saturday, November 28, 2020

What is Data Reshaping in R? Provide an example.

In R, data reshaping means changing how data is organized into rows and columns. Most data processing in R takes a data frame as input. Extracting data from the rows and columns of a data frame is easy, but problems arise when we need the data in a layout different from the one in which we received it. R provides many functions to merge, split, and convert rows to columns and vice versa in a data frame.

Data Reshaping in R

Transpose a Matrix

R computes the transpose of a matrix or a data frame with the t() function. t() takes a matrix or data frame as input and returns its transpose. The result is always a matrix, even when the input is a data frame. The syntax of the t() function is as follows:

  t(x)  # x is a matrix or a data frame

Let's see an example to understand how this function is used.

Example

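  # Build a 3x3 matrix from the values 4 to 12, filled row by row
  a <- matrix(4:12, nrow = 3, byrow = TRUE)
  print(a)
  cat("Matrix after transpose\n")
  # Rows of a become columns of b
  b <- t(a)
  print(b)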
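
One detail worth checking is that t() coerces a data frame to a matrix before transposing it. The following is a minimal sketch of that behavior; the data frame scores and its row and column names are made up for illustration and are not part of the example above.

```r
# Hypothetical data frame used only for illustration
scores <- data.frame(math = c(90, 80), art = c(70, 60),
                     row.names = c("Asha", "Ravi"))

# t() converts the data frame to a matrix and then transposes it
transposed <- t(scores)
is.matrix(transposed)      # TRUE

# Convert back when a data frame is needed;
# the former column names (math, art) are now the row names
transposed_df <- as.data.frame(transposed)
print(transposed_df)
```

Because a matrix holds a single type, transposing a data frame with mixed numeric and character columns would turn every value into a character string.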

Joining rows and columns in Data Frame

R allows us to join multiple vectors, matrices, or data frames column by column with the cbind() function, and row by row with the rbind() function. Note that cbind() called on plain vectors returns a matrix, not a data frame, so the result must be converted with as.data.frame() when a data frame is needed. In some situations, we need to combine data frames to access information that depends on both of them. The syntax of the cbind() and rbind() functions is as follows:

  cbind(vector1, vector2, ..., vectorN)
  rbind(dataframe1, dataframe2, ..., dataframeN)

Let's see an example to understand how the cbind() and rbind() functions are used.

Example

  # Creating vector objects
  Name <- c("Shubham Rastogi", "Nishka Jain", "Gunjan Garg", "Sumit Chaudhary")
  Address <- c("Moradabad", "Etah", "Sambhal", "Khurja")
  Marks <- c(255, 355, 455, 655)

  # Combining the vectors column by column. cbind() on plain vectors returns
  # a character matrix, so it is converted to a data frame explicitly.
  info <- as.data.frame(cbind(Name, Address, Marks), stringsAsFactors = FALSE)

  # Printing the data frame
  print(info)

  # Creating another data frame with the same columns
  new.stuinfo <- data.frame(
    Name = c("Deepmala", "Arun"),
    Address = c("Khurja", "Moradabad"),
    Marks = c("755", "855"),  # character, to match the Marks column of info
    stringsAsFactors = FALSE
  )

  # Printing a header
  cat("# # # The Second data frame\n")

  # Printing the data frame
  print(new.stuinfo)

  # Combining rows from both data frames
  all.info <- rbind(info, new.stuinfo)

  # Printing a header
  cat("# # # The combined data frame\n")

  # Printing the result
  print(all.info)
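
The difference between a matrix built by cbind() and a real data frame is easy to miss, so here is a minimal sketch of it. The vectors city and score and the data frame extra are hypothetical names chosen for this illustration only.

```r
# Illustrative vectors (not part of the example above)
city  <- c("Etah", "Khurja")
score <- c(355, 655)

# cbind() on plain vectors gives a character matrix: the numbers become strings
as_matrix <- cbind(city, score)
is.matrix(as_matrix)         # TRUE
is.character(as_matrix)      # TRUE

# data.frame() keeps each column's own type
as_frame <- data.frame(city, score, stringsAsFactors = FALSE)
is.numeric(as_frame$score)   # TRUE

# rbind() on data frames matches columns by name,
# so both inputs need the same column names
extra    <- data.frame(city = "Sambhal", score = 455, stringsAsFactors = FALSE)
combined <- rbind(as_frame, extra)
nrow(combined)               # 3
```

If numeric columns should stay numeric, building the data frame with data.frame() directly is usually the simpler choice.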

Merging Data Frame

R provides the merge() function to merge two data frames. By default, merge() joins on the columns that have the same name in both data frames. The key columns can also be named explicitly with the by, by.x, and by.y arguments, in which case their names may differ between the two data frames.

Let's take an example using the data on diabetes in Pima Indian women from the MASS package, which provides it as the two data frames Pima.te and Pima.tr. We will merge them on blood pressure (bp) and body mass index (bmi). Records whose values for both variables match in the two data sets are combined into a single data frame.

Example

  library(MASS)
  # Join the two Pima data sets on blood pressure and body mass index
  merging_pima <- merge(x = Pima.te, y = Pima.tr,
    by.x = c("bp", "bmi"),
    by.y = c("bp", "bmi")
  )
  print(merging_pima)
  nrow(merging_pima)
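
By default merge() keeps only the rows whose keys appear in both inputs. The sketch below shows how the all, all.x, and all.y arguments change that. The data frames students and marks and their contents are made up for illustration.

```r
# Two small hypothetical data frames sharing an "id" column
students <- data.frame(id = c(1, 2, 3), name = c("A", "B", "C"))
marks    <- data.frame(id = c(2, 3, 4), marks = c(355, 455, 655))

# Default: only ids present in both data frames (2 and 3)
inner <- merge(students, marks, by = "id")

# all.x = TRUE: keep every student; missing marks become NA
left <- merge(students, marks, by = "id", all.x = TRUE)

# all = TRUE: keep every id from either data frame
full <- merge(students, marks, by = "id", all = TRUE)

nrow(inner)  # 2
nrow(left)   # 3
nrow(full)   # 4
```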

Melting and Casting

A common reshaping task in R is changing the shape of the data in several steps to reach the desired layout. For this purpose, the reshape2 package provides the melt() and dcast() functions. To understand the process, consider a dataset called ships, which is available in the MASS package.

Example

  library(MASS)
  print(ships)

Melt the Data

Now we will reorganize the above data by melting it. Melting converts columns into multiple rows: each melted column is turned into rows that pair the column name with its value. We will melt all the columns of the above dataset except type and year.

Example

  library(MASS)
  library(reshape2)
  # Keep type and year as identifiers; melt every other column into rows
  molten_ships <- melt(ships, id = c("type","year"))
  print(molten_ships)
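
The effect of melting is easier to see on a very small table. The following is a minimal sketch that assumes the reshape2 package is installed; the data frame wide and its columns are hypothetical.

```r
library(reshape2)  # assumes the reshape2 package is installed

# Hypothetical wide data: one row per student, one column per subject
wide <- data.frame(student = c("A", "B"),
                   math = c(90, 80),
                   art  = c(70, 60))

# Each non-id column is turned into (variable, value) rows
long <- melt(wide, id.vars = "student")
print(long)

# 2 students x 2 subjects give 4 rows
nrow(long)    # 4
names(long)   # "student" "variable" "value"
```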

Casting of Molten Data

After melting the data, we can cast it into a new form in which the values are aggregated for each type of ship in each year. For this purpose, the reshape2 package provides the dcast() function.

Let's cast our molten data.

Example

  library(MASS)
  library(reshape2)
  # Melting the data
  molten.ships <- melt(ships, id = c("type","year"))
  print("Molten Data")
  print(molten.ships)
  # Casting the data: one row per type and year, one column per melted
  # variable, with the values summed within each group
  recasted.ship <- dcast(molten.ships, type + year ~ variable, fun.aggregate = sum)
  print("Cast Data")
  print(recasted.ship)
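
The role of the aggregation function in dcast() can be sketched on a small table where one group has more than one value. This assumes the reshape2 package is installed; the data frame long and its column names are made up for illustration.

```r
library(reshape2)  # assumes the reshape2 package is installed

# Hypothetical long data; student A has two "math" rows
long <- data.frame(student = c("A", "A", "A", "B"),
                   subject = c("math", "math", "art", "math"),
                   value   = c(90, 10, 70, 80))

# One row per student, one column per subject.
# Rows that fall in the same cell are combined with sum().
wide <- dcast(long, student ~ subject, value.var = "value",
              fun.aggregate = sum)
print(wide)

wide$math[wide$student == "A"]   # 100, the sum of 90 and 10
```

Without an aggregation function, dcast() would fall back to counting the rows in each cell when a cell holds more than one value, so naming the function explicitly keeps the intent clear.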
