Joining data in R using Dplyr

Working with data, Joining is the common operation.

Joining means combine i.e combine the data from two or more than two different sources on the basis of some conditions.

For performing such type of operation in R dplyr is the best option for doing so.

During this post, we will these key points.

  • Types of Joins in R
  • Syntax
  • Joining on DataFrame
  • Joining on tables

Types of Joins

There are six types of Joins in R :

  1. Inner Join (inner_join)
  2. Left Join (left_join)
  3. Right Join (right_join)
  4. Full Join (full_join)
  5. Semi Join (semi_join)
  6. Anti Join (anti_join)

Syntax




# Syntax of Joining in R

Join_type(x,y,by=condition)

/**
  * x: dataframe1/table1
  * y: dataframe2/table2
*/

Joins are basically applied on tables and in case of R files are to be considered as tables.

For a better understanding of joins, we are taking two files 

  1. Product_category_name.csv (product_category_name, product_category_name_english)
  2. product_dataset.csv(product_id,product_category_name,product_name_lenght, product_description_lenght, product_photos_qty, product_weight_g, product_length_cm, product_height_cm, product_width_cm)

You can download these files for your practice my GitHub account

As we know during applying joins on two different datasets or tables we need a common field on the basis of that we can be able to apply a join.

So, in this case, both files contain product_category_name so on the basis of that we can apply to join.

For using a CSV file as a table we are using command 




# read csv file

read.csv(file_name)

So for applying joins on these two files at first, we have to consider these two files as two tables using read.csv() command. As shown below.




# load data


table1 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/product_category_name.csv") # file location of Product_category_name.csv
table2 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/products_dataset.csv")   # file location of product_dataset.csv 

Inner Joins

Syntax :



# inner join syntax


inner_join(x,y,by='condition')

Examples :

Select all columns

library(dplyr)
table1 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/product_category_name.csv") # file location of Product_category_name.csv
table2 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/products_dataset.csv")   # file location of product_dataset.csv 

appliedInnerJoin <- inner_join(table1,table2,by='product_category_name')

print(head(appliedInnerJoin,n =20))

OutPut:

Select specified columns

If we don’t want to extract all columns then we can select specified columns using select command.

Suppose In this after applying inner join on table1 and table2 we don’t want to extract all columns of these tables but also we want to extract only
product_id.product_category_name_english.

Then we can use select command of dplyr package (for more detail click here) and extract specified columns that we want to extract

Example


library(dplyr)
table1 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/product_category_name.csv") # file location of Product_category_name.csv
table2 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/products_dataset.csv")   # file location of product_dataset.csv 

appliedInnerJoin <- inner_join(table1,table2,by='product_category_name')
#select specified column

specified_columns <- select(appliedInnerJoin,product_id,product_category_name_english)
print(specified_columns)

OutPut:

left join

Syntax :


left_join(x,y,by='condition')
 

Example:


library(dplyr)
table1 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/product_category_name.csv") # file location of Product_category_name.csv
table2 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/products_dataset.csv")   # file location of product_dataset.csv 
# left join
appliedLeftJoin <- left_join(table1,table2,by='product_category_name')<img src="http://www.krdheeraj.info/wp-content/uploads/2020/01/leftJoin.png" alt="" width="1345" height="579" class="alignnone size-full wp-image-496" />

print(head(appliedLeftJoin,n=10))

OutPut:

right join

Syntax :


right_join(x,y,by='condition')
 

EXample


library(dplyr)
table1 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/product_category_name.csv") # file location of Product_category_name.csv
table2 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/products_dataset.csv")   # file location of product_dataset.csv 
# left join
appliedRightJoin <- right_join(table1,table2,by='product_category_name')

print(head(appliedRightJoin,n=10))

Output:

Leave a Reply

Your email address will not be published. Required fields are marked *