dplyr: A Grammar of Data Manipulation in R

The dplyr package is used for data manipulation in R.

It provides a small, consistent set of functions (verbs) that can be combined to express most data manipulation tasks, which is why dplyr is called a grammar of data manipulation.

With dplyr, we can manipulate data and extract useful information quickly and easily.

Installing and loading the dplyr package in R

Installing dplyr

# install the dplyr package from CRAN
install.packages("dplyr")

Loading dplyr

# load the dplyr package
library(dplyr)

The dplyr package provides five main verbs for data manipulation, which we will discuss in this post.

The five dplyr verbs for data manipulation

Function      Description
select()      Returns a subset of the columns.
filter()      Returns a subset of the rows.
arrange()     Reorders rows according to one or more variables.
mutate()      Adds new columns computed from existing columns.
summarize()   Reduces each group to a single row by calculating an aggregate measure.
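To make the table above concrete, here is a minimal sketch that calls each verb once. The orders data frame and its column names are made up for illustration and are not the dataset used in the rest of this post.

```r
library(dplyr)

# Hypothetical toy data (an assumption, not the real order items dataset)
orders <- data.frame(
  order_id = c(1, 1, 2, 3),
  seller   = c("a", "b", "a", "b"),
  price    = c(10, 20, 30, 40),
  freight  = c(1, 2, 3, 4)
)

# select(): keep only the named columns
cols <- select(orders, order_id, price)

# filter(): keep only the rows that satisfy the condition
cheap <- filter(orders, price < 25)

# arrange(): reorder rows, here by price from highest to lowest
sorted <- arrange(orders, desc(price))

# mutate(): add a new column computed from existing columns
totals <- mutate(orders, total = price + freight)

# summarize(): reduce each group (one per seller) to a single row
by_seller <- summarize(group_by(orders, seller), mean_price = mean(price))
```

Each verb takes a data frame as its first argument and returns a new data frame, so the original orders data frame is left unchanged.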

To practice with dplyr we need a dataset.

For that purpose, I am loading a CSV file into a variable.

To read a CSV file we use the read.csv() function.

Syntax:

# read a CSV file into a data frame
read.csv("file location")

Example:

# read the order items dataset into a data frame
load_csv_data <- read.csv("/home/dheeraj/Downloads/brazilian-ecommerce/order_items_dataset.csv")

print(load_csv_data)

Output: [image: read csv file]

read.csv() always returns a data frame.

We can store the value returned by read.csv() in a variable, as we did with load_csv_data in the previous example. Accessing that variable then gives us the data.
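If you do not have the dataset at hand, the same behaviour can be checked with a throwaway file. This is a minimal sketch: the temporary file and its two columns are made up for illustration.

```r
# Write a small hypothetical data frame to a temporary CSV file
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(order_id = c(1, 2), price = c(9.5, 20)),
  tmp,
  row.names = FALSE
)

# Read it back and store the result in a variable
demo_data <- read.csv(tmp)

# Check what kind of object read.csv() returned
is_df <- is.data.frame(demo_data)
print(class(demo_data))
print(names(demo_data))
```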

Now we will perform some operations with the dplyr package, going through its functions one by one.

Before using the five dplyr verbs, we should inspect the data we are going to apply them to.
The dplyr package provides a separate function, glimpse(), for that purpose.


glimpse(): prints a compact overview of a data frame, i.e. the number of rows and columns plus the name, type, and first few values of each column.
Syntax:

# glimpse syntax
glimpse(data_frame)

Example:

# inspect the data frame loaded earlier
glimpse(load_csv_data)

Output: [image: glimpse]
The glimpse() function provides details about our data frame.
From it we can see that our data frame contains 112,650 rows (observations) and 7 columns (variables), as shown below.



Observations: 112,650
Variables: 7
$ order_id            
$ order_item_id       
$ product_id          
$ seller_id          
$ shipping_limit_date 
$ price               
$ freight_value
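The counts and column names that glimpse() reports can be cross-checked with base R. The sketch below uses a made-up toy data frame (toy_items is a hypothetical name), so the numbers differ from the real dataset.

```r
library(dplyr)

# Hypothetical toy data, not the real order items dataset
toy_items <- data.frame(
  order_id      = c(1, 2, 3),
  price         = c(10.5, 20, 7.25),
  freight_value = c(1.2, 3.4, 0.9)
)

# Prints the row count, the column count, and one line per column
glimpse(toy_items)

# The same facts, retrieved with base R functions
row_count    <- nrow(toy_items)
column_count <- ncol(toy_items)
column_names <- names(toy_items)
```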

Selecting columns using select()

select() returns a subset of the columns.
In other words, it drops every column we do not name from the result; the original data frame is not modified.
Syntax:

# select syntax
select(df, column1, column2)
# where df:      the data frame
#       column1: name of the first column to keep
#       column2: name of the second column to keep

Example:

We have already loaded our data into the load_csv_data variable, so now perform the following actions:

  • Extract order_id, product_id, price from load_csv_data.
  • Extract order_id, order_item_id, product_id, seller_id, shipping_limit_date, price from load_csv_data.

1. Extract order_id, product_id, price from load_csv_data.


# select example: extract three columns by name
select_Columns <- select(load_csv_data, order_id, product_id, price)

print(select_Columns)


2. Extract order_id, order_item_id, product_id, seller_id, shipping_limit_date, price from load_csv_data.

# select example: extract six columns by name
select_Columns <- select(load_csv_data, order_id, order_item_id, product_id, seller_id, shipping_limit_date, price)

print(select_Columns)


If the columns we want are contiguous in the data frame, i.e. no column in between is skipped, we do not have to list each one.

In the previous example, we wanted every column from order_id to price, and these columns are adjacent with none missing in between.

In this case, we can select them as a range with the : operator:



# select syntax for a range of adjacent columns
select(data_frame, first_column:last_column)

So we can also perform our previous operation in this way.

# select example: extract every column from order_id to price
select_Columns <- select(load_csv_data, order_id:price)

print(select_Columns)

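The three ways of choosing columns can be compared side by side on a small data frame. This is a sketch under assumptions: toy_items is made up, and it only mimics the column order of the real dataset.

```r
library(dplyr)

# Hypothetical toy data that mimics the column order of the real dataset
toy_items <- data.frame(
  order_id      = c(1, 2),
  product_id    = c("p1", "p2"),
  seller_id     = c("s1", "s2"),
  price         = c(10, 20),
  freight_value = c(1, 2)
)

# Pick columns one by one, by name
by_name <- select(toy_items, order_id, price)

# Pick a contiguous range of columns with the : operator
by_range <- select(toy_items, order_id:price)

# Drop a single column with a minus sign, keeping all the others
dropped <- select(toy_items, -freight_value)

print(names(by_name))
print(names(by_range))
print(names(dropped))
```

Here the range and the minus form keep the same four columns, because freight_value is the only column outside order_id:price.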

Selecting rows using filter()

filter() is used for filtering rows on the basis of one or more conditions.
Syntax:

# filter syntax
filter(data_frame, condition1, ..., conditionN)

Example:

Filter rows on the basis of a single condition

Extract the rows from the load_csv_data data frame whose order_id is 5.



# filter example: a single condition
selected_rows <- filter(load_csv_data, order_id == 5)

print(selected_rows)


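Filter rows on the basis of multiple conditions

Extract the rows from load_csv_data where order_item_id is 1 and shipping_limit_date is 2017-11-27 19:09:02. Conditions separated by commas are combined with AND, so a row is kept only if it satisfies all of them.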



# filter example: multiple conditions, all of which must hold
selected_rows_multiple_conditions <- filter(load_csv_data, order_item_id == 1, shipping_limit_date == '2017-11-27 19:09:02')

print(selected_rows_multiple_conditions)

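How conditions combine is easier to see on a few rows. The following is a minimal sketch on made-up data (toy_items and its values are assumptions). It also shows the | operator for OR, which the examples above do not use.

```r
library(dplyr)

# Hypothetical toy data, with dates stored as plain strings
toy_items <- data.frame(
  order_item_id = c(1, 1, 2, 1),
  shipping_limit_date = c(
    "2017-11-27 19:09:02", "2018-01-05 10:00:00",
    "2017-11-27 19:09:02", "2017-11-27 19:09:02"
  ),
  price = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Single condition
single <- filter(toy_items, order_item_id == 1)

# Two conditions separated by a comma: both must be TRUE
both_comma <- filter(
  toy_items,
  order_item_id == 1,
  shipping_limit_date == "2017-11-27 19:09:02"
)

# The same filter written with an explicit & (AND)
both_amp <- filter(
  toy_items,
  order_item_id == 1 & shipping_limit_date == "2017-11-27 19:09:02"
)

# Either condition may be TRUE: use | (OR)
either <- filter(toy_items, order_item_id == 2 | price > 30)

print(nrow(single))
print(nrow(both_comma))
print(nrow(either))
```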
