Reactive Programming using shiny

Reactive programming helps us build interactive applications with shiny: outputs update automatically whenever the inputs they depend on change.

In shiny, there are three fundamental components of reactive programming:

  1. Reactive source
  2. Reactive endpoint
  3. Reactive conductor

Reactive source

– User input that typically comes through the browser interface.
– A single source can be connected to multiple endpoints.


# load library
library(shiny)
#create ui
ui <- fluidPage(
  textInput('name','Enter your name')
)
# create server
server <- function(input,output,session){
  
}
# Run the app
shinyApp(ui,server)

Reactive endpoint

– Something that appears in the user’s browser window, such as a plot or a table of values.
– In simple words, a reactive endpoint is an output.


#load library
library(shiny)
#create ui
ui <- fluidPage(
  textInput('name','Enter your name'),
  textOutput('greeting')
  
)
#create server
server <- function(input,output,session){
  output$greeting <- renderText({
    paste('Hello ',input$name)
  })
  
}
#Run the app
shinyApp(ui,server)

Reactive conductor

– A reactive component between a source and an endpoint, typically used to encapsulate slow computations.

The server below has no conductor yet: both endpoints repeat the same filter on babynames, which is exactly the duplicated work a conductor would remove.


# create server (assumes shiny, dplyr, ggplot2 and babynames are loaded)
server <- function(input,output,session){
  # plot output
  output$plot_trendy_names <- plotly::renderPlotly({ babynames %>%
      filter(name == input$name) %>%
      ggplot(aes(x=year, y=n)) +
      geom_col()
  })
  # table output
  output$table_trendy_names <- DT::renderDT({ babynames %>%
      filter(name== input$name)
  })
}
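
The duplicated filter in the server above is the kind of work a reactive conductor is meant to absorb. The following is a minimal sketch, assuming the babynames, dplyr, ggplot2, plotly and DT packages are installed and that the UI defines a text input called name. The conductor name trendy_names is an illustrative choice, not something from the original app.

```r
library(shiny)

server <- function(input, output, session) {
  # conductor (hypothetical name): the filter runs once per change of input$name
  trendy_names <- reactive({
    dplyr::filter(babynames::babynames, name == input$name)
  })

  # both endpoints read the cached result of the conductor
  output$plot_trendy_names <- plotly::renderPlotly({
    ggplot2::ggplot(trendy_names(), ggplot2::aes(x = year, y = n)) +
      ggplot2::geom_col()
  })
  output$table_trendy_names <- DT::renderDT({
    trendy_names()
  })
}
```

With this layout, a slow step only has to be written, and run, in one place.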

Reactive Expression

– A reactive expression is an R expression that uses widget input and returns a value.
– The reactive expression will update this value whenever the original widget changes.
– Reactive expressions are lazy (they run only when something asks for their value) and cached (they rerun only when one of their inputs has changed).

To create a reactive expression we use the reactive() function, which takes an R expression surrounded by braces (just like the render functions).



# load library
library(shiny)
# create ui
ui <- fluidPage(
  numericInput('nrows', 'Number of Rows', 10, 5, 30),
  tableOutput('table'),
  plotOutput('plot')
)
# create server
server <- function(input, output, session){
  # reactive expression 1: shared by the plot and the table
  cars_1 <- reactive({
    print("Computing cars_1 ...")
    head(cars, input$nrows)
  })
  # reactive expression 2: never called below, so (being lazy) it never runs
  cars_2 <- reactive({
    print("Computing cars_2 ...")
    head(cars, input$nrows*2)
  })
  # output plot
  output$plot <- renderPlot({
    plot(cars_1())
  })
  # output table
  output$table <- renderTable({
    cars_1()
  })
}
# Run the app
shinyApp(ui = ui, server = server)

Note: A reactive expression can call other reactive expressions. That allows us to modularize computations and ensure that they are not executed more often than necessary.
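
Laziness and caching can also be observed outside a running app. This is a small console sketch, assuming shiny is installed: reactiveVal() stands in for a widget input, isolate() reads a reactive value without a session, and the runs counter is a hypothetical helper that only counts how often the expression body is evaluated.

```r
library(shiny)

runs <- 0
n_rows <- reactiveVal(5)            # stands in for input$nrows

cars_head <- reactive({
  runs <<- runs + 1                 # count evaluations of the body
  head(cars, n_rows())
})

runs_before <- runs                 # nothing has asked for the value yet
first  <- isolate(cars_head())      # first read: the body is evaluated
second <- isolate(cars_head())      # second read: served from the cache
runs_after_two_reads <- runs

n_rows(10)                          # the source changes, the cache is invalidated
third <- isolate(cars_head())       # the body is evaluated again
```

Comparing runs_before, runs_after_two_reads and runs shows that the body is evaluated only on the first read and again after the source changes.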


More about shiny App (Inputs, Outputs, Layouts)

In the previous post, we built a Hello World shiny app with R.
In this post, we will cover more of the shiny package, going through these points:

  • Design Layout
  • shiny Inputs
  • shiny Outputs

Design Layout

To design the layout, we use the sidebarLayout() function. Like all the functions used for designing the interface, it is written inside the UI part.


ui <- fluidPage( sidebarLayout()
)

sidebarLayout()

The sidebarLayout() function contains two different panels:

  • sidebarPanel()
  • mainPanel()

So now our code will look like this:


# sidebarLayout syntax
ui <- fluidPage(
        sidebarLayout(
             sidebarPanel(),
             mainPanel()
        )
)

Some more points about the sidebarLayout() function:

  • sidebarPanel() : holds the inputs.
  • mainPanel() : holds the outputs.

#load library
library(shiny)
library(ggplot2)

# create a html ui with html function
ui <- fluidPage(
  sidebarLayout(
     sidebarPanel(
       #taking user input
       # textInput() has no style argument, so the border goes on a wrapper div
       div(style="border: 2px solid;", textInput("name","Enter your Name"))
     ),
    mainPanel(
      plotOutput('trend')
    )
  )
)
#create a server
server <- function(input, output, session) {
  output$trend <- renderPlot({
    ggplot()
  })
}
#Run App
shinyApp(ui = ui, server = server)

OutPut:

shiny Inputs

shiny provides a variety of input elements.

textInput


#load library
library(shiny)
# create a html ui with html function
ui<- fluidPage(
  textInput('name','Enter your name'),
  # input and output IDs must be unique, so the output gets its own ID
  textOutput('greeting')
)
#create a server
server <- function(input, output,session){
  output$greeting <- renderText({
    paste('Hi ', input$name)
  })
}
#Run App
shinyApp(ui,server)

OutPut:

selectInput


#load library
library(shiny)
# create a html ui with html function
ui<- fluidPage(
  selectInput('hobby','Enter your Hobby',
              choices = c("Reading","Playing","Dancing","Swimming")),
  textOutput('result')
)
#create a server
server <- function(input, output,session){
  output$result <- renderText({
    paste('You are fond of ', input$hobby)
  })
}
#Run App
shinyApp(ui,server)

OutPut:

sliderInput


#load library
library(shiny)
# create a html ui with html function
ui <- fluidPage(
  sliderInput('salary','Select your salary',
              value=30000,min=10000,max=500000),
  plotOutput('salaryResult')
)
#create a server
server <- function(input,output,session){
  output$salaryResult <- renderPlot(
       hist(rnorm(input$salary))
  )
}
#Run App
shinyApp(ui, server)

OutPut:

Build a Hello world shiny App With R

Introduction to Shiny

Shiny is an R package that allows you to turn your analysis into an interactive and engaging web application using R.

In this post, we will develop a Hello World shiny application. In upcoming posts, we will explore more of the shiny package and develop more applications.

Let's move straight on to developing the app.

To develop a shiny app, we follow these steps:

  • Install Shiny Package
  • Load Shiny Package
  • Create a HTML UI with HTML function
  • Create a Server
  • Run the app

Install Shiny Package


# install shiny package
install.packages("shiny")

Load Shiny Package


# load shiny package
library(shiny)

Create a HTML UI with HTML function


#Create a HTML UI with HTML function
ui <- fluidPage("Hello World")

Create a Server


#create shiny server
server <- function(input,output,session){
}

Run the app


#Run the app
shinyApp(ui=ui,server=server)

Hence, to build a Hello World app with shiny, we have to write the following lines of code.


# install shiny package
install.packages("shiny")
# load shiny package
library(shiny)
#Create a HTML UI with HTML function
ui <- fluidPage("Hello World")
#create shiny server
server <- function(input,output,session){
}
#Run the app
shinyApp(ui=ui,server=server)

OutPut

Now we modify our script to make it a little more dynamic.
For that, we add a text input box. When you enter your name in the box, it is concatenated with Hello World. For example, if I enter Dheeraj, the complete text will be Hello World Dheeraj.

To add a text input field for the name, and a place to display the result, shiny provides the textInput() and textOutput() functions.


#taking user input
ui <- fluidPage(textInput("name","Enter your Name"),
                textOutput("q"))

Then we concatenate this input name with the Hello World text, as shown below.


# load library
library(shiny)

# create a html ui with html function
#taking user input
ui <- fluidPage(textInput("name","Enter your Name"),
                textOutput("q"))
#create a server
server <- function(input,output,session){
  output$q <- renderText({
    paste("Hello World ",input$name)
  })
}
#Run the app
shinyApp(ui, server)

OutPut

Data visualization Examples in R

In this post, we will explore more examples of data visualization using R. For that purpose, we use the built-in mtcars dataset.
Here is a list of the variables recorded for each observation in mtcars:

  • mpg — Miles/(US) gallon
  • cyl — Number of cylinders
  • disp — Displacement (cu.in.)
  • hp — Gross horsepower
  • drat — Rear axle ratio
  • wt — Weight (1000 lbs)
  • qsec — 1/4 mile time
  • vs — Engine (0 = V-shaped, 1 = straight)
  • am — Transmission (0 = automatic, 1 = manual)
  • gear — Number of forward gears
  • carb — Number of carburetors

Example 1: Plot graph on X and Y axis


# include ggplot2 library
library(ggplot2)

# 1 - Map mpg to x and cyl to y
ggplot(mtcars, aes(x=mpg, y=cyl)) +
  geom_point()

# 2 - Reverse: Map cyl to x and mpg to y
ggplot(mtcars, aes(x=cyl, y=mpg)) +
  geom_point()

OutPut:

Example 2: Change the color, shape, and size of the points


# include ggplot2 library
library(ggplot2)

#change color, shape and size
ggplot(mtcars, aes(x=wt, y=mpg, col=cyl)) +
  geom_point(shape=1, size=4)

OutPut:

Example 3: Add alpha and fill


# include ggplot2 library
library(ggplot2)
# Draw points with alpha 0.5 and fill mapped to cyl
# (fill is only visible for shapes 21 to 25, so a fillable shape is set)
ggplot(mtcars, aes(x = wt, y = mpg, fill = cyl)) +geom_point(shape=21, alpha=0.5)

OutPut:

Example 4: Change shape and color


library(ggplot2)
# Change shape and color
ggplot(mtcars, aes(x = wt, y = mpg, fill = cyl)) +geom_point(shape=24,col="yellow")

OutPut:

Example 5: Change shape and size


# include ggplot2 library
library(ggplot2)
# Define a hexadecimal color
change_color <- "#4ABEFF"
# Set the fill aesthetic; color, size and shape attributes
ggplot(mtcars,aes(x=wt,y=mpg,fill=cyl))+ geom_point(size=10,shape=23,col=change_color)

OutPut:

Explore raw data using R

Understanding the structure of your Data

View dimensions

Syntax:


dim()

Example:


# dimensions of mtcars
dim(mtcars)

OutPut:

Looking at your data

head()

  • view top of the dataset

Note: By default, it returns 6 rows, but we can vary the number of rows with the n argument.
Syntax:


head()

Example:


#head of mtcars
head(mtcars)
# we can vary number of rows
head(mtcars,n=8)

OutPut:

tail()

  • view bottom of the dataset

Example:


#Syntax:tail()
#tail of mtcars
tail(mtcars)

# we can vary number of rows
tail(mtcars,n=8)

Output:

Visualizing your data

hist()

  • view histogram of a single variable

Example:


# Syntax: hist()
#histogram
hist(mtcars$mpg)

OutPut:

plot()

  • view plot of two variables

Example:


# Syntax: plot()
#plot
plot(mtcars$mpg,mtcars$qsec)

OutPut:

Gather

  • Gather columns into key-value pairs

Syntax:




gather(data, key, value, ...)
# data: a data frame
# key: bare name of the new key column
# value: bare name of the new value column
# ...: the columns to gather (a leading - excludes a column)
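
As a small illustration of gather(), here is a sketch on a made-up data frame, assuming the tidyr package is installed. The student scores are hypothetical and only serve to show how columns turn into key-value pairs.

```r
library(tidyr)

# hypothetical scores for two students in two subjects
wide <- data.frame(
  student = c("A", "B"),
  maths   = c(90, 75),
  physics = c(80, 85)
)

# every column except student becomes a (subject, score) pair
long <- gather(wide, subject, score, -student)
print(long)
```

Each student now has one row per subject instead of one column per subject.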

Spread

  • Opposite of Gather
  • Spread key-value pairs into columns
  • Takes key-value pairs and spread them into multiple columns

Syntax:


spread(data, key, value)
# data: a data frame
# key: bare name of the column containing keys
# value: bare name of the column containing values
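
The reverse direction can be sketched the same way, again on hypothetical data and assuming tidyr is installed: spread() takes the key column and turns each distinct key into its own column.

```r
library(tidyr)

# hypothetical long-format scores: one row per student and subject
long <- data.frame(
  student = c("A", "B", "A", "B"),
  subject = c("maths", "maths", "physics", "physics"),
  score   = c(90, 75, 80, 85)
)

# each distinct subject becomes a column filled with the matching score
wide <- spread(long, subject, score)
print(wide)
```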

Separating columns

  • The separate() function allows you to separate one column into multiple columns.
  • The separate() function also takes a sep argument for specifying the separator.

Syntax:


separate(data, column_set, c("column1", "column2"))
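
A minimal sketch of separate() with an explicit separator, assuming tidyr is installed. The data frame and its column names are made up for illustration.

```r
library(tidyr)

# hypothetical data: one date string per row
dates <- data.frame(id = 1:2, date = c("2020-01-15", "2021-06-30"))

# split the date column at every "-" into three new columns
parts <- separate(dates, date, into = c("year", "month", "day"), sep = "-")
print(parts)
```

The new columns are character columns; they keep leading zeros such as "01".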

Uniting column

  • Opposite of separate is unite: it pastes several columns together into one.

Syntax:


unite(data, new_column, column1, column2)

Note: we can also specify the separator placed between the values of these two columns with the sep argument.
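
A matching sketch for unite(), assuming tidyr is installed and using made-up columns: three columns are pasted into one, with sep controlling what goes between the values.

```r
library(tidyr)

# hypothetical data: date parts stored in separate columns
parts <- data.frame(
  id    = 1:2,
  year  = c("2020", "2021"),
  month = c("01", "06"),
  day   = c("15", "30")
)

# paste year, month and day into a single date column
dates <- unite(parts, date, year, month, day, sep = "-")
print(dates)
```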

Introduction to Data Visualization in R

Data visualization is an essential component of your skill set as a data scientist or data analyst. It is basically a form of visual communication.

ggplot2 is a plotting package that helps us create complex plots from data in a data frame.

ggplot2 plots are built step by step by adding new elements.

Install ggplot2 package




# install ggplot2

install.packages("ggplot2")

Load ggplot2 package



# include ggplot2 library

library(ggplot2)

During this discussion, we are going to use the mtcars dataset.
Note:
The mtcars dataset contains information about 32 cars from the 1974 Motor Trend US magazine. The dataset is small but contains a variety of continuous and categorical variables.

Before describing ggplot2 in more detail, have a look at the mtcars dataset using the str() command.



# structure of mtcars
str(mtcars)

OutPut:

Have a look at a ggplot2 example

Example:



# include ggplot2 library
library(ggplot2)
ggplot(mtcars , aes(x=wt, y=mpg))+geom_point()

OutPut:

Some points regarding ggplot2

  • Visual elements in ggplot2 are called geoms (as in geometric objects: bars, points, …)
  • The appearance and location of these geoms (size, color) are controlled by aesthetic properties.
  • Aesthetic properties are declared with aes().
  • The variables that you want to plot are mapped inside aes(), as shown in the previous example.

Geom layer                    Description
geom_bar()                    Create a layer with bars representing different statistical properties.
geom_point()                  Create a layer with data points.
geom_line()                   Create a layer with a line connecting the data points.
geom_smooth()                 Create a layer with a smoother.
geom_histogram()              Create a layer with a histogram.
geom_boxplot()                Create a layer with a box-and-whisker plot.
geom_text()                   Create a layer with text in it.
geom_errorbar()               Create a layer with error bars in it.
geom_hline() and geom_vline() Create a layer with a user-defined horizontal and vertical line respectively.
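
Because every geom is a layer, several of them can be stacked on the same plot. The following sketch, assuming ggplot2 is installed, combines three geoms from the table on mtcars; the choice of a linear smoother and of the mean as reference line is only for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                               # layer 1: the raw observations
  geom_smooth(method = "lm", se = FALSE) +     # layer 2: a linear trend line
  geom_hline(yintercept = mean(mtcars$mpg),    # layer 3: reference line at the mean mpg
             linetype = "dashed")
```

Printing p draws the plot; each + adds one more layer on top of the previous ones.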

How to derive iris.tidy from iris?



library(tidyr)
#Convert iris to iris.tidy using gather() and separate()
iris.tidy <- iris %>%
  gather(key, Value, -Species) %>%
  separate(key, c("Part", "Measure"), "\\.")

print(head(iris.tidy))

How to derive iris.wide from iris?


# Load the tidyr package
library(tidyr)
# Add column with unique ids (don't need to change)
iris$Flower <- 1:nrow(iris)
# Produce the iris.wide dataset
iris.wide <- iris %>%
  gather(key, value, -Species, -Flower) %>%
  separate(key, c("Part", "Measure"), "\\.") %>%
  spread(Measure, value)

OutPut:

 

Some important functions in R

There are some important functions that we use in day-to-day work. I have grouped them into these categories to make them easier to understand:

  • Common functions
  • String functions
  • Looping functions

Common function

First, I cover some common functions in R that we use from day to day during development.

abs() : Calculate absolute value.

 Example:


  
  # abs function
  
amount <- -56.50
absolute_amount <- abs(amount)
print(absolute_amount)

sum(): Calculate the sum of all the values in the data structure.

 Example:


  
  # sum function
  
myList <- c(23,45,56,67)
sumMyList <- sum(myList)
print(sumMyList)

mean() : Calculate arithmetic mean.

 Example:


  
  # mean function
  
myList <- c(23,45,56,67)
meanMyList <- mean(myList)
print(meanMyList)

round() : Round the values to 0 decimal places by default.

 Example:


    
  ############## round function #####################
    
  amount <- 50.97
  print(round(amount));
 
seq(): Generate sequences by specifying the from, to, and by arguments.

 Example:

    
     # seq() function
     # seq.int(from, to, by)
    
  sequence_data <- seq(1,10, by=3)
  print(sequence_data)
  
rep(): Replicate elements of vectors and lists.

 Example:


    
    # rep example
    #rep(x, times)
    sequence_data <- seq(1,10, by=3)  
    repeated_data <- rep(sequence_data,3)
    print(repeated_data)

  
sort(): Sort a vector in ascending order by default; works on numeric, character and logical vectors.

 Example:


    
    #sort function
    
  data_set <- c(5,3,11,7,8,1)
  sorted_data <- sort(data_set)
  print(sorted_data)
  
rev(): Reverse the elements in a data structure for which reversal is defined.

 Example:


    
   # reverse function 
  data_set <- c(5,3,11,7,8,1)
  sorted_data <- sort(data_set)
  reverse_data <- rev(sorted_data)
  print(reverse_data)
  
str(): Display the structure of any R Object.

 Example:


    
  # str function 
    
  myWeeklyActivity <- data.frame(
    activity=c("meditation","exercise","blogging","office"),
    hours=c(7,7,30,48)
  )
  str(myWeeklyActivity)
  
append() : Merge vectors or lists.

 Example:


    
   #append function 
    
  activity=c("meditation","exercise","blogging","office")
  hours=c(7,7,30,48)
  # the numbers are coerced to character, because a vector holds a single type
  append_data <- append(activity,hours)
  print(append_data)
  
is.*(): check for the class of an R Object.

 Example:


    
  #is.*() function
    
  list_data <- list(log=TRUE,
                    chStr="hello",
                    int_vec=sort(rep(seq(2,10,by=3),times=2)))
  print(is.list(list_data))
  
as.*(): Convert an R Object from one class to another.

 Example:


    
  #as.*() function
    
  list_data <- as.list(c(2,4,5))
  print(is.list(list_data))
  

String Function

Now we discuss some string functions that play a vital role in data cleaning and data manipulation.

Most of these are functions of the stringr package (tolower() and toupper() are part of base R). So, before using them, you first have to install and load the stringr package.


    
  # import string library
    
  library(stringr)
 
str_trim(): Remove whitespace from the start and end of a string.

 Example:


    
    ############### str_trim ####################
    
   trim_result <- str_trim(" this is my string test. ");
   print("Trim string")
   print(trim_result)
 
str_detect(): Search for a pattern in a character vector. It returns a logical value for each element.

 Example:


    
  ############### str_detect ####################
    
  friends <- c("Alice","John","Doe")
  string_detect <- str_detect(friends,"John")
  print("String detect ...")
  print(string_detect)
 
str_replace() : replace a string in a vector.

 Example:


    
  ############## str_replace #####################
    
  # str_replace() returns a new vector, so the result is assigned back
  friends <- str_replace(friends,"Doe","David")
  print("friends list after replacement ....")
  print(friends)
 
tolower() : make all lowercase.

 Example:


    
  ############## tolower #####################
    
  myUpperCaseString <- "THIS IS MY UPPERCASE"
  print("lower case string ...")
  print(tolower(myUpperCaseString))
 
toupper() : make all uppercase.

 Example:


    
  ############## toupper #####################
    
  myMixedCaseString <- "My name is Dheeraj"
  print("Upper case string ...")
  print(toupper(myMixedCaseString))

 

Looping

lapply(): Loop over a list and evaluate a function on each element.

 Some important points regarding lapply:

  • lapply takes three arguments:

  1. a list X
  2. a function (or the name of a function) FUN
  3. … other arguments, passed on to FUN

  • lapply always returns a list, regardless of the class of the input.

Example:


    
  ############## lapply example #####################
    
  x <- list(a = 1:5, b = rnorm(10))
  lapply(x,mean)
 

OutPut:

Anonymous function

Anonymous functions are those functions that have no name.


    
  ############## lapply example #####################
  # Extract first column of matrix 
    
  
  x <- list(a=matrix(1:4,2,2),b=matrix(1:6,3,2))
  lapply(x,function(elt)elt[,1])
 

OutPut:


Use function with lapply


    
  ############## lapply example #####################
  # multiply each element of list with factor
    
  
  multiply <- function(x,factor){
    x * factor
  }

lapply(list(1,2,3),multiply,factor=3)
 

OutPut:

sapply(): Same as lapply, but tries to simplify the result (for example, to a vector instead of a list).

 Example:


    
  ############## sapply example #####################
  # multiply each element of list with factor
    
  multiply <- function(x,factor){
    x * factor
  }
sapply(list(1,2,3),multiply,factor=3)
 

OutPut:

apply() : Apply a function over the margin of an array.

 Example:


    
  ############## apply function #####################
    
  mat1 <- matrix(c(1:10),nrow=5,ncol = 6)
  apply(mat1,2, sum)
 

OutPut:

tapply(): Apply a function over subsets of a vector.

 Example:


    
  ############## tapply function #####################
    
  tapply(mtcars$mpg, list(mtcars$cyl, mtcars$am), mean)
 

OutPut:

mapply(): Multivariate version of lapply.

 Example:
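
A minimal sketch of mapply() in base R, with vectors chosen only for illustration: the function is called once per position, taking one element from each of the vectors.

```r
  ############## mapply function #####################

  # add the elements of two vectors pairwise: 1 + 4, 2 + 5, 3 + 6
  sums <- mapply(function(x, y) x + y, 1:3, 4:6)
  print(sums)

  # call rep(1, 3), rep(2, 2) and rep(3, 1); the results differ in length,
  # so they are returned as a list
  reps <- mapply(rep, 1:3, 3:1)
  print(reps)
```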

Joining data in R using Dplyr

When working with data, joining is a common operation.

Joining means combining data from two or more different sources on the basis of some condition.

In R, the dplyr package is a good option for this kind of operation.

In this post, we will cover these key points:

  • Types of Joins in R
  • Syntax
  • Joining on DataFrame
  • Joining on tables

Types of Joins

dplyr provides six types of joins:

  1. Inner Join (inner_join)
  2. Left Join (left_join)
  3. Right Join (right_join)
  4. Full Join (full_join)
  5. Semi Join (semi_join)
  6. Anti Join (anti_join)
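
The difference between the six joins is easiest to see on a tiny example. The sketch below assumes dplyr is installed and uses two made-up data frames, not the CSV files used later in this post: category c has no product, and product 4 has a category that does not exist.

```r
library(dplyr)

# hypothetical data for illustration only
categories <- data.frame(category = c("a", "b", "c"),
                         english  = c("art", "books", "cars"))
products   <- data.frame(product_id = 1:4,
                         category   = c("a", "a", "b", "z"))

inner <- inner_join(categories, products, by = "category")  # matching rows only
left  <- left_join(categories, products, by = "category")   # keeps every category
right <- right_join(categories, products, by = "category")  # keeps every product
full  <- full_join(categories, products, by = "category")   # keeps everything
semi  <- semi_join(categories, products, by = "category")   # categories that have a product
anti  <- anti_join(categories, products, by = "category")   # categories that have none

# number of rows returned by each join
row_counts <- sapply(list(inner = inner, left = left, right = right,
                          full = full, semi = semi, anti = anti), nrow)
print(row_counts)
```

Note that semi_join() and anti_join() only filter the first table; they never add columns from the second one.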

Syntax




# Syntax of joining in R
# x: dataframe1/table1
# y: dataframe2/table2
# by: name of the common column to join on

join_type(x, y, by = "common_column")

Joins are applied to tables, and in R a file read into a data frame plays the role of a table.

For a better understanding of joins, we take two files:

  1. Product_category_name.csv (product_category_name, product_category_name_english)
  2. product_dataset.csv(product_id,product_category_name,product_name_lenght, product_description_lenght, product_photos_qty, product_weight_g, product_length_cm, product_height_cm, product_width_cm)

You can download these files for practice from my GitHub account.

To join two different datasets or tables, we need a common field on which the join can be applied.

In this case, both files contain product_category_name, so we can join on that column.

To use a CSV file as a table, we use the command




# read csv file

read.csv(file_name)

So, to apply joins on these two files, we first read them as two tables using the read.csv() command, as shown below.




# load data


table1 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/product_category_name.csv") # file location of Product_category_name.csv
table2 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/products_dataset.csv")   # file location of product_dataset.csv 

Inner Joins

Syntax :



# inner join syntax


inner_join(x,y,by='condition')

Examples :

Select all columns

library(dplyr)
table1 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/product_category_name.csv") # file location of Product_category_name.csv
table2 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/products_dataset.csv")   # file location of product_dataset.csv 

appliedInnerJoin <- inner_join(table1,table2,by='product_category_name')

print(head(appliedInnerJoin,n =20))

OutPut:

Select specified columns

If we don’t want to extract all columns, we can pick specific columns using the select command.

Suppose that, after applying the inner join on table1 and table2, we don’t want all columns of these tables but only product_id and product_category_name_english.

Then we can use the select command of the dplyr package to extract just the columns we want.

Example


library(dplyr)
table1 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/product_category_name.csv") # file location of Product_category_name.csv
table2 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/products_dataset.csv")   # file location of product_dataset.csv 

appliedInnerJoin <- inner_join(table1,table2,by='product_category_name')
#select specified column

specified_columns <- select(appliedInnerJoin,product_id,product_category_name_english)
print(specified_columns)

OutPut:

left join

Syntax :


left_join(x,y,by='condition')
 

Example:


library(dplyr)
table1 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/product_category_name.csv") # file location of Product_category_name.csv
table2 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/products_dataset.csv")   # file location of product_dataset.csv 
# left join
appliedLeftJoin <- left_join(table1,table2,by='product_category_name')

print(head(appliedLeftJoin,n=10))

OutPut:

right join

Syntax :


right_join(x,y,by='condition')
 

Example:


library(dplyr)
table1 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/product_category_name.csv") # file location of Product_category_name.csv
table2 <- read.csv("/home/dheeraj/Desktop/Blog_post/joins_in_r_using_dplyr/dataset/products_dataset.csv")   # file location of product_dataset.csv 
# right join
appliedRightJoin <- right_join(table1,table2,by='product_category_name')

print(head(appliedRightJoin,n=10))

Output:

Dplyr grammar of Data Manipulation in R

The dplyr package is used for data manipulation in R.

It provides a small set of verbs that combine to express most data manipulation tasks, which is why dplyr is called a grammar of data manipulation.

With the help of the dplyr package, we can manipulate data and extract useful information easily and quickly.

Installing and loading dplyr package in R

Installing dplyr



# install dplyr package

install.packages("dplyr")

loading dplyr



# load dplyr package

library(dplyr)

The dplyr package contains 5 main verbs for data manipulation, which we will discuss in this post.

dplyr 5 verbs for Data Manipulation

dplyr function   Description
select()         Returns a subset of the columns.
filter()         Returns a subset of the rows.
arrange()        Reorders rows according to single or multiple variables.
mutate()         Adds new columns computed from existing columns.
summarize()      Reduces each group to a single row by calculating an aggregate measure.
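
As a quick preview of arrange(), mutate() and summarize(), here is a sketch on the built-in mtcars dataset, assuming dplyr is installed. The new column name hp_per_wt and the summary names are illustrative choices.

```r
library(dplyr)

# arrange(): reorder rows, heaviest car first
heavy_first <- arrange(mtcars, desc(wt))

# mutate(): add a column computed from existing columns
with_ratio <- mutate(mtcars, hp_per_wt = hp / wt)

# summarize(): reduce each group of cylinders to a single row
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg), cars = n())
print(by_cyl)
```

group_by() is what defines the groups that summarize() then collapses; without it, summarize() returns a single row for the whole data frame.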

To explore dplyr, we need a dataset.

For that purpose, I am loading a CSV file into a variable.

To read a CSV file, we use the read.csv() function.

Syntax :



# read a CSV file


read.csv("file location")

Example:



# read the order items dataset into a data frame

load_csv_data <- read.csv("/home/dheeraj/Downloads/brazilian-ecommerce/order_items_dataset.csv")

print(load_csv_data)

OutPut:

read.csv() always returns a data frame.

We can store the value returned by read.csv() in a variable, as in the previous example where we stored it in load_csv_data. Accessing that variable means accessing the data.

Now we will perform some operations with the dplyr package, going through its functions one by one.

But before using the 5 dplyr verbs, we should inspect the data we are going to apply them to.
The dplyr package provides a separate function, glimpse(), for that purpose.


glimpse(): shows a compact overview of a data frame: the number of rows and columns, and the name, type and first values of each column.
Syntax :


# glimpse syntax

glimpse(data_frame)

Example:



# glimpse example

glimpse(load_csv_data)

OutPut:
So the glimpse() function provides details about our data frame.
It tells us that our data frame contains 112,650 rows and 7 columns, as shown below.



Observations: 112,650
Variables: 7
$ order_id            
$ order_item_id       
$ product_id          
$ seller_id          
$ shipping_limit_date 
$ price               
$ freight_value

Selecting Column using Select

select() returns a subset of the columns.
In other words, it removes the columns we do not name from the dataset.
Syntax :



# select syntax
# df: data frame
# column1: name of column1
# column2: name of column2

select(df, column1, column2)

Example:

We have already loaded our data into the load_csv_data variable. Now perform the following actions:

  • Extract order_id,product_id,price from load_csv_data.
  • Extract order_id,order_item_id, product_id,seller_id,shipping_limit_date,price from load_csv_data.

1. Extract order_id,product_id,price from load_csv_data.


# select example for extracting columns

select_Columns <- select(load_csv_data,order_id,product_id,price)

print(select_Columns)

OutPut:

2. Extract order_id,order_item_id, product_id,seller_id,shipping_limit_date,price from load_csv_data.



# select example for extracting columns

select_Columns <- select(load_csv_data,order_id,order_item_id, product_id,seller_id,shipping_limit_date,price)

print(select_Columns)

OutPut:

Sometimes the columns we want form a contiguous sequence, i.e. no column in between is left out.

In the previous example, we want the columns from order_id to price; these columns are in sequence and none is missing from the sequence.

In this case, we can select the columns with the : operator:



# syntax for selecting a range of columns with the select verb


select(data_frame, first_column:last_column)

So we can also perform our previous operation in this way.



# select example for extracting a range of columns


select_Columns <- select(load_csv_data,order_id:price)

print(select_Columns)

OutPut:

Select rows using filter

filter() is used for filtering rows on the basis of a condition.
Syntax:



# filter syntax (several conditions are combined with AND)

filter(data_frame, condition1, condition2, ...)

Example:

Filter rows on the basis of a single condition

Extract the rows of the load_csv_data data frame whose order_id is 5.



# filter example with a single condition


selected_rows <- filter(load_csv_data,order_id==5)

print(selected_rows)

OutPut:

Filter rows on the basis of multiple conditions

Extract the rows of load_csv_data for which order_item_id is 1 and shipping_limit_date is 2017-11-27 19:09:02.



# filter example with multiple conditions


selected_rows_multiple_conditions <- filter(load_csv_data,order_item_id==1,shipping_limit_date=='2017-11-27 19:09:02')

print(selected_rows_multiple_conditions)

OutPut:
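
Conditions separated by commas must all hold. To keep rows that satisfy at least one condition, the | operator can be used instead. The following sketch assumes dplyr is installed and uses a small hypothetical data frame that only mimics a few columns of load_csv_data.

```r
library(dplyr)

# hypothetical stand-in for a few rows of load_csv_data
orders <- data.frame(
  order_item_id = c(1, 1, 2, 1),
  price         = c(58.9, 239.9, 199.0, 12.99),
  freight_value = c(13.29, 19.93, 17.87, 12.79)
)

# comma: both conditions must be TRUE (AND)
both <- filter(orders, order_item_id == 1, price > 50)

# |: at least one condition must be TRUE (OR)
either <- filter(orders, order_item_id == 2 | price < 20)
```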

Web Scraping with R

Nowadays, to run a business we need to understand business patterns, client behavior, culture, location and environment. Without these, we cannot run a business successfully.

Understanding these factors raises the probability of growing our business.

So, in simple terms, to understand and run a business successfully, we need data from which we can learn about client behavior, business patterns, culture, location and environment.

Today one of the best sources for collecting data is the web, and there are various methods for collecting data from it.

One of them is web scraping; different languages offer different ways to extract data from the web.

Here we will discuss some of the methods for extracting data from the web using the R language.

There are various resources on the web, and there are various techniques for extracting data from each of them.

Some of these resources are :

    • Google sheets
    • Wikipedia
    • Extracting Data from web tables
    • Accessing Data from Web 

Read Data in html tables using R

Large amounts of data on the web are generally stored in tables, so here we discuss how to extract data from HTML tables.

We have laid out the process as a series of steps, so that anyone can read HTML tables by following them:

  1. Install library
  2. Load library
  3. Get Data from url
  4. Read HTML Table
  5. Print Result

Install library


# install library
install.packages('XML')
install.packages('RCurl')

Load library

When reading data from the web, we generally use these libraries:


# load library 
library(XML)
library(RCurl)

Get Data from url

In this session, we will extract the list of Nobel laureates from a Wikipedia page. First, copy the URL of the page containing the table; here is the URL:


https://en.wikipedia.org/wiki/List_of_Nobel_laureates#List_of_laureates

In R, we write these lines of code to get the data from the URL.


# Get Data from url
url <- "https://en.wikipedia.org/wiki/List_of_Nobel_laureates#List_of_laureates"
url_data <- getURL(url)

Read HTML Table

Now it’s time to read the tables and extract information from them; for that, we will use the readHTMLTable() function.


# Read HTML Table
data <- readHTMLTable(url_data,stringsAsFactors=FALSE)

Print Result

Finally, our data has been stored in the data variable, and now we can print it.


# print result
print(data) 

Here is the complete code for reading an HTML table from the web using R:


# install library
install.packages('XML')
install.packages('RCurl')
# load library 
library(XML)
library(RCurl)
# Get Data from url
url <- "https://en.wikipedia.org/wiki/List_of_Nobel_laureates#List_of_laureates"
url_data <- getURL(url)
# Read HTML Table
data <- readHTMLTable(url_data, stringsAsFactors = FALSE)
# print result
print(data)

rvest package for Scraping

rvest is one of the most important packages for scraping web pages. Inspired by Beautiful Soup, it is designed to work with magrittr pipes to make it easy to scrape information from the web.

Why do we need another package when we already have XML and RCurl?

When scraping with the XML and RCurl packages, we usually rely on the id, name, or class attribute of the element we want.
If the element has no such attribute, scraping it from the website becomes difficult.
Apart from that, the rvest package provides some essential functions that set it apart from other packages.
With rvest we follow the same steps that we used with the XML and RCurl packages to read HTML tables:

    1. Install package
    2. Load package
    3. Get Data from url 
    4. Read HTML Table
    5. Print Result

We repeat the same example as before, this time with the rvest package.

Install package


# install package
install.packages('rvest')

Load package


#load package
library('rvest')

Get Data from url and Read HTML Table


url <- 'https://en.wikipedia.org/wiki/List_of_Nobel_laureates'
# Get Data from url and Read HTML Table 
prize_data <- url %>% read_html() %>% html_nodes(xpath = '//*[@id="mw-content-text"]/div/table[1]') %>%
  html_table(fill = TRUE)
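
The same pipeline can be tried offline on a small inline page. This is only a sketch: the HTML below is a made-up stand-in that mimics the nesting of the Wikipedia page (a div with id mw-content-text, an inner div, then the table), and sample_page and sample_prizes are hypothetical names.

```r
# load package
library(rvest)

# Illustrative page whose structure matches the XPath used above
sample_page <- read_html(paste0(
  "<html><body><div id='mw-content-text'><div>",
  "<table>",
  "<tr><th>Year</th><th>Physics</th></tr>",
  "<tr><td>1903</td><td>Marie Curie</td></tr>",
  "<tr><td>1921</td><td>Albert Einstein</td></tr>",
  "</table>",
  "</div></div></body></html>"
))

# Select the first table under the content div, then convert it
sample_prizes <- sample_page %>%
  html_nodes(xpath = '//*[@id="mw-content-text"]/div/table[1]') %>%
  html_table()

# html_table() on a set of nodes returns a list, one element per table
print(sample_prizes[[1]])
```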

Here we have combined two steps into one; that is the beauty of piping in R. Inside html_nodes() we have used an XPath expression.
With rvest we select the element we want by its XPath (CSS selectors are also supported).
You can copy an element's XPath from the browser's developer tools: inspect the table, right-click its node, and choose Copy > Copy XPath.
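
When the table carries a class attribute, a CSS selector can be a shorter alternative to a copied XPath. The following is a minimal sketch on a made-up inline page; the class name wikitable and the names css_page and css_tables are assumptions for illustration, so check the class of the real table in the browser before relying on it.

```r
# load package
library(rvest)

# Illustrative page with one table that has a class attribute
css_page <- read_html(paste0(
  "<html><body>",
  "<table class='wikitable'>",
  "<tr><th>Year</th><th>Physics</th></tr>",
  "<tr><td>1903</td><td>Marie Curie</td></tr>",
  "<tr><td>1921</td><td>Albert Einstein</td></tr>",
  "</table>",
  "</body></html>"
))

# The first argument of html_nodes() after the document is a CSS selector
css_tables <- css_page %>%
  html_nodes("table.wikitable") %>%
  html_table()

print(css_tables[[1]])
```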

Print Data


#print Data
print(prize_data)

Here is the complete code for reading an HTML table from the web using rvest in R:


# install package
install.packages('rvest')
# load package
library('rvest')
url <- 'https://en.wikipedia.org/wiki/List_of_Nobel_laureates'
# Get Data from url and Read HTML Table
prize_data <- url %>% read_html() %>% html_nodes(xpath = '//*[@id="mw-content-text"]/div/table[1]') %>%
  html_table(fill = TRUE)
# print prize data
print(prize_data)

Both examples, reading the table with the XML and RCurl packages and reading the same table with the rvest package, produce output like this:
Output:

[Image: web scraping output in R]
Note: We will go through more examples of rvest in the next post, but before that let us take a quick look at the googlesheets package in R.

Extracting Data from Google Sheets:

Google Sheets has become one of the most important tools for storing data on the web. It is also useful for data analysis on the web.

In R we have a separate package for extracting data from Google Sheets: googlesheets.

How to use Google Sheets with R?

In this section, we will explain how to use the googlesheets package to extract information from Google Sheets.

We have divided the process of extracting data from Google Sheets into 6 steps:

  1. Installing googlesheets
  2. Loading googlesheets
  3. Authenticate google account
  4. Show list of worksheets
  5. Read a spreadsheet
  6. Modify the sheet
Install googlesheets package


install.packages("googlesheets")

Loading googlesheets


library("googlesheets")

Authenticate google account


gs_ls()

After that, an authentication page opens in the browser. Sign in and grant access, then return to R.

Complete code


# install packages
install.packages('googlesheets')
# load library
library('googlesheets')
# authenticate in the browser and list your sheets
gs_ls()
# take worksheet with title
take_title <- gs_title("amazon cell phone items")
# get list of worksheets
gs_ws_ls(take_title)
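
The complete code stops at listing the worksheets. Steps 5 and 6 (reading and modifying the sheet) could look roughly like the sketch below. It assumes authentication has already succeeded and that your account has a sheet with the given title. The helper name read_and_modify_sheet and the value written to cell A1 are hypothetical; gs_title(), gs_read() and gs_edit_cells() are functions of the googlesheets package.

```r
# Hypothetical helper: read the first worksheet, then edit one cell.
# Assumes gs_ls() has already authenticated this R session.
read_and_modify_sheet <- function(title) {
  # register the spreadsheet by its title
  sheet <- googlesheets::gs_title(title)

  # step 5: read the first worksheet into a data frame
  sheet_data <- googlesheets::gs_read(sheet, ws = 1)

  # step 6: overwrite cell A1 of the first worksheet
  # (this changes the live sheet, so try it on a copy first)
  googlesheets::gs_edit_cells(sheet, ws = 1, input = "item", anchor = "A1")

  # return the data as it was before the edit
  sheet_data
}

# Usage (needs a Google account that can open a sheet with this title):
# phones <- read_and_modify_sheet("amazon cell phone items")
```

Wrapping the calls in a function keeps the definition safe to run anywhere; nothing contacts Google until the function is actually called.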
    
