Joins in SQL for Data Analyst

It is the age of Data and now we Data assume data is more precious than Gold. In Data world SQL is one of the best tool used for Data Analysis.

Types of Joins

Basically, there are four types of JOINS in SQL.

      • CROSS JOIN
      • INNER JOIN

                     –   EQUI JOIN
                                 –  NATURAL JOIN

    • OUTER JOIN

                     –   LEFT JOIN (LEFT OUTER JOIN)
                     –   RIGHT JOIN (RIGHT OUTER JOIN)
                     –   FULL JOIN    (FULL OUTER JOIN)

    • SELF JOIN

For practical implementation of JOINS, we are taking two tables 




# Table Details


Products_dataset:-(product_id,product_category_name,product_name_lenght,product_description_lenght,product_photos_qty,product_weight_g,product_length_cm,product_height_cm,product_width_cm)

product_category_name_translation :- (product_category_name	product_category_name_english)

CROSS JOIN

CROSS JOIN returns cartesian product of two tables.

Cartesian Product

If table A contains m rows and table B contains n rows, then cartesian product of table A and table B will contain m*n rows.
Syntax :




# cross join syntax


SELECT * FROM tableA CROSS JOIN tableB

     or
     
SELECT * FROM tableA,tableB

Example :


SELECT * FROM products_dataset CROSS JOIN product_category_name_translation

OutPut:

INNER JOIN

return all rows from tableA where there are matching values in tableB and all columns from tableA and tableB. If there are multiple matches between tableA and tableB then all matches will return from tableA and tableB on the basis of a specified condition.

And condition has been mentioned in the syntax after the last table.
Syntax:



# Inner join syntax

SELECT * FROM tableA INNER JOIN tableB ON (tableA.x = tableB.y)


or 

SELECT * FROM tableA JOIN tableB ON (tableA.x = tableB.y)

Note: By default, JOIN is considered as INNER JOIN or EQUI JOIN.

Example :



# inner join example

SELECT * FROM products_dataset pds INNER JOIN product_category_name_translation pct
ON pds.product_category_name=pct.product_category_name

OutPut :

LEFT JOIN (LEFT OUTER JOIN) :

return all rows from tableA, and all columns from tableA and tableB. Rows in tableA with no match in tableB will have NA values in the new columns. If there are multiple matches between tableA and tableB, all combinations of the matches are returned.
Syntax :



# left join syntax

SELECT * FROM tableA LEFT JOIN tableB ON (tableA.x = tableB.y)

Example:



#  left join example

SELECT * FROM products_dataset pds LEFT JOIN product_category_name_translation pct

ON pds.product_category_name=pct.product_category_name

OutPut :

RIGHT JOIN (RIGHT OUTER JOIN)

return all rows from tableB, and all columns from tableA and tableB. Rows in tableB with no match in tableA will have NA values in the new columns. If there are multiple matches between tableA and tableB, all combinations of the matches are returned.
Syntax:



# right join syntax

SELECT * FROM tableA RIGHT JOIN tableB ON (tableA.x = tableB.y)

Example:



#  right join example

SELECT * FROM products_dataset pds RIGHT JOIN product_category_name_translation pct
ON pds.product_category_name=pct.product_category_name

OutPut:

Import CSV file in MySQL database using SQL

 

During Data Analysis it is a common situation during which we have to import CSV files into MYSQL database.

In this post, we will discuss the same scenario and a simple way for importing CSV files using SQL.

Before importing a CSV file we need to create a table that will contain all columns of CSV file.

Suppose I have CSV file order_items_dataset.csv and we want to import this CSV file in  MYSQL Database.

import csv file

 

Then before importing CSV file, we have to create a table in MYSQL Database.

So, At first, we are creating a table

Create a table in MYSQL Database



/**
  * create order_items_datasets table for importing csv file
*/

CREATE TABLE order_items_datasets (
order_id int PRIMARY KEY,
order_item_id int,
product_id varchar (200),
seller_id varchar(200),
shipping_limit_date DATETIME,
price float,
freight_value float
)

Now our table has been created successfully.

Now we need to import CSV file in order_items_datasets table.

For that, we have to follow these few steps.

STEP 1:   Load Data

For that purpose, we will LOAD DATA Statement.

LOAD DATA Statement: reads a row from a text file into a table with a very high speed.  


# comment line

LOAD DATA

Step 2: Include CSV file

For including CSV file from which we have to import Data we used INFILE command with LOAD DATA  Statement.

INFILE: allow us to read CSV data from a text file.

NOTE: LOAD DATA INFILE statement allows us to read CSV data from a text file and import that into a database table with very fast speed.  

So for Including a file, we will write 



# comment line

LOAD DATA INFILE "/home/dheeraj/Downloads/brazilian-ecommerce/order_items_dataset.csv"  -- file location

Step 4: Import Data to table

For importing data inside the table we will add INTO then after table name with LOAD DATA statement. 

As shown below for importing data into my new table order_items_datasets that we have created previously.



# import data inside table


LOAD DATA INFILE "/home/dheeraj/Downloads/brazilian-ecommerce/order_items_dataset.csv" -- file location
INTO TABLE order_items_datasets -- table name 

Step4: Extract values from CSV file 

CSV file contains comma-separated values. So we need to separate these values on the basis of rows and columns.

For this, we will add these few lines



# comment line

FIELDS TERMINATED BY ',' 
ENCLOSED BY '"'
LINES TERMINATED BY '\n'

Step5: Ignore header of CSV file

During importing CSV file we need to ignore the header. For this, we will add this line.



# ignore header 

IGNORE 1 ROWS;

After following these steps we can be able to import CSV files in MYSQL database.

Complete Query:



/**
Create table
*/

CREATE TABLE order_items_datasets (
order_id int PRIMARY KEY,
order_item_id int,
product_id varchar (200),
seller_id varchar(200),
shipping_limit_date DATETIME,
price float,
freight_value float
)

--  load data 
LOAD DATA

-- include file

LOAD DATA INFILE "/home/dheeraj/Downloads/brazilian-ecommerce/order_items_dataset.csv"  -- file location

-- import data inside table

LOAD DATA INFILE "/home/dheeraj/Downloads/brazilian-ecommerce/order_items_dataset.csv" -- file location
INTO TABLE order_items_datasets -- table name 

-- extract data

FIELDS TERMINATED BY ',' 
ENCLOSED BY '"'
LINES TERMINATED BY '\n'

-- ignore header

IGNORE 1 ROWS;

Basic Of SQL for Data Analysis

Introduction to SQL :

SQL stands for Structural Query Language. 

According to the survey of StackOverflow SQL always remains in the top 5 programming language in the world.

It means you are learning one of the most popular language in the world.

Operation in SQL :

Create Operation :

Create Operation is most basic operation in SQL. 

Create operation is used for multiple purposes like creating a table, creating schema, etc.

For performing a Create Operation, we use Create Command with some keyword as

CREATE SCHEMA :

CREATE SCHEMA command is used for creating schema in SQL

Syntax:



/**
CREATE SCHEMA 
*/

CREATE SCHEMA shema_name;

Example: 



/**
CREATE SCHEMA 
*/

CREATE SCHEMA e_commerce_db;

CREATE TABLE :

Used for Creating table
Syntax :



/**
CREATE SCHEMA 
*/

CREATE TABLE table_name (column_name1 data type  key order_detailsconstraints,column_name1 data type  key constraints … column_namen data type  key constraints)

Example:



/**
CREATE TABLE 
*/
Create table professionals_records(
id int primary key not null AUTO_INCREMENT,
name varchar (100),
mobile varchar(10),
email varchar(200),
city varchar(100),
occupation varchar(200),
designation varchar(100),
salary float
)

CREATE VIEW :

After creating table SELECT command is used for view data inside table (latter discussed about SELECT command). But in the case of SQL, we don’t show the same data for all users. So, in that case, we have to CREATE VIEW.

Syntax :



/**
CREATE SCHEMA 
*/

CREATE VIEW AS view_name( Query for view) 

Note: We will explain the Example of View latter in the same post.

shown below in the table

Create Operation Summary

Create command with keyword Description Syntax Example
CREATE SCHEMA For creating Schema in Database CREATE SCHEMA schema_name CREATE SCHEMA e_commerce_db
CREATE TABLE For creating table in Database CREATE TABLE table_name(column_name data type key constraints , …) CREATE TABLE order_details(id int primary key, name varchar(25) )
CREATE VIEW Used for Creating a view in Database CREATE VIEW AS view_name (query for creating a view) CREATE VIEW AS view_name(SELECT name FROM order_details)

order_details

Insert Operation

For Inserting rows inside a table we use insert command.
There are two different ways for Inserting rows inside the table :

  1. order_detailsInserting a single row inside a table
  2. Inserting multiple rows inside a table

Inserting a single row inside a table :

Syntax:



# comment line

INSERT INTO TABLE table_name(column1,column2,column3 ... , columnN) VALUES(value1,value2,value3 ..., valueN)

Example:

order_details

# Insert Single line


INSERT INTO professionals_records (id,name,mobile,email,city,occupation,designation,salary) 
values (1,'Nitin verma','7739042930','nverma@gmail.com','Jaipur','Engineer','Manager',100000)

Inserting multiple rows inside a table :

Syntax:

order_details

# comment line

INSERT INTO TABLE table_name(column1,column2,column3 ... , columnN) 
VALUES(value1,value2,value3 ..., valueN),
(value1,value2,value3 ..., valueN),
(value1,value2,value3 ..., valueN)
 .
 .
 . 
(value1,value2,value3 ..., valueN)

Example



# Insert Multiple lines


INSERT INTO professionals_records (id,name,mobile,email,city,occupation,designation,salary) 
values 
(1,'Nitin Verma,'7739042930','nverma@gmail.com','Jaipur','Engineer','Manager',100000),
(2,'Mukesh Sharma','9939042932','msharma@gmail.com','Jaipur','Engineer','Software Developer',50000),
(3,'Nimesh Mehra','8965721230','rmehra@gmail.com','Jaipur','Engineer','Software Developer',60000)order_details

SELECT Operation:

SELECT Statement return result set of records from one or more tables.
Syntax:



# comment line

SELECT * FROM table_name 

Note : * denotes all column

Example:



# Select query
SELECT * FROM professionals_records 

we can also specify specific columns also for extracting a particular column.
Syntax:



# comment line

SELECT column1, column2 ... columnN FROM table_name

Example:



# Select query

SELECT name,mobile,email,city,occupation FROM professionals_records; 

WHERE clause :

It is the most important keyword for SELECT Operation.
It is basically used for filtering data from SQL tables.
Syntax:



# comment line

SELECT * FROM table_name WHERE 

Example :



# comment line

SELECT * FROM professionals_records WHERE id=3;

Note: WHERE clause is not only used SELECT statement but also it is also used with UPDATE, DELETE statement.

LIMIT keyword

It is used for limiting how many number of rows that we want to extract.

Syntax :



# comment line

LIMIT number_of_rows;

Example:


order_details
# comment line

SELECT * FROM professionals_records LIMIT 2;

In the above example it will extract only 2 rows from professionals_records table.

Aggregate Function :

Aggregate function basically used for apply a calculation on a column.
It is also used for filtering some value on the basis of some calculations.

Types of Aggregate Function

  • AVG
  • SUM
  • COUNT
  • MIN
  • MAX

Note: AVG, SUM, MIN are applied to numerical values. So before applying these functions on a particular column make sure that the column contains numerical values or not.

Aggregate Function Description Example
AVG() Calculate average of a particular column SELECT avg(salary) FROM professionals_records;
SUM() Calculate sum of selected column SELECT sum(salary) FROM professionals_records;
COUNT() Count number of records (rows) SELECT count (*) FROM professionals_records;order_details
MIN() Return minimum value on applied column SELECT MIN(salary) FROM professionals_records;
MAX() Return maximum value of applied column SELECT MAX(salary) FROM professionals_records;

UPDATE Operation:

Used for updating records inside table.
For changing rows data inside a table we use UPDATE command.order_details
Syntax :



# comment line

UPDATE TABLE table_name set column_name=;

Example :



# comment line


UPDATE TABLE professionals_records SET salary=25000

Just wait from the above query all the values of the salary of all employees will be interchanged with 25000.
And we obviously don’t want to do so.

How do we avoid making such type of mistake?

For avoiding such types of mistakes, we will use WHERE Clause.
Syntax:



# comment line

UPDATE TABLE table_name SET column_name= WHERE ;

Example:


order_details
# comment line


UPDATE TABLE professionals_records SET salary=25000 WHERE id=1

Above query update salary=25000 of which id is 1.

DELETION Operation :

order_detailsDeletion Operation is done for deleting records from Database.
For deleting records from the Database table we use Delete Statement.
Syntax:



# comment line

DELETE TABLE table_name 

Example:

order_details

# comment line

DELETE TABLE professionals_records

It will delete all records from the professionals_records table.
For deleting particular records, we will have to use WHERE clause as shown below.

DELETE Statement with WHERE Clause

order_detailsorder_detailsSyntax :



# comment line

DELETE TABLE table_name WHERE  

Example:



# comment line


DELETE TABLE professionals_records WHERE id=25;

Delete records only for which id is 25.