5 Fundamental SQL Commands That Every Data Scientist Must Know

There were many tools to deal with huge in the domain of Data Science. However, SQL is still used. It also became one of the key skills that are mentioned in the job description of a Data Scientist or a Data Engineer. Knowing Basic SQL commands will definitely add value to your career. So in this blog post, I wrote about the basic 5 SQL commands that a data scientist should know about. These are extensively used as part of data exploration. 

FUNDAMENTAL SQL COMMANDS FOR DATA SCIENTIST

Introduction  

SQL is a structured query language. It will allow you to do data analysis for the data stored in a relational database management system[RDBMS] through the usage of native SQL queries. In RDBMS, Data is stored in tables in a normalized form. Through SQL, you can do several activities on RDBMS tables like inserting data, querying, filtering, and many more. 

This demo is based on the SQL Version – Oracle SQL developer version 19.2.1.247

Prerequisites

To perform this demo, a Database, SQL editor, and a Schema are mandatory. 

Let’s get started. 

We will be using the below test Customer data set 

First, we need to create a table as below

CREATE TABLE "DBTRAINING"."CUSTOMER" 
   (	"CUSTOMER_ID" NUMBER(6,0), 
	"CUSTOMER_NAME" VARCHAR2(20 BYTE), 
	"ADDRESS" VARCHAR2(25 BYTE) NOT NULL ENABLE, 
	"CITY" VARCHAR2(25 BYTE) NOT NULL ENABLE, 
	"PINCODE" VARCHAR2(25 BYTE) NOT NULL ENABLE, 
	"COUNTRY" VARCHAR2(25 BYTE) NOT NULL ENABLE, 
	"PROFIT" NUMBER(6,0) NOT NULL ENABLE
   )

Let’s insert some test data into the table with the help of the below queries

REM INSERTING into DBTRAINING.CUSTOMER
SET DEFINE OFF;
Insert into DBTRAINING.CUSTOMER (CUSTOMER_ID,CUSTOMER_NAME,ADDRESS,CITY,PINCODE,COUNTRY,PROFIT) values (1,'David Harris','Obere Str. 57','Stuttgart','70173','Germany',10000);
Insert into DBTRAINING.CUSTOMER (CUSTOMER_ID,CUSTOMER_NAME,ADDRESS,CITY,PINCODE,COUNTRY,PROFIT) values (2,'Ronald','Wallen,87','Amsterdam','1012','Netherlands',30000);
Insert into DBTRAINING.CUSTOMER (CUSTOMER_ID,CUSTOMER_NAME,ADDRESS,CITY,PINCODE,COUNTRY,PROFIT) values (3,'Faysel','Berguvsvägen 8','Arboga','73230','Swedan',65789);
Insert into DBTRAINING.CUSTOMER (CUSTOMER_ID,CUSTOMER_NAME,ADDRESS,CITY,PINCODE,COUNTRY,PROFIT) values (4,'Nagarajan','130 Parrys','Chennai','600001','India',98750);
Insert into DBTRAINING.CUSTOMER (CUSTOMER_ID,CUSTOMER_NAME,ADDRESS,CITY,PINCODE,COUNTRY,PROFIT) values (5,'Christina Hardy','Hanover Sq,120','London','E17AD','UK',50000);
Insert into DBTRAINING.CUSTOMER (CUSTOMER_ID,CUSTOMER_NAME,ADDRESS,CITY,PINCODE,COUNTRY,PROFIT) values (6,'William Hardy','Berlin Sq,120','Berlin','10115','Germany',4000);
Insert into DBTRAINING.CUSTOMER (CUSTOMER_ID,CUSTOMER_NAME,ADDRESS,CITY,PINCODE,COUNTRY,PROFIT) values (7,'Frank Jeff','Macquarie Sq,120','Sydney','2000','Australia',30000);
Insert into DBTRAINING.CUSTOMER (CUSTOMER_ID,CUSTOMER_NAME,ADDRESS,CITY,PINCODE,COUNTRY,PROFIT) values (8,'RutherAdams','Heidelberglaan,20','Utrecht','3553','Netherlands',70000);
Insert into DBTRAINING.CUSTOMER (CUSTOMER_ID,CUSTOMER_NAME,ADDRESS,CITY,PINCODE,COUNTRY,PROFIT) values (9,'Nick James','Piccadilly St,120','Manchester','E180D','UK',87000);
Insert into DBTRAINING.CUSTOMER (CUSTOMER_ID,CUSTOMER_NAME,ADDRESS,CITY,PINCODE,COUNTRY,PROFIT) values (10,'Shranya','Nampally,34','Hyderabad','500001','India',67000);

Now let us see the data from the table 

As you can see, the data has been inserted successfully. Now that our data is ready, Let’s learn the basic queries that we can use to explore the data.

Displaying a subset of data

We can select a subset of data using either “ROWNUM” or “FETCH NEXT N ROWS ONLY”. Let’s say we need to see only the first 5 rows then we need to use the queries as below.

SELECT * FROM customer WHERE ROWNUM <=5;

We can also use “FETCH” clause as below

SELECT * FROM customer FETCH NEXT 5 ROWS ONLY

If in case you need to see only a few columns in the dataset, then we can just mention the column names in the query. For example, if we need to see only the customers and their countries, then use the query as below 

SELECT customer_name,country FROM customer

Removing Duplicates

If we have duplicate entries in the data, then we can remove that using the command “DISTINCT”.For example in the above data set, there are duplicate countries in the country column, If I want to check from what all countries, customers are available then we can specify the query as below.

SELECT distinct(country) FROM customer

Filtering Data 

Filtering or restricting the data from the table is possible using the clause “WHERE”.

For example, if I want to display data only related to Profit>30000, then we need to use the query below 

SELECT * FROM customer WHERE Profit > 30000

Ordering Data 

While displaying, the data can be ordered either in descending or ascending order by using the clause “ORDER BY”. By default if you do not specify any keyword then ascending order will be displayed, If you need descending order then specify the keyword “DESC”.Let’s see the data where Profit > 30000 in descending order as below.

Aggregating Data 

Often we can also do calculations like aggregation, averages, minimum, maximum of certain columns. Here we will see how to find out the total profit for each country. This can be done by using clauses GROUP BY and SUM as below 

Conclusion

In this blog post, you have learned fundamental SQL commands that can help you with data exploration. Do you find this article insightful? If you have any feedback feel free to comment below.

Leave a Comment