Week 1: Getting Started and Selecting & Retrieving Data with SQL
Introduction
What is SQL?
Structured Query Language (SQL) is a standard computer language for relational database management and data manipulation.
Used to query, insert, update and modify data.
Pronounced as “sequel” or S-Q-L.
Used to communicate with databases;
SQL is non-procedural language:
Cannot write complete applications
Simple, but powerful
How is SQL used?
Read/retrieve data
Write data - add data to a table;
Update data - insert new data;
Who uses SQL?
- Backend Developer
- Data Architect
- QA Engineer
- ETL Developer
- Database Admin (DBA)
- Data analyst
- System Admin
- System Engineer
- Data Scientist
DBA or Data Scientist
DBA | Data Scientist |
---|---|
Manages/governs entire database | End user of a database |
Gives permissions to users | Use SQL to query and retrieve data |
Determines access to data | |
Manages and creates tables | |
Uses SQL to query and retrieve data |
How do data scientists use SQL?
Retrieve data;
May create their own table or test enviroments;
Combine multiple sources together;
Write complex queries for analysis.
Relational Database Management Systems
- SQL Server
- IBM DB2 Oracle
- Sybase ASE
- PostgreSQL
- MySQL
- Microsoft SQL Server
- Apache Open Office Base
- SQLite
Write syntax depends on what DBMS you are using.
Each DBMS has its own dialect.
SQL can translate.
Tweak based on the dialect your DBMS speaks.
Data Models
Thinking about your data
Understand the business process or subject matter the data is modeled after
Know the bussiness rules
Understand how your data is organized and structured in the table(modeled).
Why this is worthwhile
Get more accurate results,
Speed up your work,
Have less rework.
Some concepts
Name | Description |
---|---|
Database | A container to store organized data |
Tables | A structured list of data or a specific type |
Column | A single field in a table |
Row | A record in a table |
What is data modeling?
Organizes and structures information into multiple, related tables.
Can represent a bussiness process or show relationships between bussiness processes.
Should closely represent real world.
Types of data modeling
Models for prediction built by data scientist.
Data model as data tables represented and organized in a database.
Evolution of data models
Time | Types |
---|---|
1960 | Hierarchical |
1969 | Network |
1970 | Relational |
1976 | Entity relationship |
1978 | Semantic |
1985 | Object-oriented |
1990 | Extended relational(O/R DBMS) |
2009 | NoSQL - Not only SQL |
Relational vs. Transactional Models
Relational Model | Transactional Model |
---|---|
Allows for easy querying and data manipulation in an easy, logical and intuitive way. | Operational database - insurance claims within a healthcare database |
Data model building blocks
Name | Description |
---|---|
Entity | Person, place thing or event, Distinguishable, unique and distinct |
Attribute | A characteristic of an entity |
Relationship | Describes association among entities. One-to-many,Many-to-many,One-to-one |
- One-to-many: customer to invoices.
- Many-to-many: student to classes.
- One-to-one: manager to store.
ER Diagrams
ER Model
- Show relationships
- Bussiness process
- Represented visually
- Show links (Primary keys)
Some examples of ER Diagram
Retrieving data with SELECT statement
|
|
For multiple columns, be sure to use a comma.
|
|
The following expression is more clearly.
|
|
Retrieving multiple columns using a wildcard
Request all columns by using a asterisk(*) wildcard character.
|
|
Limit results
If the database is very large, we can use LIMIT
to limit the number of retrieved results.
|
|
- Different Syntaxes
Database | Codes | ||
---|---|---|---|
SQLite |
|
NULL and Primary Keys
Every column is either NULL or NOT NULL.
Primary keys cannot be null, they must have a value.
Adding data to the table
|
|
Create a template table
|
|
Adding comments
Single line
|
|
Section
|
|
Week 2: Filtering, Sorting, and Calculating Data with SQL
Introductions
Clauses and Operators in SQL
- WHERE
- BETWEEN
- IN
- OR
- NOT
- LIKE
- ORDER BY
- GROUP BY
Wildcards
Math Operators
- AVERAGE
- COUNT
- MAX
- MIN
Basics of Filtering with SQL
Why?
- Be specific about the data you want to retrieve
- Reduce the number of records you retrieve
- Increase query performance
- Reduce the strain on the client application
- Goverance the limitations
WHERE Clause
|
|
Some clause operators
Filtering on a single condition
|
|
Filtering on a single value
|
|
Checking for non-matches
|
|
Filtering with a range of values (BETWEEN…AND…)
|
|
Filtering no value (NULL)
|
|
Advanced filtering: IN, OR, and NOT
IN operator
- Specifies a range of conditions
- Comma delimited list of values
- Enclosed in ()
Example
|
|
OR operator
DBMS will not evaluate the second conditions in WHERE clause if the first condition is met.
Use for any rows matching the specific conditions.
|
|
IN vs. OR
IN works the same as OR.
Benfits of IN:
- Long list of options,
- IN executes faster than OR,
- Don’t have to think about the order with IN,
- Can contain another SELECT5
OR with AND
|
|
|
|
Don’t rely on the default order, it’s better to have the habit of using the ().
NOT operator
|
|
Using Wildcards in SQL
- Special character used to match parts of a value,
- Use
LIKE
Operator
What are wildcards?
- Can only be used with strings.
- Cannot be used for non-text datatype.
- Helpful for data scientists as they explore string variables.
Using % wildcards
- % wildcards will not match NULLS, NULL represents no value in a column.
Using underscore(_) wildcards
- Matches a single character.
- Is not supported by DB2.
Downsides of wildcards
- Takes longer to run.
- Better to use another operator (=, <, >=, and etc.)
- Statements with wildcards will take longer to run if used at the end of search pattern.
- Placement of wildcards is important.
Sorting with ORDER BY
|
|
- Can sort by a column not retrieved
- Must always be the last clause in a select statement
Sorting by column position
|
|
It means data sorted by 2nd and 3rd column.
Sort direction
DESC
: descending order
ASC
: ascending order
Only applies to the column names it directly precedes.
Math operators
Operator | Description |
---|---|
+ | Addition |
- | Subtraction |
* | Multiplication |
/ | Division |
Order of operators
- Parenthese
- Exponents
- Multiplication
- Division
- Addition
- Subtraction
Use parenthese if needed
|
|
Aggregate Functions
What?
- Used to summarize data
- Finding the highest and lowest values
- Finding the total number of rows
- Finding the average value
Functions
Average function
|
|
Count functions
1.COUNT(*) - Counts all rows (contain NULLs).
|
|
2.COUNT(Column_name) - Counts all rows in a specific column ignoring NULL value.
|
|
MIN and MAX functions
Columns with NULL values are ignored by MIN and MAX functions.
|
|
SUM aggregate functions
|
|
Using DISTINCT on aggregate functions
- If DISTINCT is not specified, ALL is assumed.
- Cannot use DISTINCT on COUNT(*).
- No value to use with MIN and MAX function.
|
|
Grouping Data with SQL
Grouping examples
|
|
Notices
- GROUP BY clause can contain multiple columns.
- Every column in your SELECT statement must be present in a GROUP BY clause (except for aggregate functions).
- NULLs will be grouped together if your GROUP BY column contains NULLs.
HAVING clause filtering for groups
WHERE does not work for GROUP.
WHERE filters on rows.
Instead using HAVING clause to filter for groups.
HAVING examples
|
|
WHERE vs. HAVING
WHERE filters before data is grouped.
HAVING filters after data is grouped.
Rows elimated by WHERE clause will not be included in the group.
|
|
Putting it all together
Filtering is useful
Narrow down your results.
Increasing query & application performance.
Understanding your data:
- Finding specific values
- Finding a range of values
- Finding blank values
Key SQL clauses
Week 3: Subqueries and joins in SQL
Using Subqueries
What?
- Queries embedded into other queries
- Relational databases store data in multiple tables
- Subqueries merge data from multiple source together
- Helps with adding other filtering criteria
Example
|
|
Working with Subsequery statements
Always perform the innermost SELECT portion first.
DBMS is performing two operations
- Getting the order numbers for the product selected,
- Adding that to the WHERE clause and processing the overall SELECT statement.
Subquery best practices and considerations
- There is no limit to the number of subqueries you can have.
- Performance slows when you nest too deeply.
- Subquery selects can only retrieve a single column.
Subquery in subquery
Be sure to use indenting.
|
|
The power of subqueries
- Subqueries are powerful tools
- Not always the best option due to performance
Joining tables: an introduction
Benefits of breaking data into tables
- Efficient storage
- Easier manipulation
- Greater scalability
- Logically models a process
- Tables are related through common values (keys)
Joins
- Associate correct records from each table on the fly.
- Allows data retrieval from multiple tables in one query.
- Joins are not physical, they persist data for the duration of the query execution.
Cartesian (Cross) Joins
- Not frequently used.
- Computationally taxing.
- Will return products with the incorrect vendor or no vendor at all.
Examples
|
|
|
|
Inner Joins
What?
The INNER JOIN keyword selects records that have matching values in both tables.
Example
|
|
Syntax
- Join type is specified (INNER JOIN)
- Join condition is in the FROM clause and uses ON clause
- Joining more tables together affects overall database performance
- You can join multiple tables, no limit
- List all the tables, then define conditions
Inner Join with multiple tables
|
|
Best practices
- Make sure you are pre-qualifying names
- Do not make unnecessary joins
- Think about the type if join you are making
Aliases and self joins
What?
|
|
Query example using alias
|
|
Self joins
- Match customers from the same city
- Take the table and treat it like two separate tables
- Join the original table to itself
|
|
Examples
Matching the customers from the same city:
|
|
Advanced joins: Left, Right and Full Outer Joins.
- SQL Lite only does left joins,
- Other DBMS use all joins.
Left Join
Return all records from the left table, and the matched records from the right table.
The result is NULL from the right side, if there is no match.
|
|
Right Join
Return all records from the right table, and the matched records from the left table.
The result is NULL from the left side, if there is no match.
|
|
Left Joins can be turned into right joins by reversing the order of tables.
Full Outer Join
Return all records when there is a match in either left table or right table records.
Select all customers and orders:
|
|
Unions
The UNION operator is used to combine the result-set of two or more SELECT statements.
Each SELECT statement with UNION must have the same number of columns.
- Columns must have similar data types.
- The columns in each SELECT statement must be in the same order.
|
|
Note
- UNION selects only distinct values.
- Use UNION ALL to also select duplicate values!
Avoiding duplicate in a Union
|
|
Summary
Best practices using joins
- Check the number of records.
- Check for duplicates.
- Check the number of records each time you make a new join.
- Start smart: one table at a time.
Join syntax
- Check the particular syntax for your DBMS.
- SQLite does not do RIGHT and FULL OUTER joins.
Further readings
- Difference Between Union and Union All - Optimal Performance Comparison
- Thinking in SQL vs Thinking in Python
Module 3 Coding Assignment
All of the questions in this quiz refer to the open source Chinook Database. Please familiarize yourself with the ER diagram in order to familiarize yourself with the table and column names in order to write accurate queries and get the appropriate answers.
1. Using a subquery, find the names of all the tracks for the album “Californication”.
|
|
|
|
2. Find the total number of invoices for each customer along with the customer’s full name, city and email.
|
|
|
|
3. Retrieve the track name, album, artist, and trackID for all the albums.
|
|
|
|
4. Retrieve a list with the managers last name, and the last name of the employees who report to him or her.
|
|
|
|
5. Find the name and ID of the artists who do not have albums.
|
|
|
|
6. Use a UNION to create a list of all the employee’s & customer’s first names and last names ordered by the last name in descending order.
|
|
|
|
7. See if there are any customers who have a different city listed in their billing city versus their customer city.
|
|
|
|
Week 4: Modifying and analyzing data with SQL
Working with text strings
Retrieve the data in the format you need
Support Joins
String Functions:
- Concatenations
|
|
SQL server supports + instead of ||
- Trimming strings
Trims the leading or trailing space from a string
TRIM
RTRIM
LTRIM
- Substring
Returns the specified number of characters from a particular position of a given strings.
|
|
|
|
- Upper and Lower
|
|
|
|
|
|
Working with date and time strings
- Dates are stored as datetypes
- Each DBMS uses it’s own variety of datetypes
DATE: Format YYYY-MM-DD
DATETIME: Format YYYY-MM-DD HH:MI:SS
TIMESTAMP: Format YYYY-MM-DD HH:MI:SS