- Published on
DS 4100 Day 12
- Authors
- Name
- Jacob Aronoff
DS 4100 Data Collection, Integration, and Analysis
Today we're talking more about SQL.
Three primary operations in relational database:
- Selection
- retrieve specific rows from a table
- Projection
- retrieve specific columns from a table
- Join
- combine multiple tables
The basic schema for a query is as follows:
SELECT <attribute list>
FROM <table list>
WHERE <condition>
GROUP BY <column>
HAVING <search-condition>
ORDER BY <column>;
Column Calculation
In SQL it's possible to calculate columns on the fly:
SELECT city, region,
(revenue - budget),
month (sales_date)
FROM Sales
WHERE (revenue - budget) < 0;
SELECT t_qty, t_cost,
(t_qty * t_cost) vol
FROM Trade
WHERE vol > 10000;
In the second query, we're giving the calculation (t_qty * t_cost) the column name vol, and then using that in our WHERE clause. Also! Important to note that order can change!
In basic mathematics, join does a cartesian product.
Joining
SELECT inv_name, p_desc
FROM investor i, portfolio p
WHERE i.inv_id = p.inv_id;
Here, we're joining the investor and their portfolio.
There's also NATURAL JOIN, which joins by fields that have the same name:
SELECT p_id, p_value, inv_name
FROM portfolio
NATURAL JOIN investor;
Like selection
SELECT * FROM investment
WHERE i_name LIKE 'D%';
Grouping
SELECT i_id, AVG (t_qty*t_cost)
FROM Trade
GROUP BY i_id;
SELECT i_id, AVG (t_qty*t_cost)
FROM Trade
WHERE p_id = 300
GROUP BY i_id;
Just throw a bunch of similar things together.
Range:
SELECT t_date, t_qty
FROM Trade
WHERE t_date
BETWEEN 'JAN-01-95' AND 'DEC-31-95';
Set membership
SELECT i_name, i_price
FROM Investment
WHERE i_name
IN ('IBM', 'DEC', 'MSFT');
SELECT i_name, i_price
FROM Investment
WHERE i_name
NOT IN ('IBM', 'DEC', 'MSFT');
Null
SELECT t_name, t_qty
FROM Trade
WHERE t_date IS NULL
AND t_qty IS NOT NULL;
Union
SELECT t_id, t_cost
FROM Trade
WHERE t_cost > 10
UNION
SELECT p_id, pos_value
FROM Position
WHERE pos_value > 10000;
- In order to ‘union’ two tables, they must:
- have the same number of columns
- the columns must be of the same data type
- the columns must have values from the same domains
- Union acts as an “append” - the data in one table is appended to the bottom of the other.
Subqueries
- The subquery is an important SQL multi-table query facility:
- natural way to express queries
- easier to write than join SELECTs
- allows a query to be broken down
- some queries cannot be defined without subqueries
SELECT * FROM investment
WHERE i_id IN
(SELECT i_id FROM trade
WHERE t_qty > 1000);
SELECT t_id, t_qty
FROM Trade
WHERE t_qty > (SELECT AVG(pos_qty)
FROM Position);
SELECT t_id
FROM trade
WHERE t_qty*t_cost > ANY
(SELECT p_value
FROM portfolio);