DS 4100 Day 12
DS 4100 Data Collection, Integration, and Analysis
Today we’re talking more about SQL.
Three primary operations in relational database:
- retrieve specific rows from a table
- retrieve specific columns from a table
- combine multiple tables
The basic schema for a query is as follows:
SELECT <attribute list> FROM <table list> WHERE <condition> GROUP BY <column> HAVING <search-condition> ORDER BY <column>;
In SQL it’s possible to calculate columns on the fly:
SELECT city, region, (revenue - budget), month (sales_date) FROM Sales WHERE (revenue - budget) < 0; SELECT t_qty, t_cost, (t_qty * t_cost) vol FROM Trade WHERE vol > 10000;
In the second query, we’re giving the calculation (t_qty * t_cost) the column name vol, and then using that in our WHERE clause. Also! Important to note that order can change!
In basic mathematics, join does a cartesian product.
SELECT inv_name, p_desc FROM investor i, portfolio p WHERE i.inv_id = p.inv_id;
Here, we’re joining the investor and their portfolio.
There’s also NATURAL JOIN, which joins by fields that have the same name:
SELECT p_id, p_value, inv_name FROM portfolio NATURAL JOIN investor;
SELECT * FROM investment WHERE i_name LIKE 'D%';
SELECT i_id, AVG (t_qty*t_cost) FROM Trade GROUP BY i_id; SELECT i_id, AVG (t_qty*t_cost) FROM Trade WHERE p_id = 300 GROUP BY i_id;
Just throw a bunch of similar things together.
SELECT t_date, t_qty FROM Trade WHERE t_date BETWEEN 'JAN-01-95' AND 'DEC-31-95';
SELECT i_name, i_price FROM Investment WHERE i_name IN ('IBM', 'DEC', 'MSFT'); SELECT i_name, i_price FROM Investment WHERE i_name NOT IN ('IBM', 'DEC', 'MSFT');
SELECT t_name, t_qty FROM Trade WHERE t_date IS NULL AND t_qty IS NOT NULL;
SELECT t_id, t_cost FROM Trade WHERE t_cost > 10 UNION SELECT p_id, pos_value FROM Position WHERE pos_value > 10000;
- In order to ‘union’ two tables, they must:
- have the same number of columns
- the columns must be of the same data type
- the columns must have values from the same domains
- Union acts as an “append” - the data in one table is appended to the bottom of the other.
- The subquery is an important SQL multi-table query facility:
- natural way to express queries
- easier to write than join SELECTs
- allows a query to be broken down
- some queries cannot be defined without subqueries
SELECT * FROM investment WHERE i_id IN (SELECT i_id FROM trade WHERE t_qty > 1000);
SELECT t_id, t_qty FROM Trade WHERE t_qty > (SELECT AVG(pos_qty) FROM Position);
SELECT t_id FROM trade WHERE t_qty*t_cost > ANY (SELECT p_value FROM portfolio);