Published on

DS 4100 Day 12

Authors
  • avatar
    Name
    Jacob Aronoff
    Twitter

DS 4100 Data Collection, Integration, and Analysis

Today we're talking more about SQL.

Three primary operations in relational database:

  • Selection
    • retrieve specific rows from a table
  • Projection
    • retrieve specific columns from a table
  • Join
    • combine multiple tables

The basic schema for a query is as follows:

SELECT 	<attribute list>
FROM 	<table list>
WHERE 	<condition>
GROUP BY <column>
HAVING <search-condition>
ORDER BY <column>;

Column Calculation

In SQL it's possible to calculate columns on the fly:

SELECT city, region,
       (revenue - budget),
       month (sales_date)
  FROM Sales
 WHERE (revenue - budget) < 0;

SELECT t_qty, t_cost,
       (t_qty * t_cost) vol
  FROM Trade
 WHERE vol > 10000;

In the second query, we're giving the calculation (t_qty * t_cost) the column name vol, and then using that in our WHERE clause. Also! Important to note that order can change!

In basic mathematics, join does a cartesian product.

Joining

SELECT inv_name, p_desc
  FROM investor i, portfolio p
 WHERE i.inv_id = p.inv_id;

Here, we're joining the investor and their portfolio.

There's also NATURAL JOIN, which joins by fields that have the same name:

SELECT p_id, p_value, inv_name
  FROM portfolio
  NATURAL JOIN investor;

Like selection

SELECT * FROM investment
WHERE i_name LIKE 'D%';

Grouping

SELECT i_id, AVG (t_qty*t_cost)
  FROM Trade
GROUP BY i_id;

SELECT i_id, AVG (t_qty*t_cost)
  FROM Trade
 WHERE p_id = 300
GROUP BY i_id;

Just throw a bunch of similar things together.

Range:

SELECT t_date, t_qty
  FROM Trade
 WHERE t_date
 BETWEEN 'JAN-01-95' AND 'DEC-31-95';

Set membership

SELECT i_name, i_price
  FROM Investment
 WHERE i_name
    IN ('IBM', 'DEC', 'MSFT');

SELECT i_name, i_price
  FROM Investment
 WHERE i_name
NOT IN ('IBM', 'DEC', 'MSFT');

Null

SELECT t_name, t_qty
  FROM Trade
 WHERE t_date IS NULL
   AND t_qty IS NOT NULL;

Union

SELECT t_id, t_cost
  FROM Trade
 WHERE t_cost > 10
UNION
SELECT p_id,  pos_value
  FROM Position
 WHERE pos_value > 10000;
  • In order to ‘union’ two tables, they must:
    • have the same number of columns
    • the columns must be of the same data type
    • the columns must have values from the same domains
  • Union acts as an “append” - the data in one table is appended to the bottom of the other.

Subqueries

  • The subquery is an important SQL multi-table query facility:
    • natural way to express queries
    • easier to write than join SELECTs
    • allows a query to be broken down
    • some queries cannot be defined without subqueries
SELECT * FROM investment
 WHERE i_id IN
   (SELECT i_id FROM trade
    WHERE t_qty > 1000);
SELECT t_id, t_qty
  FROM Trade
 WHERE t_qty > (SELECT AVG(pos_qty)
                  FROM Position);
SELECT t_id
  FROM trade
 WHERE t_qty*t_cost > ANY
         (SELECT p_value
            FROM portfolio);