DS 4100 Data Collection, Integration, and Analysis

Today we’re talking more about SQL.

Three primary operations in relational database:

The basic schema for a query is as follows:

SELECT 	<attribute list>
FROM 	<table list>
WHERE 	<condition>
GROUP BY <column>
HAVING <search-condition>
ORDER BY <column>;

Column Calculation

In SQL it’s possible to calculate columns on the fly:

SELECT city, region, 
       (revenue - budget),
       month (sales_date)
  FROM Sales
 WHERE (revenue - budget) < 0;

SELECT t_qty, t_cost, 
       (t_qty * t_cost) vol
  FROM Trade
 WHERE vol > 10000;

In the second query, we’re giving the calculation (t_qty * t_cost) the column name vol, and then using that in our WHERE clause. Also! Important to note that order can change!

In basic mathematics, join does a cartesian product.


SELECT inv_name, p_desc
  FROM investor i, portfolio p
 WHERE i.inv_id = p.inv_id;

Here, we’re joining the investor and their portfolio.

There’s also NATURAL JOIN, which joins by fields that have the same name:

SELECT p_id, p_value, inv_name
  FROM portfolio 
  NATURAL JOIN investor;

Like selection

SELECT * FROM investment
WHERE i_name LIKE 'D%';


SELECT i_id, AVG (t_qty*t_cost)
  FROM Trade
GROUP BY i_id;

SELECT i_id, AVG (t_qty*t_cost)
  FROM Trade
 WHERE p_id = 300
GROUP BY i_id;

Just throw a bunch of similar things together.


SELECT t_date, t_qty
  FROM Trade
 WHERE t_date 
 BETWEEN 'JAN-01-95' AND 'DEC-31-95';

Set membership

SELECT i_name, i_price
  FROM Investment
 WHERE i_name 
    IN ('IBM', 'DEC', 'MSFT');

SELECT i_name, i_price
  FROM Investment
 WHERE i_name 
NOT IN ('IBM', 'DEC', 'MSFT');


SELECT t_name, t_qty
  FROM Trade
   AND t_qty IS NOT NULL; 


SELECT t_id, t_cost
  FROM Trade
 WHERE t_cost > 10
SELECT p_id,  pos_value
  FROM Position
 WHERE pos_value > 10000;


SELECT * FROM investment
 WHERE i_id IN 
   (SELECT i_id FROM trade
    WHERE t_qty > 1000);
SELECT t_id, t_qty
  FROM Trade
 WHERE t_qty > (SELECT AVG(pos_qty)
                  FROM Position);
  FROM trade
 WHERE t_qty*t_cost > ANY
         (SELECT p_value
            FROM portfolio);