Query Execution
Query compilation into plans
Query Optimization
Traditional optimizations
Consider the following queries:
Q1: SELECT month, SUM(sales)
Q2: SELECT month, SUM(sales) WHERE month = 'Feb'
Q3: SELECT month, AVG(sales)
Q4: SELECT month, region, SUM(sales)
Q5: SELECT month, SUM(sales) WHERE region = 'US'
A one-dimensional data structure on Month can answer Q1 and Q2, but not the rest:
Jan --> SUM(sales)
Feb --> SUM(sales)
Mar --> SUM(sales)
Storing {sum, count} per month can answer Q1-Q3 (AVG = sum/count), but not Q4 or Q5:
Jan --> {sum:, count: }
Feb --> {sum:, count: }
Mar --> {sum:, count: }
Build a two-dimensional data structure on Month & Region; it can answer Q1-Q5:
US Eur Asia
Jan {sum,count} {sum,count} {sum,count}
Feb {sum,count} {sum,count} {sum,count}
Mar {sum,count} {sum,count} {sum,count}
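A minimal sketch of the two-dimensional structure above: each (month, region) cell keeps {sum, count}, so all five queries can be answered from it. The data values are made up for illustration.

```python
# Sketch: a (month, region) cube of {sum, count} cells. Data is illustrative.
cube = {
    ("Jan", "US"): {"sum": 10, "count": 2}, ("Jan", "Eur"): {"sum": 4, "count": 1},
    ("Feb", "US"): {"sum": 6, "count": 3},  ("Feb", "Eur"): {"sum": 8, "count": 2},
}

# Q1: SUM(sales) per month -- add up one month's row of cells.
feb_sum = sum(c["sum"] for (m, r), c in cube.items() if m == "Feb")
assert feb_sum == 14

# Q3: AVG(sales) per month -- combine sums and counts, then divide.
feb_cnt = sum(c["count"] for (m, r), c in cube.items() if m == "Feb")
assert feb_sum / feb_cnt == 14 / 5

# Q5: SUM(sales) WHERE region = 'US' -- add up the US column of cells.
us_sum = sum(c["sum"] for (m, r), c in cube.items() if r == "US")
assert us_sum == 16
```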
Roll up: SELECT month, SUM(sales)
     to  SELECT year, SUM(sales)
Related cube operations: drill down, slice, dice
         US          Eur         Asia
2010 Jan {sum,count} {sum,count} {sum,count}
2010 Feb {sum,count} {sum,count} {sum,count}
2010 Mar {sum,count} {sum,count} {sum,count}
When does this work? For distributive and algebraic aggregates:
for a function f, there exists a function g such that
f(set) = g(f(partition1), ..., f(partitionN))
(distributive: g is f itself, e.g., SUM, COUNT, MIN, MAX;
algebraic: a small intermediate state suffices, e.g., AVG from {sum, count})
COUNT({0,1,2,...10}) = SUM(COUNT({0}), COUNT({1,2}),... )
SUM({0,1,...}) = SUM(SUM({0}), SUM({1,2...}), SUM(...))
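The decomposition above can be checked directly; a small sketch (the partition split is arbitrary):

```python
# Sketch: distributive and algebraic aggregates recombine from partition-level
# results. The partitioning here is arbitrary, chosen just for illustration.
data = list(range(11))                        # {0, 1, ..., 10}
partitions = [data[:1], data[1:3], data[3:]]  # {0}, {1,2}, {3,...,10}

# Distributive: SUM over the whole set = SUM of per-partition SUMs.
assert sum(data) == sum(sum(p) for p in partitions)

# COUNT over the whole set = SUM of per-partition COUNTs.
assert len(data) == sum(len(p) for p in partitions)

# Algebraic: AVG needs a small intermediate state (sum, count) per partition.
states = [(sum(p), len(p)) for p in partitions]
total_sum = sum(s for s, _ in states)
total_cnt = sum(c for _, c in states)
assert total_sum / total_cnt == sum(data) / len(data)
```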
if you run a subquery very often and rarely update the underlying data, then cache its result as a materialized view!
CREATE MATERIALIZED VIEW myview AS ( ...query... )
(a plain CREATE VIEW only names the query; a materialized view stores its result)
One size doesn’t fit all
Motivation
Logging
10,000 machines * 100 Hz = 1,000,000 readings/sec
* # sensors * # data centers; bulk writes, few updates, read-mostly workload
Star schema
unnormalized retailer schema
store_id, store_name, store_owner, store_addr1, ....,
emp_id, emp_name, emp_age, emp_salary, ...
prod_id, prod_name, prod_price, ...
...
normalized “star” schema
fact(store_id, emp_id, prod_id, cust_id, ...)
store(store_id, name, owner, addr1, ...)
...
Overview
Physical layout
each projection sorted on a subset of the columns (its sort key)
EMP1(name, age | age) # sorted on age
The columns for EMP1:
(name, storage_key)
(age, storage_key)
// the storage key is not stored explicitly; it is just the value's position
// (order number) in the column, used for joining between projections
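A tiny sketch of these projections: each column is an array, and the implicit storage key is just the array position. The data and the join index values are made up for illustration.

```python
# Sketch of C-Store-style projections; data and the join index are illustrative.
# EMP1(name, age | age): both columns stored in age order; the storage key is
# simply the position in the arrays (not stored explicitly).
emp1_name = ["Carol", "Alice", "Bob"]
emp1_age  = [25, 30, 41]          # sorted on age; same positions as emp1_name

# Reconstruct the tuple at storage key 1 by indexing both columns.
assert (emp1_name[1], emp1_age[1]) == ("Alice", 30)

# A second projection sorted on name; a join index maps its positions back to
# EMP1's storage keys so attributes can be stitched across projections.
emp2_name    = ["Alice", "Bob", "Carol"]   # sorted on name
emp2_to_emp1 = [1, 2, 0]                   # join index (illustrative)
i = emp2_name.index("Bob")
assert emp1_age[emp2_to_emp1[i]] == 41
```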
few distinct values --> run-length encoding:
(val, startidx, endidx)
bitmaps
(val, bitmap) e.g., (99, 0111100010100)
delta compression
10, 12, 13, 14 --> 10, 2, 1, 1
deltas need fewer bits per value; then compress again with a general-purpose compressor
So fast that CPU decompression is often the bottleneck!
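The three compression schemes above can be sketched as follows (the helper names are mine, not from any particular system):

```python
# Sketch of the three column-compression schemes; helper names are illustrative.

def rle_encode(col):
    """Run-length encode a column as (val, startidx, endidx) triples."""
    runs, start = [], 0
    for i in range(1, len(col) + 1):
        if i == len(col) or col[i] != col[start]:
            runs.append((col[start], start, i - 1))
            start = i
    return runs

def delta_encode(col):
    """Store the first value, then successive differences."""
    return [col[0]] + [b - a for a, b in zip(col, col[1:])]

def bitmap_encode(col):
    """One bitmap (shown as a 0/1 string) per distinct value."""
    return {v: "".join("1" if x == v else "0" for x in col)
            for v in set(col)}

assert rle_encode(["GM", "GM", "GM", "AAPL"]) == [("GM", 0, 2), ("AAPL", 3, 3)]
assert delta_encode([10, 12, 13, 14]) == [10, 2, 1, 1]
assert bitmap_encode([99, 7, 99])[99] == "101"
```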
Why not implement this in a row store?
Isn’t storing all these projections blowing up disk costs by several X?
Query execution walk through
select avg(price)
from data
where symbol = 'GM' AND date = xxx
Row store: read and send entire tuples through the plan
AVG price
|
SELECT date = xxx
|
SELECT sym = 'GM'
|
| (GM, 1, ... xx, ...)
|
[Symbol, price, nshares, exchange, date,...]
[GM 1 ... xx ]
[GM 2 ... xx ]
[GM 3 ... xx ]
[AAPL 4 ... xx ]
...
construct(): build tuples from only the needed columns early
due to record-construction overhead, can be slower than a vanilla row store
Avg price
|
SELECT date = xx
|
SELECT sym = GM
|
| (GM, 1, xx)
|
Construct (Symbol, Price, Date)
/ | \
[Symbol] [price] [nshares] [exchange] [date] ...
GM 1 xx
GM 2 xx
GM 3 xx
AAPL 4 xx
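The early-materialization plan above can be sketched directly: construct (symbol, price, date) tuples first, then filter and aggregate. The column data is illustrative.

```python
# Sketch of early materialization: construct tuples from the needed columns,
# then run the selects and the aggregate over them. Data is illustrative.
symbol = ["GM", "GM", "GM", "AAPL"]
price  = [1, 2, 3, 4]
date   = ["xx", "xx", "xx", "xx"]

rows = list(zip(symbol, price, date))            # Construct (Symbol, Price, Date)
rows = [r for r in rows if r[0] == "GM"]         # SELECT sym = 'GM'
rows = [r for r in rows if r[2] == "xx"]         # SELECT date = xx
avg_price = sum(r[1] for r in rows) / len(rows)  # AVG(price)
assert avg_price == 2.0
```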
Late materialization: send compressed bitstrings through the query plan
Avg price
| (1,2,3)
Lookup price
| (1,1,1,0)
AND
(1,1,1,0) / \ (1,1,1,1)
SELECT sym = GM SELECT date=xx
/ \
/ \
[Symbol] [price] [nshares] [exchange] [date] ...
GM 1 xx
GM 2 xx
GM 3 xx
AAPL 4 xx
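A sketch of the late-materialization plan above: each predicate produces a bitmask over column positions, the masks are ANDed, and only then are prices looked up. The column data is illustrative.

```python
# Sketch of late materialization with position bitmasks. Data is illustrative.
symbol = ["GM", "GM", "GM", "AAPL"]
price  = [1, 2, 3, 4]
date   = ["xx", "xx", "xx", "xx"]

sym_mask  = [1 if s == "GM" else 0 for s in symbol]       # (1,1,1,0)
date_mask = [1 if d == "xx" else 0 for d in date]         # (1,1,1,1)
both      = [a & b for a, b in zip(sym_mask, date_mask)]  # AND -> (1,1,1,0)

# Lookup price only at matching positions -> (1, 2, 3)
matched = [price[i] for i, bit in enumerate(both) if bit]
assert matched == [1, 2, 3]
assert sum(matched) / len(matched) == 2.0                 # AVG(price)
```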
C-Store w/ compression: operate directly on compressed data
Avg price
| (1,+1,+1)
Lookup price
| (3x1, 1x0)
AND
(3x1,1x0) / \ (4x1)
SELECT sym = GM SELECT date=xx
/ \
/ \
[Symbol] [price] [nshares] [exchange] [date] ...
3x GM      1                                 4x "xx"
AAPL       +1
           +1
           +1
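The compressed plan above can be sketched as well: symbol and date are run-length encoded, price is delta encoded, and predicates work on runs rather than rows. For clarity this sketch expands the RLE masks before ANDing; a real system would merge runs directly. Representations follow the diagram; helper names are mine.

```python
# Sketch of operating directly on compressed columns. Data is illustrative.
symbol_rle  = [("GM", 3), ("AAPL", 1)]   # 3x GM, 1x AAPL
price_delta = [1, 1, 1, 1]               # 1, +1, +1, +1
date_rle    = [("xx", 4)]                # 4x "xx"

# SELECT sym = 'GM' on RLE data touches one run per distinct value, not one row
# per tuple; the result is itself run-length encoded: (3x1, 1x0) and (4x1).
sym_mask_rle  = [(1 if v == "GM" else 0, n) for v, n in symbol_rle]
date_mask_rle = [(1 if v == "xx" else 0, n) for v, n in date_rle]

def expand(rle):
    """Expand an RLE bitmask to one bit per position (for clarity only)."""
    return [bit for bit, n in rle for _ in range(n)]

both = [a & b for a, b in zip(expand(sym_mask_rle), expand(date_mask_rle))]

# Lookup price: running-sum the deltas, keeping values at matching positions.
total, cnt, cur = 0, 0, 0
for delta, bit in zip(price_delta, both):
    cur += delta                 # reconstruct the value incrementally
    if bit:
        total, cnt = total + cur, cnt + 1
assert total / cnt == 2.0        # AVG over prices 1, 2, 3
```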
tuple.get_attr('id') per tuple
Highlights:
100x faster than RDBMS – everyone uses it
How is a SQL query executed?
SELECT count(*) FROM sales WHERE type = 100
class FilterOperator:
    def next():
        cur = child.next()
        while cur != null && !predicate(cur):
            cur = child.next()
        return cur    # null once the child is exhausted
    def predicate(tuple, attr, ...):
        return binaryOp("=", tuple.get(attr), ...)
def binaryOp(op, l, r):
    if op == "=": return l == r
    ...
Hand-written code:
v = 0
for type in sales:
    if type == 100:
        v++
Why? No per-tuple function calls or interpretation overhead: the predicate and the count are inlined into one tight loop.
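The interpreted plan and the hand-written loop can be compared directly; a minimal runnable sketch (class names and data are mine):

```python
# Sketch contrasting an interpreted operator chain with a hand-written loop:
# both count rows with type == 100, but the tight loop avoids a function-call
# and dispatch per tuple. Class names and data are illustrative.

class Scan:
    def __init__(self, rows):
        self.it = iter(rows)
    def next(self):
        return next(self.it, None)   # None signals end of input

class Filter:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def next(self):
        cur = self.child.next()
        while cur is not None and not self.pred(cur):
            cur = self.child.next()
        return cur

sales = [100, 7, 100, 42, 100]

# Interpreted plan: generic operators, one next()/predicate call per tuple.
op, count = Filter(Scan(sales), lambda t: t == 100), 0
while op.next() is not None:
    count += 1

# Hand-written equivalent: everything inlined into one loop.
v = 0
for t in sales:
    if t == 100:
        v += 1

assert count == v == 3
```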