Note: This schedule is in flux at the moment!

Week Date Title Description Instructor
Week 1 1/18 Introduction Why not a single machine? Big-data challenges, datacenter structure, typical use cases and their requirements. Course overview. All
Week 2 1/25 Basics HDFS, Hadoop basics Sahu
Week 3 2/01 Batch Processing MapReduce Sahu
Week 4 2/08 Batch Processing (continued) Additional MapReduce details with more complex examples, HBase, Hive Sahu
Week 5 2/15 Iterative Processing Intro to Spark, Spark programming Sahu
Week 6 2/22 Data Models and Cleaning Why the relational data model? Why schemas? The ins and outs. Wu
    Readings:
What goes around comes around,
Unified Logging@Twitter
 
Week 7 3/01 Cleaning and Integration Wu
    Readings:
Truth finding on the deep web,
Data Wrangler
Week 8 3/08 Classic Query Processing and Fast Query Processing Wu
    Readings:
C-Store,
Col vs Row Stores,
OLTP
Week 9 3/15 NO CLASS. Spring Break!    
Week 10 3/22 Potpourri Wu’s goodbye tour: Graph analysis. Scalable visualization. ML. Distributed transactions (?) Wu
    Readings:
ML in DBs,
Graphs in DBs,
Viz in DBs
 
Week 11 3/29     Sahu
Week 12 4/05     Sahu
Week 13 4/12     Sahu
Week 14 4/19     Sahu
Week 15 4/26     Sahu
Week 16 5/03 Poster Presentation + Submit Write-ups   Sahu