W4121: Non-graded Quiz, Wu’s Lectures
Lec 2: Data Modeling
The point of data modeling is to think about what characteristics of the data you will need for your application.
Q1: Data Independence
  - What is data independence and why does it matter?
 
  - What are the two main reasons why a hierarchical data model is a bad idea?
 
  - Give an example that illustrates these two problems.
 
  - When could it make sense to use a hierarchical data model?
 
  - What did the graph model try to solve?  What limitations did it still have?
 
Q2: Queries
Suppose our data adheres to the following schema:
    animals(id, species, name, gender, keeper_id)
    keepers(id, name, birthday)
    feedings(keeper_id, animal_id, date, time, food, pounds)
 
Describe in English what each of the following operations will do:
  - filter(feedings, time < 12pm)
 
  - filter(feedings, time < 12pm and pounds > 10)
 
  - project(animals, [name])
 
  - project(animals, [species])
 
  - join(feedings, keepers, keeper_id = id)
 
  - join(animals, feedings, keeper_id = id)
 
  - groupby(feedings, [food], SUM(pounds), COUNT(pounds))
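To experiment with the operations above, here is a minimal sketch of the four operators over lists of Python dicts. Everything in it is an assumption made for illustration: the helper names (filter_rows, project, join, groupby), the sample rows, and the choice to store time as an hour of the day. It is not how a real database executes these operators.

```python
from collections import defaultdict

# Hypothetical sample rows; all values are invented for illustration.
keepers = [
    {"id": 1, "name": "Ana", "birthday": "1990-04-02"},
    {"id": 2, "name": "Ben", "birthday": "1985-11-30"},
]
feedings = [
    {"keeper_id": 1, "animal_id": 10, "date": "2024-01-01", "time": 9, "food": "hay", "pounds": 12},
    {"keeper_id": 2, "animal_id": 11, "date": "2024-01-01", "time": 14, "food": "fish", "pounds": 4},
    {"keeper_id": 1, "animal_id": 10, "date": "2024-01-02", "time": 10, "food": "hay", "pounds": 8},
]

def filter_rows(table, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in table if predicate(row)]

def project(table, columns):
    """Keep only the named columns of every row (duplicates are kept)."""
    return [{c: row[c] for c in columns} for row in table]

def join(left, right, left_col, right_col):
    """Pair each left row with each right row where the two columns match."""
    out = []
    for l in left:
        for r in right:
            if l[left_col] == r[right_col]:
                # Prefix column names so that same-named columns do not collide.
                merged = {f"left.{k}": v for k, v in l.items()}
                merged.update({f"right.{k}": v for k, v in r.items()})
                out.append(merged)
    return out

def groupby(table, keys, value_col):
    """Group rows by keys, then compute SUM and COUNT of value_col per group."""
    groups = defaultdict(list)
    for row in table:
        groups[tuple(row[k] for k in keys)].append(row[value_col])
    return [
        {**dict(zip(keys, key)), "sum": sum(vals), "count": len(vals)}
        for key, vals in groups.items()
    ]

morning = filter_rows(feedings, lambda r: r["time"] < 12)
foods = project(feedings, ["food"])
fed_by = join(feedings, keepers, "keeper_id", "id")
totals = groupby(feedings, ["food"], "pounds")
```

Note that this project keeps duplicate rows; under set semantics the duplicates would be removed.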
 
What are the operator symbols (Greek symbols) for the above operations?
Describe in English what the following operations do:
  - Π_{food}(σ_{time=12pm}(feedings))
 
  - Π_{name}(σ_{time=12pm}(feedings) ⋈_{id=animal_id} σ_{species=gorilla}(animals))
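A nested algebra expression is evaluated from the inside out, which can be easier to see as a sequence of steps. The sketch below walks through the second expression on a few made-up rows. The helper names (select, project, theta_join), the sample data, and the representation of time as a string are all assumptions for illustration only.

```python
# Hypothetical sample rows; all values are invented for illustration.
animals = [
    {"id": 10, "species": "gorilla", "name": "Koko", "gender": "F", "keeper_id": 1},
    {"id": 11, "species": "lion", "name": "Leo", "gender": "M", "keeper_id": 2},
    {"id": 12, "species": "gorilla", "name": "Bobo", "gender": "M", "keeper_id": 1},
]
feedings = [
    {"keeper_id": 1, "animal_id": 10, "date": "2024-01-01", "time": "12pm", "food": "fruit", "pounds": 5},
    {"keeper_id": 2, "animal_id": 11, "date": "2024-01-01", "time": "12pm", "food": "meat", "pounds": 9},
    {"keeper_id": 1, "animal_id": 12, "date": "2024-01-01", "time": "9am", "food": "fruit", "pounds": 6},
]

def select(table, pred):
    """sigma: keep the rows that satisfy pred."""
    return [row for row in table if pred(row)]

def project(table, cols):
    """pi: keep only the listed columns (duplicates are kept here)."""
    return [{c: row[c] for c in cols} for row in table]

def theta_join(left, right, pred):
    """Join: combine every (left, right) pair of rows that satisfies pred.

    Simplification: when both sides share a column name (keeper_id here),
    the right-hand value overwrites the left-hand one in the merged row.
    """
    return [{**l, **r} for l in left for r in right if pred(l, r)]

# Innermost operators first: the two selections.
noon = select(feedings, lambda f: f["time"] == "12pm")
gorillas = select(animals, lambda a: a["species"] == "gorilla")
# Then the join on id = animal_id.
joined = theta_join(noon, gorillas, lambda f, a: a["id"] == f["animal_id"])
# Finally the outermost projection.
names = project(joined, ["name"])
```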
 
Q3: Logging
We have seen that having a schema is helpful because then data can be easily queried.  Before the solution described in the paper, Twitter’s event messages DID adhere to a schema of (category, message).
  - Even though the events had a schema, why did data scientists still have trouble analyzing logging data?
    
      - What was the source of this trouble?
 
      - Why do you think Twitter let this happen?  Couldn’t they have prevented it from the very beginning?
 
    
   
  - What was the main solution to address the above troubles?
 
Lec 3/4: Query Execution