COLUMBIA UNIVERSITY DSI W4121
(This page is in flux)

Important Dates

11:59PM EST of due date

Project Teams

Teams should consist of 1-3 people. In addition, if you have a project in mind, please indicate briefly (1–2 sentences) what you are thinking. We have included a list of possible projects at the end of this document, although you are not required to choose from these.

Click here to submit before class on 2/1

Prospectus

Your research prospectus will contain an overview of the research problem, your hypothesis, a first pass at related work, a description of how you plan to complete the project, and the metrics you will use to decide whether it worked. A good prospectus is basically the skeleton of the full report. We highly recommend that you come to office hours to discuss project ideas before writing the prospectus.

Your prospectus should follow the example:

Submission

  1. Name your prospectus file in the following format, with last names in alphabetical order: prospectus_<lastname1>_<lastname2>.._<lastnameN>.pdf
  2. Upload the file by 2/11 11:59PM EST
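
The naming format above can be sketched as a small helper. This is only an illustration, not an official course tool: the function name submission_filename, the case-insensitive sort, and the example last names are all assumptions made for the sketch.

```python
def submission_filename(prefix, last_names, ext="pdf"):
    """Build '<prefix>_<lastname1>_..._<lastnameN>.<ext>' with names in alphabetical order."""
    if not last_names:
        raise ValueError("at least one last name is required")
    # Assumption: sort case-insensitively and strip surrounding whitespace;
    # the course page only says "alphabetical order".
    ordered = sorted((name.strip() for name in last_names), key=str.lower)
    return f"{prefix}_{'_'.join(ordered)}.{ext}"


# Hypothetical team of three; prints prospectus_Alvarez_Chen_Wu.pdf
print(submission_filename("prospectus", ["Wu", "Chen", "Alvarez"]))
```

The prefix is a parameter so that the same helper can produce names for other deliverables that use the same pattern.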

Poster Session

Your team will prepare and present a project poster at the end-of-course poster session. This gives you an opportunity to present a short demo of your work and show what you have accomplished in the class!

Submission

Report

You will prepare a conference-style report on your project with a maximum length of 15 pages (10 pt font or larger, one or two columns, 1 inch margins, single or double spaced – more is not better). Your report should expand upon your prospectus: introduce and motivate the problem your project addresses, describe related work in the area, discuss the elements of your solution, and present results that measure the behavior, performance, or functionality of your system (with comparisons to other related systems as appropriate).

Because this report is the primary deliverable upon which you will be graded, do not treat it as an afterthought. Plan to leave at least a week to do the writing, and make sure you proofread and edit carefully!

Submission

  1. Name your report file in the following format, with last names in alphabetical order: report_<lastname1>_<lastname2>.._<lastnameN>.pdf
  2. Make sure your UNIs are included on the first page of the report (so that I can assign credit appropriately!)
  3. Upload the file by 5/2 11:59PM EST

What is Expected

Good class projects can vary dramatically in complexity, scope, and topic. The only requirement is that they be related to something we have studied in this class and that they contain some element of research – e.g., that you do more than simply engineer a piece of software that someone else has described or architected. To help you determine if your idea is of reasonable scope, we will arrange to meet with each group several times throughout the semester.

Project Suggestions

The following are examples of possible projects – they are by no means a complete list and you are free to do whatever project you want. That’s the point of extra-credit projects! In general, projects can be of three varieties:

  1. Research project: model an unsolved problem, propose an algorithmic solution, evaluate it, and report your findings.
  2. Win: pick an existing useful application and a well-recognized metric (latency, prediction accuracy, etc.) and beat the state of the art.
  3. Break and fix: implement a state-of-the-art algorithm on real data, show that it doesn’t actually work (results are poor, it’s slow, etc.), and make it work.

Dynamic Space Utilization

There is no standard for how sensor-based data collection is used within smart buildings. One of the fundamental metrics to quantify is utilization – where, when, and how people occupy a space – upon which decisions of other building systems such as heating and cooling rely. While many organizations currently use static or even manual measurement, utilization sensors of several types are rapidly being implemented within the built environment.

Data Cleaning

Understand how scientific articles use and talk about data. Two possible directions:

Arachnid is a new explanation engine that automatically generates cleaning programs based on user specifications of data quality. It is an extension to ideas from Scorpion. Contact Eugene for a copy of Arachnid. Some possible projects:

Automatic Interface Generation

Precision Interfaces automatically generates interaction interfaces from program logs. It supports any parsable language that can be represented as an abstract syntax tree. Extend the system in interesting ways.

Query Engine for Interactive Apps

Smoke is the fastest lineage-enabled database engine. It captures the relationships between output and input records as efficient lineage indexes. It turns out that these indexes can be used to express and speed up interactive applications such as visualizations. Extend or use it in interesting ways.
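
To make the idea of a lineage index concrete, here is a toy sketch in plain Python. It is not Smoke's implementation or API: the function group_sum_with_lineage, the dictionary-based indexes, and the sample rows are all hypothetical and chosen only for illustration. The sketch computes a group-by sum and, in the same pass, records which input rows fed each output group (backward lineage) and which output group each input row contributed to (forward lineage).

```python
from collections import defaultdict


def group_sum_with_lineage(rows, key, value):
    """Group-by SUM that also records lineage between output groups and input rows."""
    totals = defaultdict(float)
    backward = defaultdict(list)  # output group -> ids of the input rows behind it
    forward = {}                  # input row id -> output group it contributed to
    for rid, row in enumerate(rows):
        group = row[key]
        totals[group] += row[value]
        backward[group].append(rid)
        forward[rid] = group
    return dict(totals), dict(backward), forward


# Hypothetical input table.
rows = [
    {"city": "NYC", "sales": 10.0},
    {"city": "SF", "sales": 5.0},
    {"city": "NYC", "sales": 7.0},
]
totals, backward, forward = group_sum_with_lineage(rows, "city", "sales")

# An interaction such as clicking the "NYC" bar in a chart becomes an index
# lookup rather than a re-scan of the input table.
selected = [rows[rid] for rid in backward["NYC"]]
```

In this sketch, answering "which input records produced this output?" costs one dictionary lookup, which hints at why lineage indexes are a natural fit for interactions such as linked selection across visualizations.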