Data to Decisions: Principles of Efficient Data Science

Cornell University, Spring 2025

Instructor: Sainyam Galhotra
Email: sg@cs.cornell.edu
Office: Gates 445

Time: Monday and Wednesday, 1:25pm-2:40pm ET
Office Hours: Wednesdays, 3:00pm-4:00pm ET, Gates 445

Course Description:

This course covers the principles of managing large-scale datasets, with the goal of designing efficient data science pipelines. Topics include data acquisition, visualization, preparation, validation, and analysis. Alongside these modules, students will read and review research papers extensively to learn about recent research and new approaches in the field. Students will apply the techniques they learn in an open-ended project.

By the end of the course, students will be able to:
• Apply the principles of data management to a real-world problem.
• Design complex data processing pipelines by combining data from different sources.
• Evaluate the impact of noise on the quality of data analysis and design methods to remove it.
• Critically evaluate scientific papers from the data management research community and prepare presentations covering these topics.

Prerequisites:

Undergraduates must have taken CS 4780 or an equivalent course. All other students need knowledge of machine learning at the level of CS 4780.

Course Workload and Grading:

Paper Presentation: 20%
• One individual presentation, or two presentations in a team of two.
• Grading considers quality of slides and verbal explanations, as well as audience engagement.
Quality of paper reviews: 15%
• Each paper review should be about one page long and follow the template of a conference paper review.
Assignments: 10%
• Two assignments with four slip days.
Active participation in class discussions: 15%
• One point per lecture for asking at least one relevant question.
Project: 40%
• Teams of up to two students per project.
• Projects are chosen within the first three weeks of the semester.