Schedule (will be updated over the course of the semester)
Please use your Cornell email to access the slides (other email addresses won’t work). The slides are also available on Canvas.
Introduction and Class Overview
- Wednesday Jan 22
- Lecture #1: Overview
- Slides
- Topics:
• Overview
• Course outline and syllabus
• Data Science Pipelines
Lecture 2
- Monday Jan 27
- Lecture #2: How to show datasets to the user
- Slides
- Topics:
• Data Discovery
• Max-cover problem
• Data search interface
Bloom Filters
- Wednesday Jan 29
- Lecture #3: Joining Data from different sources
- Slides
- Topics:
• Query-by-example
• Join Paths
• Bloom Filters
Key Estimation
- Monday Feb 3
- Lecture #4: Join Keys and overlap estimation
- Slides
- Topics:
• Key estimation
• Distinct value estimation
• Minhash and Jaccard similarity
Assignment 1 released
Minhash
- Wednesday Feb 5
- Lecture #5: Jaccard Similarity and LSH
- Slides
- Topics:
• Minhash
• Jaccard Similarity and LSH
Feb 10
- Monday Feb 10
- Lecture #6: LSH wrap up
- Slides
- Topics:
• LSH
Dataset search wrap up
- Wednesday Feb 12
- Lecture #7: Dataset search wrap up
- Slides
- Topics:
• Dataset Search
Exploratory data analysis
Exploratory Data Analysis
- Wednesday Feb 19
- Lecture #8: Intro to Causal graphs
- Slides
- Topics:
• Simpson’s Paradox
• Causal graphs
Guest Lecture
- Monday Feb 24
- Guest Lecture by Saehan Jo
No Class
- Wednesday Feb 26
Causal Graphs and Data labelling
- Monday March 3
- Lecture #11: Causal graphs and Data Labelling
- Slides
- Topics:
• Causal discovery
• Data labelling
Data labelling
- Wednesday March 5
- Lecture #12: Data Labelling
- Slides
- Topics:
• Data labelling
Data visualization
- Monday March 10
- Lecture #13: Data Visualization
- Slides
- Topics:
• Data Visualization