Schedule (will be updated over the course of the semester)

Please use your Cornell email to access the slides (no other email will work). The slides are also available on Canvas.

Introduction and Class Overview

Wednesday Jan 22
Lecture #1: Overview
Slides
Topics:
• Overview
• Course outline and syllabus
• Data Science Pipelines

Lecture 2

Monday Jan 27
Lecture #2: How to show datasets to the user
Slides
Topics:
• Data Discovery
• Max-cover problem
• Data search interface

Bloom Filters

Wednesday Jan 29
Lecture #3: Joining Data from different sources
Slides
Topics:
• Query-by-example
• Join Paths
• Bloom Filters

Key Estimation

Monday Feb 3
Lecture #4: Join Keys and overlap estimation
Slides
Topics:
• Key estimation
• Distinct value estimation
• Minhash and Jaccard similarity
Assignment 1 released

Minhash

Wednesday Feb 5
Lecture #5: Jaccard Similarity and LSH
Slides
Topics:
• Minhash
• Jaccard Similarity and LSH

Feb 10

Monday Feb 10
Lecture #6: LSH wrap up
Slides
Topics:
• LSH

Dataset search wrap up

Wednesday Feb 12
Lecture #7: Dataset search wrap up
Slides
Topics:
• Dataset Search
• Exploratory data analysis

Exploratory Data Analysis

Wednesday Feb 19
Lecture #8: Intro to Causal graphs
Slides
Topics:
• Simpson’s Paradox
• Causal graphs

Guest Lecture

Monday Feb 24
Guest Lecture by Saehan Jo

No Class

Wednesday Feb 26

Causal Graphs and Data labelling

Monday March 3
Lecture #11: Causal graphs and Data Labelling
Slides
Topics:
• Causal discovery
• Data labelling

Data labelling

Wednesday March 5
Lecture #12: Data Labelling
Slides
Topics:
• Data labelling

Data visualization

Monday March 10
Lecture #13: Data Visualization
Slides
Topics:
• Data Visualization