Day 1 AM Session Plan
This session introduces students to the Spyder IDE and the data science process.
Session Introduction
TEACHER LED
- Start with the course overview on slides 1 and 2, which tells students what they will learn during the course.
What Are We Doing?
The Data Science Process Using Python
Using Machine Learning Algorithms
Natural Language Processing
Creating Games with Advanced AI
Spyder IDE
SELF-PACED
Students will learn:
What the Spyder IDE is used for
How to create a project
How to create files
How to write code in the IDE
Data Collection and Treatment
SELF-PACED
- Students will:
Create a new file and save it as learningVariables.py
Learn about Data Treatment
Collect Data from Online Sources
Profile Data Sources
Format Data
Perform Feature Engineering
Split Data
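The treatment steps listed above (profile, format, feature engineering, split) can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's own solution: the rows are made-up sample values, and the function names (profile, format_rows, add_features, split) are hypothetical helpers chosen for this example.

```python
import random

# Made-up sample rows standing in for data collected from an online source.
# Values arrive as text, and one temperature is missing.
raw_rows = [
    {"city": "Austin", "temp_f": "98.6", "humidity": "40"},
    {"city": "Boston", "temp_f": "71.2", "humidity": "65"},
    {"city": "Chicago", "temp_f": "", "humidity": "55"},
    {"city": "Denver", "temp_f": "85.0", "humidity": "20"},
    {"city": "El Paso", "temp_f": "101.3", "humidity": "15"},
]


def profile(rows):
    """Profile: count the missing (empty) values in each column."""
    return {col: sum(1 for r in rows if r[col] == "") for col in rows[0]}


def format_rows(rows):
    """Format: drop rows with missing values and convert text to numbers."""
    cleaned = []
    for r in rows:
        if "" in r.values():
            continue
        cleaned.append({
            "city": r["city"],
            "temp_f": float(r["temp_f"]),
            "humidity": int(r["humidity"]),
        })
    return cleaned


def add_features(rows):
    """Feature engineering: derive a Celsius column from Fahrenheit."""
    return [dict(r, temp_c=round((r["temp_f"] - 32) * 5 / 9, 1)) for r in rows]


def split(rows, test_fraction=0.25, seed=0):
    """Split: shuffle a copy, then hold out a fraction of rows for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


missing = profile(raw_rows)
clean = format_rows(raw_rows)
featured = add_features(clean)
train, test = split(featured)

print("Missing values per column:", missing)
print("Rows after formatting:", len(clean))
print("Training rows:", len(train), "Test rows:", len(test))
```

Dropping incomplete rows is only one way to handle missing values; students may instead fill them in, depending on the dataset.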
Find Interesting Data Sources
Notes for Teachers
Students who have taken the Week 1 course have already partially done this activity. This lesson can be run as a self-paced activity, but we recommend having students break into small groups, each led by a TA.
Students should pick at least one dataset from each of the sources, so that they learn how to read the data descriptions on each of those websites.
After they have picked their three sources, they should choose one to examine in detail, answering the questions a journalist would ask:
What? What does the dataset actually include? Is there data about the topic that isn't included that perhaps should be?
Who? Who put together the dataset? What are their credentials? Are they qualified to collect data and make decisions about the dataset? Did they go out and collect data on their own, or did they have help from more people?
When? When was the dataset put together, a long time ago or recently? How long did it take them to put together the dataset? Could circumstances have changed in the time it took them to put the dataset together?
Where? Where was the data collected? At a specific location, or online? Which data site is the dataset from?
Why? Why did the author put together the dataset? Could there be bias in the dataset from the reason they collected the data?
How? How was the data collected and recorded? What types of variables are included in the dataset? Was the data collected manually, or with a computer program?
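One way for students to keep track of their answers to the six questions above is a small record they fill in as they research. The sketch below is only a suggestion: the DatasetProfile class, its field names, and the placeholder answers are all hypothetical and not part of the course materials.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetProfile:
    """Hypothetical record of a student's answers about one dataset."""
    name: str
    what: str = ""
    who: str = ""
    when: str = ""
    where: str = ""
    why: str = ""
    how: str = ""

    def unanswered(self):
        """Return the questions that still have no answer."""
        return [
            f.name
            for f in fields(self)
            if f.name != "name" and not getattr(self, f.name).strip()
        ]


# Placeholder answers for illustration only.
notes = DatasetProfile(
    name="Example dataset",
    what="Daily temperature readings",
    who="A city weather office",
)

print("Still to research:", notes.unanswered())
```

A TA can ask each student to read out their unanswered list before the group discussion, which gives the group a natural starting point for questions.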
Students should be in small groups so that each student gets a chance to share their dataset and explain it to their fellow students. The teachers should be actively engaged, asking students questions about their datasets.
If you think these instructions will be confusing, the lead teacher can go first and explain a dataset they are familiar with. If you are at a loss for a dataset, the classic machine-learning choice is the iris dataset.
Another dataset with a storied history is the Netflix Prize dataset. It is a good example of a dataset that is fascinating but needs a lot of cleanup before you can run machine-learning analysis on it.
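For a lead-teacher demonstration with the iris dataset, a short script can answer part of the "What?" and "How?" questions by showing the size, variables, and classes. This sketch assumes scikit-learn is installed in the Spyder environment (it is included in the Anaconda distribution); the course itself may load the data a different way.

```python
from collections import Counter

from sklearn.datasets import load_iris

# Load the copy of the iris dataset bundled with scikit-learn.
iris = load_iris()

n_rows, n_features = iris.data.shape
print("Rows:", n_rows)
print("Measured variables:", list(iris.feature_names))

# Count how many flowers belong to each species.
species_counts = Counter(str(iris.target_names[label]) for label in iris.target)
print("Flowers per species:", dict(species_counts))

# The bundled description covers the source and history of the data.
print(iris.DESCR[:300])
```

The printed description is a good prompt for the "Who?" and "When?" questions, since it credits the original source of the measurements.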