Day 1 AM Session Plan

This session introduces students to the Spyder IDE and the data science process.

Presentation Slides

Session Introduction

TEACHER LED
  • Start with the course overview on slides 1 & 2 that tells students what they will learn during the course.

    What Are We Doing?

    The Data Science Process Using Python
    Using Machine Learning Algorithms
    Natural Language Processing
    Creating Games with Advanced AI

Spyder IDE

SELF-PACED
  • Students will learn:

    What the Spyder IDE is used for
    How to create a project
    How to create files
    How to write code in the IDE

Data Collection and Treatment

SELF-PACED
  • Students will:

    Create a new file and save it as learningVariables.py
    Learn about Data Treatment
    Collect Data from Online Sources
    Profile Data Sources
    Format Data
    Perform Feature Engineering
    Split Data
    Find Interesting Data Sources
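
For teachers who want a concrete reference point, the formatting, feature engineering, and splitting steps listed above might look roughly like the sketch below. It is only an illustration, not the course's actual exercise: the sample records, the column names, and the helper functions `format_row`, `add_bmi`, and `split_data` are all hypothetical, and it uses only the Python standard library so it can be run in Spyder without installing anything.

```python
import random

# Hypothetical raw records, as they might look after being copied from a website.
raw_rows = [
    {"name": " Alice ", "height_cm": "170", "weight_kg": "65"},
    {"name": "bob", "height_cm": "180", "weight_kg": "81"},
    {"name": "Carla", "height_cm": "165", "weight_kg": "58"},
    {"name": "dan ", "height_cm": "175", "weight_kg": "77"},
    {"name": "Eve", "height_cm": "160", "weight_kg": "51"},
]


def format_row(row):
    """Format data: strip whitespace, normalize case, convert text to numbers."""
    return {
        "name": row["name"].strip().title(),
        "height_cm": float(row["height_cm"]),
        "weight_kg": float(row["weight_kg"]),
    }


def add_bmi(row):
    """Feature engineering: derive a new column (BMI) from existing ones."""
    height_m = row["height_cm"] / 100
    return {**row, "bmi": round(row["weight_kg"] / height_m ** 2, 1)}


def split_data(rows, test_fraction=0.2, seed=42):
    """Split data: shuffle a copy, then hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


clean_rows = [add_bmi(format_row(r)) for r in raw_rows]
train_rows, test_rows = split_data(clean_rows)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

The fixed `seed` is an assumption made so that every student gets the same split; dropping it gives a different split on each run.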


Notes for Teachers

Students who have taken the Week 1 course have already done part of this activity. This lesson can be run as a self-paced activity, but we recommend having students break into small groups, each led by a TA.

Students should pick at least one dataset from each of the sources so that they learn how to read the data descriptions on each of those websites.

After they have picked a dataset from each of the three sources, they should choose one dataset to examine in detail. They should answer the questions a journalist would ask:

What? What does the dataset actually include? Is there data about the topic that isn't included that perhaps should be?
Who? Who put together the dataset? What are their credentials? Are they qualified to collect data and make decisions about the dataset? Did they go out and collect data on their own, or did they have help from more people?
When? When was the dataset put together, a long time ago or recently? How long did it take them to put together the dataset? Could circumstances have changed in the time it took them to put the dataset together?
Where? Where was the data collected? At a specific location, or online? Which data site is the dataset from?
Why? Why did the author put together the dataset? Could there be bias in the dataset from the reason they collected the data?
How? How was the data collected and recorded? What types of variables are included in the dataset? Was the data collected manually, or with a computer program?
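
Part of the "How?" question (what types of variables a dataset includes) can be checked with a few lines of code once students have a file in hand. The sketch below is one possible approach, not a prescribed part of the lesson: the CSV contents, the place names in it, and the helper functions `guess_type` and `profile` are hypothetical, and the type guessing is deliberately simple.

```python
import csv
import io

# Hypothetical CSV contents; in class this would be a file downloaded from a data site.
sample_csv = """city,year,population,notes
Northtown,2020,116250,
Eastville,2020,,estimated later
Westburg,2019,48210,census
"""


def guess_type(values):
    """Guess whether a column holds integers, decimals, or text."""
    present = [v for v in values if v != ""]
    if not present:
        return "empty"
    for label, caster in (("integer", int), ("decimal", float)):
        try:
            for v in present:
                caster(v)
            return label
        except ValueError:
            continue
    return "text"


def profile(csv_text):
    """Return row count plus, per column, a guessed type and missing-value count."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = rows[0].keys() if rows else []
    summary = {}
    for col in columns:
        values = [row[col] for row in rows]
        summary[col] = {"type": guess_type(values), "missing": values.count("")}
    return len(rows), summary


row_count, column_summary = profile(sample_csv)
print("rows:", row_count)
for name, info in column_summary.items():
    print(name, info)
```

A summary like this only describes what is in the file. The who, when, where, and why questions still have to be answered from the dataset's description page.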


Students should be in small groups so that each student gets a chance to share their dataset and explain it to their fellow students. The teachers should be actively engaged, asking students questions about their datasets.

If you think this instruction will be confusing, the lead teacher might go first by explaining a dataset they are familiar with. If you are at a loss for a dataset, the classic machine-learning choice is the iris dataset.

Another dataset with a storied history is the Netflix Prize dataset. It is a good example of a dataset that is fascinating but needs a lot of cleanup before you can do machine learning analysis on it.