Machine Learning

This session introduces students to natural language processing (NLP) using web content.

Presentation Slides

Session Introduction

SLIDES 1 - 2

TEACHER LED

What Are We Doing?

Getting Data from Websites
Identifying HTML Elements
Tokenizing and Processing Text Content

Web Scraping

SLIDE 3

SELF-PACED

Students will learn:

What web scraping is
Looking at website code
Downloading web pages using code
Extracting web page information through tags
Extracting web page information with classes and IDs
Extracting web page information from tables
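The steps above can be sketched with only Python's standard library. This is an illustration, not the course's own code: the slides may use third-party packages such as requests and Beautiful Soup instead. `download_page` uses `urllib.request.urlopen` and assumes the page is UTF-8 encoded. The `PageExtractor` class, its `target_class`/`target_id` parameters, and the sample HTML are hypothetical names chosen for this example. The parser collects text by tag (`<p>`), by class, by ID, and row by row from a table.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Tags that never have a closing tag, so they must not go on the open-tag stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def download_page(url):
    """Download a web page and return its HTML as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class PageExtractor(HTMLParser):
    """Hypothetical helper: collects text by tag (<p>), class, id, and table rows."""

    def __init__(self, target_class, target_id):
        super().__init__()
        self.target_class = target_class
        self.target_id = target_id
        self.results = {"p": [], "class": [], "id": []}
        self.rows = []   # one list of cell strings per <tr>
        self._open = []  # stack of (tag, buckets, text_parts) for open tags

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        buckets = []
        if tag == "p":
            buckets.append("p")
        if self.target_class in (attrs.get("class") or "").split():
            buckets.append("class")
        if attrs.get("id") == self.target_id:
            buckets.append("id")
        if tag == "tr":
            self.rows.append([])
        if tag in ("td", "th"):
            buckets.append("cell")
        self._open.append((tag, buckets, []))

    def handle_data(self, data):
        # Give the text to every open element we are collecting from.
        for _, buckets, parts in self._open:
            if buckets:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag not in (entry[0] for entry in self._open):
            return  # stray closing tag: ignore it
        # Pop until the matching tag is closed (tolerates unclosed inner tags).
        while self._open:
            open_tag, buckets, parts = self._open.pop()
            text = " ".join("".join(parts).split())  # collapse whitespace
            for bucket in buckets:
                if bucket == "cell":
                    if self.rows:
                        self.rows[-1].append(text)
                else:
                    self.results[bucket].append(text)
            if open_tag == tag:
                break


SAMPLE_HTML = """
<html><body>
  <h1 id="title">Course Page</h1>
  <p>Web scraping pulls <b>data</b> out of HTML.</p>
  <p class="note">Tags, classes and IDs help locate content.</p>
  <table>
    <tr><th>Library</th><th>Purpose</th></tr>
    <tr><td>urllib</td><td>download pages</td></tr>
    <tr><td>html.parser</td><td>parse HTML</td></tr>
  </table>
</body></html>
"""

parser = PageExtractor(target_class="note", target_id="title")
parser.feed(SAMPLE_HTML)
print(parser.results["p"])
print(parser.results["class"])
print(parser.results["id"])
print(parser.rows)
```

To scrape a live page, pass `download_page(url)` to `parser.feed(...)` instead of the sample string, after checking the site's terms of use and robots.txt.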

Students can then perform additional cleaning on the scraped dataset so that it is ready for data analysis.
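The specific cleaning actions are not listed here, so the following is a minimal sketch under stated assumptions. It assumes the scraped table arrives as a list of rows of strings with the header row first. It also assumes that cleaning means stripping footnote markers such as `[1]`, collapsing whitespace, dropping blank or malformed rows, and converting numeric text to numbers. The function names and the sample rows are hypothetical.

```python
import re


def clean_cell(text):
    """Remove footnote markers such as [1] or [a] and collapse whitespace."""
    text = re.sub(r"\[[^\]]*\]", "", text)
    return " ".join(text.split())


def to_number(text):
    """Turn strings like '30,720' or '12.5%' into floats; return None otherwise."""
    candidate = text.replace(",", "").rstrip("%").strip()
    try:
        return float(candidate)
    except ValueError:
        return None


def clean_rows(rows):
    """Convert raw table rows (header first) into a list of cleaned dicts."""
    header, *body = rows
    header = [clean_cell(h) for h in header]
    cleaned = []
    for row in body:
        cells = [clean_cell(c) for c in row]
        if not any(cells):              # skip blank rows
            continue
        if len(cells) != len(header):   # skip malformed rows
            continue
        record = {}
        for name, value in zip(header, cells):
            number = to_number(value)
            record[name] = number if number is not None else value
        cleaned.append(record)
    return cleaned


# Hypothetical raw rows, shaped like a scraped table.
raw_rows = [
    ["City", "Population[1]"],
    ["  Springfield ", "30,720"],
    ["", ""],
    ["Shelbyville", "25,001[a]"],
]

print(clean_rows(raw_rows))
```

Keeping each rule in its own small function makes it easy to add or drop cleaning steps as a particular dataset requires.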

Processing Webpage Content

SLIDE 4

SELF-PACED

Students will learn:

Setting up the project
Tokenizing text with Python
Counting word frequency using frequency distribution functions
Using NLTK to tokenize text into sentences and words
Using the WordNet database to find synonyms and antonyms
Performing stemming to reduce words to their stems
Performing lemmatization to reduce words to their dictionary forms (lemmas)
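Several of these steps can be illustrated with a minimal sketch that uses only the standard library. It is a stand-in for the NLTK tools named above, not a reproduction of them. The regular expressions approximate `nltk.sent_tokenize` and `nltk.word_tokenize`, and `collections.Counter` plays the role of `nltk.FreqDist`, which is itself a `Counter` subclass. `naive_stem` is a toy suffix stripper rather than the Porter algorithm behind NLTK's `PorterStemmer`. The `LEMMAS` table is a hypothetical stand-in for the WordNet dictionary used by `WordNetLemmatizer`. WordNet synonym lookup is omitted because it requires downloading the WordNet corpus.

```python
import re
from collections import Counter

# Small hand-picked stop-word list (assumption; NLTK ships a fuller one).
STOP_WORDS = {"the", "a", "an", "and", "is", "are", "of", "to", "in", "it"}

# Toy lemma table standing in for a real dictionary such as WordNet (hypothetical).
LEMMAS = {"pages": "page", "scrapers": "scraper", "ran": "run", "running": "run"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def naive_stem(word):
    """Toy stemmer (NOT Porter): strip the first matching suffix if 3+ letters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Return the dictionary form if the table knows the word, else the word itself."""
    return LEMMAS.get(word, word)


text = ("Scrapers download pages. The pages are parsed and the text is "
        "tokenized! Running scrapers ran quickly.")

sentences = split_sentences(text)
tokens = tokenize(text)
content = [t for t in tokens if t not in STOP_WORDS]  # drop stop words
freq = Counter(content)                                # word frequency distribution

print(len(sentences), "sentences")
print(freq.most_common(2))
for word in ("pages", "running", "ran"):
    print(word, naive_stem(word), lemmatize(word))
```

The last loop shows why stemming and lemmatization are treated separately. Stemming can leave non-words such as `pag`, while lemmatization maps both `ran` and `running` to the dictionary form `run`.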

Day 3 AM Session Plan

Session Introduction

Web Scraping

Processing Webpage Content