Intro to Python

What is Data Science?

Raw Data

Alice in Wonderland

There was a table set out under a tree in front of the house, and the March Hare and the Hatter were having tea at it: a Dormouse was sitting between them, fast asleep, and the other two were using it as a cushion, resting their elbows on it, and talking over its head. “Very uncomfortable for the Dormouse,” thought Alice; “only, as it’s asleep, I suppose it doesn’t mind.”

The table was a large one, but the three were all crowded together at one corner of it: “No room! No room!” they cried out when they saw Alice coming. “There’s plenty of room!” said Alice indignantly, and she sat down in a large arm-chair at one end of the table.

“Have some wine,” the March Hare said in an encouraging tone.

Alice looked all round the table, but there was nothing on it but tea. “I don’t see any wine,” she remarked.

“There isn’t any,” said the March Hare.

“Then it wasn’t very civil of you to offer it,” said Alice angrily.

“It wasn’t very civil of you to sit down without being invited,” said the March Hare.

“I didn’t know it was your table,” said Alice; “it’s laid for a great many more than three.”

“Your hair wants cutting,” said the Hatter. He had been looking at Alice for some time with great curiosity, and this was his first speech.

“You should learn not to make personal remarks,” Alice said with some severity; “it’s very rude.”

The Hatter opened his eyes very wide on hearing this; but all he said was, “Why is a raven like a writing-desk?”

“Come, we shall have some fun now!” thought Alice. “I’m glad they’ve begun asking riddles.—I believe I can guess that,” she added aloud.

“Do you mean that you think you can find out the answer to it?” said the March Hare.

“Exactly so,” said Alice.

Structured Data

Character	Line of Dialogue
March Hare, Dormouse, Hatter	No room! No room!
Alice	There’s plenty of room!
March Hare	Have some wine
Alice	I don’t see any wine
March Hare	There isn’t any
Alice	Then it wasn’t very civil of you to offer it
March Hare	It wasn’t very civil of you to sit down without being invited
Alice	I didn’t know it was your table, it’s laid for a great many more than three
Hatter	Your hair wants cutting
Alice	You should learn not to make personal remarks, it’s very rude
Hatter	Why is a raven like a writing-desk?
Alice	I believe I can guess that
March Hare	Do you mean that you think you can find out the answer to it?
Alice	Exactly so
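
Once the dialogue is structured, a program can work with it directly. The following is a minimal sketch, assuming a handful of rows adapted from the table above are stored as (character, line) pairs; the variable names are illustrative only.

```python
from collections import Counter

# A few rows adapted from the table above: (character, line of dialogue).
dialogue = [
    ("Alice", "There's plenty of room!"),
    ("March Hare", "Have some wine"),
    ("Alice", "I don't see any wine"),
    ("March Hare", "There isn't any"),
    ("Hatter", "Your hair wants cutting"),
    ("Hatter", "Why is a raven like a writing-desk?"),
    ("Alice", "Exactly so"),
]

# Count how many of these lines each character speaks.
lines_per_character = Counter(character for character, _ in dialogue)

for character, count in lines_per_character.most_common():
    print(f"{character}: {count}")
```

Answering the same question from the raw text would first require working out who is speaking in each sentence, which is much harder to automate.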

"Data Science Disciplines", by Calvin Andrus, licensed under CC BY-SA 3.0

What is Dataset Analysis?

"What is Artificial Intelligence", by Ravirajbhat154, licensed under CC BY-SA 4.0

vision intelligence,

natural language processing

robotics.

Describing Statistics

The quantitative approach

The quantitative approach describes and summarizes data numerically. For example, if you have a dataset of people, you might use numbers to describe it. Here are some questions you might want to answer with the dataset:

What is the average age of people in the dataset?

If the dataset includes data about which state they live in, how many people live in each state?

How many businesses and of what type are in each state?

If you're interested in this type of data, the US Census Data for Students website provides this kind of numerical breakdown for each state.

Quantitative approaches can provide a lot of detailed information, but looking at many numbers at once can make you lose sight of the bigger picture of the data.
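
The first two questions above can be sketched with Python's standard library alone. This is a minimal illustration, assuming a tiny made-up dataset of people; the names, ages and states are invented and are not census data.

```python
from collections import Counter
from statistics import mean

# Made-up records for illustration only; not real census data.
people = [
    {"name": "Ana", "age": 34, "state": "Texas"},
    {"name": "Ben", "age": 27, "state": "Ohio"},
    {"name": "Cho", "age": 45, "state": "Texas"},
    {"name": "Dev", "age": 30, "state": "Utah"},
]

# What is the average age of people in the dataset?
average_age = mean(person["age"] for person in people)

# How many people live in each state?
people_per_state = Counter(person["state"] for person in people)

print("Average age:", average_age)
print("People per state:", dict(people_per_state))
```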

The visual approach

The visual approach illustrates data with charts, plots and graphs. Visual approaches usually provide less information than quantitative approaches, but they are easier to understand at a glance.

The US Census website also has a Visualizations section that presents data as charts rather than as numbers alone.

Visualizations can be useful for showing complicated concepts in a way that non-scientists can understand. For example, this chart about Changes in Non-English Languages Spoken in the US lets you see which languages are spoken in the US and how their rankings have changed over time. The Census Bureau could have provided only the raw numbers and let you draw your own conclusions. By creating a chart, it makes the concepts it wants to convey easier to understand.

Statistics Libraries

NumPy is a third-party library for numerical computing and working with arrays. Arrays are similar to lists, and you will learn about them with NumPy. NumPy also has many useful functions for statistical analysis.
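
As a rough sketch of what this looks like in practice, the example below builds a NumPy array from a list of made-up ages and applies a few of NumPy's statistical functions. It assumes NumPy is installed (for example with pip install numpy).

```python
import numpy as np

# Made-up ages, stored in a NumPy array instead of a plain list.
ages = np.array([34, 27, 45, 30])

print("Mean:", np.mean(ages))
print("Median:", np.median(ages))
print("Standard deviation:", np.std(ages))

# Unlike a list, arithmetic on an array applies to every element at once.
print("Ages next year:", ages + 1)
```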

Pandas is a third-party library for data analysis and manipulation built on NumPy. It is well suited to handling one-dimensional and two-dimensional data.
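
A minimal sketch of both shapes of data, assuming Pandas is installed and using invented records: a DataFrame holds two-dimensional data (rows and columns), and selecting one column gives a one-dimensional Series.

```python
import pandas as pd

# Made-up records for illustration only.
people = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cho", "Dev"],
        "age": [34, 27, 45, 30],
        "state": ["Texas", "Ohio", "Texas", "Utah"],
    }
)

# Selecting a single column gives a one-dimensional Series.
ages = people["age"]
average_age = ages.mean()

# Count how many rows fall in each state.
people_per_state = people["state"].value_counts()

print(people)
print("Average age:", average_age)
print(people_per_state)
```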

Matplotlib is a third-party library for data visualization. It works well in combination with NumPy, Pandas and other libraries.
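
The following sketch draws a simple bar chart with Matplotlib, assuming it is installed. The counts are made up, and the output file name people_per_state.png is only an example.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen so no display window is needed
import matplotlib.pyplot as plt

# Made-up counts for illustration only.
states = ["Texas", "Ohio", "Utah"]
people_per_state = [2, 1, 1]

fig, ax = plt.subplots()
bars = ax.bar(states, people_per_state)
ax.set_xlabel("State")
ax.set_ylabel("Number of people")
ax.set_title("People per state (made-up data)")

# Save the chart as an image file in the current folder.
fig.savefig("people_per_state.png")
```

In an interactive session you could leave out the matplotlib.use("Agg") line and call plt.show() instead of saving to a file.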

Finding and Reading Datasets

Kaggle

Kaggle Datasets

Video Game Sales Dataset

Netflix Titles Dataset

Goodreads-books Dataset
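
Datasets like these are commonly distributed as CSV files. The sketch below shows one way to read CSV data with Python's built-in csv module. The sample text and its column names (title, platform, year) are hypothetical stand-ins and may not match the columns of any of the datasets listed above.

```python
import csv
import io

# Made-up sample standing in for a downloaded CSV file; the column names
# are hypothetical and may not match any real Kaggle dataset.
sample_csv = """title,platform,year
Game A,Console X,2009
Game B,Console Y,2011
Game C,Console X,2015
"""

# With a real download, you would open the file instead, for example:
# open("your_file.csv", newline="")
with io.StringIO(sample_csv) as csv_file:
    rows = list(csv.DictReader(csv_file))

print("Number of rows:", len(rows))
print("Columns:", list(rows[0].keys()))
print("First row:", rows[0])
```

Note that csv.DictReader reads every value as a string, so a column such as year needs converting with int() before doing arithmetic on it.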

Looking at Datasets