# Difference between revisions of "Statistics for Intelligence"


## Revision as of 14:13, 26 April 2020


- Math for Intelligence
- Statistics ...articles | Wikipedia
- Causation vs. Correlation
- StatQuest YouTube Channel | Josh Starmer
- Probability Cheatsheet
- Statistical Learning | T. Hastie, R. Tibshirani - Stanford
- Fundamental Statistics Jupyter Notebook | Jon Tupitza
- Statistics and Probability | Khan Academy
- Google's Crash Course
- Neural Networks and Deep Learning - online book | Michael A. Nielsen
- Brilliant.org
- Bloomberg Lectures
- Data Science Concepts Explained to a Five-year-old | Megan Dibble - Toward Data Science

- The Confusion Matrix: one of the fundamental concepts in machine learning. Combined with Cross Validation, it is how one decides which machine learning method would be best for a particular dataset.
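As a rough illustration of the idea, a confusion matrix for a binary classifier can be built by counting how often each (actual, predicted) pair occurs. The labels below are made up for the example, and the helper name `confusion_matrix` is hypothetical rather than taken from any particular library.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = Counter(zip(actual, predicted))
    return {
        "TP": counts[(1, 1)],  # actual 1, predicted 1
        "FN": counts[(1, 0)],  # actual 1, predicted 0
        "FP": counts[(0, 1)],  # actual 0, predicted 1
        "TN": counts[(0, 0)],  # actual 0, predicted 0
    }

# Made-up labels, for illustration only.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

matrix = confusion_matrix(actual, predicted)
accuracy = (matrix["TP"] + matrix["TN"]) / len(actual)
print(matrix, accuracy)
```

Comparing these counts across candidate models, ideally on held-out folds, is one way to put the comparison described above into practice.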

## Data Representation

### Stem and Leaf Plot

A stem and leaf plot is a special table where each data value is split into a "stem" (the first digit or digits) and a "leaf" (usually the last digit). | Math is Fun

A stem and leaf plot is a great way to organize data by frequency. It is a visual that also includes the data itself, so you can take a quick look to get an idea of the spread of the data, or you can use the values to calculate the mean, median or mode. | SoftSchools
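A minimal sketch of how such a table could be produced, assuming non-negative integers where the last digit is the leaf and the remaining digits are the stem. The data and the helper name `stem_and_leaf` are invented for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (all but the last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 23 -> stem 2, leaf 3
        plot[stem].append(leaf)
    return dict(plot)

# Made-up sample data.
data = [12, 15, 21, 23, 23, 30, 38, 41]
plot = stem_and_leaf(data)

for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Each printed row shows one stem followed by its leaves in ascending order, so the original values can still be read back from the plot.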

### Histograms

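A histogram groups data values into consecutive ranges ("bins") and draws a bar for each bin whose height is the number of values that fall into it. Histograms are one of the most basic statistical tools that we have, and also one of the most powerful and most frequently used.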
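The binning step behind a histogram can be sketched in a few lines. This is only an illustration with made-up data and a fixed bin width of 10; the function name `histogram` is hypothetical, and plotting libraries usually choose bins in more sophisticated ways.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))

# Made-up sample data.
data = [3, 7, 8, 12, 14, 15, 19, 21, 22, 35]
bin_width = 10
bins = histogram(data, bin_width)

# Text rendering: one '#' per value in the bin.
for start, count in bins.items():
    print(f"{start:>3}-{start + bin_width - 1:<3} {'#' * count}")
```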

### Mean, Median, and Mode

- Mean: the sum of all the data values divided by the number of data values. Example: (8 + 7 + 3 + 9 + 11 + 4) ÷ 6 = 42 ÷ 6 = 7.0
- Median: the middle data point in a data series organised in sequence. Example: in 2 5 7 8 11 14 18 21 22 25 29 the median is 14 (five data values either side)
- Mode: the most frequently occurring data value in a series. Example: in 2 2 4 4 4 7 9 9 9 9 12 12 13 the mode is 9 (it occurs four times)
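The three examples above can be checked with Python's standard `statistics` module. This is just a quick verification sketch using the same numbers as the list.

```python
import statistics

# Same data as the worked examples above.
mean_value = statistics.mean([8, 7, 3, 9, 11, 4])
median_value = statistics.median([2, 5, 7, 8, 11, 14, 18, 21, 22, 25, 29])
mode_value = statistics.mode([2, 2, 4, 4, 4, 7, 9, 9, 9, 9, 12, 12, 13])

print(mean_value, median_value, mode_value)
```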

### Interquartile Range (IQR)

The interquartile range is a measure of where the “middle fifty” percent of a data set lies. Where a range measures the distance between the beginning and end of a set, an interquartile range measures the spread of the bulk of the values. Because it ignores the extremes, it is often preferred over the full range when reporting things like school performance or SAT scores. The interquartile range is the first quartile subtracted from the third quartile: IQR = Q3 − Q1. Interquartile Range (IQR): What it is and How to Find it | Statistics How To
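A small sketch of the calculation, using `statistics.quantiles` from the standard library (Python 3.8+) with its default method. Note that several quartile conventions exist and they can give slightly different values on the same data; the sample below is chosen only for illustration.

```python
import statistics

# Illustrative data set of 11 values.
data = [2, 5, 7, 8, 11, 14, 18, 21, 22, 25, 29]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q3, iqr)
```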

### Box & Whisker Plots (Boxplot)

A box and whisker plot presents information from a five-number summary (minimum, first quartile, median, third quartile, maximum). It is especially useful for indicating whether a distribution is skewed and whether there are potential unusual observations (outliers) in the data set. Box and whisker plots are also very useful when large numbers of observations are involved and when two or more data sets are being compared. Constructing box and whisker plots | Statistics Canada
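The numbers behind a box plot can be sketched as follows. The 1.5 × IQR fence used to flag outliers is a common convention and an assumption here, not something stated by the source; the data and the helper name `five_number_summary` are made up.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, median, q3, ordered[-1]

# Made-up data with one unusually large value.
data = [2, 5, 7, 8, 11, 14, 18, 21, 22, 25, 60]

summary = five_number_summary(data)
low, q1, median, q3, high = summary

# Assumed convention: flag points beyond 1.5 * IQR from the box.
iqr = q3 - q1
outliers = [v for v in data if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(summary, outliers)
```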

### Standard Deviation

The standard deviation (represented by the Greek letter sigma σ for the population standard deviation or the Latin letter s for the sample standard deviation) is a measure that is used to quantify the amount of variation or dispersion of a set of data values. A low standard deviation indicates that the data points tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the data points are spread out over a wider range of values. Standard Deviation | Wikipedia
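To make the distinction between σ and s concrete, the sketch below computes both on a small made-up data set with the standard `statistics` module: `pstdev` divides by the number of values, while `stdev` divides by one less than that.

```python
import statistics

# Made-up data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = statistics.pstdev(data)  # sigma: divides by N
sample_sd = statistics.stdev(data)       # s: divides by N - 1

print(population_sd, sample_sd)
```

The sample value comes out slightly larger because of the smaller divisor.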

## Statistical Tests

## Probability

Probability is the likelihood or chance of an event occurring. When all outcomes are equally likely: Probability = (the number of ways of achieving success) ÷ (the total number of possible outcomes).
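As a small worked example, the counting definition can be applied to a fair six-sided die. The helper `probability` is a hypothetical name for this sketch, and the approach assumes a finite sample space of equally likely outcomes.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Favourable outcomes divided by total outcomes (equally likely)."""
    outcomes = list(sample_space)
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

die = range(1, 7)  # faces 1..6
p_even = probability(lambda face: face % 2 == 0, die)
p_six = probability(lambda face: face == 6, die)

print(p_even, p_six)
```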

### Conditional Probability

Conditional probability is the probability of an event (A), given that another event (B) has already occurred. It is written P(A | B).
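One way to see this numerically is the standard identity P(A | B) = P(A and B) ÷ P(B), sketched here for a fair die. The events chosen (A: roll greater than 3, B: roll is even) are arbitrary illustrations.

```python
from fractions import Fraction

faces = list(range(1, 7))  # fair six-sided die

a = {f for f in faces if f > 3}       # event A: roll is 4, 5 or 6
b = {f for f in faces if f % 2 == 0}  # event B: roll is even

p_b = Fraction(len(b), len(faces))
p_a_and_b = Fraction(len(a & b), len(faces))

# P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

print(p_a_given_b)
```

Knowing the roll is even narrows the possibilities to 2, 4 and 6, of which two are greater than 3.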

### Probability Independence

In probability theory, two events are independent, statistically independent, or stochastically independent if the occurrence of one does not affect the probability of occurrence of the other (equivalently, does not affect the odds). Similarly, two random variables are independent if the realization of one does not affect the probability distribution of the other. The concept of independence extends to collections of more than two events or random variables: the events are pairwise independent if each pair is independent, and mutually independent if each event is independent of any combination of the other events. Independence (probability theory) | Wikipedia
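Independence of two events is usually checked with the product rule P(A and B) = P(A) × P(B). The sketch below tests it by enumerating two fair dice; the events are picked for illustration, and `prob` and `independent` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # all 36 outcomes of two dice

def prob(event):
    """Probability of an event over the 36 equally likely outcomes."""
    return Fraction(sum(1 for roll in rolls if event(roll)), len(rolls))

def independent(event_a, event_b):
    """Product rule: P(A and B) == P(A) * P(B)."""
    both = prob(lambda roll: event_a(roll) and event_b(roll))
    return both == prob(event_a) * prob(event_b)

first_even = lambda roll: roll[0] % 2 == 0
sum_is_7 = lambda roll: sum(roll) == 7
sum_is_8 = lambda roll: sum(roll) == 8

even_vs_7 = independent(first_even, sum_is_7)
even_vs_8 = independent(first_even, sum_is_8)
print(even_vs_7, even_vs_8)
```

In this enumeration, "first die is even" turns out to be independent of "sum is 7" but not of "sum is 8".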

### Bayes' Theorem

Bayes' theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event. Bayes' Theorem | Wikipedia
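In its usual form the theorem reads P(H | E) = P(E | H) × P(H) ÷ P(E). The sketch below applies it to a diagnostic-test scenario; every number in it (prior, hit rate, false-positive rate) is invented purely for illustration.

```python
from fractions import Fraction

# Hypothetical inputs, chosen only to illustrate the formula.
prior = Fraction(1, 100)          # P(H): condition is present
hit_rate = Fraction(9, 10)        # P(E | H): positive test if present
false_positive = Fraction(5, 100) # P(E | not H): positive test if absent

# Total probability of a positive test, P(E).
evidence = hit_rate * prior + false_positive * (1 - prior)

# Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)
posterior = hit_rate * prior / evidence

print(posterior, float(posterior))
```

With these assumed numbers, the low prior keeps the posterior well below the test's hit rate.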

#### Bayesian Statistics

#### Bayesian Hypothesis Testing

## Regression

- Regression | Wikipedia: a statistical technique for estimating the relationships among variables.
- Regression Analysis
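As one concrete instance of regression, a straight line y = intercept + slope × x can be fitted by ordinary least squares. This is a minimal sketch with made-up data and a hypothetical helper name `fit_line`; it covers only the simplest single-variable case.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```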

## Books

- Cartoon Guide to Statistics | Larry Gonick & Woollcott Smith
- The Cartoon Introduction to Statistics | Grady Klein & Alan Dabney