# Deep Dive: Statistical Concepts in AI

In this 2-day workshop, we will break down six foundational statistical principles used in almost all AI applications. We will introduce the theories, show how the formulas are derived, and finally code them in Python. No background in Python is required. This workshop is designed for TKS students only.

## Agenda

**From Variance to Regression**

Learn how to statistically describe your individual variables and assess the relationship between multiple variables. Understanding how variance and covariance are calculated will help you understand how weights are assigned to your features when you run a regression model. Learning the basics of regression will help you grasp the more complex machine learning models, such as neural networks.

Summary statistics

Variance, covariance, and correlation calculations

Regression

*Python Application*: Correlation visualization using heatmaps
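As a preview of the formulas above, here is a minimal sketch of variance, covariance, correlation, and simple linear regression using only the standard library. The data points are invented for illustration.

```python
# Sample statistics and one-variable regression from first principles.

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    # Pearson correlation: covariance scaled by both standard deviations.
    return covariance(xs, ys) / (variance(xs) ** 0.5 * variance(ys) ** 0.5)

def simple_regression(xs, ys):
    # slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x)
    slope = covariance(xs, ys) / variance(xs)
    return slope, mean(ys) - slope * mean(xs)

x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]  # roughly y = 2x, made up for the example
slope, intercept = simple_regression(x, y)
```

The regression weight is exactly the covariance-over-variance ratio discussed above, which is why understanding those two quantities makes model weights less mysterious.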

**Probability and Hypothesis Testing**

Probability is a fundamental concept in machine learning and data science. In this section, we will go over the main types of probability distributions and demonstrate when each is used. We will also break down how A/B tests are executed and properly interpreted. You will learn what p-values are and how they are used to assess the significance of a statistical test.

Probability distributions

Hypothesis testing

P-values

A/B testing

t-test, chi-squared test, and Shapiro-Wilk test

*Python Application:* A/B test implementation
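As a taste of what this looks like in code, below is a hedged sketch of a two-proportion z-test, a common test behind A/B experiments on conversion rates. The traffic and conversion counts are invented for illustration.

```python
# Two-sided z-test for the difference between two conversion rates.
from statistics import NormalDist

def ab_test(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)        # pooled rate under H0
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value
    return z, p_value

# Hypothetical experiment: variant B converts 12% vs. 10% for A,
# with 5,000 users in each arm.
z, p = ab_test(conv_a=500, n_a=5000, conv_b=600, n_b=5000)
```

A small p-value here means the observed lift would be very unlikely if the two variants truly converted at the same rate, which is exactly the logic of hypothesis testing covered in this section.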

**Dealing With Missing Data**

Almost any dataset you work with will contain gaps. In this section, we will cover the statistical techniques used to deal with missing data, including data imputation.

Outlier detection

Missing data imputation

*Python Application:* Missing Data Imputation
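To make this concrete, here is a minimal sketch of mean imputation, one of the simplest strategies in this family. The toy column below uses `None` to mark missing entries.

```python
# Replace each missing value with the mean of the observed values.

def mean_impute(values):
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]

ages = [23, None, 31, 27, None, 35]  # invented data with two gaps
filled = mean_impute(ages)
```

Mean imputation preserves the column average but shrinks its variance, which is one reason the section also covers more careful alternatives.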

**Dimensionality Reduction**

Data might contain thousands of redundant features, and running predictions on the full feature set is computationally inefficient. In statistics, machine learning, and information theory, dimensionality reduction is the process of reducing the number of features in a dataset by obtaining a set of principal features. Principal component analysis (PCA) is a popular and powerful dimensionality reduction method in data science. In this section, we will break down how PCA works and implement the analysis in Python.

PCA

*Python Application*: PCA Implementation
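As a preview, PCA can be sketched from first principles with NumPy: center the data, eigendecompose the covariance matrix, and project onto the top components. The synthetic dataset below deliberately contains a redundant feature.

```python
import numpy as np

def pca(X, n_components):
    # Center the data so the covariance matrix describes spread around the mean.
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # largest variance first
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components, eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 4] = X[:, 0] + 0.01 * rng.normal(size=100)  # nearly duplicate feature
scores, eigvals = pca(X, n_components=2)
```

The redundant fifth column collapses into the leading component, which is exactly the behavior that lets PCA discard features with little information loss.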

**Model Tuning and Performance Assessment**

Fitting a prediction model might be an easy task; the challenge is to assess its performance and identify the set of parameters that maximizes it. In this section, we will cover methods used to tune and assess the quality of your fitted models.

Cost functions

Gradient descent

Numeric model assessment

Classification model assessment

AUC

*Python Application*: AUC calculation and visualization
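To illustrate the first two topics, here is a small sketch of gradient descent minimizing a mean-squared-error cost for a one-parameter model y ≈ w·x. The data and learning rate are chosen for illustration, not taken from the workshop materials.

```python
# Gradient descent on the MSE cost for a single weight w.

def cost(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w, xs, ys):
    # Derivative of the cost with respect to w.
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]   # generated by y = 3x, so the optimum is w = 3

w, lr = 0.0, 0.01            # start at zero, take small steps downhill
for _ in range(1000):
    w -= lr * gradient(w, xs, ys)
```

Each step moves the weight against the cost gradient, and with a suitably small learning rate the iterates converge to the minimizing value.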

**Bayesian Statistics**

Bayes’ theorem is considered one of the most important theorems in mathematical statistics and probability theory. Bayesian inference is widely used by data scientists across fields and applications. In this section, we will walk through the derivation of Bayes’ theorem and apply it to build a spelling corrector.

Bayes’ theorem

*Python Application*: Spelling corrector
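As a warm-up before the spelling corrector, here is a worked instance of Bayes' theorem, P(A|B) = P(B|A)·P(A) / P(B), using the classic diagnostic-test setup with invented rates.

```python
# Posterior probability of disease given a positive test result.

def bayes(prior, sensitivity, false_positive_rate):
    # Total probability of a positive test (law of total probability).
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Illustrative numbers: 1% prevalence, 99% sensitivity, 5% false positives.
posterior = bayes(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
```

Even with a 99%-sensitive test, the posterior is only about 17%, because the prior is so low; updating a prior with evidence in exactly this way is the core move behind the spelling corrector.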

### Date & Location

Date: Sunday, **February 3, 2019**

Time: 10am-5pm

Level: TKS Students Only

Location: **DMZ Sandbox**