# Deep Dive: Statistical Concepts in AI

In this 2-day workshop, we will break down six foundational statistical principles used in almost all AI applications. We will introduce the theories, show how the formulas are derived, and finally code them in Python. No background in Python is required. This workshop is designed for TKS students only.

## Agenda

**From Variance to Regression**

Learn how to statistically describe your individual variables and assess the relationship between multiple variables. Understanding how variance and covariance are calculated will help you understand how weights are assigned to your features when you run a regression model. Learning the basics of regression will help you grasp the more complex machine learning models, such as neural networks.

Summary statistics

Variance, covariance, and correlation calculations

Regression

*Python Application*: Correlation visualization using heatmaps
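As a preview of the formulas above, here is a minimal sketch of variance, covariance, correlation, and simple linear regression using only the standard library. The data points are invented for illustration.

```python
# Sample statistics and one-variable regression from first principles.

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    # Pearson correlation: covariance scaled by both standard deviations.
    return covariance(xs, ys) / (variance(xs) ** 0.5 * variance(ys) ** 0.5)

def simple_regression(xs, ys):
    # slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x)
    slope = covariance(xs, ys) / variance(xs)
    return slope, mean(ys) - slope * mean(xs)

x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]  # roughly y = 2x, made up for the example
slope, intercept = simple_regression(x, y)
```

The regression weight is exactly the covariance-over-variance ratio discussed above, which is why understanding those two quantities makes model weights less mysterious.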

**Probability and Hypothesis Testing**

Probability is a fundamental concept in machine learning and data science. In this section, we will go over the main types of probability distributions and demonstrate when each is used. We will also break down how A/B tests are executed and properly interpreted. You will learn what p-values are and how they are used to assess the significance of a statistical test.

Probability distributions

Hypothesis testing

P-values

A/B testing

t-test, chi-squared test, and Shapiro-Wilk test

*Python Application:* A/B test implementation
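As a taste of what this looks like in code, below is a hedged sketch of a two-proportion z-test, a common test behind A/B experiments on conversion rates. The traffic and conversion counts are invented for illustration.

```python
# Two-sided z-test for the difference between two conversion rates.
from statistics import NormalDist

def ab_test(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)        # pooled rate under H0
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value
    return z, p_value

# Hypothetical experiment: variant B converts 12% vs. 10% for A,
# with 5,000 users in each arm.
z, p = ab_test(conv_a=500, n_a=5000, conv_b=600, n_b=5000)
```

A small p-value here means the observed lift would be very unlikely if the two variants truly converted at the same rate, which is exactly the logic of hypothesis testing covered in this section.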

**Dealing With Missing Data**

Almost any dataset you work with will contain gaps. In this section, we will cover the statistical techniques used to deal with missing data, including data imputation.

Outlier detection

Missing data imputation

*Python Application:* Missing Data Imputation
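To make this concrete, here is a minimal sketch of mean imputation, one of the simplest strategies in this family. The toy column below uses `None` to mark missing entries.

```python
# Replace each missing value with the mean of the observed values.

def mean_impute(values):
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]

ages = [23, None, 31, 27, None, 35]  # invented data with two gaps
filled = mean_impute(ages)
```

Mean imputation preserves the column average but shrinks its variance, which is one reason the section also covers more careful alternatives.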

**Dimensionality Reduction**

Data might contain thousands of redundant features, and running predictions on the full feature set is computationally inefficient. In statistics, machine learning, and information theory, dimensionality reduction is the process of reducing the number of features in a dataset by obtaining a set of principal features. Principal component analysis (PCA) is a popular and powerful dimensionality reduction method in data science. In this section, we will break down how PCA works and implement the analysis in Python.

PCA

*Python Application*: PCA Implementation
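As a preview, PCA can be sketched from first principles with NumPy: center the data, eigendecompose the covariance matrix, and project onto the top components. The synthetic dataset below deliberately contains a redundant feature.

```python
import numpy as np

def pca(X, n_components):
    # Center the data so the covariance matrix describes spread around the mean.
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # largest variance first
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components, eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 4] = X[:, 0] + 0.01 * rng.normal(size=100)  # nearly duplicate feature
scores, eigvals = pca(X, n_components=2)
```

The redundant fifth column collapses into the leading component, which is exactly the behavior that lets PCA discard features with little information loss.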

**Model Tuning and Performance Assessment**

Fitting a prediction model might be an easy task; the challenge is to assess its performance and identify the set of parameters that maximizes it. In this section, we will cover methods used to tune and assess the quality of your fitted models.

Cost functions

Gradient descent

Numeric model assessment

Classification model assessment

AUC

*Python Application*: AUC calculation and visualization
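To illustrate the first two topics, here is a small sketch of gradient descent minimizing a mean-squared-error cost for a one-parameter model y ≈ w·x. The data and learning rate are chosen for illustration, not taken from the workshop materials.

```python
# Gradient descent on the MSE cost for a single weight w.

def cost(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w, xs, ys):
    # Derivative of the cost with respect to w.
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]   # generated by y = 3x, so the optimum is w = 3

w, lr = 0.0, 0.01            # start at zero, take small steps downhill
for _ in range(1000):
    w -= lr * gradient(w, xs, ys)
```

Each step moves the weight against the cost gradient, and with a suitably small learning rate the iterates converge to the minimizing value.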

**Bayesian Statistics**

Bayes’ theorem is considered one of the most important theorems in mathematical statistics and probability theory. Bayesian inference is widely used by data scientists across fields and applications. In this section, we will walk through the derivation of Bayes’ theorem and apply it to build a spelling corrector.

Bayes’ theorem

*Python Application*: Spelling corrector
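As a warm-up before the spelling corrector, here is a worked instance of Bayes' theorem, P(A|B) = P(B|A)·P(A) / P(B), using the classic diagnostic-test setup with invented rates.

```python
# Posterior probability of disease given a positive test result.

def bayes(prior, sensitivity, false_positive_rate):
    # Total probability of a positive test (law of total probability).
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Illustrative numbers: 1% prevalence, 99% sensitivity, 5% false positives.
posterior = bayes(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
```

Even with a 99%-sensitive test, the posterior is only about 17%, because the prior is so low; updating a prior with evidence in exactly this way is the core move behind the spelling corrector.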

### Date & Location

Date: Sunday, **February 3, 2019**

Time: 10am-5pm

Level: TKS Students Only

Location: **DMZ Sandbox**