Deep Dive: Statistical Concepts in AI
In this 2-day workshop, we will break down the 6 foundational statistical principles used in almost all AI applications. We will introduce the theories, go over how the formulas are derived, and finally code them in Python. No background in Python is required. This workshop is designed for TKS students only.
From Variance to Regression
Learn how to statistically describe your individual variables and assess the relationship between multiple variables. Understanding how variance and covariance are calculated will help you understand how weights are assigned to your features when you run a regression model. Learning the basics of regression will help you grasp the more complex machine learning models, such as neural networks.
Variance, covariance, and correlation calculations
Python Application: Correlation visualization using heatmaps
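As a rough illustration of how variance and covariance feed into regression weights, here is a minimal sketch in plain Python. The data (hours studied versus test score) and all function names are hypothetical and chosen only for illustration. It uses the sample (n - 1) formulas and the one-feature case, where the regression slope is the covariance divided by the variance of the feature.

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    # Sample variance: average squared distance from the mean (n - 1 denominator).
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def covariance(xs, ys):
    # Sample covariance: how two variables move together around their means.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    # Pearson correlation: covariance rescaled to lie between -1 and 1.
    return covariance(xs, ys) / (variance(xs) ** 0.5 * variance(ys) ** 0.5)

def simple_regression(xs, ys):
    # One-feature least squares: slope = cov(x, y) / var(x).
    slope = covariance(xs, ys) / variance(xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept

# Hypothetical data, made up for this example.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 68.0]

r = correlation(hours, score)
slope, intercept = simple_regression(hours, score)
print(f"correlation = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

With several features, the same idea generalizes to a covariance matrix, which is also what a correlation heatmap displays after rescaling.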
Probability and Hypothesis Testing
Probability is a fundamental concept in machine learning and data science. In this section, we will go over the different types of probability distributions and demonstrate when they are used. We will also break down how A/B tests are executed and properly interpreted. You will learn what p-values are and how they are used to assess the significance of a statistical test.
t-test, chi-squared test, and Shapiro-Wilk test
Python Application: A/B test implementation
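To make the idea of a p-value in an A/B test concrete, here is a minimal sketch using a two-proportion z-test. This is one common choice for comparing conversion rates and is not necessarily the test used in the workshop; for a 2x2 table it matches a chi-squared test without continuity correction. The visitor and conversion counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under the null hypothesis both variants share one conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 1000 visitors per variant.
z, p_value = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The p-value is the probability of seeing a difference at least this large if the two variants truly performed the same. It is compared against a significance level chosen before the experiment, often 0.05.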
Dealing With Missing Data
Almost any dataset you work with will contain gaps. In this section, we will cover the various statistical techniques that are used to deal with missing data, including data imputation.
Missing data imputation
Python Application: Missing Data Imputation
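As a small example of imputation, the sketch below fills gaps with the mean of the observed values. Mean imputation is only one of several techniques, chosen here because it is the simplest. The function name and the sample list are hypothetical, and missing entries are assumed to be marked with None.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing entries.
ages = [22, None, 30, 26, None]
filled = impute_mean(ages)
print(filled)
```

Mean imputation keeps the column average unchanged but shrinks its variance, which is one reason other methods (median, model-based) are sometimes preferred.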
A dataset might contain thousands of features, many of them redundant. Running prediction on the full dataset is not computationally efficient. In statistics, machine learning, and information theory, dimensionality reduction is the process of reducing the number of features in your dataset by obtaining a set of principal features. Principal component analysis (PCA) is a very popular and powerful dimensionality reduction method used in data science. In this section, we will break down how PCA works and implement the analysis in Python.
Python Application: PCA Implementation
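One way to sketch PCA is through the eigendecomposition of the covariance matrix, shown below with NumPy. The synthetic data and the function name pca are assumptions made for this example; the second feature is deliberately built as a near copy of the first to mimic a redundant feature.

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the sample covariance matrix."""
    X_centered = X - X.mean(axis=0)            # center each feature
    cov = np.cov(X_centered, rowvar=False)     # features are columns
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]             # directions of largest variance
    explained = eigvals[order] / eigvals.sum() # share of variance per component
    return X_centered @ components, components, explained

# Synthetic data: feature 2 is roughly 2 * feature 1, feature 3 is independent.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=100), rng.normal(size=100)])

projected, components, explained = pca(X, n_components=2)
print(projected.shape, explained)
```

Because two of the three features carry nearly the same information, two components should capture almost all of the variance in this example.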
Model Tuning and Performance Assessment
Fitting a prediction model might be an easy task. The challenge is to assess its performance and identify the set of parameters that maximizes it. In this section, we will cover methods that are used to tune and assess the quality of your fitted models.
Numeric modelling assessment
Classification modelling assessment
Python Application: AUC calculation and visualization
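For classification models, the AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition in plain Python. The labels and scores are made up, and the pairwise loop is meant for clarity, not for large datasets.

```python
def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical true labels and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
result = auc(labels, scores)
print(result)  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.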
Bayes’ theorem is considered to be one of the most important theorems in the field of mathematical statistics and probability theory. Bayesian inference is widely used by data scientists in various fields and applications. In this section, we will walk through the derivation of Bayes’ theorem and its application in developing a spelling corrector.
Python Application: Spelling corrector
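A spelling corrector can be framed with Bayes’ theorem: for a typed word w, pick the candidate c that maximizes P(c) * P(w | c). The sketch below is a heavily simplified version of that idea, not the workshop's implementation. The tiny corpus is hypothetical, and the error model P(w | c) is assumed to be equal for every candidate one edit away, so the choice reduces to the word-frequency prior P(c).

```python
from collections import Counter

# Hypothetical toy corpus; a real corrector would use a large text collection.
WORD_COUNTS = Counter("the quick brown fox jumps over the lazy dog the fox".split())
TOTAL = sum(WORD_COUNTS.values())
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def prior(word):
    # P(c): how often the candidate appears in the corpus.
    return WORD_COUNTS[word] / TOTAL

def edits1(word):
    # All strings one delete, transpose, replace, or insert away from word.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    # Known words are kept; otherwise choose the most probable known candidate.
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return word  # nothing plausible found, leave unchanged
    return max(candidates, key=prior)

print(correct("teh"), correct("fxo"), correct("dog"))
```

A fuller version would estimate P(w | c) from observed typing errors instead of treating all single edits as equally likely.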
Date & Location
Date: Sunday, February 3rd, 2019
Level: TKS Students Only
Location: DMZ Sandbox