From Data Analytics to Artificial Intelligence

As data generation has increased rapidly, statistical knowledge and machine learning have become critical assets in any career, business, or academic program.

In this four-day workshop, we will break down popular and complex methods in artificial intelligence into simple concepts using a variety of hands-on exercises. We will first lay the foundation by covering the basics of data analytics and summary statistics.

Using a selected dataset, we will build a classifier using logistic regression, and then work our way up to building models using random forests and neural networks. At each step, we will walk through the R code used and teach you how to interpret the generated output.

On the last day, successful data scientists will speak to you about their careers and the academic paths that led them to where they are now. All of the speakers have worked in both health research and industry. The theme of the talks will be transferable skills in data science: the speakers will share the core skills they found essential in both fields.



Day 1 - Saturday, April 28 2018

Data Acquisition

  • Types of big data
  • Data sources: We will go over a list of data repositories that provide free access to data
  • Workshop dataset overview 

Programming in R

  • A quick R programming overview

Data Analytics

  • Data Distributions
  • Summary Statistics
  • Correlations
  • P-values

Day 2 - Sunday, April 29 2018

Data Cleaning

  • Feature engineering and data restructuring 
  • Dealing with missing data (imputations) 
  • Dealing with highly correlated features
  • Dimension reduction and univariate analysis 


  • Linear Regression
  • Logistic Regression

Machine Learning and Prediction I

  • Training and validation 
  • Random Forest

Day 3 - Saturday, May 5 2018

Assessing Model Performance

  • Calculating AUC, accuracy and other performance metrics
  • How to use plots to visualize performance

Machine Learning and Prediction II

  • Deep Learning and Neural Networks
  • Cross Validation
  • Dealing with imbalanced data

Day 4 - Sunday, May 6 2018

Careers and Networking

  • Presentations by students
  • Guest data scientists and networking
  • Wrap up and survey


NEW: Microsoft Student Partners will be speaking to you about Microsoft Cognitive Services and its AI applications. 


Pre-Workshop Prep:

Please bring your laptop to this workshop. You will need to install the following two tools on the laptop prior to the workshop day:


     The following videos provide step-by-step instructions on how to install the above tools:


  • Pre-Readings

    • 50 Years of Data Science by David Donoho 
    • Module 0, Introduction to Coding in R
    • Both of the above documents can be found here:







    If you have any questions or run into problems with the installations, please feel free to email us, and we will be more than happy to help you troubleshoot.

    Date & Location

    Date: April 28, April 29, May 5, and May 6, 2018

    Time: 9am-12pm

    Level: Advanced

    Location: McKinsey Experience Studio

    23 Sultan Street, Toronto


    Guest Speakers: 


    Dr. Leila Pishdad, Senior Data Scientist, Royal Bank of Canada

    Dr. Pishdad received her Ph.D. in Electrical Engineering from McGill University. Her research was on using statistical signal processing, Bayesian inference, and machine learning for indoor localization and positioning. She worked as a research scientist at Elekta LTD in Montreal for two years, where her main focus was on the prediction and estimation of tumor location during radiation therapy sessions. She then joined a startup as a research and development engineer, and since last year she has been working as a senior data scientist at Royal Bank of Canada. Her main areas of expertise are Bayesian inference and machine learning.


    Hossein Sarshar, Data Scientist & AI Architect, Microsoft

    Hossein is a Data Scientist and AI Architect at Microsoft Canada, where he helps teams build their data science solutions on Microsoft Azure. Prior to his three-year data science journey, he had been a full-stack software engineer for almost a decade. Hossein holds an MSc in Computer Science with a focus on Machine Learning.

    Dr. Geoffrey Hunter, AI Lead, TribalScale

    Geoffrey Hunter leads the Artificial Intelligence and Machine Learning practice at TribalScale, an innovation firm that creates digital products for web, mobile, and emerging platforms. At TribalScale, Geoffrey uses the latest machine-intelligence-enabled technologies to solve business problems and steer future AI initiatives for organizations.

    Geoffrey has a combined seven years of data science and consulting experience for clients spanning the public sector, healthcare, finance, and media. Before joining TribalScale, Geoffrey led Deloitte’s design and implementation of end-to-end data science solutions. Geoffrey served as a subject matter expert and thought leader in data science, cognitive technologies, and robotic process automation at Deloitte and at Widgets and Digits: Data Science Consultants. Prior to consulting, he was a cancer researcher at the Ontario Institute for Cancer Research where he used machine learning to improve the prognosis of cancer patients.

    Geoffrey holds a PhD and MS in Mathematical Physiology from the University of Utah and a B.Math in Applied Math from the University of Waterloo.


    Erik Drysdale, Data Scientist, Ontario Institute for Cancer Research

    Erik started his career as an Economist with the Bank of Canada after obtaining his Bachelor's and Master's in Economics. He worked in Ottawa and Vancouver and was focused on analyzing and forecasting the Canadian housing market. While still enjoying his work, Erik's academic interests began to expand towards machine learning and topics in Biostatistics. He went back to university and obtained an MSc in Statistics from Queen's University to be able to transition to biomedical research. He currently works as a Bioinformatician/Data-Scientist in a genomics lab focused on cancer research in Toronto.

    Erik's research interests are focused on the intersection of statistics and machine learning as well as survival modelling. He has been coding in R, Python, and Matlab for more than five years and is passionate about reproducible research and the tidyverse ecosystem.


    Workshop Instructors: 


    Fouad Yousif, Data Scientist

    Fouad completed his bachelor's degree in science at McMaster University. During his final year, he took a course in computational biology that sparked his interest in learning programming and using it to solve biological questions. To continue the development of his programming skills after graduation, he enrolled in Seneca College and completed a certificate in Bioinformatics.

    Since then, he has obtained a Master's in Biostatistics from the University of Toronto, worked as a data scientist dealing with big sequencing data in cancer research, and been part of many high-impact cancer research publications and computational developments that aim to improve cancer prognosis and enhance patients' quality of life.

    Fouad is also a very active member in the teaching community. He has taught a variety of bioinformatics workshops in Toronto, New York, Montreal and Brazil that helped scientists learn programming and data analysis skills. In addition, he has extensive teaching experience through teaching statistics courses at Seneca College and tutoring high school students in the subjects of mathematics and statistics.

    Fouad believes that one course CAN change a career, as it did for him. He is very excited to share his knowledge and passion with all the students joining The Coding Hive.


    Cindy Yao, Data Scientist

    Cindy lived in Shanghai and Vancouver prior to settling in Toronto for university and work. She obtained an Honours Bachelor’s Degree in Human Biology at the University of Toronto. During her final year of university, she became fascinated by the concept of combining computational work with understanding genomics data.

    She went on to pursue her Master's degree at the Department of Medical Biophysics at the University of Toronto. She is now working as a data scientist in a research institute, developing biomarkers that improve cancer prognosis. She has over 6 years of experience working with R and big data visualization. She is excited to be joining The Coding Hive and to share some of her experiences and insights with the students.


    Veronica Craine, Data Scientist

    Veronica has always been fascinated by statistical and mathematical tools and techniques that can be used to solve real-life problems. After she obtained an M.A. in Applied Statistics from York University, a summer semester studying in Finland among interdisciplinary professionals sparked her interest in biology. Afterwards she completed an M.Sc. in Biostatistics at the University of Victoria, with research focusing on applications to cancer patients.

    She continued her research at OICR and was trained in bioinformatics techniques. Since 2013, Veronica has contributed to a multitude of genomic megaprojects, to developing in-house R software packages, and to creating biomarkers using machine learning techniques. 

    Veronica enjoys teaching and over the years has run mathematics workshops for elementary school students, tutored high school students in math and statistics, and served as a teaching assistant and guest lecturer at university. She continues to provide statistical support as well as mentorship to new hires and co-ops at OICR.

    Veronica is excited to inspire and be inspired at The Coding Hive.


    Katie Houlahan, PhD Student, University of Toronto, Ontario Institute for Cancer Research

    Katie started her training as a chemist, earning an Honours Bachelor’s Degree in Chemical Biology at McMaster University. Early in her degree she was introduced to the world of computational biology, first through an internship at the Ontario Institute for Cancer Research, where she worked on an international collaboration to evaluate machine learning methods for detecting cancer-causing mutations. Excited by the application of machine learning in healthcare, she next worked on methods to predict cancer drug sensitivity alongside experts at the University of California, Santa Cruz and the Oregon Health & Science University.
    Now, Katie is a PhD student in the Department of Medical Biophysics at the University of Toronto studying the genetic underpinnings of prostate cancer. She is both a Vanier Scholar and a Vector Institute Postgraduate Affiliate and is excited to share her journey and experience.