Data Science in Genomics and Drug Discovery

Rapid advancements in DNA sequencing technologies over the past decade have generated a vast amount of genomic data. Scientists around the world are analyzing these data to gain insights into the genetic architecture of complex diseases. As the cost of data generation decreases, personalized medicine is becoming more accessible and the demand for data science talent is growing.

In this four-day workshop, you will have the chance to work with DNA sequencing data and learn about the computational and machine learning methods used to analyze such data. You will gain an understanding of how this information is used in disease diagnosis, in drug discovery, and, ultimately, in improving patients' health.



Day 1: Data Science and its Applications in Genomics (Saturday, February 3, 2018)

During the first day of the workshop, we will break down the term Data Science and explain its relation to Big Data, Artificial Intelligence, and Machine Learning. We will also discuss what a genome is, how genomic data are generated using Next Generation Sequencing (NGS) technologies, and how Data Science uses genomic data to improve patients' health.

Day 2: Introduction to the R Programming Language (Sunday, February 4, 2018)

You will learn a variety of useful programming skills, from parsing files to plotting and writing your own functions. We will also go over a breast cancer dataset that you will analyze on Day 3.

Day 3: Statistical Modelling and Machine Learning (Saturday, February 10, 2018)

We will introduce you to statistical concepts such as correlation, linear regression, model assessment, and machine learning. You will apply the programming skills you gained on Day 2 to build a machine learning classifier that predicts breast cancer patients' risk groups from genomic data.

NEW: Two Microsoft Student Partners (MSPs) will speak about Microsoft Cognitive Services, bots and health applications, and how a project that won the Imagine Cup used Microsoft Cognitive Services.


Day 4: How To Become a Successful Data Scientist (Sunday, February 11, 2018)

Where do you go from here? We will guide you through steps that you can take to further enrich your Data Science skills. We will have successful Data Scientists from the fields of genomics and drug discovery speaking about their careers and the academic paths that led them to where they are now.

Pre-Workshop Installations:

Please bring your laptop to this workshop. You will need to install the following two tools on your laptop before the first workshop day:



  • Data and Slides



The following videos provide step-by-step instructions on how to install the above tools:

If you have any questions or run into problems with the installations, please email us at and we will be happy to help you troubleshoot.

Date & Location

Date: February 3, 4, 10, and 11, 2018

Time: 9am-1pm

Location: McKinsey Experience Studio

23 Sultan Street, Toronto


Guest Speakers: 


Dr. Lauren Erdman, Data Scientist & Project Manager, SickKids Hospital

Lauren Erdman is the Project Manager for SickKids Hospital’s new Data Science team and a PhD student in Computer Science at the University of Toronto under the supervision of Anna Goldenberg. Previously, she completed an MSc in Computer Science and an MSc in Biostatistics at the University of Toronto under the supervision of Anna Goldenberg and Lisa Strug, respectively. Her research focuses on developing and applying machine learning methods, primarily for data integration and improved translational discovery in population genetics, genome biology, and complex disease.


Marc Laforet, Software Developer, Cyclica Inc.

Marc discovered programming while using Next-Generation Sequencing (NGS) for his undergraduate research. After teaching himself how to code, he now applies his passion for programming, statistics, and biochemistry to contribute to Cyclica's software product. He has used machine learning to draw data-driven insights about pharmaceuticals. In his free time, Marc can be found volunteering for Ladies Learning Code, making art with his computer, or eating ice cream.


Dr. Ann Meyer, Knowledge and Research Exchange Manager, Ontario Institute for Cancer Research

Ann is the coordinator for the Canadian Bioinformatics Workshops series, the Knowledge and Research Exchange Manager at OICR, and the Chair of the Promotion and Outreach Committee for the Global Organization for Bioinformatics Learning, Education and Training. She received her Ph.D. in Plant Agriculture, with a focus on genetics and plant breeding, from the University of Guelph. She started a postdoctoral fellowship in the Department of Pathobiology before moving to her current role at OICR.


Katie Houlahan, PhD Student, University of Toronto, Ontario Institute for Cancer Research

Katie started her training as a chemist, earning an Honours Bachelor’s Degree in Chemical Biology at McMaster University. Early in her degree she was introduced to the world of computational biology, first through an internship at the Ontario Institute for Cancer Research, where she worked on an international collaboration to evaluate machine learning methods for detecting cancer-causing mutations. Excited by the application of machine learning in healthcare, she next worked on methods to predict cancer drug sensitivity alongside experts at the University of California, Santa Cruz and the Oregon Health & Science University.
Now, Katie is a PhD student in the Department of Medical Biophysics at the University of Toronto, studying the genetic underpinnings of prostate cancer. She is both a Vanier Scholar and a Vector Institute Postgraduate Affiliate, and is excited to share her journey and experience.


Workshop Instructors: 


Fouad Yousif, Data Scientist

Fouad completed his Bachelor of Science degree at McMaster University. During his final year, he took a course in computational biology that sparked his interest in learning programming and using it to solve biological questions. To continue developing his programming skills after graduation, he enrolled at Seneca College and completed a certificate in Bioinformatics.

Since then, he has obtained a Master's in Biostatistics from the University of Toronto, worked as a data scientist handling large sequencing datasets in cancer research, and contributed to many high-impact cancer research publications and computational developments that aim to improve cancer prognosis and enhance patients' quality of life.

Fouad is also a very active member of the teaching community. He has taught a variety of bioinformatics workshops in Toronto, New York, Montreal, and Brazil that helped scientists learn programming and data analysis skills. In addition, he has extensive teaching experience from teaching statistics courses at Seneca College and tutoring high school students in mathematics and statistics.

Fouad believes that one course CAN change a career, as it did for him. He is very excited to share his knowledge and passion with all the students joining The Coding Hive.


Michelle Chan-Seng-Yue, Bioinformatician & College Professor

Michelle is a Research Associate at the Princess Margaret Cancer Centre studying acute lymphoblastic leukemia. She has been working in bioinformatics, the computational analysis of biological data, for over 9 years and has a strong background in high-throughput sequencing. She is also a professor at Seneca College, teaching data analysis as part of the Bioinformatics certificate program.

Over the years she has gained strong computational skills, including programming in Perl and R, data processing, and visualization. She is excited to be involved in The Coding Hive and can’t wait to meet motivated young individuals.


Cindy Yao, Data Scientist

Cindy lived in Shanghai and Vancouver before settling in Toronto for university and work. She obtained an Honours Bachelor’s Degree in Human Biology at the University of Toronto. During her final year of university, she became fascinated by the idea of combining computational work with the study of genomic data.

She went on to pursue her Master's degree in the Department of Medical Biophysics at the University of Toronto. She now works as a data scientist at a research institute, developing biomarkers that improve cancer prognosis. She has over 6 years of experience working with R and big data visualization. She is excited to be joining The Coding Hive and to share some of her experiences and insights with the students.