## Data Science

### What does this course contain?

This course introduces what data science is and what data scientists do. You’ll discover how data science applies across fields and learn how data analysis helps you make data-driven decisions, build machine learning models, and deploy a data science solution to a web app.

### What do you need?

Basic knowledge of the Python programming language.

### Who is this course for?

▣ Software Engineer

▣ Computer Science Engineer

▣ Data Analyst

### What are the Objectives?

▣ Use principles of statistics and probability to design and execute A/B tests and recommendation engines to assist businesses in making data-automated decisions.

▣ Build machine learning models and make predictions.

▣ Deploy a data science solution to a web app.

▣ Manipulate and analyse distributed datasets using Apache Spark.

▣ Communicate results effectively to stakeholders.

### Course Outline: Basic-level Data Scientist program

**Module 1:**

Introduction to Data Science

The Data Science Process

Communicating to Stakeholders

**Module 2:**

Linear Algebra

Vectors

Linear Combination

Linear Transformation and Matrices
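As a taste of the linear algebra topics above, here is a minimal sketch in plain Python of a linear combination of vectors and of a matrix acting as a linear transformation. The helper names `linear_combination` and `apply_matrix` are hypothetical and chosen only for illustration; in practice a library such as NumPy would normally be used.

```python
def linear_combination(scalars, vectors):
    """Return sum_i scalars[i] * vectors[i] for vectors of equal length."""
    dim = len(vectors[0])
    return [sum(c * v[k] for c, v in zip(scalars, vectors)) for k in range(dim)]


def apply_matrix(matrix, vector):
    """Apply a linear transformation, given as a list of rows, to a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Standard basis vectors of the plane.
e1, e2 = [1, 0], [0, 1]

# v = 3*e1 + 2*e2
v = linear_combination([3, 2], [e1, e2])

# A 90-degree counter-clockwise rotation written as a 2x2 matrix.
rotate_90 = [[0, -1], [1, 0]]
w = apply_matrix(rotate_90, v)

print(v, w)
```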

**Module 3:**

Practical Statistics

Data Types

Measures of centre (mean, median, mode)

Standard Deviation, Variance, Outliers

Probability

Binomial Distribution

Conditional Probability

Bayes' Rule

Normal Distribution theory

Sampling distributions and the Central Limit Theorem

Confidence Intervals

Hypothesis Testing

Type I and Type II errors

P-values

Null Hypothesis, Alternative Hypothesis
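To illustrate the conditional probability topics in this module, the sketch below applies Bayes' rule to a diagnostic-test scenario. The function name `posterior` and all the numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are made up for illustration and are not course data.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' rule.

    prior               = P(condition)
    sensitivity         = P(positive | condition)
    false_positive_rate = P(positive | no condition)
    """
    # Total probability of a positive test (the denominator in Bayes' rule).
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Hypothetical values chosen only to show the calculation.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(p, 3))
```

Even with a fairly accurate test, the posterior stays well below 50% here because the condition is rare, which is the kind of intuition the probability lessons aim to build.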

**Module 4:**

Data Engineering

ETL pipelines

Extract – CSV, JSON, XML, SQL databases

Transform – combining, cleaning, encoding, missing data, duplicate data, dummy data, outlier data, scaling data

Feature Engineering

Load
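The extract, transform, and load stages listed above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: the inline CSV text, the `people` table, and the function names `extract`, `transform`, and `load` are all hypothetical, and a real pipeline would read from files or databases and handle many more cases.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with one missing value and one duplicate row.
RAW_CSV = """name,age
Ada,36
Grace,
Ada,36
Linus,29
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing age, remove duplicates, cast types."""
    seen = set()
    clean = []
    for row in rows:
        if not row["age"]:
            continue  # missing data
        key = (row["name"], row["age"])
        if key in seen:
            continue  # duplicate data
        seen.add(key)
        clean.append((row["name"], int(row["age"])))
    return clean


def load(records, conn):
    """Load: write the cleaned records into a SQL table."""
    conn.execute("CREATE TABLE IF NOT EXISTS people (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute("SELECT name, age FROM people ORDER BY age").fetchall()
print(rows)
```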

**Module 5:**

Database Programming

**Module 6:**

Data Preparation, Data Wrangling

**Module 7:**

Data Visualization

Data Visualization in Data Analysis

Design of Visualizations

Univariate Exploration of Data

Bar Charts

Pie Charts

Histograms

Bivariate Exploration of Data

Scatterplots and Correlation

Overplotting, Transparency, and Jitter

Heat Maps

Violin Plots

Box Plots

Clustered Bar Charts

Faceting

Line Plots

Swarm Plots

Multivariate Exploration of Data

Feature Engineering

Explanatory Visualizations

Data Visualization in Data Analysis – Case Study

### Advanced-level Data Scientist program

**Module 1:**

Time Series Analysis & Forecasting

**Module 2:**

Natural Language Processing

Tokenization

Stop Words

Part-of-Speech Tagging

Named Entity Recognition

Stemming and Lemmatization

Feature extraction

Bag of words

TF-IDF

One-Hot Encoding

Word Embeddings

Modeling
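Several of the steps above (tokenization, stop-word removal, bag of words, TF-IDF) can be sketched in a few lines of plain Python. The three toy documents, the one-word stop list, and the helper names `tokenize` and `tf_idf` are assumptions made for illustration. The formula used is one common unsmoothed TF-IDF variant; libraries often apply smoothing or normalization, so their numbers can differ.

```python
import math
from collections import Counter

# Toy corpus and stop-word list, invented for this example.
docs = ["the cat sat", "the dog sat", "the dog barked"]
STOP_WORDS = {"the"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


# Bag of words: one term-count mapping per document.
bags = [Counter(tokenize(d)) for d in docs]


def tf_idf(term, bag, bags):
    """Term frequency times log inverse document frequency (unsmoothed).

    Assumes the term occurs in at least one document.
    """
    tf = bag[term] / sum(bag.values())
    df = sum(1 for b in bags if term in b)
    return tf * math.log(len(bags) / df)


# "cat" appears in one document, "sat" in two, so "cat" scores higher.
print(tf_idf("cat", bags[0], bags), tf_idf("sat", bags[0], bags))
```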

**Module 3:**

Deep Learning

Introduction to Neural Networks

Implementing Gradient Descent

Training Neural Networks

Keras

Deep Learning with PyTorch

Image classifier project
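The idea behind the gradient descent topic above can be shown on a one-variable function before moving to neural networks. This is a minimal sketch, assuming a fixed learning rate and a hand-written gradient; the function name `gradient_descent` and the quadratic being minimized are hypothetical examples, not the course's implementation.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_min = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(w_min)
```

Training a neural network follows the same loop, with the gradient computed over many weights by backpropagation instead of by hand.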

**Module 4:**

Deploying a Model to the Web / Dashboard

Front-End, HTML, Flask

Deploying the Model

**Module 5:**

Experimental Design and Recommendations

Intro to Experiment Design and Recommendation Engines

Experiment Design & A/B Testing

Concepts in Experiment Design

Types of Experiment

Types of Sampling

Measuring Outcomes

Creating Metrics

Controlling Variables

Checking Validity

Checking Bias

Ethics in Experimentation

A SMART Mnemonic for Experiment Design

Statistical Considerations in Testing

A/B Testing Case Study
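As an illustration of the statistical side of A/B testing covered in this module, the sketch below runs a two-sided two-proportion z-test using only the standard library. The function name `two_proportion_z_test` and the conversion counts are invented for the example, and the normal approximation it relies on assumes reasonably large samples in both groups.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: control converts 200/2000, variant 260/2000.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(z, p_value)
```

A small p-value suggests the observed difference is unlikely under the null hypothesis, but the validity and bias checks listed in this module still apply before acting on the result.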

**Module 6:**

Recommendation Engines