Data Science Essentials

Started May 2, 2018

$150

Full program description

These professional development modules are self-paced (asynchronous). They do not follow a sequential order, so you may choose which modules to complete and in what order. Most modules are delivered in text format, with a few in video format. The entire package is designed to be completed within 12 weeks, at approximately 3 hours per week.

As you work through the modules, we provide Teaching Assistant (TA) support through email and virtual office hours. Although this is a non-credit-bearing package, each module has its own grading policy to measure your learning outcomes. In general, grading is based on assignments and projects, online discussions, and quizzes. Your grades will be aggregated at the end of the 12 weeks so that you can tailor your future learning initiatives.

Short descriptions of each module offered in the package are given below:

Introduction to R

The goal of this course is to teach you how to program in R and how to use R for effective data analysis. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing data graphics. 

Basics of Python

In this course, we will learn the basics of Python. Because hands-on experience reinforces the concepts you have learned, the course ends with an introduction to classification, one of the most commonly performed tasks in data mining and machine learning, and we will implement a recommendation system using the scikit-learn package. We will also introduce you to useful packages such as Matplotlib and scikit-learn, which are widely used in industry.

Introduction to SQL

This is a course on SQL (Structured Query Language) that covers the basic concepts of databases and SQL as a language for accessing them. We will start by introducing the need for databases, building entity-relationship models, and installing MySQL, and then proceed to querying the database using SQL. At the end of the course, you will create a sample database, input data, and perform complex SQL operations on the database to retrieve the desired output.

Basics of Java

The course interweaves two parts. The first part covers basic programming with Java, and the second part applies that knowledge to a project. The two parts will not be strictly separated. Once we are equipped with enough knowledge to take a small step, we make some progress in our project. In the process, we may encounter some confusion and wonder what could be done next. Then we learn something new that helps us resolve the problem and make further progress. This cycle repeats.

Introduction to C++

This course requires no prior knowledge of any programming language, and we will not cover programming languages other than C/C++. However, we will go through the C++ appendix of Eddelbuettel's "Seamless R and C++ Integration with Rcpp," which provides the C/C++ knowledge needed to follow the main text's explanation of how to use Rcpp. After gaining a solid knowledge of standard C/C++, you will have a basic idea of how to write a C/C++ program. Whenever you need to optimize your R and Python code, you will have enough knowledge to start learning Rcpp and Cython on your own.


This course is for students who are not familiar with database concepts and want to learn NoSQL databases such as MongoDB. Students who already know MongoDB and want to work on a project using Python and MongoDB to develop their cross-technology skills will also benefit from this course. We will work with the Yelp data set, which contains more than 1.5 million records and is therefore well suited to learning the MongoDB MapReduce concept.

Linear Algebra

In this course, we will cover the basic Linear Algebra and Calculus used in Machine Learning with Python. The Machine Learning with Python course breaks the topics of machine learning into different Pages. Each Page uses a different level of Linear Algebra and Calculus, from easy to hard, so we organize our course into corresponding parts. By understanding each part (together with the parts before it), we understand the Linear Algebra and Calculus used in a particular Page of the Machine Learning with Python course.

Parts A and B cover the single-variable differential calculus used in Parameter Estimation. Part C covers basic matrix operations and explains in detail when a matrix is invertible. Part D explains the multivariable differential calculus used in Curve Fitting and Error Function.