Description
The course's central concept is the random variable, that is, a variable whose value is determined by the outcome of a random experiment. We use random variables to model the data-generating processes we want to investigate. Properties of data, such as averages, spread, and correlations, are closely tied to the expected value, variance, and correlation of the underlying random variables. Dependencies between random variables are key to predicting unknown quantities from known ones, which is the foundation of supervised machine learning. We begin with conditional probability and independent events, then introduce and investigate the properties of two major classes of random variables: discrete and continuous. Finally, we learn about various types of data and how they relate to random variables.
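As a rough illustration of how properties of data reflect properties of a random variable, the sketch below simulates a binomial random variable with NumPy and compares the sample mean and sample variance of the simulated data with the theoretical values E[X] = np and Var(X) = np(1 - p). The parameters n = 10, p = 0.3, the sample size, and the seed are arbitrary choices made only for this example.

```python
import numpy as np

# Arbitrary illustrative parameters (assumptions, not taken from the course)
n, p = 10, 0.3
rng = np.random.default_rng(seed=42)

# Draw 100,000 independent observations of X ~ Binomial(n, p)
sample = rng.binomial(n, p, size=100_000)

# Theoretical expected value and variance of the random variable
expected_value = n * p          # E[X] = np
variance = n * p * (1 - p)      # Var(X) = np(1 - p)

# Descriptive statistics of the simulated data
sample_mean = sample.mean()
sample_var = sample.var(ddof=1)  # unbiased sample variance

print(f"E[X]   = {expected_value:.3f}, sample mean     = {sample_mean:.3f}")
print(f"Var(X) = {variance:.3f}, sample variance = {sample_var:.3f}")
```

With a large sample, the printed sample statistics should land close to the theoretical values, which is the sense in which data properties are linked to random variable properties.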
Syllabus:
1. Conditional probability and independence
- Conditional probability. Motivation and example
- Conditional probability. Definition
- Independent events. Example
- Independent events. Definition
- Mosaic plot. Visualization of conditional probabilities and independence
- Using independence to find probabilities. Examples
- Pairwise and mutual independence
- Bernoulli scheme
- Law of total probability
- Bayes's rule
- Python for conditional probabilities
2. Random variables
- Examples of random variables
- Mathematical definition of a random variable
- Probability distribution and probability mass function (PMF)
- Binomial distribution
- Expected value of a random variable. Motivation and definition
- Expected value example and calculation
- Expected value as the best prediction
- Variance of a random variable. Motivation and definition
- Discrete random variables with infinitely many values
- Saint Petersburg paradox. Example of an infinite expected value
- Geometric and Poisson distributions
- Generating discrete random variables with Python
- NumPy, SciPy, and Matplotlib for generating and visualizing common distributions
3. Systems of random variables. Properties of expectation and variance. Covariance and correlation
- Linear transformations of random variables
- Linearity of expected value
- Symmetric distributions and their expected values
- Functions of random variables
- Properties of variance
- Sum of random variables. Expected value and variance
- Joint probability distribution
- Marginal distribution
- Independent random variables
- Another example of non-independent random variables
- Expected value of the product of independent random variables
- Variance of a sum of random variables. Covariance
- Properties of covariance
- Correlation of two random variables
4. Continuous random variables
- Continuous random variables. Motivation and example
- Probability density function (PDF)
- Cumulative distribution function (CDF)
- Properties of CDF
- Linking PDF and CDF
- Examples of probability density functions
- Histogram as approximation to a graph of PDF
- Expected value of a continuous random variable
- Variance of a continuous random variable. Properties of expected value and variance
- Transformations of continuous random variables and their PDFs
- Joint CDF and PDF. Level charts. Marginal PDF
- Independence, covariance and correlation of continuous random variables
- Mixed random variables. Example
- Generating and visualizing continuous random variables with Python
- Generating correlated random variables with Python
5. From random variables to statistical data. Data summarization and descriptive statistics.
- Basic statistical model
- Variable types in statistics. Conversion of categorical variables to numeric
- Measures of central tendency. Average, median and mode
- Measures of statistical dispersion. Sample variance, quartiles and interquartile range
- Distribution visualization. Histograms and bar plots
- Descriptive statistics of a sample vs. the population
- Descriptive statistics in Pandas
- Basic visualizations of statistical data in Python
- Converting columns in dataframes
6. Correlations and visualizations
- Two numeric variables. Scatter plots and level curves of the population's joint PDF
- Sample covariance and Pearson's correlation
- Correlation vs causation
- Rank correlations for non-linearly dependent data and ordered categorical data
- Finding correlations in Pandas
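To illustrate the difference between Pearson's correlation and rank correlation in Pandas, the sketch below builds a small hypothetical DataFrame in which y is a strictly increasing but nonlinear function of x (y = exp(x), an arbitrary choice) and compares `DataFrame.corr` with `method="pearson"` and `method="spearman"`. The column names and data are invented for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y is a strictly increasing, nonlinear function of x
x = np.linspace(0, 5, 50)
df = pd.DataFrame({"x": x, "y": np.exp(x)})

# Pearson's correlation measures the strength of a *linear* relationship
pearson = df.corr(method="pearson").loc["x", "y"]

# Spearman's rank correlation measures a *monotonic* relationship
spearman = df.corr(method="spearman").loc["x", "y"]

# Spearman's coefficient equals Pearson's correlation computed on the ranks
ranks_pearson = df.rank().corr(method="pearson").loc["x", "y"]

print(f"Pearson:             {pearson:.3f}")
print(f"Spearman:            {spearman:.3f}")
print(f"Pearson of ranks:    {ranks_pearson:.3f}")
```

Because exp is strictly increasing, the ranks of x and y coincide, so Spearman's coefficient is 1 while Pearson's coefficient is noticeably below 1, even though y is completely determined by x.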