Remarks on R2

Weighted regression

Mar 3, 2022

Weighted regression consists on assigning different weights to each observation and hence more or less importance at the time of fitting the regression. On way to look at it is to think as solving the regression problem minimizing Weighted Mean Squared Error(WSME) instead of Mean Squared Error(MSE) \[WMSE(\beta, w) = \frac{1}{N} \sum_{i=1}^n w_i(y_i - \overrightarrow {x_i} \beta)^2\] Intuitively, we are looking fot the coefficients that minimize MSE but putting different weights to each observation.

Inference is not valid in the dataset used for model selection.

Jan 1, 2022

Let’s say we have a dataset and we want to fit a model to it and do some inference such as obtaining the coefficients and look for their confidence intervals. For such a task we would first need to find a model that we think approximates to the real data generating process behind the phenomenon. This will be the model selection step. Then we would look at the output of our model and get the standard error of the coefficients or calculate the confidence interval or any other similar task.

Remarks on R2

Jan 1, 2022

R2 depends on the variance on the variance of the predictors Quoting from Shalizi1 Assuming a true linear model \[ Y = aX + \epsilon\] and assuming we know $a$ exactly. The variance of Y will be $a^2\mathbb{V}[X] + \mathbb{V}[\epsilon]$. So $R^2 = \frac{a^2\mathbb{V}[X]}{a^2\mathbb{V}[X] + \mathbb{V}[\epsilon]}$ This goes to 0 as $\mathbb{V}[X] \rightarrow 0$ and it goes to 1 as $\mathbb{V}[X] \rightarrow \infty$. “It thus has little to do with the quality of the fit, and a lot to do with how spread out the predictor variable is.

Linear Smoothers

Jan 1, 2022

Linear regression as smoothing Let’s assume the DGP (data generating process) is: \[ Y = \mu(x) + \epsilon\] where $\mu(x)$ is the mean Y value for that particular x and $\epsilon$ is an error with mean 0. When running OLS we are trying to approximate $\mu(x)$ with a linear function of the form $\alpha + \beta x$ and trying to retrieve the best $\alpha$ and $\beta$ minimizing the mean-squared error.

Bias Variance Tradeoff

Jan 1, 2022

Mean squared error (MSE) is a measure of how far our prediction is from the true values of the dependent variable. It’s the expectation of the squared error. The squared error being: \[(Y - \hat \mu(x))^2\] where Y is the true value and $ (x)$ is the prediction for a given x. We can decompose it into: \[ (Y - \hat \mu(x))^2 \\ = (Y - \mu(x) + \mu(x) - \hat \mu(x)^2) \\ = (Y - \mu(x))^2 + 2(Y - \mu(x))(\mu(x) - \hat \mu(x)) + (\mu(x) - \hat \mu(x))^2 \]

Spark and Pyspark

Jul 7, 2021

What’s Spark? prueba The definition says: Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters >through YARN or Spark’s standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any >Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new >workloads like streaming, interactive queries, and machine learning. Basically is a framework to work with big amounts of data stored in distributed systems instead of just one machine.

Softmax vs sigmoid

Jan 1, 2021

When using Neural Nets for a multiclass classification problem it’s standard to have a softmax layer at the end to normalize the probabilities for each class. This means that the output of our net is a vector of probabilities (one for each class) that sums to 1. If there isn’t a softmax layer at the end, then the net will output a value in each of the last cells (one for each class) but without a delimited range.

Counter Strike: chance of winning

Oct 10, 2020

This CS GO Kaggle link has data about several competitive CS GO matches. In a few words: those are 5 vs 5 matches where each team tries to kill the other or complete a task (planting or defusing the bomb depending the role you are playing) before the time expires. The goal is to win 16 rounds before the other team. After 15 rounds both teams switch sides/role.

Non Negative Matrix Factorization

Oct 10, 2020

Please follow this link It was made with Flexboard (a package to do dashboards in R) so I think it’s only visualized correctly in laptops/pc because of the layout.

Distribucion Dirichlet como prior de Multinomial

Jul 7, 2020

Basado en: http://www.mas.ncl.ac.uk/~nmf16/teaching/mas3301/week6.pdf http://www.inf.ed.ac.uk/teaching/courses/mlpr/assignments/multinomial.pdf La distribución Dirichlet es una distribución multivariada para un conjunto de cantidades $\theta_i,...,\theta_m$ donde $\theta_i >= 0$ y $\sum_{i=1}^m \theta_i = 1$. Esto la hace una candidata útil para modelar un conjunto de probabilidades de una partición (un un conjunto de eventos mutuamente excluyentes). Es decir, un grupo de probabilides de eventos excluyentes, que sumen 1. Podemos remplazar los $\theta$ por $p$ si es más claro que hablamos de probabilidades luego.

Remarks on R2

R2 depends on the variance on the variance of the predictors

Franco Betteo

Weighted regression

Inference is not valid in the dataset used for model selection.

Remarks on R2

Linear Smoothers

Bias Variance Tradeoff

Spark and Pyspark

Softmax vs sigmoid

Counter Strike: chance of winning

Non Negative Matrix Factorization

Distribucion Dirichlet como prior de Multinomial