Regularization: A way to select features¶
We can add a penalty term to the loss/objective function to keep the weights' magnitudes in check.
| Type | Penalty Term |
|---|---|
| Lasso | L1 norm (\(\lambda\cdot\sum_{i=1}^p \lvert w_i \rvert\)) |
| Ridge | Squared L2 norm (\(\lambda\cdot\sum_{i=1}^pw_i^2\)) |
where \(\lambda \ge 0\) is a tuning parameter; choosing a good value for it is critical.
When \(\lambda = 0\), the penalty term has no effect on the loss and we recover ordinary least squares.
If \(\lambda\) is too large, all weights are driven to 0 (the null model).
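To make the two objectives concrete, here is a minimal NumPy sketch of both penalised losses, assuming a linear model \(y \approx Xw\) with squared-error loss; the helper names `lasso_loss` and `ridge_loss` are purely illustrative.

```python
import numpy as np

# Hypothetical helpers illustrating the two penalised objectives.
def lasso_loss(X, y, w, lam):
    residual = y - X @ w
    # squared-error loss + L1 penalty
    return np.sum(residual ** 2) + lam * np.sum(np.abs(w))

def ridge_loss(X, y, w, lam):
    residual = y - X @ w
    # squared-error loss + squared L2 penalty
    return np.sum(residual ** 2) + lam * np.sum(w ** 2)

# lam = 0 recovers plain least squares; a very large lam makes w = 0 optimal.
```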
Ridge Regression¶
- The L2 penalty shrinks the weights, which can substantially reduce the variance of the fitted model.
- Should be applied after standardising the predictors, since the penalty treats all weights alike and is therefore sensitive to each predictor's scale (see the sketch after this list).
- If the number of features \(p\) is large, subset selection requires searching over a very large number (\(2^p\)) of possible models. Here, we need to fit only one model for a given \(\lambda\), so the computation turns out to be very simple.
- Weights tend towards 0 but never actually become 0.
- As a result, the final model includes all predictors, which makes it harder to interpret.
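A minimal sketch of the points above, assuming scikit-learn and a synthetic dataset from `make_regression`; scikit-learn calls the \(\lambda\) parameter `alpha`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Standardise first: the penalty treats all weights alike, so the
# predictors must be on a common scale. alpha plays the role of lambda.
model = make_pipeline(StandardScaler(), Ridge(alpha=10.0))
model.fit(X, y)

coefs = model.named_steps["ridge"].coef_
print(coefs)                # shrunk towards 0 ...
print(np.sum(coefs == 0))   # ... but (in practice) none exactly 0
```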
Lasso Regression¶
- Weights of predictors with low importance can become exactly 0, so the lasso performs feature selection (see the sketch after this list).
- Produces more interpretable models, as they involve only a subset of the predictors.
- As \(\lambda \uparrow\), variance \(\downarrow\) and bias \(\uparrow\).
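Again a hedged sketch with scikit-learn (`alpha` playing the role of \(\lambda\)), on synthetic data where only 3 of 10 features are assumed informative, to show weights being zeroed out:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Only 3 informative features, so the lasso has irrelevant weights to kill.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

# alpha = lambda; larger values typically zero out more weights
# (lower variance, higher bias).
for alpha in (0.1, 1.0, 10.0):
    model = make_pipeline(StandardScaler(), Lasso(alpha=alpha))
    model.fit(X, y)
    n_zero = np.sum(model.named_steps["lasso"].coef_ == 0)
    print(f"alpha={alpha}: {n_zero} of 10 weights exactly 0")
```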