Regularization: A way to select features

We can add a penalty term to the loss/objective function to put a check on the weights' values.

| Type  | Penalty term |
|-------|--------------|
| Lasso | L1 norm (\(\lambda\cdot\sum_{i=1}^p \lvert w_i \rvert\)) |
| Ridge | Squared L2 norm (\(\lambda\cdot\sum_{i=1}^p w_i^2\)) |

where \(\lambda\) is a non-negative tuning parameter and \(p\) is the number of weights. Choosing a good value for \(\lambda\) is critical.

When \(\lambda = 0\), the penalty term has no effect on the loss.

If \(\lambda\) is too large, all weights are pushed to (or very close to) 0, which gives the null model.
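To make the role of \(\lambda\) concrete, here is a minimal sketch of the two penalty terms in plain Python. The function names, the example weights, and the value of `base_loss` are illustrative assumptions, not part of the notes above.

```python
def l1_penalty(weights, lam):
    """Lasso penalty: lam * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """Ridge penalty: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)


def penalized_loss(base_loss, weights, lam, kind):
    """Add the chosen penalty to an already-computed, unpenalized loss."""
    if kind == "lasso":
        return base_loss + l1_penalty(weights, lam)
    if kind == "ridge":
        return base_loss + l2_penalty(weights, lam)
    raise ValueError("kind must be 'lasso' or 'ridge'")


# Hypothetical numbers, for illustration only.
weights = [1.0, -2.0, 3.0]
base_loss = 10.0

for lam in (0.0, 0.5, 5.0):
    print(
        lam,
        penalized_loss(base_loss, weights, lam, "lasso"),
        penalized_loss(base_loss, weights, lam, "ridge"),
    )
```

With `lam = 0.0` both objectives equal `base_loss`. As `lam` grows, the penalty dominates, so the only way to keep the objective small is to shrink the weights.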

Ridge Regression

  • The penalty shrinks the weights, which can reduce the variance of the fitted model considerably, at the cost of some added bias.
  • Should be applied after standardising the predictors, because the penalty depends on the scale of each weight.
  • If the number of features \(p\) is large, subset selection requires fitting a large number of possible models (\(2^p\) for best subset selection). Here, we need to fit only one model for a given \(\lambda\), and the computation turns out to be very simple.
  • Weights tend towards 0 but never actually become 0.
  • As a result, the final model includes all predictors, which makes it harder to interpret.
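The shrinkage behaviour described above can be sketched for the simplest case: one standardised predictor, a centred response, no intercept, and a squared-error loss. Under those assumptions the ridge weight has the closed form \(w = \sum x_i y_i / (\sum x_i^2 + \lambda)\). The data and helper names below are made up for illustration.

```python
def standardize(values):
    """Centre to mean 0 and scale to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


def ridge_weight(x, y, lam):
    """Minimiser of sum((y_i - w*x_i)^2) + lam * w^2 for a single predictor."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


# Hypothetical data, for illustration only.
x = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
y_raw = [2.1, 3.9, 6.2, 8.1, 9.8]
y_mean = sum(y_raw) / len(y_raw)
y = [v - y_mean for v in y_raw]  # centring y removes the need for an intercept

lambdas = (0.0, 1.0, 10.0, 1000.0)
ridge_path = [ridge_weight(x, y, lam) for lam in lambdas]

for lam, w in zip(lambdas, ridge_path):
    print(lam, w)
```

The denominator grows with `lam` while the numerator stays fixed, so the weight keeps shrinking but is never exactly 0 for any finite `lam`.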

Lasso Regression

  • Weights of less important predictors can become exactly 0.
  • Produces more interpretable models, as they involve only a subset of the predictors.
  • As \(\lambda \uparrow\), variance \(\downarrow\) and bias \(\uparrow\).
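One way to see why lasso weights can reach exactly 0 is coordinate descent with soft-thresholding. The sketch below assumes a squared-error loss, centred data, and no intercept, so the objective is \(\sum_i (y_i - x_i^\top w)^2 + \lambda \sum_j \lvert w_j \rvert\). The toy data and function names are hypothetical, and this is an illustration rather than a production solver.

```python
def soft_threshold(rho, t):
    """Shrink rho towards 0 by t; return exactly 0.0 when |rho| <= t."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimise sum((y_i - x_i . w)^2) + lam * sum(|w_j|) one weight at a time."""
    p = len(X[0])
    w = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # rho: correlation of predictor j with the residual that ignores j
            rho = 0.0
            for xi, yi in zip(X, y):
                partial = sum(xi[k] * w[k] for k in range(p) if k != j)
                rho += xi[j] * (yi - partial)
            z = sum(xi[j] ** 2 for xi in X)
            # Setting the subgradient to zero gives a threshold of lam / 2
            w[j] = soft_threshold(rho, lam / 2.0) / z
    return w


# Toy data: y = 3*x1 + 0.1*x2, so the second predictor matters little.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [-3.1, -2.9, 2.9, 3.1]

w_unpenalized = lasso_coordinate_descent(X, y, lam=0.0)
w_lasso = lasso_coordinate_descent(X, y, lam=2.0)
w_null = lasso_coordinate_descent(X, y, lam=30.0)

print(w_unpenalized, w_lasso, w_null)
```

With a moderate `lam`, the weak predictor's weight is set to exactly 0 while the strong one is only shrunk. With a large enough `lam`, every weight becomes 0, which is the null model.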