Logistic Regression for Advanced Learners

How is logistic regression better than other algorithms?

Theoretical Answer: No algorithm is in general ‘better’ than another. This is the No Free Lunch theorem, which states that any two optimization algorithms are equivalent when their performance is averaged across all possible problems.

Practical Answer: If you already have your data set up for one of them, simply run both with a holdout set and compare which one does better on whatever measure of performance you care about. How you represent your features can make a huge difference, to the point that one model outperforms another on the exact same task and dataset.
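The holdout comparison described above can be sketched roughly as follows. This is a minimal illustration, not a benchmark: the data is simulated, the variable names (x1, x2, dat) are made up for the example, and a classification tree from the rpart package is assumed as the second model.

```r
# Simulated data stands in for a real dataset (an assumption for illustration).
library(rpart)  # classification trees; a recommended package in standard R installs

set.seed(42)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.5 + 1.2 * x1 - 0.8 * x2))
dat <- data.frame(y = factor(y), x1 = x1, x2 = x2)

# Hold out 30% of the rows for evaluation.
test_idx <- sample(n, size = 0.3 * n)
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

# Model 1: logistic regression, thresholded at 0.5.
fit_glm  <- glm(y ~ x1 + x2, data = train, family = binomial)
prob_glm <- predict(fit_glm, newdata = test, type = "response")
pred_glm <- ifelse(prob_glm > 0.5, "1", "0")

# Model 2: classification tree.
fit_tree  <- rpart(y ~ x1 + x2, data = train, method = "class")
pred_tree <- as.character(predict(fit_tree, newdata = test, type = "class"))

# Compare holdout accuracy (swap in whichever metric matters for the task).
acc_glm  <- mean(pred_glm == test$y)
acc_tree <- mean(pred_tree == test$y)
print(c(logistic = acc_glm, tree = acc_tree))
```

Because both models are scored on the same held-out rows, the comparison is like for like. With a single split the difference can still be noisy, so repeating the split or using cross-validation is a reasonable next step.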

Scientific Answer: Most of the reasons one algorithm outperforms another on a given problem center on the bias-variance trade-off. There is also some empirical work that compares various algorithms across many datasets and draws conclusions about which types of problems tend to do better with trees versus logistic regression.

How to get better accuracy from a logistic regression model?

Generally, if your model is biased (underfitting), you need a more complex model. One way to achieve this is to add more (hopefully good) features. This is not always easy and usually depends on your domain expertise. What you can try out of the box, however, is feature interactions.
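As a rough sketch of what adding an interaction looks like in R, the example below fits a main-effects model and a model with an interaction term, then compares them. The data is simulated so that the interaction is real by construction; the variable names are illustrative only.

```r
set.seed(1)
n  <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
# The true log-odds depend on the product x1 * x2 (by construction).
y  <- rbinom(n, 1, plogis(-0.2 + 0.5 * x1 + 0.5 * x2 + 1.5 * x1 * x2))
dat <- data.frame(y, x1, x2)

# Main effects only.
fit_main <- glm(y ~ x1 + x2, data = dat, family = binomial)

# x1 * x2 expands to x1 + x2 + x1:x2, i.e. main effects plus the interaction.
fit_int <- glm(y ~ x1 * x2, data = dat, family = binomial)

# Likelihood-ratio test: does the interaction term improve the fit?
print(anova(fit_main, fit_int, test = "Chisq"))
print(c(AIC_main = AIC(fit_main), AIC_interaction = AIC(fit_int)))
```

On real data the interaction may or may not help, so it is worth checking the comparison on held-out data rather than relying on the in-sample fit alone.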

If you stick with logistic regression, oversampling your smaller cohort may help improve predictive performance. A similar effect can be achieved with a weighted logistic regression.
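One simple way to oversample the smaller cohort is to resample its rows with replacement until the classes are the same size. The sketch below assumes simulated data in which class 1 is the minority; the names minority, majority and balanced are made up for the example.

```r
set.seed(7)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-3 + 1.5 * x))  # positives are rare by construction
dat <- data.frame(y, x)

minority <- dat[dat$y == 1, ]
majority <- dat[dat$y == 0, ]

# Draw minority rows with replacement until both classes are the same size.
extra    <- minority[sample(nrow(minority), nrow(majority), replace = TRUE), ]
balanced <- rbind(majority, extra)

fit_over <- glm(y ~ x, data = balanced, family = binomial)
print(table(balanced$y))
print(coef(fit_over))
```

Oversampling should be applied to the training rows only. If duplicated rows leak into the holdout set, the performance estimate becomes optimistic.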

Another strategy could be to use either forward or backward feature selection.
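Base R's step() function offers one way to try both directions, using AIC as the selection criterion. The sketch below uses simulated data in which only x1 and x2 carry signal; it illustrates the mechanics and is not a recommendation of AIC-based selection over other approaches.

```r
set.seed(3)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
# Only x1 and x2 affect the outcome; x3 and x4 are noise by construction.
dat$y <- rbinom(n, 1, plogis(1.5 * dat$x1 - 1.5 * dat$x2))

full <- glm(y ~ x1 + x2 + x3 + x4, data = dat, family = binomial)
null <- glm(y ~ 1, data = dat, family = binomial)

# Backward: start from the full model and drop terms while AIC improves.
backward <- step(full, direction = "backward", trace = 0)

# Forward: start from the intercept-only model and add terms,
# with the full model's formula as the upper limit of the search.
forward <- step(null, scope = formula(full), direction = "forward", trace = 0)

print(formula(backward))
print(formula(forward))
```

The two directions do not always agree, and either one can keep a noise variable by chance, so the selected model is best validated on held-out data.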

How to analyze this ROC curve and improve the performance?

Try adding class weights, derived from your label distribution, when computing your loss function. This puts more emphasis on labels that are underrepresented in your dataset because of their low frequency.
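A minimal sketch of class weighting with glm(), assuming inverse-frequency weights (one common choice among several) and simulated data with a rare positive class:

```r
set.seed(11)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-3 + 1.5 * x))  # positives are rare by construction
dat <- data.frame(y, x)

# Weight each row by the inverse frequency of its class.
class_freq <- table(dat$y) / nrow(dat)
dat$w <- as.numeric(1 / class_freq[as.character(dat$y)])

# quasibinomial yields the same coefficient estimates as binomial but avoids
# the "non-integer #successes" warning that fractional weights trigger.
fit_w <- glm(y ~ x, data = dat, family = quasibinomial, weights = w)
fit_u <- glm(y ~ x, data = dat, family = binomial)

print(rbind(unweighted = coef(fit_u), weighted = coef(fit_w)))
```

With these weights each class contributes the same total weight to the loss, which mainly shifts the intercept upward. The predicted probabilities are then no longer calibrated to the original class frequencies, which matters if they are used as probabilities rather than as ranking scores.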

An ROC curve demonstrates several things:

  1. It shows the trade-off between sensitivity and specificity (any increase in sensitivity will be accompanied by a decrease in specificity).
  2. The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test.
  3. The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.
  4. The area under the curve is a measure of test accuracy.
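The points above can be reproduced by hand in base R. The sketch below computes sensitivity and 1 - specificity at every threshold for a logistic regression fitted to simulated data, then estimates the area under the curve with the trapezoidal rule. Packages such as pROC do the same thing; the manual version is shown only to make the construction explicit.

```r
set.seed(5)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.5 * x))
fit   <- glm(y ~ x, family = binomial)
score <- fitted(fit)  # predicted probabilities

# Sweep thresholds from high to low; each one yields a (FPR, TPR) point.
thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))
tpr <- sapply(thresholds, function(t) mean(score[y == 1] >= t))  # sensitivity
fpr <- sapply(thresholds, function(t) mean(score[y == 0] >= t))  # 1 - specificity

# Area under the curve by the trapezoidal rule.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

plot(fpr, tpr, type = "l", xlab = "1 - Specificity", ylab = "Sensitivity")
abline(0, 1, lty = 2)  # 45-degree diagonal: a test no better than chance
print(auc)
```

Here the curve is computed on the training data for brevity. For an honest estimate, score a holdout set instead.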

R Code:

Logistic regression

Data Files:

cr <- read.csv("C:/Users/Stock_BGL/Desktop/R/R_Module_Day_7.2_Credit_Risk_Train_data.csv", na.strings = c("", " ", "NA"))

In the example above, the path C:/Users/Stock_BGL/Desktop/R is specific to the machine the file was saved on. Replace it with the folder where the file is saved on your own machine.

For example, if you saved the file R_Module_Day_7.2_Credit_Risk_Train_data.csv on your desktop, the path to read is the path of your Desktop folder followed by / and the dataset file name.
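One way to keep the machine-specific part in a single place is to build the path with file.path(). In this sketch data_dir and file_name are placeholder variable names, and the folder shown is simply the one from the example above.

```r
# data_dir is a placeholder: point it at the folder holding the CSV on your machine.
data_dir  <- "C:/Users/Stock_BGL/Desktop/R"
file_name <- "R_Module_Day_7.2_Credit_Risk_Train_data.csv"
train_path <- file.path(data_dir, file_name)

if (file.exists(train_path)) {
  # Treat empty strings, single spaces and "NA" as missing values.
  cr <- read.csv(train_path, na.strings = c("", " ", "NA"))
  str(cr)
} else {
  message("File not found: ", train_path)
}
```

Only data_dir needs to change when the files move, and the same variable can be reused for the test and validation files.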

Credit Risk Train data
Credit Risk Test data
Credit Risk Validate data
