Simple NN Demo for Binary Classification

 

The Pima Indian dataset poses a binary classification problem: predicting whether a person has diabetes from the 8 features in the dataset. This is a classic medical problem, where it would be valuable to predict whether further tests are necessary. Only if a person is predicted to be possibly diabetic would it be advisable to have these tests done. The data covers 768 females, all at least 21 years old, and their diabetes status. The first few data entries are:

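| Pregnancies | Glucose | Blood Pressure | Skin Thickness | Insulin | BMI | Diabetes Pedigree Function | Age | Outcome |
|---|---|---|---|---|---|---|---|---|
| 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |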

The summary statistics of the features are:

| Statistic | Pregnancies | Glucose | Blood Pressure | Skin Thickness | Insulin | BMI | Diabetes Pedigree Function | Age |
|---|---|---|---|---|---|---|---|---|
| Count | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 |
| Mean | 3.8450 | 120.8945 | 69.1054 | 20.5364 | 79.7994 | 31.9925 | 0.4718 | 33.2408 |
| Std | 3.3695 | 31.972618 | 19.355807 | 15.952218 | 115.244002 | 7.884160 | 0.331329 | 11.760232 |
| Min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.078000 | 21.000000 |
| 25% | 1.000000 | 99.000000 | 62.000000 | 0.000000 | 0.000000 | 27.300000 | 0.243750 | 24.000000 |
| 50% | 3.000000 | 117.000000 | 72.000000 | 23.000000 | 30.500000 | 32.000000 | 0.372500 | 29.000000 |
| 75% | 6.000000 | 140.250000 | 80.000000 | 32.000000 | 127.250000 | 36.600000 | 0.626250 | 41.000000 |
| Max | 17.000000 | 199.000000 | 122.000000 | 99.000000 | 846.000000 | 67.100000 | 2.420000 | 81.000000 |

Let us look at the correlations between the features and the outcome (diabetic or non-diabetic) in tabular form:

|   | Pregnancies | Glucose | Blood Pressure | Skin Thickness | Insulin | BMI | Diabetes Pedigree Function | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| Pregnancies | 1.000000 | 0.129459 | 0.141282 | -0.081672 | -0.073535 | 0.017683 | -0.033523 | 0.544341 | 0.221898 |
| Glucose | 0.129459 | 1.000000 | 0.152590 | 0.057328 | 0.331357 | 0.221071 | 0.137337 | 0.263514 | 0.466581 |
| Blood Pressure | 0.141282 | 0.152590 | 1.000000 | 0.207371 | 0.088933 | 0.281805 | 0.041265 | 0.239528 | 0.065068 |
| Skin Thickness | -0.081672 | 0.057328 | 0.207371 | 1.000000 | 0.436783 | 0.392573 | 0.183928 | -0.113970 | 0.074752 |
| Insulin | -0.073535 | 0.331357 | 0.088933 | 0.436783 | 1.000000 | 0.197859 | 0.185071 | -0.042163 | 0.130548 |
| BMI | 0.017683 | 0.221071 | 0.281805 | 0.392573 | 0.197859 | 1.000000 | 0.140647 | 0.036242 | 0.292695 |
| Diabetes Pedigree Function | -0.033523 | 0.137337 | 0.041265 | 0.183928 | 0.185071 | 0.140647 | 1.000000 | 0.033561 | 0.173844 |
| Age | 0.544341 | 0.263514 | 0.239528 | -0.113970 | -0.042163 | 0.036242 | 0.033561 | 1.000000 | 0.238356 |
| Outcome | 0.221898 | 0.466581 | 0.065068 | 0.074752 | 0.130548 | 0.292695 | 0.173844 | 0.238356 | 1.000000 |
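To make the table entries concrete, here is a minimal sketch of how a single correlation value can be computed. It assumes the table holds Pearson correlation coefficients (the post does not say which method was used; Pearson is the default of pandas `DataFrame.corr`). The helper name `pearson` is made up for this illustration, and the input is only the five sample rows listed earlier, so the result will not match the 0.466581 obtained on all 768 rows.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Glucose and Outcome columns of the five sample rows shown earlier.
glucose = [148, 85, 183, 89, 137]
outcome = [1, 0, 1, 0, 1]

r = pearson(glucose, outcome)
print(f"Glucose vs Outcome (5 sample rows only): {r:.4f}")
```

Applying the same function to every pair of columns would fill in the whole matrix, which is symmetric with ones on the diagonal.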

From the table above, the outcome correlates most strongly with Glucose (0.47), followed by BMI (0.29), Age (0.24) and Pregnancies (0.22). Since this is a binary classification problem, we can use traditional machine learning classification algorithms. The accuracies we obtain are:

Logistic Regression scoring:  mean: 0.732609 (Std. Deviation: 0.075963)

K Neighbors scoring:  mean: 0.680435 (Std. Deviation: 0.065254)

Decision Tree Classifier scoring:  mean: 0.654348 (Std. Deviation: 0.066260)

Gaussian naive Bayes scoring:  mean: 0.726087 (Std. Deviation: 0.061641)

Support Vector Machine scoring:  mean: 0.619565 (Std. Deviation: 0.087092)
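The "mean (Std. Deviation)" format suggests that each score is an average over several evaluation rounds. The post does not state the procedure, so the following is only a sketch under the assumption of k-fold cross-validation. The function names, the toy threshold classifier and the synthetic data are all hypothetical stand-ins, not the models or data behind the scores above.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Return (mean, std) of the accuracy over k train/test splits."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, test on the i-th.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict(model, X[j]) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    # Population standard deviation, as NumPy's std() computes by default.
    return statistics.mean(scores), statistics.pstdev(scores)


# Toy classifier (hypothetical): threshold halfway between the class means.
def fit_threshold(X, y):
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Synthetic one-feature data, NOT the Pima dataset.
rng = random.Random(42)
y = [rng.randint(0, 1) for _ in range(200)]
X = [rng.gauss(140 if label else 110, 25) for label in y]

mean_acc, std_acc = cross_val_accuracy(fit_threshold, predict_threshold, X, y, k=10)
print(f"Threshold classifier scoring:  mean: {mean_acc:.6f} (Std. Deviation: {std_acc:.6f})")
```

Reporting the spread next to the mean matters here: with standard deviations of 0.06 to 0.09, differences of a few percentage points between the algorithms above are within the fold-to-fold variation.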

Now, let us build a feed-forward neural network using Keras. The network has 3 layers: an input layer, one hidden layer and an output layer. The activation function of the first layer is ‘ReLU’, while for the next layer it is ‘sigmoid’. The loss function is ‘binary cross-entropy’ and the optimizer is ‘adam’. The model is trained for 150 epochs with a batch size of 10. It reaches an accuracy of about 75%, which is higher than the scores of the traditional machine learning algorithms above, and that with only 3 layers.
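To show what this architecture computes, here is a plain-Python sketch of one forward pass and the loss for a single sample. It is not the author's Keras code: the hidden layer size of 12 and the random weights are assumptions made for illustration (the post does not give the number of hidden units), and training with the optimizer is left out. In Keras the same pieces would correspond to `Dense` layers with `activation='relu'` and `activation='sigmoid'`, and `loss='binary_crossentropy'`.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, w2, b2):
    """Predicted probability of diabetes for one 8-feature sample x."""
    # Hidden layer: weighted sum per unit, followed by ReLU.
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    # Output layer: a single unit with a sigmoid, giving a value in (0, 1).
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


def binary_cross_entropy(y_true, p, eps=1e-7):
    """Loss for one sample; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


random.seed(0)
N_FEATURES = 8   # the 8 features of the dataset
N_HIDDEN = 12    # assumption: not stated in the post

# Small random weights stand in for weights that training would learn.
W1 = [[random.uniform(-0.1, 0.1) for _ in range(N_FEATURES)]
      for _ in range(N_HIDDEN)]
b1 = [0.0] * N_HIDDEN
w2 = [random.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]
b2 = 0.0

# First sample row from the data above; its Outcome is 1.
x = [6, 148, 72, 35, 0, 33.6, 0.627, 50]
p = forward(x, W1, b1, w2, b2)
loss = binary_cross_entropy(1, p)
print(f"predicted probability: {p:.4f}, loss: {loss:.4f}")
```

Training repeats this computation over batches of samples (10 at a time in the setup described above) and adjusts the weights to reduce the average loss. A prediction counts as "diabetic" when the output probability exceeds 0.5.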

This illustrates how powerful neural networks can be, even in a very small configuration.

You can find the working code here

 

About the author: sagarjain2030
