Introduction

Logistic Regression in Python – Introduction ”; Previous Next Logistic Regression is a statistical method of classification of objects. This chapter will give an introduction to logistic regression with the help of some examples. Classification To understand logistic regression, you should know what classification means. Let us consider the following examples to understand this better − A doctor classifies the tumor as malignant or benign. A bank transaction may be fraudulent or genuine. For many years, humans have been performing such tasks – albeit they are error-prone. The question is can we train machines to do these tasks for us with a better accuracy? One such example of machine doing the classification is the email Client on your machine that classifies every incoming mail as “spam” or “not spam” and it does it with a fairly large accuracy. The statistical technique of logistic regression has been successfully applied in email client. In this case, we have trained our machine to solve a classification problem. Logistic Regression is just one part of machine learning used for solving this kind of binary classification problem. There are several other machine learning techniques that are already developed and are in practice for solving other kinds of problems. If you have noted, in all the above examples, the outcome of the predication has only two values – Yes or No. We call these as classes – so as to say we say that our classifier classifies the objects in two classes. In technical terms, we can say that the outcome or target variable is dichotomous in nature. There are other classification problems in which the output may be classified into more than two classes. For example, given a basket full of fruits, you are asked to separate fruits of different kinds. Now, the basket may contain Oranges, Apples, Mangoes, and so on. So when you separate out the fruits, you separate them out in more than two classes. This is a multivariate classification problem. Print Page Previous Next Advertisements ”;

Testing

Logistic Regression in Python – Testing ”; Previous Next We need to test the above created classifier before we put it into production use. If the testing reveals that the model does not meet the desired accuracy, we will have to go back in the above process, select another set of features (data fields), build the model again, and test it. This will be an iterative step until the classifier meets your requirement of desired accuracy. So let us test our classifier. Predicting Test Data To test the classifier, we use the test data generated in the earlier stage. We call the predict method on the created object and pass the X array of the test data as shown in the following command − In [24]: predicted_y = classifier.predict(X_test) This generates a single dimensional array for the entire training data set giving the prediction for each row in the X array. You can examine this array by using the following command − In [25]: predicted_y The following is the output upon the execution the above two commands − Out[25]: array([0, 0, 0, …, 0, 0, 0]) The output indicates that the first and last three customers are not the potential candidates for the Term Deposit. You can examine the entire array to sort out the potential customers. To do so, use the following Python code snippet − In [26]: for x in range(len(predicted_y)): if (predicted_y[x] == 1): print(x, end=”t”) The output of running the above code is shown below − The output shows the indexes of all rows who are probable candidates for subscribing to TD. You can now give this output to the bank’s marketing team who would pick up the contact details for each customer in the selected row and proceed with their job. Before we put this model into production, we need to verify the accuracy of prediction. Verifying Accuracy To test the accuracy of the model, use the score method on the classifier as shown below − In [27]: print(”Accuracy: {:.2f}”.format(classifier.score(X_test, Y_test))) The screen output of running this command is shown below − Accuracy: 0.90 It shows that the accuracy of our model is 90% which is considered very good in most of the applications. Thus, no further tuning is required. Now, our customer is ready to run the next campaign, get the list of potential customers and chase them for opening the TD with a probable high rate of success. Print Page Previous Next Advertisements ”;

Summary

Logistic Regression in Python – Summary ”; Previous Next Logistic Regression is a statistical technique of binary classification. In this tutorial, you learned how to train the machine to use logistic regression. Creating machine learning models, the most important requirement is the availability of the data. Without adequate and relevant data, you cannot simply make the machine to learn. Once you have data, your next major task is cleansing the data, eliminating the unwanted rows, fields, and select the appropriate fields for your model development. After this is done, you need to map the data into a format required by the classifier for its training. Thus, the data preparation is a major task in any machine learning application. Once you are ready with the data, you can select a particular type of classifier. In this tutorial, you learned how to use a logistic regression classifier provided in the sklearn library. To train the classifier, we use about 70% of the data for training the model. We use the rest of the data for testing. We test the accuracy of the model. If this is not within acceptable limits, we go back to selecting the new set of features. Once again, follow the entire process of preparing data, train the model, and test it, until you are satisfied with its accuracy. Before taking up any machine learning project, you must learn and have exposure to a wide variety of techniques which have been developed so far and which have been applied successfully in the industry. Print Page Previous Next Advertisements ”;

Quick Guide

Logistic Regression in Python – Quick Guide ”; Previous Next Logistic Regression in Python – Introduction Logistic Regression is a statistical method of classification of objects. This chapter will give an introduction to logistic regression with the help of some examples. Classification To understand logistic regression, you should know what classification means. Let us consider the following examples to understand this better − A doctor classifies the tumor as malignant or benign. A bank transaction may be fraudulent or genuine. For many years, humans have been performing such tasks – albeit they are error-prone. The question is can we train machines to do these tasks for us with a better accuracy? One such example of machine doing the classification is the email Client on your machine that classifies every incoming mail as “spam” or “not spam” and it does it with a fairly large accuracy. The statistical technique of logistic regression has been successfully applied in email client. In this case, we have trained our machine to solve a classification problem. Logistic Regression is just one part of machine learning used for solving this kind of binary classification problem. There are several other machine learning techniques that are already developed and are in practice for solving other kinds of problems. If you have noted, in all the above examples, the outcome of the predication has only two values – Yes or No. We call these as classes – so as to say we say that our classifier classifies the objects in two classes. In technical terms, we can say that the outcome or target variable is dichotomous in nature. There are other classification problems in which the output may be classified into more than two classes. For example, given a basket full of fruits, you are asked to separate fruits of different kinds. Now, the basket may contain Oranges, Apples, Mangoes, and so on. So when you separate out the fruits, you separate them out in more than two classes. This is a multivariate classification problem. Logistic Regression in Python – Case Study Consider that a bank approaches you to develop a machine learning application that will help them in identifying the potential clients who would open a Term Deposit (also called Fixed Deposit by some banks) with them. The bank regularly conducts a survey by means of telephonic calls or web forms to collect information about the potential clients. The survey is general in nature and is conducted over a very large audience out of which many may not be interested in dealing with this bank itself. Out of the rest, only a few may be interested in opening a Term Deposit. Others may be interested in other facilities offered by the bank. So the survey is not necessarily conducted for identifying the customers opening TDs. Your task is to identify all those customers with high probability of opening TD from the humongous survey data that the bank is going to share with you. Fortunately, one such kind of data is publicly available for those aspiring to develop machine learning models. This data was prepared by some students at UC Irvine with external funding. The database is available as a part of UCI Machine Learning Repository and is widely used by students, educators, and researchers all over the world. The data can be downloaded from here. In the next chapters, let us now perform the application development using the same data. Setting Up a Project In this chapter, we will understand the process involved in setting up a project to perform logistic regression in Python, in detail. Installing Jupyter We will be using Jupyter – one of the most widely used platforms for machine learning. If you do not have Jupyter installed on your machine, download it from here. For installation, you can follow the instructions on their site to install the platform. As the site suggests, you may prefer to use Anaconda Distribution which comes along with Python and many commonly used Python packages for scientific computing and data science. This will alleviate the need for installing these packages individually. After the successful installation of Jupyter, start a new project, your screen at this stage would look like the following ready to accept your code. Now, change the name of the project from Untitled1 to “Logistic Regression” by clicking the title name and editing it. First, we will be importing several Python packages that we will need in our code. Importing Python Packages For this purpose, type or cut-and-paste the following code in the code editor − In [1]: # import statements import pandas as pd import numpy as np import matplotlib.pyplot as plt from sklearn import preprocessing from sklearn.linear_model import LogisticRegression from sklearn.model_selection import train_test_split Your Notebook should look like the following at this stage − Run the code by clicking on the Run button. If no errors are generated, you have successfully installed Jupyter and are now ready for the rest of the development. The first three import statements import pandas, numpy and matplotlib.pyplot packages in our project. The next three statements import the specified modules from sklearn. Our next task is to download the data required for our project. We will learn this in the next chapter. Logistic Regression in Python – Getting Data The steps involved in getting data for performing logistic regression in Python are discussed in detail in this chapter. Downloading Dataset If you have not already downloaded the UCI dataset mentioned earlier, download it now from here. Click on the Data Folder. You will see the following screen − Download the bank.zip file by clicking on the given link. The zip file contains the following files − We will use the bank.csv file for our model development. The bank-names.txt file contains the description of the database that you are going to need later. The bank-full.csv contains a much larger dataset that you may use for more advanced developments. Here we have included the bank.csv file in the

Setting up a Project

Setting Up a Project ”; Previous Next In this chapter, we will understand the process involved in setting up a project to perform logistic regression in Python, in detail. Installing Jupyter We will be using Jupyter – one of the most widely used platforms for machine learning. If you do not have Jupyter installed on your machine, download it from here. For installation, you can follow the instructions on their site to install the platform. As the site suggests, you may prefer to use Anaconda Distribution which comes along with Python and many commonly used Python packages for scientific computing and data science. This will alleviate the need for installing these packages individually. After the successful installation of Jupyter, start a new project, your screen at this stage would look like the following ready to accept your code. Now, change the name of the project from Untitled1 to “Logistic Regression” by clicking the title name and editing it. First, we will be importing several Python packages that we will need in our code. Importing Python Packages For this purpose, type or cut-and-paste the following code in the code editor − In [1]: # import statements import pandas as pd import numpy as np import matplotlib.pyplot as plt from sklearn import preprocessing from sklearn.linear_model import LogisticRegression from sklearn.model_selection import train_test_split Your Notebook should look like the following at this stage − Run the code by clicking on the Run button. If no errors are generated, you have successfully installed Jupyter and are now ready for the rest of the development. The first three import statements import pandas, numpy and matplotlib.pyplot packages in our project. The next three statements import the specified modules from sklearn. Our next task is to download the data required for our project. We will learn this in the next chapter. Print Page Previous Next Advertisements ”;