Regression Algorithms – Overview

Introduction to Regression

Regression is another important and widely used statistical and machine learning tool. The key objective of regression-based tasks is to predict output labels or responses that are continuous numeric values for the given input data. The output is based on what the model learned in the training phase. Regression models use the input features (independent variables) and their corresponding continuous numeric output values (dependent or outcome variables) to learn the specific association between inputs and outputs.

Types of Regression Models

Regression models are of the following two types −

Simple regression model − This is the most basic regression model, in which predictions are formed from a single, univariate feature of the data.

Multiple regression model − As the name implies, in this regression model the predictions are formed from multiple features of the data.

Building a Regressor in Python

A regressor in Python can be constructed just as we constructed the classifier. Scikit-learn, a Python library for machine learning, can also be used to build a regressor. In the following example, we build a basic regression model that fits a line to the data, i.e. a linear regressor. The necessary steps are as follows −

Step 1: Importing necessary Python packages

To build a regressor using scikit-learn, we need to import it along with other necessary packages. We can import them using the following script −

    import numpy as np
    from sklearn import linear_model
    import sklearn.metrics as sm
    import matplotlib.pyplot as plt

Step 2: Importing dataset

After importing the necessary packages, we need a dataset to build the regression model. We can import one from the sklearn datasets or use another as required. We are going to use our saved input data.
We can point to it with the help of the following script −

    # named input_file rather than input, to avoid shadowing the built-in input()
    input_file = r"C:linear.txt"

Next, we need to load this data. We use the np.loadtxt function to load it. The last column holds the target and the remaining columns hold the features.

    input_data = np.loadtxt(input_file, delimiter=",")
    X, y = input_data[:, :-1], input_data[:, -1]

Step 3: Organizing data into training & testing sets

Since we need to test our model on unseen data, we divide our dataset into two parts: a training set and a test set. The following commands do this −

    training_samples = int(0.6 * len(X))
    testing_samples = len(X) - training_samples
    X_train, y_train = X[:training_samples], y[:training_samples]
    X_test, y_test = X[training_samples:], y[training_samples:]

Step 4: Model training & prediction

After dividing the data into training and testing sets, we need to build the model. We use the LinearRegression() class of Scikit-learn for this purpose. The following command creates a linear regressor object.

    reg_linear = linear_model.LinearRegression()

Next, train this model with the training samples as follows −

    reg_linear.fit(X_train, y_train)

Finally, we make predictions on the testing data.

    y_test_pred = reg_linear.predict(X_test)

Step 5: Plot & visualization

After prediction, we can plot and visualize the result with the help of the following script −

Example

    plt.scatter(X_test, y_test, color="red")
    plt.plot(X_test, y_test_pred, color="black", linewidth=2)
    plt.xticks(())
    plt.yticks(())
    plt.show()

Output

The resulting plot shows the regression line drawn through the test data points.
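Step 3 above takes the first 60% of the rows for training and the rest for testing. If the rows of the file happen to be sorted, such a split can give the model an unrepresentative test set, so shuffling before splitting is a common alternative. The following is a minimal sketch of that idea using only NumPy; the helper name shuffled_split, its parameters, and the synthetic data standing in for the input file are assumptions made for illustration and are not part of the original tutorial.

```python
import numpy as np

def shuffled_split(X, y, train_fraction=0.6, seed=0):
    """Shuffle the row order, then split into training and testing parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))            # random ordering of row indices
    n_train = int(train_fraction * len(X))     # same 60/40 rule as Step 3
    train_idx, test_idx = order[:n_train], order[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Synthetic stand-in for the tutorial's data file (an assumption)
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0

X_train, X_test, y_train, y_test = shuffled_split(X, y)
print(X_train.shape, X_test.shape)
```

Scikit-learn also provides train_test_split in sklearn.model_selection, which shuffles by default and could be used in place of a hand-written helper.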
Step 6: Performance computation

We can also measure the performance of our regression model with the help of various performance metrics as follows −

Example

    print("Regressor model performance:")
    print("Mean absolute error(MAE) =", round(sm.mean_absolute_error(y_test, y_test_pred), 2))
    print("Mean squared error(MSE) =", round(sm.mean_squared_error(y_test, y_test_pred), 2))
    print("Median absolute error =", round(sm.median_absolute_error(y_test, y_test_pred), 2))
    print("Explained variance score =", round(sm.explained_variance_score(y_test, y_test_pred), 2))
    print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))

Output

    Regressor model performance:
    Mean absolute error(MAE) = 1.78
    Mean squared error(MSE) = 3.89
    Median absolute error = 2.01
    Explained variance score = -0.09
    R2 score = -0.09

Note that a negative R2 score means the model does worse on the test set than simply predicting the mean of the test targets.

Types of ML Regression Algorithms

The most useful and popular ML regression algorithm is linear regression, which is further divided into two types, namely −

Simple Linear Regression algorithm

Multiple Linear Regression algorithm

We will discuss both and implement them in Python in the next chapter.

Applications

The applications of ML regression algorithms are as follows −

Forecasting or Predictive analysis − One of the important uses of regression is forecasting or predictive analysis. For example, we can forecast GDP, oil prices or, in simple words, any quantitative data that changes with the passage of time.

Optimization − We can optimize business processes with the help of regression. For example, a store manager can create a statistical model to understand the peak times at which customers arrive.

Error correction − In business, taking the correct decision is as important as optimizing the business process. Regression can help us take the correct decision as well as correct a decision that has already been implemented.

Economics − It is the most used tool in economics. We can use regression to predict supply, demand, consumption, inventory investment etc.
Finance − A financial company is always interested in minimizing portfolio risk and wants to know the factors that affect its customers. All of these can be predicted with the help of a regression model.
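To make the metrics printed in Step 6 more concrete, the sketch below computes a few of them directly from their definitions with NumPy. It is an illustration under stated assumptions rather than scikit-learn's implementation: the function name regression_metrics and the four sample values are hypothetical, chosen so the arithmetic can be checked by hand.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, median absolute error and R2 from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    return {
        "mae": np.mean(np.abs(residuals)),
        "mse": np.mean(residuals ** 2),
        "median_ae": np.median(np.abs(residuals)),
        "r2": 1.0 - ss_res / ss_tot,                   # below 0 when ss_res > ss_tot
    }

# Hypothetical targets and predictions, small enough to verify by hand
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 10.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

For these values the residuals are 0.5, 0, -0.5 and -1, so the MAE is 0.5, the MSE is 0.375 and R2 is 1 - 1.5/20 = 0.925.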
Classification Algorithms – Logistic Regression

Introduction to Logistic Regression

Logistic regression is a supervised learning classification algorithm used to predict the probability of a target variable. The target or dependent variable is dichotomous, which means there are only two possible classes. In simple words, the dependent variable is binary, with data coded as either 1 (success/yes) or 0 (failure/no).

Mathematically, a logistic regression model predicts P(Y=1) as a function of X. It is one of the simplest ML algorithms and can be used for various classification problems such as spam detection, diabetes prediction and cancer detection.

Types of Logistic Regression

Generally, logistic regression means binary logistic regression with a binary target variable, but it can also predict two further categories of target variable. Based on the number of categories, logistic regression can be divided into the following types −

Binary or Binomial

In this kind of classification, the dependent variable has only two possible types, either 1 or 0. For example, these values may represent success or failure, yes or no, win or loss etc.

Multinomial

In this kind of classification, the dependent variable can have 3 or more possible unordered types, i.e. types having no quantitative significance. For example, these values may represent "Type A", "Type B" or "Type C".

Ordinal

In this kind of classification, the dependent variable can have 3 or more possible ordered types, i.e. types having a quantitative significance. For example, these values may represent "poor", "good", "very good" and "excellent", and each category can be given a score such as 0, 1, 2, 3.
Logistic Regression Assumptions

Before diving into the implementation of logistic regression, we must be aware of the following assumptions −

In the case of binary logistic regression, the target variable must always be binary, and the desired outcome is represented by the factor level 1.

There should not be any multi-collinearity in the model, which means the independent variables must be independent of each other.

We must include meaningful variables in our model.

We should choose a large sample size for logistic regression.

Binary Logistic Regression model

The simplest form of logistic regression is binary or binomial logistic regression, in which the target or dependent variable can have only 2 possible types, either 1 or 0. It allows us to model a relationship between multiple predictor variables and a binary/binomial target variable. In logistic regression, the linear function is used as the input to another function g, as in the following relation −

$$h_{\theta}(x) = g(\theta^{T}x), \quad \text{where } 0 \le h_{\theta}(x) \le 1$$

Here, g is the logistic or sigmoid function, which is given as follows −

$$g(z) = \frac{1}{1+e^{-z}}, \quad \text{where } z = \theta^{T}x$$

The sigmoid curve can be represented with the help of a graph. Its values on the y-axis lie between 0 and 1, and it crosses the y-axis at 0.5.

The classes can be divided into positive and negative. The output, which always lies between 0 and 1, is interpreted as the probability of the positive class. For our implementation, we interpret the output of the hypothesis function as positive if it is ≥ 0.5, otherwise negative.

We also need to define a loss function to measure how well the algorithm performs with the current weights, represented by theta, as follows −

$$h = g(X\theta)$$

$$J(\theta) = \frac{1}{m}\left(-y^{T}\log(h) - (1-y)^{T}\log(1-h)\right)$$

After defining the loss function, our prime goal is to minimize it. This is done by fitting the weights, which means increasing or decreasing them.
With the help of the derivatives of the loss function with respect to each weight, we can tell which parameters should have a higher weight and which should have a smaller weight. The following gradient tells us how the loss would change if we modified the parameters; its j-th component is the partial derivative of J with respect to the weight θ_j −

$$\frac{\partial J(\theta)}{\partial \theta} = \frac{1}{m}X^{T}\left(g(X\theta) - y\right)$$

Implementation in Python

Now we will implement the above concept of binomial logistic regression in Python. For this purpose, we use the multivariate flower dataset named 'iris', which has 3 classes of 50 instances each. Every class represents a type of iris flower. We use only the first two feature columns, and we merge the second and third classes into a single positive class so that the target is binary.

First, we need to import the necessary libraries as follows −

    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn import datasets

Next, load the iris dataset as follows −

    iris = datasets.load_iris()
    X = iris.data[:, :2]
    y = (iris.target != 0) * 1

We can plot our training data as follows −

    plt.figure(figsize=(6, 6))
    plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color="g", label="0")
    plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color="y", label="1")
    plt.legend();

Next, we define the sigmoid function, the loss function and gradient descent as follows −

    class LogisticRegression:
        def __init__(self, lr=0.01, num_iter=100000, fit_intercept=True, verbose=False):
            self.lr = lr
            self.num_iter = num_iter
            self.fit_intercept = fit_intercept
            self.verbose = verbose

        def __add_intercept(self, X):
            # prepend a column of ones so theta[0] acts as the intercept
            intercept = np.ones((X.shape[0], 1))
            return np.concatenate((intercept, X), axis=1)

        def __sigmoid(self, z):
            return 1 / (1 + np.exp(-z))

        def __loss(self, h, y):
            return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()

        def fit(self, X, y):
            if self.fit_intercept:
                X = self.__add_intercept(X)

Now, still inside fit, initialize the weights and run gradient descent as follows −

            self.theta = np.zeros(X.shape[1])
            for i in range(self.num_iter):
                z = np.dot(X, self.theta)
                h = self.__sigmoid(z)
                gradient = np.dot(X.T, (h - y)) / y.size
                self.theta -= self.lr * gradient
                z = np.dot(X, self.theta)
                h = self.__sigmoid(z)
                loss = self.__loss(h, y)
                if self.verbose and i % 10000 == 0:
                    print(f"loss: {loss} \t")

With the help of the following methods, we can predict the output probabilities −

        def predict_prob(self, X):
            if self.fit_intercept:
                X = self.__add_intercept(X)
            return self.__sigmoid(np.dot(X, self.theta))

        def predict(self, X):
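The class listing above breaks off at predict. As a compact, self-contained sketch of the same ideas (sigmoid, loss, gradient descent on the weights), the functions below implement binary logistic regression with NumPy only. Several things here are assumptions and not taken from the original: the function names, the synthetic two-cluster data used in place of iris, the learning rate and iteration count, the clipping inside the loss (a numerical safeguard), and the prediction rule, which simply applies the ≥ 0.5 threshold described earlier.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(h, y, eps=1e-12):
    # clipping avoids log(0); this safeguard is an addition, not in the original
    h = np.clip(h, eps, 1.0 - eps)
    return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()

def add_intercept(X):
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_logistic(X, y, lr=0.1, num_iter=2000):
    """Batch gradient descent on the logistic loss; returns the weights theta."""
    Xb = add_intercept(X)
    theta = np.zeros(Xb.shape[1])
    for _ in range(num_iter):
        h = sigmoid(Xb @ theta)
        gradient = Xb.T @ (h - y) / y.size     # (1/m) X^T (g(X theta) - y)
        theta -= lr * gradient
    return theta

def predict(X, theta, threshold=0.5):
    # assumed rule: positive class when the predicted probability is >= 0.5
    return (sigmoid(add_intercept(X) @ theta) >= threshold).astype(int)

# Synthetic, well separated clusters standing in for the iris data (assumption)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(2.0, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

theta = fit_logistic(X, y)
final_loss = log_loss(sigmoid(add_intercept(X) @ theta), y)
accuracy = (predict(X, theta) == y).mean()
print("loss:", final_loss, "training accuracy:", accuracy)
```

Plain functions are used instead of a class only to keep the sketch short; the update inside the loop is the same gradient step as in the fit method above.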
Machine Learning with Python Tutorial

Machine Learning (ML) is the field of computer science that enables computer systems to make sense of data in much the same way as human beings do. In simple words, ML is a type of artificial intelligence that extracts patterns from raw data by using an algorithm or method. The key focus of ML is to allow computer systems to learn from experience without being explicitly programmed and without human intervention.

Audience

This tutorial will be useful for graduates, postgraduates, and research students who either have an interest in this subject or have this subject as a part of their curriculum. The reader can be a beginner or an advanced learner. This tutorial has been prepared for students as well as professionals who want to ramp up quickly. It is a stepping stone on your Machine Learning journey.

Prerequisites

The reader must have basic knowledge of Artificial Intelligence. They should have a good knowledge of Python and some of its libraries, such as NumPy, Pandas, Scikit-learn, Scipy and Matplotlib, for effective data manipulation and analysis.

In addition, readers should have a strong understanding of the fundamental concepts in mathematics, including calculus, linear algebra, probability, statistics, algorithms and data structures.

If you are new to any of these concepts, we recommend that you take up tutorials on these topics before you dig further into this tutorial.

Frequently Asked Questions about ML with Python

There are some very Frequently Asked Questions (FAQ) about ML with Python. In this section, we answer some of them −

What is Machine Learning?

Machine learning (ML) is a subset of artificial intelligence (AI) that focuses on developing algorithms that improve automatically through experience and by using the hidden patterns in the data.
In simple terms, ML enables computers to learn from data and make predictions or decisions without being explicitly programmed. This capability allows computers to automate tasks and solve complex problems across different fields.

Why is Machine Learning Important?

The amount of data generated by businesses and individuals continues to grow at an exponential rate. Machine learning has become an important topic as it revolutionizes how computers process and interpret data. ML empowers computers to learn from data, enhancing accuracy and efficiency in various tasks. It enables data-driven decision-making and boosts productivity.

What are the different types of Machine Learning?

Different types of Machine Learning include −

Supervised Learning − In supervised learning, the algorithm is trained on labeled data, i.e. the correct answer or output is provided for each input.

Unsupervised Learning − In unsupervised learning, the algorithm is trained on unlabeled data, i.e. the correct output or answer is not provided for each input.

Reinforcement Learning − In reinforcement learning, the algorithm learns by receiving feedback in the form of rewards or punishments based on its actions.

Semi-supervised Learning − In semi-supervised learning, the algorithm is trained on a combination of labeled and unlabeled data.

What are some common applications of Machine Learning?

Some of the common applications of Machine Learning include −

Recommendation systems for personalized content.

Image and speech recognition for authentication and security.

Natural language processing for sentiment analysis and chatbots.

Predictive analytics for forecasting sales and trends.

Autonomous vehicles for navigation and decision-making.

Fraud detection in the banking sector and finance.

Medical diagnosis and healthcare management.

Virtual assistants for customer service and support.

What are the basic components of a Machine Learning system?
The basic components of a Machine Learning system are −

Data − It is the raw information used to train and test the model.

Model − It is a mathematical representation that learns from the input data.

Features − These are the input variables or attributes used by the model to make predictions.

Training − Process of feeding data into the model so that it adjusts its internal parameters to make accurate predictions.

Evaluation − Process of assessing the performance of the model on a separate dataset.

Prediction − Process of using the trained model to make predictions on new data.

What programming languages are commonly used in Machine Learning?

Some of the commonly used programming languages in Machine Learning include Python, R, Java, C++, Julia, and JavaScript. Python, due to its simplicity and extensive libraries like TensorFlow, Keras, Scikit-learn, and OpenCV, is the preferred choice for beginners as well as experts in the field of machine learning.

What is the difference between supervised and unsupervised learning?

In supervised learning, an algorithm is trained using labeled data to find the relationship between the input variables and the desired output. In unsupervised learning, on the other hand, an algorithm is trained using unlabeled data to find the structure and patterns in the input data. Supervised learning can be used for classification and regression, while unsupervised learning can be used for clustering and dimensionality reduction.

What are some popular algorithms used in Machine Learning?

Here is a list of some popular algorithms used in Machine Learning −

Linear Regression
Logistic Regression
Decision Trees
Random Forests
Support Vector Machines (SVM)
k-Nearest Neighbors (k-NN)
Naive Bayes
Gradient Boosting Machines (GBM)
K-Means Clustering
Hierarchical Clustering

How do I evaluate the performance of a Machine Learning model?

For classification tasks, we can evaluate the performance of a Machine Learning model using various metrics such as accuracy,
Improving Performance of ML Model (Contd…)

Performance Improvement with Algorithm Tuning

ML models are parameterized in such a way that their behavior can be adjusted for a specific problem. Algorithm tuning means finding the best combination of these parameters so that the performance of the ML model can be improved. This process is sometimes called hyperparameter optimization: the parameters of the algorithm itself are called hyperparameters, while the coefficients found by the ML algorithm are called parameters.

Here, we discuss some methods for algorithm parameter tuning provided by Python Scikit-learn.

Grid Search Parameter Tuning

Grid search is a parameter tuning approach. Its key idea is that it methodically builds and evaluates a model for every possible combination of the algorithm parameters specified in a grid. In other words, it is an exhaustive search over the grid.

Example

In the following Python recipe, we perform a grid search by using the GridSearchCV class of sklearn to evaluate various alpha values for the Ridge Regression algorithm on the Pima Indians diabetes dataset.
First, import the required packages as follows −

    import numpy
    from pandas import read_csv
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV

Now, we need to load the Pima diabetes dataset as we did in previous examples −

    path = r"C:pima-indians-diabetes.csv"
    headernames = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]
    data = read_csv(path, names=headernames)
    array = data.values
    X = array[:, 0:8]
    Y = array[:, 8]

Next, define the alpha values to evaluate as follows −

    alphas = numpy.array([1, 0.1, 0.01, 0.001, 0.0001, 0])
    param_grid = dict(alpha=alphas)

Now, we need to apply grid search on our model −

    model = Ridge()
    grid = GridSearchCV(estimator=model, param_grid=param_grid)
    grid.fit(X, Y)

Print the result with the following script lines −

    print(grid.best_score_)
    print(grid.best_estimator_.alpha)

Output

    0.2796175593129722
    1.0

The above output gives us the optimal score and the parameter value in the grid that achieved that score. The alpha value in this case is 1.0.

Random Search Parameter Tuning

Random search is also a parameter tuning approach. Its key idea is that it samples the algorithm parameters from a random distribution for a fixed number of iterations.

Example

In the following Python recipe, we perform a random search by using the RandomizedSearchCV class of sklearn to evaluate different alpha values between 0 and 1 for the Ridge Regression algorithm on the Pima Indians diabetes dataset.
First, import the required packages as follows −

    import numpy
    from pandas import read_csv
    from scipy.stats import uniform
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import RandomizedSearchCV

Now, we need to load the Pima diabetes dataset as we did in previous examples −

    path = r"C:pima-indians-diabetes.csv"
    headernames = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]
    data = read_csv(path, names=headernames)
    array = data.values
    X = array[:, 0:8]
    Y = array[:, 8]

Next, evaluate randomly sampled alpha values on the Ridge regression algorithm as follows −

    param_grid = {"alpha": uniform()}
    model = Ridge()
    random_search = RandomizedSearchCV(
        estimator=model, param_distributions=param_grid, n_iter=50, random_state=7)
    random_search.fit(X, Y)

Print the result with the following script lines −

    print(random_search.best_score_)
    print(random_search.best_estimator_.alpha)

Output

    0.27961712703051084
    0.9779895119966027

The above output gives us an optimal score very similar to the one found by grid search.
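To show what a grid search does under the hood, the sketch below evaluates each alpha value with k-fold cross-validation and keeps the one with the best mean score. It is a simplified illustration, not scikit-learn's implementation: the closed-form ridge fit without an intercept, the helper names, the contiguous fold layout, and the synthetic data (used because the CSV file is not available here) are all assumptions.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge weights; no intercept, for brevity."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def grid_search_ridge(X, y, alphas, n_folds=5):
    """Return the alpha with the best mean cross-validated R2, plus all scores."""
    all_idx = np.arange(len(X))
    folds = np.array_split(all_idx, n_folds)          # contiguous folds
    mean_scores = {}
    for alpha in alphas:                              # every value in the grid
        scores = []
        for test_idx in folds:                        # every fold as test set once
            train_idx = np.setdiff1d(all_idx, test_idx)
            w = ridge_fit(X[train_idx], y[train_idx], alpha)
            scores.append(r2(y[test_idx], X[test_idx] @ w))
        mean_scores[alpha] = float(np.mean(scores))
    best_alpha = max(mean_scores, key=mean_scores.get)
    return best_alpha, mean_scores

# Synthetic regression data standing in for the Pima dataset (assumption)
rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

alphas = [1.0, 0.1, 0.01, 0.001]
best_alpha, mean_scores = grid_search_ridge(X, y, alphas)
print(best_alpha, mean_scores[best_alpha])
```

A random search follows the same loop, except that the candidate alpha values are drawn from a distribution for a fixed number of iterations instead of being listed in advance.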
Machine Learning – Basic Concepts

Machine learning, as we know, is a subset of artificial intelligence that involves training computer algorithms to automatically learn patterns and relationships in data. Here are some basic concepts of machine learning −

Data

Data is the foundation of machine learning. Without data, there would be nothing for the algorithm to learn from. Data can come in many forms, including structured data (such as spreadsheets and databases) and unstructured data (such as text and images). The quality and quantity of the data used to train the machine learning algorithm are crucial factors that can significantly impact its performance.

Feature

In machine learning, features are the variables or attributes used to describe the input data. The goal is to select the most relevant and informative features that will allow the algorithm to make accurate predictions or decisions. Feature selection is a crucial step in the machine learning process because the performance of the algorithm is heavily dependent on the quality and relevance of the features used.

Model

A machine learning model is a mathematical representation of the relationship between the input data (features) and the output (predictions or decisions). The model is created using a training dataset and then evaluated using a separate validation dataset. The goal is to create a model that can accurately generalize to new, unseen data.

Training

Training is the process of teaching the machine learning algorithm to make accurate predictions or decisions. This is done by providing the algorithm with a large dataset and allowing it to learn from the patterns and relationships in the data. During training, the algorithm adjusts its internal parameters to minimize the difference between its predicted output and the actual output.

Testing

Testing is the process of evaluating the performance of the machine learning algorithm on a separate dataset that it has not seen before.
The goal is to determine how well the algorithm generalizes to new, unseen data. If the algorithm performs well on the testing dataset, it is considered to be a successful model.

Overfitting

Overfitting occurs when a machine learning model is too complex and fits the training data too closely. This can lead to poor performance on new, unseen data because the model is too specialized to the training dataset. To prevent overfitting, it is important to use a validation dataset to evaluate the model's performance and to use regularization techniques to simplify the model.

Underfitting

Underfitting occurs when a machine learning model is too simple and cannot capture the patterns and relationships in the data. This can lead to poor performance on both the training and testing datasets. To prevent underfitting, we can use several techniques such as increasing model complexity, collecting more data, reducing regularization, and feature engineering.

It is important to note that preventing underfitting is a balancing act between model complexity and the amount of data available. Increasing model complexity can help prevent underfitting, but if there is not enough data to support the increased complexity, overfitting may occur instead. Therefore, it is important to monitor the model's performance and adjust the complexity as necessary.

Why & When to Make Machines Learn?

We have already discussed the need for machine learning, but another question arises: in what scenarios must we make the machine learn? There can be several circumstances where we need machines to take data-driven decisions efficiently and at a huge scale. The following are some circumstances where making machines learn would be more effective −

Lack of human expertise

The very first scenario in which we want a machine to learn and take data-driven decisions is a domain where there is a lack of human expertise. Examples include navigation in unknown territories or on other planets.
Dynamic scenarios

Some scenarios are dynamic in nature, i.e. they keep changing over time. For such scenarios and behaviors, we want a machine to learn and take data-driven decisions. Examples include network connectivity and the availability of infrastructure in an organization.

Difficulty in translating expertise into computational tasks

There can be various domains in which humans have expertise but are unable to translate this expertise into computational tasks. In such circumstances we want machine learning. Examples include the domains of speech recognition, cognitive tasks etc.

Machine Learning Model

Before discussing the machine learning model, we need to understand the following formal definition of ML given by Professor Mitchell −

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

The above definition focuses on three parameters, which are also the main components of any learning algorithm, namely Task (T), Performance (P) and Experience (E). In this context, we can simplify the definition as −

ML is a field of AI consisting of learning algorithms that −

Improve their performance (P)

At executing some task (T)

Over time with experience (E)

Based on the above, the following diagram represents a Machine Learning Model −

[Diagram: Machine Learning Model]

Let us discuss them in more detail now −

Task (T)

From the perspective of the problem, we may define the task T as the real-world problem to be solved. The problem can be anything, such as finding the best house price in a specific location or finding the best marketing strategy. On the other hand, if we talk about machine learning, the definition of a task is different, because it is difficult to solve ML based tasks with a conventional programming approach.

A task T is said to be a ML based task when it is based on
Classification Algorithms – Naïve Bayes

Introduction to Naïve Bayes Algorithm

Naïve Bayes is a classification technique based on applying Bayes' theorem with the strong assumption that all the predictors are independent of each other. In simple words, the assumption is that the presence of a feature in a class is independent of the presence of any other feature in the same class. For example, a phone may be considered smart if it has a touch screen, internet facility, a good camera etc. Even though these features depend on each other, each is treated as contributing independently to the probability that the phone is a smart phone.

In Bayesian classification, the main interest is to find the posterior probabilities, i.e. the probability of a label given some observed features, P(L | features). With the help of Bayes' theorem, we can express this in quantitative form as follows −

$$P(L \mid features) = \frac{P(L)\,P(features \mid L)}{P(features)}$$

Here, P(L | features) is the posterior probability of the class.

P(L) is the prior probability of the class.

P(features | L) is the likelihood, which is the probability of the predictor given the class.

P(features) is the prior probability of the predictor.

Building model using Naïve Bayes in Python

Scikit-learn is the most useful Python library for building a Naïve Bayes model in Python. It provides the following three types of Naïve Bayes model −

Gaussian Naïve Bayes

It is the simplest Naïve Bayes classifier, with the assumption that the data from each label is drawn from a simple Gaussian distribution.

Multinomial Naïve Bayes

Another useful Naïve Bayes classifier is Multinomial Naïve Bayes, in which the features are assumed to be drawn from a simple Multinomial distribution. This kind of Naïve Bayes is most appropriate for features that represent discrete counts.
Bernoulli Naïve Bayes

Another important model is Bernoulli Naïve Bayes, in which features are assumed to be binary (0s and 1s). Text classification with a "bag of words" model can be an application of Bernoulli Naïve Bayes.

Example

Depending on our data set, we can choose any of the Naïve Bayes models explained above. Here, we implement the Gaussian Naïve Bayes model in Python −

We start with the required imports as follows −

    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns; sns.set()

Now, by using the make_blobs() function of Scikit-learn, we can generate blobs of points with a Gaussian distribution as follows −

    from sklearn.datasets import make_blobs
    X, y = make_blobs(300, 2, centers=2, random_state=2, cluster_std=1.5)
    plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap="summer");

Next, to use the GaussianNB model, we need to import it, create an object and fit it as follows −

    from sklearn.naive_bayes import GaussianNB
    model_GNB = GaussianNB()
    model_GNB.fit(X, y);

Now, we have to make predictions. This can be done after generating some new data as follows −

    rng = np.random.RandomState(0)
    Xnew = [-6, -14] + [14, 18] * rng.rand(2000, 2)
    ynew = model_GNB.predict(Xnew)

Next, we plot the new data to see the decision boundary −

    plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap="summer")
    lim = plt.axis()
    plt.scatter(Xnew[:, 0], Xnew[:, 1], c=ynew, s=20, cmap="summer", alpha=0.1)
    plt.axis(lim);

Now, with the help of the following lines of code, we can find the posterior probabilities of the first and second label −

    yprob = model_GNB.predict_proba(Xnew)
    yprob[-10:].round(3)

Output

    array([[0.998, 0.002],
           [1.   , 0.   ],
           [0.987, 0.013],
           [1.   , 0.   ],
           [1.   , 0.   ],
           [1.   , 0.   ],
           [1.   , 0.   ],
           [1.   , 0.   ],
           [0.   , 1.   ],
           [0.986, 0.014]])

Pros & Cons

Pros

The following are some pros of using Naïve Bayes classifiers −

Naïve Bayes classification is easy to implement and fast.

It will converge faster than discriminative models like logistic regression.

It requires less training data.
It is highly scalable: it scales linearly with the number of predictors and data points.

It can make probabilistic predictions and can handle continuous as well as discrete data.

The Naïve Bayes classification algorithm can be used for both binary and multi-class classification problems.

Cons

The following are some cons of using Naïve Bayes classifiers −

One of the most important cons of Naïve Bayes classification is its strong feature independence assumption, because in real life it is almost impossible to have a set of features which are completely independent of each other.

Another issue with Naïve Bayes classification is "zero frequency": if a categorical variable has a category that is not observed in the training data set, the Naïve Bayes model will assign it a zero probability and will be unable to make a prediction.

Applications of Naïve Bayes classification

The following are some common applications of Naïve Bayes classification −

Real-time prediction − Due to its ease of implementation and fast computation, it can be used to make predictions in real time.

Multi-class prediction − The Naïve Bayes classification algorithm can be used to predict the posterior probability of multiple classes of the target variable.

Text classification − Due to its multi-class prediction capability, Naïve Bayes classification is well suited for text classification. That is why it is also used to solve problems like spam filtering and sentiment analysis.

Recommendation system − Together with algorithms like collaborative filtering, Naïve Bayes can be used to build a recommendation system that filters unseen information and predicts whether a user would like a given resource or not.
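To make the Bayes' theorem formula from the introduction concrete, the sketch below computes posterior probabilities for two labels from a prior and per-feature likelihoods, multiplying the likelihoods as the naïve independence assumption allows. The function name naive_bayes_posterior and all of the probabilities are hypothetical values chosen for illustration; they do not come from any dataset in this tutorial.

```python
from math import prod

def naive_bayes_posterior(priors, likelihoods, observed):
    """Return P(label | observed features) for every label.

    priors:      {label: P(label)}
    likelihoods: {label: {feature: P(feature | label)}}
    observed:    list of features present in the sample
    """
    # numerator of Bayes' theorem: P(L) * product of P(feature | L)
    scores = {
        label: priors[label] * prod(likelihoods[label][f] for f in observed)
        for label in priors
    }
    evidence = sum(scores.values())        # P(features), the normalizer
    return {label: score / evidence for label, score in scores.items()}

# Hypothetical numbers for a two-class spam filter
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"offer": 0.5, "meeting": 0.1},
    "ham": {"offer": 0.1, "meeting": 0.4},
}

posterior = naive_bayes_posterior(priors, likelihoods, ["offer"])
print(posterior)
```

With these numbers the spam numerator is 0.4 × 0.5 = 0.2 and the ham numerator is 0.6 × 0.1 = 0.06, so the posterior for spam is 0.2 / 0.26, roughly 0.77. If any likelihood were exactly zero, the whole product for that label would become zero, which is the zero frequency problem listed under the cons.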
Machine Learning – Statistics

Statistics is a crucial tool in machine learning because it helps us understand the underlying patterns in the data. It provides us with methods to describe, summarize, and analyze data. Let's look at some of the basics of statistics for machine learning.

Descriptive Statistics

Descriptive statistics is a branch of statistics that deals with the summary and analysis of data. It includes measures such as mean, median, mode, variance, and standard deviation. These measures help us understand the central tendency, variability, and distribution of the data.

In machine learning, descriptive statistics can be used to summarize the data, identify outliers, and detect patterns. For example, we can use the mean and standard deviation to describe the distribution of a dataset.

In Python, we can calculate descriptive statistics using libraries such as NumPy and Pandas. Below is an example −

Example

    import numpy as np
    import pandas as pd

    data = np.array([1, 2, 3, 4, 5])
    df = pd.DataFrame(data, columns=["Values"])
    print(df.describe())

Output

This will output a summary of the dataset, including the count, mean, standard deviation, minimum, quartiles, and maximum values as follows −

             Values
    count  5.000000
    mean   3.000000
    std    1.581139
    min    1.000000
    25%    2.000000
    50%    3.000000
    75%    4.000000
    max    5.000000

Inferential Statistics

Inferential statistics is a branch of statistics that deals with making predictions and inferences about a population based on a sample of data. It involves using hypothesis testing, confidence intervals, and regression analysis to draw conclusions about the data.

In machine learning, inferential statistics can be used to make predictions about new data based on existing data. For example, we can use regression analysis to predict the price of a house based on its features, such as the number of bedrooms and bathrooms.

In Python, we can perform inferential statistics using libraries such as Scikit-Learn and StatsModels.
Below is an example −

Example

    import statsmodels.api as sm
    import numpy as np

    X = np.array([1, 2, 3, 4, 5])
    y = np.array([2, 4, 6, 8, 10])

    # Add an intercept column so the model fits y = b0 + b1 * x
    X = sm.add_constant(X)
    model = sm.OLS(y, X).fit()
    print(model.summary())

Output

This will output a summary of the regression model, including the coefficients, standard errors, t-statistics, and p-values.

In the next chapter, we will discuss in detail various descriptive and inferential statistics measures that are commonly used in machine learning, along with Python implementation examples.
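To make the regression example above more concrete, the coefficients that an ordinary least squares fit reports can be computed by hand for the same five data points. The following is a minimal sketch using only plain Python; the function name fit_line is an assumption for illustration and is not part of StatsModels.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)  # intercept 0.0 and slope 2.0, since y is exactly 2 * x
```

Because the data lie exactly on the line y = 2x, the residuals are all zero, which is why a full regression summary for this toy data set reports a perfect fit.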
Methods for Machine Learning
Machine Learning – Models

There are various machine learning algorithms, techniques and methods that can be used to build models for solving real-life problems by using data. In this chapter, we discuss these different kinds of methods.

There are four main types of machine learning methods, classified by the degree of human supervision −

Supervised Learning
Unsupervised Learning
Semi-supervised Learning
Reinforcement Learning

In the next four chapters, we will discuss each of these machine learning models in detail. Here, let's have a brief overview of these methods.

Supervised Learning

Supervised learning algorithms are the most commonly used ML algorithms. During training, such an algorithm takes each data sample (the training data) together with its associated output (the label or response). The main objective of supervised learning algorithms is to learn an association between input data samples and the corresponding outputs after processing multiple training instances.

For example, suppose we have

x: Input variables and
Y: Output variable

Now, we apply an algorithm to learn the mapping function from the input to the output as follows −

Y = f(x)

The main objective is to approximate the mapping function so well that, when we have new input data (x), we can easily predict the output variable (Y) for that new input data.

It is called supervised because the whole learning process can be thought of as being supervised by a teacher or supervisor. Examples of supervised machine learning algorithms include Decision Tree, Random Forest, KNN, Logistic Regression, etc.

Based on the ML tasks, supervised learning algorithms can be divided into the following two broad classes −

Classification
Regression

Classification

The key objective of classification-based tasks is to predict categorical output labels or responses for the given input data.
The output will be based on what the model has learned in the training phase. Categorical output responses are unordered, discrete values, so each output response belongs to a specific class or category. We will discuss classification and its associated algorithms in detail in the upcoming chapters.

Classification Models

The following are some common classification models −

Logistic Regression
Decision Trees
Random Forest
K-nearest Neighbor
Support Vector Machine
Naive Bayes
Linear Discriminant Analysis
Neural Networks

Regression

The key objective of regression-based tasks is to predict output labels or responses that are continuous numeric values for the given input data. The output will be based on what the model has learned in its training phase. Basically, regression models use the input data features (independent variables) and their corresponding continuous numeric output values (dependent or outcome variables) to learn specific associations between inputs and corresponding outputs. We will discuss regression and its associated algorithms in detail in further chapters.

Regression Models

The following are some common regression models −

Linear Regression
Ridge Regression
Decision Trees
Random Forest
K-nearest Neighbor
Neural Network Regression

Unsupervised Learning

As the name suggests, unsupervised learning is the opposite of supervised ML: there is no supervisor to provide any sort of guidance. Unsupervised learning algorithms are handy when, unlike in supervised learning, we do not have pre-labeled training data and we want to extract useful patterns from the input data.

For example, suppose we have −

x: Input variables

Then there is no corresponding output variable, and the algorithm needs to discover the interesting patterns in the data by itself.
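The idea of discovering patterns from inputs alone can be sketched with a tiny one-dimensional version of k-means clustering. This is an illustrative sketch, not the tutorial's own implementation: the function name kmeans_1d, the sample points and the starting centroids are all assumptions chosen to keep the example small and deterministic.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Cluster 1-D points around the given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Only inputs x are given; there are no labels Y.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

No output labels are supplied anywhere; the two groups emerge purely from the distances between the input values.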
Examples of unsupervised machine learning algorithms include K-means clustering, the Apriori algorithm, Principal Component Analysis, etc.

Based on the ML tasks, unsupervised learning algorithms can be divided into the following broad classes −

Clustering
Association
Dimensionality Reduction

Clustering

Clustering methods are among the most useful unsupervised ML methods. These algorithms are used to find similarity as well as relationship patterns among data samples, and then to cluster those samples into groups whose members have similar features. A real-world example of clustering is grouping customers by their purchasing behavior.

Clustering Models

The following are some common clustering models −

K-Means Clustering
Hierarchical Clustering
Mean-shift Clustering
DBSCAN Clustering
HDBSCAN Clustering
BIRCH Clustering
Affinity Propagation
Agglomerative Clustering

Association

Another useful unsupervised ML method is association, which is used to analyze large datasets to find patterns that represent interesting relationships between various items. It is also termed Association Rule Mining or Market Basket Analysis, and it is mainly used to analyze customer shopping patterns.

Association Models

The following are some common association models −

Apriori Algorithm
Eclat Algorithm
FP-growth Algorithm

Dimensionality Reduction

This unsupervised ML method is used to reduce the number of feature variables for each data sample by selecting a set of principal or representative features. Why do we need to reduce the dimensionality? The reason is the problem of feature space complexity, which arises when we start analyzing and extracting millions of features from data samples. This problem is generally referred to as the "curse of dimensionality". PCA (Principal Component Analysis), singular value decomposition and discriminant analysis are some of the popular algorithms for this purpose.
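As a rough illustration of reducing the number of feature variables, the sketch below drops features that carry no information because they never vary. Note that this is simple variance-based feature selection, a much simpler stand-in than PCA, and the function name, threshold and sample data are assumptions made for the example.

```python
from statistics import pvariance

def select_high_variance(rows, threshold):
    """Keep only the feature columns whose variance exceeds threshold."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[i] for i in keep] for row in rows]
    return keep, reduced

# Hypothetical data: 4 samples with 3 features; feature 1 is constant.
rows = [
    [1.0, 5.0, 10.0],
    [2.0, 5.0, 20.0],
    [3.0, 5.0, 30.0],
    [4.0, 5.0, 40.0],
]
keep, reduced = select_high_variance(rows, threshold=0.0)
print(keep)     # indices of the features that were kept
print(reduced)  # the same samples with fewer feature variables
```

Each sample goes from three feature variables to two without losing any information that distinguishes one sample from another. Methods such as PCA go further by building new features as combinations of the original ones.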
Dimensionality Reduction Models

The following are some common dimensionality reduction models −

Principal Component Analysis (PCA)
Autoencoders
Singular Value Decomposition (SVD)

Anomaly Detection

This unsupervised ML method is used to find occurrences of rare events or observations that generally do not occur. By using the learned knowledge, anomaly
Python Ecosystem
Machine Learning – Ecosystem

Python has become one of the most popular programming languages for machine learning due to its simplicity, versatility, and extensive ecosystem of libraries and tools. Various programming languages, such as Java, C++, Lisp, Julia and Python, can be used in machine learning. Among them, Python has gained huge popularity.

Here, we will explore the Python ecosystem for machine learning and highlight some of the most popular libraries and frameworks.

Python Machine Learning Ecosystem

The machine learning ecosystem refers to the collection of tools and technologies that are used to develop machine learning applications. Python provides various libraries and tools that form the components of the Python machine learning ecosystem. These components make Python an important language for machine learning and data science. Though there are many such components, let us discuss some of the important ones here −

Programming Language: Python
Integrated Development Environment
Python Libraries

Programming Language: Python

The programming language is an important component of any development ecosystem. Python is used extensively in machine learning and data science.

Let's discuss why Python is a strong choice for machine learning.

Why Python for Machine Learning?

According to the Stack Overflow Developer Survey 2023, Python is the third most popular programming language, as well as the most popular language for machine learning and data science. The following features of Python make it the preferred language for data science −

Extensive set of packages

Python has an extensive and powerful set of packages that are ready to be used in various domains. It also has packages like numpy, scipy, pandas and scikit-learn, which are required for machine learning and data science.
Easy prototyping

Another important feature of Python that makes it the language of choice for data science is easy and fast prototyping. This feature is useful for developing new algorithms.

Collaboration feature

The field of data science needs good collaboration, and Python provides many useful tools that make this extremely easy.

One language for many domains

A typical data science project includes various domains like data extraction, data manipulation, data analysis, feature extraction, modelling, evaluation, deployment and updating the solution. As Python is a multi-purpose language, it allows the data scientist to address all these domains from a common platform.

Strengths and Weaknesses of Python

Every programming language has strengths as well as weaknesses, and so does Python.

Strengths

According to studies and surveys, Python is the fifth most important language as well as the most popular language for machine learning and data science. This is because of the following strengths −

Easy to learn and understand − The syntax of Python is simple; hence it is relatively easy, even for beginners, to learn and understand the language.

Multi-purpose language − Python is a multi-purpose programming language because it supports structured programming, object-oriented programming and functional programming.

Huge number of modules − Python has a huge number of modules covering every aspect of programming. These modules are easily available for use, which makes Python an extensible language.

Support of open source community − As an open source programming language, Python is supported by a very large developer community. Because of this, bugs are readily fixed by the Python community. This characteristic makes Python very robust and adaptive.

Scalability − Python is a scalable programming language because it provides better structure for supporting large programs than shell scripts do.
Weakness

Although Python is a popular and powerful programming language, it has the weakness of slow execution speed. Python executes slowly compared to compiled languages because it is an interpreted language. This is a major area of improvement for the Python community.

Installing Python

To work in Python, we must first install it. You can install Python in either of the following two ways −

Installing Python individually
Using a pre-packaged Python distribution − Anaconda

Let us discuss each of these in detail.

Installing Python Individually

If you want to install Python on your computer, you need to download only the binary code applicable to your platform. Python distributions are available for Windows, Linux and Mac platforms.

The following is a quick overview of installing Python on the above-mentioned platforms −

On Unix and Linux platform

With the help of the following steps, we can install Python on Unix and Linux platforms −

First, go to www.python.org/downloads/.
Next, click on the link to download the zipped source code available for Unix/Linux.
Now, download and extract the files.
Next, edit the Modules/Setup file if you want to customize some options.
Finally, run the ./configure script, and then run make followed by make install.

On Windows platform

With the help of the following steps, we can install Python on the Windows platform −

First, go to www.python.org/downloads/.
Next, click on the link for the Windows installer python-XYZ.msi file. Here XYZ is the version we wish to install.
Now, run the downloaded file. It will take us to the Python install wizard, which is easy to use. Accept the default settings and wait until the install is finished.

On Macintosh platform

For Mac OS X, Homebrew, an easy-to-use package installer, is recommended for installing Python 3.
If you don't have Homebrew, you can install it with the help of the following command −

    $ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

It can be updated with
Discussion
Discuss Logistic Regression in Python

Logistic Regression is a statistical method for classifying objects. In this tutorial, we focus on solving binary classification problems using the logistic regression technique. The tutorial also presents a case study that shows how to code and apply Logistic Regression in Python.