Machine Learning – Supervised

Supervised learning algorithms are the most commonly used ML algorithms. During training, such an algorithm takes data samples (the training data) together with the output associated with each sample (its label or response). The main objective of a supervised learning algorithm is to learn the association between input samples and their corresponding outputs by working through many training instances.

For example, suppose we have −

x − Input variables
Y − Output variable

We apply an algorithm to learn the mapping function from the input to the output −

Y = f(x)

The objective is to approximate the mapping function so well that, given new input data (x), we can predict the output variable (Y) for it.

It is called supervised because the whole learning process can be thought of as being supervised by a teacher or supervisor. Examples of supervised machine learning algorithms include Decision Tree, Random Forest, KNN and Logistic Regression.

Based on the ML task, supervised learning algorithms can be divided into two broad classes − Classification and Regression.

Classification

The key objective of classification tasks is to predict categorical output labels or responses for the given input data. The output is based on what the model has learned in its training phase. Categorical responses are discrete, unordered values, so each output response belongs to a specific class or category. We will discuss classification and its associated algorithms in detail in later chapters.

Regression

The key objective of regression tasks is to predict output labels or responses that are continuous numeric values for the given input data. The output is based on what the model has learned in its training phase.
Regression models use the input data features (independent variables) and their corresponding continuous numeric output values (dependent or outcome variables) to learn the association between inputs and outputs. We will discuss regression and its associated algorithms in detail in later chapters.

Algorithms for Supervised Learning

Supervised learning is one of the important models of learning involved in training machines, and this chapter discusses it in detail. Several algorithms are available for supervised learning. Some of the widely used ones are −

k-Nearest Neighbours
Decision Trees
Naive Bayes
Logistic Regression
Support Vector Machines

Let us look at each of these algorithms in turn.

k-Nearest Neighbours

k-Nearest Neighbours, or simply kNN, is a statistical technique that can be used for solving both classification and regression problems. Let us discuss the case of classifying an unknown object using kNN. Consider the distribution of objects shown in the image below −

[Figure: three types of objects, marked in red, blue and green]

The diagram shows three types of objects, marked in red, blue and green. When you run the kNN classifier on this dataset, the boundaries for each type of object are marked as shown below −

[Figure: kNN decision boundaries for the red, blue and green objects]

Now, consider a new unknown object that you want to classify as red, green or blue. This is depicted in the figure below.

Visually, the unknown data point belongs to the class of blue objects. Mathematically, this is concluded by measuring the distance from the unknown point to every other point in the data set and looking at the k closest ones. When you do so, you will find that most of its nearest neighbours are blue; the average distance to red and green objects is also larger than the average distance to blue objects. Thus, the unknown object is classified as belonging to the blue class.
The kNN algorithm can also be used for regression problems, and a ready-to-use implementation is available in most ML libraries.

Decision Trees

A simple decision tree in flowchart format is shown below −

You would write code to classify your input data based on this flowchart. The flowchart is self-explanatory and trivial: in this scenario, you are trying to classify an incoming email to decide when to read it. In reality, decision trees can be large and complex. There are several algorithms available to create and traverse these trees. As a Machine Learning enthusiast, you need to understand and master these techniques of creating and traversing decision trees.

Naive Bayes

Naive Bayes is used for creating classifiers. Suppose you want to sort (classify) fruits of different kinds from a fruit basket. You may use features such as the color, size and shape of a fruit. For example, any fruit that is red in color, round in shape and about 10 cm in diameter may be considered an apple. To train the model, you use these features and estimate the probability that a given feature matches the desired constraints. The probabilities of the different features are then combined to arrive at the probability that a given fruit is an apple. Naive Bayes generally requires only a small amount of training data for classification.

Logistic Regression

Look at the following diagram. It shows the distribution of data points in the XY plane.

From the diagram, we can visually inspect the separation of red dots from green dots. You may draw a boundary line to separate these dots. Now, to classify a new data point, you just need to determine on which side of the line the point lies.

Support Vector Machines

Look at the following distribution of data. Here the three classes of data cannot be linearly separated. The boundary curves are non-linear. In such a case, finding the equation of the curve becomes a complex job.
Support Vector Machines (SVMs) come in handy for determining the separation boundaries in such situations.
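The kNN classification described earlier in this chapter can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the points, labels and the helper name knn_classify are hypothetical stand-ins for the coloured objects in the figures, and the sketch uses Euclidean distance with a simple majority vote among the k nearest neighbours.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8


def knn_classify(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Pair every training point's distance to the query with its label,
    # then sort so the nearest points come first.
    distances = sorted(
        (dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k nearest neighbours and take a majority vote.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D points standing in for the coloured objects.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),  # blue cluster
    (5.0, 5.0), (5.2, 4.9),              # red cluster
    (9.0, 1.0), (8.8, 1.3),              # green cluster
]
labels = ["blue", "blue", "blue", "red", "red", "green", "green"]

predicted = knn_classify(points, labels, query=(1.1, 1.0), k=3)
print(predicted)
```

In practice you would normally rely on the kNN implementation of an ML library rather than a hand-written loop; the sketch is only meant to show the distance-then-vote idea.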
Machine Learning – Models

There are various Machine Learning algorithms, techniques and methods that can be used to build models for solving real-life problems by using data. In this chapter, we are going to discuss these different kinds of methods.

There are four main types of machine learning methods, classified based on human supervision. In the next four chapters, we will discuss each of these machine learning models in detail. Here, let's have a brief overview of these methods.

Supervised Learning

Supervised learning algorithms are the most commonly used ML algorithms. During training, such an algorithm takes data samples (the training data) together with the output associated with each sample (its label or response). The main objective of a supervised learning algorithm is to learn the association between input samples and their corresponding outputs by working through many training instances.

For example, suppose we have −

x: Input variables
Y: Output variable

We apply an algorithm to learn the mapping function from the input to the output −

Y = f(x)

The objective is to approximate the mapping function so well that, given new input data (x), we can predict the output variable (Y) for it.

It is called supervised because the whole learning process can be thought of as being supervised by a teacher or supervisor. Examples of supervised machine learning algorithms include Decision Tree, Random Forest, KNN and Logistic Regression.

Based on the ML task, supervised learning algorithms can be divided into the following two broad classes −

Classification
Regression

Classification

The key objective of classification tasks is to predict categorical output labels or responses for the given input data. The output is based on what the model has learned in the training phase.
Categorical responses are discrete, unordered values, so each output response belongs to a specific class or category. We will discuss classification and its associated algorithms in detail in the upcoming chapters.

Classification Models

The following is a common classification model −

Linear Discriminant Analysis

Regression

The key objective of regression tasks is to predict output labels or responses that are continuous numeric values for the given input data. The output is based on what the model has learned in its training phase. Regression models use the input data features (independent variables) and their corresponding continuous numeric output values (dependent or outcome variables) to learn the association between inputs and outputs. We will discuss regression and its associated algorithms in detail in later chapters.

Regression Models

The following is a common regression model −

Ridge regression

Unsupervised Learning

As the name suggests, unsupervised learning is the opposite of supervised learning: there is no supervisor to provide any sort of guidance. Unsupervised learning algorithms are handy when we do not have the luxury of pre-labeled training data, as we do in supervised learning, and we want to extract useful patterns from the input data.

For example, suppose we have −

x: Input variables

Then there is no corresponding output variable, and the algorithm needs to discover interesting patterns in the data on its own.

Examples of unsupervised machine learning algorithms include K-means clustering and Principal Component Analysis (PCA).

Based on the ML task, unsupervised learning algorithms can be divided into the following broad classes −

Clustering
Association
Dimensionality Reduction

Clustering

Clustering methods are among the most useful unsupervised ML methods.
These algorithms are used to find similarity and relationship patterns among data samples and then cluster those samples into groups whose members have similar features. A real-world example of clustering is grouping customers by their purchasing behavior.

Association

Another useful unsupervised ML method is association, which is used to analyze large datasets to find patterns that represent interesting relationships between items. It is also termed Association Rule Mining or Market Basket Analysis, and is mainly used to analyze customer shopping patterns.

Association Models

The following are some common association models −

Eclat algorithm
FP-growth algorithm

Dimensionality Reduction

This unsupervised ML method is used to reduce the number of feature variables for each data sample by selecting a set of principal or representative features. Why do we need to reduce the dimensionality? The reason is the problem of feature space complexity, which arises when we start analyzing and extracting millions of features from data samples. This problem is generally referred to as the "curse of dimensionality". PCA (Principal Component Analysis) and discriminant analysis are some of the popular algorithms for this purpose.

Dimensionality Reduction Models

The following are some common dimensionality reduction models −

Autoencoders
Singular value decomposition (SVD)

Anomaly Detection

This unsupervised ML method is used to find rare events or observations that generally do not occur. By using the learned knowledge, anomaly detection methods are able to differentiate between an anomalous and a normal data point. Some unsupervised approaches, such as clustering and nearest-neighbour distances (KNN), can detect anomalies based on the data and its features.

Semi-supervised Learning

Semi-supervised learning algorithms are neither fully supervised nor fully unsupervised.
They fall between the two, i.e. between supervised and unsupervised learning methods. These algorithms generally combine a small supervised component (a small amount of pre-labeled, annotated data) with a large unsupervised component (lots of unlabeled data) for training. We can follow any of the following approaches for implementing semi-supervised learning methods −

The first and simplest approach is to build a supervised model on the small amount of labeled, annotated data and then apply that model to the large amount of unlabeled data to obtain more labeled samples. Now, train the model on them and repeat the process.

The second approach needs some extra effort. In this approach, we can
Time Series Tutorial

A time series is a sequence of observations over a certain period. The simplest example of a time series that all of us come across on a day-to-day basis is the change in temperature throughout the day, week, month or year. The analysis of temporal data can give us useful insights into how a variable changes over time. This tutorial will teach you how to analyze and forecast time series data with the help of various statistical and machine learning models in an elaborate and easy-to-understand way!

Audience

This tutorial is for inquisitive minds who want to understand time series and time series forecasting models from scratch. At the end of this tutorial you will have a good understanding of time series modelling.

Prerequisites

This tutorial only assumes a preliminary understanding of the Python language. Although this tutorial is self-contained, it will be useful if you have an understanding of statistical mathematics. If you are new to either Python or statistics, we suggest you pick up a tutorial on these subjects first before you embark on your journey with time series.
Time Series – Further Scope

Machine learning deals with various kinds of problems. In fact, almost all fields have scope to be automated or improved with the help of machine learning. A few such problems on which a great deal of work is being done are given below.

Time Series Data

This is data that changes with time, so time plays a crucial role in it. It is what we have largely discussed in this tutorial.

Non-Time Series Data

This is data that is independent of time, and a major share of ML problems are on non-time series data. For simplicity, we shall categorize it further as −

Numerical Data − Computers, unlike humans, only understand numbers, so all kinds of data are ultimately converted to numerical data for machine learning. For example, image data is converted to (r, g, b) values, characters are converted to ASCII codes or words are indexed to numbers, and speech data is converted to MFCC features containing numerical data.

Image Data − Computer vision has revolutionized the world of computers. It has various applications in fields such as medicine and satellite imaging.

Text Data − Natural Language Processing (NLP) is used for text classification, paraphrase detection and language summarization. This is what makes Google and Facebook smart.

Speech Data − Speech processing involves speech recognition and sentiment understanding. It plays a crucial role in giving computers human-like qualities.
Time Series – Modeling

Introduction

A time series has 4 components, as given below −

Level − It is the mean value around which the series varies.
Trend − It is the increasing or decreasing behavior of a variable with time.
Seasonality − It is the cyclic behavior of the time series.
Noise − It is the error in the observations added due to environmental factors.

Time Series Modeling Techniques

To capture these components, there are a number of popular time series modelling techniques. This section gives a brief introduction to each technique; we will discuss them in detail in the upcoming chapters −

Naïve Methods

These are simple estimation techniques, such as setting the predicted value equal to the mean of the preceding values of the time-dependent variable, or to the previous actual value. They are used as a baseline for comparison with more sophisticated modelling techniques.

Auto Regression

Auto regression predicts the values of future time periods as a function of the values at previous time periods. Predictions from auto regression may fit the data better than those of naïve methods, but it may not be able to account for seasonality.

ARIMA Model

An auto-regressive integrated moving-average (ARIMA) model expresses the value of a variable as a linear function of previous values and residual errors at previous time steps of a stationary time series. However, real-world data may be non-stationary and have seasonality, so Seasonal-ARIMA and Fractional-ARIMA were developed. ARIMA works on univariate time series; to handle multiple variables, VARIMA was introduced.

Exponential Smoothing

It models the value of a variable as an exponentially weighted linear function of previous values. This statistical model can handle trend and seasonality as well.

LSTM

The Long Short-Term Memory (LSTM) model is a recurrent neural network that is used for time series to account for long-term dependencies. It can be trained on large amounts of data to capture the trends in multivariate time series.

These modelling techniques are used for time series regression. In the coming chapters, we will explore them one by one.
Time Series – Auto Regression

For a stationary time series, an auto regression model sees the value of a variable at time 't' as a linear function of the values at the 'p' time steps preceding it. Mathematically it can be written as −

$$y_{t} = C + \phi_{1} y_{t-1} + \phi_{2} y_{t-2} + \dots + \phi_{p} y_{t-p} + \epsilon_{t}$$

Where 'p' is the auto-regressive order, $C$ is a constant, $\phi_{1}, \dots, \phi_{p}$ are the model coefficients, $\epsilon_{t}$ is white noise, and $y_{t-1}, y_{t-2}, \dots, y_{t-p}$ denote the value of the variable at previous time periods.

The value of p can be calibrated using various methods. One way of finding a suitable value of 'p' is to plot the auto-correlation plot.

Note − We should split the data into train and test sets at an 8:2 ratio before doing any analysis on it, because the test data exists only to measure the accuracy of our model, and the assumption is that it is not available to us until after predictions have been made. In a time series the sequence of data points is essential, so take care not to lose the order when splitting the data.

An auto-correlation plot, or correlogram, shows the relation of a variable with itself at prior time steps. It uses Pearson's correlation, and the shaded band on the plot marks the 95% confidence interval. Let's see how it looks for the 'temperature' variable of our data.

Showing ACP

    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf

    # Chronological 80/20 split: the last 20% of the observations form the test set.
    split = len(df) - int(0.2 * len(df))
    train, test = df["T"][0:split], df["T"][split:]

    # Plot the auto-correlation of the training series for the first 100 lags.
    plot_acf(train, lags=100)
    plt.show()

All the lag values lying outside the shaded blue region are assumed to have a statistically significant correlation.
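To make the quantity behind the correlogram more concrete, here is a minimal sketch of the sample autocorrelation at a given lag, written in plain Python. The series values and the helper name autocorrelation are hypothetical and are not the tutorial's temperature data; the sketch also repeats the chronological 80/20 split so that the order of observations is preserved.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence at the given lag (lag >= 0)."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total squared deviation from the mean.
    variance_sum = sum((value - mean) ** 2 for value in series)
    # Numerator: co-deviation between the series and its lagged copy.
    covariance_sum = sum(
        (series[i] - mean) * (series[i - lag] - mean) for i in range(lag, n)
    )
    return covariance_sum / variance_sum


# Hypothetical series that repeats every 4 steps.
values = [1.0, 2.0, 3.0, 2.0] * 6

# Chronological 80/20 split: slicing keeps the observations in time order.
split = len(values) - int(0.2 * len(values))
train, test = values[:split], values[split:]

acf = [autocorrelation(train, lag) for lag in range(0, 5)]
print([round(value, 3) for value in acf])
```

For this toy series the autocorrelation is strongly positive at lag 4 (one full cycle) and strongly negative at lag 2 (half a cycle), which is the kind of pattern a correlogram makes visible.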
Time Series – Naive Methods

Introduction

Naive methods, such as assuming the predicted value at time 't' to be the actual value of the variable at time 't-1', or the rolling mean of the series, are used as a baseline to judge how well statistical and machine learning models perform and to show why those models are needed.

In this chapter, let us try these models on one of the features of our time-series data.

First we shall look at the mean of the 'temperature' feature of our data and the deviation around it. It is also useful to see the maximum and minimum temperature values. We can use the numpy library here.

Showing statistics

    import numpy

    print(
        "Mean: ", numpy.mean(df["T"]),
        "; Standard Deviation: ", numpy.std(df["T"]),
        "; \nMaximum Temperature: ", max(df["T"]),
        "; Minimum Temperature: ", min(df["T"]),
    )

We now have the statistics for all 9357 observations across an equally spaced timeline, which helps us understand the data.

Now we will try the first naive method, setting the predicted value at the present time equal to the actual value at the previous time, and calculate the root mean squared error (RMSE) to quantify the performance of this method.

Showing 1st naïve method

    from math import sqrt
    from sklearn import metrics

    # The prediction for time t is the actual value at time t-1.
    df["T_t-1"] = df["T"].shift(1)

    # Drop the first row, which has no previous value.
    df_naive = df[["T", "T_t-1"]][1:]

    true = df_naive["T"]
    prediction = df_naive["T_t-1"]
    error = sqrt(metrics.mean_squared_error(true, prediction))
    print("RMSE for Naive Method 1: ", error)

    RMSE for Naive Method 1: 12.901140576492974

Let us see the next naive method, where the predicted value at the present time is set to the mean of the time periods preceding it. We will calculate the RMSE for this method too.
Showing 2nd naive method

    # The prediction for time t is the mean of the 3 values before it.
    df["T_rm"] = df["T"].rolling(3).mean().shift(1)
    df_naive = df[["T", "T_rm"]].dropna()

    true = df_naive["T"]
    prediction = df_naive["T_rm"]
    error = sqrt(metrics.mean_squared_error(true, prediction))
    print("RMSE for Naive Method 2: ", error)

    RMSE for Naive Method 2: 14.957633272839242

Here, you can experiment with the number of previous time periods, also called 'lags', that you want to consider; it is kept as 3 here. In this data it can be seen that the error increases as you increase the number of lags. If the lag is kept at 1, the method becomes the same as the naïve method used earlier.

Points to Note

You can write a very simple function for calculating root mean squared error. Here, we have used the mean squared error function from the package 'sklearn' and then taken its square root.

In pandas, df['column_name'] can also be written as df.column_name. However, for this dataset df.T will not work the same as df['T'], because df.T is the attribute that returns the transpose of a dataframe. So use only df['T'], or consider renaming this column before using the other syntax.
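As a rough illustration of the "very simple function" mentioned above, the following sketch computes RMSE and both naive forecasts using only the Python standard library. The function names and the short temperature list are hypothetical; the tutorial itself works on the pandas column df["T"].

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def persistence_forecast(series):
    """Naive method 1: predict each value as the value just before it."""
    return series[1:], series[:-1]


def rolling_mean_forecast(series, window):
    """Naive method 2: predict each value as the mean of the `window` values before it."""
    actual = series[window:]
    predicted = [
        sum(series[i - window:i]) / window for i in range(window, len(series))
    ]
    return actual, predicted


# Hypothetical temperature readings, used only for illustration.
temps = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]

print("RMSE for naive method 1:", rmse(*persistence_forecast(temps)))
print("RMSE for naive method 2:", rmse(*rolling_mean_forecast(temps, window=3)))
```

With window=1 the rolling-mean forecast reduces to the first naive method, which matches the remark about a lag of 1.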
Time Series – ARIMA

We have already understood that for a stationary time series, a variable at time 't' is a linear function of prior observations or residual errors. Hence it is time for us to combine the two and have an Auto-regressive moving average (ARMA) model.

However, at times the time series is not stationary, i.e. the statistical properties of the series, such as its mean and variance, change over time. The statistical models we have studied so far assume the time series to be stationary, so we can include a pre-processing step of differencing the time series to make it stationary. Now, it is important for us to find out whether the time series we are dealing with is stationary or not.

Methods for checking the stationarity of a time series include looking for seasonality or trend in the plot of the series, checking the difference in mean and variance over various time periods, the Augmented Dickey-Fuller (ADF) test, the KPSS test and Hurst's exponent.

Let us see whether the 'temperature' variable of our dataset is a stationary time series or not using the ADF test.

    from statsmodels.tsa.stattools import adfuller

    result = adfuller(train)
    print("ADF Statistic: %f" % result[0])
    print("p-value: %f" % result[1])
    print("Critical Values:")
    # result[4] maps each significance level to its critical value.
    for key, value in result[4].items():
        print("\t%s: %.3f" % (key, value))

    ADF Statistic: -10.406056
    p-value: 0.000000
    Critical Values:
        1%: -3.431
        5%: -2.862
        10%: -2.567

Now that we have run the ADF test, let us interpret the result. First we compare the ADF statistic with the critical values: an ADF statistic greater than (less negative than) the critical values tells us the series is most likely non-stationary. Next, we look at the p-value. A p-value greater than 0.05 also suggests that the time series is non-stationary.

Conversely, a p-value less than or equal to 0.05, or an ADF statistic less than the critical values, suggests the time series is stationary. Here the statistic of about -10.41 is well below all three critical values and the p-value is effectively zero.

Hence, the time series we are dealing with is already stationary. In the case of a stationary time series, we set the 'd' parameter to 0.
We can also check the stationarity of the time series using the Hurst exponent.

    import hurst

    H, c, data = hurst.compute_Hc(train)
    print("H = {:.4f}, c = {:.4f}".format(H, c))

    H = 0.1660, c = 5.0740

A value of H<0.5 shows anti-persistent (mean-reverting) behavior, and H>0.5 shows persistent behavior or a trending series. H=0.5 shows random walk/Brownian motion. Here H<0.5, which is consistent with our series being stationary.

For a non-stationary time series, we set the 'd' parameter to 1. Also, the values of the auto-regressive parameter 'p' and the moving average parameter 'q' are determined on the stationary time series, i.e. by plotting the ACP and PACP after differencing the time series.

The ARIMA model, which is characterized by 3 parameters (p, d, q), is now clear to us, so let us model our time series and predict the future values of temperature.

    from math import sqrt

    import matplotlib.pyplot as plt
    import pandas
    from sklearn import metrics
    # Older statsmodels releases exposed ARIMA in statsmodels.tsa.arima_model.
    from statsmodels.tsa.arima.model import ARIMA

    model = ARIMA(train.values, order=(5, 0, 2))
    model_fit = model.fit()

    # Forecast one value for every observation in the test set.
    predictions = model_fit.forecast(steps=len(test))
    test_ = pandas.DataFrame(test)
    test_["predictions"] = predictions

    plt.plot(df["T"])
    plt.plot(test_.predictions)
    plt.show()

    error = sqrt(metrics.mean_squared_error(test.values, predictions))
    print("Test RMSE for ARIMA: ", error)

The original run reported −

    Test RMSE for ARIMA: 43.21252940234892
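The 'd' parameter counts how many times the series is differenced before the ARMA part is fitted. The following plain-Python sketch shows what differencing does, under the assumption of a hypothetical series with a purely linear trend; the helper name difference is illustrative and not part of any library.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times and return the result."""
    result = list(series)
    for _ in range(order):
        # Each pass replaces the series by the change between consecutive values.
        result = [result[i] - result[i - 1] for i in range(1, len(result))]
    return result


# Hypothetical series with a linear trend: its mean keeps growing over time.
trend_series = [2.0 * t + 5.0 for t in range(10)]

once = difference(trend_series, order=1)   # corresponds to d = 1
twice = difference(trend_series, order=2)  # corresponds to d = 2

print(once)
print(twice)
```

One round of differencing turns the linear trend into a constant series, which is why d = 1 is a common starting point for a trending series; each round also shortens the series by one observation.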
Machine Learning – Data Understanding

While working on machine learning projects, we usually ignore the two most important parts: mathematics and data. What makes data understanding a critical step in ML is its data-driven approach. Our ML model will produce results only as good or as bad as the data we provide to it.

Data understanding involves analyzing and exploring the data to identify any patterns or trends that may be present. The data understanding phase typically involves the following steps −

Data Collection − This involves gathering the relevant data that you will be using for your analysis. The data can be collected from various sources such as databases, websites, and APIs.

Data Cleaning − This involves cleaning the data by removing any irrelevant or duplicate data, and dealing with missing data values. The data should be formatted in a way that makes it easy to analyze.

Data Exploration − This involves exploring the data to identify any patterns or trends that may be present. This can be done using various statistical techniques such as histograms, scatter plots, and correlation analysis.

Data Visualization − This involves creating visual representations of the data to help you understand it better. This can be done using tools such as graphs, charts, and maps.

Data Preprocessing − This involves transforming the data to make it suitable for use in machine learning algorithms. This can include scaling the data, transforming it into a different format, or reducing its dimensionality.

Understand the Data before Uploading It in ML Projects

Understanding our data before uploading it into our ML project is important for several reasons −

Identify Data Quality Issues

By understanding your data, you can identify data quality issues such as missing values, outliers, incorrect data types, and inconsistencies that can affect the performance of your ML model. By addressing these issues, you can improve the quality and accuracy of your model.
Determine Data Relevance

You can determine whether the data you have collected is relevant to the problem you are trying to solve. By understanding your data, you can determine which features are important for your model and which ones can be ignored.

Select Appropriate ML Techniques

Depending on the characteristics of your data, you may need to choose a particular ML technique or algorithm. For example, if the value you want to predict is categorical, you may need to use classification techniques, while if it is continuous, you may need to use regression techniques. Understanding your data can help you select the appropriate ML technique for your problem.

Improve Model Performance

By understanding your data, you can engineer new features, preprocess your data, and select the appropriate ML technique to improve the performance of your model. This can result in better accuracy, precision, recall, and F1 score.

Data Understanding with Statistics

In the previous chapter, we discussed how we can upload CSV data into our ML project, but it is good to understand the data before uploading it. We can understand the data in two ways: with statistics and with visualization.

In this chapter, with the help of the following Python recipes, we are going to understand ML data with statistics.

Looking at Raw Data

The very first recipe is for looking at your raw data. It is important to look at raw data because the insight we get from it improves our chances of pre-processing and handling the data well for ML projects.
Following is a Python script that uses the head() function of a Pandas DataFrame on the Pima Indians diabetes dataset to look at the first 10 rows and get a better understanding of it −

Example

    from pandas import read_csv

    path = r"C:\pima-indians-diabetes.csv"
    headernames = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]
    data = read_csv(path, names=headernames)
    print(data.head(10))

Output

       preg  plas  pres  skin  test  mass   pedi  age  class
    0     6   148    72    35     0  33.6  0.627   50      1
    1     1    85    66    29     0  26.6  0.351   31      0
    2     8   183    64     0     0  23.3  0.672   32      1
    3     1    89    66    23    94  28.1  0.167   21      0
    4     0   137    40    35   168  43.1  2.288   33      1
    5     5   116    74     0     0  25.6  0.201   30      0
    6     3    78    50    32    88  31.0  0.248   26      1
    7    10   115     0     0     0  35.3  0.134   29      0
    8     2   197    70    45   543  30.5  0.158   53      1
    9     8   125    96     0     0   0.0  0.232   54      1

We can observe from the above output that the first column gives the row number, which can be very useful for referencing a specific observation.

Checking Dimensions of Data

It is always good practice to know how much data, in terms of rows and columns, we have for our ML project. The reasons are −

If we have too many rows and columns, it will take a long time to run the algorithm and train the model.

If we have too few rows and columns, we will not have enough data to train the model well.

Following is a Python script that prints the shape property of a Pandas DataFrame. We are going to run it on the iris data set to get the total number of rows and columns in it.

Example

    from pandas import read_csv

    path = r"C:\iris.csv"
    data = read_csv(path)
    print(data.shape)

Output

    (150, 4)

We can easily observe from the output that the iris data set we are going to use has 150 rows and 4 columns.

Getting Each Attribute's Data Type

It is another good practice to know the data type of each attribute. The reason is that, depending on the requirement, we may sometimes need to convert one data type to another.
For example, we may need to convert a string into a floating point number or an int to represent categorical or ordinal values. We can have an idea about the attribute's data type by looking at
Time Series – Exponential Smoothing

In this chapter, we will talk about the techniques involved in exponential smoothing of time series.

Simple Exponential Smoothing

Exponential smoothing is a technique for smoothing univariate time series by assigning exponentially decreasing weights to data over a time period.

Mathematically, the value of the variable at time 't+1' given its values up to time t, $y_{t+1|t}$, is defined as −

$$y_{t+1|t} = \alpha y_{t} + \alpha(1-\alpha) y_{t-1} + \alpha(1-\alpha)^{2} y_{t-2} + \dots + \alpha(1-\alpha)^{t-1} y_{1}$$

where $0 \leq \alpha \leq 1$ is the smoothing parameter, and $y_{1}, \dots, y_{t}$ are the previous values of the variable at times 1, 2, 3, … , t.

This is a simple method for modelling a time series with no clear trend or seasonality. But exponential smoothing can also be used for time series with trend and seasonality.

Triple Exponential Smoothing

Triple Exponential Smoothing (TES), or the Holt-Winters method, applies exponential smoothing three times: level smoothing $l_{t}$, trend smoothing $b_{t}$, and seasonal smoothing $S_{t}$, with $\alpha$, $\beta^{*}$ and $\gamma$ as smoothing parameters and 'm' as the frequency of the seasonality, i.e. the number of seasons in a year.

According to the nature of the seasonal component, TES has two categories −

Holt-Winters Additive Method − When the seasonality is additive in nature.

Holt-Winters Multiplicative Method − When the seasonality is multiplicative in nature.

For non-seasonal time series, we only have trend smoothing and level smoothing, which is called Holt's Linear Trend Method.

Let's try applying triple exponential smoothing on our data.
    import matplotlib.pyplot as plt
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    # The trend value is missing in the source; statsmodels accepts "add" or "mul".
    model = ExponentialSmoothing(train.values, trend= )
    model_fit = model.fit()

    # Forecast one value for every observation in the test set.
    predictions_ = model_fit.forecast(len(test))

    plt.plot(test.values)
    plt.plot(predictions_)
    plt.show()

Here, we have trained the model once on the training set and then kept making predictions from it. A more realistic approach is to re-train the model after one or more time steps. Once we have the prediction for time 't+1' from training data up to time 't', the next prediction for time 't+2' can be made using the training data up to time 't+1', because the actual value at 't+1' will be known by then. This methodology of making predictions for one or more future steps and then re-training the model is called rolling forecast or walk-forward validation.
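The rolling forecast described above can be sketched with simple exponential smoothing in plain Python. This is a minimal illustration, not the statsmodels implementation: the function names and the short train and test lists are hypothetical, and initialising the level with the first observation is an assumption (one common choice among several).

```python
def ses_forecast(history, alpha):
    """One-step-ahead simple exponential smoothing forecast for the next value."""
    # Assumption: the level starts at the first observation.
    level = history[0]
    for value in history[1:]:
        # New level = weighted mix of the latest observation and the old level.
        level = alpha * value + (1 - alpha) * level
    return level


def walk_forward(train, test, alpha):
    """Predict each test point from all data observed before it."""
    history = list(train)
    predictions = []
    for actual in test:
        predictions.append(ses_forecast(history, alpha))
        # The true value becomes known before the next prediction is made.
        history.append(actual)
    return predictions


# Hypothetical observations, used only for illustration.
train = [20.0, 21.0, 19.0, 22.0]
test = [21.0, 23.0, 22.0]

predictions = walk_forward(train, test, alpha=0.5)
print(predictions)
```

Each prediction uses only values that would already have been observed at that point in time, which is what distinguishes walk-forward validation from fitting once and forecasting the whole test period in one go.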