Improving Prediction Performance

In this chapter, we will focus on building a model that predicts a student's performance from a number of attributes. The focus is on identifying the students who are likely to fail an examination.

Process

The target value of the assessment is G3. This value can be binned and classified as failure or success. If the G3 value is greater than or equal to 10, the student passes the examination; otherwise, the student fails.

Example

Consider the following example, in which code is executed to predict the performance of students:

    import pandas as pd

    # ML helpers
    from sklearn.preprocessing import LabelEncoder
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.metrics import confusion_matrix
    from sklearn.pipeline import Pipeline
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.svm import LinearSVC  # Support Vector Machine classifier

    # Read the data file as a DataFrame
    df = pd.read_csv("student-mat.csv", sep=";")


    def split_data(X, Y):
        """Split the data into training and testing sets."""
        return train_test_split(X, Y, test_size=0.2, random_state=17)


    def confuse(y_true, y_pred):
        """Build the confusion matrix and print both error rates."""
        cm = confusion_matrix(y_true=y_true, y_pred=y_pred)
        # print("\nConfusion Matrix: \n", cm)
        fpr(cm)
        ffr(cm)


    def fpr(confusion_matrix):
        """False pass rate: failing students predicted to pass."""
        fp = confusion_matrix[0][1]  # actual fail, predicted pass
        tf = confusion_matrix[0][0]  # actual fail, predicted fail
        rate = float(fp) / (fp + tf)
        print("False Pass Rate: ", rate)
        return rate


    def ffr(confusion_matrix):
        """False fail rate: passing students predicted to fail."""
        ff = confusion_matrix[1][0]  # actual pass, predicted fail
        tp = confusion_matrix[1][1]  # actual pass, predicted pass
        rate = float(ff) / (ff + tp)
        print("False Fail Rate: ", rate)
        return rate


    def train_and_score(X, y):
        """Train the model and print its scores."""
        X_train, X_test, y_train, y_test = split_data(X, y)
        clf = Pipeline([
            ("reduce_dim", SelectKBest(chi2, k=2)),
            ("train", LinearSVC(C=100)),
        ])
        scores = cross_val_score(clf, X_train, y_train, cv=5, n_jobs=2)
        print("Mean Model Accuracy:", scores.mean())
        clf.fit(X_train, y_train)
        confuse(y_test, clf.predict(X_test))
        print()


    def main():
        print("\nStudent Performance Prediction")

        # Encode each categorical feature as integer labels
        class_le = LabelEncoder()
        for column in ["school", "sex", "address", "famsize", "Pstatus",
                       "Mjob", "Fjob", "reason", "guardian", "schoolsup",
                       "famsup", "paid", "activities", "nursery", "higher",
                       "internet", "romantic"]:
            df[column] = class_le.fit_transform(df[column].values)

        # Encode G1, G2, G3 as binary values: 1 = pass (>= 10), 0 = fail
        for column in ["G1", "G2", "G3"]:
            df[column] = (df[column] >= 10).astype(int)

        # Target values are G3
        y = df.pop("G3")

        # Feature set is the remaining features
        X = df

        print("\n\nModel Accuracy Knowing G1 & G2 Scores")
        print("=====================================")
        train_and_score(X, y)

        # Remove grade report 2
        X.drop(["G2"], axis=1, inplace=True)
        print("\n\nModel Accuracy Knowing Only G1 Score")
        print("=====================================")
        train_and_score(X, y)

        # Remove grade report 1
        X.drop(["G1"], axis=1, inplace=True)
        print("\n\nModel Accuracy Without Knowing Scores")
        print("=====================================")
        train_and_score(X, y)


    if __name__ == "__main__":
        main()

Output

The above code generates the output as shown below:

[Figure: program output]

The prediction is treated with reference to only one variable. With reference to one variable, the student performance prediction is as shown below:

[Figure: student performance prediction with reference to one variable]
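The two error rates reported in this chapter can be checked by hand on a small example. The sketch below is only an illustration: the labels are made up (0 = fail, 1 = pass), the helper names are hypothetical, and the matrix is laid out as rows for actual labels and columns for predicted labels, which is assumed to match the indexing used in the chapter's code.

```python
def confusion_counts(y_true, y_pred):
    """Return a 2x2 matrix cm where cm[t][p] counts samples
    with actual label t and predicted label p."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def false_pass_rate(cm):
    # Row 0 holds the students who actually failed.
    false_pass, true_fail = cm[0][1], cm[0][0]
    return false_pass / (false_pass + true_fail)


def false_fail_rate(cm):
    # Row 1 holds the students who actually passed.
    false_fail, true_pass = cm[1][0], cm[1][1]
    return false_fail / (false_fail + true_pass)


# Made-up labels for illustration only: 0 = fail, 1 = pass
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

cm = confusion_counts(y_true, y_pred)
print("Confusion matrix:", cm)
print("False Pass Rate:", false_pass_rate(cm))
print("False Fail Rate:", false_fail_rate(cm))
```

Here one of the four failing students is predicted to pass, and two of the six passing students are predicted to fail. Both helpers assume that each class appears at least once in `y_true`; otherwise the division would fail.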

Implementation of Agile

There are various methodologies used in the agile development process. These methodologies can be used for the data science research process as well. The flowchart given below shows the different methodologies:

[Figure: flowchart of agile methodologies]

Scrum

In software development terms, scrum means managing the work of a specific project with a small team, in a way that reveals the strengths and weaknesses of the project.

Crystal methodologies

Crystal methodologies include innovative techniques for product management and execution. With this method, teams can go about similar tasks in different ways. The Crystal family is one of the easiest methodologies to apply.

Dynamic Systems Development Method

This delivery framework is primarily used to implement the current knowledge system in software methodology.

Feature-driven development

The focus of this development life cycle is the features involved in the project. It works best for domain object modeling, developing by feature, and code ownership.

Lean Software development

This method aims at increasing the speed of software development at low cost and focuses the team on delivering specific value to the customer.

Extreme Programming

Extreme programming is a software development methodology that focuses on improving software quality. It is effective when the customer is not sure about the functionality of a project.

Agile methodologies are taking root in the data science stream and are considered an important software methodology. With agile, self-organizing, cross-functional teams can work together effectively. As mentioned, there are six main categories of agile development, and each of them can be combined with data science as per the requirements. Data science involves an iterative process for statistical insights. Agile helps in breaking down the data science modules and in processing iterations and sprints effectively.

The process of Agile Data Science is a useful way of understanding how and why a data science module is implemented. It solves problems in a creative manner.

NoSQL & Dataflow programming

There are times when the data is unavailable in relational format and we need to keep it transactional with the help of NoSQL databases. In this chapter, we will focus on the dataflow of NoSQL. We will also learn how it operates in combination with agile and data science.

One of the major reasons to use NoSQL with agile is to increase speed in the face of market competition. The following reasons show how NoSQL is a good fit for agile software methodology:

Fewer Barriers

Changing the data model mid-stream has real costs, even in agile development. With NoSQL, the users work with aggregate data instead of spending time normalizing data. The main point is to get something done and working, with the goal of refining the data model over time.

Increased Scalability

Whenever an organization is creating a product, it places more focus on its scalability. NoSQL is known for its scalability, but it works best when it is designed for horizontal scalability.

Ability to leverage data

NoSQL is a schema-less data model that allows the user to readily use volumes of data, which includes several parameters of variability and velocity. When considering a choice of technology, you should always consider the one that leverages the data to a greater scale.

Dataflow of NoSQL

Let us consider the following example, in which a data model is focused on creating the RDBMS schema.

Following are the different requirements of the schema:

- User identification should be listed.
- Every user should have at least one mandatory skill.
- The details of every user's experience should be maintained properly.

The user table is normalized into 3 separate tables:

- Users
- User skills
- User experience

The complexity of querying the database and the time it takes both increase with normalization, which is not good for agile methodology.
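The cost of normalization described above can be seen in a small sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are assumptions made for illustration; the chapter does not specify them.

```python
import sqlite3

# In-memory database; all table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE user_skills (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    skill   TEXT NOT NULL
);
CREATE TABLE user_experience (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    company TEXT NOT NULL,
    years   INTEGER NOT NULL
);
INSERT INTO users VALUES (1, 'Asha');
INSERT INTO user_skills VALUES (1, 'Python'), (1, 'SQL');
INSERT INTO user_experience VALUES (1, 'Acme', 3);
""")

# Reading one user's full profile requires joining all three tables.
rows = conn.execute("""
    SELECT u.name, s.skill, e.company, e.years
    FROM users u
    JOIN user_skills s     ON s.user_id = u.user_id
    JOIN user_experience e ON e.user_id = u.user_id
    ORDER BY s.skill
""").fetchall()

for row in rows:
    print(row)
```

Even for a single user, the profile comes back as one row per skill and experience combination, and every new attribute table adds another join to the query.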
The same schema can be designed with a NoSQL database as shown below:

[Figure: user schema designed with a NoSQL database]

NoSQL maintains the structure in JSON format, which is lightweight. With JSON, applications can store objects with nested data as single documents.
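As a minimal sketch of the single-document idea, the user, the skills and the experience can be held in one nested JSON object. The field names and values below are assumptions for illustration and are not taken from any particular NoSQL product; the check for at least one skill is a hypothetical helper that would live in application code.

```python
import json

# One document holds the user together with the nested skills and experience.
user_doc = {
    "user_id": 1,
    "name": "Asha",
    "skills": ["Python", "SQL"],
    "experience": [
        {"company": "Acme", "years": 3},
    ],
}


def has_required_skill(doc):
    """Check the requirement that every user has at least one skill."""
    return len(doc.get("skills", [])) >= 1


# Serialize to JSON text and read it back, as a document store would.
serialized = json.dumps(user_doc)
restored = json.loads(serialized)

print(serialized)
print("Has at least one skill:", has_required_skill(restored))
```

The whole profile is read and written as one unit, so no join is needed to assemble it.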