Python Tutorial

Multiple machine learning algorithms are used in ensemble learning, aiming to improve the correct prediction ratio on a dataset. A dataset is used to train a list of machine learning models, and the distinct predictions made by each of the models applied to the dataset form the basis of an ensemble learning model. The ensemble model then combines the outcomes of different models' predictions to get the final result.

Each model has advantages and disadvantages. By integrating different independent models, ensemble models can effectively mask a particular model's flaws.

Typically, ensemble techniques fall into one of two categories:

Boosting

By transforming poor learning models into strong learners, the boosting machine learning ensemble model uses the approach to lower the bias and variance in the dataset. The dataset is sequentially introduced to poor machine-learning models. The first stage is to make a preliminary model and fit that model into the created training dataset.

Then, a secondary model that aims to correct the previous model's flaws is fitted. Here is the step-wise detail of the complete procedure:

From the original dataset, construct a subset.
Create a preliminary model using this sub-dataset.
Make predictions through this preliminary model using the entire collection of the dataset.
Utilize the predictions made by the model and the true values of those features to calculate the accuracy score.
Give the wrong predictions a greater weight.
Build a second model that is better than the previous one, which tries to correct the flaws in the previous model.
Using this new and better model, make predictions by inputting the complete dataset.
Make several different models; each one must be intended to fix the mistakes made by the one applied before it.

We can determine the final model by weighing the average of each model.

Examples: AdaBoost, Gradient Tree Boosting

Averaging

In averaging, the final output is an average of all predictions. This goes for regression problems. For example, in random forest regression, the final result is the average of the predictions from individual decision trees.

Let's take an example of three regression models that predict the price of a commodity as follows:

regressor 1 ? 200

regressor 2 ? 300

regressor 3 ? 400

The final prediction would be an average of 200, 300, and 400.

Examples: Bagging methods, Forests of randomized trees

Some Sklearn Ensemble Methods

Methods	Description
ensemble.AdaBoostClassifier([ ... ])	This function implements the AdaBoost classifier.
ensemble.AdaBoostRegressor([ base_estimator, ...])	A function to implement the AdaBoost regressor.
ensemble.BaggingClassifier([ base_estimator, ...])	This applies Bagging classifier.
ensemble.BaggingRegressor([ base_estimator, ...])	A function Bagging regressor.
ensemble.ExtraTreesClassifier([ ... ])	This function implements an extra-trees classifier.
ensemble.ExtraTreesRegressor([ n_estimators, ...])	This function applies an extra-trees regressor.
*ensemble.GradientBoostingClassifier( [, ...])**	Gradient Boosting is used for classification.
*ensemble.GradientBoostingRegressor( [, ...])**	Gradient Boosting is used for regression.
*ensemble.IsolationForest( [, n_estimators, ...])**	The function for Isolation Forest Algorithm.
ensemble.RandomForestClassifier([ ... ])	A function to implement a random forest classifier.
ensemble.RandomForestRegressor([ ... ])	A function to implement random forest regressor.
ensemble.RandomTreesEmbedding([ ... ])	An ensemble of random trees.
ensemble.StackingClassifier( estimators[, ...])	This function uses a stack of estimators having the best classifier.
ensemble.StackingRegressor(estimators[, ...])	This function uses a stack of estimators having the best regressor.
*ensemble.VotingClassifier(estimators, [, ...])**	A voting Rule classifier which is particularly used for unfitted estimators.
*ensemble.VotingRegressor(estimators, [, ...])**	Prediction made by voting regressor, which is particularly used for unfitted estimators.
ensemble.HistGradientBoostingRegressor([...])	A histogram is used to implement Gradient Boosting Regression Tree.
ensemble.HistGradientBoostingClassifier([...])	A histogram is used to implement the Gradient Boosting Classification Tree.

AdaBoost

The result of averaging is a mean of all the predictions made by different. This is used for regression cases. The outcome, for instance, in random forest regression is the mean of the forecasts from many decision trees.

Let's look at an illustration of 3 regression models that forecast the cost of a good:

100 by Regressor 1

300 by regressor 2

400 by regressor 3

The final result will be an average of these different predictions.
The weights assigned to each observation of the dataset are initially the same.
A subset of the entire data is used to construct a preliminary model.
Predictions are generated using this preliminary model for the entire dataset.
The accuracy score is computed by evaluating the predicted and true values of the observations.
The data values this model inaccurately predicted are given greater weight when building the subsequent model.
We may calculate weights based on the error value. For instance, the weight attributed to the observation increases with the magnitude of the inaccuracy.
Until the defined error function stops changing or is stationary, or the optimum number of estimators is achieved, this whole process is repeated.

AdaBoost Classifier Example

Code

# Python program to implement AdaBoost Classifier on the diabetes dataset

# Impoting the required modules
import pandas
from sklearn import model_selection
from sklearn.ensemble import AdaBoostClassifier

# Loading the PIMA Indian Diabetes dataset through read_csv
df = pandas.read_csv("diabetes data.csv")

# Separating the dependent and independent features
X = df.iloc[:, :-1].values
Y = df.iloc[:, 1].values

seed = 8
number_trees = 30

# Splitting the dataset into training and testing dataset
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size = 0.4)

# Using KFold cross-validation to calculate accuracy
k_fold = model_selection.KFold(n_splits = 10)

# Creating the model and fitting data into it
abc = AdaBoostClassifier(n_estimators = number_trees)
abc.fit(X_train, Y_train)
results = model_selection.cross_val_score(abc, X, Y, cv = k_fold)

# Mean accuracy score
print(results.mean())

Output

0.0989747095010253

AdaBoost Regressor Example

Code

# Python program to implement AdaBoost Regressor on the Boston Housing dataset

# Importing the required modules
import numpy as np
from sklearn.datasets import load_boston
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import cross_val_score, KFold
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Loading the Boston Housing dataset
boston = load_boston()
X, Y = boston.data, boston.target

# Splitting the dataset into the training and the testing dataset
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.3)

# Creating an instance of the AdaBoost regressor class and fitting the dataset into it
abr = AdaBoostRegressor(n_estimators = 100)
abr.fit(X_train, Y_train)

# Performing cross-validation to calculate the accuracy scores
scores = cross_val_score(abr, X_train, Y_train, cv = 5)
print("The mean of the cross-validation scores: ", scores.mean())

# Using the KFold Cross-Validation method 
k_fold = KFold(n_splits = 10, shuffle = True)
kf_scores = cross_val_score(abr, X_train, Y_train, cv = k_fold )
print("The average Score of KFold CV: ", kf_scores.mean())

# Making predictions using the trained model
Y_pred = abr.predict(X_test)
error = mean_squared_error(Y_test, Y_pred)
print("The Mean Squared Error: ", error)
print("The Root Mean Squared Error: ", np.sqrt(error))

# plotting our results
x_axis = range(len(Y_test))
plt.scatter(x_axis, Y_test, s = 5, color = "red", label = "True Target Values")
plt.plot(x_axis, Y_pred, lw = 0.8, color = "blue", label = "Predicted Target Values")
plt.legend()
plt.show()

Output

The mean of the cross-validation scores:  0.7728539924062154
The average Score of KFold CV:  0.7966820925398042
The Mean Squared Error:  14.201356518866593
The Root Mean Squared Error:  3.768468723349922

Bagging

Bagging is one of the Ensemble construction techniques, which is also known as Bootstrap Aggregation. Bootstrap establishes the foundation of the Bagging technique. Bootstrap is a sampling technique in which we select "n" observations from a population of "n" observations. But the selection is entirely random, i.e., each observation can be chosen from the original population so that each observation is equally likely to be selected in each iteration of the bootstrapping process. After the bootstrapped samples are formed, separate models are trained with the bootstrapped samples. In real experiments, the bootstrapped samples are drawn from the training set, and the sub-models are tested using the testing set. The final output prediction is combined across the projections of all the sub-models.

Bagging Classifier Example

Code

# Python program to implement Bagging Classifier on the diabetes dataset

# Importing the required libraries
import pandas
from sklearn import model_selection
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

# Loading the PIMA Indian Diabetes dataset through read_csv
df = pandas.read_csv("diabetes data.csv")

# Separating the dependent and independent features
X = df.iloc[:, :-1].values
Y = df.iloc[:, 1].values

# Splitting the dataset into training and testing dataset
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size = 0.4)

# Using KFold cross-validation to calculate accuracy
k_fold = model_selection.KFold(n_splits = 10)

# Training the model
dtc = DecisionTreeClassifier()
dtc.fit(X_train, Y_train)
number_trees = 100
bc = BaggingClassifier(base_estimator = dtc, n_estimators = number_trees)
results = model_selection.cross_val_score(bc, X, Y, cv = k_fold)

# The mean score
print(results.mean())

Output

0.9179254955570745

Bagging Regressor Example

Code

# Python program to implement Bagging Regressor on the dataset generated

# Importing the required libraries
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import RepeatedKFold

# Generating the dataset
X, Y = make_regression(n_samples = 1500, n_features = 25, n_informative = 20, noise = 0.2, random_state = 5)

# Splitting the dataset into training and testing dataset
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size = 0.4)

# Training the model
br = BaggingRegressor()
br.fit(X, Y)

# evaluating the model
cross_val = RepeatedKFold(n_splits = 10, n_repeats = 3)
scores = cross_val_score(br, X, Y, scoring = 'neg_mean_absolute_error', cv = cross_val, n_jobs = -1, error_score = 'raise')

# Mean scores
print("Mean score and standard deviation: ", np.mean(scores), np.std(scores))

Output

Mean score and standard deviation:  -114.10792855309286 5.633321726584775

Next TopicParse date from string python

← prev next →

For Videos Join Our Youtube Channel: Join Now

Feedback

Send your Feedback to [email protected]

Help Others, Please Share

Learn Latest Tutorials

Splunk

SPSS

Swagger

Transact-SQL

Tumblr

ReactJS

Regex

Reinforcement Learning

R Programming

RxJS

React Native

Python Design Patterns

Python Pillow

Python Turtle

Keras

Preparation

Aptitude

Reasoning

Verbal Ability

Interview Questions

Company Questions

Trending Technologies

Artificial Intelligence

AWS

Selenium

Cloud Computing

Hadoop

ReactJS

Data Science

Angular 7

Blockchain

Git

Machine Learning

DevOps

B.Tech / MCA

DBMS

Data Structures

DAA

Operating System

Computer Network

Compiler Design

Computer Organization

Discrete Mathematics

Ethical Hacking

Computer Graphics

Software Engineering

Web Technology

Cyber Security

Automata

C Programming

C++

Java

.Net

Python

Programs

Control System

Data Mining

Data Warehouse

^{Like/Subscribe us for latest updates or newsletter}

Python Tutorial

Python OOPs

Python MySQL

Python MongoDB

Python SQLite

Python Questions

Plotly

Python Tkinter (GUI)

Python Web Blocker

Python MCQ

Related Tutorials

Python Programs

Sklearn Ensemble

Boosting

Averaging

Some Sklearn Ensemble Methods

AdaBoost

AdaBoost Classifier Example

AdaBoost Regressor Example

Bagging

Bagging Classifier Example

Bagging Regressor Example

Feedback

Help Others, Please Share

Learn Latest Tutorials

Preparation

Trending Technologies

B.Tech / MCA