LHAT Accessors, Methods and Objects

lhat.IO: Data operations on input and output data

lhat.Model: Model parameterisation and grid search

class lhat.Model.MachineLearning(X, y, pathToSavedModel, model_name='SVM', modelExist=False)

Bases: object

Contains the routines required to run a machine learning model and parameterise it according to the best accuracy score obtained. Based on its input data, the model determines the probability of a landslide occurring in each pixel.

Parameters

pathToSavedModel – path to the saved model
model_name – ‘SVM’, ‘RF’ or ‘LR’ (Support Vector Machine, Random Forest or Logistic Regression)
modelExist – True or False

bestModel: Model name is not correctly specified!

evaluateTrainedModel(model, X, y)

Evaluates the accuracy of the trained model and computes a confusion matrix.

Parameters

model – Machine learning model of choice, parameterised with the best-performing (highest-accuracy) hyperparameters
X – Input training data on which accuracy is scored
y – Actual landslide class (1 = landslide; 0 = no landslide) to compare with predicted class.

Returns

Prints accuracy score and confusion matrix
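A minimal sketch of this evaluation step, assuming the scikit-learn metrics accuracy_score and confusion_matrix are used. The function name evaluate_trained_model is hypothetical, and unlike the documented method it also returns the values so they can be inspected.

```python
from sklearn.metrics import accuracy_score, confusion_matrix


def evaluate_trained_model(model, X, y):
    """Sketch of evaluateTrainedModel: score a fitted classifier on (X, y)."""
    y_pred = model.predict(X)
    acc = accuracy_score(y, y_pred)
    # labels=[0, 1] fixes the order: row = actual class, column = predicted class
    cm = confusion_matrix(y, y_pred, labels=[0, 1])
    print(f"Accuracy: {acc:.3f}")
    print("Confusion matrix (rows = actual, columns = predicted):")
    print(cm)
    return acc, cm
```

With the labels fixed to [0, 1], cell [1, 1] counts landslide pixels correctly predicted as landslides and cell [0, 1] counts false alarms.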

loadMLModel()

logisticRegression()

predict_proba(raster_stack, estimator, scaler, file_path, reference, no_data)

Applies the class-probability prediction of a scikit-learn model to a RasterStack.

Parameters

estimator – Estimator object implementing ‘fit’. The object is used to fit the data.

probfun(img, estimator, scaler)

Class probabilities function.

Parameters

img (tuple (window, numpy.ndarray)) – A window object, and a 3d ndarray of raster data with the dimensions in order of (band, rows, columns).
estimator (Estimator object implementing ‘fit’) – The object to use to fit the data.

Returns

Multi-band raster as a 3d numpy array containing the probabilities associated with each class. ndarray dimensions are in the order of (class, row, column).

Return type

numpy.ndarray
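The reshaping that probfun describes can be sketched as follows, assuming scaler is a fitted scikit-learn transformer (for example StandardScaler) and estimator a fitted classifier with predict_proba. The window object is ignored here, and masking pixels with non-finite values (such as NaN no-data) is an assumption of this sketch rather than documented LHAT behaviour.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def probfun(img, estimator, scaler):
    """Sketch: class probabilities for one raster window.

    img is a (window, ndarray) tuple with the ndarray shaped (band, rows, cols);
    the result is shaped (class, rows, cols).
    """
    window, arr = img  # window is not used in this sketch
    n_bands, n_rows, n_cols = arr.shape
    # (band, rows, cols) -> (rows * cols, band): one row per pixel, one column per band
    pixels = arr.reshape(n_bands, -1).T
    # Assumption: pixels containing NaN/inf are treated as no-data and left as NaN
    valid = np.all(np.isfinite(pixels), axis=1)
    n_classes = len(estimator.classes_)
    probs = np.full((pixels.shape[0], n_classes), np.nan)
    if valid.any():
        probs[valid] = estimator.predict_proba(scaler.transform(pixels[valid]))
    # (rows * cols, class) -> (class, rows, cols)
    return probs.T.reshape(n_classes, n_rows, n_cols)


# Usage with a toy two-band raster (values are illustrative only)
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 1, 1])
scaler = StandardScaler().fit(X_train)
clf = LogisticRegression().fit(scaler.transform(X_train), y_train)

raster = np.arange(12, dtype=float).reshape(2, 2, 3)  # (band, rows, cols)
raster[0, 1, 2] = np.nan  # simulate a no-data pixel
prob = probfun((None, raster), clf, scaler)
print(prob.shape)  # (2, 2, 3)
```

For a binary landslide model, prob[1] is the per-pixel landslide probability map.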

randomForest()

saveMLModel()

supportVectorMachine()

trainModel(baselineModel, modelParameters)

Trains the chosen machine learning model and tunes its hyperparameters with GridSearchCV; the parameter combination achieving the highest accuracy is selected.

Parameters

baselineModel – The model to be trained, as chosen by the user: Support Vector Machine, Random Forest or Logistic Regression
modelParameters – The candidate hyperparameter values that are searched over during cross-validation

Returns

The best model
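A minimal sketch of grid-search tuning with scikit-learn's GridSearchCV, scored by accuracy. The function name train_model, the explicit X_train/y_train arguments and the default of 5 folds are assumptions; the documented trainModel takes only baselineModel and modelParameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def train_model(baseline_model, model_parameters, X_train, y_train, cv=5):
    """Sketch of trainModel: grid search over model_parameters, scored by accuracy."""
    search = GridSearchCV(baseline_model, model_parameters, scoring="accuracy", cv=cv)
    # Fits one model per parameter combination and fold, then refits the best
    # combination on all of X_train (refit=True is the GridSearchCV default)
    search.fit(X_train, y_train)
    print("Best parameters:", search.best_params_)
    print(f"Best mean cross-validated accuracy: {search.best_score_:.3f}")
    return search.best_estimator_


# Usage on synthetic data (illustrative only)
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
param_grid = {"n_estimators": [10, 50], "max_depth": [2, None]}
best = train_model(RandomForestClassifier(random_state=0), param_grid, X, y)
```

Returning best_estimator_ gives a model that is already fitted and can be passed straight to an evaluation step.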

trainTestSplit(): Splits the dataset into training and test sets. Test size is by default determined as 20% of the dataset.