LHAT Accessors, Methods and Objects

lhat.IO: Data operations on input and output data

lhat.Model: Model parameterisation and grid search

class lhat.Model.MachineLearning(X, y, pathToSavedModel, model_name='SVM', modelExist=False)

Bases: object

This module contains the routines needed to run a machine learning model and parameterise it according to the best accuracy score obtained. Based on its input data, the model determines the probability of a landslide occurring in a pixel.

Parameters
  • path – path to the saved model

  • model_name – model to use: ‘SVM’ (support vector machine), ‘RF’ (random forest) or ‘LR’ (logistic regression)

  • saveModel – True or False

bestModel

Model name is not correctly specified!

evaluateTrainedModel(model, X, y)

Evaluates the trained model for accuracy and builds a confusion matrix.

Parameters
  • model – Machine learning model of choice, parameterised with the best-performing settings in terms of accuracy

  • X – Input training data to perform accuracy scoring on

  • y – Actual landslide class (1 = landslide; 0 = no landslide) to compare with predicted class.

Returns

Prints accuracy score and confusion matrix
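
The evaluation described above can be sketched with scikit-learn's metrics functions. This is a minimal illustration, not the LHAT implementation: the function name evaluate_trained_model, the synthetic data and the returned values are assumptions made so the example is self-contained.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix


def evaluate_trained_model(model, X, y):
    """Hypothetical stand-in for evaluateTrainedModel: score a fitted model."""
    y_pred = model.predict(X)
    accuracy = accuracy_score(y, y_pred)
    # Rows are actual classes, columns are predicted classes, ordered [0, 1].
    matrix = confusion_matrix(y, y_pred, labels=[0, 1])
    print(f"Accuracy: {accuracy:.3f}")
    print(matrix)
    # Returning the values is an assumption; the documented method only prints.
    return accuracy, matrix


# Synthetic data (assumption): 1 = landslide, 0 = no landslide.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X, y)
accuracy, matrix = evaluate_trained_model(model, X, y)
```

The diagonal of the confusion matrix counts correct predictions, so its sum divided by the number of samples equals the accuracy score.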

loadMLModel()
logisticRegression()
predict_proba(raster_stack, estimator, scaler, file_path, reference, no_data)

Apply class probability prediction of a scikit-learn model to a RasterStack.

Parameters

estimator – Estimator object implementing ‘fit’. The object is used to fit the data.

probfun(img, estimator, scaler)

Class probabilities function.

Parameters
  • img (tuple (window, numpy.ndarray)) – A window object, and a 3d ndarray of raster data with the dimensions in order of (band, rows, columns).

  • estimator (estimator object implementing 'fit') – The object used to fit the data.

Returns

Multi-band raster as a 3d numpy array containing the probabilities associated with each class. ndarray dimensions are in the order of (class, row, column).

Return type

numpy.ndarray
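
The reshaping implied by the input and output dimensions above can be sketched as follows. This is an illustration under assumptions, not the LHAT code: the function name class_probabilities is hypothetical, it takes the 3d array directly instead of a (window, array) tuple, and it ignores no-data handling.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def class_probabilities(arr, estimator, scaler):
    """Hypothetical sketch: (band, rows, columns) in, (class, row, column) out."""
    n_bands, n_rows, n_cols = arr.shape
    # One row per pixel, one column per band.
    pixels = arr.reshape(n_bands, n_rows * n_cols).T
    proba = estimator.predict_proba(scaler.transform(pixels))
    # Back to raster layout with the class axis first.
    return proba.T.reshape(proba.shape[1], n_rows, n_cols)


# Synthetic training data and raster (assumptions for illustration only).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
y_train = (X_train[:, 0] > 0).astype(int)
scaler = StandardScaler().fit(X_train)
estimator = LogisticRegression().fit(scaler.transform(X_train), y_train)

raster = rng.normal(size=(2, 4, 5))  # 2 bands, 4 rows, 5 columns
probabilities = class_probabilities(raster, estimator, scaler)
print(probabilities.shape)
```

Each pixel's values along the first axis of the result are the class probabilities for that pixel, so they sum to 1.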

randomForest()
saveMLModel()
supportVectorMachine()
trainModel(baselineModel, modelParameters)

Trains the machine learning model of choice and tunes its hyperparameters with GridSearchCV. The parameters achieving the highest accuracy are selected.

Parameters
  • baselineModel – The model used for training, chosen by the user: support vector machine, random forest or logistic regression

  • modelParameters – The set of hyperparameter values searched during cross-validation

Returns

The best model
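
A grid search of this kind can be sketched with scikit-learn's GridSearchCV. The function name train_model, the parameter grid, the 3-fold setting and the synthetic data are assumptions for illustration; the values LHAT actually uses are not stated here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def train_model(baseline_model, model_parameters, X, y):
    """Hypothetical sketch: cross-validated grid search scored on accuracy."""
    search = GridSearchCV(
        baseline_model, model_parameters, scoring="accuracy", cv=3
    )
    search.fit(X, y)
    # best_estimator_ is refitted on all of X, y with the winning parameters.
    return search.best_estimator_, search.best_params_


# Synthetic data and an illustrative grid (assumptions).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = (X[:, 0] > 0).astype(int)
grid = {"n_estimators": [10, 20], "max_depth": [2, 4]}

best_model, best_params = train_model(
    RandomForestClassifier(random_state=0), grid, X, y
)
print(best_params)
```

Every combination in the grid is fitted once per fold, so the cost grows with the product of the list lengths.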

trainTestSplit()

Splits the dataset into training and test sets. By default, the test set is 20% of the dataset.
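
An 80/20 split of this kind can be sketched with scikit-learn's train_test_split. The data and the random_state value are assumptions for illustration; whether LHAT shuffles or stratifies the split is not stated here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic dataset (assumption): 100 samples, 2 features, binary labels.
X = np.arange(200).reshape(100, 2)
y = np.array([0, 1] * 50)

# test_size=0.2 holds out 20% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)
```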