LHAT Accessors, Methods and Objects
lhat.IO: Data operations on input and output data
lhat.Model: Model parameterisation and grid search
- class lhat.Model.MachineLearning(X, y, pathToSavedModel, model_name='SVM', modelExist=False)
Bases: object
Module contains routines required to run a machine learning model and parameterise it according to the best generated accuracy scores. The model determines, based on its input data, the probability of a landslide occurring in a pixel.
- Parameters
path – path to the saved model
model_name – ‘SVM’, ‘RF’, ‘LR’
saveModel – True or False
- bestModel
Model name is not correctly specified!
- evaluateTrainedModel(model, X, y)
The trained model is evaluated for accuracy, and a confusion matrix is produced.
- Parameters
model – Machine learning model of choice, parameterised with the settings that performed best in terms of accuracy
X – Input training data to perform accuracy scoring on
y – Actual landslide class (1 = landslide; 0 = no landslide) to compare with predicted class.
- Returns
Prints accuracy score and confusion matrix
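The evaluation step described above can be sketched with scikit-learn's metrics. This is a minimal illustration and not the LHAT source: the function name evaluate_trained_model, the toy data, and the choice to return the values as well as print them are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix


def evaluate_trained_model(model, X, y):
    """Hypothetical stand-in: print and return accuracy and confusion matrix."""
    y_pred = model.predict(X)
    acc = accuracy_score(y, y_pred)
    # Rows are actual classes, columns are predicted classes (0 = no landslide, 1 = landslide).
    cm = confusion_matrix(y, y_pred, labels=[0, 1])
    print(f"Accuracy: {acc:.3f}")
    print(cm)
    return acc, cm


# Toy data (assumed): 100 samples, 3 features, class decided by the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)

model = LogisticRegression().fit(X, y)
acc, cm = evaluate_trained_model(model, X, y)
```

Scoring on the same data used for fitting, as in this toy run, gives an optimistic estimate; a held-out test set is the more informative input.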
- loadMLModel()
- logisticRegression()
- predict_proba(raster_stack, estimator, scaler, file_path, reference, no_data)
Apply class probability prediction of a scikit learn model to a RasterStack.
- Parameters
estimator – Estimator object implementing ‘fit’. The object is used to fit the data.
- probfun(img, estimator, scaler)
Class probabilities function.
- Parameters
img (tuple (window, numpy.ndarray)) – A window object, and a 3d ndarray of raster data with the dimensions in order of (band, rows, columns).
estimator (Estimator object implementing 'fit') – The object to use to fit the data.
- Returns
Multi-band raster as a 3d numpy array containing the probabilities associated with each class. ndarray dimensions are in the order of (class, row, column).
- Return type
numpy.ndarray
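The reshaping implied by the (band, rows, columns) input and (class, row, column) output can be sketched as follows. This is an assumption-laden illustration rather than the LHAT implementation: probfun_sketch is a hypothetical name, the window object is passed through unused, and no-data masking is left out.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def probfun_sketch(img, estimator, scaler):
    """Hypothetical sketch: per-pixel class probabilities for one raster window."""
    window, arr = img  # the window is not needed for the computation itself
    n_bands, n_rows, n_cols = arr.shape
    # (band, row, col) -> (pixel, band): one row of features per pixel.
    flat = arr.reshape(n_bands, n_rows * n_cols).T
    flat = scaler.transform(flat)
    proba = estimator.predict_proba(flat)  # shape (n_pixels, n_classes)
    # (pixel, class) -> (class, row, col)
    return proba.T.reshape(-1, n_rows, n_cols)


# Toy training data and raster (assumed): 3 bands, 4 x 5 pixels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
scaler = StandardScaler().fit(X_train)
estimator = LogisticRegression().fit(scaler.transform(X_train), y_train)

arr = rng.normal(size=(3, 4, 5))
probs = probfun_sketch((None, arr), estimator, scaler)
print(probs.shape)
```

For a binary landslide classifier, the band at index 1 of the result would hold the probability of the landslide class, provided the estimator's classes are ordered as [0, 1].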
- randomForest()
- saveMLModel()
- supportVectorMachine()
- trainModel(baselineModel, modelParameters)
Trains the machine learning model of choice and tunes its hyperparameters using GridSearchCV. The parameters achieving the highest accuracy are selected.
- Parameters
baselineModel – The model used for training (chosen by the user: Support Vector Machine, Random Forest or Logistic Regression)
modelParameters – The set of hyperparameter values for the model that is searched during cross-validation
- Returns
The best model
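A grid search of this kind can be sketched with scikit-learn's GridSearchCV. The function name train_model_sketch, the explicit X and y arguments, the 5-fold setting and the parameter grid are assumptions for illustration, not values taken from LHAT.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def train_model_sketch(baseline_model, model_parameters, X, y):
    """Hypothetical sketch: return the estimator with the best cross-validated accuracy."""
    search = GridSearchCV(
        baseline_model,
        model_parameters,
        scoring="accuracy",  # select on accuracy, as the description states
        cv=5,                # assumed number of folds
    )
    search.fit(X, y)
    print("Best parameters:", search.best_params_)
    return search.best_estimator_


# Toy data (assumed): 120 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = (X[:, 0] > 0).astype(int)

param_grid = {"n_estimators": [10, 20], "max_depth": [2, 4]}
best = train_model_sketch(RandomForestClassifier(random_state=0), param_grid, X, y)
```

With the default refit=True, the returned estimator has already been refitted on the full data passed to the search, so it can be used for prediction directly.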
- trainTestSplit()
Splits the dataset into training and test sets. Test size is by default determined as 20% of the dataset.
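An 80/20 split like the one described can be sketched with scikit-learn's train_test_split. The toy arrays, the fixed random_state and the stratify argument are assumptions added for reproducibility and class balance; whether LHAT uses them is not stated here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumed): 100 samples, 2 features, balanced binary labels.
X = np.arange(200).reshape(100, 2)
y = np.array([0, 1] * 50)

# test_size=0.2 holds out 20% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```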