- Creates an object that can split the data (or provide the indices to do so)
into a training set and a test set for model validation.
holdout(dataX, dataY, percentage_training=0.75)
Splits the data into a training set and a test set using the hold-out method.
dataX – The input data.
dataY – The output data (true label/golden standard).
percentage_training – Fraction between 0 and 1 of the data that should be placed in the training set (default = 0.75).
Returns: Tuple containing (x_train, y_train, x_test, y_test)
x_train: Input variables of the training data.
y_train: Output variables (true label/golden standard) of the training data.
x_test: Input variables of the test data.
y_test: Output variables (true label/golden standard) of the test data.
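The hold-out split described above can be sketched in plain Python as follows. This is a minimal illustration, not the library's implementation: the name `holdout_sketch`, the `seed` parameter, and the choice to shuffle before splitting are assumptions made for the example.

```python
import random


def holdout_sketch(dataX, dataY, percentage_training=0.75, seed=0):
    """Hypothetical stand-in for holdout(); shuffling and `seed` are assumptions."""
    if len(dataX) != len(dataY):
        raise ValueError("dataX and dataY must have the same number of rows")
    if not 0 <= percentage_training <= 1:
        raise ValueError("percentage_training must be between 0 and 1")

    # Shuffle the row indices so the split does not depend on row order.
    indices = list(range(len(dataX)))
    random.Random(seed).shuffle(indices)

    # The first part of the shuffled indices goes to training, the rest to test.
    n_train = int(round(percentage_training * len(indices)))
    train_idx, test_idx = indices[:n_train], indices[n_train:]

    x_train = [dataX[i] for i in train_idx]
    y_train = [dataY[i] for i in train_idx]
    x_test = [dataX[i] for i in test_idx]
    y_test = [dataY[i] for i in test_idx]
    return x_train, y_train, x_test, y_test
```

With 8 rows and the default of 0.75, this sketch places 6 rows in the training set and 2 in the test set, and each input row stays paired with its label.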
- Provides the indices for k folds, to be used for training and testing
the model.
data_length – The total number of instances in the data set (number of rows).
number_of_folds – The number of folds the data should be split into (default = 10).
Returns: A list with k (non-overlapping) sublists, each containing the indices for one fold.
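One way to produce such a list of folds is sketched below. The function name `kfold_indices_sketch`, the `seed` parameter, and the shuffle-then-stride strategy are assumptions for illustration; the source does not state how the folds are actually assigned.

```python
import random


def kfold_indices_sketch(data_length, number_of_folds=10, seed=0):
    """Hypothetical sketch: returns k non-overlapping lists of row indices."""
    if not 1 <= number_of_folds <= data_length:
        raise ValueError("number_of_folds must be between 1 and data_length")

    # Shuffle all row indices once (an assumption; the original may not shuffle).
    indices = list(range(data_length))
    random.Random(seed).shuffle(indices)

    # Take every k-th index starting at offset i, so fold sizes differ by at most 1.
    return [indices[i::number_of_folds] for i in range(number_of_folds)]


# Usage: each fold serves once as the test set, the remaining folds as training.
folds = kfold_indices_sketch(data_length=23, number_of_folds=5)
for k, test_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    # Fit the model on rows train_idx and evaluate it on rows test_idx here.
```

Every index appears in exactly one fold, so each instance is used for testing exactly once across the k rounds.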