Data Splitter

class pyfume.Splitter.DataSplitter

Bases: object

Creates an object that can (provide the indices to) split the data into a training and test set for model validation.

holdout(dataX, dataY, percentage_training=0.75)

Splits the data into a training and test set using the hold-out method.


Parameters:

  • dataX: The input data.

  • dataY: The output data (true label/gold standard).

  • percentage_training: Number between 0 and 1 that indicates the fraction of data that should be in the training data set (default = 0.75).

Returns: Tuple containing (x_train, y_train, x_test, y_test)

  • x_train: Input variables of the training data.

  • y_train: Output variables (true label/gold standard) of the training data.

  • x_test: Input variables of the test data.

  • y_test: Output variables (true label/gold standard) of the test data.
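
To make the hold-out split concrete, here is a minimal standalone sketch in plain Python. It is not pyFUME's implementation: the function name holdout_sketch, the seed argument, and the choice to shuffle rows before splitting are all assumptions made for illustration. Only the parameter percentage_training and the order of the returned tuple follow the description above.

```python
import random


def holdout_sketch(data_x, data_y, percentage_training=0.75, seed=None):
    """Illustrative hold-out split (hypothetical helper, not pyFUME's code)."""
    if len(data_x) != len(data_y):
        raise ValueError("data_x and data_y must have the same number of rows")
    if not 0 < percentage_training < 1:
        raise ValueError("percentage_training must lie strictly between 0 and 1")

    # Assumption: rows are shuffled before splitting so both sets are random samples.
    indices = list(range(len(data_x)))
    random.Random(seed).shuffle(indices)

    # The first part of the shuffled indices becomes the training set.
    n_train = int(round(percentage_training * len(indices)))
    train_idx, test_idx = indices[:n_train], indices[n_train:]

    x_train = [data_x[i] for i in train_idx]
    y_train = [data_y[i] for i in train_idx]
    x_test = [data_x[i] for i in test_idx]
    y_test = [data_y[i] for i in test_idx]
    return x_train, y_train, x_test, y_test


# Toy data: 8 rows with two input columns and a binary label.
data_x = [[float(i), float(i) * 2.0] for i in range(8)]
data_y = [i % 2 for i in range(8)]

x_train, y_train, x_test, y_test = holdout_sketch(data_x, data_y, 0.75, seed=0)
print(len(x_train), len(x_test))  # 6 2
```

Each input row stays paired with its label because the same index list is used to select from both data_x and data_y.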

kfold(data_length, number_of_folds=10)

Provides the user with indices for k folds for the training and testing of the model.

Parameters:

  • data_length – The total number of instances in the data set (number of rows).

  • number_of_folds – The number of folds the data should be split into (default = 10).


Returns: A list with k (non-overlapping) sublists, each containing the indices for one fold.
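
The following sketch shows one way such a list of fold indices could be built and then used for cross-validation. It is an illustration under stated assumptions, not pyFUME's code: the name kfold_sketch, the seed argument, the shuffling step, and the strided slicing are hypothetical choices. Only the parameters data_length and number_of_folds and the shape of the result (k non-overlapping sublists of indices) come from the description above.

```python
import random


def kfold_sketch(data_length, number_of_folds=10, seed=None):
    """Illustrative k-fold index generator (hypothetical helper, not pyFUME's code)."""
    if not 2 <= number_of_folds <= data_length:
        raise ValueError("number_of_folds must be between 2 and data_length")

    # Assumption: indices are shuffled so that folds do not follow row order.
    indices = list(range(data_length))
    random.Random(seed).shuffle(indices)

    # Strided slices give non-overlapping folds whose sizes differ by at most one.
    return [indices[k::number_of_folds] for k in range(number_of_folds)]


folds = kfold_sketch(data_length=23, number_of_folds=5, seed=0)

# Each fold serves once as the test set; the remaining folds form the training set.
for fold_number, test_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != fold_number for i in fold]
    print(fold_number, len(train_idx), len(test_idx))
```

With 23 rows and 5 folds, three folds hold 5 indices and two hold 4, so every row is used for testing exactly once.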