Data Splitter

class pyfume.Splitter.DataSplitter

Bases: object

Creates an object that can split the data (or provide the indices to split it) into a training set and a test set for model validation.

holdout(dataX, dataY, percentage_training=0.75)

Splits the data in a training and test set using the hold-out method.

Args:

dataX: The input data.

dataY: The output data (true label/golden standard).

percentage_training: Number between 0 and 1 that indicates the fraction of the data that should be in the training data set (default = 0.75).

Returns: Tuple containing (x_train, y_train, x_test, y_test)

x_train: Input variables of the training data.

y_train: Output variables (true label/golden standard) of the training data.

x_test: Input variables of the test data.

y_test: Output variables (true label/golden standard) of the test data.
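The hold-out idea described above can be illustrated with a minimal standard-library sketch. This is not pyFUME's implementation: the function name `holdout_sketch`, the `seed` argument, the use of plain lists, and the choice to shuffle rows before splitting are all assumptions made for illustration. Only the argument order and the order of the returned tuple follow the description above.

```python
import random


def holdout_sketch(data_x, data_y, percentage_training=0.75, seed=None):
    """Illustrative hold-out split (hypothetical helper, not pyFUME code)."""
    if len(data_x) != len(data_y):
        raise ValueError("data_x and data_y must have the same number of rows")

    # Assumption: rows are shuffled before splitting; `seed` makes it repeatable.
    indices = list(range(len(data_x)))
    random.Random(seed).shuffle(indices)

    # The first share of the shuffled rows is used for training, the rest for testing.
    n_train = int(round(percentage_training * len(indices)))
    train_idx, test_idx = indices[:n_train], indices[n_train:]

    x_train = [data_x[i] for i in train_idx]
    y_train = [data_y[i] for i in train_idx]
    x_test = [data_x[i] for i in test_idx]
    y_test = [data_y[i] for i in test_idx]
    return x_train, y_train, x_test, y_test


# Toy data: 8 rows with two input columns and one label per row.
x = [[i, i * 2] for i in range(8)]
y = [i % 2 for i in range(8)]
x_train, y_train, x_test, y_test = holdout_sketch(x, y, 0.75, seed=0)
```

With 8 rows and `percentage_training=0.75`, the sketch places 6 rows in the training set and 2 in the test set, and each input row stays paired with its own label.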

kfold(data_length, number_of_folds=10)

Provides the user with the indices for k folds, to be used for training and testing the model.

Parameters

data_length – The total number of instances in the data sets (number of rows).
number_of_folds – The number of folds the data should be split into (default = 10).

Returns

A list with k (non-overlapping) sublists each containing the indices for one fold.
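As a rough illustration of the return value described above, the following standard-library sketch builds k non-overlapping index lists and then uses each fold once as the test set. It is not pyFUME's implementation: the name `kfold_indices_sketch`, the `seed` argument, the shuffling, and the round-robin assignment of indices to folds are assumptions chosen for illustration.

```python
import random


def kfold_indices_sketch(data_length, number_of_folds=10, seed=None):
    """Illustrative k-fold index generator (hypothetical helper, not pyFUME code)."""
    # Assumption: row indices are shuffled before being assigned to folds.
    indices = list(range(data_length))
    random.Random(seed).shuffle(indices)

    # Deal the shuffled indices out round-robin, so fold sizes differ by at most one.
    return [indices[k::number_of_folds] for k in range(number_of_folds)]


folds = kfold_indices_sketch(23, number_of_folds=5, seed=0)

# Each fold serves once as the test set; the remaining folds form the training set.
splits = []
for k, test_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    splits.append((train_idx, test_idx))
```

Every row index appears in exactly one fold, so across the k iterations each instance is used for testing once and for training k - 1 times.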