Data Splitter
class pyfume.Splitter.DataSplitter
Bases: object
Creates an object that can split the data (or provide the indices to do so) into a
training set and a test set for model validation.
holdout(dataX, dataY, percentage_training=0.75)
Splits the data into a training set and a test set using the hold-out method.
Args:
dataX: The input data.
dataY: The output data (true label/gold standard).
percentage_training: Number between 0 and 1 that indicates the fraction of the data that should be in the training data set (default = 0.75).
Returns:
Tuple containing (x_train, y_train, x_test, y_test).
x_train: Input variables of the training data.
y_train: Output variables (true label/gold standard) of the training data.
x_test: Input variables of the test data.
y_test: Output variables (true label/gold standard) of the test data.
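The hold-out method described above can be illustrated with a minimal, standard-library sketch. It is not pyFUME's own code: the function name `holdout_sketch`, the `seed` parameter, the shuffling step, and the use of plain Python lists are assumptions made for illustration only. It mirrors the documented argument order and the documented return tuple (x_train, y_train, x_test, y_test).

```python
import random


def holdout_sketch(data_x, data_y, percentage_training=0.75, seed=None):
    """Illustrative hold-out split (hypothetical helper, not pyFUME's implementation)."""
    if len(data_x) != len(data_y):
        raise ValueError("data_x and data_y must have the same number of rows")
    if not 0.0 <= percentage_training <= 1.0:
        raise ValueError("percentage_training must be between 0 and 1")

    # Shuffle the row indices so the split does not depend on the row order
    # (assumption: the documentation does not say whether rows are shuffled).
    rng = random.Random(seed)
    indices = list(range(len(data_x)))
    rng.shuffle(indices)

    # The first part of the shuffled indices becomes the training set.
    n_train = int(percentage_training * len(indices))
    train_idx, test_idx = indices[:n_train], indices[n_train:]

    x_train = [data_x[i] for i in train_idx]
    y_train = [data_y[i] for i in train_idx]
    x_test = [data_x[i] for i in test_idx]
    y_test = [data_y[i] for i in test_idx]
    return x_train, y_train, x_test, y_test


# Toy data: 8 rows with two input variables and one binary label.
data_x = [[i, i * 2] for i in range(8)]
data_y = [i % 2 for i in range(8)]

x_train, y_train, x_test, y_test = holdout_sketch(data_x, data_y, 0.75, seed=42)
print(len(x_train), len(x_test))  # 6 2
```

With 8 rows and percentage_training=0.75, six rows end up in the training set and two in the test set, and each input row stays paired with its label.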
kfold(data_length, number_of_folds=10)
Provides the indices for k folds to use for training and testing the model.
Parameters
data_length – The total number of instances in the data set (number of rows).
number_of_folds – The number of folds the data should be split into (default = 10).
Returns
A list with k non-overlapping sublists, each containing the indices for one fold.
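As a rough illustration of how such fold indices can be produced and consumed, the sketch below builds k non-overlapping index lists and then uses each fold once as the test set. It is an assumption-laden example rather than pyFUME's implementation: the name `kfold_indices_sketch`, the `seed` parameter, and the decision to shuffle before partitioning are hypothetical.

```python
import random


def kfold_indices_sketch(data_length, number_of_folds=10, seed=None):
    """Illustrative k-fold index generator (hypothetical helper, not pyFUME's code)."""
    if not 1 <= number_of_folds <= data_length:
        raise ValueError("number_of_folds must be between 1 and data_length")

    # Shuffle the row indices (assumption: the documentation does not state
    # whether the indices are shuffled before they are divided into folds).
    rng = random.Random(seed)
    indices = list(range(data_length))
    rng.shuffle(indices)

    # Fold i takes every number_of_folds-th index starting at position i,
    # so the folds do not overlap and their sizes differ by at most one.
    return [indices[i::number_of_folds] for i in range(number_of_folds)]


folds = kfold_indices_sketch(data_length=10, number_of_folds=5, seed=0)

for k, test_idx in enumerate(folds):
    # All indices outside fold k form the training set for this round.
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    print(f"fold {k}: {len(train_idx)} training rows, {len(test_idx)} test rows")
```

Each index appears in exactly one fold, so every row is used for testing once and for training in the remaining k - 1 rounds.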