Sampler(train_x, train_y, number_of_bins=5, histogram=False)
Creates a new Sampler object that can oversample an unbalanced data set to make it more balanced.
train_x – The input data.
train_y – The output data (true labels/gold standard) on the basis of which the data will be sampled.
number_of_bins – Number of clusters that should be identified in the data (default = 5).
histogram – True/False flag that determines whether a histogram of the output data frequencies is plotted for both the old and the new (= sampled) situation (default = False). The package ‘matplotlib.pyplot’ is required for this functionality.
Creates a more balanced data set by oversampling underrepresented data instances (based on the values of the output variable) in the data set.
- Tuple containing (new_train_x, new_train_y)
new_train_x: The oversampled input data matrix.
new_train_y: The oversampled output data matrix.
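
The reference above does not spell out how the bins are formed or how instances are duplicated. As a rough illustration only, the following sketch assumes equal-width bins over train_y and random duplication (with replacement) of rows from the sparser bins until every occupied bin matches the largest one. The function name oversample_by_bins and the seed parameter are hypothetical and are not part of the Sampler API.

```python
import numpy as np


def oversample_by_bins(train_x, train_y, number_of_bins=5, seed=0):
    """Hypothetical helper, not the library's implementation.

    Assumes train_y is one-dimensional and number_of_bins >= 2.
    """
    train_x = np.asarray(train_x)
    train_y = np.asarray(train_y)
    rng = np.random.default_rng(seed)

    # Assumption: equal-width bins spanning the range of the output values.
    edges = np.linspace(train_y.min(), train_y.max(), number_of_bins + 1)
    # Using only the inner edges yields bin ids in 0..number_of_bins-1.
    bin_ids = np.digitize(train_y, edges[1:-1])

    counts = np.bincount(bin_ids, minlength=number_of_bins)
    target = counts.max()  # size of the best-represented bin

    selected = []
    for b in np.flatnonzero(counts):  # skip empty bins
        members = np.flatnonzero(bin_ids == b)
        # Keep every original row, then draw the shortfall with replacement.
        extra = rng.choice(members, size=target - counts[b], replace=True)
        selected.append(np.concatenate([members, extra]))

    idx = np.concatenate(selected)
    return train_x[idx], train_y[idx]


# Usage: six instances with output 0 and two with output 1.
y = np.array([0, 0, 0, 0, 0, 0, 1, 1], dtype=float)
x = (y * 10).reshape(-1, 1)
new_train_x, new_train_y = oversample_by_bins(x, y, number_of_bins=2)
```

Rows of the input and output data are selected with the same index array, so each duplicated input row keeps its matching output value. Whether the real Sampler duplicates rows, interpolates between them, or uses a different binning rule is not stated in this reference.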