Feature Selector
-
class pyfume.FeatureSelection.FeatureSelector(dataX, dataY, nr_clus, variable_names, model_order='first', performance_metric='MAE', verbose=True, **kwargs)
Bases: object
Creates a new feature selection object.
- Parameters
dataX – The input data.
dataY – The output data (true label/golden standard).
nr_clus – Number of clusters that should be identified in the data.
variable_names – Names of the variables.
**kwargs – Additional arguments to change settings of the fuzzy model.
-
fst_pso_feature_selection(max_iter=100, min_clusters=2, max_clusters=10, performance_metric='MAE', **kwargs)
Performs feature selection using the FST-PSO [1] variant of the Integer and Categorical PSO (ICPSO) proposed by Strasser and colleagues [2]. ICPSO hybridizes PSO and the Estimation of Distribution Algorithm (EDA), which makes it possible to convert a discrete problem into the (real-valued) problem of estimating the distribution vector of a probabilistic model. At each fitness evaluation, a random solution is generated according to the probability distribution encoded by the particle. Because the implementation is a variant of FST-PSO, the optimal settings for the PSO are set automatically.
If the number of clusters is set to None, this method simultaneously chooses the optimal number of clusters.
[1] Nobile, M. S., Cazzaniga, P., Besozzi, D., Colombo, R., Mauri, G., & Pasi, G. (2018). Fuzzy Self-Tuning PSO: A settings-free algorithm for global optimization. Swarm and Evolutionary Computation, 39, 70-85.
[2] Strasser, S., Goodman, R., Sheppard, J., & Butcher, S. (2016). A new discrete particle swarm optimization algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference 2016 (pp. 53-60).
- Parameters
max_iter – The maximum number of iterations used in the PSO (default = 100).
min_clusters – The minimum number of clusters to be identified in the data set (only used when nr_clusters = None).
max_clusters – The maximum number of clusters to be identified in the data set (only used when nr_clusters = None).
performance_metric – The performance metric on which each solution is evaluated (default: Mean Absolute Error).
**kwargs – Additional arguments to change settings of the fuzzy model.
- Returns
- Tuple containing (selected_features, selected_feature_names, optimal_number_clusters)
selected_features: The indices of the selected features.
selected_feature_names: The names of the selected features.
optimal_number_clusters: If initially nr_clusters = None, this argument encodes the optimal number of clusters in the data set. If nr_clusters is not None, the optimal_number_clusters is set to nr_clusters.
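The sampling idea described above (a particle encodes a probability distribution, and each fitness evaluation draws a random solution from it) can be sketched as follows. This is a minimal illustration, not pyFUME's implementation: it assumes a binary keep/drop choice per feature, uses a toy stand-in predictor instead of a fuzzy model, and omits the FST-PSO update rules entirely. The names sample_feature_mask and fitness are hypothetical.

```python
import random


def sample_feature_mask(probabilities, rng):
    """Draw one candidate solution: keep feature i with probability probabilities[i]."""
    return [rng.random() < p for p in probabilities]


def mean_absolute_error(y_true, y_pred):
    """MAE: the average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fitness(mask, data_x, data_y):
    """Toy fitness (assumption): predict y as the mean of the kept features.

    A real evaluation would train and score a fuzzy model on the kept features.
    """
    kept = [i for i, keep in enumerate(mask) if keep]
    if not kept:
        return float("inf")  # an empty feature set cannot predict anything
    preds = [sum(row[i] for i in kept) / len(kept) for row in data_x]
    return mean_absolute_error(data_y, preds)


rng = random.Random(0)
data_x = [[1.0, 9.0, 1.2], [2.0, 3.0, 1.8], [3.0, 7.0, 3.1], [4.0, 1.0, 4.0]]
data_y = [1.1, 1.9, 3.0, 4.0]

# One particle: per-feature inclusion probabilities (the "distribution vector").
particle = [0.9, 0.1, 0.8]
mask = sample_feature_mask(particle, rng)
score = fitness(mask, data_x, data_y)
selected_features = [i for i, keep in enumerate(mask) if keep]
print(selected_features, score)
```

Because the solution is sampled, two evaluations of the same particle can score differently; the swarm then adjusts the probabilities rather than the discrete choices themselves.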
-
log_wrapper(**kwargs)
Performs feature selection using the wrapper method while also checking whether .
- Parameters
**kwargs – Additional arguments to change settings of the fuzzy model.
- Returns
- Tuple containing (selected_features, selected_feature_names)
selected_features: The indices of the selected features.
selected_feature_names: The names of the selected features.
-
wrapper(**kwargs)
Performs feature selection using the wrapper method.
- Parameters
**kwargs – Additional arguments to change settings of the fuzzy model.
- Returns
- Tuple containing (selected_features, selected_feature_names)
selected_features: The indices of the selected features.
selected_feature_names: The names of the selected features.
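For readers unfamiliar with wrapper methods, the general idea is to score candidate feature subsets by actually fitting and evaluating a model on them. The sketch below shows one common form, greedy forward selection scored with MAE. It is an assumption-laden illustration rather than pyFUME's code: the search strategy, the stand-in predictor in evaluate_subset, and the function name forward_wrapper are all hypothetical. Only the shape of the returned tuple follows the documentation above.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: the average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_subset(features, data_x, data_y):
    """Stand-in model (assumption): predict y as the mean of the chosen features."""
    preds = [sum(row[i] for i in features) / len(features) for row in data_x]
    return mean_absolute_error(data_y, preds)


def forward_wrapper(data_x, data_y, variable_names):
    """Greedy forward selection: add the feature that lowers the error most,
    and stop as soon as no remaining feature improves the score."""
    remaining = list(range(len(variable_names)))
    selected = []
    best_score = float("inf")
    while remaining:
        # Score every one-feature extension of the current subset.
        scored = [(evaluate_subset(selected + [i], data_x, data_y), i)
                  for i in remaining]
        score, best_i = min(scored)
        if score >= best_score:
            break  # no improvement: keep the current subset
        selected.append(best_i)
        remaining.remove(best_i)
        best_score = score
    return selected, [variable_names[i] for i in selected]


data_x = [[1.0, 9.0, 1.2], [2.0, 3.0, 1.8], [3.0, 7.0, 3.1], [4.0, 1.0, 4.0]]
data_y = [1.1, 1.9, 3.0, 4.0]
selected_features, selected_feature_names = forward_wrapper(
    data_x, data_y, ["a", "noise", "b"]
)
print(selected_features, selected_feature_names)  # prints: [0, 2] ['a', 'b']
```

In this toy data the second column is unrelated to the target, so the search keeps the first and third features and stops when adding the noisy one would raise the error.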