Feature Selector

class pyfume.FeatureSelection.FeatureSelector(dataX, dataY, nr_clus, variable_names, model_order='first', performance_metric='MAE', verbose=True, **kwargs)

Bases: object

Creates a new feature selection object.

Parameters
  • dataX – The input data.

  • dataY – The output data (true labels/gold standard).

  • nr_clus – Number of clusters that should be identified in the data.

  • variable_names – Names of the input variables (features).

  • **kwargs – Additional arguments to change settings of the fuzzy model.

fst_pso_feature_selection(max_iter=100, min_clusters=2, max_clusters=10, performance_metric='MAE', **kwargs)

Perform feature selection using the FST-PSO [1] variant of the Integer and Categorical PSO (ICPSO) proposed by Strasser and colleagues [2]. ICPSO hybridizes PSO with an Estimation of Distribution Algorithm (EDA), which converts a discrete problem into the real-valued problem of estimating the distribution vector of a probabilistic model. At each fitness evaluation, a random solution is generated according to the probability distribution encoded by the particle. Because the implementation is a variant of FST-PSO, the PSO settings are set automatically.
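
The EDA step described above, in which a particle encodes a probability distribution rather than a concrete solution, can be illustrated with a minimal sketch. It assumes that each particle stores, for every feature, the probability of that feature being selected, and that each fitness evaluation draws one binary feature mask from those probabilities. The helper `sample_feature_mask` and the "at least one feature" rule are hypothetical and are not pyFUME's internal code.

```python
import random


def sample_feature_mask(probabilities, rng=None):
    """Sample a binary feature mask from per-feature selection probabilities.

    Hypothetical helper illustrating the ICPSO/EDA idea: the particle's
    position is a vector of probabilities, and each fitness evaluation draws
    one concrete (discrete) solution from it.
    """
    rng = rng or random.Random()
    # Feature i is selected with probability probabilities[i].
    mask = [1 if rng.random() < p else 0 for p in probabilities]
    # Assumption: an empty subset cannot be evaluated, so fall back to the
    # single most probable feature (lowest index on ties).
    if not any(mask):
        best = max(range(len(probabilities)), key=lambda i: probabilities[i])
        mask[best] = 1
    return mask


# A particle over four features: feature 0 is almost always selected,
# feature 3 almost never.
particle = [0.95, 0.5, 0.5, 0.05]
mask = sample_feature_mask(particle, random.Random(42))
print(mask, [i for i, bit in enumerate(mask) if bit])
```

Because the PSO moves the probability vector rather than the mask itself, the search space stays real-valued even though every evaluated solution is discrete.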

If the number of clusters is set to None, this method simultaneously chooses the optimal number of clusters, searching between min_clusters and max_clusters.
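
One way to picture the simultaneous search over the number of clusters, assuming the cluster count is encoded as one more categorical dimension of the particle, is a discrete distribution over the integers from min_clusters to max_clusters (2 to 10 by default) that is sampled at each fitness evaluation. The helper `sample_cluster_count` and this encoding are hypothetical illustrations, not pyFUME's code.

```python
import random


def sample_cluster_count(weights, min_clusters=2, max_clusters=10, rng=None):
    """Draw a number of clusters from a categorical distribution.

    weights[k] is the (unnormalized) probability of choosing
    min_clusters + k clusters; this encoding is an illustrative assumption.
    """
    candidates = list(range(min_clusters, max_clusters + 1))
    if len(weights) != len(candidates):
        raise ValueError("need one weight per candidate cluster count")
    rng = rng or random.Random()
    # Random.choices normalizes the weights and returns a list of k samples.
    return rng.choices(candidates, weights=weights, k=1)[0]


# Nine candidates (2..10); this particle strongly prefers 3 clusters.
weights = [0.05, 0.6, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]
print(sample_cluster_count(weights, rng=random.Random(0)))
```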

[1] Nobile, M. S., Cazzaniga, P., Besozzi, D., Colombo, R., Mauri, G., & Pasi, G. (2018). Fuzzy Self-Tuning PSO: A settings-free algorithm for global optimization. Swarm and Evolutionary Computation, 39, 70-85.

[2] Strasser, S., Goodman, R., Sheppard, J., & Butcher, S. (2016). A new discrete particle swarm optimization algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference 2016 (pp. 53-60).

Parameters
  • max_iter – The maximum number of iterations used in the PSO (default = 100).

  • min_clusters – The minimum number of clusters to be identified in the data set (only used when nr_clusters = None; default = 2).

  • max_clusters – The maximum number of clusters to be identified in the data set (only used when nr_clusters = None; default = 10).

  • performance_metric – The performance metric on which each solution is evaluated (default = Mean Absolute Error, 'MAE').

  • **kwargs – Additional arguments to change settings of the fuzzy model.
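
The default metric, the mean absolute error, averages the absolute differences between the true outputs (dataY) and the model's predictions, so lower values indicate a better candidate solution. A minimal sketch of the formula follows; the function name is illustrative and is not part of pyFUME's API.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean absolute error: the average of |y_true[i] - y_pred[i]|."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Absolute errors are 0.5, 0.0 and 1.5, so the MAE is 2.0 / 3.
print(mean_absolute_error([3.0, 1.0, 2.0], [2.5, 1.0, 3.5]))
```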

Returns

Tuple containing (selected_features, selected_feature_names, optimal_number_clusters)
  • selected_features: The indices of the selected features.

  • selected_feature_names: The names of the selected features.

  • optimal_number_clusters: If nr_clusters was initially None, this is the optimal number of clusters found in the data set; otherwise it is equal to nr_clusters.
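
To make the relationship between the three returned values concrete, the sketch below derives them from a binary feature mask and the variable_names passed to the constructor, assuming that selected_features indexes into that list. The helper `build_selection_result`, the mask representation and the example variable names are hypothetical.

```python
def build_selection_result(mask, variable_names, nr_clusters, best_nr_clusters):
    """Assemble (selected_features, selected_feature_names, optimal_number_clusters).

    Hypothetical helper: mask is a 0/1 list aligned with variable_names;
    best_nr_clusters is the cluster count found by the search and is only
    used when nr_clusters is None.
    """
    selected_features = [i for i, bit in enumerate(mask) if bit]
    selected_feature_names = [variable_names[i] for i in selected_features]
    optimal_number_clusters = best_nr_clusters if nr_clusters is None else nr_clusters
    return selected_features, selected_feature_names, optimal_number_clusters


names = ["age", "bmi", "glucose", "insulin"]
print(build_selection_result([1, 0, 1, 0], names, None, 4))
# ([0, 2], ['age', 'glucose'], 4)
print(build_selection_result([0, 1, 1, 0], names, 3, 5))
# ([1, 2], ['bmi', 'glucose'], 3)
```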

log_wrapper(**kwargs)

Performs feature selection using the wrapper method while also checking whether .

Parameters

**kwargs – Additional arguments to change settings of the fuzzy model.

Returns

Tuple containing (selected_features, selected_feature_names)
  • selected_features: The indices of the selected features.

  • selected_feature_names: The names of the selected features.

wrapper(**kwargs)

Performs feature selection using the wrapper method.

Parameters

**kwargs – Additional arguments to change settings of the fuzzy model.

Returns

Tuple containing (selected_features, selected_feature_names)
  • selected_features: The indices of the selected features.

  • selected_feature_names: The names of the selected features.
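
Wrapper methods score candidate feature subsets by training and evaluating the model itself on each subset. The search strategy used by wrapper is not described here; as a generic illustration only, the sketch below implements greedy sequential forward selection, where `evaluate` is a hypothetical user-supplied function that returns an error (lower is better) for a list of feature indices, and `toy_error` is made-up scoring for the demo.

```python
from typing import Callable, List, Sequence, Tuple


def forward_selection(
    n_features: int,
    evaluate: Callable[[List[int]], float],
    variable_names: Sequence[str],
) -> Tuple[List[int], List[str]]:
    """Greedy sequential forward selection (a generic wrapper strategy).

    Starting from the empty set, repeatedly add the single feature that
    lowers the error the most; stop when no addition improves it.
    """
    selected: List[int] = []
    best_error = float("inf")
    remaining = set(range(n_features))
    while remaining:
        # Score every one-feature extension of the current subset.
        scores = {f: evaluate(selected + [f]) for f in sorted(remaining)}
        best_feature = min(scores, key=scores.get)
        if scores[best_feature] >= best_error:
            break  # no candidate improves the error, so stop
        best_error = scores[best_feature]
        selected.append(best_feature)
        remaining.remove(best_feature)
    return selected, [variable_names[i] for i in selected]


def toy_error(subset: List[int]) -> float:
    # Hypothetical scoring: features 0 and 2 reduce the error,
    # and every feature adds a small complexity penalty.
    useful = {0: 0.4, 2: 0.3}
    return 1.0 - sum(useful.get(f, 0.0) for f in subset) + 0.05 * len(subset)


names = ["x0", "x1", "x2", "x3"]
print(forward_selection(4, toy_error, names))
# ([0, 2], ['x0', 'x2'])
```

Forward selection needs at most n(n+1)/2 model evaluations for n features, which is cheaper than exhaustive search but can miss subsets whose features are only useful together.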