fklearn.tuning package
Submodules
fklearn.tuning.model_agnostic_fc module
fklearn.tuning.model_agnostic_fc.correlation_feature_selection
Feature selection based on correlation
Parameters: - train_set (pd.DataFrame) – A pandas DataFrame with the training data
- features (list of str) – The list of features to consider when dropping by correlation
- threshold (float) – The correlation threshold. Features with correlation equal to or above this threshold are dropped
Returns: A log with the feature correlations, the features to drop, and the final features
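The thresholding idea described above can be sketched with plain pandas. This is a minimal illustration, not the fklearn implementation: the helper name `drop_correlated`, the use of absolute Pearson correlation, the greedy "keep the first, drop the later one" rule, and the log keys are all assumptions made for the example.

```python
import pandas as pd


def drop_correlated(train_set: pd.DataFrame, features: list, threshold: float = 1.0) -> dict:
    """Illustrative sketch only; not the fklearn implementation."""
    # Absolute pairwise Pearson correlation (taking the absolute value is an assumption)
    corr = train_set[features].corr().abs()
    kept, to_drop = [], []
    for col in features:
        # Drop a feature when it is too correlated with a feature already kept
        if any(corr.loc[col, other] >= threshold for other in kept):
            to_drop.append(col)
        else:
            kept.append(col)
    # The log keys below are hypothetical names chosen for this sketch
    return {"feature_corr": corr.to_dict(), "features_to_drop": to_drop, "final_features": kept}


df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]})
log = drop_correlated(df, ["a", "b", "c"], threshold=0.9)
print(log["features_to_drop"], log["final_features"])
```

Here `b` is an exact multiple of `a`, so it is the one flagged for removal, while `c` is only weakly correlated with `a` and is kept.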
fklearn.tuning.model_agnostic_fc.variance_feature_selection
Feature selection based on variance
Parameters: - train_set (pd.DataFrame) – A pandas DataFrame with the training data
- features (list of str) – The list of features to consider when dropping by variance
- threshold (float) – The variance threshold. Features with variance equal to or below this threshold are dropped
Returns: A log with the feature variances, the features to drop, and the final features
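A similar sketch for the variance rule, again with plain pandas and under stated assumptions: the helper name `drop_low_variance` and the log keys are hypothetical, and the sample variance (`ddof=1`, the pandas default) is used here, which may differ from what the library computes.

```python
import pandas as pd


def drop_low_variance(train_set: pd.DataFrame, features: list, threshold: float = 0.0) -> dict:
    """Illustrative sketch only; not the fklearn implementation."""
    # Sample variance per feature (pandas default, ddof=1)
    variances = train_set[features].var()
    # Features at or below the threshold are flagged for removal
    to_drop = [f for f in features if variances[f] <= threshold]
    final = [f for f in features if f not in to_drop]
    # The log keys below are hypothetical names chosen for this sketch
    return {"feature_var": variances.to_dict(), "features_to_drop": to_drop, "final_features": final}


df = pd.DataFrame({"a": [1, 2, 3, 4], "const": [5, 5, 5, 5]})
log = drop_low_variance(df, ["a", "const"], threshold=0.0)
print(log["features_to_drop"], log["final_features"])
```

With a threshold of 0, only the constant column is dropped.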
fklearn.tuning.parameter_tuners module
fklearn.tuning.samplers module
fklearn.tuning.samplers.remove_by_feature_importance
Performs feature selection based on feature importance
Parameters: - log (dict) – Dictionary of evaluations.
- num_removed_by_step (int (default 5)) – The number of features to remove
Returns: features – The remaining features after removing based on feature importance
Return type: list of str
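The core step, dropping the least important features in each round, might look like the sketch below. It works on a plain `{feature: importance}` mapping, which is an assumption made for the example: the real function takes a log dict, and the structure of that log is not described here. The helper name `drop_least_important` is hypothetical.

```python
def drop_least_important(importance: dict, num_removed_by_step: int = 5) -> list:
    """Illustrative sketch only: keep all but the least important features."""
    # Rank features from most to least important
    ranked = sorted(importance, key=importance.get, reverse=True)
    # Never try to keep a negative number of features
    n_keep = max(0, len(ranked) - num_removed_by_step)
    return ranked[:n_keep]


importance = {"a": 0.5, "b": 0.1, "c": 0.3, "d": 0.05}
remaining = drop_least_important(importance, num_removed_by_step=2)
print(remaining)
```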
fklearn.tuning.samplers.remove_by_feature_shuffling
Performs feature selection by comparing the evaluation on the test data against the evaluation on the same data with randomly shuffled features
Parameters: - log (LogType) – Dictionary of evaluations.
- predict_fn (function pandas.DataFrame -> pandas.DataFrame) – A partially defined predictor that takes a DataFrame and returns the predicted score for this dataframe
- eval_fn (function DataFrame -> log dict) – A partially defined evaluation function that takes a dataset with prediction and returns the evaluation logs.
- eval_data (pandas.DataFrame) – Data used to evaluate the model after shuffling
- extractor (function str -> float) – An extractor that takes a string and returns the value for that string in a dict
- metric_name (str) – Name of the metric column to be extracted
- max_removed_by_step (int (default 5)) – The maximum number of features to remove. Only the max_removed_by_step least important features are considered. If speed_up_by_importance=True, the least relevant features are filtered first and only those are shuffled. If speed_up_by_importance=False, all features are shuffled and the last max_removed_by_step in terms of permutation importance (PIMP) are dropped. In both cases, a feature is removed only if the drop in performance is at most the defined threshold.
- threshold (float (default 0.005)) – Threshold for model performance comparison
- speed_up_by_importance (bool (default True)) – Whether to narrow the search by feature importance before computing PIMP. If True, only the max_removed_by_step least important features are shuffled.
- parallel (bool (default False)) –
- nthread (int (default 1)) –
- seed (int (default 7)) – Random seed
Returns: features – The remaining features after removing based on feature importance
Return type: list of str
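The shuffling comparison described above can be illustrated with a small, self-contained sketch. It is not the fklearn implementation: the helper name `keep_after_shuffling` is hypothetical, `eval_fn` here returns a single float where higher is better (the real function works with evaluation logs plus an extractor), and the importance pre-filtering and parallel options are left out.

```python
import numpy as np
import pandas as pd


def keep_after_shuffling(eval_data, features, predict_fn, eval_fn, threshold=0.005, seed=7):
    """Illustrative sketch only: keep features whose shuffling hurts the metric."""
    rng = np.random.default_rng(seed)
    baseline = eval_fn(predict_fn(eval_data))
    removable = []
    for feature in features:
        shuffled = eval_data.copy()
        # Break the link between this feature and the target
        shuffled[feature] = rng.permutation(shuffled[feature].to_numpy())
        score = eval_fn(predict_fn(shuffled))
        # A drop in performance of at most `threshold` marks the feature as removable
        if baseline - score <= threshold:
            removable.append(feature)
    return [f for f in features if f not in removable]


# Toy setup: the "model" predicts the target directly from x and ignores noise
data = pd.DataFrame({
    "x": np.arange(10, dtype=float),
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0],
})
data["target"] = data["x"]


def predict_fn(df):
    return df.assign(prediction=df["x"])


def eval_fn(df):
    # Negative mean squared error, so that higher is better
    return -float(((df["prediction"] - df["target"]) ** 2).mean())


remaining = keep_after_shuffling(data, ["x", "noise"], predict_fn, eval_fn)
print(remaining)
```

Shuffling `noise` leaves the predictions untouched, so it is removable; shuffling `x` degrades the metric well beyond the threshold, so `x` is kept.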
fklearn.tuning.samplers.remove_features_subsets
Performs feature selection based on the best performing model out of several trained models
Parameters: - log_list (list of dict) – A list of log-like lists of evaluation dictionaries.
- extractor (function string -> float) – An extractor that takes a string and returns the value for that string in a dict
- metric_name (str) – Name of the metric column to be extracted
- num_removed_by_step (int (default 1)) – The number of features to remove
Returns: keys – The remaining keys of feature sets after choosing the current best subset
Return type: list of str
fklearn.tuning.selectors module
fklearn.tuning.stoppers module
fklearn.tuning.stoppers.aggregate_stop_funcs(*stop_funcs) → Callable[[List[List[Dict[str, Any]]]], bool]
Aggregate stop functions
Parameters: stop_funcs (list of function list of dict -> bool) –
Returns: l – Function that applies a logical OR over the results of all stop functions applied to the logs
Return type: function logs -> bool
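The OR aggregation can be sketched in a few lines of plain Python. The helper name `aggregate_stops` and the two toy stop functions are hypothetical and only mirror the documented signature; this is not the library's code.

```python
from typing import Any, Callable, Dict, List

Logs = List[List[Dict[str, Any]]]


def aggregate_stops(*stop_funcs: Callable[[Logs], bool]) -> Callable[[Logs], bool]:
    """Illustrative sketch only: stop as soon as any stop function says so."""
    def should_stop(logs: Logs) -> bool:
        return any(fn(logs) for fn in stop_funcs)
    return should_stop


# Two toy stop functions used only for this example
def never_stop(logs: Logs) -> bool:
    return False


def stop_after_three(logs: Logs) -> bool:
    return len(logs) >= 3


combined = aggregate_stops(never_stop, stop_after_three)
print(combined([[{}]]), combined([[{}], [{}], [{}]]))
```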
fklearn.tuning.stoppers.stop_by_iter_num
Checks the logs to see whether feature selection should stop
Parameters: - logs (list of list of dict) – A list of log-like lists of evaluation dictionaries.
- iter_limit (int (default 50)) – Maximum number of iterations
Returns: stop – Whether to stop the recursion
Return type: bool
fklearn.tuning.stoppers.stop_by_no_improvement
Checks the logs to see whether feature selection should stop
Parameters: - logs (list of list of dict) – A list of log-like lists of evaluation dictionaries.
- extractor (function str -> float) – An extractor that takes a string and returns the value for that string in a dict
- metric_name (str) – Name of the metric column to be extracted
- early_stop (int (default 3)) – Number of iterations without improvement before stopping
- threshold (float (default 0.001)) – Threshold for model performance comparison
Returns: stop – Whether to stop the recursion
Return type: bool
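One way to read the early_stop and threshold parameters is sketched below. It is a sketch under assumptions: the helper name `no_improvement` is hypothetical, and it operates on a plain list of metric values in chronological order where higher is better, whereas the real function extracts the metric from the logs and may order or compare them differently.

```python
def no_improvement(scores, early_stop=3, threshold=0.001):
    """Illustrative sketch only: stop when recent iterations bring no real gain."""
    # Not enough history to judge yet
    if len(scores) <= early_stop:
        return False
    best_before = max(scores[:-early_stop])
    best_recent = max(scores[-early_stop:])
    # Stop when the last `early_stop` iterations improved by at most `threshold`
    return best_recent - best_before <= threshold


plateau = [0.70, 0.75, 0.7502, 0.7501, 0.7503]
improving = [0.70, 0.75, 0.80, 0.85, 0.90]
print(no_improvement(plateau), no_improvement(improving))
```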
fklearn.tuning.stoppers.stop_by_no_improvement_parallel
Checks the logs to see whether feature selection should stop
Parameters: - logs (list of list of dict) – A list of log-like lists of evaluation dictionaries.
- extractor (function str -> float) – An extractor that takes a string and returns the value for that string in a dict
- metric_name (str) – Name of the metric column to be extracted
- early_stop (int (default 3)) – Number of iterations without improvement before stopping
- threshold (float (default 0.001)) – Threshold for model performance comparison
Returns: stop – Whether to stop the recursion
Return type: bool
fklearn.tuning.stoppers.stop_by_num_features
Checks the logs to see whether feature selection should stop
Parameters: - logs (list of list of dict) – A list of log-like lists of evaluation dictionaries.
- min_num_features (int (default 50)) – The minimum number of features the model can have before stopping
Returns: stop – Whether to stop the recursion
Return type: bool
fklearn.tuning.stoppers.stop_by_num_features_parallel
Selects the best log out of a list to see whether feature selection should stop
Parameters: - logs (list of list of list of dict) – A list of log-like lists of evaluation dictionaries.
- extractor (function str -> float) – An extractor that takes a string and returns the value for that string in a dict
- metric_name (str) – Name of the metric column to be extracted
- min_num_features (int (default 50)) – The minimum number of features the model can have before stopping
Returns: stop – Whether to stop the recursion
Return type: bool