fklearn.tuning package

Submodules

fklearn.tuning.model_agnostic_fc module

fklearn.tuning.model_agnostic_fc.correlation_feature_selection

Feature selection based on correlation

Parameters:
  • train_set (pd.DataFrame) – A pandas DataFrame with the training data
  • features (list of str) – The list of features to consider when dropping by correlation
  • threshold (float) – The correlation threshold. Features with correlation equal to or above this threshold are dropped
Returns:

log with feature correlation, features to drop and final features
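
The selection rule described above can be illustrated with a minimal, standard-library sketch. It is not fklearn's implementation: the helper names (pearson, correlation_selection_sketch), the dict-of-lists input, the choice to drop the later feature of each correlated pair, and the keys of the returned log are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_selection_sketch(columns, features, threshold):
    """Hypothetical helper: for every pair of features whose absolute
    correlation is equal to or above `threshold`, drop the later one."""
    corr = {}
    to_drop = []
    for a, b in combinations(features, 2):
        corr[(a, b)] = pearson(columns[a], columns[b])
        if abs(corr[(a, b)]) >= threshold and b not in to_drop:
            to_drop.append(b)
    final = [f for f in features if f not in to_drop]
    # Key names are illustrative; they mirror the description of the log.
    return {"feature_corr": corr, "features_to_drop": to_drop, "final_features": final}


columns = {
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with x1
    "x3": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with x1 and x2
}
log = correlation_selection_sketch(columns, ["x1", "x2", "x3"], threshold=0.9)
```

In this toy data only x2 reaches the threshold, so it is the only feature listed for dropping.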

fklearn.tuning.model_agnostic_fc.variance_feature_selection

Feature selection based on variance

Parameters:
  • train_set (pd.DataFrame) – A pandas DataFrame with the training data
  • features (list of str) – The list of features to consider when dropping by variance
  • threshold (float) – The variance threshold. Features with variance equal to or below this threshold are dropped
Returns:

log with feature variance, features to drop and final features
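
A minimal sketch of the variance rule, using only the standard library. It is an illustration rather than fklearn's code: the helper name, the dict-of-lists input, the use of population variance, and the keys of the returned log are assumptions.

```python
from statistics import pvariance


def variance_selection_sketch(columns, features, threshold):
    """Hypothetical helper: drop features whose variance is equal to or
    below `threshold` (population variance is an assumption here)."""
    variances = {f: pvariance(columns[f]) for f in features}
    to_drop = [f for f in features if variances[f] <= threshold]
    final = [f for f in features if f not in to_drop]
    # Key names are illustrative; they mirror the description of the log.
    return {"feature_var": variances, "features_to_drop": to_drop, "final_features": final}


columns = {
    "constant": [3.0, 3.0, 3.0, 3.0],  # zero variance
    "spread": [1.0, 2.0, 3.0, 4.0],    # population variance 1.25
}
var_log = variance_selection_sketch(columns, ["constant", "spread"], threshold=0.0)
```

With a threshold of 0.0 only constant columns are removed, because the comparison includes equality.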

fklearn.tuning.parameter_tuners module

fklearn.tuning.samplers module

fklearn.tuning.samplers.remove_by_feature_importance

Performs feature selection based on feature importance

Parameters:
  • log (dict) – Dictionary of evaluations.
  • num_removed_by_step (int (default 5)) – The number of features to remove
Returns:

features – The remaining features after removing based on feature importance

Return type:

list of str
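
The idea can be sketched as follows, assuming the importances have already been extracted from the log into a plain dict. The helper name and the input format are hypothetical; how fklearn reads importances out of the log is not shown here.

```python
def remove_least_important_sketch(importance, num_removed_by_step=5):
    """Hypothetical helper: rank features by importance (highest first)
    and cut the `num_removed_by_step` least important ones."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    keep = max(0, len(ranked) - num_removed_by_step)
    return ranked[:keep]


importance = {"age": 0.5, "income": 0.3, "zip": 0.15, "noise": 0.05}
remaining_by_importance = remove_least_important_sketch(importance, num_removed_by_step=2)
```

Here the two least important features, zip and noise, are removed and the rest are returned in order of importance.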

fklearn.tuning.samplers.remove_by_feature_shuffling

Performs feature selection by comparing the evaluation on the test set with the evaluation on the same test set after randomly shuffling features

Parameters:
  • log (LogType) – Dictionary of evaluations.
  • predict_fn (function pandas.DataFrame -> pandas.DataFrame) – A partially defined predictor that takes a DataFrame and returns the predicted score for this DataFrame
  • eval_fn (function DataFrame -> log dict) – A partially defined evaluation function that takes a dataset with predictions and returns the evaluation logs.
  • eval_data (pandas.DataFrame) – Data used to evaluate the model after shuffling
  • extractor (function str -> float) – An extractor that takes a string and returns the value of that string on a dict
  • metric_name (str) – Name of the metric column to be extracted
  • max_removed_by_step (int (default 5)) – The maximum number of features to remove. Only the max_removed_by_step least important features are considered. If speed_up_by_importance=True, it first filters the least relevant features and shuffles only those. If speed_up_by_importance=False, it shuffles all features and drops the last max_removed_by_step in terms of PIMP. In both cases, a feature is only removed if the drop in performance is at most the defined threshold.
  • threshold (float (default 0.005)) – Threshold for model performance comparison
  • speed_up_by_importance (bool (default True)) – Whether to narrow the search by looking at feature importance first, before computing PIMP importance. If True, only the max_removed_by_step least important features are shuffled.
  • parallel (bool (default False)) –
  • nthread (int (default 1)) –
  • seed (int (default 7)) – Random seed
Returns:

features – The remaining features after removing based on feature importance

Return type:

list of str
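
The shuffling procedure can be sketched with the standard library alone. This is a simplified illustration, not fklearn's implementation: it corresponds to the case where all features are shuffled (no pre-filtering by importance), it works on a list of row dicts instead of a DataFrame, and the helper name, the toy predict_fn and eval_fn, and the negative-MSE metric are assumptions.

```python
import random


def shuffle_selection_sketch(rows, features, predict_fn, eval_fn,
                             max_removed_by_step=5, threshold=0.005, seed=7):
    """Hypothetical helper: remove features whose shuffling barely hurts the metric."""
    rng = random.Random(seed)
    baseline = eval_fn(predict_fn(rows))
    drops = {}
    for feature in features:
        # Shuffle one column while leaving every other column untouched.
        values = [row[feature] for row in rows]
        rng.shuffle(values)
        shuffled = [{**row, feature: value} for row, value in zip(rows, values)]
        drops[feature] = baseline - eval_fn(predict_fn(shuffled))
    # Candidates are the features whose shuffling costs the least performance.
    candidates = sorted(features, key=drops.get)[:max_removed_by_step]
    removed = {f for f in candidates if drops[f] <= threshold}
    return [f for f in features if f not in removed]


# Toy data: the "model" below only reads `signal`, so `noise` is irrelevant.
rows = [{"signal": float(i), "noise": float(i % 3), "target": float(i)} for i in range(10)]


def predict_fn(data):
    # Toy predictor: the prediction is simply the value of `signal`.
    return [{**row, "prediction": row["signal"]} for row in data]


def eval_fn(data):
    # Negative mean squared error, so that a higher value is better.
    return -sum((row["prediction"] - row["target"]) ** 2 for row in data) / len(data)


remaining_after_shuffle = shuffle_selection_sketch(rows, ["signal", "noise"], predict_fn, eval_fn)
```

Shuffling noise leaves the metric unchanged, so it falls under the threshold and is removed, while shuffling signal degrades the metric and the feature is kept.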

fklearn.tuning.samplers.remove_features_subsets

Performs feature selection based on the best performing model out of several trained models

Parameters:
  • log_list (list of dict) – A list of log-like lists of evaluation dictionaries.
  • extractor (function string -> float) – An extractor that takes a string and returns the value of that string on a dict
  • metric_name (str) – Name of the metric column to be extracted
  • num_removed_by_step (int (default 1)) – The number of features to remove
Returns:

keys – The remaining keys of feature sets after choosing the current best subset

Return type:

list of str

fklearn.tuning.selectors module

fklearn.tuning.stoppers module

fklearn.tuning.stoppers.aggregate_stop_funcs(*stop_funcs) → Callable[[List[List[Dict[str, Any]]]], bool]

Aggregate stop functions

Parameters:
  • stop_funcs (list of function list of dict -> bool) –
Returns:

l – Function that performs the OR logic of all stop_fn applied to the logs

Return type:

function logs -> bool

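
The OR aggregation can be sketched in a few lines. The function below is a stand-in written for illustration, not fklearn's source, and the two stop functions passed to it are hypothetical.

```python
from typing import Any, Callable, Dict, List

Logs = List[List[Dict[str, Any]]]


def aggregate_stop_funcs_sketch(*stop_funcs: Callable[[Logs], bool]) -> Callable[[Logs], bool]:
    """Return a function that stops as soon as any stop function says so."""
    def combined(logs: Logs) -> bool:
        return any(stop_fn(logs) for stop_fn in stop_funcs)
    return combined


def too_many_iterations(logs: Logs) -> bool:
    # Hypothetical stop function: stop once three iterations were logged.
    return len(logs) >= 3


def never_stop(logs: Logs) -> bool:
    # Hypothetical stop function that never triggers.
    return False


should_stop = aggregate_stop_funcs_sketch(too_many_iterations, never_stop)
```

Because the results are combined with OR, a single stop function returning True is enough to end the selection loop.
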
fklearn.tuning.stoppers.stop_by_iter_num

Checks the logs to see if feature selection should stop

Parameters:
  • logs (list of list of dict) – A list of log-like lists of evaluation dictionaries.
  • iter_limit (int (default 50)) – Limit of iterations
Returns:

stop – Whether to stop the recursion

Return type:

bool

fklearn.tuning.stoppers.stop_by_no_improvement

Checks the logs to see if feature selection should stop

Parameters:
  • logs (list of list of dict) – A list of log-like lists of evaluation dictionaries.
  • extractor (function str -> float) – An extractor that takes a string and returns the value of that string on a dict
  • metric_name (str) – Name of the metric column to be extracted
  • early_stop (int (default 3)) – Number of iterations without improvement before stopping
  • threshold (float (default 0.001)) – Threshold for model performance comparison
Returns:

stop – Whether to stop the recursion

Return type:

bool
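
One plausible reading of this rule, sketched on plain metric values instead of logs. The helper name, the oldest-first ordering of the scores, and the exact comparison are assumptions; fklearn's own handling of the log structure and ordering may differ.

```python
def no_improvement_sketch(scores, early_stop=3, threshold=0.001):
    """Hypothetical helper. `scores` holds one metric value per iteration,
    oldest first. Stop when none of the last `early_stop` iterations beats
    the score that preceded them by more than `threshold`."""
    if len(scores) <= early_stop:
        return False  # not enough history to decide
    reference = scores[-(early_stop + 1)]
    recent = scores[-early_stop:]
    return all(score - reference <= threshold for score in recent)


plateau = no_improvement_sketch([0.70, 0.75, 0.7505, 0.7502, 0.7508])
improving = no_improvement_sketch([0.70, 0.72, 0.74, 0.76])
```

In the first call the last three scores stay within 0.001 of the earlier 0.75, so the sketch signals a stop; in the second call every recent score improves clearly, so it does not.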

fklearn.tuning.stoppers.stop_by_no_improvement_parallel

Checks the logs to see if feature selection should stop

Parameters:
  • logs (list of list of dict) – A list of log-like lists of evaluation dictionaries.
  • extractor (function str -> float) – An extractor that takes a string and returns the value of that string on a dict
  • metric_name (str) – Name of the metric column to be extracted
  • early_stop (int (default 3)) – Number of iterations without improvement before stopping
  • threshold (float (default 0.001)) – Threshold for model performance comparison
Returns:

stop – Whether to stop the recursion

Return type:

bool

fklearn.tuning.stoppers.stop_by_num_features

Checks the logs to see if feature selection should stop

Parameters:
  • logs (list of list of dict) – A list of log-like lists of evaluation dictionaries.
  • min_num_features (int (default 50)) – The minimum number of features the model can have before stopping
Returns:

stop – Whether to stop the recursion

Return type:

bool

fklearn.tuning.stoppers.stop_by_num_features_parallel

Selects the best log out of a list to see if feature selection should stop

Parameters:
  • logs (list of list of list of dict) – A list of log-like lists of evaluation dictionaries.
  • extractor (function str -> float) – An extractor that takes a string and returns the value of that string on a dict
  • metric_name (str) – Name of the metric column to be extracted
  • min_num_features (int (default 50)) – The minimum number of features the model can have before stopping
Returns:

stop – Whether to stop the recursion

Return type:

bool

fklearn.tuning.utils module

fklearn.tuning.utils.gen_dict_extract(key: str, obj: Dict) → Generator[Any, None, None]
fklearn.tuning.utils.gen_key_avgs_from_dicts(obj: List) → Dict[str, float]
fklearn.tuning.utils.gen_key_avgs_from_iteration(key: str, log: Dict) → Any
fklearn.tuning.utils.gen_key_avgs_from_logs(key: str, logs: List[Dict]) → Dict[str, float]
fklearn.tuning.utils.gen_validator_log
fklearn.tuning.utils.get_avg_metric_from_extractor
fklearn.tuning.utils.get_best_performing_log(log_list: List[Dict[str, Any]], extractor: Callable[[str], float], metric_name: str) → Dict
fklearn.tuning.utils.get_used_features(log: Dict) → List[str]
fklearn.tuning.utils.order_feature_importance_avg_from_logs(log: Dict) → List[str]
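
The signatures above come without descriptions. Judging only by its name and type hints, gen_dict_extract appears to yield the values found under a key in a nested log. The sketch below shows that pattern under this assumption; the function body and the layout of the example log are hypothetical and not taken from fklearn.

```python
from typing import Any, Dict, Generator


def gen_dict_extract_sketch(key: str, obj: Dict) -> Generator[Any, None, None]:
    """Yield every value stored under `key`, searching nested dicts and lists."""
    for k, v in obj.items():
        if k == key:
            yield v
        if isinstance(v, dict):
            yield from gen_dict_extract_sketch(key, v)
        elif isinstance(v, list):
            for item in v:
                if isinstance(item, dict):
                    yield from gen_dict_extract_sketch(key, item)


# Hypothetical nested log with one evaluation per fold.
nested_log = {
    "validator_log": [
        {"fold_num": 0, "eval_results": [{"auc": 0.8}]},
        {"fold_num": 1, "eval_results": [{"auc": 0.9}]},
    ]
}
aucs = list(gen_dict_extract_sketch("auc", nested_log))
```

Averaging the extracted values would give a per-key mean across folds, which is what the gen_key_avgs_* names suggest, although their exact behavior is not documented here.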

Module contents