fklearn.tuning package

Submodules

fklearn.tuning.model_agnostic_fc module

fklearn.tuning.model_agnostic_fc.correlation_feature_selection

Feature selection based on correlation.

Parameters:
    train_set (pd.DataFrame) – A pandas DataFrame with the training data.
    features (list of str) – The list of features to consider when dropping by correlation.
    threshold (float) – The correlation threshold. Features with correlation equal to or above this threshold are dropped.
Returns:	A log with the feature correlation, the features to drop, and the final features.
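The thresholding rule described above can be illustrated with a small plain-Python sketch. This is not fklearn's implementation: the function names, the dict-of-lists input (standing in for a DataFrame), the log keys, and the choice to drop the later feature of each correlated pair are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def select_by_correlation(columns, features, threshold):
    """Hypothetical helper: drop a feature when its absolute correlation
    with any earlier feature is equal to or above the threshold."""
    to_drop = []
    for j, later in enumerate(features):
        for earlier in features[:j]:
            if abs(pearson(columns[earlier], columns[later])) >= threshold:
                to_drop.append(later)
                break
    final = [f for f in features if f not in to_drop]
    # The key names below are illustrative, not fklearn's log schema.
    return {"features_to_drop": to_drop, "final_features": final}


columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with "a"
    "c": [1.0, -1.0, 1.0, -1.0],
}
result = select_by_correlation(columns, ["a", "b", "c"], threshold=0.9)
```

Here "b" is dropped because it is a scaled copy of "a", while "c" stays because its correlation with both is weak.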

fklearn.tuning.model_agnostic_fc.variance_feature_selection

Feature selection based on variance.

Parameters:
    train_set (pd.DataFrame) – A pandas DataFrame with the training data.
    features (list of str) – The list of features to consider when dropping by variance.
    threshold (float) – The variance threshold. Features with variance equal to or below this threshold are dropped.
Returns:	A log with the feature variance, the features to drop, and the final features.
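As a rough illustration of the variance rule, the sketch below uses the standard library only. The function name, the dict-of-lists input, the log keys, the default threshold of 0.0, and the use of the sample variance are assumptions, not details taken from fklearn.

```python
from statistics import variance


def select_by_variance(columns, features, threshold=0.0):
    """Hypothetical helper: drop features whose sample variance is
    equal to or below the threshold."""
    feature_var = {f: variance(columns[f]) for f in features}
    to_drop = [f for f in features if feature_var[f] <= threshold]
    final = [f for f in features if f not in to_drop]
    # The key names below are illustrative, not fklearn's log schema.
    return {
        "feature_var": feature_var,
        "features_to_drop": to_drop,
        "final_features": final,
    }


columns = {
    "constant": [3.0, 3.0, 3.0, 3.0],  # zero variance
    "varied": [1.0, 2.0, 3.0, 4.0],
}
result = select_by_variance(columns, ["constant", "varied"])
```

With the threshold at 0.0, only columns that never change are removed.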

fklearn.tuning.parameter_tuners module

fklearn.tuning.samplers module

fklearn.tuning.samplers.remove_by_feature_importance

Performs feature selection based on feature importance.

Parameters:
    log (dict) – A log dictionary with the evaluations.
    num_removed_by_step (int (default 5)) – The number of features to remove.
Returns:	features – The features that remain after removal based on feature importance.
Return type:	list of str
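A minimal sketch of the idea, assuming the feature importances have already been pulled out of the log into a plain dict (the function name and input shape are hypothetical, not fklearn's API):

```python
def drop_least_important(importance, num_removed_by_step=5):
    """Hypothetical helper: keep every feature except the
    num_removed_by_step least important ones."""
    # Rank features from most to least important.
    ranked = sorted(importance, key=importance.get, reverse=True)
    keep = max(len(ranked) - num_removed_by_step, 0)
    return ranked[:keep]


importance = {"age": 0.40, "income": 0.35, "zip": 0.05, "id": 0.01}
remaining = drop_least_important(importance, num_removed_by_step=2)
```

Computing `keep` explicitly avoids the slicing pitfall where `ranked[:-0]` would return an empty list.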

fklearn.tuning.samplers.remove_by_feature_shuffling

Performs feature selection by comparing the evaluation on the test set with the evaluation on the same test set after randomly shuffling features.

Parameters:
    log (LogType) – A log dictionary with the evaluations.
    predict_fn (function pandas.DataFrame -> pandas.DataFrame) – A partially defined predictor that takes a DataFrame and returns the predicted score for that DataFrame.
    eval_fn (function DataFrame -> log dict) – A partially defined evaluation function that takes a dataset with predictions and returns the evaluation logs.
    eval_data (pandas.DataFrame) – The data used to evaluate the model after shuffling.
    extractor (function str -> float) – An extractor that takes a string and returns the value stored under that string in a dict.
    metric_name (str) – The name of the metric column to be extracted.
    max_removed_by_step (int (default 5)) – The maximum number of features to remove. Only the max_removed_by_step least important features are considered. If speed_up_by_importance=True, the least relevant features are filtered first and only those are shuffled. If speed_up_by_importance=False, all features are shuffled and the last max_removed_by_step in terms of PIMP are dropped. In both cases, a feature is removed only if the drop in performance is at most the defined threshold.
    threshold (float (default 0.005)) – The threshold for model performance comparison.
    speed_up_by_importance (bool (default True)) – Whether to narrow the search by looking at feature importance before computing PIMP importance. If True, only the max_removed_by_step least important features are shuffled.
    parallel (bool (default False))
    nthread (int (default 1))
    seed (int (default 7)) – The random seed.
Returns:	features – The features that remain after removal based on feature importance.
Return type:	list of str

fklearn.tuning.samplers.remove_features_subsets

Performs feature selection based on the best-performing model out of several trained models.

Parameters:
    log_list (list of dict) – A list of log-like lists of evaluation dictionaries.
    extractor (function string -> float) – An extractor that takes a string and returns the value stored under that string in a dict.
    metric_name (str) – The name of the metric column to be extracted.
    num_removed_by_step (int (default 1)) – The number of features to remove.
Returns:	keys – The remaining keys of feature sets after choosing the current best subset.
Return type:	list of str

fklearn.tuning.selectors module

fklearn.tuning.stoppers module

fklearn.tuning.stoppers.aggregate_stop_funcs(*stop_funcs) → Callable[[List[List[Dict[str, Any]]]], bool]

Aggregates stop functions.

Parameters:
    stop_funcs (list of functions, each list of dict -> bool)
Returns:	l – A function that returns the logical OR of all the stop functions applied to the logs.
Return type:	function logs -> bool
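The OR-composition described above can be sketched in a few lines. The names `or_stop_funcs`, `too_many_iterations`, and `never_stop` are hypothetical; this shows the pattern, not fklearn's source.

```python
from typing import Any, Callable, Dict, List

Logs = List[List[Dict[str, Any]]]
StopFn = Callable[[Logs], bool]


def or_stop_funcs(*stop_funcs: StopFn) -> StopFn:
    """Hypothetical helper: build one stop function that fires as soon
    as any of the given stop functions fires."""
    def combined(logs: Logs) -> bool:
        return any(stop_fn(logs) for stop_fn in stop_funcs)
    return combined


def too_many_iterations(logs: Logs) -> bool:
    return len(logs) >= 3


def never_stop(logs: Logs) -> bool:
    return False


should_stop = or_stop_funcs(too_many_iterations, never_stop)
```

Because `any` short-circuits, the stop functions after the first one that returns True are not evaluated.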

fklearn.tuning.stoppers.stop_by_iter_num

Checks the logs to see if feature selection should stop.

Parameters:
    logs (list of list of dict) – A list of log-like lists of evaluation dictionaries.
    iter_limit (int (default 50)) – The limit on the number of iterations.
Returns:	stop – Whether to stop the recursion.
Return type:	bool
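A minimal sketch of an iteration-count stopper, assuming that `logs` holds exactly one entry per completed iteration (that assumption and the function name are not taken from the source):

```python
def reached_iteration_limit(logs, iter_limit=50):
    """Hypothetical helper: stop once the number of logged iterations
    reaches iter_limit."""
    return len(logs) >= iter_limit


logs = [[{"iteration": i}] for i in range(3)]  # three illustrative iterations
```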

fklearn.tuning.stoppers.stop_by_no_improvement

Checks the logs to see if feature selection should stop.

Parameters:
    logs (list of list of dict) – A list of log-like lists of evaluation dictionaries.
    extractor (function str -> float) – An extractor that takes a string and returns the value stored under that string in a dict.
    metric_name (str) – The name of the metric column to be extracted.
    early_stop (int (default 3)) – The number of iterations without improvement before stopping.
    threshold (float (default 0.001)) – The threshold for model performance comparison.
Returns:	stop – Whether to stop the recursion.
Return type:	bool
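One common way to read "no improvement for early_stop iterations" is sketched below. It is an interpretation, not fklearn's exact rule: it assumes the metric has already been extracted into a list of scores ordered from oldest to newest, and that higher scores are better. The function name is hypothetical.

```python
def no_recent_improvement(scores, early_stop=3, threshold=0.001):
    """Hypothetical helper: stop when the best of the last early_stop
    scores beats the best earlier score by at most threshold."""
    if len(scores) <= early_stop:
        return False  # not enough history to compare against
    best_before = max(scores[:-early_stop])
    best_recent = max(scores[-early_stop:])
    return best_recent - best_before <= threshold


plateau = [0.70, 0.75, 0.75, 0.7505, 0.75]
improving = [0.70, 0.72, 0.74, 0.76, 0.78]
```

In this sketch `plateau` triggers a stop, since the recent gain is below the threshold, while `improving` does not.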

fklearn.tuning.stoppers.stop_by_no_improvement_parallel

Checks the logs to see if feature selection should stop.

Parameters:
    logs (list of list of dict) – A list of log-like lists of evaluation dictionaries.
    extractor (function str -> float) – An extractor that takes a string and returns the value stored under that string in a dict.
    metric_name (str) – The name of the metric column to be extracted.
    early_stop (int (default 3)) – The number of iterations without improvement before stopping.
    threshold (float (default 0.001)) – The threshold for model performance comparison.
Returns:	stop – Whether to stop the recursion.
Return type:	bool

fklearn.tuning.stoppers.stop_by_num_features

Checks the logs to see if feature selection should stop.

Parameters:
    logs (list of list of dict) – A list of log-like lists of evaluation dictionaries.
    min_num_features (int (default 50)) – The minimum number of features the model can have before stopping.
Returns:	stop – Whether to stop the recursion.
Return type:	bool

fklearn.tuning.stoppers.stop_by_num_features_parallel

Selects the best log out of a list to see if feature selection should stop.

Parameters:
    logs (list of list of list of dict) – A list of log-like lists of evaluation dictionaries.
    extractor (function str -> float) – An extractor that takes a string and returns the value stored under that string in a dict.
    metric_name (str) – The name of the metric column to be extracted.
    min_num_features (int (default 50)) – The minimum number of features the model can have before stopping.
Returns:	stop – Whether to stop the recursion.
Return type:	bool

fklearn.tuning.utils module

fklearn.tuning.utils.gen_dict_extract(key: str, obj: Dict) → Generator[Any, None, None]
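The signature above carries no description, but it suggests a generator that yields every value stored under `key` in a nested structure. The sketch below shows that pattern under this assumption; the function name `find_values` and the log layout are hypothetical, and fklearn's actual traversal may differ.

```python
from typing import Any, Generator


def find_values(key: str, obj: Any) -> Generator[Any, None, None]:
    """Hypothetical helper: yield every value stored under `key` in a
    structure made of nested dicts and lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Keep descending: the key may also appear deeper down.
            yield from find_values(key, v)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(key, item)


# Illustrative log layout, not fklearn's schema.
log = {
    "validator_log": [
        {"fold_num": 0, "eval_results": [{"auc": 0.8}]},
        {"fold_num": 1, "eval_results": [{"auc": 0.9}]},
    ]
}
aucs = list(find_values("auc", log))
```

A generator keeps memory use flat on large logs, since values are produced one at a time as the structure is walked.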

fklearn.tuning.utils.gen_key_avgs_from_dicts(obj: List) → Dict[str, float]

fklearn.tuning.utils.gen_key_avgs_from_iteration(key: str, log: Dict) → Any

fklearn.tuning.utils.gen_key_avgs_from_logs(key: str, logs: List[Dict]) → Dict[str, float]

fklearn.tuning.utils.gen_validator_log

fklearn.tuning.utils.get_avg_metric_from_extractor

fklearn.tuning.utils.get_best_performing_log(log_list: List[Dict[str, Any]], extractor: Callable[[str], float], metric_name: str) → Dict

fklearn.tuning.utils.get_used_features(log: Dict) → List[str]

fklearn.tuning.utils.order_feature_importance_avg_from_logs(log: Dict) → List[str]


Module contents