fklearn.causal.validation package

Submodules

fklearn.causal.validation.auc module

fklearn.causal.validation.auc.area_under_the_cumulative_effect_curve[source]

Orders the dataset by prediction and computes the area under the cumulative effect curve, according to that ordering.

Parameters:
  • df (Pandas' DataFrame) – A Pandas’ DataFrame with target and prediction scores.
  • treatment (str) – The name of the treatment column in df.
  • outcome (str) – The name of the outcome column in df.
  • prediction (str) – The name of the prediction column in df.
  • min_rows (int) – Minimum number of observations needed to have a valid result.
  • steps (int) – The number of cumulative steps to iterate when accumulating the effect.
  • effect_fn (function (df: pandas.DataFrame, treatment: str, outcome: str) -> int or Array of int) – A function that computes the treatment effect given a dataframe, the name of the treatment column and the name of the outcome column.
Returns:

area_under_the_cumulative_effect_curve – The area under the cumulative effect curve according to the predictions ordering.

Return type:

float

fklearn.causal.validation.auc.area_under_the_cumulative_gain_curve[source]

Orders the dataset by prediction and computes the area under the cumulative gain curve, according to that ordering.

Parameters:
  • df (Pandas' DataFrame) – A Pandas’ DataFrame with target and prediction scores.
  • treatment (str) – The name of the treatment column in df.
  • outcome (str) – The name of the outcome column in df.
  • prediction (str) – The name of the prediction column in df.
  • min_rows (int) – Minimum number of observations needed to have a valid result.
  • steps (int) – The number of cumulative steps to iterate when accumulating the effect.
  • effect_fn (function (df: pandas.DataFrame, treatment: str, outcome: str) -> int or Array of int) – A function that computes the treatment effect given a dataframe, the name of the treatment column and the name of the outcome column.
Returns:

area_under_the_cumulative_gain_curve – The area under the cumulative gain curve according to the predictions ordering.

Return type:

float
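
As a rough illustration of how a curve can be reduced to a single number, the sketch below sums a precomputed gain curve with a rectangle rule. The function name `area_under_curve_sketch`, its `cutoffs` argument and the rectangle rule itself are assumptions made for illustration; the formula fklearn uses internally may differ.

```python
def area_under_curve_sketch(cutoffs, values, size):
    """Rectangle-rule area under a curve sampled at cumulative row counts.

    cutoffs: increasing row counts at which the curve was evaluated.
    values:  curve value at each cutoff (same length as cutoffs).
    size:    total number of rows in the dataset.
    """
    area = 0.0
    previous = 0
    for rows, value in zip(cutoffs, values):
        # Width of this step, expressed as a fraction of the whole dataset.
        area += value * (rows - previous) / size
        previous = rows
    return area


# Hypothetical gain curve evaluated after 2, 4, 6 and 8 of 8 rows.
example_area = area_under_curve_sketch([2, 4, 6, 8], [1.0, 1.5, 1.75, 1.75], size=8)
```

Because every step covers a quarter of the rows here, the result is simply the mean of the four curve values.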

fklearn.causal.validation.auc.area_under_the_relative_cumulative_gain_curve[source]

Orders the dataset by prediction and computes the area under the relative cumulative gain curve, according to that ordering.

Parameters:
  • df (Pandas' DataFrame) – A Pandas’ DataFrame with target and prediction scores.
  • treatment (str) – The name of the treatment column in df.
  • outcome (str) – The name of the outcome column in df.
  • prediction (str) – The name of the prediction column in df.
  • min_rows (int) – Minimum number of observations needed to have a valid result.
  • steps (int) – The number of cumulative steps to iterate when accumulating the effect.
  • effect_fn (function (df: pandas.DataFrame, treatment: str, outcome: str) -> int or Array of int) – A function that computes the treatment effect given a dataframe, the name of the treatment column and the name of the outcome column.
Returns:

area under the relative cumulative gain curve – The area under the relative cumulative gain curve according to the predictions ordering.

Return type:

float

fklearn.causal.validation.cate module

fklearn.causal.validation.cate.cate_mean_by_bin(test_data: pandas.core.frame.DataFrame, group_column: str, control_group_name: str, bin_column: str, n_bins: int, allow_dropped_bins: bool, prediction_column: str, target_column: str) → pandas.core.frame.DataFrame[source]

Computes a dataframe with predicted and actual CATEs by bins of a given column.

This is primarily an auxiliary function, but can be used to visualize the CATEs.

Parameters:
  • test_data (DataFrame) – A Pandas’ DataFrame with group_column as a column.
  • group_column (str) – The name of the column that tells whether rows belong to the test or control group.
  • control_group_name (str) – The name of the control group.
  • bin_column (str) – The name of the column from which the quantiles will be created.
  • n_bins (int) – The number of bins to be created.
  • allow_dropped_bins (bool) – Whether to allow the function to drop duplicated quantiles.
  • prediction_column (str) – The name of the column containing the predictions from the model being evaluated.
  • target_column (str) – The name of the column containing the actual outcomes of the treatment.
Returns:

gb – The grouped dataframe with actual and predicted CATEs by bin.

Return type:

DataFrame
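
The binning idea can be sketched in plain Python. This is not the fklearn implementation: the function name `cate_by_bin_sketch`, the row layout, the equal-count binning (standing in for quantile bins) and the reading of "predicted CATE" as the treated-minus-control difference of mean predictions are all assumptions made for illustration.

```python
def _mean(rows, key):
    return sum(r[key] for r in rows) / len(rows)


def cate_by_bin_sketch(rows, control_name, n_bins):
    """rows: list of dicts with keys 'group', 'bin_value', 'prediction', 'target'."""
    ordered = sorted(rows, key=lambda r: r["bin_value"])
    chunk = len(ordered) // n_bins
    result = []
    for i in range(n_bins):
        # Equal-count bins; the last bin absorbs any remainder rows.
        end = (i + 1) * chunk if i < n_bins - 1 else len(ordered)
        bin_rows = ordered[i * chunk:end]
        control = [r for r in bin_rows if r["group"] == control_name]
        treated = [r for r in bin_rows if r["group"] != control_name]
        result.append({
            "bin": i,
            # Assumed definition: treated mean minus control mean within the bin.
            "predicted_cate": _mean(treated, "prediction") - _mean(control, "prediction"),
            "actual_cate": _mean(treated, "target") - _mean(control, "target"),
        })
    return result


# Hypothetical data: (group, bin_value, prediction, target).
rows = [
    {"group": g, "bin_value": b, "prediction": p, "target": t}
    for g, b, p, t in [
        ("control", 1, 1.0, 1.0), ("treated", 2, 2.0, 2.5),
        ("control", 3, 1.0, 1.5), ("treated", 4, 2.0, 2.0),
        ("control", 5, 2.0, 2.0), ("treated", 6, 5.0, 4.0),
        ("control", 7, 2.0, 1.0), ("treated", 8, 5.0, 5.0),
    ]
]
binned = cate_by_bin_sketch(rows, control_name="control", n_bins=2)
```

The sketch assumes every bin contains rows from both groups; a bin with only one group would raise a division-by-zero error.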

fklearn.causal.validation.cate.cate_mean_by_bin_meta_evaluator[source]

Evaluates the predictions of a causal model that outputs treatment outcomes w.r.t. its capabilities to predict the CATE.

Due to the fundamental lack of counterfactual data, the CATEs are computed for bins of a given column. This function then applies a fklearn-like evaluator on top of the aggregated dataframe.

Parameters:
  • test_data (DataFrame) – A Pandas’ DataFrame with group_column as a column.
  • group_column (str) – The name of the column that tells whether rows belong to the test or control group.
  • control_group_name (str) – The name of the control group.
  • bin_column (str) – The name of the column from which the quantiles will be created.
  • n_bins (int) – The number of bins to be created.
  • allow_dropped_bins (bool, optional (default=False)) – Whether to allow the function to drop duplicated quantiles.
  • inner_evaluator (UncurriedEvalFnType, optional (default=r2_evaluator)) – An instance of a fklearn-like evaluator, which will be applied to the aggregated dataframe.
  • eval_name (str, optional (default=None)) – The name of the evaluator as it will appear in the logs.
  • prediction_column (str, optional (default=None)) – The name of the column containing the predictions from the model being evaluated.
  • target_column (str, optional (default=None)) – The name of the column containing the actual outcomes of the treatment.
Returns:

log – A log-like dictionary with the evaluation by inner_evaluator.

Return type:

dict
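
To make the "evaluator on top of the aggregated dataframe" step concrete, here is a minimal sketch that scores per-bin predicted CATEs against actual CATEs with R² and wraps the result in a log-like dictionary. The names `r2_sketch` and `meta_evaluator_sketch`, the input layout and the dictionary key are hypothetical; the keys fklearn actually emits may differ.

```python
def r2_sketch(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def meta_evaluator_sketch(binned, eval_name="cate_r2_sketch"):
    """binned: list of dicts with 'predicted_cate' and 'actual_cate' per bin."""
    actual = [b["actual_cate"] for b in binned]
    predicted = [b["predicted_cate"] for b in binned]
    # Log-like dictionary: evaluator name mapped to its score.
    return {eval_name: r2_sketch(actual, predicted)}


# Hypothetical per-bin aggregates.
binned = [
    {"predicted_cate": 1.5, "actual_cate": 1.0},
    {"predicted_cate": 2.5, "actual_cate": 3.0},
]
log = meta_evaluator_sketch(binned)
```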

fklearn.causal.validation.curves module

fklearn.causal.validation.curves.cumulative_effect_curve[source]

Orders the dataset by prediction and computes the cumulative effect curve according to that ordering.

Parameters:
  • df (Pandas' DataFrame) – A Pandas’ DataFrame with target and prediction scores.
  • treatment (str) – The name of the treatment column in df.
  • outcome (str) – The name of the outcome column in df.
  • prediction (str) – The name of the prediction column in df.
  • min_rows (int) – Minimum number of observations needed to have a valid result.
  • steps (int) – The number of cumulative steps to iterate when accumulating the effect.
  • effect_fn (function (df: pandas.DataFrame, treatment: str, outcome: str) -> int or Array of int) – A function that computes the treatment effect given a dataframe, the name of the treatment column and the name of the outcome column.
Returns:

cumulative effect curve – The cumulative treatment effect according to the predictions ordering.

Return type:

Numpy’s Array
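
The ordering-then-accumulating idea can be sketched without pandas. This is an illustration under stated assumptions, not the library's code: `linear_effect` (regression slope of outcome on treatment), `cumulative_effect_sketch`, the descending sort and the way cutoffs are derived from `min_rows` and `steps` are all choices made for this example.

```python
def linear_effect(rows):
    """Slope of outcome on treatment: cov(treatment, outcome) / var(treatment)."""
    n = len(rows)
    mean_t = sum(r["treatment"] for r in rows) / n
    mean_y = sum(r["outcome"] for r in rows) / n
    cov = sum((r["treatment"] - mean_t) * (r["outcome"] - mean_y) for r in rows)
    var = sum((r["treatment"] - mean_t) ** 2 for r in rows)
    return cov / var


def cumulative_effect_sketch(records, min_rows, steps, effect_fn=linear_effect):
    # Highest predictions first (assumed ordering).
    ordered = sorted(records, key=lambda r: r["prediction"], reverse=True)
    size = len(ordered)
    # Row counts at which the effect is re-estimated; always ends at the full set.
    cutoffs = list(range(min_rows, size, max(1, size // steps))) + [size]
    return [effect_fn(ordered[:rows]) for rows in cutoffs]


# Hypothetical data: (prediction, treatment, outcome).
records = [
    {"prediction": p, "treatment": t, "outcome": y}
    for p, t, y in [(0.9, 1, 5), (0.8, 0, 1), (0.7, 1, 4), (0.6, 0, 2),
                    (0.5, 1, 3), (0.4, 0, 2), (0.3, 1, 2), (0.2, 0, 2)]
]
curve = cumulative_effect_sketch(records, min_rows=2, steps=3)
```

With this data the estimated effect shrinks as lower-scored rows are added, which is the pattern a useful ranking should produce.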

fklearn.causal.validation.curves.cumulative_gain_curve[source]

Orders the dataset by prediction and computes the cumulative gain (effect * proportional sample size) curve according to that ordering.

Parameters:
  • df (Pandas' DataFrame) – A Pandas’ DataFrame with target and prediction scores.
  • treatment (str) – The name of the treatment column in df.
  • outcome (str) – The name of the outcome column in df.
  • prediction (str) – The name of the prediction column in df.
  • min_rows (int) – Minimum number of observations needed to have a valid result.
  • steps (int) – The number of cumulative steps to iterate when accumulating the effect.
  • effect_fn (function (df: pandas.DataFrame, treatment: str, outcome: str) -> int or Array of int) – A function that computes the treatment effect given a dataframe, the name of the treatment column and the name of the outcome column.
Returns:

cumulative gain curve – The cumulative gain according to the predictions ordering.

Return type:

Numpy’s Array
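
The "effect * proportional sample size" weighting can be sketched as follows. The helper names `diff_in_means` and `cumulative_gain_sketch`, the binary-treatment effect estimate and the cutoff construction are assumptions for illustration only.

```python
def diff_in_means(rows):
    """Effect for a binary treatment: mean treated outcome minus mean control outcome."""
    treated = [r["outcome"] for r in rows if r["treatment"] == 1]
    control = [r["outcome"] for r in rows if r["treatment"] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def cumulative_gain_sketch(records, min_rows, steps, effect_fn=diff_in_means):
    ordered = sorted(records, key=lambda r: r["prediction"], reverse=True)
    size = len(ordered)
    cutoffs = list(range(min_rows, size, max(1, size // steps))) + [size]
    # Gain = effect on the top rows, scaled by the share of the data they cover.
    return [effect_fn(ordered[:rows]) * (rows / size) for rows in cutoffs]


# Hypothetical data: (prediction, treatment, outcome).
records = [
    {"prediction": p, "treatment": t, "outcome": y}
    for p, t, y in [(0.9, 1, 5), (0.8, 0, 1), (0.7, 1, 4), (0.6, 0, 2),
                    (0.5, 1, 3), (0.4, 0, 2), (0.3, 1, 2), (0.2, 0, 2)]
]
gain = cumulative_gain_sketch(records, min_rows=2, steps=3)
```

The last point of the gain curve always equals the effect estimated on the full dataset, since the sample share is 1 there.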

fklearn.causal.validation.curves.effect_by_segment[source]

Segments the dataset by a prediction’s quantile and estimates the treatment effect by segment.

Parameters:
  • df (Pandas' DataFrame) – A Pandas’ DataFrame with target and prediction scores.
  • treatment (str) – The name of the treatment column in df.
  • outcome (str) – The name of the outcome column in df.
  • prediction (str) – The name of the prediction column in df.
  • segments (int) – The number of segments to create. Uses Pandas’ qcut under the hood.
  • effect_fn (function (df: pandas.DataFrame, treatment: str, outcome: str) -> int or Array of int) – A function that computes the treatment effect given a dataframe, the name of the treatment column and the name of the outcome column.
Returns:

effect by band – The effect stored in a Pandas’ Series, where the index values are the segments.

Return type:

Pandas’ Series
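
A plain-Python sketch of segmenting by prediction and estimating the effect per segment is shown below. Equal-count chunks stand in for the quantile bins that qcut would produce, and `effect_by_segment_sketch` and `diff_in_means` are hypothetical names, so treat this as an approximation of the idea rather than the library's behaviour.

```python
def diff_in_means(rows):
    """Effect for a binary treatment: mean treated outcome minus mean control outcome."""
    treated = [r["outcome"] for r in rows if r["treatment"] == 1]
    control = [r["outcome"] for r in rows if r["treatment"] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def effect_by_segment_sketch(records, segments, effect_fn=diff_in_means):
    # Ascending order, so segment 0 holds the lowest predictions.
    ordered = sorted(records, key=lambda r: r["prediction"])
    chunk = len(ordered) // segments
    effects = []
    for i in range(segments):
        # The last segment absorbs any remainder rows.
        end = (i + 1) * chunk if i < segments - 1 else len(ordered)
        effects.append(effect_fn(ordered[i * chunk:end]))
    return effects


# Hypothetical data: (prediction, treatment, outcome).
records = [
    {"prediction": p, "treatment": t, "outcome": y}
    for p, t, y in [(0.9, 1, 5), (0.8, 0, 1), (0.7, 1, 4), (0.6, 0, 2),
                    (0.5, 1, 3), (0.4, 0, 2), (0.3, 1, 2), (0.2, 0, 2)]
]
segment_effects = effect_by_segment_sketch(records, segments=2)
```

Here the higher-prediction segment shows the larger estimated effect, which is what one hopes to see when the prediction ranks units by treatment response.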

fklearn.causal.validation.curves.effect_curves[source]

Creates a dataset summarizing the effect curves: cumulative effect, cumulative gain and relative cumulative gain. The dataset also contains two columns referencing the data used to compute the curves at each step: number of samples and fraction of samples used. Moreover, one column indicating the cumulative gain for a corresponding random model is also included as a benchmark.

Parameters:
  • df (Pandas' DataFrame) – A Pandas’ DataFrame with target and prediction scores.
  • treatment (str) – The name of the treatment column in df.
  • outcome (str) – The name of the outcome column in df.
  • prediction (str) – The name of the prediction column in df.
  • min_rows (int) – Minimum number of observations needed to have a valid result.
  • steps (int) – The number of cumulative steps to iterate when accumulating the effect.
  • effect_fn (function (df: pandas.DataFrame, treatment: str, outcome: str) -> int or Array of int) – A function that computes the treatment effect given a dataframe, the name of the treatment column and the name of the outcome column.
Returns:

summary curves dataset – The dataset with the results for multiple validation causal curves according to the predictions ordering.

Return type:

pd.DataFrame

fklearn.causal.validation.curves.relative_cumulative_gain_curve[source]

Orders the dataset by prediction and computes the relative cumulative gain curve according to that ordering. The relative gain is the difference between the cumulative effect and the Average Treatment Effect (ATE), multiplied by the relative sample size.

Parameters:
  • df (Pandas' DataFrame) – A Pandas’ DataFrame with target and prediction scores.
  • treatment (str) – The name of the treatment column in df.
  • outcome (str) – The name of the outcome column in df.
  • prediction (str) – The name of the prediction column in df.
  • min_rows (int) – Minimum number of observations needed to have a valid result.
  • steps (int) – The number of cumulative steps to iterate when accumulating the effect.
  • effect_fn (function (df: pandas.DataFrame, treatment: str, outcome: str) -> int or Array of int) – A function that computes the treatment effect given a dataframe, the name of the treatment column and the name of the outcome column.
Returns:

relative cumulative gain curve – The relative cumulative gain according to the predictions ordering.

Return type:

Numpy’s Array
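
The relative gain described above, (cumulative effect - ATE) * relative sample size, can be sketched in plain Python. The names `diff_in_means` and `relative_cumulative_gain_sketch`, the binary-treatment effect estimate and the cutoff construction are assumptions made for this example.

```python
def diff_in_means(rows):
    """Effect for a binary treatment: mean treated outcome minus mean control outcome."""
    treated = [r["outcome"] for r in rows if r["treatment"] == 1]
    control = [r["outcome"] for r in rows if r["treatment"] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def relative_cumulative_gain_sketch(records, min_rows, steps, effect_fn=diff_in_means):
    ordered = sorted(records, key=lambda r: r["prediction"], reverse=True)
    size = len(ordered)
    cutoffs = list(range(min_rows, size, max(1, size // steps))) + [size]
    # ATE: the effect estimated on the full dataset.
    ate = effect_fn(ordered)
    return [(effect_fn(ordered[:rows]) - ate) * (rows / size) for rows in cutoffs]


# Hypothetical data: (prediction, treatment, outcome).
records = [
    {"prediction": p, "treatment": t, "outcome": y}
    for p, t, y in [(0.9, 1, 5), (0.8, 0, 1), (0.7, 1, 4), (0.6, 0, 2),
                    (0.5, 1, 3), (0.4, 0, 2), (0.3, 1, 2), (0.2, 0, 2)]
]
relative_gain = relative_cumulative_gain_sketch(records, min_rows=2, steps=3)
```

Subtracting the ATE forces the curve to end at zero, so any positive area comes from the ordering ranking high-effect rows first.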

Module contents