fklearn.causal package

Submodules

fklearn.causal.debias module

fklearn.causal.debias.debias_with_double_ml

Frisch-Waugh-Lovell style debiasing with an ML model. To debias, we

  1. fit a regression ML model to predict the treatment from the confounders and take the out-of-fold residuals from this fit (debias step);
  2. fit a regression ML model to predict the outcome from the confounders and take the out-of-fold residuals from this fit (denoise step).

We then add back the average outcome and treatment so that their levels remain unchanged.

Returns a DataFrame with the debiased columns added; each new column takes the original column name with suffix appended.

Parameters:
  • df (Pandas DataFrame) – A Pandas DataFrame with treatment, outcome and confounder columns.
  • treatment_column (str) – The name of the column in df with the treatment.
  • outcome_column (str) – The name of the column in df with the outcome.
  • confounder_columns (list of str) – A list of confounders present in df.
  • ml_regressor (Sklearn's RegressorMixin) – A regressor model that implements a fit and a predict method.
  • extra_params (dict) – The hyper-parameters for the model.
  • cv (int) – The number of folds used for cross-prediction.
  • suffix (str) – A suffix to append to the returned debiased column names.
  • denoise (bool (Default=True)) – Whether to denoise the outcome using the confounders.
  • seed (int) – A seed for consistency in random computation.
Returns:

debiased_df – The original df dataframe with debiased columns added.

Return type:

Pandas DataFrame

fklearn.causal.debias.debias_with_fixed_effects

Returns a DataFrame with the debiased columns added; each new column takes the original column name with suffix appended.

This is equivalent to debiasing with a regression whose formula is “C(x1) + C(x2) + …”. However, it is much more efficient than running such a dummy-variable regression.

Parameters:
  • df (Pandas DataFrame) – A Pandas DataFrame with treatment, outcome and confounder columns.
  • treatment_column (str) – The name of the column in df with the treatment.
  • outcome_column (str) – The name of the column in df with the outcome.
  • confounder_columns (list of str) – Confounders are categorical groups we wish to explain away. Examples are units (e.g., customers) and time periods (days, months…). We perform a group by on these columns, so they should not be continuous variables.
  • suffix (str) – A suffix to append to the returned debiased column names.
  • denoise (bool (Default=True)) – Whether to denoise the outcome using the confounders.
Returns:

debiased_df – The original df dataframe with debiased columns added.

Return type:

Pandas DataFrame
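
The group-by idea behind fixed-effects debiasing can be illustrated with a small sketch. It assumes a single categorical confounder, in which case subtracting each group's mean and adding back the overall mean removes the group-level effect. The helper names and the toy data are hypothetical, and the library's handling of several confounder columns is not reproduced here.

```python
from collections import defaultdict
from statistics import fmean


def demean_by_group(groups, values):
    """Subtract each group's mean, then add back the overall mean."""
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)
    group_mean = {g: fmean(vs) for g, vs in by_group.items()}
    overall = fmean(values)
    return [v - group_mean[g] + overall for g, v in zip(groups, values)]


def slope(x, y):
    """cov(y, x) / var(x): the coefficient from regressing y on x."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


# Toy data (made up): customer "b" gets more treatment but has a much lower
# baseline outcome. Within each customer the true effect is 2.0.
customer = ["a", "a", "a", "b", "b", "b"]
treatment = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
baseline = {"a": 0.0, "b": -50.0}
outcome = [2.0 * t + baseline[c] for t, c in zip(treatment, customer)]

treatment_debiased = demean_by_group(customer, treatment)
outcome_debiased = demean_by_group(customer, outcome)

naive_effect = slope(treatment, outcome)                       # negative: sign is flipped
debiased_effect = slope(treatment_debiased, outcome_debiased)  # 2.0 up to float error
print(naive_effect, debiased_effect)
```

Only group means are computed, which is why this approach scales better than building one dummy column per category.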

fklearn.causal.debias.debias_with_regression

Frisch-Waugh-Lovell style debiasing with linear regression. To debias, we

  1. fit a linear model to predict the treatment from the confounders and take the residuals from this fit (debias step);
  2. fit a linear model to predict the outcome from the confounders and take the residuals from this fit (denoise step).

We then add back the average outcome and treatment so that their levels remain unchanged.

Returns a DataFrame with the debiased columns added; each new column takes the original column name with suffix appended.

Parameters:
  • df (Pandas DataFrame) – A Pandas DataFrame with treatment, outcome and confounder columns.
  • treatment_column (str) – The name of the column in df with the treatment.
  • outcome_column (str) – The name of the column in df with the outcome.
  • confounder_columns (list of str) – A list of confounders present in df.
  • suffix (str) – A suffix to append to the returned debiased column names.
  • denoise (bool (Default=True)) – Whether to denoise the outcome using the confounders.
Returns:

debiased_df – The original df dataframe with debiased columns added.

Return type:

Pandas DataFrame
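
The two residualization steps can be sketched with a single confounder and ordinary least squares. This is a minimal illustration of the Frisch-Waugh-Lovell idea under assumed toy data. The helper names are hypothetical and the code does not call fklearn.

```python
from statistics import fmean


def fit_line(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def residualize(x, y):
    """Residuals of y after regressing on x, with the mean of y added back."""
    a, b = fit_line(x, y)
    my = fmean(y)
    return [yi - (a + b * xi) + my for xi, yi in zip(x, y)]


# Toy data (made up): true effect of treatment on outcome is 3.0,
# and the confounder affects both columns.
confounder = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
treatment = [2.0 * c + n for c, n in zip(confounder, noise)]
outcome = [3.0 * t + 4.0 * c for t, c in zip(treatment, confounder)]

treatment_debiased = residualize(confounder, treatment)  # debias step
outcome_debiased = residualize(confounder, outcome)      # denoise step

naive_effect = fit_line(treatment, outcome)[1]
debiased_effect = fit_line(treatment_debiased, outcome_debiased)[1]
print(naive_effect, debiased_effect)
```

Because the means are added back, the debiased columns keep the same average level as the originals, and the slope between them equals the treatment coefficient of the full regression on treatment and confounder.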

fklearn.causal.debias.debias_with_regression_formula

Frisch-Waugh-Lovell style debiasing with linear regression, using an R-style formula to define the confounders. To debias, we

  1. fit a linear model to predict the treatment from the confounders and take the residuals from this fit (debias step);
  2. fit a linear model to predict the outcome from the confounders and take the residuals from this fit (denoise step).

We then add back the average outcome and treatment so that their levels remain unchanged.

Returns a DataFrame with the debiased columns added; each new column takes the original column name with suffix appended.

Parameters:
  • df (Pandas DataFrame) – A Pandas DataFrame with treatment, outcome and confounder columns.
  • treatment_column (str) – The name of the column in df with the treatment.
  • outcome_column (str) – The name of the column in df with the outcome.
  • confounder_formula (str) – An R-style formula modeling the confounders. See https://www.statsmodels.org/dev/example_formulas.html for examples.
  • suffix (str) – A suffix to append to the returned debiased column names.
  • denoise (bool (Default=True)) – Whether to denoise the outcome using the confounders.
Returns:

debiased_df – The original df dataframe with debiased columns added.

Return type:

Pandas DataFrame

fklearn.causal.effects module

fklearn.causal.effects.exponential_coefficient_effect

Computes the exponential coefficient between the treatment and the outcome. Finds a1 in the equation outcome = exp(a0 + a1 * treatment) + error.

Parameters:
  • df (Pandas DataFrame) – A Pandas DataFrame with the treatment and outcome columns.
  • treatment_column (str) – The name of the treatment column in df.
  • outcome_column (str) – The name of the outcome column in df.
Returns:

effect – The exponential coefficient between the treatment and the outcome

Return type:

float
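
To build intuition for what a1 measures, one simple approximation is to regress log(outcome) on the treatment. This log-linear least-squares sketch is a swapped-in technique that assumes strictly positive outcomes. It is not necessarily how the library estimates the coefficient, and the function name and data are hypothetical.

```python
from math import exp, log
from statistics import fmean


def exponential_coefficient_sketch(treatment, outcome):
    """Slope of log(outcome) on treatment; assumes every outcome is > 0."""
    log_outcome = [log(o) for o in outcome]
    mt, ml = fmean(treatment), fmean(log_outcome)
    cov = sum((t - mt) * (l - ml) for t, l in zip(treatment, log_outcome))
    var = sum((t - mt) ** 2 for t in treatment)
    return cov / var


# Noise-free toy data (made up) generated with a0 = 0.5 and a1 = 0.3.
treatment = [0.0, 1.0, 2.0, 3.0, 4.0]
outcome = [exp(0.5 + 0.3 * t) for t in treatment]

a1 = exponential_coefficient_sketch(treatment, outcome)
print(a1)
```

With noisy data the log-linear fit and a direct fit of the exponential curve generally give somewhat different estimates, since they weight errors differently.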

fklearn.causal.effects.linear_effect

Computes the linear coefficient from regressing the outcome on the treatment: cov(outcome, treatment)/var(treatment)

Parameters:
  • df (Pandas DataFrame) – A Pandas DataFrame with the treatment and outcome columns.
  • treatment_column (str) – The name of the treatment column in df.
  • outcome_column (str) – The name of the outcome column in df.
Returns:

effect – The linear coefficient from regressing the outcome on the treatment: cov(outcome, treatment)/var(treatment)

Return type:

float
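
The formula cov(outcome, treatment)/var(treatment) can be checked by hand on a tiny example. The function name and the data below are illustrative only.

```python
from statistics import fmean


def linear_effect_sketch(treatment, outcome):
    """cov(outcome, treatment) / var(treatment); the 1/(n-1) factors cancel."""
    mt, mo = fmean(treatment), fmean(outcome)
    cov = sum((t - mt) * (o - mo) for t, o in zip(treatment, outcome))
    var = sum((t - mt) ** 2 for t in treatment)
    return cov / var


# Toy data (made up): outcome is exactly 1 + 2 * treatment.
treatment = [1.0, 2.0, 3.0, 4.0]
outcome = [3.0, 5.0, 7.0, 9.0]

effect = linear_effect_sketch(treatment, outcome)
print(effect)  # 2.0
```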

fklearn.causal.effects.logistic_coefficient_effect

Computes the logistic coefficient between the treatment and the outcome. Finds a1 in the equation outcome = logistic(a0 + a1 * treatment).

Parameters:
  • df (Pandas DataFrame) – A Pandas DataFrame with the treatment and outcome columns.
  • treatment_column (str) – The name of the treatment column in df.
  • outcome_column (str) – The name of the outcome column in df.
Returns:

effect – The logistic coefficient between the treatment and the outcome

Return type:

float
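
As an illustration of what finding a1 involves, the sketch below fits the two-parameter logistic model by unregularized maximum likelihood with Newton's method. The choice of fitting routine, the function names, and the toy data are assumptions for the example. The library may fit the model differently.

```python
from math import exp


def logistic(z):
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(x, y, n_iter=25):
    """Newton's method for outcome ~ logistic(a0 + a1 * x); returns (a0, a1)."""
    a0, a1 = 0.0, 0.0
    for _ in range(n_iter):
        p = [logistic(a0 + a1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the 2x2 matrix X^T W X, with W = diag(p * (1 - p)).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        a0 += (h11 * g0 - h01 * g1) / det
        a1 += (h00 * g1 - h01 * g0) / det
    return a0, a1


# Toy binary outcomes (made up); the classes overlap, so the fit is finite.
treatment = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
outcome = [0, 0, 1, 0, 1, 0, 1, 1]

a0, a1 = fit_logistic(treatment, outcome)
fitted = [logistic(a0 + a1 * t) for t in treatment]
print(a0, a1)
```

A positive a1 means that higher treatment values are associated with a higher probability of a positive outcome. If the classes were perfectly separable, the unregularized fit would not converge to a finite coefficient.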

fklearn.causal.effects.pearson_effect

Computes the Pearson correlation between the treatment and the outcome.

Parameters:
  • df (Pandas DataFrame) – A Pandas DataFrame with the treatment and outcome columns.
  • treatment_column (str) – The name of the treatment column in df.
  • outcome_column (str) – The name of the outcome column in df.
Returns:

effect – The Pearson correlation between the treatment and the outcome

Return type:

float
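
Unlike the linear coefficient, the Pearson correlation does not depend on the units of either column. The sketch below, with a hypothetical helper and made-up data, shows that rescaling the outcome leaves the correlation unchanged.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


treatment = [1.0, 2.0, 3.0, 4.0, 5.0]
outcome = [2.0, 1.0, 4.0, 3.0, 6.0]

r = pearson(treatment, outcome)
# Changing the outcome's unit (e.g. multiplying by 100) does not change r.
r_rescaled = pearson(treatment, [100.0 * o for o in outcome])
print(r, r_rescaled)
```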

fklearn.causal.effects.spearman_effect

Computes the Spearman correlation between the treatment and the outcome.

Parameters:
  • df (Pandas DataFrame) – A Pandas DataFrame with the treatment and outcome columns.
  • treatment_column (str) – The name of the treatment column in df.
  • outcome_column (str) – The name of the outcome column in df.
Returns:

effect – The Spearman correlation between the treatment and the outcome

Return type:

float
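
The Spearman correlation is the Pearson correlation computed on ranks, so it captures any monotonic relationship rather than only linear ones. The sketch below uses hypothetical helpers and made-up data, and assumes there are no ties in either column.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def ranks(values):
    """1-based ranks of the values; assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = float(rank)
    return result


# Monotonic but strongly nonlinear relationship (made up).
treatment = [1.0, 2.0, 3.0, 4.0, 5.0]
outcome = [t ** 3 for t in treatment]

spearman = pearson(ranks(treatment), ranks(outcome))  # perfect rank agreement
pearson_raw = pearson(treatment, outcome)             # below 1: not a straight line
print(spearman, pearson_raw)
```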

Module contents