fklearn.causal.cate_learning package

Submodules

fklearn.causal.cate_learning.double_machine_learning module

fklearn.causal.cate_learning.double_machine_learning.non_parametric_double_ml_learner

Fits a Non-Parametric Double/ML Meta Learner for Conditional Average Treatment Effect (CATE) estimation. It implements the following steps:

  1. fits k instances (k = cv_splits) of the debias model to predict the treatment from the features and gets out-of-fold residuals t_res = t - t_hat;
  2. fits k instances of the denoise model to predict the outcome from the features and gets out-of-fold residuals y_res = y - y_hat;
  3. fits a final ML model to predict y_res / t_res from the features using weighted regression with weights set to t_res^2. Trained like this, the final model will output treatment effect predictions.
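A short derivation of why the weights t_res^2 in step 3 make the final model estimate the treatment effect, assuming the final model minimizes a squared-error loss. Here \hat{\tau}(x_i) denotes the final model's prediction for sample i (notation introduced for this derivation), and every sample is assumed to have a nonzero treatment residual:

```latex
\sum_i t_{res,i}^{2}\left(\frac{y_{res,i}}{t_{res,i}} - \hat{\tau}(x_i)\right)^{2}
  = \sum_i \left(y_{res,i} - \hat{\tau}(x_i)\, t_{res,i}\right)^{2}
```

The weighted fit is therefore the same as fitting the residual-on-residual model y_res ≈ τ(x) · t_res, in which τ(x) is the change in the outcome per unit change in the treatment, i.e. the treatment effect.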
Parameters:
  • df (pandas.DataFrame) – A pandas DataFrame with features, treatment and target columns. The model will be trained to predict the target column from the features.
  • feature_columns (list of str) – A list of column names that are used as features for the denoise, debias and final models in double-ml. All these names should be in df.
  • treatment_column (str) – The name of the column in df that should be used as treatment for the double-ml model. It will learn the impact of this column on the outcome column.
  • outcome_column (str) – The name of the column in df that should be used as outcome for the double-ml model. It will learn the impact of the treatment column on this outcome column.
  • debias_model (RegressorMixin (default None)) – The estimator for fitting the treatment from the features. Must implement fit and predict methods. It can be a scikit-learn regressor. When None, defaults to GradientBoostingRegressor.
  • debias_feature_columns (list of str (default None)) – A list of column names to be used only for the debias model. If not None, it will replace feature_columns when fitting the debias model.
  • denoise_model (RegressorMixin (default None)) – The estimator for fitting the outcome from the features. Must implement fit and predict methods. It can be a scikit-learn regressor. When None, defaults to GradientBoostingRegressor.
  • denoise_feature_columns (list of str (default None)) – A list of column names to be used only for the denoise model. If not None, it will replace feature_columns when fitting the denoise model.
  • final_model (RegressorMixin (default None)) – The estimator for fitting the ratio of outcome residuals to treatment residuals (y_res / t_res) from the features. Must implement fit and predict methods. It can be an arbitrary scikit-learn regressor. The fit method must accept sample_weight as a keyword argument. When None, defaults to GradientBoostingRegressor.
  • final_model_feature_columns (list of str (default None)) – A list of column names to be used only for the final model. If not None, it will replace feature_columns when fitting the final model.
  • prediction_column (str (default "prediction")) – The name of the column with the treatment effect predictions from the final model.
  • cv_splits (int (default 2)) – Number of folds (k) used to split the training data when fitting the debias and denoise models.
  • encode_extra_cols (bool (default: True)) – If True, treats all columns in df with name pattern fklearn_feat__col==val as feature columns.
Returns:

  • p (function pandas.DataFrame -> pandas.DataFrame) – A function that when applied to a DataFrame with the same columns as df returns a new DataFrame with a new column with predictions from the model.
  • new_df (pandas.DataFrame) – A df-like DataFrame with the same columns as the input df plus a column with predictions from the model.
  • log (dict) – A log-like Dict that stores information of the Non-Parametric Double/ML model.
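The three steps described above can be sketched with plain scikit-learn. This is a minimal, non-authoritative illustration, not fklearn's implementation: the helper name double_ml_cate_sketch is hypothetical, it only mimics the p part of the documented (p, new_df, log) return value, and it uses scikit-learn's cross_val_predict (which fits one clone of the model per fold) as an assumed stand-in for the k-fold out-of-fold residuals. It ignores the per-model feature column and encode_extra_cols options.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict


def double_ml_cate_sketch(df, feature_columns, treatment_column, outcome_column,
                          debias_model=None, denoise_model=None, final_model=None,
                          prediction_column="prediction", cv_splits=2):
    """Hypothetical helper: the three Double/ML steps with scikit-learn models."""
    # Mirror the documented defaults: GradientBoostingRegressor whenever a model is None.
    debias_model = GradientBoostingRegressor() if debias_model is None else debias_model
    denoise_model = GradientBoostingRegressor() if denoise_model is None else denoise_model
    final_model = clone(GradientBoostingRegressor() if final_model is None else final_model)

    X = df[feature_columns].to_numpy()
    t = df[treatment_column].to_numpy(dtype=float)
    y = df[outcome_column].to_numpy(dtype=float)

    # Step 1: cross_val_predict fits one clone of debias_model per fold (k = cv_splits)
    # and returns out-of-fold predictions t_hat, so t_res = t - t_hat.
    t_res = t - cross_val_predict(debias_model, X, t, cv=cv_splits)
    # Step 2: the same for the outcome, so y_res = y - y_hat.
    y_res = y - cross_val_predict(denoise_model, X, y, cv=cv_splits)

    # Step 3: weighted regression of y_res / t_res on the features with weights t_res^2.
    # Assumes no treatment residual is exactly zero (true almost surely for a continuous treatment).
    final_model.fit(X, y_res / t_res, sample_weight=t_res ** 2)

    def p(new_df):
        # Returns a copy of new_df with the treatment-effect predictions appended.
        effects = final_model.predict(new_df[feature_columns].to_numpy())
        return new_df.assign(**{prediction_column: effects})

    return p


# Simulated data with a constant treatment effect of 2.0 on the outcome.
rng = np.random.default_rng(0)
n = 2000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
t = x1 + rng.normal(size=n)
y = 2.0 * t + x2 + rng.normal(size=n)
df = pd.DataFrame({"x1": x1, "x2": x2, "t": t, "y": y})

# Linear models keep the demo fast; the defaults would use gradient boosting.
p = double_ml_cate_sketch(df, ["x1", "x2"], "t", "y",
                          debias_model=LinearRegression(),
                          denoise_model=LinearRegression(),
                          final_model=LinearRegression())
effects = p(df)["prediction"]
print(effects.mean())  # should be close to the simulated effect of 2.0
```

Passing the residuals through sample_weight, rather than dividing inside a custom loss, is why the final model's fit method must accept sample_weight.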

fklearn.causal.cate_learning.meta_learners module

fklearn.causal.cate_learning.meta_learners.causal_s_classification_learner

Fits a Causal S-Learner classifier. The S-learner is a meta-learner which learns the Conditional Average Treatment Effect (CATE) through the creation of an auxiliary binary feature T that indicates whether a sample is in the treatment (T = 1) or in the control (T = 0) group. This feature can then be used to perform inference by artificially simulating the conversion of a new sample under both scenarios, i.e., with T = 0 and T = 1. The CATE τ is defined as τ(x_i) = M(X=x_i, T=1) - M(X=x_i, T=0), where M is a Machine Learning model.

References:

[1] https://matheusfacure.github.io/python-causality-handbook/21-Meta-Learners.html

[2] https://causalml.readthedocs.io/en/latest/methodology.html

Parameters:
  • df (pd.DataFrame) – A pandas DataFrame with features, treatment and target columns. The model will be trained to predict the target column from the features.
  • treatment_col (str) – The name of the column in df which contains the names of the treatments or control to which each data sample was subjected.
  • control_name (str) – The name of the control group.
  • prediction_column (str) – The name of the column with the predictions from the provided learner.
  • learner (Callable) – An fklearn classification learner function.
  • learner_transformers (list) – A list of fklearn transformer functions to be applied after the learner and before estimating the CATE. This parameter may be useful, for example, to estimate the CATE with calibrated classifiers.
Returns:

  • p (function pandas.DataFrame -> pandas.DataFrame) – A function that when applied to a DataFrame with the same columns as df returns a new DataFrame with a new column with predictions from the model.
  • new_df (pandas.DataFrame) – A df-like DataFrame with the same columns as the input df plus a column with predictions from the model.
  • log (dict) – A log-like Dict that stores information of the Causal S-Learner Classifier model.
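The S-learner definition above can be illustrated with a small sketch. It assumes a single treatment group (every non-control row is treated as T = 1) and swaps in scikit-learn's LogisticRegression for the fklearn learner function M; the helper name s_learner_cate_sketch and its target_col argument are hypothetical and not part of fklearn's API.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def s_learner_cate_sketch(df, feature_columns, treatment_col, control_name, target_col):
    """Hypothetical helper: single-treatment S-learner with LogisticRegression as M."""
    X = df[feature_columns].to_numpy()
    # Auxiliary binary feature T: 1 for treated samples, 0 for control samples.
    T = (df[treatment_col] != control_name).to_numpy().astype(float)

    # A single model M is trained on the features plus T.
    model = LogisticRegression().fit(np.column_stack([X, T]), df[target_col].to_numpy())

    # Simulate every sample under both scenarios and subtract the predicted
    # conversion probabilities: tau(x_i) = M(x_i, T=1) - M(x_i, T=0).
    ones, zeros = np.ones(len(df)), np.zeros(len(df))
    p_treated = model.predict_proba(np.column_stack([X, ones]))[:, 1]
    p_control = model.predict_proba(np.column_stack([X, zeros]))[:, 1]
    return p_treated - p_control


# Simulated data in which the treatment raises the log-odds of conversion by 1.
rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=n)
treated = rng.random(n) < 0.5
logits = -0.5 + x + 1.0 * treated
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)
df = pd.DataFrame({"x": x, "group": np.where(treated, "treatment", "control"), "converted": y})

cate = s_learner_cate_sketch(df, ["x"], "group", "control", "converted")
print(cate.mean())
```

Because one model sees both groups, T competes with the other features for the model's attention; this is the usual trade-off of the S-learner compared with fitting separate models per group.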

fklearn.causal.cate_learning.meta_learners.causal_t_classification_learner

Fits a Causal T-Learner classifier. The T-Learner is a meta-learner which learns the Conditional Average Treatment Effect (CATE) through the use of one Machine Learning model for each treatment and one for the control group. Each model is fitted on a subset of the data selected according to the treatment: the CATE $\tau$ is defined as $\tau(x_{i}) = M_{1}(X=x_{i}, T=1) - M_{0}(X=x_{i}, T=0)$, where $M_{1}$ is a model fitted with treatment data and $M_{0}$ is a model fitted with control data. Notice that $M_{0}$ and $M_{1}$ are traditional Machine Learning models, such as a LightGBM Classifier, and that $x_{i}$ is the feature set of sample $i$.

References:

[1] https://matheusfacure.github.io/python-causality-handbook/21-Meta-Learners.html

[2] https://causalml.readthedocs.io/en/latest/methodology.html

Parameters:
  • df (pd.DataFrame) – A pandas DataFrame with features, treatment and target columns. The model will be trained to predict the target column from the features.
  • treatment_col (str) – The name of the column in df which contains the names of the treatments and control to which each data sample was subjected.
  • control_name (str) – The name of the control group.
  • prediction_column (str) – The name of the column with the predictions from the provided learner.
  • learner (LearnerFnType) – An fklearn classification learner function.
  • treatment_learner (LearnerFnType) – An optional fklearn classification learner function.
  • learner_transformers (List[LearnerFnType]) – A list of fklearn transformer functions to be applied after the learner and before estimating the CATE. This parameter may be useful, for example, to estimate the CATE with calibrated classifiers.
Returns:

  • p (function pandas.DataFrame -> pandas.DataFrame) – A function that when applied to a DataFrame with the same columns as df returns a new DataFrame with a new column with predictions from the model.
  • new_df (pandas.DataFrame) – A df-like DataFrame with the same columns as the input df plus a column with predictions from the model.
  • log (dict) – A log-like Dict that stores information of the Causal T-Learner Classifier model.
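The T-learner definition above can be illustrated with a small sketch that fits $M_{0}$ on control rows and one $M_{1}$ per treatment, then returns one CATE array per treatment. It swaps in scikit-learn's LogisticRegression for the fklearn learner functions; the helper name t_learner_cate_sketch, its target_col argument and the dict return value are hypothetical and not fklearn's API.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def t_learner_cate_sketch(df, feature_columns, treatment_col, control_name, target_col):
    """Hypothetical helper: one LogisticRegression per group; returns {treatment: CATE array}."""
    X = df[feature_columns].to_numpy()
    y = df[target_col].to_numpy()
    groups = df[treatment_col].to_numpy()

    def fit_group(name):
        # Each model only sees the rows of its own group.
        mask = groups == name
        return LogisticRegression().fit(X[mask], y[mask])

    # M_0: fitted on control data, then evaluated on every sample.
    p_control = fit_group(control_name).predict_proba(X)[:, 1]

    cates = {}
    for name in np.unique(groups):
        if name == control_name:
            continue
        # M_1: fitted on this treatment's data; tau(x_i) = M_1(x_i) - M_0(x_i).
        cates[name] = fit_group(name).predict_proba(X)[:, 1] - p_control
    return cates


# Simulated data with one helpful (A) and one harmful (B) treatment.
rng = np.random.default_rng(0)
n = 6000
x = rng.normal(size=n)
group = rng.choice(["control", "A", "B"], size=n)
effect = np.select([group == "A", group == "B"], [1.5, -1.5], default=0.0)
logits = x + effect
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)
df = pd.DataFrame({"x": x, "group": group, "converted": y})

cates = t_learner_cate_sketch(df, ["x"], "group", "control", "converted")
print({name: round(float(c.mean()), 3) for name, c in cates.items()})
```

Fitting separate models lets each group have its own response surface, at the cost of each model seeing only part of the data.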

Module contents