fklearn.causal.cate_learning package


fklearn.causal.cate_learning.double_machine_learning module


Fits a Non-Parametric Double/ML Meta Learner for Conditional Average Treatment Effect Estimation. It implements the following steps:

  1. fits k instances (k = cv_splits) of the debias model to predict the treatment from the features and get out-of-fold treatment residuals (t_res)
  2. fits k instances of the denoise model to predict the outcome from the features and get out-of-fold outcome residuals (y_res)
  3. fits a final ML model to predict y_res / t_res from the features using weighted regression with weights set to
    t_res^2. Trained like this, the final model will output treatment effect predictions.
  • df (pandas.DataFrame) – A pandas DataFrame with features, treatment and target columns. The model will be trained to predict the target column from the features.
  • feature_columns (list of str) –
    A list of column names that are used as features for the denoise, debias and final models in double-ml. All
    these names should be in df.
  • treatment_column (str) –
    The name of the column in df that should be used as treatment for the double-ml model. It will learn the
    impact of this column with respect to the outcome column.
  • outcome_column (str) – The name of the column in df that should be used as outcome for the double-ml model. It will learn the impact of the treatment column on this outcome column.
  • debias_model (RegressorMixin (default None)) – The estimator for fitting the treatment from the features. Must implement fit and predict methods. It can be a scikit-learn regressor. When None, defaults to GradientBoostingRegressor.
  • debias_feature_columns (list of str (default None)) – A list of column names to be used only for the debias model. If not None, it will replace feature_columns when fitting the debias model.
  • denoise_model (RegressorMixin (default None)) – The estimator for fitting the outcome from the features. Must implement fit and predict methods. It can be a scikit-learn regressor. When None, defaults to GradientBoostingRegressor.
  • denoise_feature_columns (list of str (default None)) – A list of column names to be used only for the denoise model. If not None, it will replace feature_columns when fitting the denoise model.
  • final_model (RegressorMixin (default None)) – The estimator for fitting the outcome residuals from the treatment residuals. Must implement fit and predict methods. It can be an arbitrary scikit-learn regressor. The fit method must accept sample_weight as a keyword argument. When None, defaults to GradientBoostingRegressor.
  • final_model_feature_columns (list of str (default None)) – A list of column names to be used only for the final model. If not None, it will replace feature_columns when fitting the final model.
  • prediction_column (str (default "prediction")) – The name of the column with the treatment effect predictions from the final model.
  • cv_splits (int (default 2)) – Number of folds to split the training data into when fitting the debias and denoise models.
  • encode_extra_cols (bool (default True)) – If True, treats all columns in df with name pattern fklearn_feat__col==val as feature columns.

  • p (function pandas.DataFrame -> pandas.DataFrame) – A function that when applied to a DataFrame with the same columns as df returns a new DataFrame with a new column with predictions from the model.
  • new_df (pandas.DataFrame) – A df-like DataFrame with the same columns as the input df plus a column with predictions from the model.
  • log (dict) – A log-like Dict that stores information of the Non Parametric Double/ML model.
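
The three steps described above can be sketched with plain scikit-learn. This is a minimal illustration under stated assumptions, not fklearn's implementation: the synthetic data, the variable names (X, t, y, tau, cate) and the use of cross_val_predict to obtain out-of-fold predictions are all choices made for this example. Weighting the squared error on y_res / t_res by t_res^2 is equivalent to minimizing (y_res - f(X) * t_res)^2, which is why the final model's output can be read as a treatment effect.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

# Synthetic data (an assumption for this sketch): a continuous treatment that
# depends on the features, and a true effect of 1 or 3 depending on the sign of X[:, 1].
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
t = X[:, 0] + rng.normal(size=n)
tau = 1.0 + 2.0 * (X[:, 1] > 0)
y = tau * t + X[:, 0] + rng.normal(scale=0.5, size=n)

# Step 1: debias model, out-of-fold treatment residuals (cv=2 mirrors cv_splits=2).
t_res = t - cross_val_predict(GradientBoostingRegressor(random_state=0), X, t, cv=2)

# Step 2: denoise model, out-of-fold outcome residuals.
y_res = y - cross_val_predict(GradientBoostingRegressor(random_state=0), X, y, cv=2)

# Step 3: final model predicts y_res / t_res with weights t_res^2.
final_model = GradientBoostingRegressor(random_state=0)
final_model.fit(X, y_res / t_res, sample_weight=t_res ** 2)

# The final model's predictions are the estimated treatment effects.
cate = final_model.predict(X)
high = cate[X[:, 1] > 0].mean()
low = cate[X[:, 1] <= 0].mean()
print(f"mean estimated effect: {high:.2f} (X1 > 0) vs {low:.2f} (X1 <= 0)")
```

In this sketch the same feature matrix feeds all three models; the debias_feature_columns, denoise_feature_columns and final_model_feature_columns parameters correspond to passing a different column subset to each fit.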

fklearn.causal.cate_learning.meta_learners module


Fits a Causal S-Learner classifier. The S-Learner is a meta-learner that learns the Conditional Average Treatment Effect (CATE) by creating an auxiliary binary feature T that indicates whether the sample is in the treatment (T = 1) or in the control (T = 0) group. This feature can then be used to perform inference by artificially simulating the conversion of a new sample under both scenarios, i.e., with T = 0 and T = 1. The CATE τ is defined as τ(xi) = M(X=xi, T=1) - M(X=xi, T=0), where M is a Machine Learning model.




  • df (pd.DataFrame) – A pandas DataFrame with features and target columns. The model will be trained to predict the target column from the features.
  • treatment_col (str) – The name of the column in df which contains the names of the treatments or control to which each data sample was subjected.
  • control_name (str) – The name of the control group.
  • prediction_column (str) – The name of the column with the predictions from the provided learner.
  • learner (Callable) – An fklearn classification learner function.
  • learner_transformers (list) – A list of fklearn transformer functions to be applied after the learner and before estimating the CATE. This parameter may be useful, for example, to estimate the CATE with calibrated classifiers.

  • p (function pandas.DataFrame -> pandas.DataFrame) – A function that when applied to a DataFrame with the same columns as df returns a new DataFrame with a new column with predictions from the model.
  • new_df (pandas.DataFrame) – A df-like DataFrame with the same columns as the input df plus a column with predictions from the model.
  • log (dict) – A log-like Dict that stores information of the Causal S-Learner Classifier model.
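
The S-Learner definition τ(xi) = M(X=xi, T=1) - M(X=xi, T=0) can be illustrated with a single scikit-learn classifier. This is a hedged sketch for the binary case only, not the fklearn code path: the synthetic data, the GradientBoostingClassifier choice and the helper predict_with_treatment are assumptions introduced here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data (an assumption): randomized binary treatment whose effect on the
# conversion probability is +0.3 when x > 0 and zero otherwise.
rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)
y = rng.binomial(1, 0.2 + 0.3 * t * (x > 0))

# One model M is trained on the features plus the auxiliary treatment indicator T.
model = GradientBoostingClassifier(random_state=0)
model.fit(np.column_stack([x, t]), y)

def predict_with_treatment(value):
    """Hypothetical helper: score every sample as if T were set to `value`."""
    features = np.column_stack([x, np.full(n, value)])
    return model.predict_proba(features)[:, 1]

# CATE = M(X, T=1) - M(X, T=0)
cate = predict_with_treatment(1) - predict_with_treatment(0)
high = cate[x > 0].mean()
low = cate[x <= 0].mean()
print(f"mean estimated CATE: {high:.2f} (x > 0) vs {low:.2f} (x <= 0)")
```

Because the same model scores both scenarios, any post-processing of its probabilities (the role of learner_transformers, for example calibration) would apply to both predictions before they are subtracted.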


Fits a Causal T-Learner classifier. The T-Learner is a meta-learner that learns the Conditional Average Treatment Effect (CATE) by using one Machine Learning model for each treatment and one for the control group. Each model is fitted on a subset of the data, according to the treatment. The CATE $\tau$ is defined as $\tau(x_{i}) = M_{1}(X=x_{i}, T=1) - M_{0}(X=x_{i}, T=0)$, where $M_{1}$ is a model fitted with treatment data and $M_{0}$ is a model fitted with control data. Notice that $M_{0}$ and $M_{1}$ are traditional Machine Learning models, such as a LightGBM Classifier, and that $x_{i}$ is the feature set of sample $i$.




  • df (pd.DataFrame) – A pandas DataFrame with features and target columns. The model will be trained to predict the target column from the features.
  • treatment_col (str) – The name of the column in df which contains the names of the treatments and control to which each data sample was subjected.
  • control_name (str) – The name of the control group.
  • prediction_column (str) – The name of the column with the predictions from the provided learner.
  • learner (LearnerFnType) – An fklearn classification learner function.
  • treatment_learner (LearnerFnType) – An optional fklearn classification learner function.
  • learner_transformers (List[LearnerFnType]) – A list of fklearn transformer functions to be applied after the learner and before estimating the CATE. This parameter may be useful, for example, to estimate the CATE with calibrated classifiers.

  • p (function pandas.DataFrame -> pandas.DataFrame) – A function that when applied to a DataFrame with the same columns as df returns a new DataFrame with a new column with predictions from the model.
  • new_df (pandas.DataFrame) – A df-like DataFrame with the same columns as the input df plus a column with predictions from the model.
  • log (dict) – A log-like Dict that stores information of the Causal T-Learner Classifier model.
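
The T-Learner definition $\tau(x_{i}) = M_{1}(X=x_{i}, T=1) - M_{0}(X=x_{i}, T=0)$ can be sketched with two scikit-learn classifiers, one per group. As with any illustration of this kind, the synthetic data, the estimator choice and the variable names (m1, m0, cate) are assumptions made for this example and do not reproduce fklearn internals; only a single treatment and a control group are covered.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data (an assumption): randomized binary treatment whose effect on the
# conversion probability is +0.3 when x > 0 and zero otherwise.
rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)
y = rng.binomial(1, 0.2 + 0.3 * t * (x > 0))
X = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix

# M1 is fitted only on treated samples, M0 only on control samples.
m1 = GradientBoostingClassifier(random_state=0).fit(X[t == 1], y[t == 1])
m0 = GradientBoostingClassifier(random_state=0).fit(X[t == 0], y[t == 0])

# Both models score every sample; the difference is the estimated CATE.
cate = m1.predict_proba(X)[:, 1] - m0.predict_proba(X)[:, 1]
high = cate[x > 0].mean()
low = cate[x <= 0].mean()
print(f"mean estimated CATE: {high:.2f} (x > 0) vs {low:.2f} (x <= 0)")
```

Unlike the S-Learner, the treatment indicator never enters the feature matrix here, so each model can use a different estimator or hyperparameters for its own group.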

Module contents