Quickstart


KhalibClassifier Scikit-Learn Estimator

We first create our train, calibration, and test datasets. We keep 45k samples for the test set to ensure accurate error estimation; the remaining 5k are split evenly between train and calibration (2,500 each).

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=50_000, n_features=20, n_informative=2, n_redundant=10, random_state=42
)
X_train, X_not_train, y_train, y_not_train = train_test_split(
    X, y, train_size=2500, random_state=42
)
X_calib, X_test, y_calib, y_test = train_test_split(
    X_not_train, y_not_train, train_size=2500, random_state=42
)

We now train a GaussianNB classifier. This kind of model is usually uncalibrated because real data rarely fulfills its assumptions. We also estimate its expected calibration error (ECE):

from sklearn.naive_bayes import GaussianNB

import khalib

# Compute the positive scores with a Gaussian Naive Bayes model
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_scores_test = gnb.predict_proba(X_test)[:, 1]

# Compute and display the ECE
ece_test = khalib.calibration_error(y_scores_test, y_test)
print("RAW GNB ECE:", ece_test)
RAW GNB ECE: 0.07955606662760853
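For intuition, the ECE measures the gap between predicted scores and observed positive rates. khalib's exact estimator may differ (e.g. in how it chooses bins), but a standard equal-width binned ECE can be sketched with NumPy as follows:

```python
import numpy as np

def binned_ece(scores, y, n_bins=10):
    """Equal-width binned ECE: weighted average, over bins, of
    |empirical positive rate - mean predicted score|."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # np.digitize against the inner edges maps each score to a bin in [0, n_bins - 1]
    idx = np.clip(np.digitize(scores, edges[1:-1]), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(y[mask].mean() - scores[mask].mean())
    return ece
```

A perfectly calibrated set of scores (where the mean score in each bin matches the bin's positive rate) yields an ECE of zero under this sketch.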


We can also plot the reliability diagram using the build_reliability_diagram function:

%config InlineBackend.figure_formats = ['svg']
_ = khalib.build_reliability_diagram(y_scores_test, y_test)

We now calibrate the model with a KhalibClassifier object, which takes the uncalibrated model as a parameter. We then fit it on the calibration split.

# Train the calibrated classifier and obtain the calibrated scores
calib_gnb = khalib.KhalibClassifier(gnb)
calib_gnb.fit(X_calib, y_calib)
y_calib_scores_test = calib_gnb.predict_proba(X_test)[:, 1]

# Compute the ECE
calib_ece_test = khalib.calibration_error(y_calib_scores_test, y_test)
print("CALIB ECE:", calib_ece_test)
print("Reduction:", (ece_test - calib_ece_test) / ece_test)
CALIB ECE: 0.006877174416970479
Reduction: 0.9135556255041917

We observe that khalib reduced the ECE by about 91%. We now plot the reliability diagram for the calibrated scores. The build_reliability_diagram function uses a heuristic to detect when the scores are distributed as Dirac deltas and changes the visualization accordingly:

_ = khalib.build_reliability_diagram(y_calib_scores_test, y_test)
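khalib's Dirac-delta heuristic is not detailed here, but the general idea can be illustrated with a simple (hypothetical, not khalib's actual) check: a score distribution looks like a collection of Dirac deltas when only a handful of distinct values remain after calibration:

```python
import numpy as np

def looks_discrete(scores, max_unique=20):
    """Hypothetical heuristic: treat the scores as Dirac deltas when the
    number of distinct (rounded) values is small relative to typical
    continuous score distributions."""
    return np.unique(np.round(scores, 6)).size <= max_unique
```

Histogram-based calibrators naturally trigger such a check, since they map every input score onto one of a small, fixed set of output values.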

calibrate_binary function + Histogram class

We can achieve the same result “manually” using the calibrate_binary function, which calibrates the scores with a Histogram object.

# Obtain the scores on the calib split and build a supervised histogram with it
y_scores_calib = gnb.predict_proba(X_calib)[:, 1]
hist = khalib.Histogram.from_data(y_scores_calib, y=y_calib)

# Calibrate the scores of the test split
calib_hist_y_test_scores = khalib.calibrate_binary(
    y_scores_test, hist, only_positive=True
)

# Print the error and plot the reliability diagram
calib_hist_ece_test = khalib.calibration_error(calib_hist_y_test_scores, y=y_test)
print("CALIB HIST ECE:", calib_hist_ece_test)
print("Reduction     :", (ece_test - calib_hist_ece_test) / ece_test)
_ = khalib.build_reliability_diagram(calib_hist_y_test_scores, y_test)
CALIB HIST ECE: 0.006877174416970479
Reduction     : 0.9135556255041917
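To see why both paths give identical results, it helps to sketch what histogram calibration does: learn the empirical positive rate of each score bin on the calibration split, then replace every test score by its bin's rate. khalib's Histogram may use different (e.g. data-dependent) bin edges; this equal-width version only illustrates the idea:

```python
import numpy as np

def fit_histogram_bins(scores_calib, y_calib, n_bins=10):
    """Learn per-bin positive rates on the calibration split (histogram binning)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(scores_calib, edges[1:-1]), 0, n_bins - 1)
    rates = np.array([
        # empty bins fall back to the bin midpoint
        y_calib[idx == b].mean() if (idx == b).any() else (edges[b] + edges[b + 1]) / 2
        for b in range(n_bins)
    ])
    return edges, rates

def apply_histogram_bins(scores, edges, rates):
    """Replace each score by its bin's empirical positive rate."""
    idx = np.clip(np.digitize(scores, edges[1:-1]), 0, len(rates) - 1)
    return rates[idx]
```

Because the calibrated output depends only on the fitted bins, wrapping the model in KhalibClassifier and calibrating its scores by hand with the same Histogram must produce the same values, which is exactly what the identical ECE figures above show.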