We already covered SHAP-explained models for classification and regression in a previous APL blog post, where we briefly discussed the main effect of a predictor and its interaction effects with the other predictors in the model. With HANA ML 2.17, you can now visualize the interactions between variables in a heatmap. This new visualization complements the Variable Importance bar chart in providing a global explanation of a classification or regression model. This feature requires APL 2311 or later.

This blog will walk you through an example using the Census dataset that comes with APL.

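```python
from hana_ml import dataframe as hd
# Connect to SAP HANA with a key stored in the secure user store
conn = hd.ConnectionContext(userkey='MLMDA_KEY')
# Training data: the Census sample table shipped with APL
sql_cmd = 'SELECT * FROM "APL_SAMPLES"."CENSUS" ORDER BY 1'
hdf_train = hd.DataFrame(conn, sql_cmd)
```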

First, we train a gradient boosting classification model with the interactions parameter set to True:

```python
from hana_ml.algorithms.apl.gradient_boosting_classification import GradientBoostingBinaryClassifier
# interactions=True asks APL to compute the interactions between variables
apl_model = GradientBoostingBinaryClassifier(variable_auto_selection=True, 
                                             interactions=True)
apl_model.fit(hdf_train, label='class', key='id')
```

Once training is complete, we request the report:

```python
from hana_ml.visualizers.unified_report import UnifiedReport
# Build the model report and display it inline, for example in a Jupyter notebook
UnifiedReport(apl_model).build().display()
```

You may want to generate the report as an HTML file:

```python
apl_model.generate_html_report('APL_Census')
```

The usual “Variable Importance” tab provides a global explanation of the predictive model.

Because we explicitly requested interactions when setting the model parameters, a new “Interaction Matrix” tab also appears at the end.

The diagonal shows the main effect of each variable, while the other cells show the interaction between each pair of variables. The matrix includes only the variables with the strongest interactions and is limited to 6×6 by default. To get a larger matrix, 9×9 for example, set the interactions_max_kept parameter:

```python
# Keep more interactions to obtain a larger interaction matrix
apl_model = GradientBoostingBinaryClassifier(variable_auto_selection=True, 
                                             interactions=True, 
                                             interactions_max_kept=8)
apl_model.fit(hdf_train, label='class', key='id')
```

The larger the matrix, the longer it takes to fit the model.
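The post does not spell out how APL picks the variables that appear in the reduced matrix. As a rough illustration only, the pandas sketch below assumes one plausible rule: rank each variable by its strongest off-diagonal interaction (in absolute value), keep the top k, and slice the symmetric matrix to those rows and columns. The function name top_k_submatrix and the numbers in the toy matrix are hypothetical and are not APL output.

```python
import numpy as np
import pandas as pd


def top_k_submatrix(matrix: pd.DataFrame, k: int) -> pd.DataFrame:
    """Hypothetical selection rule, not APL's documented logic: keep the k
    variables whose largest absolute off-diagonal interaction is highest."""
    values = matrix.to_numpy(dtype=float, copy=True)
    np.fill_diagonal(values, 0.0)  # ignore main effects when ranking
    strongest = pd.Series(np.abs(values).max(axis=1), index=matrix.index)
    keep = strongest.sort_values(ascending=False).index[:k]
    return matrix.loc[keep, keep]


# Toy symmetric matrix: main effects on the diagonal, pairwise interactions elsewhere.
# The values are made up for illustration.
names = ["age", "education", "hours", "sex"]
toy = pd.DataFrame(
    [[0.30, 0.05, 0.01, 0.00],
     [0.05, 0.25, 0.04, 0.01],
     [0.01, 0.04, 0.10, 0.03],
     [0.00, 0.01, 0.03, 0.08]],
    index=names, columns=names,
)

print(top_k_submatrix(toy, 3))
```

Here "sex" is dropped because its strongest interaction (0.03) is the weakest of the four; its main effect on the diagonal plays no role in the assumed ranking.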

If needed, you can retrieve the interaction values as a pandas DataFrame:

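```python
# Fetch the interaction matrix from the model debrief, drop the technical 'Oid' column, pull it to the client
df = apl_model.get_debrief_report('ClassificationRegression_InteractionMatrix').deselect('Oid').collect()
# Display without the pandas index (Styler.hide requires pandas 1.4 or later)
df.style.hide(axis='index')
```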
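Once the values are on the client, ordinary pandas operations apply. The sketch below assumes the matrix has been arranged as a square DataFrame with variable names as both index and columns (the actual layout of the debrief report may differ, so some reshaping could be needed first) and lists each pair of variables once, strongest interaction first. It ranks by magnitude in case some values are negative. The helper name rank_pairs and the toy values are hypothetical.

```python
from itertools import combinations

import pandas as pd


def rank_pairs(matrix: pd.DataFrame) -> pd.DataFrame:
    """List each unordered pair of variables once, strongest interaction first."""
    rows = [
        (a, b, float(matrix.loc[a, b]))  # upper-triangle cell for the pair (a, b)
        for a, b in combinations(matrix.index, 2)
    ]
    pairs = pd.DataFrame(rows, columns=["var_1", "var_2", "interaction"])
    # Rank by magnitude so that a strong negative interaction is not pushed to the bottom
    return pairs.sort_values("interaction", key=lambda s: s.abs(), ascending=False).reset_index(drop=True)


# Made-up square matrix for illustration only
names = ["age", "education", "hours"]
toy_matrix = pd.DataFrame(
    [[0.30, -0.06, 0.02],
     [-0.06, 0.25, 0.04],
     [0.02, 0.04, 0.10]],
    index=names, columns=names,
)

print(rank_pairs(toy_matrix))
```

Applied to the DataFrame retrieved above, the same helper would list the model's strongest interactions, provided its layout matches the assumption.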

The values in the interaction matrix are computed using the Shapley-Taylor interaction index.
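To make the idea concrete, here is a minimal Python sketch of the order-2 Shapley-Taylor interaction index, computed by brute force on a small, made-up set function F over n players (variables). Following the index's definition, the main effect of a variable i is F({i}) - F(∅), and the interaction of a pair {i, j} is 2/n times a sum, over subsets T of the remaining players, of the second-order difference F(T ∪ {i, j}) - F(T ∪ {i}) - F(T ∪ {j}) + F(T), each term weighted by 1/C(n-1, |T|). The function names and the toy value function are hypothetical; this illustrates the definition only, not how APL evaluates it on a gradient boosting model or aggregates it across observations.

```python
from itertools import combinations
from math import comb
from typing import Callable, Dict, FrozenSet, Iterable, Iterator, Tuple

SetFunction = Callable[[FrozenSet[int]], float]


def subsets(items: Iterable[int]) -> Iterator[FrozenSet[int]]:
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def shapley_taylor_order2(f: SetFunction, n: int) -> Tuple[Dict[int, float], Dict[Tuple[int, int], float]]:
    """Brute-force order-2 Shapley-Taylor index for players 0..n-1 (exponential in n)."""
    players = range(n)
    empty: FrozenSet[int] = frozenset()
    # Main effects: I_i = F({i}) - F(empty set)
    main = {i: f(frozenset({i})) - f(empty) for i in players}
    pairs = {}
    for i, j in combinations(players, 2):
        others = [p for p in players if p not in (i, j)]
        total = 0.0
        for t in subsets(others):
            # Second-order discrete derivative of F with respect to {i, j}, evaluated at T
            delta = f(t | {i, j}) - f(t | {i}) - f(t | {j}) + f(t)
            total += delta / comb(n - 1, len(t))
        pairs[(i, j)] = 2.0 / n * total
    return main, pairs


# Toy value function on 3 players with made-up coefficients:
# additive terms, a pairwise term on {0, 1} and a three-way term on {0, 1, 2}
def toy_value(s: FrozenSet[int]) -> float:
    value = sum({0: 1.0, 1: 2.0, 2: 0.5}[i] for i in s)
    if {0, 1} <= s:
        value += 3.0
    if {0, 1, 2} <= s:
        value += 0.9
    return value


main, pairs = shapley_taylor_order2(toy_value, 3)
print(main)
print(pairs)
```

In this toy game the main effects equal the additive coefficients (1.0, 2.0 and 0.5), the pair (0, 1) receives its own term of 3.0 plus one third of the three-way term (3.3 in total), the two other pairs each receive one third of the three-way term (0.3), and all values add up to F(N) - F(∅) = 7.4.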

 

Sara Sampaio

Author Since: March 10, 2022
