This post shows basic model management with the Python package hana_ml. With the class ModelStorage, you can save and load models. I also show State Enabled Real-Time Scoring Functions, which make prediction faster.
Environment
The environment is as follows.
- Python: 3.7.13 (Google Colaboratory)
- HANA: Cloud Edition 2022.16
Python packages and their versions:
- hana_ml: 2.13.22072200
- pandas: 1.3.5
- scikit-learn: 1.0.2
As for HANA Cloud, I activated the script server and created my users. I am not aware of any other special configuration, but I may be missing something, since our HANA Cloud instance was created a long time ago.
I didn’t use HDI here, to keep the environment simple.
Python Script
Pre-requisites
Please see my other article “Python hana_ml: PAL Classification Training(UnifiedClassification)” for the training process. The code from step 1 “Install Python packages” to step 8 “Training” is exactly the same. Steps 9, 10 and 11 of that article are not needed here.
9. Import modules
Import the additional Python modules needed for the following steps.
import pprint
from hana_ml.model_storage import ModelStorage
10. Save model
Save the model with the class “ModelStorage” and its method “save_model”.
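# Bind the model storage to the open HANA connection
ms = ModelStorage(conn)
# The model is saved and later loaded under this name
uc_rdt.name = 'Random Forest'
ms.save_model(model=uc_rdt, if_exists='replace')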
Model metadata is stored in the table “HANAML_MODEL_STORAGE”, so the two calls below return the same result.
display(ms.list_models())
display(conn.table('HANAML_MODEL_STORAGE').collect())
Let’s look at the contents in more detail.
pprint.pprint(ms.list_models().to_dict())
While the model metadata is stored in the table “HANAML_MODEL_STORAGE”, the model contents and other data are saved in separate tables, which are listed under “JSON -> artifacts”. Which tables are used depends on the algorithm. The help documentation says:
The back-end model. It consists in the model returned by SAP HANA APL or SAP HANA PAL. For SAP HANA APL, it is always saved into the table HANAMl_APL_MODELS_DEFAULT, while for SAP HANA PAL, a model can be saved into different tables depending on the nature of the specified algorithm.
{'CLASS': {0: 'hana_ml.algorithms.pal.unified_classification.UnifiedClassification'},
'JSON': {0: '{"model_attributes": {"func": "RandomDecisionTree", '
'"multi_class": null, "massive": false, "group_params": null, '
'"kwargs": {"n_estimators": 10, "max_depth": 10}}, "fit_params": '
'{"key": "ID", "features": null, "label": null, "group_key": '
'null, "group_params": null, "purpose": null, "partition_method": '
'"stratified", "stratified_column": "CLASS", '
'"partition_random_state": null, "training_percent": 0.8, '
'"training_size": null, "ntiles": 2, "categorical_variable": '
'null, "output_partition_result": null, "background_size": null, '
'"background_random_state": null, "build_report": true, "impute": '
'false, "strategy": null, "strategy_by_col": null, "als_factors": '
'null, "als_lambda": null, "als_maxit": null, "als_randomstate": '
'null, "als_exit_threshold": null, "als_exit_interval": null, '
'"als_linsolver": null, "als_cg_maxit": null, "als_centering": '
'null, "als_scaling": null, "kwargs": {}}, "artifacts": '
'{"schema": "I348221", "model_tables": '
'["HANAML_RANDOM_FOREST_2_MODELS_0", '
'"HANAML_RANDOM_FOREST_2_MODELS_1", '
'"HANAML_RANDOM_FOREST_2_MODELS_2", '
'"HANAML_RANDOM_FOREST_2_MODELS_3", '
'"HANAML_RANDOM_FOREST_2_MODELS_4", '
'"HANAML_RANDOM_FOREST_2_MODELS_5"], "library": "PAL"}, '
'"pal_meta": {"_fit_param": [["FUNCTION", "RDT", "string"], '
'["KEY", 1, "integer"], ["N_ESTIMATORS", 10, "integer"], '
'["MAX_DEPTH", 10, "integer"], ["PARTITION_METHOD", 2, '
'"integer"], ["PARTITION_STRATIFIED_VARIABLE", "CLASS", '
'"string"], ["PARTITION_TRAINING_PERCENT", 0.8, "float"], '
'["NTILES", 2, "integer"], ["HANDLE_MISSING_VALUE", 0, '
'"integer"], ["CATEGORICAL_VARIABLE", "CLASS", "string"]], '
'"fit_data_struct": {"ID": "INT", "X1": "DOUBLE", "X2": "DOUBLE", '
'"X3": "DOUBLE", "CLASS": "INT"}, "label": "CLASS"}}'},
'LIBRARY': {0: 'PAL'},
'MODEL_STORAGE_VER': {0: 1},
'NAME': {0: 'Random Forest'},
'SCHEDULE': {0: '{"schedule": {"status": "inactive", "schedule_time": "every '
'1 hours", "pid": null, "client": null, "connection": '
'{"userkey": "your_userkey", "encrypt": "false", '
'"sslValidateCertificate": "true"}, "hana_ml_obj": '
'"hana_ml.algorithms.pal.xxx", "init_params": {}, '
'"fit_params": {}, "training_dataset_select_statement": '
'"SELECT * FROM YOUR_TABLE"}}'},
'STORAGE_TYPE': {0: 'default'},
'TIMESTAMP': {0: Timestamp('2022-09-07 06:54:10')},
'VERSION': {0: 2}}
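To make the “JSON -> artifacts” part easier to read, here is a minimal sketch that pulls the artifact table names out of the metadata. It only uses the standard json module. The string sample_json is a trimmed copy of the 'JSON' value printed above (only two of the six model tables are kept), and the helper name artifact_tables is my own, not part of hana_ml.

```python
import json

# Trimmed copy of the 'JSON' value from the output above.
# Only the keys needed for this sketch are kept.
sample_json = (
    '{"model_attributes": {"func": "RandomDecisionTree", '
    '"kwargs": {"n_estimators": 10, "max_depth": 10}}, '
    '"artifacts": {"schema": "I348221", "model_tables": '
    '["HANAML_RANDOM_FOREST_2_MODELS_0", "HANAML_RANDOM_FOREST_2_MODELS_1"], '
    '"library": "PAL"}}'
)


def artifact_tables(json_text):
    """Return schema-qualified artifact table names (hypothetical helper)."""
    meta = json.loads(json_text)
    artifacts = meta["artifacts"]
    return [f'{artifacts["schema"]}.{name}' for name in artifacts["model_tables"]]


tables = artifact_tables(sample_json)
for table in tables:
    print(table)
```

With a live connection, the same helper could probably be fed the 'JSON' cell of the DataFrame returned by ms.list_models(), for example ms.list_models()['JSON'].iloc[0]. I have only run it against the sample string here.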
11. Load model
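Now load the model with the method “load_model”. The call to “create_model_state” is for State Enabled Real-Time Scoring Functions: it creates a model state on the HANA side, so that repeated predictions can reuse the already parsed model.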
saved_model = ms.load_model(name='Random Forest')
saved_model.create_model_state()
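Before loading, it can be useful to check which versions are stored under a name. The sketch below works on the dictionary printed in step 10 (the result of ms.list_models().to_dict()), reduced to the 'NAME' and 'VERSION' columns. The helper stored_versions is hypothetical and not a hana_ml function.

```python
# 'NAME' and 'VERSION' columns copied from the dictionary printed in step 10.
models_dict = {
    "NAME": {0: "Random Forest"},
    "VERSION": {0: 2},
}


def stored_versions(models, name):
    """Return the sorted versions stored under a model name (hypothetical helper)."""
    names = models["NAME"]
    versions = models["VERSION"]
    # Both columns are keyed by the same row index.
    return sorted(versions[row] for row, stored in names.items() if stored == name)


print(stored_versions(models_dict, "Random Forest"))
print(stored_versions(models_dict, "No Such Model"))
```

An empty list means that nothing is stored under that name, which is a cheap check to run before calling load_model.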
12. Predict with loaded model
Call the “predict” method to run the prediction.
df_pred = saved_model.predict(test, key='ID')
print(df_pred.collect())
ID SCORE CONFIDENCE
0 9 0 1.0
1 13 1 1.0
2 14 0 1.0
3 16 1 0.8
4 20 0 1.0
... ... ... ...
1995 9988 1 1.0
1996 9990 0 1.0
1997 9996 1 1.0
1998 9998 0 0.8
1999 9999 1 1.0
REASON_CODE
0 [{"attr":"X2","pct":81.0,"val":-0.350732473499...
1 [{"attr":"X2","pct":89.0,"val":-0.546387864002...
2 [{"attr":"X2","pct":82.0,"val":-0.367046185280...
3 [{"attr":"X2","pct":76.0,"val":-0.221394522848...
4 [{"attr":"X2","pct":88.0,"val":-0.470017154574...
... ...
1995 [{"attr":"X2","pct":90.0,"val":-0.490175736690...
1996 [{"attr":"X2","pct":71.0,"val":-0.333635163456...
1997 [{"attr":"X2","pct":94.0,"val":-0.510854084253...
1998 [{"attr":"X2","pct":48.0,"val":-0.140319048941...
1999 [{"attr":"X2","pct":97.0,"val":-0.498180631259...
[2000 rows x 4 columns]
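Each REASON_CODE value is a JSON array whose entries carry the keys “attr”, “pct” and “val”, as the output above shows. The following sketch picks the entry with the largest “pct” from one such value. Because the printed values are truncated, the numbers in sample_reason are made up for illustration, and the helper top_reason is my own.

```python
import json


def top_reason(reason_code_text):
    """Return (attr, pct) of the entry with the largest pct, or None if empty."""
    reasons = json.loads(reason_code_text)
    if not reasons:
        return None
    best = max(reasons, key=lambda reason: reason["pct"])
    return best["attr"], best["pct"]


# Illustrative value only: same keys as in the output above, invented numbers.
sample_reason = (
    '[{"attr":"X2","pct":81.0,"val":-0.35},'
    '{"attr":"X1","pct":12.0,"val":-0.05}]'
)

print(top_reason(sample_reason))
```

After df_pred.collect(), the same function could be applied to the REASON_CODE column of the resulting pandas DataFrame, for example with .apply(top_reason).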
13. Delete model state and close connection
Delete the model state and close the HANA connection. If you are only testing and no longer need any of the saved models, the “clean_up” function deletes all of them.
# Remove the model state created in step 11
saved_model.delete_model_state()
# Uncomment to delete all saved models
#ms.clean_up()
conn.close()