In this blog, I show basic model management with the Python package hana_ml. The class ModelStorage lets you save and load models. I also show State Enabled Real-Time Scoring Functions, which make the prediction process faster.

Environment

The environment is as follows.

  • Python: 3.7.13 (Google Colaboratory)
  • HANA: Cloud Edition 2022.16

Python packages and their versions.

  • hana_ml: 2.13.22072200
  • pandas: 1.3.5
  • scikit-learn: 1.0.2

For HANA Cloud, I activated the scriptserver and created my users. I am not aware of any other special configuration, but I may have missed something because our HANA Cloud instance was created a long time ago.

I didn’t use HDI here, to keep the environment simple.

Python Script

Pre-requisites

Please see my other article “Python hana_ml: PAL Classification Training(UnifiedClassification)” for the training process. Steps 1 (“Install Python packages”) through 8 (“Training”) use exactly the same code. Steps 9, 10 and 11 of that article are not needed here.

9. Import modules

Import the additional Python modules.

import pprint

from hana_ml.model_storage import ModelStorage

10. Save model

Save the model with the class “ModelStorage” and its method “save_model”.

ms = ModelStorage(conn)

uc_rdt.name = 'Random Forest'
ms.save_model(model=uc_rdt, if_exists='replace')

Model metadata is stored in the table “HANAML_MODEL_STORAGE”, so the two statements below return the same result.

display(ms.list_models())
display(conn.table('HANAML_MODEL_STORAGE').collect())

Let’s look at the contents in more detail.

pprint.pprint(ms.list_models().to_dict())

While the model metadata is stored in the table “HANAML_MODEL_STORAGE”, the model contents and other data are saved in the tables listed under “JSON -> artifacts”, and which tables are used depends on the algorithm. The help documentation says:

The back-end model. It consists in the model returned by SAP HANA APL or SAP HANA PAL. For SAP HANA APL, it is always saved into the table HANAMl_APL_MODELS_DEFAULT, while for SAP HANA PAL, a model can be saved into different tables depending on the nature of the specified algorithm.

{'CLASS': {0: 'hana_ml.algorithms.pal.unified_classification.UnifiedClassification'},
 'JSON': {0: '{"model_attributes": {"func": "RandomDecisionTree", '
             '"multi_class": null, "massive": false, "group_params": null, '
             '"kwargs": {"n_estimators": 10, "max_depth": 10}}, "fit_params": '
             '{"key": "ID", "features": null, "label": null, "group_key": '
             'null, "group_params": null, "purpose": null, "partition_method": '
             '"stratified", "stratified_column": "CLASS", '
             '"partition_random_state": null, "training_percent": 0.8, '
             '"training_size": null, "ntiles": 2, "categorical_variable": '
             'null, "output_partition_result": null, "background_size": null, '
             '"background_random_state": null, "build_report": true, "impute": '
             'false, "strategy": null, "strategy_by_col": null, "als_factors": '
             'null, "als_lambda": null, "als_maxit": null, "als_randomstate": '
             'null, "als_exit_threshold": null, "als_exit_interval": null, '
             '"als_linsolver": null, "als_cg_maxit": null, "als_centering": '
             'null, "als_scaling": null, "kwargs": {}}, "artifacts": '
             '{"schema": "I348221", "model_tables": '
             '["HANAML_RANDOM_FOREST_2_MODELS_0", '
             '"HANAML_RANDOM_FOREST_2_MODELS_1", '
             '"HANAML_RANDOM_FOREST_2_MODELS_2", '
             '"HANAML_RANDOM_FOREST_2_MODELS_3", '
             '"HANAML_RANDOM_FOREST_2_MODELS_4", '
             '"HANAML_RANDOM_FOREST_2_MODELS_5"], "library": "PAL"}, '
             '"pal_meta": {"_fit_param": [["FUNCTION", "RDT", "string"], '
             '["KEY", 1, "integer"], ["N_ESTIMATORS", 10, "integer"], '
             '["MAX_DEPTH", 10, "integer"], ["PARTITION_METHOD", 2, '
             '"integer"], ["PARTITION_STRATIFIED_VARIABLE", "CLASS", '
             '"string"], ["PARTITION_TRAINING_PERCENT", 0.8, "float"], '
             '["NTILES", 2, "integer"], ["HANDLE_MISSING_VALUE", 0, '
             '"integer"], ["CATEGORICAL_VARIABLE", "CLASS", "string"]], '
             '"fit_data_struct": {"ID": "INT", "X1": "DOUBLE", "X2": "DOUBLE", '
             '"X3": "DOUBLE", "CLASS": "INT"}, "label": "CLASS"}}'},
 'LIBRARY': {0: 'PAL'},
 'MODEL_STORAGE_VER': {0: 1},
 'NAME': {0: 'Random Forest'},
 'SCHEDULE': {0: '{"schedule": {"status": "inactive", "schedule_time": "every '
                 '1 hours", "pid": null, "client": null, "connection": '
                 '{"userkey": "your_userkey", "encrypt": "false", '
                 '"sslValidateCertificate": "true"}, "hana_ml_obj": '
                 '"hana_ml.algorithms.pal.xxx", "init_params": {}, '
                 '"fit_params": {}, "training_dataset_select_statement": '
                 '"SELECT * FROM YOUR_TABLE"}}'},
 'STORAGE_TYPE': {0: 'default'},
 'TIMESTAMP': {0: Timestamp('2022-09-07 06:54:10')},
 'VERSION': {0: 2}}
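
The JSON column is an ordinary JSON string, so the standard json module is enough to pull out where the model contents actually live. This is a minimal sketch, assuming the string has the structure printed above; the helper name model_artifacts is hypothetical, and the sample string is the output above trimmed to the keys the helper reads.

```python
import json

# Trimmed copy of the JSON column printed above (only the keys used below).
saved_json = (
    '{"model_attributes": {"func": "RandomDecisionTree"}, '
    '"artifacts": {"schema": "I348221", "model_tables": '
    '["HANAML_RANDOM_FOREST_2_MODELS_0", "HANAML_RANDOM_FOREST_2_MODELS_1", '
    '"HANAML_RANDOM_FOREST_2_MODELS_2", "HANAML_RANDOM_FOREST_2_MODELS_3", '
    '"HANAML_RANDOM_FOREST_2_MODELS_4", "HANAML_RANDOM_FOREST_2_MODELS_5"], '
    '"library": "PAL"}, '
    '"pal_meta": {"label": "CLASS"}}'
)


def model_artifacts(json_text):
    """Hypothetical helper: return (schema, library, quoted model table names)."""
    meta = json.loads(json_text)
    artifacts = meta["artifacts"]
    schema = artifacts["schema"]
    # Quote schema and table names so they can be pasted into SQL as-is.
    tables = [f'"{schema}"."{name}"' for name in artifacts["model_tables"]]
    return schema, artifacts["library"], tables


schema, library, tables = model_artifacts(saved_json)
print(library, len(tables))  # PAL 6
print(tables[0])             # "I348221"."HANAML_RANDOM_FOREST_2_MODELS_0"
```

With the connection still open, the same string could be taken from ms.list_models()['JSON'][0], since list_models returns a pandas DataFrame.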

 

11. Load model

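Now load the model with the method “load_model”. The call to create_model_state creates a model state for State Enabled Real-Time Scoring Functions: the trained model is kept in memory on the HANA side, so subsequent predictions do not have to reload and parse the model tables each time.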

saved_model = ms.load_model(name='Random Forest')
saved_model.create_model_state()

12. Predict with loaded model

Call the “predict” method to make predictions.

df_pred = saved_model.predict(test, key='ID')
print(df_pred.collect())
        ID SCORE  CONFIDENCE  
0        9     0         1.0   
1       13     1         1.0   
2       14     0         1.0   
3       16     1         0.8   
4       20     0         1.0   
...    ...   ...         ...   
1995  9988     1         1.0   
1996  9990     0         1.0   
1997  9996     1         1.0   
1998  9998     0         0.8   
1999  9999     1         1.0   

                                            REASON_CODE  
0     [{"attr":"X2","pct":81.0,"val":-0.350732473499...  
1     [{"attr":"X2","pct":89.0,"val":-0.546387864002...  
2     [{"attr":"X2","pct":82.0,"val":-0.367046185280...  
3     [{"attr":"X2","pct":76.0,"val":-0.221394522848...  
4     [{"attr":"X2","pct":88.0,"val":-0.470017154574...  
...                                                 ...  
1995  [{"attr":"X2","pct":90.0,"val":-0.490175736690...  
1996  [{"attr":"X2","pct":71.0,"val":-0.333635163456...  
1997  [{"attr":"X2","pct":94.0,"val":-0.510854084253...  
1998  [{"attr":"X2","pct":48.0,"val":-0.140319048941...  
1999  [{"attr":"X2","pct":97.0,"val":-0.498180631259...  

[2000 rows x 4 columns]
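
Each REASON_CODE value appears to be a JSON array with one object per feature (attr, pct, val), where pct looks like the feature's percentage share of the explanation. The printout is truncated, so the sample value below is hypothetical and its numbers are illustrative only; the helper top_reasons is likewise a hypothetical name. The sketch ranks the entries by pct to find the features that drove a prediction.

```python
import json

# Hypothetical REASON_CODE value: the printout above is truncated, so these
# numbers are illustrative; the attr/pct/val structure follows the visible prefix.
reason_code = (
    '[{"attr":"X2","pct":81.0,"val":-0.35},'
    '{"attr":"X1","pct":12.0,"val":-0.05},'
    '{"attr":"X3","pct":7.0,"val":0.03}]'
)


def top_reasons(reason_json, n=1):
    """Return the n (attribute, pct) pairs with the largest pct values."""
    entries = json.loads(reason_json)
    ranked = sorted(entries, key=lambda e: e["pct"], reverse=True)
    return [(e["attr"], e["pct"]) for e in ranked[:n]]


print(top_reasons(reason_code))       # [('X2', 81.0)]
print(top_reasons(reason_code, n=2))  # [('X2', 81.0), ('X1', 12.0)]
```

On real output, the helper could be applied row by row with pandas, for example df_pred.collect()['REASON_CODE'].apply(top_reasons).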

13. Delete model state and close connection

Delete the model state and close the HANA connection. If you are just testing and no longer need any of the saved models, the clean_up method deletes all of them (it is commented out below).

saved_model.delete_model_state()
#ms.clean_up()
conn.close()
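
Because the model state lives on the server until delete_model_state is called, it can be worth guaranteeing cleanup even when a prediction fails. A minimal sketch, assuming only that the loaded model exposes create_model_state() and delete_model_state() as in this article; the context manager model_state and the FakeModel class are hypothetical, and FakeModel exists only so the pattern can be exercised without a HANA connection.

```python
from contextlib import contextmanager


@contextmanager
def model_state(model):
    """Hypothetical helper: create a model state on entry, always delete it on exit."""
    model.create_model_state()
    try:
        yield model
    finally:
        # Runs even if the body raises, so no state is left behind.
        model.delete_model_state()


class FakeModel:
    """Offline stand-in for a loaded hana_ml model (illustration only)."""

    def __init__(self):
        self.calls = []

    def create_model_state(self):
        self.calls.append("create")

    def delete_model_state(self):
        self.calls.append("delete")

    def predict(self, data):
        self.calls.append("predict")
        raise RuntimeError("simulated scoring failure")


fake = FakeModel()
try:
    with model_state(fake) as m:
        m.predict(None)
except RuntimeError:
    pass
print(fake.calls)  # ['create', 'predict', 'delete']
```

With the real objects, usage would look like `with model_state(saved_model) as m: print(m.predict(test, key='ID').collect())`, followed by conn.close() once all scoring is done.
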
Sara Sampaio

Author Since: March 10, 2022