In this blog post, I leverage the Iris flower data set provided by scikit-learn. It contains three classes of fifty instances each, where each class refers to a type of iris plant:
from sklearn.datasets import load_iris
df = load_iris()
df.target_names
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
One class (setosa) is linearly separable from the other two, but the latter are not linearly separable from each other:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
x = df.data
y_true = df.target
ax = Axes3D(plt.figure())
ax.scatter(x[:, 2], x[:, 1], x[:, 0], c=y_true)
ax.set_xlabel(df.feature_names[2])
ax.set_ylabel(df.feature_names[1])
ax.set_zlabel(df.feature_names[0])
To start with, I prepare my data set for TensorFlow:
import tensorflow as tf
dataset = tf.data.Dataset.from_tensor_slices((x, y_true))
dataset = dataset.batch(1)
for feat, targ in dataset.take(5):
    print('Features: {}, Target: {}'.format(feat, targ))
print(x.shape)
print(y_true.shape)
Features: [[5.1 3.5 1.4 0.2]], Target: [0]
Features: [[4.9 3. 1.4 0.2]], Target: [0]
Features: [[4.7 3.2 1.3 0.2]], Target: [0]
Features: [[4.6 3.1 1.5 0.2]], Target: [0]
Features: [[5. 3.6 1.4 0.2]], Target: [0]
(150, 4)
(150,)
Then I build, compile and train my model with ReLU activation to add non-linearity:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation=tf.nn.relu, input_shape=(4,)),
    tf.keras.layers.Dense(8, activation=tf.nn.relu),
    tf.keras.layers.Dense(3)
])
model.compile(tf.keras.optimizers.SGD(), tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.summary()
history = model.fit(dataset, epochs=64)
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_9 (Dense) (None, 8) 40
_________________________________________________________________
dense_10 (Dense) (None, 8) 72
_________________________________________________________________
dense_11 (Dense) (None, 3) 27
=================================================================
Total params: 139
Trainable params: 139
Non-trainable params: 0
_________________________________________________________________
Epoch 1/64
150/150 [==============================] - 0s 2ms/step - loss: 0.1592
Epoch 2/64
150/150 [==============================] - 0s 3ms/step - loss: 0.1407
...
Epoch 63/64
150/150 [==============================] - 0s 2ms/step - loss: 0.0209
Epoch 64/64
150/150 [==============================] - 0s 2ms/step - loss: 0.0436
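To get a feeling for convergence beyond the raw training log, the recorded loss history could be plotted. This is a minimal sketch using the history object returned by model.fit:

import matplotlib.pyplot as plt
# plot the training loss per epoch recorded by model.fit
plt.figure()
plt.plot(history.history['loss'])
plt.xlabel('epoch')
plt.ylabel('loss')
plt.title('Training loss')
plt.show()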
With that I check its accuracy. I expect fifty zeros followed by fifty ones and fifty twos, and the result is not too bad:
tf.argmax(model(x), axis=1)
<tf.Tensor: shape=(150,), dtype=int64, numpy=
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1,
2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1,
1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])>
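As a quick sanity check, the predicted classes can be compared against the true labels to compute an overall accuracy. A minimal sketch, the exact number will vary from run to run:

import numpy as np
# compare predicted class indices against the true labels
y_pred = tf.argmax(model(x), axis=1).numpy()
print(np.mean(y_pred == y_true))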
Let’s try a prediction in the critical area between versicolor and virginica then:
df.target_names[tf.argmax(model.predict([[6, 3, 4, 1]])[0], axis=0).numpy()]
'versicolor'
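Since the last layer returns logits (hence from_logits=True in the loss), a softmax can be applied to see how confident the model is for this sample. A small sketch, the exact probabilities depend on the trained weights:

# turn the logits into class probabilities for the same sample
logits = model.predict([[6, 3, 4, 1]])
probabilities = tf.nn.softmax(logits[0])
print(dict(zip(df.target_names, probabilities.numpy())))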
To leverage these results, in the SAP Data Intelligence ML Scenario Manager I add a Python Producer to create, compile, train and store my model. Since I do not need an input file but a Docker image with the scikit-learn and TensorFlow Python libraries, I replace the default Python 3 operator with a custom operator that has only the required output Ports and the Tag for my Docker image:
Here is the full code for my custom operator:
from sklearn.datasets import load_iris
import tensorflow as tf
import json
import h5py
# Example Python script to perform training on input data & generate Metrics & Model Blob
def gen():
    # to send metrics to the Submit Metrics operator, create a Python dictionary of key-value pairs
    df = load_iris()
    x = df.data
    y_true = df.target
    dataset = tf.data.Dataset.from_tensor_slices((x, y_true))
    dataset = dataset.batch(1)
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation=tf.nn.relu, input_shape=(4,)),
        tf.keras.layers.Dense(8, activation=tf.nn.relu),
        tf.keras.layers.Dense(3)
    ])
    model.compile(tf.keras.optimizers.SGD(), tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    history = model.fit(dataset, epochs=64)
    # metrics_dict = {"kpi1": "1"}
    metrics_dict = json.dumps({'loss': str(history.history['loss'][-1])})
    # send the metrics to the output port - Submit Metrics operator will use this to persist the metrics
    api.send("metrics", api.Message(metrics_dict))
    # create & send the model blob to the output port - Artifact Producer operator will use this to persist the model and create an artifact ID
    f = h5py.File('blob', 'w', driver='core', backing_store=False)
    model.save(f)
    f.flush()
    # model_blob = bytes("example", 'utf-8')
    model_blob = f.id.get_file_image()
    api.send("modelBlob", model_blob)

api.add_generator(gen)
That requires this Docker image:
FROM $com.sap.sles.ml.python
RUN python3.6 -m pip --no-cache-dir install --user --upgrade pip
RUN python3.6 -m pip --no-cache-dir install --user scikit-learn tensorflow
Executing my modified Python Producer provides me with my Metrics, Models and Datasets:
To consume these, in the SAP Data Intelligence ML Scenario Manager, I add a Python Consumer and add the following code:
# imports required on top of the Python Consumer template code
import io
import json
import h5py
import tensorflow as tf
from sklearn.datasets import load_iris
# apply your model
blob = io.BytesIO(model)
f = h5py.File(blob, 'r')
architectSAP = tf.keras.models.load_model(f)
blob.close()
# obtain your results
prediction = architectSAP.predict([json.loads(user_data)['iris']])
success = True
# apply carried out successfully, send a response to the user
# msg.body = json.dumps({'Results': 'Model applied to input data successfully.'})
df = load_iris()
msg.body = json.dumps({'iris': df.target_names[tf.argmax(prediction[0], axis=0).numpy()]})
Since this also requires the scikit-learn and TensorFlow Python libraries, I add a Group with the Tag for my Docker image:
Deploying this provides me with a URL that I can use to retrieve my prediction, e.g. with Postman:
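For illustration, a request against the deployed endpoint could look roughly like the sketch below. The URL, path and credentials are placeholders that depend on your SAP Data Intelligence tenant and the configuration of the OpenAPI operator in the Python Consumer template; only the payload key 'iris' is fixed by the consumer code above:

import requests
# hypothetical values - replace with the deployment URL and credentials from ML Scenario Manager
url = '<deployment URL>'
payload = {'iris': [6, 3, 4, 1]}
response = requests.post(url, json=payload, auth=('<tenant>\\<user>', '<password>'))
print(response.json())  # expected to return something like {"iris": "versicolor"}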
I hope this example is complex enough to make sense for a machine learning scenario but still simple enough for you to now understand how to build an SAP Data Intelligence ML Scenario with TensorFlow.