Anomaly detection underpins the integrity and security of a wide range of modern, data-rich systems, from fraud detection and network security to fault detection in machinery and health monitoring. In this post, we will see how to implement anomaly detection with TensorFlow.
What is Anomaly Detection?
Anomaly detection is the process of identifying data points that deviate considerably from the norm. These deviations can signal critical incidents such as fraud, network intrusions, or system failures. Anomaly detection models learn the normal behavior of a system and flag any data point that does not conform to it.
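To make the idea concrete before introducing TensorFlow, here is a minimal statistical sketch: with mostly normal readings, points that fall far from the mean can be flagged as anomalies. The data and the 3-standard-deviation cutoff are illustrative assumptions, not part of the TensorFlow workflow below.

import numpy as np

# Mostly "normal" readings with a few obvious outliers appended (illustrative data)
values = np.concatenate([np.random.normal(loc=50, scale=5, size=500),
                         [120.0, -30.0, 95.0]])

# Flag points that lie more than 3 standard deviations from the mean
z_scores = (values - values.mean()) / values.std()
anomalous = np.abs(z_scores) > 3

print(f"Flagged {anomalous.sum()} of {values.size} points as anomalous")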
Why TensorFlow?
TensorFlow is an open-source machine learning framework developed at Google. It provides a rich set of tools for building and deploying machine learning models, and its flexibility and scalability make it well suited to anomaly detection. It also lets developers tailor a model to their particular use case rather than relying only on off-the-shelf methods.
Steps to Implement Anomaly Detection with TensorFlow
1. Data Collection and Preprocessing
Every machine learning project starts with data collection. For anomaly detection, you want a dataset consisting of normal and, ideally, some anomalous data points. Preprocessing then ensures the data is in a suitable format for training, for example by standardizing the features.
import numpy as np
from sklearn.preprocessing import StandardScaler

# Example data: 1000 samples, 20 features
data = np.random.rand(1000, 20)

# Standardize the data to zero mean and unit variance
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
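The random data above has no labeled anomalies. If you want something for the detector to find, one option is to inject a few synthetic outliers. The names below (n_outliers, data_with_anomalies) are illustrative assumptions, and the rest of the walkthrough continues to use data_scaled.

# Illustrative only: append a few synthetic outliers far outside the [0, 1) range
# of the original data. The following steps continue to use data_scaled.
n_outliers = 10
outliers = 5 + 5 * np.random.rand(n_outliers, 20)

data_with_anomalies = np.vstack([data, outliers])
data_with_anomalies_scaled = scaler.transform(data_with_anomalies)  # reuse the fitted scaler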
2. Building the Model
Autoencoders are a popular choice for anomaly detection. At its core, an autoencoder is a neural network that learns to compress data into a lower-dimensional representation and then reconstruct it. Anomalies are detected based on the reconstruction error: inputs that differ from the data seen during training are reconstructed poorly.
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

input_dim = data_scaled.shape[1]
encoding_dim = 14  # Dimension of the encoded representation

input_layer = Input(shape=(input_dim,))
encoder = Dense(encoding_dim, activation="relu")(input_layer)
decoder = Dense(input_dim, activation="sigmoid")(encoder)

autoencoder = Model(inputs=input_layer, outputs=decoder)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.summary()
Output:
Model: "model" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_1 (InputLayer) [(None, 20)] 0 dense (Dense) (None, 14) 294 dense_1 (Dense) (None, 20) 300 ================================================================= Total params: 594 (2.32 KB) Trainable params: 594 (2.32 KB) Non-trainable params: 0 (0.00 Byte) _________________________________________________________________
3. Training the Model
Train the autoencoder on the normal data. The model learns to reconstruct this data with minimal error.
history = autoencoder.fit(
    data_scaled, data_scaled,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    shuffle=True
)
Output:
Epoch 1/50
25/25 [==============================] - 1s 17ms/step - loss: 1.2188 - val_loss: 1.1869
Epoch 2/50
25/25 [==============================] - 0s 7ms/step - loss: 1.1768 - val_loss: 1.1473
Epoch 3/50
25/25 [==============================] - 0s 6ms/step - loss: 1.1375 - val_loss: 1.1118
Epoch 4/50
25/25 [==============================] - 0s 5ms/step - loss: 1.1021 - val_loss: 1.0792
Epoch 5/50
25/25 [==============================] - 0s 5ms/step - loss: 1.0702 - val_loss: 1.0501
...
Epoch 48/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7358 - val_loss: 0.7370
Epoch 49/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7333 - val_loss: 0.7347
Epoch 50/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7308 - val_loss: 0.7326
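Before using the model, it is worth checking that training and validation loss decrease together. Here is a quick sketch using matplotlib (assumed to be installed; not part of the original example):

import matplotlib.pyplot as plt

# Plot the loss curves recorded in the History object returned by fit()
plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.legend()
plt.show()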
4. Anomaly Detection
After training, the model's reconstructions can be compared with the original data to identify anomalies. Setting a threshold on the reconstruction error then classifies each data point as normal or anomalous.
# Reconstruct the inputs and compute the per-sample reconstruction error
reconstructions = autoencoder.predict(data_scaled)
reconstruction_error = np.mean(np.square(data_scaled - reconstructions), axis=1)

# Flag samples whose error exceeds the mean by more than 3 standard deviations
threshold = np.mean(reconstruction_error) + 3 * np.std(reconstruction_error)
anomalies = reconstruction_error > threshold

print(f"Number of anomalies detected: {np.sum(anomalies)}")
Output:
32/32 [==============================] - 0s 1ms/step
Number of anomalies detected: 3
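Beyond the count, it is often useful to see which samples were flagged and by how much they exceed the threshold. A small sketch using the variables defined above (anomaly_indices is a new, illustrative name):

# List the flagged samples and their reconstruction errors
anomaly_indices = np.where(anomalies)[0]  # illustrative helper, not in the original code
for i in anomaly_indices:
    print(f"Sample {i}: reconstruction error {reconstruction_error[i]:.4f} "
          f"(threshold {threshold:.4f})")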
Conclusion
Anomaly detection is a critical technique for protecting datasets and systems, particularly in high-stakes use cases. With TensorFlow, you can build and deploy anomaly detection models that learn the normal behavior of your system and flag anything that deviates from it. Whether in finance, healthcare, or cybersecurity, anomaly detection with TensorFlow helps keep your data and systems safe and their integrity intact.