Anomaly Detection with TensorFlow

Anomaly detection underpins the integrity and security of a wide range of modern, data-rich systems, from fraud detection and network security to fault detection in machinery and health monitoring. In this post, we will see how to implement anomaly detection with TensorFlow.

What is Anomaly Detection?

Anomaly detection is the process of identifying data points that deviate considerably from the norm. Such deviations can signal critical incidents such as fraud, network break-ins, or system failures. Anomaly detection models learn the normal behavior of a system and flag any data point that does not conform to it.
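
As a purely illustrative sketch (the values and the 2-sigma cutoff below are hypothetical, not part of any TensorFlow workflow), the idea boils down to flagging points that fall far outside the range learned from the data:

import numpy as np

# Toy example: learn a "normal" range from the data, flag points far outside it.
values = np.array([10.1, 9.8, 10.3, 9.9, 10.0, 25.0, 10.2])

mean, std = values.mean(), values.std()
is_anomaly = np.abs(values - mean) > 2 * std  # simple 2-sigma cutoff
print(values[is_anomaly])  # the 25.0 reading stands out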

Why TensorFlow?

TensorFlow is an open-source machine learning framework developed at Google. It provides a rich set of tools for building and deploying machine learning models, and its flexibility and scalability make it well suited to anomaly detection. It also lets developers build custom models tailored to their particular use case, in addition to the methods that are already available.

Steps to Implement Anomaly Detection with TensorFlow

1. Data Collection and Preprocessing

Any machine learning project starts with data collection. For anomaly detection, you want a dataset consisting of normal and, ideally, some anomalous data points. Preprocessing then ensures the data is in a suitable format for training; here we standardize the features.

import numpy as np
from sklearn.preprocessing import StandardScaler

# Example data
data = np.random.rand(1000, 20)  # 1000 samples, 20 features

# Standardize the data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
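
In a real project you would also keep aside data with known or injected anomalies so the detector can be validated later. A hypothetical sketch (the outlier count and magnitude are arbitrary):

# Hypothetical: copy the data and inject a few obvious outliers for later sanity checks.
rng = np.random.default_rng(42)
test_data = data_scaled.copy()
outlier_idx = rng.choice(len(test_data), size=5, replace=False)
test_data[outlier_idx] += rng.normal(loc=6.0, scale=1.0, size=(5, test_data.shape[1]))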

2. Building the Model

Autoencoders are a popular choice for anomaly detection. At its core, an autoencoder is a neural network that learns to compress data into a lower-dimensional space and then reconstruct it. Anomalies are detected based on the reconstruction error.

import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

input_dim = data_scaled.shape[1]
encoding_dim = 14  # Dimension of the encoded representation

input_layer = Input(shape=(input_dim,))
encoder = Dense(encoding_dim, activation="relu")(input_layer)
decoder = Dense(input_dim, activation="sigmoid")(encoder)

autoencoder = Model(inputs=input_layer, outputs=decoder)
autoencoder.compile(optimizer='adam', loss='mse')

autoencoder.summary()

Output:

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 20)]              0         
                                                                 
 dense (Dense)               (None, 14)                294       
                                                                 
 dense_1 (Dense)             (None, 20)                300       
                                                                 
=================================================================
Total params: 594 (2.32 KB)
Trainable params: 594 (2.32 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
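
The single hidden layer above keeps the example minimal. For higher-dimensional or more complex data, a deeper autoencoder with a narrower bottleneck is a common variation; the layer sizes below are just one plausible choice, and a linear output layer is used so the network can reproduce the negative values that standardization produces:

# A deeper variant with a narrower bottleneck (layer sizes are illustrative).
deep_input = Input(shape=(input_dim,))
x = Dense(16, activation="relu")(deep_input)
x = Dense(8, activation="relu")(x)      # bottleneck
x = Dense(16, activation="relu")(x)
deep_output = Dense(input_dim, activation="linear")(x)  # linear output suits standardized data

deep_autoencoder = Model(inputs=deep_input, outputs=deep_output)
deep_autoencoder.compile(optimizer='adam', loss='mse')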

3. Training the Model

Train the autoencoder on the normal data. The model learns to reconstruct this data with minimal error.

history = autoencoder.fit(data_scaled, data_scaled,
                          epochs=50,
                          batch_size=32,
                          validation_split=0.2,
                          shuffle=True)

Output:

Epoch 1/50
25/25 [==============================] - 1s 17ms/step - loss: 1.2188 - val_loss: 1.1869
Epoch 2/50
25/25 [==============================] - 0s 7ms/step - loss: 1.1768 - val_loss: 1.1473
Epoch 3/50
25/25 [==============================] - 0s 6ms/step - loss: 1.1375 - val_loss: 1.1118
Epoch 4/50
25/25 [==============================] - 0s 5ms/step - loss: 1.1021 - val_loss: 1.0792
Epoch 5/50
25/25 [==============================] - 0s 5ms/step - loss: 1.0702 - val_loss: 1.0501
Epoch 6/50
25/25 [==============================] - 0s 5ms/step - loss: 1.0418 - val_loss: 1.0245
Epoch 7/50
25/25 [==============================] - 0s 6ms/step - loss: 1.0172 - val_loss: 1.0017
Epoch 8/50
25/25 [==============================] - 0s 7ms/step - loss: 0.9957 - val_loss: 0.9817
Epoch 9/50
25/25 [==============================] - 0s 7ms/step - loss: 0.9769 - val_loss: 0.9643
Epoch 10/50
25/25 [==============================] - 0s 5ms/step - loss: 0.9605 - val_loss: 0.9490
Epoch 11/50
25/25 [==============================] - 0s 5ms/step - loss: 0.9460 - val_loss: 0.9355
Epoch 12/50
25/25 [==============================] - 0s 7ms/step - loss: 0.9331 - val_loss: 0.9233
Epoch 13/50
25/25 [==============================] - 0s 8ms/step - loss: 0.9215 - val_loss: 0.9124
Epoch 14/50
25/25 [==============================] - 0s 8ms/step - loss: 0.9109 - val_loss: 0.9025
Epoch 15/50
25/25 [==============================] - 0s 7ms/step - loss: 0.9010 - val_loss: 0.8933
Epoch 16/50
25/25 [==============================] - 0s 6ms/step - loss: 0.8919 - val_loss: 0.8848
Epoch 17/50
25/25 [==============================] - 0s 8ms/step - loss: 0.8832 - val_loss: 0.8768
Epoch 18/50
25/25 [==============================] - 0s 6ms/step - loss: 0.8750 - val_loss: 0.8694
Epoch 19/50
25/25 [==============================] - 0s 7ms/step - loss: 0.8674 - val_loss: 0.8623
Epoch 20/50
25/25 [==============================] - 0s 8ms/step - loss: 0.8600 - val_loss: 0.8554
Epoch 21/50
25/25 [==============================] - 0s 5ms/step - loss: 0.8531 - val_loss: 0.8490
Epoch 22/50
25/25 [==============================] - 0s 5ms/step - loss: 0.8465 - val_loss: 0.8428
Epoch 23/50
25/25 [==============================] - 0s 5ms/step - loss: 0.8402 - val_loss: 0.8368
Epoch 24/50
25/25 [==============================] - 0s 5ms/step - loss: 0.8341 - val_loss: 0.8310
Epoch 25/50
25/25 [==============================] - 0s 5ms/step - loss: 0.8283 - val_loss: 0.8255
Epoch 26/50
25/25 [==============================] - 0s 5ms/step - loss: 0.8227 - val_loss: 0.8202
Epoch 27/50
25/25 [==============================] - 0s 5ms/step - loss: 0.8172 - val_loss: 0.8151
Epoch 28/50
25/25 [==============================] - 0s 4ms/step - loss: 0.8120 - val_loss: 0.8101
Epoch 29/50
25/25 [==============================] - 0s 3ms/step - loss: 0.8070 - val_loss: 0.8052
Epoch 30/50
25/25 [==============================] - 0s 3ms/step - loss: 0.8021 - val_loss: 0.8004
Epoch 31/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7973 - val_loss: 0.7958
Epoch 32/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7927 - val_loss: 0.7913
Epoch 33/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7882 - val_loss: 0.7870
Epoch 34/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7838 - val_loss: 0.7828
Epoch 35/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7796 - val_loss: 0.7787
Epoch 36/50
25/25 [==============================] - 0s 4ms/step - loss: 0.7756 - val_loss: 0.7746
Epoch 37/50
25/25 [==============================] - 0s 4ms/step - loss: 0.7716 - val_loss: 0.7708
Epoch 38/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7677 - val_loss: 0.7671
Epoch 39/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7640 - val_loss: 0.7635
Epoch 40/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7604 - val_loss: 0.7600
Epoch 41/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7569 - val_loss: 0.7567
Epoch 42/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7535 - val_loss: 0.7534
Epoch 43/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7503 - val_loss: 0.7504
Epoch 44/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7471 - val_loss: 0.7475
Epoch 45/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7442 - val_loss: 0.7447
Epoch 46/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7412 - val_loss: 0.7420
Epoch 47/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7385 - val_loss: 0.7395
Epoch 48/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7358 - val_loss: 0.7370
Epoch 49/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7333 - val_loss: 0.7347
Epoch 50/50
25/25 [==============================] - 0s 3ms/step - loss: 0.7308 - val_loss: 0.7326
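
It is worth checking how the reconstruction loss evolves over training. A short matplotlib sketch using the history object returned by fit() (matplotlib is assumed to be installed):

import matplotlib.pyplot as plt

# Plot training vs. validation reconstruction loss per epoch.
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.legend()
plt.show()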

4. Anomaly Detection

After training, the model's reconstructions can be compared with the original data to identify anomalies. Setting a threshold on the reconstruction error then lets you classify each data point as normal or anomalous.

reconstructions = autoencoder.predict(data_scaled)
reconstruction_error = np.mean(np.square(data_scaled - reconstructions), axis=1)

threshold = np.mean(reconstruction_error) + 3 * np.std(reconstruction_error)
anomalies = reconstruction_error > threshold

print(f"Number of anomalies detected: {np.sum(anomalies)}")

Output:

32/32 [==============================] - 0s 1ms/step
Number of anomalies detected: 3
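
Once the threshold is fixed, the same scaler, model, and threshold can be applied to fresh data. A hypothetical sketch, where new_data simply stands in for whatever batch you want to score:

# Hypothetical: score a new batch of samples with the trained detector.
new_data = np.random.rand(100, 20)        # stand-in for incoming data
new_scaled = scaler.transform(new_data)   # reuse the scaler fitted on the training data
new_recon = autoencoder.predict(new_scaled)
new_error = np.mean(np.square(new_scaled - new_recon), axis=1)

new_anomalies = new_error > threshold     # reuse the training-time threshold
print(f"Anomalies in new batch: {np.sum(new_anomalies)}")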

Conclusion

Anomaly detection is one of the most important techniques for safeguarding datasets, particularly in critical use cases. With TensorFlow, you can build and deploy anomaly detection models that learn the normal behavior of your system and thereby recognize abnormal behavior, making them effective at picking up anomalies. Whether in finance, healthcare, or cybersecurity, anomaly detection with TensorFlow helps keep your data and systems intact and secure.
