Hey there! Ready to dive into time series analysis? In this tutorial, we will learn how to use TensorFlow, Google’s open-source library for machine learning, to analyze and forecast time series data.
We will work with a dataset of daily minimum temperatures in Melbourne and walk through each step. By the end of this tutorial, you’ll have built an LSTM model in TensorFlow that predicts the next day’s minimum temperature from the previous 30 days.
Building a Time Series Analysis Model with TensorFlow
Step 1: Setting the Stage
First things first, we need to set up our environment and load the necessary libraries. We’ll be using pandas for handling data, NumPy for array operations, Matplotlib for plotting, and TensorFlow for building our model.
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, LSTM

    # Load the dataset
    data = pd.read_csv("daily-min-temperatures.csv")
    data.head()
Output:
             Date  Temp
    0  1981-01-01  20.7
    1  1981-01-02  17.9
    2  1981-01-03  18.8
    3  1981-01-04  14.6
    4  1981-01-05  15.8
Step 2: Exploring the Data
Let’s take a look at our data. It’s a dataset of daily minimum temperatures in Melbourne. We’ll parse the Date column as datetimes, set it as the index, print the first few rows, and plot the whole series to see its structure.
    # Parse the dates and use them as the index
    data['Date'] = pd.to_datetime(data['Date'])
    data.set_index('Date', inplace=True)
    print(data.head())

    # Plotting the data
    plt.figure(figsize=(10, 6))
    plt.plot(data, label='Daily Minimum Temperatures')
    plt.xlabel('Date')
    plt.ylabel('Temperature')
    plt.title('Daily Minimum Temperatures in Melbourne')
    plt.legend()
    plt.show()
Output:
                Temp
    Date
    1981-01-01  20.7
    1981-01-02  17.9
    1981-01-03  18.8
    1981-01-04  14.6
    1981-01-05  15.8
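The windowing in the next step treats consecutive rows as consecutive days, so it can be worth checking for gaps in the date index first. The sketch below shows one way to do that on a small toy frame. The frame, with one date deliberately left out, is an illustrative assumption and not the real dataset.

```python
import pandas as pd

# Toy daily series with one date deliberately missing (1981-01-04).
# This frame is made up for illustration; it is not the tutorial dataset.
toy = pd.DataFrame(
    {"Temp": [20.7, 17.9, 18.8, 15.8]},
    index=pd.to_datetime(["1981-01-01", "1981-01-02", "1981-01-03", "1981-01-05"]),
)

# Build the full daily calendar and find dates absent from the index.
full_range = pd.date_range(toy.index.min(), toy.index.max(), freq="D")
missing_dates = full_range.difference(toy.index)

print("Missing dates:", list(missing_dates.strftime("%Y-%m-%d")))
print("Missing values:", int(toy["Temp"].isna().sum()))
```

The same two checks should work on `data` once its index has been converted to datetimes. If gaps turn up, you can decide whether to fill them before building sequences.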
Step 3: Preprocessing the Data
Before modeling, we need to prepare the data. Time series forecasting requires sequences of past observations to predict future values. We will slide a 30-day window over the series: each window becomes one input sequence, and the value on the day right after it becomes its label.
    def create_sequences(data, window_size):
        sequences = []
        labels = []
        for i in range(len(data) - window_size):
            # Input: window_size consecutive observations
            sequences.append(data[i:i + window_size])
            # Label: the observation immediately after the window
            labels.append(data[i + window_size])
        return np.array(sequences), np.array(labels)

    window_size = 30
    data_values = data['Temp'].values
    sequences, labels = create_sequences(data_values, window_size)

    print(f"Sequences shape: {sequences.shape}")
    print(f"Labels shape: {labels.shape}")
Output:
    Sequences shape: (3613, 30)
    Labels shape: (3613,)
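To see exactly what the windowing produces, here is the same function applied to a six-value toy array with a window of 3. The toy values are made up for illustration.

```python
import numpy as np

def create_sequences(data, window_size):
    """Slide a fixed-size window over data; the label is the next value."""
    sequences, labels = [], []
    for i in range(len(data) - window_size):
        sequences.append(data[i:i + window_size])
        labels.append(data[i + window_size])
    return np.array(sequences), np.array(labels)

# Illustrative toy series, not real temperatures.
toy_values = np.array([10, 11, 12, 13, 14, 15])
toy_sequences, toy_labels = create_sequences(toy_values, window_size=3)

print(toy_sequences)
# [[10 11 12]
#  [11 12 13]
#  [12 13 14]]
print(toy_labels)
# [13 14 15]
```

A series of length n yields n - window_size pairs, so the six toy values give three sequences here.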
Step 4: Splitting the Data
We need to split our data into training and testing sets. This will help us evaluate how well our model performs on unseen data.
split_ratio = 0.8 split_index = int(len(sequences) * split_ratio) x_train, x_test = sequences[:split_index], sequences[split_index:] y_train, y_test = labels[:split_index], labels[split_index:] print(f"Training set size: {x_train.shape[0]}") print(f"Test set size: {x_test.shape[0]}")
Output:
Training set size: 2890 Test set size: 723
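Step 5: Building the Model
Now it’s time to build our model. We’ll use an LSTM (Long Short-Term Memory) network, a recurrent architecture designed to retain information across many time steps, which makes it a common choice for time series data. We stack two LSTM layers of 50 units each and finish with a single Dense unit that outputs the predicted temperature.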
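    model = Sequential()
    # First LSTM layer returns the full sequence so the next LSTM can consume it
    model.add(LSTM(50, return_sequences=True, input_shape=(window_size, 1)))
    # Second LSTM layer returns only its final output
    model.add(LSTM(50))
    # Single output: the predicted next-day temperature
    model.add(Dense(1))

    model.compile(optimizer='adam', loss='mean_squared_error')
    model.summary()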
Output:
    Model: "sequential"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    lstm (LSTM)                  (None, 30, 50)            10400
    _________________________________________________________________
    lstm_1 (LSTM)                (None, 50)                20200
    _________________________________________________________________
    dense (Dense)                (None, 1)                 51
    =================================================================
    Total params: 30,651
    Trainable params: 30,651
    Non-trainable params: 0
    _________________________________________________________________
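The parameter counts in the summary can be reproduced by hand. An LSTM layer has four gates, each with input weights, recurrent weights, and a bias vector. The short sketch below checks the arithmetic, assuming the default Keras settings with biases enabled. The helper names are made up for this example.

```python
def lstm_param_count(units, input_dim):
    # Four gates, each with (input_dim + units) weights per unit plus one bias.
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(units, input_dim):
    # One weight per input per unit, plus one bias per unit.
    return units * input_dim + units

first_lstm = lstm_param_count(units=50, input_dim=1)    # input is 1 feature per day
second_lstm = lstm_param_count(units=50, input_dim=50)  # input is the first layer's 50 outputs
dense = dense_param_count(units=1, input_dim=50)
total = first_lstm + second_lstm + dense

print(first_lstm, second_lstm, dense, total)
# 10400 20200 51 30651
```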
Step 6: Training the Model
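We’re ready to train our model! An LSTM layer expects input of shape (samples, timesteps, features), so we first add a trailing feature axis to our 2-D arrays and then fit the model, holding out 20% of the training data for validation.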
    # Reshape from (samples, timesteps) to (samples, timesteps, 1)
    x_train = np.expand_dims(x_train, axis=2)
    x_test = np.expand_dims(x_test, axis=2)

    history = model.fit(x_train, y_train, epochs=50, batch_size=32, validation_split=0.2)
Output:
    Epoch 1/50
    73/73 [==============================] - 5s 25ms/step - loss: 23.7394 - val_loss: 6.6627
    Epoch 2/50
    73/73 [==============================] - 1s 16ms/step - loss: 5.3046 - val_loss: 3.9453
    ...
    Epoch 50/50
    73/73 [==============================] - 1s 16ms/step - loss: 1.6107 - val_loss: 1.8825
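Two details of this step are easy to check with plain arithmetic: what `np.expand_dims` does to the array shape, and where the 73 steps per epoch come from. The sketch below uses a zero-filled stand-in array and the training set size of 2890 from Step 4. Keras takes the validation portion from the end of the training data. The exact rounding used here for the split is an assumption.

```python
import math
import numpy as np

# Stand-in array with the same layout as x_train: (samples, timesteps).
toy_x = np.zeros((5, 30))
toy_x_3d = np.expand_dims(toy_x, axis=2)
print(toy_x.shape, "->", toy_x_3d.shape)
# (5, 30) -> (5, 30, 1)

# Steps per epoch: validation_split holds out the last 20% of the training samples.
n_train_total = 2890
validation_split = 0.2
batch_size = 32

n_fit = int(n_train_total * (1 - validation_split))  # samples used for weight updates
n_val = n_train_total - n_fit                        # samples held out for val_loss
steps_per_epoch = math.ceil(n_fit / batch_size)

print(n_fit, n_val, steps_per_epoch)
# 2312 578 73
```

So each epoch runs 73 batches over 2312 samples, and `val_loss` is computed on the remaining 578.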
Step 7: Evaluating the Model with Metrics
Once the model is trained, we need to see how well it performs on the test set it has never seen. We will use three standard regression metrics from scikit-learn: MAE, MSE, and RMSE.
    from sklearn.metrics import mean_absolute_error, mean_squared_error

    # Make predictions
    predictions = model.predict(x_test)

    # Flatten the predictions array to match the shape of y_test
    predictions = predictions.flatten()

    # Calculate MAE
    mae = mean_absolute_error(y_test, predictions)
    print(f'Mean Absolute Error (MAE): {mae:.4f}')

    # Calculate MSE
    mse = mean_squared_error(y_test, predictions)
    print(f'Mean Squared Error (MSE): {mse:.4f}')

    # Calculate RMSE
    rmse = np.sqrt(mse)
    print(f'Root Mean Squared Error (RMSE): {rmse:.4f}')
Output:
    Mean Absolute Error (MAE): 1.1215
    Mean Squared Error (MSE): 2.0365
    Root Mean Squared Error (RMSE): 1.4279
These metrics will provide you with numerical insights into how well your model is performing:
- MAE: On average, the model’s predictions are off by about 1.12 degrees.
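- MSE: The average of the squared errors is about 2.04 (in squared degrees).
- RMSE: The square root of the MSE, about 1.43 degrees. It is in the same units as the data and, because errors are squared before averaging, it weights large misses more heavily than MAE does.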
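To make the three definitions concrete, here they are computed by hand with NumPy on four made-up values. The numbers are illustrative and unrelated to the model’s real predictions.

```python
import numpy as np

# Illustrative values only.
y_true = np.array([10.0, 12.0, 14.0, 16.0])
y_pred = np.array([11.0, 12.0, 12.0, 19.0])

errors = y_pred - y_true          # [ 1.  0. -2.  3.]

mae = np.mean(np.abs(errors))     # (1 + 0 + 2 + 3) / 4
mse = np.mean(errors ** 2)        # (1 + 0 + 4 + 9) / 4
rmse = np.sqrt(mse)

print(f"MAE:  {mae:.4f}")   # MAE:  1.5000
print(f"MSE:  {mse:.4f}")   # MSE:  3.5000
print(f"RMSE: {rmse:.4f}")  # RMSE: 1.8708
```

RMSE comes out larger than MAE here because the single 3-degree miss dominates once the errors are squared. RMSE is never smaller than MAE, and the gap widens as the errors become more uneven.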
Step 8: Fine-Tuning and Improvements
If your model is not performing as well as you’d like, don’t worry! It’s common to go back and tweak your model. You can try different window sizes, more epochs, or even different network architectures. Experimenting with hyperparameters and additional layers can significantly improve your model’s performance.
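One way to organize such experiments is a small loop that rebuilds the sequences for each candidate window size and scores the result on the held-out portion. The sketch below is only a scaffold under stated assumptions. It runs on a synthetic seasonal series rather than the Melbourne data, and it scores a cheap stand-in predictor (the mean of each window) rather than the LSTM. The helper `window_mean_mae` is made up for this example.

```python
import numpy as np

def create_sequences(data, window_size):
    sequences, labels = [], []
    for i in range(len(data) - window_size):
        sequences.append(data[i:i + window_size])
        labels.append(data[i + window_size])
    return np.array(sequences), np.array(labels)

def window_mean_mae(series, window_size, split_ratio=0.8):
    """Score a stand-in predictor (mean of the window) on the chronological test split."""
    sequences, labels = create_sequences(series, window_size)
    split_index = int(len(sequences) * split_ratio)
    x_test, y_test = sequences[split_index:], labels[split_index:]
    predictions = x_test.mean(axis=1)  # replace with model.predict(...) for a real model
    return float(np.mean(np.abs(y_test - predictions)))

# Synthetic series: yearly sine wave plus noise (an assumption, not real data).
rng = np.random.default_rng(0)
days = np.arange(3 * 365)
synthetic = 11 + 4 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 2, size=days.size)

results = {w: window_mean_mae(synthetic, w) for w in (7, 30, 60)}
for window_size, mae in results.items():
    print(f"window_size={window_size:>2}  baseline MAE={mae:.3f}")
```

In a real experiment, the stand-in line would be replaced by training and predicting with the model for each window size. A simple baseline like this one also gives a reference point: a tuned model is worth keeping if it clearly beats it.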
Conclusion
We’ve walked through the entire process of time series analysis using TensorFlow, from loading and exploring the data to building and evaluating a model. Time series forecasting is a powerful tool, and TensorFlow makes it easy to get started. Keep experimenting and improving your models, and soon you’ll be a time series forecasting pro.
Happy coding!