This example demonstrates how to load, normalize, shuffle, and batch the MNIST dataset with TensorFlow's tf.data API. This is a common pattern for preparing training and evaluation data for deep learning models.
Key Features:
- MNIST Dataset: Loads the MNIST dataset of 70,000 grayscale 28x28 images of handwritten digits (60,000 for training and 10,000 for testing).
- Data Normalization: Scales the 8-bit pixel values (0 to 255) to the [0, 1] range, which generally helps gradient-based training converge more stably.
- Shuffling and Batching: The training data is shuffled so the model does not see examples in a fixed order, then grouped into mini-batches of 32.
- Data Verification: Prints out the shapes of the batched images and labels to confirm successful data preparation.
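The normalization step can be sketched in plain NumPy, without TensorFlow. This is a minimal illustration under stated assumptions: the helper normalize_uint8_images and the random 28x28 stand-in images are hypothetical, used only so the snippet runs without downloading MNIST. The sketch casts to float32 before dividing, because dividing a uint8 array by 255.0 directly produces float64 in NumPy.

```python
import numpy as np

def normalize_uint8_images(images: np.ndarray) -> np.ndarray:
    """Hypothetical helper: scale 8-bit pixels from [0, 255] to [0.0, 1.0] as float32."""
    if images.dtype != np.uint8:
        raise TypeError(f"expected uint8 pixels, got {images.dtype}")
    # Cast first so the division is done in float32 rather than float64.
    return images.astype(np.float32) / 255.0

# Synthetic stand-in for MNIST: 4 random images of 28x28 pixels.
rng = np.random.default_rng(seed=0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_images[0, 0, 0] = 0    # make sure the darkest value is present
fake_images[0, 0, 1] = 255  # make sure the brightest value is present

scaled = normalize_uint8_images(fake_images)
print(scaled.dtype, scaled.shape, scaled.min(), scaled.max())  # float32 (4, 28, 28) 0.0 1.0
```

Keeping the data in float32 matches the default dtype of Keras layers and halves memory use compared with float64.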
Code:
# Step 1: Import necessary libraries
import tensorflow as tf

# Step 2: Load a built-in dataset (e.g., MNIST)
mnist = tf.keras.datasets.mnist

# Step 3: Load the data; load_data() already returns separate training and test splits
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Step 4: Normalize pixel values from [0, 255] to [0, 1]
# (cast to float32 first; dividing the uint8 arrays directly yields float64)
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Step 5: Create TensorFlow Dataset objects from the NumPy arrays
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))

# Step 6: Shuffle and batch the training dataset; only batch the test dataset
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(32)
test_dataset = test_dataset.batch(32)

# Step 7: Print a sample batch for verification
for images, labels in train_dataset.take(1):
    print("Sample batch shape:", images.shape, labels.shape)

# Step 8: Iterate through the first five batches to confirm loading
for images, labels in train_dataset.take(5):
    print("Batch images shape:", images.shape)
    print("Batch labels shape:", labels.shape)
OUTPUT:
Sample batch shape: (32, 28, 28) (32,)
Batch images shape: (32, 28, 28)
Batch labels shape: (32,)
Batch images shape: (32, 28, 28)
Batch labels shape: (32,)
Batch images shape: (32, 28, 28)
Batch labels shape: (32,)
Batch images shape: (32, 28, 28)
Batch labels shape: (32,)
Batch images shape: (32, 28, 28)
Batch labels shape: (32,)
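The batch shapes above follow from simple arithmetic on the split sizes. The pure-Python helper batch_sizes below is hypothetical; it models how Dataset.batch splits a dataset into consecutive batches: full batches of the requested size, plus one smaller final batch unless drop_remainder=True is passed. Since 60,000 is divisible by 32, every training batch holds exactly 32 examples, while the 10,000-image test set ends with a batch of 16.

```python
from typing import List

def batch_sizes(num_examples: int, batch_size: int, drop_remainder: bool = False) -> List[int]:
    """Hypothetical model of batching: sizes of consecutive batches over num_examples items."""
    full, remainder = divmod(num_examples, batch_size)
    sizes = [batch_size] * full
    # A partial final batch is kept unless drop_remainder is requested.
    if remainder and not drop_remainder:
        sizes.append(remainder)
    return sizes

# MNIST ships 60,000 training and 10,000 test images; the example uses batches of 32.
train_batches = batch_sizes(60_000, 32)
test_batches = batch_sizes(10_000, 32)
print(len(train_batches), set(train_batches))  # 1875 {32}
print(len(test_batches), test_batches[-1])     # 313 16
```

When a model needs a fixed batch dimension (for example, some compiled or distributed setups), passing drop_remainder=True to batch discards that final partial batch.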
EXPLANATION:
This program loads the MNIST dataset with TensorFlow, scales the pixel values to [0, 1], and wraps the training and test arrays in tf.data.Dataset objects. The training data is shuffled with a 1,024-element buffer, and both datasets are split into batches of 32. The script then prints the shapes of one sample batch and of the first five training batches: each batch holds 32 images of 28x28 pixels and 32 integer labels. This load, normalize, shuffle, and batch sequence is a common preprocessing pattern for TensorFlow models.
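The buffer_size argument controls how thoroughly the data is mixed. The tf.data documentation describes shuffle as filling a buffer with buffer_size elements, then randomly sampling from that buffer and replacing each sampled element with the next one from the input; it also notes that a perfect shuffle requires a buffer at least as large as the dataset. The generator buffered_shuffle below is a hypothetical pure-Python model of that description, not TensorFlow's implementation. It makes one consequence concrete: with a 1,024-element buffer, the first example emitted always comes from the first 1,024 inputs, and no example can appear more than 1,023 positions ahead of its original position.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")

def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: Optional[int] = None) -> Iterator[T]:
    """Hypothetical model of a buffer-based shuffle: yields items in a locally shuffled order."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Buffer is full: emit a uniformly random element and free its slot.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Input exhausted: drain whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer

order = list(buffered_shuffle(range(10_000), buffer_size=1024, seed=0))
# Every input appears exactly once.
print(sorted(order) == list(range(10_000)))  # True
# No element is emitted more than buffer_size - 1 positions ahead of its input slot.
print(all(value <= pos + 1023 for pos, value in enumerate(order)))  # True
```

Under this model, raising buffer_size toward the dataset size (60,000 for the MNIST training split) moves the result toward a uniform shuffle, at the cost of holding more examples in memory.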