Overview
This Jupyter Notebook illustrates a machine learning pipeline for predicting Bitcoin price movements using historical price data. The pipeline consists of data loading, data preprocessing, feature engineering, model training, hyperparameter tuning, and evaluation of a Long Short-Term Memory (LSTM) network.
Dataset Description
The model uses historical Bitcoin price data (BTC-USD), comprising 2713 samples with the following features:
- Open: Opening price of Bitcoin
- High: Highest price of Bitcoin
- Low: Lowest price of Bitcoin
- Close: Closing price of Bitcoin
- Adj Close: Adjusted closing price of Bitcoin
- Volume: Trading volume of Bitcoin
- Technical Indicators: derived features such as SMA, EMA, RSI, MACD, Bollinger Bands, and lag features
The purpose of this model is to classify future price movements (up or down) based on the historical price data and technical indicators.
Load the Dataset
The dataset is imported using Pandas:
```python
import pandas as pd

crypto = pd.read_csv("/content/drive/My Drive/Colab Notebooks/BTC-USD.csv")
```
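A quick sanity check after loading (a small addition for illustration, not shown in the original notebook):

```python
# Per the dataset description, this should report 2713 rows
print(crypto.shape)
print(crypto.head())
```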
Data Preprocessing
The data is preprocessed by converting the ‘Date’ column to datetime format and setting it as the index. New features are created to enhance the model’s predictive capabilities.
```python
import numpy as np

# Parse dates and use them as the index
crypto['Date'] = pd.to_datetime(crypto['Date'])
crypto.set_index('Date', inplace=True)

# Feature engineering
crypto['open-close'] = crypto['Open'] - crypto['Close']
crypto['low-high'] = crypto['Low'] - crypto['High']

# Binary target: 1 if the next day's close is higher than today's, else 0
crypto['target'] = np.where(crypto['Close'].shift(-1) > crypto['Close'], 1, 0)
```
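Since the target drives everything downstream, it is worth checking its class balance (a small addition for illustration, not part of the original notebook):

```python
# Daily up/down moves are usually close to balanced, so neither class should dominate
print(crypto['target'].value_counts(normalize=True))
```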
Feature Extraction
Technical indicators are calculated to provide additional insights into price movements:
```python
# Simple Moving Average (SMA)
crypto['SMA_7'] = crypto['Close'].rolling(window=7).mean()
crypto['SMA_21'] = crypto['Close'].rolling(window=21).mean()

# Exponential Moving Average (EMA)
crypto['EMA_7'] = crypto['Close'].ewm(span=7, adjust=False).mean()
crypto['EMA_21'] = crypto['Close'].ewm(span=21, adjust=False).mean()

# Relative Strength Index (RSI) over a 14-day window
delta = crypto['Close'].diff(1)
gain = (delta.where(delta > 0, 0)).rolling(window=14).mean()
loss = (-delta.where(delta < 0, 0)).rolling(window=14).mean()
rs = gain / loss
crypto['RSI'] = 100 - (100 / (1 + rs))
```
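The dataset description also lists MACD, Bollinger Bands, and lag features, whose computation the notebook does not show. A sketch under the usual parameter conventions (the 12/26/9 MACD spans, 20-day 2-sigma bands, and 1-3 day lags are assumptions):

```python
# MACD: difference of the 12- and 26-day EMAs, with a 9-day signal line
ema_12 = crypto['Close'].ewm(span=12, adjust=False).mean()
ema_26 = crypto['Close'].ewm(span=26, adjust=False).mean()
crypto['MACD'] = ema_12 - ema_26
crypto['MACD_signal'] = crypto['MACD'].ewm(span=9, adjust=False).mean()

# Bollinger Bands: 20-day SMA plus/minus 2 standard deviations
sma_20 = crypto['Close'].rolling(window=20).mean()
std_20 = crypto['Close'].rolling(window=20).std()
crypto['BB_upper'] = sma_20 + 2 * std_20
crypto['BB_lower'] = sma_20 - 2 * std_20

# Lag features: previous days' closing prices
for lag in (1, 2, 3):
    crypto[f'Close_lag_{lag}'] = crypto['Close'].shift(lag)
```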
Data Splitting
The data is divided into training and test sets with an 80-20 ratio; shuffle=False keeps the split chronological, so the model is evaluated on the most recent prices:
```python
from sklearn.model_selection import train_test_split

crypto = crypto.dropna()  # drop leading rows left NaN by the rolling-window indicators

X = crypto.drop(columns=['target'])
y = crypto['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, shuffle=False)
```
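The training code in the next section consumes X_train_seq, y_train_seq, X_test_seq, and y_test_seq, which the notebook never constructs explicitly. A minimal sketch, assuming each sample's feature vector is treated as a sequence of one-dimensional timesteps (matching the input_shape=(X_train.shape[1], 1) the model expects) and that features are min-max scaled first (the scaling choice is an assumption):

```python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Hypothetical reshaping step: scale features to [0, 1], then add a trailing
# axis so each sample becomes an (n_features, 1) sequence for the LSTM
scaler = MinMaxScaler()
X_train_seq = scaler.fit_transform(X_train)[..., np.newaxis]
X_test_seq = scaler.transform(X_test)[..., np.newaxis]
y_train_seq = y_train.to_numpy()
y_test_seq = y_test.to_numpy()
```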
Model Creation & Training
An LSTM model is created and trained on the training data:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional

# Stacked bidirectional LSTM with dropout regularization
model = Sequential([
    Bidirectional(LSTM(100, return_sequences=True, dropout=0.2, recurrent_dropout=0.2), input_shape=(X_train.shape[1], 1)),
    Bidirectional(LSTM(100, dropout=0.2, recurrent_dropout=0.2)),
    Dense(50, activation='relu'),
    Dropout(0.3),
    Dense(1)
])

# Note: the binary target is fit with a regression (MSE) loss, so the
# network outputs continuous scores rather than class labels
model.compile(optimizer='adam', loss='mean_squared_error')
history = model.fit(X_train_seq, y_train_seq, epochs=200, batch_size=32, validation_data=(X_test_seq, y_test_seq))
```
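With a fixed 200 epochs and no callbacks, training runs to completion even if validation loss plateaus. One optional refinement (an addition, not part of the original notebook) is early stopping:

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop once validation loss has not improved for 10 epochs, keeping the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
history = model.fit(X_train_seq, y_train_seq, epochs=200, batch_size=32,
                    validation_data=(X_test_seq, y_test_seq), callbacks=[early_stop])
```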
Model Evaluation
The model’s performance is evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared (R²) score. Because the target is binary, these regression metrics are computed on the raw continuous predictions:
```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = model.predict(X_test_seq)
mae = mean_absolute_error(y_test_seq, y_pred)
rmse = np.sqrt(mean_squared_error(y_test_seq, y_pred))
r2 = r2_score(y_test_seq, y_pred)
print(f"Mean Absolute Error (MAE): {mae:.4f}")
print(f"Root Mean Squared Error (RMSE): {rmse:.4f}")
print(f"R-squared (R²) Score: {r2:.4f}")
```
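Because the underlying task is up/down classification, the continuous predictions can also be thresholded to report directional accuracy (a small sketch, not in the original notebook; the 0.5 cutoff is an assumption):

```python
from sklearn.metrics import accuracy_score

# Treat predictions above 0.5 as "up" (1) and the rest as "down" (0)
y_pred_class = (y_pred.ravel() >= 0.5).astype(int)
print(f"Directional accuracy: {accuracy_score(y_test_seq, y_pred_class):.4f}")
```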