Coders Packet

Scraping the Web Page and Storing it in an SQLite 3 Database using Python

By Kiran Reddy

In this project, we are going to learn how to scrap the web page, that is present in the tabular form, creating a database in an SQLite3, and storing the scraped data in it.

web scraping the yahoo finance

for this project, we are going to scrape the stock values of the tesla company from yahoo finance.

before going to the project make sure you installed Pandas and SQLite3 packages.

# installation of packages with pip command

pip install pandas


# sqlite3 is a standard library and it comes along with python so no need to install

# in case you don't you the sqlite3 package execute the following command in terminal/command line

pip install sqlite3



# installation of packages with conda command
# if you are to install packages with conda command then you must be having anaconda in your computer
# both pandas and sqlite3 packages come preinstalled with the anaconda
# in case you don't you the sqlite3 package execute the following command in terminal/command line

conda install -c conda-forge pandas

conda install -c anaconda sqlite

 

now we are going to web scrap the stock values of tesla company from the yahoo finance website

note:- I'm executing all the codes in jupyter notebook, so make sure to use print function if your are executing the codes in text editors

importing packages 

import pandas as pd
import sqlite3

scraping the required data using pandas package

 

url = "https://finance.yahoo.com/quote/TSLA/history?p=TSLA"
data = pd.read_html(url, match = "Date")
data
[                                                  Date  \
 0                                         Oct 06, 2020   
 1                                         Oct 05, 2020   
 2                                         Oct 02, 2020   
 3                                         Oct 01, 2020   
 4                                         Sep 30, 2020   
 ..                                                 ...   
 96                                        May 21, 2020   
 97                                        May 20, 2020   
 98                                        May 19, 2020   
 99                                        May 18, 2020   
 100  *Close price adjusted for splits.**Adjusted cl...   
 [101 rows x 7 columns]]

 

 

now we have scraped the stock values of the tesla company and it is not the desired form(tabular). so we are performing some simple data cleaning operations.

 

 

 

Data Cleaning

now we are going to perform some data cleaning operations.

df = data[0].iloc[:100,:]
df
  Date Open High Low Close* Adj Close**
0 Oct 06, 2020 423.79 428.78 406.05 413.98 413.98 48925000
1 Oct 05, 2020 423.35 433.64 419.33 425.68 425.68 44722800
2 Oct 02, 2020 421.39 439.13 415.00 415.09 415.09 71430000
3 Oct 01, 2020 440.76 448.88 434.42 448.16 448.16 50741500
4 Sep 30, 2020 421.32 433.93 420.47 429.01 429.01 48145600
... ... ... ... ... ... ... ...
95 May 22, 2020 164.43 166.36 162.40 163.38 163.38 49937500
96 May 21, 2020 163.20 166.50 159.20 165.52 165.52 61273000
97 May 20, 2020 164.10 165.20 162.36 163.11 163.11 36546500
98 May 19, 2020 163.03 164.41 161.22 161.60 161.60 48182500
99 May 18, 2020 165.56 166.94 160.78 162.73 162.73 58490500

100 rows × 7 columns

 make sure that the " df " variable type is a data frame.

type(df)
pandas.core.frame.DataFrame

 

 

Converting DataFrame into a list of tuples

to insert the data frame into a database(sqlite3) we need to convert the data frame into a list of tuples

records = df.to_records(index=False)
list_of_tuples = list(records)
list_of_tuples

the output looks like this

[('Oct 06, 2020', '423.79', '428.78', '406.05', '413.98', '413.98', '48925000'),
 ('Oct 05, 2020', '423.35', '433.64', '419.33', '425.68', '425.68', '44722800'),
 ('Oct 02, 2020', '421.39', '439.13', '415.00', '415.09', '415.09', '71430000'),
 ('Oct 01, 2020', '440.76', '448.88', '434.42', '448.16', '448.16', '50741500'),
 ('Sep 30, 2020', '421.32', '433.93', '420.47', '429.01', '429.01', '48145600'),
 ('Sep 29, 2020', '416.00', '428.50', '411.60', '419.07', '419.07', '50219300'),
 ('Sep 28, 2020', '424.62', '428.08', '415.55', '421.20', '421.20', '49719600'),]

 

 

Creating Records in SQLite3 Database

import sqlite3
conn = sqlite3.connect('TSLA.db')

c = conn.cursor()

# Create table
c.execute('''CREATE TABLE teslastocks
             (Date, Open, High, Low, Close, Adjclose, Volume)''')
 

 

after the execution of the above program, we have successfully created a table named "teslastocks" in our database

teslastocks table has the attributes (Date, Open, High, Low, Close, Adjclose, Volume)

 

Inserting the values into the table

# Insert a row of data
c.executemany('INSERT INTO teslastocks VALUES (?,?,?,?,?,?,?)', list_of_tuples)
# Save (commit) the changes
conn.commit()

 

Retrieving the Data from the Database

for row in c.execute('SELECT * FROM teslastocks'):
    print(row)

('Oct 06, 2020', '423.79', '428.78', '406.05', '413.98', '413.98', '48925000') ('Oct 05, 2020', '423.35', '433.64', '419.33', '425.68', '425.68', '44722800') ('Oct 02, 2020', '421.39', '439.13', '415.00', '415.09', '415.09', '71430000') ('Oct 01, 2020', '440.76', '448.88', '434.42', '448.16', '448.16', '50741500') ('Sep 30, 2020', '421.32', '433.93', '420.47', '429.01', '429.01', '48145600') ('Sep 29, 2020', '416.00', '428.50', '411.60', '419.07', '419.07', '50219300') ('Sep 28, 2020', '424.62', '428.08', '415.55', '421.20', '421.20', '49719600') ('Sep 25, 2020', '393.47', '408.73', '391.30', '407.34', '407.34', '67208500') ('Sep 24, 2020', '363.80', '399.50', '351.30', '387.79', '387.79', '96561100') ('Sep 23, 2020', '405.16', '412.15', '375.88', '380.36', '380.36', '95074200') ('Sep 22, 2020', '429.60', '437.76', '417.60', '424.23', '424.23', '79580800') ('Sep 21, 2020', '453.13', '455.68', '407.07', '449.39', '449.39', '109476800')
 

 similarly, you can scrape the stock values of any company and you can store in the database.

 

 

 

 

Download Complete Code

Comments

No comments yet