By Kiran Reddy
In this project, we are going to learn how to scrap the web page, that is present in the tabular form, creating a database in an SQLite3, and storing the scraped data in it.
for this project, we are going to scrape the stock values of the tesla company from yahoo finance.
before going to the project make sure you installed Pandas and SQLite3 packages.
# installation of packages with pip command pip install pandas # sqlite3 is a standard library and it comes along with python so no need to install # in case you don't you the sqlite3 package execute the following command in terminal/command line pip install sqlite3 # installation of packages with conda command # if you are to install packages with conda command then you must be having anaconda in your computer # both pandas and sqlite3 packages come preinstalled with the anaconda # in case you don't you the sqlite3 package execute the following command in terminal/command line conda install -c conda-forge pandas conda install -c anaconda sqlite
now we are going to web scrap the stock values of tesla company from the yahoo finance website
note:- I'm executing all the codes in jupyter notebook, so make sure to use print function if your are executing the codes in text editors
importing packages
import pandas as pd import sqlite3
scraping the required data using pandas package
url = "https://finance.yahoo.com/quote/TSLA/history?p=TSLA" data = pd.read_html(url, match = "Date")
data
[ Date \ 0 Oct 06, 2020 1 Oct 05, 2020 2 Oct 02, 2020 3 Oct 01, 2020 4 Sep 30, 2020 .. ... 96 May 21, 2020 97 May 20, 2020 98 May 19, 2020 99 May 18, 2020 100 *Close price adjusted for splits.**Adjusted cl... [101 rows x 7 columns]]
now we have scraped the stock values of the tesla company and it is not the desired form(tabular). so we are performing some simple data cleaning operations.
now we are going to perform some data cleaning operations.
df = data[0].iloc[:100,:] df
make sure that the " df " variable type is a data frame.
type(df)
to insert the data frame into a database(sqlite3) we need to convert the data frame into a list of tuples
records = df.to_records(index=False) list_of_tuples = list(records) list_of_tuples
the output looks like this
[('Oct 06, 2020', '423.79', '428.78', '406.05', '413.98', '413.98', '48925000'), ('Oct 05, 2020', '423.35', '433.64', '419.33', '425.68', '425.68', '44722800'), ('Oct 02, 2020', '421.39', '439.13', '415.00', '415.09', '415.09', '71430000'), ('Oct 01, 2020', '440.76', '448.88', '434.42', '448.16', '448.16', '50741500'), ('Sep 30, 2020', '421.32', '433.93', '420.47', '429.01', '429.01', '48145600'), ('Sep 29, 2020', '416.00', '428.50', '411.60', '419.07', '419.07', '50219300'), ('Sep 28, 2020', '424.62', '428.08', '415.55', '421.20', '421.20', '49719600'),]
import sqlite3 conn = sqlite3.connect('TSLA.db') c = conn.cursor() # Create table c.execute('''CREATE TABLE teslastocks (Date, Open, High, Low, Close, Adjclose, Volume)''')
after the execution of the above program, we have successfully created a table named "teslastocks" in our database
teslastocks table has the attributes (Date, Open, High, Low, Close, Adjclose, Volume)
# Insert a row of data c.executemany('INSERT INTO teslastocks VALUES (?,?,?,?,?,?,?)', list_of_tuples) # Save (commit) the changes conn.commit()
for row in c.execute('SELECT * FROM teslastocks'): print(row)
('Oct 06, 2020', '423.79', '428.78', '406.05', '413.98', '413.98', '48925000') ('Oct 05, 2020', '423.35', '433.64', '419.33', '425.68', '425.68', '44722800') ('Oct 02, 2020', '421.39', '439.13', '415.00', '415.09', '415.09', '71430000') ('Oct 01, 2020', '440.76', '448.88', '434.42', '448.16', '448.16', '50741500') ('Sep 30, 2020', '421.32', '433.93', '420.47', '429.01', '429.01', '48145600') ('Sep 29, 2020', '416.00', '428.50', '411.60', '419.07', '419.07', '50219300') ('Sep 28, 2020', '424.62', '428.08', '415.55', '421.20', '421.20', '49719600') ('Sep 25, 2020', '393.47', '408.73', '391.30', '407.34', '407.34', '67208500') ('Sep 24, 2020', '363.80', '399.50', '351.30', '387.79', '387.79', '96561100') ('Sep 23, 2020', '405.16', '412.15', '375.88', '380.36', '380.36', '95074200') ('Sep 22, 2020', '429.60', '437.76', '417.60', '424.23', '424.23', '79580800') ('Sep 21, 2020', '453.13', '455.68', '407.07', '449.39', '449.39', '109476800')
similarly, you can scrape the stock values of any company and you can store in the database.
Submitted by Kiran Reddy (kirankumarreddy)
Download packets of source code on Coders Packet
Comments