URL link scanner for HTML files, written in C++

By Koppula Bhanu Prakash Reddy

Working with URLs in C++ is not easy, so this project is a step towards a C++ library for extracting the URL links present in given HTML files or code.

The package has 6 files, of which "main.cpp" is the only one needed to run the program. The rest are sample input files that can be given to the code.

The code takes a file with the extension .html or .txt (a text file with HTML code in it) as input and scans all the links present in it. A link is classified as a referred link if it appears inside "" in a tag of the HTML code, which implies that the URL is actually referred to and not only mentioned.

Referred links are those that are linked from the given HTML code. There can also be a few links that are not referred to but only mentioned in the given HTML code.
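The distinction between referred and mentioned links can be sketched as follows. This is only an illustration, not the code in "main.cpp": it assumes that a referred link is the quoted value of an href attribute and that every other URL in the text is merely mentioned. The names LinkReport and classify_links, and both regular expressions, are hypothetical.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type: links split into the two groups described above.
struct LinkReport {
    std::vector<std::string> referred;   // URLs found as quoted href values
    std::vector<std::string> mentioned;  // URLs found anywhere else in the text
};

// Hypothetical helper: classifies every http(s) URL in `html`.
LinkReport classify_links(const std::string& html) {
    // Assumption: "referred" means the URL is the quoted value of an href attribute.
    static const std::regex href_re(R"rx(href\s*=\s*"(https?://[^"]+)")rx",
                                    std::regex::icase);
    // Simplified URL pattern: scheme, then everything up to whitespace, a quote, or an angle bracket.
    static const std::regex url_re(R"rx(https?://[^\s"'<>]+)rx");

    LinkReport report;
    std::set<std::string> referred_set;  // used to skip duplicates

    for (std::sregex_iterator it(html.begin(), html.end(), href_re), end; it != end; ++it) {
        const std::string url = (*it)[1].str();
        if (referred_set.insert(url).second) {
            report.referred.push_back(url);
        }
    }

    std::set<std::string> mentioned_set;
    for (std::sregex_iterator it(html.begin(), html.end(), url_re), end; it != end; ++it) {
        const std::string url = it->str();
        // A URL already seen as an href value is not counted as "mentioned".
        if (referred_set.count(url) == 0 && mentioned_set.insert(url).second) {
            report.mentioned.push_back(url);
        }
    }
    return report;
}
```

With this sketch, a URL inside an href attribute ends up in report.referred, while a URL that only appears in paragraph text ends up in report.mentioned. A regular expression is a rough approximation of HTML parsing, so unusual markup (single-quoted attributes, relative links) would need extra handling.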

The code will generate an output file with all the referred links that are scanned from the given HTML code.

The C++ regex header is used to run a regular expression that extracts the URL links, and a few STL containers are used to store the extracted data.
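A minimal sketch of this approach, using std::regex with std::sregex_iterator and a std::vector for storage, is shown here. The function name extract_urls and the pattern itself are assumptions. The regular expression in "main.cpp" may be different and stricter.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every http(s) URL found in `html`, in order of appearance.
std::vector<std::string> extract_urls(const std::string& html) {
    // Simplified pattern (an assumption): the scheme, followed by any characters
    // up to whitespace, a quote, or an angle bracket.
    static const std::regex url_re(R"rx(https?://[^\s"'<>]+)rx");

    std::vector<std::string> urls;
    for (std::sregex_iterator it(html.begin(), html.end(), url_re), end; it != end; ++it) {
        urls.push_back(it->str());
    }
    return urls;
}
```

The pattern is deliberately loose: it does not validate URLs and it keeps trailing punctuation such as a full stop at the end of a sentence. A std::set could replace the std::vector if duplicates should be dropped.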

