FIND ALL THE JAVASCRIPT FILES ON A WEBPAGE FROM URL IN PYTHON

This tutorial describes how to find all the JavaScript files on a webpage from a URL using Python. It is structured for easy understanding and practical application.


JavaScript files are essential for making webpages interactive and dynamic. If you're working on web scraping, penetration testing, or analyzing website dependencies, you may need to find all the JavaScript files linked from a webpage. This tutorial will show you how to extract all JavaScript files from a given URL using Python.

Prerequisites

To follow along with this tutorial, ensure you have Python installed on your system. You will also need the following Python libraries:

  • requests to fetch the webpage content.
  • BeautifulSoup to parse HTML and extract <script> tags.
  • urllib.parse (part of the Python standard library) to handle URLs.

You can install the required libraries using:

pip install requests beautifulsoup4

Steps to Extract JavaScript Files

  1. Send an HTTP request to fetch the webpage content.
  2. Parse the HTML to find all <script> tags.
  3. Extract the src attribute from each <script> tag that references a JavaScript file.

Python Code to Extract JavaScript Files

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def find_js_files(url):
    # Fetch the page, bailing out gracefully on network or HTTP errors
    try:
        response = requests.get(url)
        response.raise_for_status()
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the URL: {e}")
        return []

    soup = BeautifulSoup(response.text, 'html.parser')
    script_tags = soup.find_all('script')

    js_files = []
    for script in script_tags:
        src = script.get('src')
        if src:
            # Resolve relative paths against the page URL
            full_url = urljoin(url, src)
            js_files.append(full_url)
    return js_files
# Example usage
url = "https://example.com"
js_files = find_js_files(url)

print("JavaScript files found:")
for js in js_files:
    print(js)

Explanation of the Code

  1. We send an HTTP GET request to the given URL using requests.get(url).
  2. The response is parsed using BeautifulSoup to extract all <script> tags.
  3. We check for the src attribute in each <script> tag and construct the full URL using urljoin.
  4. All found JavaScript file URLs are stored in a list and printed.
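Step 3 matters because src attributes are often relative paths rather than full URLs. A quick illustration of how urljoin resolves the common cases (the example.com addresses here are placeholders for demonstration):

```python
from urllib.parse import urljoin

# A relative src is resolved against the page URL
print(urljoin("https://example.com/page", "/static/js/app.js"))
# -> https://example.com/static/js/app.js

# An absolute src is returned unchanged
print(urljoin("https://example.com/page", "https://cdn.example.com/jquery.js"))
# -> https://cdn.example.com/jquery.js

# A protocol-relative src inherits the page's scheme
print(urljoin("https://example.com/page", "//cdn.example.com/lib.js"))
# -> https://cdn.example.com/lib.js
```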

Example Output

If you run the script on a sample webpage, you may get output like:

JavaScript files found:
https://example.com/static/js/script1.js
https://example.com/assets/main.js
https://cdn.example.com/jquery.js

Use Cases

  • Web scraping and analyzing website dependencies
  • Detecting external JavaScript file sources for security analysis
  • Understanding how a website is structured

This simple Python script can be extended further for advanced use cases like analyzing JavaScript dependencies or integrating with penetration testing tools.

Conclusion

This script allows you to find all JavaScript files on a webpage quickly. It is useful for analyzing web resources, debugging, and security audits. We can further enhance this by storing the results in a file or scanning multiple URLs.
