How to remove all the email ids from a .txt file in Python

To remove all the email IDs from a .txt file in Python, you can use regular expressions (regex) to find the email addresses and replace them with nothing. Here’s a step-by-step guide:

  • Read the content of the file.
  • Use a regex pattern to identify email addresses.
  • Replace the identified email addresses with an empty string.
  • Write the cleaned content back to the file or a new file.

Here’s an example code to achieve this:

import re

# Define the regex pattern for email addresses
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

# Read the content of the file
with open('input.txt', 'r') as file:
    content = file.read()

# Remove all email addresses
cleaned_content = re.sub(email_pattern, '', content)

# Write the cleaned content to a new file (use 'input.txt' here to overwrite the original)
with open('output.txt', 'w') as file:
    file.write(cleaned_content)

Explanation:

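1. Regular Expression Pattern:

  • \b: Word boundary, so the match starts and ends at the edge of a word.
  • [A-Za-z0-9._%+-]+: Matches the local part (the text before the @).
  • @: Matches the @ symbol.
  • [A-Za-z0-9.-]+: Matches the domain name.
  • \. plus two or more letters: Matches the dot and the top-level domain, such as .com.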
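To see what a pattern of this shape actually matches, it can help to run it against a short string before touching any file. The snippet below is a minimal sketch: the sample text and the name EMAIL_PATTERN are made up for illustration, and the pattern is a practical approximation rather than a full validator for every legal email address.

```python
import re

# Pattern of the shape described above; the top-level domain class is [A-Za-z]
EMAIL_PATTERN = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

# Hypothetical sample text, used only for this illustration
sample = "Contact alice@example.com or bob.smith@mail.example.org, not user@localhost."

# findall returns every non-overlapping match as a list of strings
matches = re.findall(EMAIL_PATTERN, sample)
print(matches)  # ['alice@example.com', 'bob.smith@mail.example.org']
```

Note that user@localhost is not matched, because the pattern requires a dot followed by at least two letters after the domain name.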

2. Reading the file:

  • The open('input.txt', 'r') statement opens the file in read mode.
  • file.read() reads the entire content of the file into a string.

3. Replacing Email Addresses:

  • re.sub(email_pattern, '', content) replaces every occurrence of the email pattern with an empty string and returns the result as a new string.
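One side effect worth knowing about: replacing an address with an empty string leaves the spaces that surrounded it. The sketch below shows this on a made-up sentence and then, as an optional extra step that is not part of the original example, collapses runs of spaces and tabs into a single space.

```python
import re

EMAIL_PATTERN = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

# Hypothetical sample sentence, used only for this illustration
text = "Write to alice@example.com for details."

# Removing the address leaves the space on each side of it behind
cleaned = re.sub(EMAIL_PATTERN, '', text)
print(repr(cleaned))  # 'Write to  for details.'

# Optional tidy-up (an assumption about what you want): collapse repeated spaces/tabs
tidy = re.sub(r'[ \t]{2,}', ' ', cleaned)
print(repr(tidy))  # 'Write to for details.'
```

Whether to tidy the whitespace depends on the file: for free text it usually reads better, but for column-aligned data it could shift the layout.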

4. Writing the Modified Content:

  • The open('output.txt', 'w') statement opens the file in write mode (which creates the file, or overwrites its existing content if it already exists).
  • file.write(cleaned_content) writes the cleaned content to that file.
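If you need this in more than one place, the four steps can be wrapped in a small function. The following is one possible sketch, not the only way to do it: the function name remove_emails, its parameters, and the choice of UTF-8 as the default encoding are all assumptions made for this illustration. It uses re.subn, which works like re.sub but also returns how many replacements were made.

```python
import re
import tempfile
from pathlib import Path

EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

def remove_emails(src, dst=None, encoding='utf-8'):
    """Remove email addresses from src and write the result to dst.

    If dst is None, src is overwritten. Returns the number of addresses removed.
    (Hypothetical helper; the name and signature are illustrative.)
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src
    content = src.read_text(encoding=encoding)
    # subn returns a (new_string, number_of_replacements) tuple
    cleaned, count = EMAIL_PATTERN.subn('', content)
    dst.write_text(cleaned, encoding=encoding)
    return count

# Small demonstration on a temporary file, so no real file is modified
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'input.txt'
    path.write_text('Mail alice@example.com or bob@example.org today.\n', encoding='utf-8')
    removed = remove_emails(path)  # no dst given, so the file is overwritten
    result = path.read_text(encoding='utf-8')

print(removed)  # 2
```

Overwriting the source file is irreversible, so passing a separate dst is the safer choice until you have checked the output.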
