To remove all email addresses from a .txt file in Python, you can use a regular expression (regex) to find them and replace each one with an empty string. Here’s a step-by-step guide:
- Read the content of the file.
- Use a regex pattern to identify email addresses.
- Replace the identified email addresses with an empty string.
- Write the cleaned content back to the file or a new file.
Here’s example code that does this:

    import re

    # Define the regex pattern for email addresses
    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

    # Read the content of the file
    with open('input.txt', 'r') as file:
        content = file.read()

    # Remove all email addresses
    cleaned_content = re.sub(email_pattern, '', content)

    # Write the cleaned content to a new file
    with open('output.txt', 'w') as file:
        file.write(cleaned_content)

Explanation:
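1. Regular Expression Pattern:
- \b: Word boundary, so the match starts and ends at the edge of a word.
- [A-Za-z0-9._%+-]+: Matches the local part (the text before the @).
- @: Matches the @ symbol.
- [A-Za-z0-9.-]+: Matches the domain name.
- \.[A-Za-z]{2,}: Matches a dot followed by a top-level domain of at least two letters. Write this class as [A-Za-z], not [A-Z|a-z]: inside square brackets, | is a literal pipe character, not alternation.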
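To see what the pattern does and does not match, here is a minimal sketch that runs it over a made-up sentence. The sample text and the names EMAIL_PATTERN, sample, and found are illustrative assumptions; the top-level-domain class is written as [A-Za-z] so that a pipe character is not accepted.

```python
import re

# Email pattern from the explanation above, compiled once for reuse.
EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

# Made-up sample text: two ordinary addresses and one without a top-level domain.
sample = "Contact alice@example.com or bob.smith+news@mail.example.org, not user@localhost."

# findall returns every non-overlapping match as a list of strings.
found = EMAIL_PATTERN.findall(sample)
print(found)
```

The address without a dot and top-level domain (user@localhost) is skipped, because the pattern requires at least two letters after a final dot.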
2. Reading the File:
- open('input.txt', 'r') opens the file in read mode.
- file.read() reads the entire content of the file into a string.
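The reading step can be sketched on its own as follows. The helper name read_text_file and the UTF-8 default are assumptions for illustration; passing an explicit encoding avoids depending on the platform default, which may differ between systems. The demo writes a throwaway file to a temporary directory so the snippet runs anywhere.

```python
import tempfile
from pathlib import Path


def read_text_file(path, encoding="utf-8"):
    """Return the whole file as one string (hypothetical helper)."""
    with open(path, "r", encoding=encoding) as file:
        return file.read()


# Demo: create a small sample file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "input.txt"
    sample_path.write_text("Write to alice@example.com today.\n", encoding="utf-8")
    content = read_text_file(sample_path)

print(repr(content))
```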
3. Replacing Email Addresses:
- re.sub(email_pattern, '', content) replaces every occurrence of the email pattern with an empty string.
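One side effect worth knowing: removing an address leaves the spaces on both sides of it behind. The sketch below shows this on a made-up sentence, uses subn to also count the removals, and then collapses the leftover spaces. The extra tidy-up step is an optional addition, not part of the original example.

```python
import re

EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

text = "Mail alice@example.com or bob@example.org for help."

# subn works like sub but also returns how many replacements were made.
removed, count = EMAIL_PATTERN.subn('', text)

# Optional tidy-up: collapse runs of spaces or tabs left where addresses were.
tidied = re.sub(r'[ \t]{2,}', ' ', removed)

print(repr(removed))
print(count)
print(repr(tidied))
```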
4. Writing the Modified Content:
- open('output.txt', 'w') opens the file in write mode, which creates the file or overwrites its existing content.
- file.write(cleaned_content) writes the cleaned content to that file. Use 'input.txt' here instead to overwrite the original file.
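Putting the steps together, the whole process can be wrapped in one reusable function. This is a sketch under a few assumptions: the function name remove_emails_from_file, the UTF-8 default, and the use of pathlib are illustrative choices, and the demo runs against a temporary file rather than real data.

```python
import re
import tempfile
from pathlib import Path

EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')


def remove_emails_from_file(input_path, output_path, encoding="utf-8"):
    """Write a copy of input_path without email addresses; return how many were removed."""
    content = Path(input_path).read_text(encoding=encoding)
    cleaned, count = EMAIL_PATTERN.subn('', content)
    Path(output_path).write_text(cleaned, encoding=encoding)
    return count


# Demo on a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    source = Path(tmp_dir) / "input.txt"
    target = Path(tmp_dir) / "output.txt"
    source.write_text("Ask alice@example.com.\nNo address here.\n", encoding="utf-8")
    removed_count = remove_emails_from_file(source, target)
    result = target.read_text(encoding="utf-8")

print(removed_count)
print(repr(result))
```

Because the file is read completely before anything is written, passing the same path for both arguments overwrites the original in place. Keeping a separate output file is the safer default while you are still checking the results.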