Detect slang words from a .txt file in Python

To detect slang words in a text file with Python, you generally don’t need any external modules. Python’s built-in functions and standard library are enough for simple matching, and third-party NLP libraries can help when you need more advanced text processing.

Here are the standard library modules that are useful for this task:

- os: For handling file paths.
- re: For regular expressions, if you need more advanced text processing.
- string: For string manipulation (though not always necessary).
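
As a rough illustration of where re helps, the sketch below pulls lowercase words out of a string with a regular expression, so that punctuation attached to a word does not block a match. The function name tokenize and the sample sentence are made up for this example.

```python
import re

def tokenize(text):
    # Find runs of letters, digits, or apostrophes in the lowercased text.
    return re.findall(r"[a-z0-9']+", text.lower())

print(tokenize("That party was LIT, fam!"))
# ['that', 'party', 'was', 'lit', 'fam']
```

A plain text.split() would return 'lit,' and 'fam!' here, and neither would match an entry in a slang list.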

Here are other libraries that can help with more advanced processing. The first three are third-party packages that you install with pip, while collections ships with Python:

- NLTK (Natural Language Toolkit): For advanced text processing and tokenization.
- spaCy: Another powerful library for NLP (Natural Language Processing).
- pandas: For handling and processing data, although it’s more useful for structured data.
- collections: Specifically, Counter from collections is useful for counting occurrences of words.
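
To show how Counter could fit in, here is a minimal sketch that counts how often each slang word appears. The short slang set and the word list are hypothetical inputs chosen for illustration.

```python
from collections import Counter

# Hypothetical inputs for illustration.
slang_words = {'lit', 'fam', 'yolo'}
words = ['that', 'was', 'lit', 'fam', 'so', 'lit']

# Count only the words that appear in the slang set.
slang_counts = Counter(word for word in words if word in slang_words)

print(slang_counts.most_common())
# [('lit', 2), ('fam', 1)]
```

This is handy when you want to know how frequently each slang word is used, not just which ones are present.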

Detect slang words using built-in functions in Python

Start with a list of slang words. The script below reads the file, splits its contents into words, and checks each word against the list.

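import string

# Slang terms to look for. The two-word phrases ('throwing shade',
# 'clap back') are listed, but a word-by-word comparison cannot match them.
slang_words = [
    'bruh', 'lit', 'fam', 'dope', 'bae', 'yolo', 'gucci', 'savage',
    'salty', 'thirsty', 'ghost', 'throwing shade', 'woke', 'fomo',
    'stan', 'slay', 'goat', 'sus', 'flex', 'tea', 'clap back', 'basic'
]

def read_file(file_path):
    # Read the whole file and lowercase it so matching is case-insensitive.
    with open(file_path, 'r', encoding='utf-8') as file:
        contents = file.read().lower()
    return contents

def split_into_words(text):
    # Split on whitespace and strip surrounding punctuation,
    # so that "lit!" becomes "lit".
    words = [word.strip(string.punctuation) for word in text.split()]
    return words

def detect_slang_words(words, slang_words):
    # A set keeps each detected word only once.
    detected_slang = set()
    for word in words:
        if word in slang_words:
            detected_slang.add(word)
    return detected_slang

def detect_slang(file_path):
    contents = read_file(file_path)
    words = split_into_words(contents)
    detected_slang = detect_slang_words(words, slang_words)
    return detected_slang

# 'sample.txt' is a placeholder path; replace it with your own file.
print("Detected slang words:", detect_slang('sample.txt'))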

Output: Detected slang words: {'lit', 'yolo', 'woke'}
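
Comparing one word at a time means that two-word entries in the list, such as 'throwing shade' and 'clap back', can never match. One possible workaround, sketched here with a hypothetical helper named detect_slang_phrases, is to search the whole text for each term with a word-boundary regular expression. The sample sentence is made up for this example.

```python
import re

def detect_slang_phrases(text, slang_terms):
    # Lowercase once so the search is case-insensitive.
    text = text.lower()
    detected = set()
    for term in slang_terms:
        # \b keeps 'tea' from matching inside 'team'; re.escape guards
        # against terms that contain regex metacharacters.
        pattern = r'\b' + re.escape(term) + r'\b'
        if re.search(pattern, text):
            detected.add(term)
    return detected

sample = "She was throwing shade at the whole team. No clap back, though."
print(sorted(detect_slang_phrases(sample, ['throwing shade', 'clap back', 'tea'])))
# ['clap back', 'throwing shade']
```

This scans the text once per term, which is fine for a short slang list but may be slow for a very large one.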

Detect slang words using NLTK

NLTK is a third-party package, so install it first. I installed it on my local system from the Command Prompt on Windows (use the Terminal on macOS/Linux):

pip install nltk

Once it is installed, import the tokenizer into your code:

from nltk.tokenize import word_tokenize

I have created a script to read the file, tokenize the contents, and detect slang words using NLTK.

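import nltk
from nltk.tokenize import word_tokenize

# Download the tokenizer data that word_tokenize relies on (needed once).
# Newer NLTK releases may ask for 'punkt_tab' instead of 'punkt'.
nltk.download('punkt')

# A set gives fast membership checks. The two-word phrases cannot match
# here, because the text is compared one token at a time.
slang_words = {
    'bruh', 'lit', 'fam', 'dope', 'bae', 'yolo', 'gucci', 'savage',
    'salty', 'thirsty', 'ghost', 'throwing shade', 'woke', 'fomo',
    'stan', 'slay', 'goat', 'sus', 'flex', 'tea', 'clap back', 'basic'
}

def detect_slang(file_path):
    # Read the file and lowercase it so matching is case-insensitive.
    with open(file_path, 'r', encoding='utf-8') as file:
        contents = file.read().lower()

    # Split the text into word and punctuation tokens.
    tokens = word_tokenize(contents)

    # Keep only alphanumeric tokens, which drops punctuation.
    words = [word for word in tokens if word.isalnum()]

    # Collect each slang word once.
    detected_slang = set()
    for word in words:
        if word in slang_words:
            detected_slang.add(word)

    return detected_slang

# 'sample.txt' is a placeholder path; replace it with your own file.
print("Detected slang words:", detect_slang('sample.txt'))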

Output: Detected slang words: {'lit', 'fam', 'dope'}
