Slang Word Detection in Python
Method 1: Detect Slang Words Using Built-in Functions
A predefined list of slang words:
slang_words = [
    'bruh', 'lit', 'fam', 'dope', 'bae', 'yolo', 'gucci', 'savage',
    'salty', 'thirsty', 'ghost', 'throwing shade', 'woke', 'fomo',
    'stan', 'slay', 'goat', 'sus', 'flex', 'tea', 'clap back', 'basic'
]
The below function opens the file specified by file_path, reads its contents, converts them to lowercase (to ensure case insensitivity), and returns the contents as a string.
def read_file(file_path):
    with open(file_path, 'r') as file:
        contents = file.read().lower()
    return contents
The below function splits the given text into individual words and returns them as a list.
def split_into_words(text):
    words = text.split()
    return words
The below function takes a list of words and a list of slang words, checks if any word from the list is in the slang words list, and adds the detected slang words to a set. It returns the set of detected slang words.
def detect_slang_words(words, slang_words):
    detected_slang = set()
    for word in words:
        if word in slang_words:
            detected_slang.add(word)
    # Note: multi-word entries such as 'throwing shade' or 'clap back'
    # can never match here, because each comparison is one token at a time.
    return detected_slang
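As a quick sanity check, the function can be called directly on a hand-made token list (the sample sentence below is invented for illustration). Note that since slang_words is a list, each membership test is a linear scan; converting it to a set first would make the lookups constant-time.

words = ['that', 'party', 'was', 'lit', 'bruh']
print(detect_slang_words(words, slang_words))
# prints a set containing 'lit' and 'bruh' (set order may vary)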
The below function combines the previous functions to read the file, split the contents into words, and detect slang words. It returns the detected slang words.
def detect_slang(file_path):
    contents = read_file(file_path)
    words = split_into_words(contents)
    detected_slang = detect_slang_words(words, slang_words)
    return detected_slang
Assume a text file file.txt that consists of a paragraph in which we want to detect slang words.
The below piece of code specifies the path to the file, detects slang words in the file, and prints the detected slang words.
file_path = 'file.txt'
detected_slang = detect_slang(file_path)
print("Detected slang words:", detected_slang)
The output is the set of slang words detected in the file; its exact contents depend on the paragraph in file.txt.
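To illustrate what a run might look like, the snippet below writes an assumed one-line paragraph to file.txt (the sentence is invented for this example) and runs the detector on it:

# Write an assumed sample paragraph for illustration.
with open('file.txt', 'w') as f:
    f.write('bruh that concert was lit and the whole fam showed up')

print("Detected slang words:", detect_slang('file.txt'))
# Detected slang words: {'bruh', 'lit', 'fam'}  (set order may vary)

Note that because split() only breaks on whitespace, a token like 'lit!' would keep its punctuation and fail to match 'lit'; Method 2 addresses this with proper tokenization.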
Method 2: Detect Slang Words Using NLTK
The following piece of code imports the NLTK library, the word_tokenize function for tokenization, and the string module (not used here, but often useful for text processing), and downloads the tokenizer models required by word_tokenize.
import nltk
from nltk.tokenize import word_tokenize
import string

nltk.download('punkt')
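To see why tokenization matters, compare word_tokenize with plain str.split() on a punctuated sentence (the sentence is our own example):

text = "that movie was lit!"
print(text.split())         # ['that', 'movie', 'was', 'lit!']
print(word_tokenize(text))  # ['that', 'movie', 'was', 'lit', '!']

word_tokenize separates the trailing '!' into its own token, so 'lit' can match the slang list, whereas the plain split leaves 'lit!' intact.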
A predefined list of slang words:
slang_words = [
    'bruh', 'lit', 'fam', 'dope', 'bae', 'yolo', 'gucci', 'savage',
    'salty', 'thirsty', 'ghost', 'throwing shade', 'woke', 'fomo',
    'stan', 'slay', 'goat', 'sus', 'flex', 'tea', 'clap back', 'basic'
]
The below function opens the file specified by file_path, reads its contents, converts them to lowercase, tokenizes the contents into words using NLTK’s word_tokenize, filters out non-alphanumeric tokens, and detects slang words by checking each token against the slang words list. It returns the detected slang words.
def detect_slang(file_path):
    with open(file_path, 'r') as file:
        contents = file.read().lower()
    tokens = word_tokenize(contents)
    # Keep only alphanumeric tokens, dropping punctuation tokens like '!' or ','.
    words = [word for word in tokens if word.isalnum()]
    detected_slang = set()
    for word in words:
        if word in slang_words:
            detected_slang.add(word)
    return detected_slang
Assume a text file file.txt that consists of a paragraph in which we want to detect slang words.
The below piece of code specifies the path to the file, detects slang words in the file, and prints the detected slang words.
file_path = 'file.txt'
detected_slang = detect_slang(file_path)
print("Detected slang words:", detected_slang)
The output is again the set of slang words detected in the file, depending on the contents of file.txt.
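As with Method 1, a self-contained run with assumed file contents (invented for this example) shows the difference tokenization makes; here each slang word is followed by punctuation:

# Write an assumed sample paragraph for illustration.
with open('file.txt', 'w') as f:
    f.write('That plot twist was so sus! Nobody expected it, bruh.')

print("Detected slang words:", detect_slang('file.txt'))
# Detected slang words: {'sus', 'bruh'}  (set order may vary)

Method 1 would have produced the tokens 'sus!' and 'bruh.' and detected nothing.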
Summary
- Method 1 uses built-in functions for file reading, text splitting, and slang detection.
- Method 2 uses NLTK for more advanced text processing, specifically tokenization.
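One limitation shared by both methods: multi-word entries in the list, such as 'throwing shade' and 'clap back', can never be detected, because every comparison is made one token at a time. A minimal sketch of one way to handle two-word phrases, assuming a token list produced by either method, is to check adjacent token pairs (bigrams):

def detect_slang_phrases(words, slang_words):
    # Collect the multi-word entries from the slang list.
    phrases = {p for p in slang_words if ' ' in p}
    detected = set()
    # Join each pair of adjacent tokens and test it against the phrases.
    for first, second in zip(words, words[1:]):
        candidate = first + ' ' + second
        if candidate in phrases:
            detected.add(candidate)
    return detected

This sketch only covers two-word phrases; longer phrases would need correspondingly longer n-grams.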