Find repeated words in a string in Python

In this tutorial, we will learn how to find the duplicate words in a string and display them.

Here are the steps to find repeated words in a string in Python:

Step1: Split the string into a list of words.

Step2: Create a set to store unique words.

Step3: Create another set to store duplicate words.

Step4: Use a loop, typically a for loop in Python, to iterate over each word in the list.

Step5: Check whether the current word is already in the set of unique words. If it is, the word is a duplicate, so add it to the set of duplicate words.

  • Add the word to the set of unique words: If the current word is not in the set of unique words, add it to the set.
  • Display the duplicate words: After iterating through all the words in the input string, the set of duplicate words will contain all the repeated words. You can then display these duplicate words.
Example: "Buy apples, oranges, bananas, mangoes. In this list she loves apples."

To identify duplicate words in the string, we first break the string down into individual words and then count the occurrences of each word. Any word that occurs more than once is a duplicate.

In the above example, the word “apples” is repeated, indicating a duplicate word in the list.
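The split-and-count idea can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper name split_words is hypothetical, the example text is written here as a single string, and the sketch assumes that surrounding punctuation should be stripped (otherwise "apples," and "apples." would count as different words) and that matching is case-sensitive.

```python
import string

def split_words(text):
    """Split on whitespace and strip leading/trailing punctuation from each token."""
    tokens = (token.strip(string.punctuation) for token in text.split())
    return [token for token in tokens if token]

text = "Buy apples, oranges, bananas, mangoes. In this list she loves apples."
words = split_words(text)

# Keep each word that occurs more than once, in order of first appearance.
repeated = []
for word in words:
    if words.count(word) > 1 and word not in repeated:
        repeated.append(word)

print(repeated if repeated else "No Repetition")  # prints ['apples']
```

Calling words.count() inside the loop rescans the list for every word, which is fine for a short sentence; the methods below count in a single pass.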

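Input: "I love ice cream, whether it's vanilla or chocolate, but I especially love ice cream on hot summer days."
Output:  I, love, ice, cream


Input: "The sun shines brightly in the sky."
Output:  No Repetition

Here punctuation is ignored and matching is case-sensitive, so "The" and "the" are treated as different words.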

Ways to find repeated words in a string in Python

There are several ways to find repeated words in a string in Python. Here are a few common methods:

  • Using a Dictionary
  • Using Sets
  • Using Collections
  • Using Regular Expressions

Each method has its advantages and may be more suitable depending on the specific requirements and characteristics of the input string.

Using Dictionary

To find repeated words in a string using a dictionary in Python, divide the string into individual words, then go through each word one by one. Use a dictionary to keep track of how many times each word appears. If a word's count in the dictionary is greater than one, the word is repeated within the string.

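def find_repeated_words(input_string):
    # Map each word to the number of times it has been seen so far.
    word_count = {}
    # split() with no arguments splits on any run of whitespace.
    for word in input_string.split():
        word_count[word] = word_count.get(word, 0) + 1
    # Keep the words seen more than once, in order of first appearance.
    return [word for word, count in word_count.items() if count > 1]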

sentence = "Sara had been claiming that she had been present at the event"
repeated_words = find_repeated_words(sentence)
print("Repeated words:", repeated_words)
OUTPUT:
Repeated words: ['had', 'been']




Using Sets

In this approach, divide the string into individual words and add them to a set one at a time. As you go through each word, if it is already present in the set, it has been repeated.
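A minimal sketch of the set-based approach, assuming whitespace splitting and case-sensitive matching. The function name find_repeated_words_with_sets and the set names seen and repeated are illustrative choices, not part of any library.

```python
def find_repeated_words_with_sets(input_string):
    seen = set()      # every word encountered so far
    repeated = set()  # words encountered at least twice
    for word in input_string.split():
        if word in seen:
            repeated.add(word)
        else:
            seen.add(word)
    return repeated

sentence = "Sara had been claiming that she had been present at the event"
# Sets are unordered, so sort the result for a stable display.
print("Repeated words:", sorted(find_repeated_words_with_sets(sentence)))
# prints: Repeated words: ['been', 'had']
```

Because the result is a set, each repeated word appears once no matter how many times it occurs, and the number of occurrences is not recorded.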

Using Collections

You can use the Counter class from the collections module, which is a subclass of dict, to count the occurrences of each word in the string. Words with a count greater than one are repeated.
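A minimal sketch using Counter, again assuming whitespace splitting and case-sensitive matching. The function name find_repeated_words_with_counter is illustrative; returning the counts alongside the words is a design choice made here, not a requirement.

```python
from collections import Counter

def find_repeated_words_with_counter(input_string):
    # Counter builds the word -> occurrences mapping in one call.
    counts = Counter(input_string.split())
    # Keep only the words that occur more than once, with their counts.
    return {word: count for word, count in counts.items() if count > 1}

sentence = "Sara had been claiming that she had been present at the event"
print(find_repeated_words_with_counter(sentence))
# prints: {'had': 2, 'been': 2}
```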

Using Regular Expressions

Regular expressions can also be used to find repeated words in a string. You can use the re.findall() function to find all occurrences of words and then count them.
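A minimal sketch of the regular-expression approach. The function name, the sample sentence, and the pattern are assumptions made for illustration: the pattern treats a word as a run of word characters and apostrophes, and the text is lowercased first so that "The" and "the" match (drop .lower() for case-sensitive matching).

```python
import re
from collections import Counter

def find_repeated_words_with_regex(input_string):
    # [\w']+ matches runs of letters, digits, underscores, and apostrophes,
    # so commas and full stops are left out of the extracted words.
    words = re.findall(r"[\w']+", input_string.lower())
    counts = Counter(words)
    return [word for word, count in counts.items() if count > 1]

text = "The event was long, but the event was fun."
print(find_repeated_words_with_regex(text))
# prints: ['the', 'event', 'was']
```

Unlike a plain split(), this version is not thrown off by punctuation attached to a word, which is the main reason to reach for a regular expression here.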

Differences between the methods for finding repeated words in a string in Python

The dictionary, set, collections, and regular expression approaches differ in complexity and efficiency as follows:

Using a Dictionary:

  • This method involves going through every word in the string and keeping track of how many times each word appears by using a dictionary.
  • Words with counts greater than 1 are considered repeated.
  • This method is easy to understand. Dictionary lookups and updates take constant time on average, so the running time grows in proportion to the number of words; the cost is extra memory for one counter per distinct word.

Using Sets:

  • This technique includes breaking down the string into words and storing them in a set.
  • During iteration through the words, if a word is already present in the set, it’s recognized as repeated.
  • Sets automatically handle uniqueness and offer constant-time membership tests on average, so this method stays efficient even for long strings. It records only whether a word is repeated, not how many times.

Using Collections:

  • This method often employs the Counter class from the collections module, a specialized version of a dictionary.
  • The Counter class simplifies the process of counting item occurrences in an iterable, offering a handy choice for finding repeated words.
  • It merges the simplicity of a dictionary-based approach with the specialized features provided by the Counter class.

Using Regular Expressions:

  • Regular expressions present a potent tool for pattern matching in strings.
  • By leveraging the re.findall() function, it’s feasible to identify all instances of words and then count them.
  • Although more intricate and less straightforward compared to other methods, it offers flexibility for addressing complex patterns or requirements.

Purpose of finding duplicate words in a string in Python

  • Before performing data processing, it’s important to clean and preprocess the data. Identifying duplicate words allows us to remove redundancy and ensure data accuracy.
  • Removing duplicate words can optimize memory and storage usage, especially in applications dealing with large volumes of text data.
  • By eliminating duplicates, we can reduce the size of data structures and improve computational efficiency.
  • Eliminating duplicate values can also improve the performance of algorithms that run on the data.
