Python “Collections” Module

This tutorial explains Python’s collections module in detail. The ‘collections’ module is a powerful utility that provides specialized container datatypes beyond the built-in data structures (such as lists, dictionaries, sets, and tuples).

It offers additional data structures such as:

  • Counter
  • defaultdict
  • deque
  • namedtuple
  • OrderedDict
  • ChainMap

These are some of the container types the collections module provides.

Counter

The “Counter” class in Python’s collections module counts how often each element occurs in a collection. It’s essentially a specialized dictionary (a dict subclass) designed for counting hashable objects. Here’s a breakdown of its key features:

  1. Counting Elements: The primary purpose of Counter is to count the occurrences of elements in a collection, typically an iterable such as a list or a string. For instance,
    from collections import Counter
    mylist = [1, 1, 2, 3, 4, 4, 5, 5, 5]
    counts = Counter(mylist)  # tallies every element in one pass
    print(counts[5])  # Output: 3
  2. Dictionary-Like Interface: Counter behaves like a dictionary, where the elements are stored as keys and their counts as values. For example,
    from collections import Counter
    
    # Create a Counter object by passing a list of elements
    counts = Counter(['dog', 'cat', 'dog', 'fox', 'cat', 'dog'])
    
    # Access the counts of specific elements using square brackets
    print(counts['dog'])  # Output: 3
    print(counts['cat'])  # Output: 2
    
  3. Arithmetic Operations: Counter supports arithmetic operations like addition, subtraction, intersection, and union.
    count1 = Counter({'a': 3, 'b': 1})
    count2 = Counter({'a': 1, 'b': 2})
    
    # Addition
    print(count1 + count2)  # Output: Counter({'a': 4, 'b': 3})
    
    # Subtraction
    print(count1 - count2)  # Output: Counter({'a': 2})
    
    # Intersection (minimum of corresponding counts)
    print(count1 & count2)  # Output: Counter({'a': 1, 'b': 1})
    
    # Union (maximum of corresponding counts)
    print(count1 | count2)  # Output: Counter({'a': 3, 'b': 2})
    
  4. Useful Methods: Counter provides additional methods such as most_common() to retrieve the most common elements and elements() to return an iterator over the elements repeated according to their counts.
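As a brief sketch of these two methods (the sample words are made up for illustration), most_common(n) returns the n highest counts as (element, count) pairs, and elements() yields each element as many times as its count:

```python
from collections import Counter

# Illustrative data, not taken from any real dataset
counts = Counter(['dog', 'cat', 'dog', 'fox', 'cat', 'dog'])

# The two most frequent elements as (element, count) pairs
top_two = counts.most_common(2)
print(top_two)  # Output: [('dog', 3), ('cat', 2)]

# Each element repeated according to its count (sorted here for a stable display)
expanded = sorted(counts.elements())
print(expanded)  # Output: ['cat', 'cat', 'dog', 'dog', 'dog', 'fox']

# Unlike a regular dict, looking up a missing element returns 0 instead of raising KeyError
missing = counts['cow']
print(missing)  # Output: 0
```

Calling most_common() with no argument returns all elements, ordered from the most to the least common.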

defaultdict

A defaultdict is a specialized dictionary-like container provided by Python’s collections module. It’s similar to the built-in dict type, but with one key difference: it automatically creates missing keys and initializes their values based on a default factory function provided by the user.

  1. Initialization: When you create a defaultdict, you provide it with a default factory function that defines the initial value for any missing key. This factory can be any callable that takes no arguments, such as int, list, a function, or a lambda expression. If no factory is provided, default_factory is None and accessing a missing key raises a KeyError, just as with a regular dictionary.
  2. Automatic Key Creation: When you look up a key that doesn’t exist in the defaultdict using square brackets, instead of raising a KeyError as a regular dictionary would, a new key-value pair is automatically created. The value for the new key is initialized by calling the default factory function. Methods such as get() do not trigger the factory.
  3. Use Cases: defaultdict is particularly useful in scenarios where you need to handle missing keys gracefully, without having to explicitly check for their existence before accessing or modifying them. It simplifies code and makes it more concise.
from collections import defaultdict

# Sample string
text = "hello codespeedy"

# Create a defaultdict with default factory as int (defaults to 0)
char_count = defaultdict(int)

# Count the occurrences of each character in the string
for char in text:
    char_count[char] += 1

# Print the character count
for char, count in sorted(char_count.items()):
    print(f"Character '{char}' occurs {count} times.")

In the above code, we iterate directly over the characters in the string and update the counts in the char_count defaultdict. Then we print the character counts in sorted order.

The output:

Character ' ' occurs 1 times.
Character 'c' occurs 1 times.
Character 'd' occurs 2 times.
Character 'e' occurs 4 times.
Character 'h' occurs 1 times.
Character 'l' occurs 2 times.
Character 'o' occurs 2 times.
Character 'p' occurs 1 times.
Character 's' occurs 1 times.
Character 'y' occurs 1 times.
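The factory does not have to be int. As a small sketch, using list as the factory groups values under each key without checking whether the key exists first. The pets data below is hypothetical and only serves as an illustration; the last part shows what happens when no factory is given.

```python
from collections import defaultdict

# Hypothetical sample data: (animal, name) pairs
pets = [('dog', 'Rex'), ('cat', 'Tom'), ('dog', 'Fido')]

# list() is called for each missing key, so every new key starts as []
names_by_animal = defaultdict(list)
for animal, name in pets:
    names_by_animal[animal].append(name)

print(dict(names_by_animal))  # Output: {'dog': ['Rex', 'Fido'], 'cat': ['Tom']}

# Without a factory, a missing key raises KeyError just like a regular dict
no_factory = defaultdict()
try:
    no_factory['missing']
    raised = False
except KeyError:
    raised = True
print(raised)  # Output: True
```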

deque

The deque class (short for “double-ended queue”) is part of Python’s collections module. It supports fast additions and removals at either end, which makes it well suited to implementing queues and stacks.

Some key features of deque:

  • Fast Operations
  • Memory Efficiency
  • Thread Safety
  • Versatility

from collections import deque

# Initialize a deque
queue = deque()

# Enqueue elements
queue.append(1)
queue.append(2)
queue.append(3)

# Dequeue elements
print(queue.popleft())  # Output: 1
print(queue.popleft())  # Output: 2

# Current queue
print(queue)  # Output: deque([3])
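The example above only uses one direction. The following sketch, with arbitrary sample numbers, shows a deque being used from both ends, rotated, and bounded with the optional maxlen argument:

```python
from collections import deque

# Add and remove items at both ends
d = deque([1, 2, 3])
d.appendleft(0)      # deque([0, 1, 2, 3])
d.append(4)          # deque([0, 1, 2, 3, 4])
last = d.pop()       # removes 4 from the right end
first = d.popleft()  # removes 0 from the left end
print(first, last, d)  # Output: 0 4 deque([1, 2, 3])

# Rotate one step to the right: the last item moves to the front
d.rotate(1)
print(d)  # Output: deque([3, 1, 2])

# A bounded deque discards items from the opposite end once it is full
recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)
print(recent)  # Output: deque([2, 3, 4], maxlen=3)
```

A bounded deque like this is a convenient way to keep only the most recent items, such as the last few lines of a log.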

namedtuple

namedtuple is a factory function in Python’s collections module that creates tuple subclasses with named fields. The result behaves like a regular tuple, but you can access the fields using dot notation as well as indexing. It is great for creating lightweight, immutable data structures.

from collections import namedtuple

Person = namedtuple('Person', ['name', 'age', 'city'])

person1 = Person('Alice', 25, 'New York')
person2 = Person('Bob', 30, 'San Francisco')

print(person1.name)
print(person2.age)
print(person1.city)

The output:

Alice
30
New York
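To illustrate the "immutable" part, here is a short sketch reusing the same Person fields. It shows that a namedtuple still works like a tuple, that assigning to a field fails, and that _replace() and _asdict() give you a modified copy or a dictionary instead:

```python
from collections import namedtuple

Person = namedtuple('Person', ['name', 'age', 'city'])
alice = Person('Alice', 25, 'New York')

# Still a tuple: indexing and unpacking work as usual
name, age, city = alice
print(alice[0], age)  # Output: Alice 25

# Fields cannot be reassigned
try:
    alice.age = 26
    mutated = True
except AttributeError:
    mutated = False
print(mutated)  # Output: False

# _replace() returns a new object and leaves the original untouched
older = alice._replace(age=26)
print(older)  # Output: Person(name='Alice', age=26, city='New York')

# _asdict() converts the fields to a mapping (wrapped in dict() for a uniform display)
as_dict = dict(alice._asdict())
print(as_dict)  # Output: {'name': 'Alice', 'age': 25, 'city': 'New York'}
```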

OrderedDict

OrderedDict is a dictionary subclass offered by Python’s collections module. It behaves like a regular dictionary and maintains the order of the keys as they were inserted, with extra methods for rearranging that order. This can be helpful when the order of elements in your dictionary matters.

from collections import OrderedDict

# Create an empty OrderedDict
my_dict = OrderedDict()

# Add key-value pairs to the OrderedDict
my_dict['dog'] = 3
my_dict['cat'] = 2
my_dict['cow'] = 5

# Print the OrderedDict
print(my_dict)

The output (Python 3.12 and later print OrderedDict({'dog': 3, 'cat': 2, 'cow': 5}) instead):

OrderedDict([('dog', 3), ('cat', 2), ('cow', 5)])

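The order of the keys is preserved in the OrderedDict. Note that since Python 3.7 a regular dictionary also preserves insertion order, so OrderedDict is mainly useful for its extra features: the move_to_end() method, popitem(last=False), and equality comparisons that take order into account.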
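A short sketch of the operations that set OrderedDict apart from a plain dict, reusing the same animal counts as sample data:

```python
from collections import OrderedDict

od = OrderedDict([('dog', 3), ('cat', 2), ('cow', 5)])

# Move an existing key to the end
od.move_to_end('dog')
print(list(od))  # Output: ['cat', 'cow', 'dog']

# Move an existing key to the front
od.move_to_end('cow', last=False)
print(list(od))  # Output: ['cow', 'cat', 'dog']

# Remove and return the first (oldest) item
oldest = od.popitem(last=False)
print(oldest)  # Output: ('cow', 5)

# Equality between two OrderedDicts is order-sensitive; plain dicts ignore order
ordered_equal = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
plain_equal = {'a': 1, 'b': 2} == {'b': 2, 'a': 1}
print(ordered_equal, plain_equal)  # Output: False True
```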

ChainMap

A ChainMap is a class provided by Python’s collections module that groups multiple dictionaries into a single dictionary-like view. Lookups search the underlying dictionaries in order and return the first match, while writes, updates, and deletions affect only the first dictionary. The underlying dictionaries are referenced rather than copied, so changes to them show up in the ChainMap.

from collections import ChainMap

# Create two dictionaries
dictionary1 = {'dog': 3, 'cat': 2}
dictionary2 = {'cow': 5, 'fox': 4}

# Create a ChainMap with the dictionaries
combined_dict = ChainMap(dictionary1, dictionary2)

# Access and modify the combined dictionary
print(combined_dict['dog'])
print(combined_dict['cow'])

combined_dict['cat'] = 1
print(combined_dict['cat'])

# Accessing a key not present in the first dictionary falls back to the second dictionary
print(combined_dict['fox']) 

The output:

3
5
1
4
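A common use of this lookup order is layering overrides on top of defaults. The sketch below uses hypothetical settings dictionaries (the names defaults and overrides are made up for illustration) to show which dictionary wins a lookup and where writes end up:

```python
from collections import ChainMap

# Hypothetical settings, for illustration only
defaults = {'color': 'blue', 'size': 'medium'}
overrides = {'color': 'red'}

# Earlier dictionaries take priority during lookups
settings = ChainMap(overrides, defaults)
print(settings['color'])  # Output: red
print(settings['size'])   # Output: medium

# Writes go to the first dictionary only; the later ones are left untouched
settings['size'] = 'large'
print(overrides)  # Output: {'color': 'red', 'size': 'large'}
print(defaults)   # Output: {'color': 'blue', 'size': 'medium'}

# new_child() adds a new first layer without changing the existing ChainMap
child = settings.new_child({'color': 'green'})
print(child['color'])     # Output: green
print(settings['color'])  # Output: red
```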

To summarize, the Python ‘collections’ module provides specialized data structures beyond the standard containers. It enhances efficiency and functionality with types like Counter, defaultdict, deque, and more. These structures offer solutions for common programming tasks, such as counting elements, handling missing keys, and managing multiple dictionaries. The module’s versatility and ease of use make it an essential tool for developers across various domains.

 
