Python Program to Find Duplicate Sets in a List of Sets

In this article, we'll examine a variety of Python programs that let us quickly spot duplicate sets among a list of sets. To complete this task, we will make use of Python's robust set operations and functional programming features. We will also go over several techniques and approaches for dealing with duplicates according to particular needs.

In this article, we will discover effective methods for comparing sets for equality, identifying duplicate sets based on their elements, and removing or modifying duplicate sets from a list of sets.

Introduction to the Problem:

The given list of sets may contain sets with duplicate elements, and the task is to efficiently identify these duplicate sets based on their contents and take appropriate actions, such as removing or modifying them.

The problem can be divided into the following essential steps:

  • Taking a list of sets as input, each of which may include either unique or redundant entries.
  • Comparing the sets in the list to find duplicate sets based on their elements, keeping in mind that sets are unordered, so the arrangement of elements within a set does not matter.
  • Developing a strategy or algorithm for dealing with duplicate sets, such as deleting or changing them in accordance with particular needs.
  • Returning the final list of sets, with duplicate sets properly handled in accordance with the specified approach.

Python program to remove duplicate sets from a list of sets: Using frozenset() and set comprehension

We can use the set() function, which removes duplicates from a list, to remove the duplicate sets from Set_List. However, since sets themselves are not hashable, we first need to convert each set in Set_List to a frozenset object, which is hashable.
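The original code listing is not reproduced in this text, so the following is a minimal sketch of the approach; the sample Set_List is an assumption inferred from the output below, and the printed order of the frozensets may vary from run to run.

    # Sample data (assumed; the original list is not shown in the article)
    Set_List = [{1, 2, 3}, {5, 6, 7}, {1, 2, 3}, {9, 2, 4, 7}, {5, 6, 7}, {4, 5, 6, 7}]

    # Convert each set to a hashable frozenset inside a set comprehension;
    # the outer set automatically discards duplicate frozensets
    unique_sets = list({frozenset(s) for s in Set_List})

    print("Unique Sets:", unique_sets)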

Output:

Unique Sets: [frozenset({1, 2, 3}), frozenset({5, 6, 7}), frozenset({9, 2, 4, 7}), frozenset({4, 5, 6, 7})]

Now let's get started and explore the Python programs that find duplicate sets in a list of sets.

Method 1 - Using for-Loop and frozenset()

In this method, we will iterate over each set in Set_List and find the duplicate sets using the set lookup operation. If we find any duplicate set, we will store it in a new list.
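The original listing is missing here, so the code below is a minimal sketch following the description in this section; the sample Set_List is an assumption chosen so that {1, 2, 3} and {5, 6, 7} each appear twice, matching the output shown.

    # Sample data (assumed; chosen so that {1, 2, 3} and {5, 6, 7} appear twice)
    Set_List = [{1, 2, 3}, {5, 6, 7}, {9, 2, 4, 7}, {1, 2, 3}, {4, 5, 6, 7}, {5, 6, 7}]

    seen_sets = set()       # frozensets encountered so far
    duplicate_sets = []     # sets that occur more than once

    for s in Set_List:
        frozenset_s = frozenset(s)      # hashable version of the current set
        if frozenset_s in seen_sets:    # already seen -> s is a duplicate
            duplicate_sets.append(s)
        else:
            seen_sets.add(frozenset_s)

    if duplicate_sets:
        print("Duplicate sets found:", duplicate_sets)
    else:
        print("No duplicate sets found.")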

Output:

Duplicate sets found: [{1, 2, 3}, {5, 6, 7}]

In this program, we create an empty set called seen_sets to keep track of sets we've seen before and duplicate_sets to store duplicate sets. We then iterate over each set s in Set_List.

For each set, we create a frozenset object 'frozenset_s', which is hashable. We then check if frozenset_s is already in seen_sets. If it is, then s is a duplicate set, and we append it to duplicate_sets. If it isn't, we add frozenset_s to seen_sets to check for duplicates in future iterations.

Finally, if duplicate_sets is not empty, we print out each duplicate set. Otherwise, we print out a message indicating no duplicate sets were found.

Time Complexity: O(n) - The program uses a single for-loop to iterate through each set in Set_List, so the loop takes O(n) time, where n is the number of sets in Set_List. Within the loop, the set lookup that checks whether a frozenset has already been seen takes O(1) time on average.

Space Complexity: O(m) - where m is the number of unique sets. Since each set is stored as a frozenset in seen_sets, the maximum size of seen_sets equals the number of unique sets in the input list. In the worst case this is O(n), when m equals n, and in the best case it is O(1), when m equals 1.

Method 2 - Using Counter() from collections

In this method, we will use Counter from collections to count the occurrences of each set. Then, we will filter out the sets with an occurrence count of more than 1 and store them in duplicate_sets as usual.
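The code listing is not included in this text, so here is a sketch of the Counter-based approach; the sample Set_List is an assumption picked to produce the output shown below.

    from collections import Counter

    # Sample data (assumed; the original list is not shown in the article)
    Set_List = [{1, 2, 3}, {9, 2, 4, 7}, {5, 6, 7}, {1, 2, 3},
                {9, 2, 4, 7}, {4, 5, 6, 7}, {5, 6, 7}]

    # Convert each set to a frozenset so it can be used as a dictionary key
    frozen_sets = [frozenset(s) for s in Set_List]

    # Count how many times each frozenset occurs
    counts = Counter(frozen_sets)

    # Keep the sets that occur more than once, converted back to plain sets
    duplicate_sets = [set(fs) for fs, count in counts.items() if count > 1]

    if duplicate_sets:
        print("Duplicate sets found:", duplicate_sets)
    else:
        print("No duplicate sets found.")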

Output:

Duplicate sets found: [{1, 2, 3}, {9, 2, 4, 7}, {5, 6, 7}]

In this method, we have used a list comprehension to convert each set into a frozenset. This is necessary because sets are not hashable and cannot be used as keys in a dictionary. We then passed this list of frozensets to the Counter class, which returns a Counter object with each frozenset as a key and its count as the value.

We then filtered out the keys with a count greater than 1 and stored them in duplicate_sets after converting them back into regular sets. At the end of the program, we printed the result.

Time Complexity: O(n) - The list comprehension and Counter() each iterate over set_list once and perform constant-time hashing and lookups on the frozensets, where n is the number of sets in set_list.

Space Complexity: O(m) - where m is the number of unique sets. The Counter object stores the count of each unique set, so the space it requires depends on the number of unique sets. The duplicate_sets list also requires some space, which is at most proportional to m, so the overall space complexity of the program is O(m).

Method 3 - Using defaultdict() from collections

In this method, we will use defaultdict from collections to keep track of the number of occurrences of each set in set_list. After counting the frequency, we will use list comprehension to filter out duplicate sets.
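Again, the original listing is not included here; the sketch below assumes a sample set_list in which {1, 12, 5} and {4, 5, 6, 7} each occur twice, matching the output shown.

    from collections import defaultdict

    # Sample data (assumed; the original list is not shown in the article)
    set_list = [{1, 12, 5}, {4, 5, 6, 7}, {3, 8}, {1, 12, 5}, {4, 5, 6, 7}]

    # defaultdict(int) starts every missing key at 0, so we can count directly
    set_counts = defaultdict(int)
    for s in set_list:
        set_counts[frozenset(s)] += 1

    # List comprehension keeps the sets whose frequency is greater than 1
    duplicate_sets = [set(fs) for fs, count in set_counts.items() if count > 1]

    if duplicate_sets:
        print("Duplicate sets found:", duplicate_sets)
    else:
        print("No duplicate sets found.")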

Output:

Duplicate sets found: [{1, 12, 5}, {4, 5, 6, 7}]

In the above program, we have created a defaultdict object, set_counts (defaultdict is a subclass of dict in Python), to keep track of the frequency of each set in set_list. We iterate over each set s in set_list and increase its frequency count by 1 using set_counts[frozenset(s)] += 1.

After that, we used a list comprehension to filter out the sets with a frequency count greater than 1, which indicates duplicate sets. Finally, we printed the result.

Time Complexity: O(n) - Here also, we iterate over set_list to count frequencies, which takes O(n) time, and we iterate over the counted entries again in the list comprehension, which also takes O(n) time. Overall, the time complexity is O(n) + O(n) = O(n), where n is the number of sets in set_list.

Space Complexity: O(m) - where m is the number of unique sets in set_list. The set_counts dictionary stores the frequency of each unique set, and the duplicate_sets list stores the duplicate sets. In the worst case, when m = n (all the sets are unique), the space complexity rises to O(n); in the best case, when all the sets are duplicates of a single set (m = 1), the space complexity becomes constant, O(1).

Method 4 - Using Hashing with Dictionary

Hashing refers to mapping an object to a fixed-size integer value (its hash) using a hash function. Since plain sets are not hashable, we first convert each set in set_list to an immutable frozenset and use its hash value as a key in a dictionary that records how many times each set occurs.
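The listing is not shown in this text, so here is a minimal sketch of that idea; the sample set_list is an assumption, and using raw hash values as dictionary keys assumes no hash collisions between distinct sets.

    # Sample data (assumed; the original list is not shown in the article)
    set_list = [{11, 22, 23}, {4, 5, 6, 7}, {11, 22, 23}, {2, 9}, {4, 5, 6, 7}]

    # Map the hash of each set (via frozenset, since plain sets are unhashable)
    # to the number of times that set occurs; assumes no hash collisions
    freq_dict = {}
    for s in set_list:
        key = hash(frozenset(s))
        freq_dict[key] = freq_dict.get(key, 0) + 1

    # Second pass: collect each set whose frequency is greater than 1,
    # recording each distinct duplicate only once
    duplicate_sets = []
    added = set()
    for s in set_list:
        key = hash(frozenset(s))
        if freq_dict[key] > 1 and key not in added:
            duplicate_sets.append(s)
            added.add(key)

    if duplicate_sets:
        print("Duplicate sets found:", duplicate_sets)
    else:
        print("No duplicate sets found.")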

Output:

Duplicate sets found: [{11, 22, 23}, {4, 5, 6, 7}]

In this method, we use a dictionary to store the hash value of each set along with its frequency. We first iterate over each set in set_list and get its hash value by applying the hash() function to its frozenset form (plain sets cannot be hashed directly). Then we check whether that hash value is already in the dictionary. If it is, we increase the frequency of the set; otherwise, we set the frequency to 1.

We then iterate over the sets in set_list a second time and check each set's frequency. A set with a frequency greater than one is a duplicate and is added to the duplicate_sets list (once per distinct set). At the end of the program, we print the result.

Time Complexity: O(n) - In the above program, we iterate over set_list twice, O(n) + O(n) = O(n), where n is the number of sets in set_list. The remaining operations are performed in constant time.

Space Complexity: O(n) - The size of the freq_dict and duplicate_sets list is proportional to the size of the set_list, O(n) + O(n) = O(n), where n is the number of sets in the set_list.






