Python Program to Find Duplicate Sets in a List of Sets

In this article, we'll examine a variety of Python programs that let us quickly spot duplicate sets among a list of sets. To complete this task, we will make use of Python's robust set operations and functional programming features. We will also go over several techniques and approaches for dealing with duplicates according to particular needs.

In this article, we will discover effective methods for comparing sets for equality, identifying duplicate sets based on their elements, and removing or modifying duplicate sets from a list of sets.

Introduction to the Problem:

The given list of sets may contain sets with duplicate elements, and the task is to efficiently identify these duplicate sets based on their contents and take appropriate actions, such as removing or modifying them.

The problem can be divided into the following essential steps:

  • Taking a list of sets as input, each of which may include either unique or redundant entries.
  • Comparing the sets in the list to find duplicate sets based on their elements, keeping in mind that sets are unordered, so the arrangement of elements within a set does not matter.
  • Developing a strategy or algorithm for dealing with duplicate sets, such as deleting or changing them in accordance with particular needs.
  • Returning the final list of sets, with duplicate sets properly handled in accordance with the specified approach.

Python program to remove duplicate sets from a list of sets: Using frozenset() and set comprehension

We can use the set() function, which removes duplicates from a list, to remove the duplicate sets from Set_List. However, since sets themselves are not hashable, we first need to convert each set in Set_List to a frozenset object, which is hashable.
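The original code listing is not reproduced in this text, so the following is a minimal sketch of the approach; the sample Set_List is an assumption inferred from the output below, and the printed order of the frozensets may vary from run to run.

    # Sample data (assumed; the original list is not shown in the article)
    Set_List = [{1, 2, 3}, {5, 6, 7}, {1, 2, 3}, {9, 2, 4, 7}, {5, 6, 7}, {4, 5, 6, 7}]

    # Convert each set to a hashable frozenset inside a set comprehension;
    # the outer set automatically discards duplicate frozensets
    unique_sets = list({frozenset(s) for s in Set_List})

    print("Unique Sets:", unique_sets)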

Output:

Unique Sets: [frozenset({1, 2, 3}), frozenset({5, 6, 7}), frozenset({9, 2, 4, 7}), frozenset({4, 5, 6, 7})]

Now let's get started and explore the Python programs that find duplicate sets in a list of sets.

Method 1 - Using for-Loop and frozenset()

In this method, we will iterate over each set in Set_List and find the duplicate sets using the set lookup operation. If we find any duplicate set, we will store it in a new list.
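The original listing is missing here, so the code below is a minimal sketch following the description in this section; the sample Set_List is an assumption chosen so that {1, 2, 3} and {5, 6, 7} each appear twice, matching the output shown.

    # Sample data (assumed; chosen so that {1, 2, 3} and {5, 6, 7} appear twice)
    Set_List = [{1, 2, 3}, {5, 6, 7}, {9, 2, 4, 7}, {1, 2, 3}, {4, 5, 6, 7}, {5, 6, 7}]

    seen_sets = set()       # frozensets encountered so far
    duplicate_sets = []     # sets that occur more than once

    for s in Set_List:
        frozenset_s = frozenset(s)      # hashable version of the current set
        if frozenset_s in seen_sets:    # already seen -> s is a duplicate
            duplicate_sets.append(s)
        else:
            seen_sets.add(frozenset_s)

    if duplicate_sets:
        print("Duplicate sets found:", duplicate_sets)
    else:
        print("No duplicate sets found.")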

Output:

Duplicate sets found: [{1, 2, 3}, {5, 6, 7}]

In this program, we create an empty set called seen_sets to keep track of sets we've seen before and duplicate_sets to store duplicate sets. We then iterate over each set s in Set_List.

For each set, we create a frozenset object 'frozenset_s', which is hashable. We then check if frozenset_s is already in seen_sets. If it is, then s is a duplicate set, and we append it to duplicate_sets. If it isn't, we add frozenset_s to seen_sets to check for duplicates in future iterations.

Finally, if duplicate_sets is not empty, we print out each duplicate set. Otherwise, we print out a message indicating no duplicate sets were found.

Time Complexity: O(n) - The program uses a single for-loop to iterate through each set in Set_List, so the loop takes O(n) time, where n is the number of sets in Set_List. Within the loop, the set lookup that checks whether a frozenset has already been seen takes O(1) time on average.

Space Complexity: O(m) - where m is the number of unique sets. Since each set is stored as a frozenset in seen_sets, the maximum size of seen_sets equals the number of unique sets in the input list. In the worst case this is O(n), when m equals n, and in the best case it is O(1), when m equals 1.

Method 2 - Using Counter() from collections

In this method, we will use Counter from collections to count the occurrences of each set. Then, we will filter out the sets with an occurrence count of more than 1 and store them in duplicate_sets as usual.
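The code listing is not included in this text, so here is a sketch of the Counter-based approach; the sample Set_List is an assumption picked to produce the output shown below.

    from collections import Counter

    # Sample data (assumed; the original list is not shown in the article)
    Set_List = [{1, 2, 3}, {9, 2, 4, 7}, {5, 6, 7}, {1, 2, 3},
                {9, 2, 4, 7}, {4, 5, 6, 7}, {5, 6, 7}]

    # Convert each set to a frozenset so it can be used as a dictionary key
    frozen_sets = [frozenset(s) for s in Set_List]

    # Count how many times each frozenset occurs
    counts = Counter(frozen_sets)

    # Keep the sets that occur more than once, converted back to plain sets
    duplicate_sets = [set(fs) for fs, count in counts.items() if count > 1]

    if duplicate_sets:
        print("Duplicate sets found:", duplicate_sets)
    else:
        print("No duplicate sets found.")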

Output:

Duplicate sets found: [{1, 2, 3}, {9, 2, 4, 7}, {5, 6, 7}]

In this method, we have used a list comprehension to convert each set into a frozenset. This is necessary because sets are not hashable and cannot be used as keys in a dictionary. We then passed this list of frozensets to the Counter class, which returns a Counter object with each frozenset as a key and its count as the value.

We then filtered out the keys with a count greater than 1 and stored them in duplicate_sets after converting them back into regular sets. At the end of the program, we printed the result.

Time Complexity: O(n) - The list comprehension and Counter() each iterate over set_list once and perform constant-time hashing and lookups on the frozensets, where n is the number of sets in set_list.

Space Complexity: O(m) - where m is the number of unique sets. The Counter object stores the count of each unique set, so the space it requires depends on the number of unique sets. The duplicate_sets list also requires some space, which is at most proportional to m, so the overall space complexity of the program is O(m).

Method 3 - Using defaultdict() from collections

In this method, we will use defaultdict from collections to keep track of the number of occurrences of each set in set_list. After counting the frequency, we will use list comprehension to filter out duplicate sets.
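Again, the original listing is not included here; the sketch below assumes a sample set_list in which {1, 12, 5} and {4, 5, 6, 7} each occur twice, matching the output shown.

    from collections import defaultdict

    # Sample data (assumed; the original list is not shown in the article)
    set_list = [{1, 12, 5}, {4, 5, 6, 7}, {3, 8}, {1, 12, 5}, {4, 5, 6, 7}]

    # defaultdict(int) starts every missing key at 0, so we can count directly
    set_counts = defaultdict(int)
    for s in set_list:
        set_counts[frozenset(s)] += 1

    # List comprehension keeps the sets whose frequency is greater than 1
    duplicate_sets = [set(fs) for fs, count in set_counts.items() if count > 1]

    if duplicate_sets:
        print("Duplicate sets found:", duplicate_sets)
    else:
        print("No duplicate sets found.")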

Output:

Duplicate sets found: [{1, 12, 5}, {4, 5, 6, 7}]

In the above program, we have created a defaultdict object, set_counts (defaultdict is a subclass of dict in Python), to keep track of the frequency of each set in set_list. We iterate over each set s in set_list and increase its frequency count by 1 using set_counts[frozenset(s)] += 1.

After that, we used a list comprehension to filter out the sets with a frequency count greater than 1, which indicates duplicate sets. Finally, we printed the result.

Time Complexity: O(n) - Here also, we iterate over set_list to count frequencies, which takes O(n) time, and we iterate over the counted entries again in the list comprehension, which also takes O(n) time. Overall, the time complexity is O(n) + O(n) = O(n), where n is the number of sets in set_list.

Space Complexity: O(m) - where m is the number of unique sets in set_list. The set_counts dictionary stores the frequency of each unique set, and the duplicate_sets list stores the duplicate sets. In the worst case, when m = n (all the sets are unique), the space complexity rises to O(n); in the best case, when all the sets are duplicates of a single set (m = 1), the space complexity becomes constant, O(1).

Method 4 - Using Hashing with Dictionary

Hashing refers to mapping an object to a fixed-size integer value (its hash) using a hash function. Since plain sets are not hashable, we first convert each set in set_list to an immutable frozenset and use its hash value as a key in a dictionary that records how many times each set occurs.
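The listing is not shown in this text, so here is a minimal sketch of that idea; the sample set_list is an assumption, and using raw hash values as dictionary keys assumes no hash collisions between distinct sets.

    # Sample data (assumed; the original list is not shown in the article)
    set_list = [{11, 22, 23}, {4, 5, 6, 7}, {11, 22, 23}, {2, 9}, {4, 5, 6, 7}]

    # Map the hash of each set (via frozenset, since plain sets are unhashable)
    # to the number of times that set occurs; assumes no hash collisions
    freq_dict = {}
    for s in set_list:
        key = hash(frozenset(s))
        freq_dict[key] = freq_dict.get(key, 0) + 1

    # Second pass: collect each set whose frequency is greater than 1,
    # recording each distinct duplicate only once
    duplicate_sets = []
    added = set()
    for s in set_list:
        key = hash(frozenset(s))
        if freq_dict[key] > 1 and key not in added:
            duplicate_sets.append(s)
            added.add(key)

    if duplicate_sets:
        print("Duplicate sets found:", duplicate_sets)
    else:
        print("No duplicate sets found.")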

Output:

Duplicate sets found: [{11, 22, 23}, {4, 5, 6, 7}]

In this method, we use a dictionary to store the hash value of each set along with its frequency. We first iterate over each set in set_list and get its hash value by applying the hash() function to its frozenset form (plain sets cannot be hashed directly). Then we check whether that hash value is already in the dictionary. If it is, we increase the frequency of the set; otherwise, we set the frequency to 1.

We then iterate over the sets in set_list a second time and check each set's frequency. A set with a frequency greater than one is a duplicate and is added to the duplicate_sets list (once per distinct set). At the end of the program, we print the result.

Time Complexity: O(n) - In the above program, we iterate over set_list twice, O(n) + O(n) = O(n), where n is the number of sets in set_list. The remaining operations are performed in constant time.

Space Complexity: O(n) - The size of the freq_dict and duplicate_sets list is proportional to the size of the set_list, O(n) + O(n) = O(n), where n is the number of sets in the set_list.






