What is a set
Mathematically, a set is a collection of distinct elements, and its members are called the elements of the set. Python brings this concept into its set type. A set object is an unordered collection of hashable values. Membership testing and set operators such as union, intersection, and difference work in Python the way you would expect.
Characteristics of a set
First of all, Python data structures are divided into mutable and immutable ones. A set is a mutable container (it is not a sequence).
The elements of a set have three characteristics:
- Definiteness: it must be possible to decide whether any given element belongs to the set;
- Distinctness: the elements of a set are all different from each other. For example, if the set A = {1, a}, then a cannot be equal to 1;
- Unorderedness: the elements of a set have no order. For example, {3,4,5} and {3,5,4} are the same set.
A set in Python is an unordered collection of non-repeating elements. Its basic uses are membership testing and eliminating duplicate elements, and it can also compute intersections, differences, unions, and so on. It behaves similarly to a list, except that a set cannot contain duplicate values and its elements are unordered.
In Python, you can use curly braces {} to create a set. Note: to create or initialize an empty set, you must use set() instead of {}, because {} creates an empty dictionary. We will introduce the dictionary data structure later.
Create set
Create an empty collection
    s1 = set()          # an empty set must be created with set(), not with {}
    print(s1)           # set()  (printed this way to distinguish it from an empty dict)
    print(type(s1))     # <class 'set'>

    d = {}              # this creates an empty dictionary; note the difference
    print(d, type(d))   # {} <class 'dict'>
Create a set with elements
You can list the elements yourself inside curly braces.
You can also use the set() function to convert an existing data structure (list, string, tuple) directly into a set. The main purpose of doing so is to remove duplicates.
    s1 = {2, 1, 3}
    print(s1)   # {1, 2, 3}  (not the order the elements were written in: sets are unordered)

    s2 = {1, 2, 3, 3}   # duplicate elements are removed automatically
    print(s2)   # {1, 2, 3}

    # set() accepts any iterable; the display order of the results may vary between runs
    s3 = set('12aa')
    print(s3)   # {'1', '2', 'a'}
    s4 = set([1, 2, 'a', 'a'])
    print(s4)   # {1, 2, 'a'}
    s5 = set(((1, 2), (3, 4), (1, 3), (1, 2)))
    print(s5)   # {(1, 2), (1, 3), (3, 4)}
Discussion:
The code above shows that an empty set can only be created with set(), not with {}. A non-empty set is printed wrapped in {}, whereas a tuple, which we covered earlier, is printed wrapped in (). The second piece of code also shows that a set is unordered. Other data structures can be converted into sets with set(), and duplicates are removed automatically.
Basic built-in functions for sets
- len(s): number of elements in the set s
- max(s): returns the largest element of the set
- min(s): returns the smallest element of the set
- list(s): converts the set to a list
- del s: removes the name s; the memory is released once nothing else refers to the set
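The functions listed above can be tried on a small example. This is only an illustrative sketch: the variable names (`scores`, `ordered`, and so on) are made up for the demonstration, and `sorted()` is added because it is the usual way to get a predictable order out of a set.

```python
# Built-in functions applied to a small set of integers
scores = {72, 95, 88, 60}

count = len(scores)        # number of elements
highest = max(scores)      # largest element
lowest = min(scores)       # smallest element
as_list = list(scores)     # a list; the order of its items is not guaranteed
ordered = sorted(scores)   # a new list in ascending order

print(count, highest, lowest)   # 4 95 60
print(ordered)                  # [60, 72, 88, 95]

# del only removes the name; the object itself is freed
# once nothing else refers to it
del scores
```

Note that max() and min() require the elements to be comparable with each other, so they fail on a set that mixes, say, numbers and strings.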
Add and remove elements from collections
To add elements to a set, use set.add()
To remove elements from a collection, use set.remove()
    s1 = {2, 1, 3}
    print(s1)   # {1, 2, 3}

    s1.add(4)
    print(s1)   # {1, 2, 3, 4}

    s1.remove(2)
    print(s1)   # {1, 3, 4}
Be careful not to use pop() to delete a specific element from a set!
pop() is only available on mutable containers. Lists and dictionaries both support it: list.pop() takes an index, and dict.pop() takes a key.
A set has no concept of an index or a key, so set.pop() takes no arguments.
set.pop() removes and returns an arbitrary element of the set, and raises KeyError if the set is empty. Because you cannot choose which element is removed, do not use pop() when you need to delete a particular element; use remove() instead.
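To make the difference between the removal methods concrete, here is a small sketch. It also uses discard(), the set method that removes an element only if it is present; the flag variables (`missing_raises`, `empty_pop_raises`) are just illustrative names.

```python
s = {1, 2, 3}

s.remove(2)      # removes 2; raises KeyError if the element is missing
s.discard(99)    # discard() on a missing element is a silent no-op

try:
    s.remove(99)
except KeyError:
    missing_raises = True
else:
    missing_raises = False
print(missing_raises)            # True

taken = s.pop()  # removes and returns an arbitrary element (1 or 3 here)
print(taken in {1, 3}, len(s))   # True 1

s.clear()
try:
    s.pop()      # pop() on an empty set raises KeyError
except KeyError:
    empty_pop_raises = True
else:
    empty_pop_raises = False
print(empty_pop_raises)          # True
```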
Collection traversal and access
To visit every element of a set, iterate over it with a for loop; a set cannot be traversed by index.

    # Traverse a set with a for loop
    s1 = {5, 4, 3, 2, 1}
    # print(s1) would show {1, 2, 3, 4, 5}
    for x in s1:
        print(x, end=" ")
    # 1 2 3 4 5

You can see that the elements are printed and traversed in a different order from the one in which they were written.
So, once again: sets are unordered.
A set does not have indexes, so you cannot use an index to get an element. For example, s1[1] raises an error. Why?

    s1 = {1, 2, 3}
    print(s1[1])   # TypeError: 'set' object is not subscriptable
                   # (older versions report: 'set' object does not support indexing)

Because a set is unordered! Its elements have no fixed position, so there is nothing for an index to refer to. If indexes existed, the set would have to be ordered.
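If you do need positional access, a common workaround is to build an ordered list from the set first. The sketch below assumes the elements are mutually comparable so that sorted() works; the variable names are illustrative.

```python
s1 = {30, 10, 20}

try:
    s1[1]
except TypeError as exc:
    error_message = str(exc)   # exact wording differs between Python versions

ordered = sorted(s1)           # build an ordered list first
second_smallest = ordered[1]   # now indexing is meaningful
print(second_smallest)         # 20

# To look at "any one" element without removing it,
# take the first item produced by an iterator over the set
some_element = next(iter(s1))
print(some_element in s1)      # True
```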
Intersection and complement operation of sets
Mathematical symbol | Python operator | Method | Meaning |
---|---|---|---|
− or ∖ | - | difference() | Difference (relative complement) |
∩ | & | intersection() | Intersection |
∪ | \| | union() | Union |
≠ | != | | Not equal |
= | == | | Equal |
∈ | in | | Membership |
∉ | not in | | Non-membership |
Example

    s1 = {1, 2, 3, 4, 5}
    s2 = {4, 5, 6, 7, 8}

    print(s1 - s2)   # difference, equivalent to s1.difference(s2)
    # {1, 2, 3}
    print(s2 - s1)   # equivalent to s2.difference(s1)
    # {8, 6, 7}
    print(s1 & s2)   # intersection, equivalent to s1.intersection(s2); s1 and s2 can be swapped
    # {4, 5}
    print(s1 | s2)   # union, equivalent to s1.union(s2); s1 and s2 can be swapped
    # {1, 2, 3, 4, 5, 6, 7, 8}
    print(s1 ^ s2)   # symmetric difference: the union minus the intersection
    # {1, 2, 3, 6, 7, 8}
    print(6 in s1)       # False
    print(6 not in s1)   # True
Note that for sets, the operators and the corresponding methods give the same result (the methods additionally accept any iterable as an argument, while the operators require sets). Both create a new object and do not modify the original sets s1 and s2.
If you want s1 to hold the result, you must assign the result back to s1, or use one of the in-place methods such as intersection_update().

    # Update s1 to the intersection of s1 and s2
    s1 = {1, 2, 3, 4, 5}
    s2 = {4, 5, 6, 7, 8}

    s1.intersection(s2)
    print(s1)   # {1, 2, 3, 4, 5}  s1 has not been modified

    s1 = s1.intersection(s2)   # assign the result back to s1
    print(s1)   # {4, 5}
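Assignment rebinds the name to a new set, while the `*_update()` methods and the augmented operators (`&=`, `|=`, `-=`, `^=`) change the existing object. The following sketch shows the difference with a second name, `alias` (an illustrative variable), bound to the same set.

```python
s1 = {1, 2, 3, 4, 5}
s2 = {4, 5, 6, 7, 8}
alias = s1                    # a second name for the same set object

s1.intersection_update(s2)    # in place; same effect as: s1 &= s2
print(alias is s1)            # True: still one object
print(alias == {4, 5})        # True: the change is visible through both names

s1 = s1 | s2                  # builds a new set and rebinds the name s1
print(alias is s1)            # False: s1 now refers to a different object
print(alias == {4, 5})        # True: the old object was not touched
```

This distinction matters when another part of the program holds a reference to the same set.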
Methods defined by the set class
Method | Description |
---|---|
add() | Add a single element to the set |
update() | Add all elements of one or more iterables to the set |
clear() | Remove all elements from the set |
copy() | Return a shallow copy of the set |
pop() | Remove and return an arbitrary element; raises KeyError if the set is empty |
remove() | Remove the specified element; raises KeyError if it is not present |
discard() | Remove the specified element if it is present; does nothing otherwise |
isdisjoint() | Return True if the two sets have no elements in common, otherwise False |
issubset() | Return True if every element of this set is in the argument set |
issuperset() | Return True if every element of the argument set is in this set |
symmetric_difference() | Return a new set of the elements that are in exactly one of the two sets |
symmetric_difference_update() | Update the set in place: remove the elements it shares with the other set and insert the elements that are only in the other set |
union() | Return the union of the sets as a new set, equivalent to \| |
difference() | Return the difference of the sets as a new set, equivalent to - |
difference_update() | Update the set in place, removing the elements that also exist in the specified set |
intersection() | Return the intersection of the sets as a new set, equivalent to & |
intersection_update() | Update the set in place, keeping only the elements that also exist in the specified set |
Example

    s1 = {1, 2, 3}
    s1.add(4)
    s1.add(3)       # adding a duplicate element has no effect
    print(s1)       # {1, 2, 3, 4}

    s2 = {3, 4, 5}
    s1.update(s2)
    print(s1)       # {1, 2, 3, 4, 5}

    s1.remove(1)
    print(s1)       # {2, 3, 4, 5}
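The example above covers add(), update(), and remove(). As a complement, this short sketch exercises several of the other methods from the table; the sets `a`, `b`, and `c` are illustrative values only.

```python
a = {1, 2, 3}
b = {3, 4}

print(a.isdisjoint(b))                          # False: both contain 3
print({1, 2}.issubset(a))                       # True
print(a.issuperset({1, 2}))                     # True
print(a.symmetric_difference(b) == {1, 2, 4})   # True

a.discard(10)             # missing element: no error is raised
a.difference_update(b)    # in place: a loses the elements it shares with b
print(a == {1, 2})        # True

c = a.copy()              # shallow copy, independent of a
c.clear()
print(len(a), len(c))     # 2 0
```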
Time complexity of set operations
Operation | Average case | Worst case |
---|---|---|
x in s | O(1) | O(n) |
Union s \| t | O(len(s) + len(t)) | |
Intersection s & t | O(min(len(s), len(t))) | O(len(s) * len(t)) |
Difference s - t | O(len(s)) | |
s.difference_update(t) | O(len(t)) | |
Symmetric difference s ^ t | O(len(s)) | O(len(s) * len(t)) |
s.symmetric_difference_update(t) | O(len(t)) | O(len(t) * len(s)) |
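The O(1) average cost of `x in s` is easiest to appreciate next to a list, where the same test is O(n). The following is a rough measurement sketch using the standard timeit module; the sizes and repeat counts are arbitrary choices, and the absolute numbers depend entirely on the machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1    # worst case for the list: the element is at the very end

# Time 200 membership tests against each container
list_time = timeit.timeit(lambda: target in as_list, number=200)
set_time = timeit.timeit(lambda: target in as_set, number=200)

print(f"list: {list_time:.5f}s  set: {set_time:.5f}s")
```

On typical machines the set lookup is faster by several orders of magnitude, and the gap widens as n grows.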
[Additional reading]
Underlying implementation mechanism, a common interview topic
Core idea: the underlying structure of a set is the same as that of a dictionary, namely a hash table.
Underlying data structure of a set
The efficiency of a set is inseparable from its internal data structure. Like a dictionary, a set is built on a hash table; unlike a dictionary, each slot of the table stores only a single element rather than a key-value pair. If you are not familiar with the principles and characteristics of hashing, you can look them up online. We will also write about this topic in the future.
Inserting data into the hash table
When an element is inserted into a set, Python computes its hash value with the hash(value) function. From the hash value (say, hash) and the number of slots in the hash table (say, n), it derives the slot where the element should be stored, for example with the modulo method hash % n.
If that slot is empty, the element is stored there directly. If the slot is already occupied, Python compares the hash value and the value of the stored element with those of the new element:
Key points:
- If both are equal, the element already exists and the set is left unchanged;
- If they are not equal, this is a hash collision: two different elements have been mapped to the same slot (their hash values may even be identical). In this case Python uses open addressing, probing other slots in the table until it finds a free one.
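The insertion steps above can be sketched with a toy table. This is a simplified model, not CPython's actual implementation: the function name `insert`, the fixed table of 8 slots, and the use of linear probing (try the next slot) are all assumptions made for illustration. CPython uses a more elaborate probe sequence.

```python
def insert(table, value):
    """Insert value into a fixed-size toy table and return the slot index used.

    Assumes the table always has at least one empty slot.
    """
    h = hash(value)
    size = len(table)
    index = h % size                    # starting slot (the modulo method)
    while table[index] is not None:
        stored_hash, stored_value = table[index]
        if stored_hash == h and stored_value == value:
            return index                # already present: nothing to insert
        index = (index + 1) % size      # collision: probe the next slot
    table[index] = (h, value)
    return index


table = [None] * 8
print(insert(table, 1))   # 1
print(insert(table, 9))   # 2  (9 % 8 == 1 is occupied, so the next slot is used)
print(insert(table, 1))   # 1  (duplicate: the table is unchanged)
```

Small integers are used here because hash(n) == n for them in CPython, which makes the slot numbers easy to follow.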
Searching for data in the hash table
Searching a hash table is similar to inserting into it. Python uses the hash value to find the slot where the element should be stored, and then compares the element with the one stored in that slot:
- If they are equal, the element has been found;
- Otherwise, a hash collision occurred when the element was originally stored, so the search continues with the same collision-resolution method until the element is found or an empty slot is reached. Reaching an empty slot means that the target element is not stored in the hash table.
Deleting elements from the hash table
When an element is deleted, Python does not empty its slot right away. It temporarily writes a special marker value into the slot, so that later searches can still probe past it, and the marker is cleaned up when the hash table is resized.
Key points:
- Note that hash collisions tend to slow down dictionary and set operations.
- Therefore, to keep them efficient, the hash tables in dictionaries and sets usually keep roughly 1/3 of their slots free (the exact threshold depends on the type and the Python version).
- As elements continue to be inserted and the free space drops below that threshold, Python allocates a larger block of memory and expands the hash table. At the same time, all elements in the table are re-inserted at their new positions.
Although hash collisions and hash table resizing cause slowdowns, they occur rarely. So, on average, the time complexity of insertion, search, and deletion is still O(1).
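The resizing described above can be observed indirectly. The sketch below assumes CPython, where sys.getsizeof() reports the memory used by the set object including its hash table; the element counts at which the size jumps are implementation details and differ between versions.

```python
import sys

s = set()
last_size = sys.getsizeof(s)
growth_points = []    # element counts at which the reported size changed

for i in range(200):
    s.add(i)
    size = sys.getsizeof(s)
    if size != last_size:
        growth_points.append(len(s))
        last_size = size

# The exact values depend on the Python build
print(growth_points)
```

The size grows in a few large steps rather than with every add(), which is why the occasional cost of a resize averages out over many insertions.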
How does a set determine whether two elements are duplicates, and how does it remove duplicates?
Deduplication in a set relies on a combination of two methods: __hash__ and __eq__.
When the hash values of two elements differ, the two elements are considered different.
When the hash values of two elements are the same, the __eq__ method is called. If it returns True, the two elements are considered the same and only one of them is kept. If it returns False, both elements are kept.
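The rule above can be checked with a few small classes. The classes `Point`, `Opaque`, and `SameHash` are hypothetical examples written for this illustration.

```python
class Point:
    """Defines __eq__ and __hash__ on the coordinates."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


class Opaque:
    """Keeps the default identity-based __eq__ and __hash__."""

    def __init__(self, x):
        self.x = x


class SameHash:
    """Every instance has the same hash; equality depends on the value."""

    def __init__(self, v):
        self.v = v

    def __eq__(self, other):
        return isinstance(other, SameHash) and self.v == other.v

    def __hash__(self):
        return 1


points = {Point(1, 2), Point(1, 2), Point(3, 4)}
print(len(points))    # 2: equal hash and __eq__ returns True, so one copy is dropped

opaques = {Opaque(1), Opaque(1)}
print(len(opaques))   # 2: distinct objects compare unequal by default

collided = {SameHash(1), SameHash(2), SameHash(1)}
print(len(collided))  # 2: same hash, but __eq__ separates the values 1 and 2
```

When you define __eq__ on a class, Python sets __hash__ to None unless you also define it, so instances become unhashable and cannot be put in a set; the two methods should always be defined consistently.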