The difference between is and == in Python

1. The difference between is and ==

Everyone who has studied Python knows that is and == are both used to compare Python objects. The difference is:

is checks identity: both operands must be the very same object (the same memory address in CPython)
== checks equality: only the values of the objects need to be equal.

Let’s look at an example

import time

t1 = time.time()
t2 = time.time()

print("Value of t1:", t1)
print("Value of t2:", t2)
print(t1 == t2)
print(t1 is t2)

#results
Value of t1: 1679973143.1747568
Value of t2: 1679973143.1747568
True
False

The time() function of the time module returns the current time as a float. The two calls here run so close together that they returned the same value (on a clock with finer resolution the two values may differ)

== is used to determine whether the values of t1 and t2 are equal, so it returns True

Although the values of t1 and t2 are equal, they are two different objects (each call to time() returns a different object), so t1 is t2 returns False

So, how do we determine whether two objects are the same object?

Answer: compare their memory addresses, which the built-in id() function returns in CPython. If the addresses are the same, the two names refer to the same memory and therefore to the same object.

Let’s take a look at the memory addresses of t1 and t2

import time

t1 = time.time()
t2 = time.time()

print("Memory address of t1:", id(t1))
print("Memory address of t2:", id(t2))

#results
Memory address of t1: 2251407006832
Memory address of t2: 2251405788464

You can see that the two memory addresses are different.
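To make the distinction concrete, here is a minimal sketch with a hypothetical Point class (not part of the original example) whose __eq__ compares coordinates. It assumes CPython, where id() reflects the object's address: == consults __eq__, while is only asks whether both names refer to one object.

```python
class Point:
    """Hypothetical class used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # == calls this method and compares values
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


p1 = Point(1, 2)
p2 = Point(1, 2)   # equal value, separate object
p3 = p1            # a second name for the same object

print(p1 == p2, p1 is p2)   # True False
print(p1 == p3, p1 is p3)   # True True
print((p1 is p2) == (id(p1) == id(p2)))   # True
```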

2. Small integer pool & caching mechanism

But some friends may encounter the following situation:

a = 4
b = 4

print(a == b) # True
print(a is b) # True

Huh? Why is a is b True? These should be two different objects.

This is because of the small integer pool.

CPython treats a range of frequently used integers as a small integer pool. The range of the pool is [-5, 256].

Python creates the objects for these values in advance. No matter how many times such a value is assigned, no new object is allocated. Values outside the small integer pool, however, may get a new object each time they are created.

Therefore, numbers inside the small integer pool always have the same memory address, while equal numbers outside the pool can have different memory addresses.

a = 4
b = 4

print(id(a))
print(id(b))

# results
2308490488208
2308490488208
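One way to probe the pool without involving literals in the source is to build the integers at run time. The sketch below assumes CPython; same_object is a hypothetical helper, and int() is used so that each value is created while the program runs rather than stored as a constant.

```python
def same_object(text):
    # Parse the same text twice; both results are alive during the comparison
    first = int(text)
    second = int(text)
    return first is second


for text in ("-5", "0", "256", "100000000000000000000"):
    print(text, same_object(text))
```

On CPython, the values inside [-5, 256] report True, while the last value, far outside the pool, reports False.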

Okay, this time let's use numbers outside the small integer pool

a = 1000
b = 1000

print(a == b) # True
print(a is b) # True

a = 1000
b = 1000

print(id(a))
print(id(b))

#results
2102348852368
2102348852368

What? Numbers outside the small integer pool were supposed to have different memory addresses. So why does the code above behave differently from what was just said?

I ran the code above as a script in an IDE. Let’s try typing it in interactive mode.

#Small integer pool
>>> a = 4
>>> b = 4
>>> print(a == b)
True
>>> print(a is b)
True

#Non-small integer pool
>>> a = 1000
>>> b = 1000
>>> print(a == b)
True
>>> print(a is b)
False

It can be seen that in interactive mode, equal numbers outside the small integer pool are different objects. Why is this?

Conclusion first: this is because of Python’s caching mechanism. In an IDE or in script mode the file is compiled as a whole, so the same integer constant referenced by multiple variables is stored only once and no new memory is allocated.

Python caching mechanism

When the Python interpreter starts, it first sets aside a small part of memory to store frequently used data (immutable data types). This greatly reduces the overhead of requesting and releasing memory when frequently used objects are created and destroyed.
Within the same code block, equal objects of immutable data types (numbers, strings, tuples) that are referenced by multiple variables are not allocated repeatedly.

From the above, we know that only immutable data types (strings, tuples, basic data types) avoid allocating new memory when the same value is referenced by multiple variables. Mutable data types (lists, dictionaries, sets) are the exception: each one gets its own memory.
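The difference between script mode and interactive mode can be imitated by compiling the same two assignments either together or separately. This is a sketch assuming CPython; run is a hypothetical helper built on the real built-ins compile() and exec().

```python
def run(source):
    # Compile the source as one unit, execute it, and return the resulting names
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace


# Both assignments compiled together, as in a script
together = run("a = 1000\nb = 1000")
print(together["a"] is together["b"])

# Each assignment compiled on its own, as when typed line by line
first = run("a = 1000")
second = run("b = 1000")
print(first["a"] == second["b"], first["a"] is second["b"])
```

On a standard CPython build the first line prints True, because equal constants within one compiled unit are stored once. In the second comparison the values are equal, but nothing guarantees that they are the same object.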

Mutable data types

Let’s take a look

#List
l1 = [1, 2, 3]
l2 = [1, 2, 3]

print(id(l1))
print(id(l2))
print(l1 is l2)

#result
2157601558656
2157601388224
False

#Dictionary
dict1 = {'name': "kanye", "age":18}
dict2 = {'name': "kanye", "age":18}

print(id(dict1))
print(id(dict2))
print(dict1 is dict2)

#result
2096576418240
2096576418432
False

#Set
s1 = {1, 2, '3'}
s2 = {1, 2, '3'}

print(id(s1))
print(id(s2))
print(s1 is s2)

#result
2326184069152
2326184068928
False

The results are the same in interactive mode

>>> s1 = {1, 2, '3'}
>>> s2 = {1, 2, '3'}
>>> print(s1 is s2)
False
>>> dict1 = {'name': "kanye", "age":18}
>>> dict2 = {'name': "kanye", "age":18}
>>> print(dict1 is dict2)
False
>>> l1 = [1, 2, 3]
>>> l2 = [1, 2, 3]
>>> print(l1 is l2)
False
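Sharing would be unsafe for mutable objects, because a change made through one name would show up through the other. A small sketch of that reasoning (the variable names are illustrative):

```python
l1 = [1, 2, 3]
l2 = [1, 2, 3]   # equal contents, separate list
alias = l1       # second name for the same list

l1.append(4)     # mutate the list through one name

print(l2)        # [1, 2, 3]
print(alias)     # [1, 2, 3, 4]
print(l1 is alias, l1 is l2)   # True False
```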

Immutable data types

1. Numbers in the small integer pool

Let’s take a look at the caching mechanism of immutable data types in interactive mode

>>> a=4
>>> b=4
>>> print(a is b)
True

>>> num1=100
>>> num2=100
>>> print(num1 is num2)
True

It can be seen that integers in the range [-5, 256] are permanently cached. Whenever a number within this range is used, whether it comes from a direct assignment or from an expression, the cached object is used.
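The "expression" case can be checked with values computed at run time. A minimal sketch, assuming CPython (the variable names are illustrative):

```python
x = 50
y = x + 50            # computed at run time, result is inside the pool
z = 100

big_x = 500
big_y = big_x + 500   # computed at run time, result is outside the pool
big_z = 1000

print(y is z)
print(big_y == big_z, big_y is big_z)
```

On CPython the first line prints True. In the second line the values are equal, but the computed 1000 is typically a separate object from the constant 1000.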

2. Numbers in the non-small integer pool

For numbers outside the small integer pool, caching applies in the IDE environment: multiple variables refer to the same object and no new memory is allocated.

#The results are all True
a = -10
b = -10

print(a is b)

num1 = 1.0
num2 = 1.0
print(num1 is num2)

n1 = 1000
n2 = 1000
print(n1 is n2)

For numbers outside the small integer pool in interactive mode, the caching mechanism is not used unless the values are assigned in the same statement or within the same code block

#Assign values simultaneously
>>> n1,n2=1000,1000
>>> print(n1 is n2)
True

>>> f1,f2=-10.2,-10.2
>>> print(f1 is f2)
True

#Assignment under the same code block
>>> for i in range(3):
...     a = -10
...     b = -10
...     print(a is b)
...
True
True
True

>>> for i in range(3):
...     a = 1000
...     b = 1000
...     print(a is b)
...
True
True
True

>>> for i in range(3):
...     num1 = -100
...     num2 = -100
...     print(num1 is num2)
...
True
True
True

>>> for i in range(3):
...     f1 = -10.2
...     f2 = -10.2
...     print(f1 is f2)
...
True
True
True
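The sharing within one code block can be observed through the constants that CPython stores on a compiled function. A sketch, assuming CPython: same_block is a hypothetical function, while __code__.co_consts is the real attribute that holds a code object's constants.

```python
def same_block():
    a = 1000
    b = 1000
    return a is b


print(same_block())
# How many times the literal 1000 is stored among the function's constants
print(same_block.__code__.co_consts.count(1000))
```

On CPython this prints True and 1: both assignments load the single stored constant.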

4. The intern mechanism

We know that, due to Python’s caching mechanism:

If equal values of immutable data types (strings, tuples, basic data types) are referenced by multiple variables, memory is not allocated repeatedly.
However, if mutable data types (lists, dictionaries, sets) are referenced by multiple variables, new memory is allocated for each.
Integers in the small integer pool that are referenced by multiple variables never get new memory.

But from what we have seen so far: in interactive mode, apart from special cases (simultaneous assignment, assignment within the same code block) and the small integer pool, every value gets new memory when it is assigned to multiple variables.

In fact, there is a special case. Let’s take a look at such an example.

#Interactive mode
>>> s1='hello'
>>> s2='hello'
>>> print(s1 is s2)
True

Look at the output and compare it with what you just learned. Do you notice anything odd?

In interactive mode, multiple variables assigned the same string (an immutable data type) should each get new memory. Why did that not happen in the example above?

The intern mechanism

The string type is one of the most commonly used data types in Python. To make strings more efficient, Python uses interning (string residence).

That is, only one copy of a given string object is kept, in a shared string pool. When a new variable refers to the same string, no new memory is allocated; the variable refers to the shared string instead.

Principle

The intern mechanism is implemented very simply, by maintaining a string pool. The pool is a dictionary. If the string already exists in the pool, no new string is created and the previously created object is returned directly. If it has not been added to the pool before, a string object is constructed first and added to the pool so that it can be found next time.

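The following is pseudocode, written as a simplified Python sketch (it is not CPython's actual implementation)

intern_pool = {}

def intern(s):
    # An equal string is already pooled: return the pooled object
    if s in intern_pool:
        return intern_pool[s]
    # First occurrence: store this string object and return it
    intern_pool[s] = s
    return s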
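Python also exposes interning directly through the standard library function sys.intern(). The sketch below builds two equal strings at run time so that they start out as separately constructed objects, then interns them; the variable names are illustrative.

```python
import sys

parts = ["hello", " ", "&"]
a = "".join(parts)   # built at run time
b = "".join(parts)   # equal value, built separately

print(a == b)                           # True
print(sys.intern(a) is sys.intern(b))   # True
```

Whatever the identity of a and b before the call, sys.intern() returns the single pooled object for equal strings, which is exactly the lookup described above.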

1. In interactive mode, only strings made up solely of letters, digits, and underscores trigger the intern mechanism.

>>> s1='hello'
>>> s2='hello'
>>> print(s1 is s2)
True

#If special characters are used, the intern mechanism will not be triggered.
>>> s1='hello &'
>>> s2='hello &'
>>> print(s1 is s2)
False

>>> a='12 3'
>>> b='12 3'
>>> print(a is b)
False
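The rule in point 1 can be written down as a small predicate. This is only a sketch of the rule as stated here, not CPython's own check; looks_like_name is a hypothetical helper.

```python
import string

NAME_CHARS = set(string.ascii_letters + string.digits + "_")


def looks_like_name(s):
    # True when every character is an ASCII letter, digit, or underscore
    return all(ch in NAME_CHARS for ch in s)


for s in ("hello", "hello &", "12 3", "user_1"):
    print(repr(s), looks_like_name(s))
```

The strings for which the helper reports False are the ones that were not shared in the interactive examples above.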

2. In an IDE or in script mode, even strings containing special characters trigger the intern mechanism, as long as the length does not exceed 20 (the length limit)

a = '12 3'
b = '12 3'
print(a is b) # True

s1='hello &'
s2='hello &'
print(s1 is s2) # True

s1 = "a b" * 10
s2 = "a b" * 10
print(s1 is s2) # True

s1 = "ab" * 11
s2 = "ab" * 11
print(s1 is s2) # False

PS: I was using Python 3.9 when writing this article and found that the length limit did not apply there; the intern mechanism was triggered regardless.

s1 = "a b" * 22
s2 = "a b" * 22
print(s1 is s2) # True

s1 = "%^&?" * 22
s2 = "%^&?" * 22
print(s1 is s2) # True
