Python is both an interpreted language and a compiled language

I don’t know whether anyone else had the same experience. When I first started learning Python, I was told that Python is an interpreted language because it interprets and executes code line by line at runtime, while C++ is a compiled language.

But today I read an article whose author argues that Python actually has a compilation step: the interpreter compiles the code first and then executes it.

Not only that, the author also believes that “interpreted” versus “compiled” is a false dichotomy that limits what we imagine programming languages can do. Python is both an interpreted language and a compiled language!

This article contains a lot of useful information. I believe you will gain a lot after reading it patiently.

Foreword

The Python discussed in this article does not mean alternative versions of Python such as PyPy, Mypyc, Numba, or Cinder, nor Python-like programming languages such as Cython, Codon, and Mojo.

I mean regular Python: CPython.

I am currently writing teaching material to help students read and understand programming error messages. We are running the course in three programming languages (C, Python, and Java).

A key point about error messages is that errors are generated at different stages: some at compile time and some at runtime.

The first version of the lesson targets C, specifically how to use the GCC compiler, and demonstrates how GCC turns source code into an executable program through these stages:

Preprocessing
Lexical analysis
Syntactic analysis
Semantic analysis
Linking

The lesson also discusses the errors that can occur at each of these stages and how they affect the error messages shown. Importantly, an error at an early stage prevents errors at later stages from being detected: once stage A fails, an error in stage B is never reported.

When I adapted this lesson for Java and Python, I found that neither language has a preprocessor, and that linking does not mean the same thing in Python or Java as it does in C.

I set those differences aside, but I stumbled upon something interesting:

A compiler can generate error messages at every stage, and it usually reports the errors from an earlier stage before moving on to the next one. This means we can discover the stages of a compiler by deliberately putting errors into a program.

So let’s play a little game to discover the stages of the Python interpreter.

Which Error Comes First?

We will create a Python program that contains several bugs, each intended to trigger a different kind of error message.

Regular Python reports only one error per run, so the game is: which error will be reported first?

# The following is a buggy program
1/0
print() = None
if False
    ? = "hello

Each line of code will generate a different error:

1/0 will generate ZeroDivisionError: division by zero.
print() = None will generate SyntaxError: cannot assign to function call.
if False will generate SyntaxError: expected ':'.
? = "hello will generate SyntaxError: EOL while scanning string literal (Python 3.12 words this as unterminated string literal).

The question is, which error will be displayed first? Note that the Python version matters (more than I expected), so keep that in mind if you see different results.

PS: The code below was run with Python 3.12.

Before you run the code, think about what “interpreted” and “compiled” languages mean to you.

Below is a Socratic dialogue that I hope will help you reflect on the difference.

Socrates: A compiled language is one in which the code first passes through a compiler before being run. An example is the C programming language. To run C code, you first have to run a compiler such as gcc or clang before you can run the code. A compiled language is converted into machine code, the 1s and 0s that the CPU understands.

Plato: Wait, isn’t Java a compiled language?

Socrates: Yes, Java is a compiled language.

Plato: But isn’t the output of the regular Java compiler a .class file? That’s bytecode, isn’t it?

Socrates: That’s right. Bytecode is not machine code, but Java is still a compiled language. This is because the compiler can catch many problems, so you have to correct those errors before your program starts running.

Plato: What about interpreted languages?

Socrates: An interpreted language is one that relies on a separate program (aptly called an interpreter) to actually run the code. Interpreted languages do not require the programmer to run a compiler first, so any mistakes you make are caught while the program is running. Python is an interpreted language: there is no separate compiler, and all the mistakes you make are caught at runtime.

Plato: If Python is not a compiled language, then why does the standard library include modules called py_compile and compileall?

Socrates: Well, these modules just convert Python into bytecode. They do not convert Python to machine code, so Python remains an interpreted language.

Plato: So, are both Python and Java converted to bytecode?

Socrates: Yes.

Plato: So, why is Python an interpreted language and Java a compiled language?

Socrates: Because all errors in Python are caught at runtime. (PS: please pay attention to this sentence.)

Round 1

When we execute the buggy program above, we will receive the following error

 File "/private/tmp/round_1.py", line 4
    ? = "hello # SyntaxError: EOL while scanning string literal
        ^
SyntaxError: unterminated string literal (detected at line 4)

The first error detected is on the last line of the source code. As you can see, Python must read the entire source file before it runs the first line of code.

If the definition of “interpreted language” in your head includes “an interpreted language reads code sequentially and runs it one line at a time”, I hope you will let go of it.
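One way to check this claim for yourself is the minimal sketch below. It is not the program from this game: the two-line source string, the ran list, and the <sketch> filename are all made up for illustration. The first line has a visible side effect and the last line has an unterminated string. If Python really ran code one line at a time, the side effect would happen before the error.

```python
# A two-line program: line 1 has a side effect, line 2 is broken.
source = 'ran.append("line 1 executed")\nname = "hello\n'

ran = []  # line 1 of the program would append to this list if it ever ran

try:
    # compile() must process the whole source before exec() can run any of it.
    exec(compile(source, "<sketch>", "exec"), {"ran": ran})
except SyntaxError as err:
    print("SyntaxError on line", err.lineno)

print("side effects observed:", ran)
```

The list stays empty: the error on the last line is reported before the first line gets a chance to run.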

I haven’t dug into the source code of the CPython interpreter to verify this, but I think this error is detected first because the first step Python 3.12 performs is scanning, also known as lexical analysis.

The scanner converts the entire file into a series of tokens and then proceeds to the next stage.

The scanner reached the last line of the source code and found that the string literal was missing its closing quotation mark. It needs to turn the whole string literal into a single token, and it cannot do that without the closing quote.
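To get a feel for what “a series of tokens” looks like, here is a small sketch using the standard library’s tokenize module. Treat it as an approximation of the scanning stage rather than proof of what CPython does internally. The variable name is my own stand-in, and I tokenize the corrected line because, as far as I know, how tokenize reports an unterminated string has differed between Python versions.

```python
import io
import tokenize

source = 'name = "hello"\n'

# generate_tokens() takes a readline callable and yields TokenInfo tuples.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string))

# Keep just the token type names, in order.
token_names = [tokenize.tok_name[tok.type] for tok in tokens]
```

The whole string literal, quotes included, becomes a single STRING token, which is why a missing closing quote leaves the scanner with nothing valid to emit.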

In Python 3.12, the scanner runs first, so that’s why the first error is unterminated string literal

Round 2

We have fixed the bug in the fourth line of code, but there are still bugs in lines 1, 2, and 3.

1/0
print() = None
if False
    ? = "hello"

Let’s execute the code now and see which one is the first to report an error.

File "/private/tmp/round_2.py", line 2
    print() = None
    ^^^^^^^
SyntaxError: cannot assign to function call here. Maybe you meant '==' instead of '='?

This time the error is reported on the second line! Again, I haven’t looked at the CPython source code, but I’m reasonably certain that the stage after scanning is parsing, also called syntactic analysis.

The source code is parsed before any of it runs, which means Python never gets as far as the error on the first line and reports the error on the second line instead.
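The parsing stage can be poked at on its own with the standard library’s ast module. The sketch below is only an illustration (the <sketch> filename is made up): ast.parse builds a syntax tree without executing anything, so 1/0 is harmless at this stage, while the invalid assignment is rejected.

```python
import ast

# Parsing builds a syntax tree; nothing is executed, so 1/0 is harmless here.
tree = ast.parse("1/0\nprint(None)\n", filename="<sketch>")
print(type(tree).__name__, "with", len(tree.body), "statements")

# An invalid assignment target is rejected without running any code.
try:
    ast.parse("1/0\nprint() = None\n", filename="<sketch>")
    rejected = False
except SyntaxError as err:
    rejected = True
    print("line", err.lineno, "->", err.msg)
```

The exact wording of the message depends on the Python version, so the sketch prints it rather than assuming it.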

I’d like to point out that the code I wrote for this little game makes absolutely no sense, and there is no right answer for how to fix the bugs. My only purpose is to write errors on purpose and then see which stage of the Python interpreter catches them.

I don’t know what print() = None might mean, so I’m going to fix it by replacing it with print(None), which doesn’t make much sense either, but at least it is syntactically correct.

Round 3

We also fixed the syntax error in the second line, but there are two other errors in the source code, one of which is also a syntax error.

1/0
print(None)
if False
    ? = "hello"

Recall that the syntax error was displayed first in round 2. Will it be the same in round 3?

 File "/private/tmp/round_3.py", line 3
    if False
            ^
SyntaxError: expected ':'

That’s right! The syntax error on the third line takes precedence over the error on the first line.

Just as in round 2, the Python interpreter parses the source code and performs syntactic analysis before running any of it.

This means Python never reaches the error on the first line and reports the error on the third line instead.

You may be wondering why I put two SyntaxErrors in one file. Isn’t one enough to make my point?

It is because different Python versions give different results. If you run the code in Python 3.8 or earlier, the results are as follows.

In Python 3.8, the first error reported in round 2 is on line 3:

1/0
print() = None
if False
    ? = "hello"
  
# Error reported:
File "round_2.py", line 3
    if False
            ^
SyntaxError: invalid syntax

After fixing the error on line 3, Python 3.8 reports the following error message on line 2:

1/0
print() = None
if False:
    ? = "hello"

# Error reported:
  File "round_3.py", line 2
    print() = None
    ^
SyntaxError: cannot assign to function call

Why does the order of the errors differ between Python 3.8 and 3.12? Because Python 3.9 introduced a new parser, which is more powerful than the previous, naïve parser.

The old parser could look ahead only one token, which meant it would technically accept some Python programs with invalid syntax.

In particular, this limitation left the parser unable to tell whether the left-hand side of an assignment statement was a valid assignment target. For example, the old parser would accept the following code:

[x for x in y] = [1,2,3]

The code above makes no sense and is not allowed by Python’s grammar. To work around this, Python used to have a separate, hacky stage after parsing.

In that stage, Python checked every assignment statement and made sure that the left-hand side of the equals sign was actually something that can be assigned to.

Because this stage came after parsing, the old version of Python reports the parse error on line 3 first and only then the invalid assignment on line 2.
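Whichever stage performs the check, the result is that invalid assignment targets are rejected before anything runs. The sketch below, which is my own illustration rather than anything taken from CPython, feeds a few candidate statements to the built-in compile() and records what happens. The candidates list and the <sketch> filename are hypothetical.

```python
candidates = [
    "[x for x in y] = [1, 2, 3]",  # the example from the text
    "print() = None",              # line 2 of the buggy program
    "x = [1, 2, 3]",               # a valid target, for contrast
]

results = {}
for source in candidates:
    try:
        # compile() only compiles; none of these statements is executed.
        compile(source, "<sketch>", "exec")
        results[source] = "compiles"
    except SyntaxError as err:
        results[source] = f"SyntaxError: {err.msg}"
    print(f"{source!r:32} {results[source]}")
```

Note that y is never defined, yet the first statement still fails with a SyntaxError rather than a NameError: the check happens before any name is looked up.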

Round 4

Now there’s one last mistake left

1/0
print(None)
if False:
    ? = "hello"

Let’s run it

Traceback (most recent call last):
  File "/private/tmp/round_4.py", line 1, in <module>
    1/0
    ~~^~~
ZeroDivisionError: division by zero

Note the line Traceback (most recent call last): it is the header Python prints for an error raised while the program is running, and it appears only in round 4.

After the scanning and parsing stages, Python can finally run the code. But as soon as it starts interpreting the first line, a ZeroDivisionError is raised.

We know we are at “runtime” because Python has printed Traceback (most recent call last), which means there is a stack trace.

A stack trace can only exist at runtime, so this error must have been caught at runtime.
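The split between “compiles fine” and “fails when run” can be made explicit by doing the two steps by hand. This is a sketch under my own assumptions: it uses an ordinary identifier, name, on line 4 and a made-up filename, <round_4_sketch>, and it separates compile() from exec() only to make the two phases visible.

```python
import traceback

source = '1/0\nprint(None)\nif False:\n    name = "hello"\n'

# Compiling succeeds: the program has no scanning or parsing errors left.
code = compile(source, "<round_4_sketch>", "exec")

# Only running the compiled code raises, and the exception carries a traceback.
try:
    exec(code, {})
    failed_at = None
except ZeroDivisionError as err:
    last_frame = traceback.extract_tb(err.__traceback__)[-1]
    failed_at = (last_frame.filename, last_frame.lineno)
    print(type(err).__name__, "at", failed_at)
```

The innermost traceback frame points at line 1 of the compiled program, which matches the round 4 output above.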

But this means that the errors encountered in rounds 1 to 3 are not runtime errors, so what are they?

Python is both a compiled language and an interpreted language

That’s right! CPython really is an interpreter, but it also contains a compiler.

I hope the above exercise has illustrated the stages Python must go through before running the first line of code:

scanning
parsing

Older versions of Python had an extra stage:

scanning
parsing
checking for valid assignment targets

Let’s compare this with the stages of compiling a C program listed earlier (the struck-through stages have no counterpart in the Python list above):

~~Preprocessing~~
Lexical analysis (another term for “scanning”)
Syntactic analysis (another term for “parsing”)
~~Semantic analysis~~
~~Linking~~

Python still performs several compilation stages before running any code. Just like Java, it compiles the source code into bytecode.

The first three errors were produced by Python during compilation, and only the last one, ZeroDivisionError: division by zero, was produced at runtime.
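If you would like to see the bytecode itself, the standard library’s dis module can disassemble a compiled code object. The sketch below compiles the one-line program 1/0 without running it. The <sketch> filename is made up, and I don’t show the disassembly because instruction names vary between CPython versions.

```python
import dis

code = compile("1/0\n", "<sketch>", "exec")

# The result is a code object holding bytecode; nothing has been executed yet.
print(type(code).__name__, "with", len(code.co_code), "bytes of bytecode")

# Disassemble it; the exact instruction names vary between CPython versions.
dis.dis(code)
```

No ZeroDivisionError appears here, because compiling 1/0 into bytecode is not the same as evaluating it.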

In fact, we can precompile all our Python code ahead of time using the compileall module on the command line:

$ python3 -m compileall .

This puts the compiled bytecode for every Python file under the current directory into __pycache__/ and displays any compiler errors.
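The same thing can be done for a single file from inside Python with the py_compile module. The sketch below is an illustration only: the file names good.py and bad.py and the temporary directory are made up so that it does not touch your own files.

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.py")
    bad = os.path.join(tmp, "bad.py")
    with open(good, "w") as f:
        f.write("1/0\n")       # compiles fine; would only fail when run
    with open(bad, "w") as f:
        f.write("if False\n")  # missing colon: a compile-time error

    # compile() returns the path of the bytecode file it wrote.
    pyc_path = py_compile.compile(good, doraise=True)
    pyc_written = os.path.exists(pyc_path)
    print("wrote", os.path.basename(pyc_path))

    # With doraise=True, compile errors surface as PyCompileError.
    try:
        py_compile.compile(bad, doraise=True)
        bad_rejected = False
    except py_compile.PyCompileError:
        bad_rejected = True
        print("rejected bad.py at compile time")
```

The file containing 1/0 compiles without complaint, while the file with the missing colon is rejected even though neither file is ever run.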

If you’re wondering what’s actually in that __pycache__/ folder, I gave a talk for EdmontonPy that you should check out!

Talk link: https://www.youtube.com/watch?v=5yqUTJuFuUk&t=7m11s

The interpreter does not actually start until the Python source has been compiled into bytecode. I hope the exercise above has shown that Python can indeed report errors before runtime.

Compiled languages and interpreted languages are a false dichotomy

I hate it every time a programming language is classified as a “compiled” or “interpreted” language. A language itself is not compiled or interpreted

Whether a language is compiled or interpreted (or both!) is an implementation detail

I’m not the only one who thinks this way. Laurie Tratt has an excellent article demonstrating this by writing an interpreter that gradually becomes an optimizing compiler.

Article address: https://tratt.net/laurie/blog/2023/compiled_and_interpreted_languages_two_ways_of_saying_tomato.html

Another source is the book Crafting Interpreters by Bob Nystrom. Here are some quotes from Chapter 2:

What is the difference between a compiler and an interpreter?

Turns out, it’s like asking the difference between fruits and vegetables. This may seem like a binary either/or choice, but in fact “fruit” is a botanical term and “vegetable” is a culinary term.

Strictly speaking, one does not imply the negation of the other. Some fruits are not vegetables (apples), and some vegetables are not fruits (carrots), but there are also edible plants that are both fruits and vegetables, such as tomatoes

When you use CPython to run a Python program, the source code is parsed and converted into an internal bytecode format, and then executed in the virtual machine

From the user’s perspective, this is clearly an interpreter (they run their program from source code), but if you look under CPython’s scaly skin, you’ll see that there is definitely some compiling going on.

The answer is: CPython is an interpreter, and it has a compiler.

So why is this important? Why is making a strict distinction between compiled and interpreted languages counterproductive?

“Compiled” and “interpreted” limit what we think programming languages can do

A programming language does not have to be defined by whether it is compiled or interpreted! Thinking in this rigid way limits what we believe a given programming language can do.

For example, JavaScript is often lumped into the category of “interpreted languages.” But for a while, JavaScript running in Google Chrome was never interpreted; instead, it was compiled directly into machine code! As a result, JavaScript can keep pace with C++.

For this reason, I’m really tired of the argument that interpreted languages are necessarily slow. Performance is multifaceted and does not depend only on a language’s “default” implementation.

JavaScript is fast now, Ruby is fast now, and Lua has been fast for a while.

What about languages that are usually labeled as compiled, such as C? You probably never think about interpreting a C program.

The real difference between languages: static or dynamic

The real distinction we should teach our students is between language features that can be determined statically, that is, just by staring at the code without running it, and features that can only be known dynamically at runtime.

Note that I am talking about “language features” rather than “languages”. Each programming language chooses its own set of features, some determined statically and some dynamically, and this combination makes the language more “dynamic” or more “static”.

Static versus dynamic is a spectrum, and Python sits toward the dynamic end. A language like Java has more static features than Python, but even Java includes things like reflection, which is definitely a dynamic feature.

I find that dynamic versus static is often conflated with compiled versus interpreted, which is understandable.

Because languages that typically use interpreters have more dynamic features, such as Python, Ruby, and JavaScript

Languages with more static features tend to be implemented without an interpreter, such as C++ and Rust

And then there’s Java in between

Static type annotations in Python have gradually (hehe) been adopted in code bases, and one of the expectations was that more static information would unlock performance benefits for Python code.

Unfortunately, it turns out that types in Python (yes, types in general; think metaclasses) and the annotations themselves are dynamic features of Python, which means static typing has not delivered the performance benefits people were hoping for.
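A tiny sketch shows what “annotations are a dynamic feature” means in practice. The add function is a made-up example: its annotations are stored as ordinary runtime data, and CPython does not check them when the function is called.

```python
def add(a: int, b: int) -> int:
    return a + b

# The annotations are stored on the function as ordinary runtime data...
print(add.__annotations__)

# ...and CPython does not check them when the function is called.
result = add("py", "thon")
print(result)
```

The call with two strings succeeds and concatenates them, so the annotations alone give the compiler nothing it can rely on.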

Finally, to summarize:

CPython is an interpreter, and it has a compiler (or: Python is both an interpreted language and a compiled language).
Whether Python is compiled or interpreted is not what matters. What matters is that, compared with languages that have more static properties (properties that can be determined before the program runs), Python has relatively few of them: in Python, many properties are determined dynamically at runtime rather than statically at compile time.
Because Python has fewer static properties, some errors only become apparent at runtime rather than at compile time.
In languages with more static properties, by contrast, many errors are caught during compilation and are therefore easier to find and fix while coding.
This is the distinction that really matters, and it is more nuanced than “compiled” versus “interpreted”. For this reason, I think it is important to emphasize specific static and dynamic features rather than limiting ourselves to the clumsy distinction between “interpreted” and “compiled” languages.
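To make the summary concrete, here is a sketch of an error that Python’s compile step cannot see. The greet function and its misspelled variable are made up for illustration. Whether a name exists is a dynamic property in Python, so the typo compiles without complaint and only surfaces when the faulty branch actually runs.

```python
source = '''
def greet(loud):
    if loud:
        return mesage.upper()   # typo: "mesage" is never defined
    return "hello"
'''

namespace = {}
exec(compile(source, "<sketch>", "exec"), namespace)  # compiles and defines greet
greet = namespace["greet"]

quiet = greet(False)  # the faulty branch is never reached
try:
    greet(True)
    outcome = "no error"
except NameError as err:
    outcome = f"NameError: {err}"

print(quiet)
print(outcome)
```

A language that resolves names statically would typically reject this program before running it. Python only reports the problem on the call that reaches the misspelled name.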
