ebonnafoux

What happens when you run "Hello World" in Python

For the first entry of this blog, following the tradition, we will discuss "Hello World". As my day-to-day programming language is Python, I want to analyze the toolchain behind printing "Hello World".

1. Setup

Assuming you are on a Unix terminal, we start with

> echo 'print("Hello World")' > main.py

Then, assuming you have uv installed

> uv run main.py
Hello World

So what happened?

2. Tokenization

First, Python cuts the source code into tokens. This step can be inspected with the tokenize module

> uv run python -m tokenize main.py
0,0-0,0:            ENCODING       'utf-8'
1,0-1,5:            NAME           'print'
1,5-1,6:            OP             '('
1,6-1,19:           STRING         '"Hello World"'
1,19-1,20:          OP             ')'
1,20-1,21:          NEWLINE        '\n'
2,0-2,0:            ENDMARKER      ''

The first column gives the location of the token (start and end, each written as row,column), the second its type, and the third the string it covers.
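The same information can be obtained from inside Python. The following is a minimal sketch, assuming the source is held in a string rather than read from main.py; it uses tokenize.generate_tokens, which works on text and therefore does not emit the ENCODING token shown above.

```python
import io
import tokenize

# Same content as main.py, kept in a string for the sake of the example.
source = 'print("Hello World")\n'

# generate_tokens expects a readline-style callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # tok.type is an integer; tok_name maps it to a readable name.
    # tok.start and tok.end are (row, column) pairs.
    print(tok.start, tok.end, tokenize.tok_name[tok.type], repr(tok.string))

# exact_type refines the generic OP type into a specific operator token.
exact_names = [tokenize.tok_name[tok.exact_type] for tok in tokens]
print(exact_names)
```

Each token is a named tuple, so the fields can also be unpacked positionally if that is more convenient.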

You can break down the OP token type by adding the -e flag

> uv run python -m tokenize -e main.py
0,0-0,0:            ENCODING       'utf-8'
1,0-1,5:            NAME           'print'
1,5-1,6:            LPAR           '('
1,6-1,19:           STRING         '"Hello World"'
1,19-1,20:          RPAR           ')'
1,20-1,21:          NEWLINE        '\n'
2,0-2,0:            ENDMARKER      ''

Notice that ( and ) are now LPAR and RPAR.

The list of available tokens is given in the documentation but is version dependent. For example, between 3.10 and 3.14 the tokens AWAIT and ASYNC disappeared, while the tokens FSTRING_START, FSTRING_MIDDLE, FSTRING_END, TSTRING_START, TSTRING_MIDDLE and TSTRING_END appeared.

Code that would crash at runtime can still be tokenized, because the tokenizer only splits the text and never resolves names. For example, Python has no problem tokenizing this code

print("Hello World")
foo()

even though running this program would raise a NameError.
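This difference between tokenizing and running can be checked with a short sketch. It assumes the two-line program is held in a string, and it uses exec with an empty globals dictionary as a stand-in for running the file.

```python
import io
import tokenize

source = 'print("Hello World")\nfoo()\n'

# Tokenization succeeds: the tokenizer never looks up the name foo.
names = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.NAME
]
print(names)

# Execution fails only when foo is actually looked up at runtime.
error = None
try:
    exec(source, {})
except NameError as exc:
    error = exc
    print("NameError:", exc)
```

Note that the first line of the program still runs and prints its message before the NameError is raised on the second line.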

3. Abstract Syntax Tree

Then Python transforms the tokens into an Abstract Syntax Tree (AST). This step can also be run on its own like this

> uv run python -m ast main.py
Module(
   body=[
      Expr(
         value=Call(
            func=Name(id='print', ctx=Load()),
            args=[
               Constant(value='Hello World')],
            keywords=[]))],
   type_ignores=[])

Here we see that our code is understood as a Module containing one expression: a call to the function named print, with the constant 'Hello World' as its only argument.
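The tree can also be built and explored from Python with the ast module. The sketch below assumes the source is held in a string; the exact text printed by ast.dump varies between Python versions (newer ones omit empty lists), so no output is shown.

```python
import ast

source = 'print("Hello World")\n'
tree = ast.parse(source, filename="main.py")

# indent=3 matches the default indentation used by `python -m ast`.
print(ast.dump(tree, indent=3))

# Walk down the tree by hand: Module -> Expr -> Call.
call = tree.body[0].value
print(type(call).__name__)   # node class of the expression's value
print(call.func.id)          # name of the function being called
print(call.args[0].value)    # value of the first positional argument
```

Because the nodes are ordinary Python objects, the same attribute access works for any program, which is what linters and code formatters build on.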

4. Compilation

Next, the AST is compiled into a sequence of bytecode instructions. This step can also be isolated with the following command.

> uv run python -m dis main.py
  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (print)
              6 LOAD_CONST               0 ('Hello World')
              8 CALL                     1
             16 POP_TOP
             18 RETURN_CONST             1 (None)

This produces a human-readable disassembly. To actually compile the file, one has to use

> uv run python -m py_compile main.py
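The compilation step can also be reproduced in-process with the built-in compile function, which accepts an AST and returns a code object. This is a minimal sketch, assuming the source is held in a string; the disassembly it prints depends on the Python version, so it may differ from the listing above.

```python
import ast
import dis

source = 'print("Hello World")\n'

# Build the AST first, then hand it to the compiler.
tree = ast.parse(source, filename="main.py")
code = compile(tree, filename="main.py", mode="exec")

# The code object stores the names and constants the bytecode refers to.
print(code.co_names)
print(code.co_consts)

# Same kind of listing as `python -m dis`, produced from the code object.
dis.dis(code)

# A code object can be executed directly.
exec(code)
```

The raw bytecode itself is available as code.co_code, a bytes object that dis decodes into the readable listing.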

5. Interpretation

The last command produces a *.pyc file inside a __pycache__ directory. CPython can execute this file directly

> uv run python __pycache__/main.cpython-313.pyc
Hello World
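To see what the *.pyc file contains, it can be loaded by hand. The sketch below rests on a few assumptions: it writes its own copy of main.py into a temporary directory, it names the output main.pyc instead of using __pycache__, and it relies on the 16-byte header layout used by CPython 3.7 and later. The marshal format is CPython-specific and tied to the interpreter version.

```python
import importlib.util
import marshal
import py_compile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "main.py"
    src.write_text('print("Hello World")\n')

    # Compile to an explicit path so the example does not depend on __pycache__.
    pyc_path = py_compile.compile(
        str(src), cfile=str(Path(tmp) / "main.pyc"), doraise=True
    )
    data = Path(pyc_path).read_bytes()

# The first 4 bytes are the magic number identifying the bytecode version.
magic_ok = data[:4] == importlib.util.MAGIC_NUMBER
print("magic number matches this interpreter:", magic_ok)

# Skip the 16-byte header (CPython 3.7+); the rest is a marshalled code object.
code = marshal.loads(data[16:])

# Running the code object is what the interpreter does with the file.
exec(code)
```

The magic number check is the reason a *.pyc file produced by one Python version is rejected by another.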

6. Conclusion

This short blog post introduced the different steps behind the execution of Python code: tokenization, AST construction, compilation, and then interpretation. Hopefully, I will discuss each of these steps in more detail later on.