Hakril's blog

  • About me:

Highly interested in security, low-level programming and Python. I enjoy understanding how things work and pushing them outside of their normal operation.


    Infos:

    • @hakril
    • bitbucket

  • Understanding Python execution from inside: A Python assembly tracer

    Written by Hakril
    2015-05-07 13:58:40

Lately, I have been looking at Python's execution model. I was curious about the implementation of some opcodes like YIELD_VALUE and YIELD_FROM, how list comprehensions, generator expressions and other fun Python features are compiled, and what happens at the bytecode level when an exception is raised. Reading the CPython code is really instructive, but I felt like something was missing to fully understand the bytecode execution and the evolution of the stack. Yeah, GDB is cool, but I'm a lazy guy who wanted a high-level interface and an excuse to code some Python.

So my goal was to create a bytecode-level tracing API like the one offered by sys.settrace, but with a finer granularity. This exercise was a perfect way to practice my C-into-Python coding.
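For contrast, here is a minimal sketch of what the standard sys.settrace gives us: events at call, line, return and exception granularity, never one event per bytecode instruction (the helper names are mine):

```python
import sys

events = []

def tracer(frame, event, arg):
    # Records 'call', 'line', 'return', ... events: one per source line
    # at best, never one per bytecode instruction.
    events.append((event, frame.f_lineno))
    return tracer  # returning the tracer keeps tracing inside this scope

def add(a, b):
    c = a + b
    return c

sys.settrace(tracer)
add(1, 2)
sys.settrace(None)

print([e for e, _ in events])  # 'call', some 'line' events, then 'return'
```

That line-level granularity is exactly what we want to go below.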

    What we are going to need is:

    • A new opcode in the CPython interpreter
    • A way to inject the opcode in Python bytecode
    • Some Python code to handle the opcode on the Python side

In this article, everything is based on Python 3.5.

    A new opcode for CPython

    Our new opcode: DEBUG_OP

This new opcode, which I will call DEBUG_OP, is my first try at writing C code for CPython. I tried to keep it as simple as I could.

What we want is a way to call some Python code whenever our opcode is executed. We also want to be able to retrieve some data about our execution context. Our opcode will pass this data as parameters to our callback. Here is the useful information I identified:

    • The content of the stack
    • The frame object that executed DEBUG_OP

    So all our DEBUG_OP needs to do is:

    • Find the callback
    • Create a list with the content of the stack
    • Call the callback with that list and the current frame as parameters

    Sounds easy! So let's go!
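On the Python side, the callback we have in mind is just a plain function; a hypothetical sketch (op_target is the fixed name we will hardcode later):

```python
def op_target(stack, frame):
    # stack: a list snapshot of the frame's value stack
    # frame: the frame object that executed DEBUG_OP
    print("stack: {0}".format(stack))
```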

    Disclaimer: the following explanations and code are the result of a LOT of segfaults.

    First thing to do is to give a name and a value to our opcode. For that, we need to go into Include/opcode.h

    /** My own comments begin by '**' **/
    /** From: Includes/opcode.h **/
    
    /* Instruction opcodes for compiled code */
    
    /** We just have to define our opcode with a free value
        0 was the first one I found **/
    #define DEBUG_OP                0
    
    #define POP_TOP                 1
    #define ROT_TWO                 2
    #define ROT_THREE               3
    

    The easy part is done. Now we have to actually code our opcode behaviour.

    Implementing DEBUG_OP

The first question we need to ask ourselves before even thinking about the implementation of DEBUG_OP is: "What is my interface going to be?"

It's cool to have a shiny new opcode that calls some code, but what code is it going to call exactly? How will it retrieve the callback function? I chose what looked like the simplest solution: a function with a fixed name in the frame's globals.

    The question is: "how do I look for a fixed C-string in a dict?"

To answer this question, we can look at some fixed identifiers used in the Python main loop: the ones associated with context managers, __enter__ and __exit__.

We can see that they are used in the SETUP_WITH opcode:

/** From: Python/ceval.c **/
TARGET(SETUP_WITH) {
    _Py_IDENTIFIER(__exit__);
    _Py_IDENTIFIER(__enter__);
    PyObject *mgr = TOP();
    PyObject *exit = special_lookup(mgr, &PyId___exit__), *enter;
    PyObject *res;
    

    Now, a look at the _Py_IDENTIFIER macro:

    /** From: Include/object.h **/
    
    /********************* String Literals ****************************************/
    /* This structure helps managing static strings. The basic usage goes like this:
       Instead of doing
    
           r = PyObject_CallMethod(o, "foo", "args", ...);
    
       do
    
           _Py_IDENTIFIER(foo);
           ...
           r = _PyObject_CallMethodId(o, &PyId_foo, "args", ...);
    
       PyId_foo is a static variable, either on block level or file level. On first
       usage, the string "foo" is interned, and the structures are linked. On interpreter
       shutdown, all strings are released (through _PyUnicode_ClearStaticStrings).
    
       Alternatively, _Py_static_string allows to choose the variable name.
       _PyUnicode_FromId returns a borrowed reference to the interned string.
       _PyObject_{Get,Set,Has}AttrId are __getattr__ versions using _Py_Identifier*.
    */
    typedef struct _Py_Identifier {
        struct _Py_Identifier *next;
        const char* string;
        PyObject *object;
    } _Py_Identifier;
    
    #define _Py_static_string_init(value) { 0, value, 0 }
    #define _Py_static_string(varname, value)  static _Py_Identifier varname = _Py_static_string_init(value)
    #define _Py_IDENTIFIER(varname) _Py_static_string(PyId_##varname, #varname)
    

    Well, at least the documentation is explicit! With a little more research we can find the dict function we were looking for: _PyDict_GetItemId.

    So the lookup part of our opcode will look like:

     /** Our callback function will be named op_target **/
    PyObject *target = NULL;
    _Py_IDENTIFIER(op_target);
    target = _PyDict_GetItemId(f->f_globals, &PyId_op_target);
    if (target == NULL && _PyErr_OCCURRED()) {
        if (!PyErr_ExceptionMatches(PyExc_KeyError))
            goto error;
        PyErr_Clear();
        DISPATCH();
    }
    

    To be completely explicit, this code needs a few explanations:

    • f is our current frame and f->f_globals is its globals
    • If we don't find op_target, we check if the exception is a KeyError
    • goto error; is the main-loop's way of raising the exception
    • PyErr_Clear() suppresses the current exception and DISPATCH() launches the evaluation of the next opcode

    The next step is to gather the information we want (the stack):

/** This code creates a list with all the values on the current stack **/
    PyObject *value = PyList_New(0);
    for (i = 1 ; i <= STACK_LEVEL(); i++) {
        tmp = PEEK(i);
        if (tmp == NULL) {
            tmp = Py_None;
        }
        PyList_Append(value, tmp);
    }
    

    The last step is actually calling our callback! For that, we will use call_function and learn how to use it by looking at the opcode CALL_FUNCTION:

    /** From: Python/ceval.c **/
    TARGET(CALL_FUNCTION) {
        PyObject **sp, *res;
        /** stack_pointer is a local of the main loop.
            It's the pointer to the stacktop of our frame **/
        sp = stack_pointer;
        res = call_function(&sp, oparg);
    /** call_function handles the args it consumed on the stack for us **/
        stack_pointer = sp;
        PUSH(res);
        /** Standard exception handling **/
        if (res == NULL)
            goto error;
        DISPATCH();
    }
    

    With all that information, we are able to craft our DEBUG_OP:

    TARGET(DEBUG_OP) {
        PyObject *value = NULL;
        PyObject *target = NULL;
        PyObject *res = NULL;
        PyObject **sp = NULL;
        PyObject *tmp;
        int i;
        _Py_IDENTIFIER(op_target);
    
        target = _PyDict_GetItemId(f->f_globals, &PyId_op_target);
        if (target == NULL && _PyErr_OCCURRED()) {
            if (!PyErr_ExceptionMatches(PyExc_KeyError))
                goto error;
            PyErr_Clear();
            DISPATCH();
        }
        value = PyList_New(0);
        Py_INCREF(target);
        for (i = 1 ; i <= STACK_LEVEL(); i++) {
            tmp = PEEK(i);
            if (tmp == NULL)
                tmp = Py_None;
            PyList_Append(value, tmp);
        }
    
        PUSH(target);
        PUSH(value);
        Py_INCREF(f);
        PUSH(f);
        sp = stack_pointer;
        res = call_function(&sp, 2);
        stack_pointer = sp;
        if (res == NULL)
            goto error;
        Py_DECREF(res);
        DISPATCH();
    }
    

As I didn't have much experience with CPython's C code, I might have missed something (I am looking at you, refcounting). Feel free to correct me if that's the case. ;)

    It compiles! So it works!

Well, not really... It might seem fine, but this code will fail when we try to execute our first DEBUG_OP. Since 2008, Python has used computed gotos (you can read more about computed gotos in Python here). So we need to update the goto jump table: we just need to go into Python/opcode_targets.h and make the following change:

    /** From: Python/opcode_targets.h **/
/** Easy change since DEBUG_OP is opcode number 0: the first entry in the table **/
    static void *opcode_targets[256] = {
        //&&_unknown_opcode,
        &&TARGET_DEBUG_OP,
        &&TARGET_POP_TOP,
        /** ... **/
    

And that's all! We now have a fully working new opcode. The only problem is that our opcode is never called, as it never appears in any compiled bytecode. Now we need to inject DEBUG_OP into the bytecode of some functions.

    Injecting opcode DEBUG_OP into Python bytecode

    There are many ways to insert a new opcode into the Python bytecode:

    • We can use the peephole optimizer just like Quarkslab did
    • We can do some changes in the bytecode generation code
    • We can (and we will) just modify the bytecode of some functions at runtime!

    Yep, coding that new opcode was enough C for today, let's get back to the source of Understanding Python: some hacky, strange (somewhat magical) Python!

    So, what we are going to do is:

    • Take the code object of the function we want to trace
    • Rewrite the bytecode to inject some DEBUG_OP
    • Put the new code object in place

    Reminder about code object

If you have never heard of code objects, there is a little introduction somewhere in my first article. There is also some good documentation on the net and, as always, the doc page (Ctrl+F "code objects").

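As a small refresher, the code-object attributes we will rely on can be inspected directly (a quick sketch on a made-up function):

```python
def f(a, b):
    c = a + b
    return c

code = f.__code__
print(code.co_name)        # 'f'
print(code.co_argcount)    # 2
print(code.co_varnames)    # ('a', 'b', 'c'): arguments first, then locals
print(type(code.co_code))  # bytes: the raw bytecode string we will rewrite
```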
    One thing to note in the context of this article is that code objects are not mutable:

    Python 3.4.2 (default, Oct  8 2014, 10:45:20)
    [GCC 4.9.1] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> x = lambda y : 2
    >>> x.__code__
    <code object <lambda> at 0x7f481fd88390, file "<stdin>", line 1>
    >>> x.__code__.co_name
    '<lambda>'
    >>> x.__code__.co_name = 'truc'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: readonly attribute
    >>> x.__code__.co_consts = ('truc',)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: readonly attribute
    

But don't worry, we will find a way around that.

    Our tools

To do these bytecode modifications we are going to need a few tools:

• The dis module, used to disassemble and analyse bytecode
• dis.Bytecode, a new feature in Python 3.4 that is super useful for disassembly and bytecode analysis!
• A way to easily modify code objects

dis.Bytecode disassembles a code object and gives us useful information about the opcodes, arguments and context:

    # Python3.4
    >>> import dis
    >>> f = lambda x: x + 3
    >>> for i in dis.Bytecode(f.__code__): print (i)
    ...
    Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='x', argrepr='x', offset=0, starts_line=1, is_jump_target=False)
    Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval=3, argrepr='3', offset=3, starts_line=None, is_jump_target=False)
    Instruction(opname='BINARY_ADD', opcode=23, arg=None, argval=None, argrepr='', offset=6, starts_line=None, is_jump_target=False)
    Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None, argrepr='', offset=7, starts_line=None, is_jump_target=False)
    

To be able to modify code objects, I created a small class that clones a code object, allows us to modify the values we want, and generates a new code object.

    class MutableCodeObject(object):
        args_name = ("co_argcount", "co_kwonlyargcount", "co_nlocals", "co_stacksize", "co_flags", "co_code",
                      "co_consts", "co_names", "co_varnames", "co_filename", "co_name", "co_firstlineno",
                       "co_lnotab", "co_freevars", "co_cellvars")
    
        def __init__(self, initial_code):
            self.initial_code = initial_code
            for attr_name in self.args_name:
                attr = getattr(self.initial_code, attr_name)
                if isinstance(attr, tuple):
                    attr = list(attr)
                setattr(self, attr_name, attr)
    
        def get_code(self):
            args = []
            for attr_name in self.args_name:
                attr = getattr(self, attr_name)
                if isinstance(attr, list):
                    attr = tuple(attr)
                args.append(attr)
            return self.initial_code.__class__(*args)
    

    Easy to use, that's one problem solved!

    >>> x = lambda y : 2
    >>> m = MutableCodeObject(x.__code__)
    >>> m
    <new_code.MutableCodeObject object at 0x7f3f0ea546a0>
    >>> m.co_consts
    [None, 2]
    >>> m.co_consts[1] = '3'
    >>> m.co_name = 'truc'
    >>> m.get_code()
    <code object truc at 0x7f3f0ea2bc90, file "<stdin>", line 1>
    

    Testing our new opcode

Now that we have the basic tools to inject some DEBUG_OP, we should be able to verify that our implementation is usable.

    For that, we are just going to add our opcode in the simplest function ever.

    from new_code import MutableCodeObject
    
    def op_target(*args):
        print("WOOT")
        print("op_target called with args <{0}>".format(args))
    
    def nop():
        pass
    
    new_nop_code = MutableCodeObject(nop.__code__)
    new_nop_code.co_code = b"\x00" + new_nop_code.co_code[0:3] + b"\x00" + new_nop_code.co_code[-1:]
    new_nop_code.co_stacksize += 3
    
    nop.__code__ = new_nop_code.get_code()
    
    import dis
    dis.dis(nop)
    nop()
    
    
    # Don't forget that ./python is our custom Python implementing DEBUG_OP
    hakril@computer ~/python/CPython3.5 % ./python proof.py
      8           0 <0>
                  1 LOAD_CONST               0 (None)
                  4 <0>
                  5 RETURN_VALUE
    WOOT
    op_target called with args <([], <frame object at 0x7fde9eaebdb0>)>
    WOOT
    op_target called with args <([None], <frame object at 0x7fde9eaebdb0>)>
    

Sounds like it works! One line might need some explanation: new_nop_code.co_stacksize += 3:

• co_stacksize represents the stack size needed by the code object
• Our DEBUG_OP pushes 3 values onto the stack, so we need to reserve space for them

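To see what co_stacksize means in practice, here is a quick check on two made-up functions (values verified on CPython 3):

```python
# co_stacksize is the maximum depth the value stack can reach
peak2 = lambda x: x + 1        # load x, load 1, add: depth peaks at 2
peak3 = lambda x: (x, x, x)    # three loads before BUILD_TUPLE: depth 3

print(peak2.__code__.co_stacksize)  # 2
print(peak3.__code__.co_stacksize)  # 3
```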
Now we need to be able to inject our opcode into every Python function! Be brave!

    Rewriting bytecode

As we saw in the last example, rewriting Python bytecode sounds easy! To inject our DEBUG_OP between every pair of opcodes, all we have to do is get the offset of every opcode (injecting our opcode into the middle of an argument would be harmful) and insert our opcode at those offsets. The offsets are easy to get using dis.Bytecode.

Something like this:

    def add_debug_op_everywhere(code_obj):
        # We get every instruction offset in the code object
        offsets = [instr.offset for instr in dis.Bytecode(code_obj)]
        # And insert a DEBUG_OP at every offset
        return insert_op_debug_list(code_obj, offsets)
    
    def insert_op_debug_list(code, offsets):
        # We insert the DEBUG_OP one by one
        for nb, off in enumerate(sorted(offsets)):
        # Need to adjust the offsets by the number of opcodes already inserted before
            # That's why we sort our offsets!
            code = insert_op_debug(code, off + nb)
        return code
    
    # Last problem: what does insert_op_debug looks like?
    

One might think (based on the previous example) that our insert_op_debug will just add a "\x00" at the specified offset, but there is a TRAP! Our first example of DEBUG_OP insertion was simple code without any branches. To have a fully functioning insert_op_debug, we need to take care of branching opcodes.

Python branches are really simple; there are two types:

    • Absolute branches: the branch will look like Instruction_Pointer = argument(instruction)
    • Relative branches: the branch will look like Instruction_Pointer += argument(instruction)
      • Relative branches are always forward
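The dis module itself already classifies jump opcodes into these two families, which gives us a quick way to list them (the exact contents vary across Python versions):

```python
import dis

# Relative jumps: Instruction_Pointer += argument
relative = sorted(dis.opname[op] for op in dis.hasjrel)
# Absolute jumps: Instruction_Pointer = argument
absolute = sorted(dis.opname[op] for op in dis.hasjabs)

print(relative)
print(absolute)
```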

As we want those branches to remain valid after our DEBUG_OP insertions, we will need to rewrite the instruction arguments. So here is the logic I used:

• For every relative branch before our insertion offset:

  • If the destination is strictly greater than our insertion offset, add 1 to the instruction argument
  • If it is equal, no change is needed: this lets us execute our DEBUG_OP between the jump and its target
  • If it is less, our DEBUG_OP won't change the distance between the jump and its destination
• For every absolute branch in the code object:

  • If the destination is strictly greater than our insertion offset, add 1 to the instruction argument
  • No modification if it is equal, for the same reason as for the relative branches
  • If it is less, our DEBUG_OP insertion won't change the address of the destination

    Here is the implementation:

# Helper
import struct

def bytecode_to_string(bytecode):
    if bytecode.arg is not None:
        return struct.pack("<Bh", bytecode.opcode, bytecode.arg)
    return struct.pack("<B", bytecode.opcode)
    
    # Dummy class for bytecode_to_string
    class DummyInstr:
        def __init__(self, opcode, arg):
            self.opcode = opcode
            self.arg = arg
    
    def insert_op_debug(code, offset):
        opcode_jump_rel = ['FOR_ITER', 'JUMP_FORWARD', 'SETUP_LOOP', 'SETUP_WITH', 'SETUP_EXCEPT', 'SETUP_FINALLY']
        opcode_jump_abs = ['POP_JUMP_IF_TRUE', 'POP_JUMP_IF_FALSE', 'JUMP_ABSOLUTE']
        res_codestring = b""
        inserted = False
        for instr in dis.Bytecode(code):
            if instr.offset == offset:
                res_codestring += b"\x00"
                inserted = True
            if instr.opname in opcode_jump_rel and not inserted: #relative jump are always forward
        if offset < instr.offset + 3 + instr.arg: # inserted between jump and dest: add 1 to dest (3 for size)
                    #If equal: jump on DEBUG_OP to get info before exec instr
                    res_codestring += bytecode_to_string(DummyInstr(instr.opcode, instr.arg + 1))
                    continue
            if instr.opname in opcode_jump_abs:
                if instr.arg > offset:
                    res_codestring += bytecode_to_string(DummyInstr(instr.opcode, instr.arg + 1))
                    continue
            res_codestring += bytecode_to_string(instr)
        # replace_bytecode just replaces the original code co_code
        return replace_bytecode(code, res_codestring)
    

    We can look at the result:

    >>> def lol(x):
    ...     for i in range(10):
    ...         if x == i:
    ...             break
    
    >>> dis.dis(lol)
    101           0 SETUP_LOOP              36 (to 39)
                  3 LOAD_GLOBAL              0 (range)
                  6 LOAD_CONST               1 (10)
                  9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 12 GET_ITER
            >>   13 FOR_ITER                22 (to 38)
                 16 STORE_FAST               1 (i)
    
    102          19 LOAD_FAST                0 (x)
                 22 LOAD_FAST                1 (i)
                 25 COMPARE_OP               2 (==)
                 28 POP_JUMP_IF_FALSE       13
    
    103          31 BREAK_LOOP
                 32 JUMP_ABSOLUTE           13
                 35 JUMP_ABSOLUTE           13
            >>   38 POP_BLOCK
            >>   39 LOAD_CONST               0 (None)
                 42 RETURN_VALUE
    >>> lol.__code__ = transform_code(lol.__code__, add_debug_op_everywhere, add_stacksize=3)
    
    
    >>> dis.dis(lol)
    101           0 <0>
                  1 SETUP_LOOP              50 (to 54)
                  4 <0>
                  5 LOAD_GLOBAL              0 (range)
                  8 <0>
                  9 LOAD_CONST               1 (10)
                 12 <0>
                 13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 16 <0>
                 17 GET_ITER
            >>   18 <0>
    
    102          19 FOR_ITER                30 (to 52)
                 22 <0>
                 23 STORE_FAST               1 (i)
                 26 <0>
                 27 LOAD_FAST                0 (x)
                 30 <0>
    
    103          31 LOAD_FAST                1 (i)
                 34 <0>
                 35 COMPARE_OP               2 (==)
                 38 <0>
                 39 POP_JUMP_IF_FALSE       18
                 42 <0>
                 43 BREAK_LOOP
                 44 <0>
                 45 JUMP_ABSOLUTE           18
                 48 <0>
                 49 JUMP_ABSOLUTE           18
            >>   52 <0>
                 53 POP_BLOCK
            >>   54 <0>
                 55 LOAD_CONST               0 (None)
                 58 <0>
                 59 RETURN_VALUE
    
    # Setup the simplest handler EVER
    >>> def op_target(stack, frame):
    ...     print (stack)
    
    # GO
    >>> lol(2)
    []
    []
    [<class 'range'>]
    [10, <class 'range'>]
    [range(0, 10)]
    [<range_iterator object at 0x7f1349afab80>]
    [0, <range_iterator object at 0x7f1349afab80>]
    [<range_iterator object at 0x7f1349afab80>]
    [2, <range_iterator object at 0x7f1349afab80>]
    [0, 2, <range_iterator object at 0x7f1349afab80>]
    [False, <range_iterator object at 0x7f1349afab80>]
    [<range_iterator object at 0x7f1349afab80>]
    [1, <range_iterator object at 0x7f1349afab80>]
    [<range_iterator object at 0x7f1349afab80>]
    [2, <range_iterator object at 0x7f1349afab80>]
    [1, 2, <range_iterator object at 0x7f1349afab80>]
    [False, <range_iterator object at 0x7f1349afab80>]
    [<range_iterator object at 0x7f1349afab80>]
    [2, <range_iterator object at 0x7f1349afab80>]
    [<range_iterator object at 0x7f1349afab80>]
    [2, <range_iterator object at 0x7f1349afab80>]
    [2, 2, <range_iterator object at 0x7f1349afab80>]
    [True, <range_iterator object at 0x7f1349afab80>]
    [<range_iterator object at 0x7f1349afab80>]
    []
    [None]
    

Wonderful! We now have a way to get the state of our stack and our frame at every Python instruction. The rendering of the results is not quite usable in its current state, though. Let's add some wrapping in the last section!

    Adding some Python wrapping

As you can see, all of the low-level interface works. Our last mission is to make our op_target useful. (This part might be a little empty: it's not the funniest part of this project in my eyes.)

The first thing we want to do is to exploit the information given by the frame parameter. If we look at the information stored in a frame, we can see:

    • f_code: the code object being executed in this frame
    • f_lasti: gives the current instruction (this is an index into the bytecode string of the code object)

Now our handler is able to know which opcode will be executed just after our DEBUG_OP. This will be useful for aggregating the data and doing some nice display.
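We can peek at both fields from plain Python with inspect (a tiny check, not the real handler):

```python
import inspect

def where_am_i():
    frame = inspect.currentframe()
    # f_code is the code object being run; f_lasti is the index of the
    # current instruction inside f_code.co_code
    return frame.f_code.co_name, frame.f_lasti

name, lasti = where_am_i()
print(name)   # 'where_am_i'
print(lasti)  # an integer offset into co_code
```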

We can create a class that will set up the tracing mechanism for a function:

• Change its co_code
• Set up a callback as the target function op_target

    As we know the next instruction, we can analyse it and modify its arguments. For example, we can add an auto-follow-called-functions feature:

    def op_target(l, f, exc=None):
        if op_target.callback is not None:
            op_target.callback(l, f, exc)
    
    class Trace:
        def __init__(self, func):
            self.func = func
    
        def call(self, *args, **kwargs):
            self.add_func_to_trace(self.func)
            # Activate Trace callback for the func call
            op_target.callback = self.callback
            try:
                res = self.func(*args, **kwargs)
            except Exception as e:
                res = e
            op_target.callback = None
            return res
    
        def add_func_to_trace(self, f):
            # Is it code? is it already transformed?
            if not hasattr(f ,"op_debug") and hasattr(f, "__code__"):
                f.__code__ = transform_code(f.__code__, transform=add_everywhere, add_stacksize=ADD_STACK)
                f.__globals__['op_target'] = op_target
                f.op_debug = True
    
        def do_auto_follow(self, stack, frame):
            # Nothing fancy: FrameAnalyser is just the wrapper that gives the next executed instruction
            next_instr = FrameAnalyser(frame).next_instr()
            if "CALL" in next_instr.opname:
                arg = next_instr.arg
                f_index = (arg & 0xff) + (2 * (arg >> 8))
                called_func = stack[f_index]
    
                # If call target is not traced yet: do it
                if not hasattr(called_func, "op_debug"):
                    self.add_func_to_trace(called_func)
    
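The f_index arithmetic in do_auto_follow relies on how the (pre-3.6) CALL_FUNCTION argument is encoded: the low byte is the number of positional arguments and the high byte the number of keyword pairs, each keyword pair taking two stack slots. A standalone check of the math:

```python
def called_func_index(arg):
    # low byte: number of positional args; high byte: number of kw pairs
    n_pos = arg & 0xff
    n_kw = arg >> 8
    # the called function sits under all positional args
    # and all (name, value) keyword pairs
    return n_pos + 2 * n_kw

print(called_func_index(0x0002))  # 2 positional, no kwargs -> index 2
print(called_func_index(0x0102))  # 2 positional + 1 kw pair -> index 4
```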

Now, all we have to do is implement sub-classes with a callback method, which will be called at every instruction, and a do_report method that prints the gathered information.

    Here is an example of a dummy tracer that follows function calls:

    class DummyTrace(Trace):
        def __init__(self, func):
            self.func = func
            self.data = collections.OrderedDict()
            self.last_frame = None
            self.known_frame = []
            self.report = []
    
        def callback(self, stack, frame, exc):
            if frame not in self.known_frame:
                self.known_frame.append(frame)
                self.report.append(" === Entering New Frame {0} ({1}) ===".format(frame.f_code.co_name, id(frame)))
                self.last_frame = frame
            if frame != self.last_frame:
                self.report.append(" === Returning to Frame {0} {1}===".format(frame.f_code.co_name, id(frame)))
                self.last_frame = frame
    
            self.report.append(str(stack))
            instr = FrameAnalyser(frame).next_instr()
            offset = str(instr.offset).rjust(8)
            opname = str(instr.opname).ljust(20)
            arg = str(instr.arg).ljust(10)
            self.report.append("{0}  {1} {2} {3}".format(offset, opname, arg, instr.argval))
            self.do_auto_follow(stack, frame)
    
        def do_report(self):
            print("\n".join(self.report))
    

Here are some examples of implementations and uses. The format may be hard to read; I am not good at user-friendly reporting...

    Example 1: auto-tracing with dummy dump of stack and executed instructions

    Example 2: context manager at work

    And, at last, a demo of how list comprehensions work:

    Example 3: output of dummy tracer

    Example 4: output of Stack aggregation tracer

    Conclusion

    This little project was a good way to have an introduction to some low-level Python, the interpreter's main loop, Python C-coding and Python bytecode. Also, the resulting tool is a good way to have a quick look at the bytecode-behavior of some fun Python constructions like generators, context managers or list comprehensions.

    Here is the complete code of this experimentation.

Another thing that could be done is adding a way to modify the stack of the function we are tracing. I am not sure what it could be used for, but it would be fun!

  • Understanding Python by breaking it - Lvl 2

    Written by Hakril
    2014-05-15 17:51:12

After my first article, I kept playing with ctypes and tried to see what fun things could be possible with small modifications to Python objects at runtime. The goal of this article is to explain how to get reliable native code execution from a simple Python object. By reliable I mean that, if our payload respects certain rules, we can call it multiple times from the same Python code.

    If you have not read my previous article, here is what you need to know:

• the ctypes module allows you to manipulate C structures from Python,
• we can map a ctypes Structure onto a given address using its from_address method,
• in CPython: id(obj) returns the address of obj.

    Using ctypes.Structure

Last time, I only used id(obj) to access the internal variables of an object. This time we need to do more complex things, so I am going to use the ctypes.Structure type for better readability.
Here is an example:

    >>> import ctypes
    >>> class PyObj(ctypes.Structure):
    ...     _fields_ = [("ob_refcnt", ctypes.c_size_t),
    ...                 ("ob_type", ctypes.c_void_p)] # Must be cast to the real type
    
    # Inheritance of ctypes.Structure: PyVarObj will also have the fields of PyObj
    >>> class PyVarObj(PyObj):
    ...     _fields_ = [("ob_size", ctypes.c_size_t)]
    
    >>> l = [1,2,3,4] 
    >>> lm = PyVarObj.from_address(id(l))
    >>> lm.ob_refcnt
    1L
    >>> lm.ob_size
    4L
    >>> lm.ob_size = 2
    >>> l
    [1,2]
    

    Now we have a nice way to access C-only data with readable and explicit names!
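    As a quick sanity check of such a mapping, the refcount read through the structure can be compared against sys.getrefcount, which reports one extra reference for its own argument. A sketch for default CPython builds, where the object header matches the fields below:

    ```python
    import ctypes
    import sys

    class PyObj(ctypes.Structure):
        _fields_ = [("ob_refcnt", ctypes.c_ssize_t),
                    ("ob_type", ctypes.c_void_p)]

    l = [1, 2, 3, 4]
    m = PyObj.from_address(id(l))
    # sys.getrefcount counts its own argument as one extra reference
    assert m.ob_refcnt == sys.getrefcount(l) - 1
    ```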

    So, let's get to work!

    Dummy code execution

    Instruction Pointer Control

    Before having nice and reliable native code execution, the first step is getting code execution at all. The most obvious type to target is the built-in function: the Python type that represents native code.

    Let's have a look at the structure of a built-in function:

    /** My own comments begin by '**' **/
    /** From: Include/methodobject.h **/
    
    typedef PyObject *(*PyCFunction)(PyObject *, PyObject *); /** A standard function pointer **/
    
    struct PyMethodDef {
        const char  *ml_name;   /* The name of the built-in function/method */
        PyCFunction  ml_meth;   /* The C function that implements it */
        int      ml_flags;  /* Combination of METH_xxx flags, which mostly
                       describe the args expected by the C func */
        const char  *ml_doc;    /* The __doc__ attribute, or NULL */
    };
    
    typedef struct {
        PyObject_HEAD
        PyMethodDef *m_ml; /* Description of the C function to call */
        PyObject    *m_self; /* Passed as 'self' arg to the C func, can be NULL */
        PyObject    *m_module; /* The __module__ attribute, can be anything */
    } PyCFunctionObject;
    

    We can see that a PyCFunctionObject is just a C function pointer with some meta-data.
    Here is the ctypes structure that represents those two structs:

    class PyMethodDef(ctypes.Structure):
        _fields_ = [("ml_name", ctypes.POINTER(ctypes.c_char)), ("ml_meth", ctypes.c_void_p),
                ("ml_flags", ctypes.c_int), ("ml_doc", ctypes.POINTER(ctypes.c_char))]
    
    # Reuse of PyObj from the first example
    class PyCFunctionObject(PyObj):
        _fields_ = [("m_ml", ctypes.POINTER(PyMethodDef)),
            ("m_self", ctypes.c_void_p), ("m_module", ctypes.c_void_p)]
    
    # The only new thing is ctypes.POINTER.
    # This represents a pointer to another type; the
    # pointed-to content can be accessed via `.contents`
    
    >>> mabs = PyCFunctionObject.from_address(id(abs))
    >>> mabs.m_ml.contents.ml_name[0:3]
    'abs'
    

    It seems to work. Now we need to see if code execution works too.

    # Launching Python into GDB to nicely handle the crash
    $ gdb -q -args `which python` -i intro_type.py
    ...
    # Get the address of the builtin `abs`
    (gdb) x/i 'builtin_abs.32990'
       0x51c8f0 <builtin_abs.32990>:        sub    $0x8,%rsp 
    (gdb) r
    ...
    >>> # repeat code of last example
    # See if the `abs` C code match with what we found in gdb
    >>> mabs.m_ml.contents.ml_meth
    '0x51c8f0'
    >>> mabs.m_ml.contents.ml_meth = 0x42424242
    >>> abs(-1)
    # Yeah!
    Program received signal SIGSEGV, Segmentation fault.
    0x0000000042424242 in ?? ()
    (gdb)
    

    That was easy! But this part was the simple one. If we want reliable code execution, many steps remain.

    Executing our code

    The next step is to actually execute some of our code. For that, I chose the mmap module to create executable pages and stuff them with code and data.

    There are only two problems with the mmap module:

    • the mmap type doesn't expose the actual address of the page to Python (as usual, we will use ctypes to figure that out),
    • the mmap "API" is not the same on Linux and Windows (I simply created my own platform-independent wrapper for that).
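    The Unix side of such a wrapper can be sketched in a few lines (the function name get_rwx_page is mine; the Windows branch would go through mmap's access parameter instead):

    ```python
    import mmap
    import sys

    def get_rwx_page(size=0x1000):
        """Sketch: allocate an anonymous read/write/execute page (Unix only)."""
        if sys.platform == "win32":
            raise NotImplementedError("use the mmap 'access' argument on Windows")
        return mmap.mmap(-1, size,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)

    page = get_rwx_page()
    page[0:4] = b"\xcc\xcc\xcc\xcc"  # the page is writable like any buffer
    ```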

    How do we get the real address of the mmapped page? Same procedure as usual:

    typedef struct {
        PyObject_HEAD
        char *      data;
        size_t      size;
        size_t      pos;    /* relative to offset */
        ... /** Other fields that are non-relevant to our goal **/
    } mmap_object;
    

    All we need is the first field, which points to our page. In the following examples, all the ctypes structures I describe can be found in the file intro_type.py.

    import ctypes
    import intro_type
    import mmap
    # Ctypes object for introspection
    class PyMmap(intro_type.PyObj):
        _fields_ = [("ob_addr", ctypes.c_size_t)]
    
    
    # Specific mmap class for code injection
    class MyMap(mmap.mmap):
        """ A mmap that is never unmapped and that contains the page address """
        def __init__(self, *args, **kwarg):
            # Get the page address by introspection of the C struct
            m = PyMmap.from_address(id(self))
            self.addr = m.ob_addr
            # Prevent garbage collection (and so the unmapping) of the page
            m.ob_refcnt += 1
    
        @classmethod
        def get_map(cls, size):
            """ Returns a page of size 'size': the implementation is not important for our example.
                See the repository if you want the actual code. ;)
            """
    

    Now, to be sure that our first step (simple code execution) works, we will replace the abs builtin with an identity function, equivalent to lambda x: x.

    I run a 64-bit Python, so the native code in this example is 64-bit; if you want the 32-bit version, look at the repository. ;)
    The payload used for identity is:

        # All ctypes structs and MyMap classes are in the intro_type.py file.
        import intro_type
    
        # RSI is the first argument we pass and RAX is the return value.
        # So we just return our first argument.
    
        # Normally, the first argument of a function is RDI but builtins receive a 'self' as the actual first argument.
        payload = "4889F0" + "C3"  # mov rax, rsi ; ret
        payload = payload.decode('hex')
    
        # Get a page.
        page = intro_type.MyMap.get_map(0x1000)
        # Put our payload at the beginning.
        page[0:len(payload)] = payload
        mabs = intro_type.PyCFunctionObject.from_address(id(abs))
        # Change the function pointer to the address of our payload.
        mabs.m_ml.contents.ml_meth = page.addr
    
        >>> abs(4)
        4
        # Well it doesn't segfault: good news!
        >>> abs(-4)
        -4 
        # Well this is not the real abs anymore: another good news!
        >>> x = (1,2,3)
        >>> abs(x)
        (1,2,3)
        >>> abs(x)
        <refcnt 0 at 0x7f48b121eaa0>
        # Yep, do not forget that our function doesn't increment the refcount of the returned object.
        # This version can lead to crash!
    

    Perfect, our dummy code execution works!

    Object hand-crafting

    The next step of our journey is the hand-crafting of some Python objects.
    Why? Because it's fun, and it will be useful for our final goal!

    Writing raw objects in memory

    The first step of object crafting is to write the object data in memory.
    For that, we are going to use the struct module (another really useful module). In this example, we write the data for an int object (in 64-bit Python):

    /** here is the struct that represents an int **/
    typedef struct {
        PyObject_HEAD
        long ob_ival;
    } PyIntObject;
    

    So:

    import struct
    
    >>> help(struct.pack_into)
    pack_into(...)
       Pack the values v1, v2, ... according to fmt.
       Write the packed bytes into the writable buffer buf starting at offset.
    # This module heavily uses the struct format.
    # Doc is here: https://docs.python.org/2/library/struct.html.
    
    # Write three unsigned long long in little endian at offset 0 of 'page'.
    >>> struct.pack_into("<QQQ", page, 0,
            2,       #ob_refcnt
            id(int), # ob_type            
            42)      # ob_ival
    
    # That's all!
    # Problem is: I have no way to prove you that it works before the next sub-step. :(
    # Trust me!
    

    The problem is: even if we have crafted a valid int in memory, it is not usable, since the Python interpreter has no reference to it.
    The next step is to create a reference to our crafted object.

    Giving life to the crafted object

    Here is the problem:

    • we have a valid object in memory,
    • we have its address,
    • the Python interpreter has no reference to it,
    • we need to create the reference to give life to our object.

    All we have to do is call a function that can return arbitrary valid references.

    The easiest way is to return the address (that is, the object) corresponding to the value of the int we receive as a parameter.
    Think of it as the reverse of the id function:

    # How our function should work:
    >>> x = (1,2,3)
    >>> id(x)
    140147953449472
    >>> my_func(140147953449472)
    (1,2,3)
    
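    Incidentally, ctypes can already perform this "reverse id" in pure Python, which is handy for cross-checking results (with the same caveat: a bogus address will crash the interpreter):

    ```python
    import ctypes

    x = (1, 2, 3)
    # Reinterpret the integer address as a borrowed PyObject reference
    y = ctypes.cast(id(x), ctypes.py_object).value
    assert y is x
    ```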

    If this description is not enough, here is what the C code should be:

    PyObject* my_func(PyIntObject* i)
    {
        return (PyObject *) i->ob_ival;
    }
    /** Here we can see that the function can be harmful if wrongly used **/
    

    So, here are the payloads for this function:

    # int-to-object payloads:
    
    # 64bits
    # mov rax, [rsi + 0x10] # rsi is the address of the int; [rsi + 0x10] the ob_ival of the int
    # ret
    bootstrap_64 = "48 8b 46 10 C3".replace(" ", "").decode("hex")
    
    # 32bits
    # mov eax, [esp + 8] # [esp + 8] -> builtin argument
    # mov eax, [eax + 0x8] # [eax + 8] -> ob_ival
    # ret
    bootstrap_32 = "8B 44 24 08 8B 40 08 C3".replace(" ", "").decode("hex")
    

    Putting everything together, we get:

    import struct
    import intro_type
    
    # Write our object into a page.
    >>> page = intro_type.MyMap.get_map(0x1000)
    >>> struct.pack_into("<QQQ", page, 0, 2, id(int), 42)
    # Put the payload just after our int object in the page.
    >>> bootstrap_64 = "48 8b 46 10 C3".replace(" ", "").decode("hex")
    >>> page[24:24 + len(bootstrap_64)] = bootstrap_64
    # Replace abs by our function.
    >>> mabs = intro_type.PyCFunctionObject.from_address(id(abs))
    >>> mabs.m_ml.contents.ml_meth = page.addr + 24
    # Give life to our int!
    >>> x = abs(int(page.addr))
    #IT'S ALIVE!!
    >>> x
    42
    # Proof that our int is not the original 42 (address comparison).
    >>> x is 42
    False
    # Proof that we have an object at the beginning of our page.
    >>> hex(id(x))
    '0x7ffff7ff8000L'
    >>> hex(page.addr)
    '0x7ffff7ff8000L'
    

    We are now able to create Python objects out of the great void! We are near our goal.

    Reliable native code execution

    The problems with the previous dummy code execution are:

    • we have a limited number of different functions (limited by the number of existing builtins),
    • each time we create a new dummy execution we lose a builtin,
    • we put the interpreter in an unstable state, as any part of the code could use a replaced builtin without knowing it.

    With the tools we acquired previously, the obvious solution to these problems is: we are going to craft our own builtin objects!

    By doing so, we can have any number of different native functions without putting the interpreter in an unstable state by replacing builtins.

    So here are the steps to create a complete builtin object:

    • Write our payload on a page.
    • Write a PyMethodDef on the page with the following values:

      • ml_name: address of a C string (using ctypes.addressof(ctypes.create_string_buffer(name))),
      • ml_meth: address of the payload,
      • ml_flags: 8 for METH_O (method with a single object argument; can be changed depending on our needs),
      • ml_doc: 0 (no need for documentation, so None is fine) or address of a C string.
    • Write a PyCFunctionObject on the page with the following values:

      • ob_refcnt: 0x42 (we really don't want it to be collected, since it was not allocated by Python),
      • ob_type: <type 'builtin_function_or_method'> so id(type(abs)) is fine,
      • m_ml: address of PyMethodDef,
      • m_self: 0,
      • m_module: 0 (or any PyObj address).
    • Give life to our object (see previous section).
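    Before packing those bytes, the layout is worth sanity-checking; with ctypes definitions mirroring the C structs, the struct size can be verified on the running interpreter. A sketch; the field types below are my reading of methodobject.h:

    ```python
    import ctypes

    class PyMethodDef(ctypes.Structure):
        _fields_ = [("ml_name", ctypes.c_char_p),
                    ("ml_meth", ctypes.c_void_p),
                    ("ml_flags", ctypes.c_int),
                    ("ml_doc", ctypes.c_char_p)]

    # With alignment padding after ml_flags, the struct occupies exactly
    # four pointer-sized slots on both 32- and 64-bit builds.
    assert ctypes.sizeof(PyMethodDef) == 4 * ctypes.sizeof(ctypes.c_void_p)
    ```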

    Here is a demo for x86_64:

    import struct
    import intro_type 
    def create_builtin(code, name="crafted builtin"):
        """ Dummy implementation of 'create_builtins' for x86_64"""
        # Create page.
        m = intro_type.MyMap.get_map(0x1000)
        # Get a C string with the name.
        cname = ctypes.create_string_buffer(name)
        name_addr = ctypes.addressof(cname)
        # Write the payload on the page.
        m[0:len(code)] = code
        # Write the PyMethodDef on the page.
        struct.pack_into("<QQQQ", m, 0x100,
            name_addr, m.addr, 8, name_addr)
        # Write the PyCFunctionObject on the page.
        struct.pack_into("<QQQQQ", m, 0x200,
            42, id(type(min)), m.addr + 0x100, 0, 0)
        # Give life to our object.
        return give_life_to_obj(int(m.addr + 0x200))
    
    
    # Create page with our `give_life_to_obj` payload.
    code_page = intro_type.MyMap.get_map(0x1000)
    bootstrap_64 = "48 8b 46 10 C3".replace(" ", "").decode("hex")
    code_page[0:len(bootstrap_64)] = bootstrap_64
    # Replace abs by our function.
    mabs = intro_type.PyCFunctionObject.from_address(id(abs))
    mabs.m_ml.contents.ml_meth = code_page.addr
    give_life_to_obj = abs
    # A builtin that triggers a breakpoint when executed.
    breakpoint = create_builtin("\xcc", "breakpoint")
    
    >>> breakpoint
    <built-in function breakpoint>
    >>> breakpoint.__name__
    'breakpoint'
    >>> breakpoint(None)
    [1]    4870 trace trap (core dumped)  python -i demo.py
    # woot! breakpoint triggered \o/
    

    The only problem now is the bootstrap of give_life_to_obj: to give life to a new builtin object we would need to already have the dummy execution. So we are going to:

    • replace abs by our give_life_to_object payload,
    • create our first builtin: real_give_life_to_object,
    • put back the original code of abs,
    • use our new builtin to create the builtin objects.

    Bootstrap code:

    def bootstrap():
        payload_64 = "48 8b 46 24 C3".replace(" ", "").decode("hex")
        # 'replace_builtin' replaces the code of its first argument and returns the old 'ml_meth' for restoration.
        old_abs = replace_builtin(abs, payload_64)
        # In my final code 'give_life_to_object' is named 'get_obj_at'.
        get_obj_at = abs
        new_obj_at = create_builtin(payload_64, "get_obj_at")
        # Just restore old code of builtin abs.
        restore_builtins(abs, old_abs)
        get_obj_at = new_obj_at
    

    With that, we are able to create all the builtins we want!

    Windows/Linux and 32/64 bits compatibility

    Before releasing the code, I worked on getting a working Linux and Windows version for both the 32- and 64-bit architectures.
    Here is what I had to do:

    Windows/Linux:

    • No change except MyMap class (mmap wrapper).
    • Get the OS using platform.system().
    • Call mmap with correct arguments depending on the system.

    32/64 bits:

    • Get the architecture using: platform.architecture()[0].
    • Adapt:

      • the payload for obj_at (described as give_life_to_obj in the article),
      • the format strings for struct.pack_into.

    The last problem I had was with the way I gave life to objects. It doesn't work very well on 32-bit systems, because int(addr) may return a long, which is not compatible with our payload.

    So in the final version, I use the str type and a string such as "\x41\x42\x43\x44" to pass the address of the object I want to give life to. This only changes the way the payload is called, and the offset at which the payload reads the address inside the object.
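    The round-trip itself is plain struct: the address is serialized into the string's bytes, and the payload reads it back from the string's buffer. A minimal sketch of just the packing part (Python 3 bytes shown; the article's code uses the Python 2 str type):

    ```python
    import struct

    addr = 0x44434241
    packed = struct.pack("<I", addr)  # 32-bit little-endian
    # The bytes 0x41..0x44 are simply 'ABCD'
    assert packed == b"ABCD"
    assert struct.unpack("<I", packed)[0] == addr
    ```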

    Final result

    The final result is pretty simple, all the magic is put into the CodeInjector class in the file code_injection.py. Just use it like this:

    >>> import code_injection
    >>> code_injection.CodeInjector
    <class 'code_injection.CodeInjector'>
    >>> ci = code_injection.CodeInjector()
    >>> b = ci.create_builtin("\xcc", "my_breakpoint_builtin")
    >>> b
    <built-in function my_breakpoint_builtin>
    

    Conclusion

    I had a LOT of fun creating new objects and having Python interface so nicely with my own native payloads. Of course, this kind of horrible stuff (Pythonically speaking) should not be used in real projects...
    I hope you enjoyed this article and learned something (I did!).
    Also, struct is a super-useful module for low-level stuff! I discovered it during my first CTF and now use it at almost every CTF.

    The complete implementation is available at https://bitbucket.org/Hakril/python-native-execution/.

  • Understanding Python by breaking it

    Written by Hakril
    2014-03-27 16:57:42

    I have recently discovered ctypes and decided to use it to manipulate Python in ways I should not. The purpose was to see some of Python's internals at work directly from the interpreter.
    The article studies the CPython implementation of Python 2.7.

    Reference counting and the Garbage Collector

    One thing to know about Python is that its garbage collector uses reference counting: every Python object has a ref counter that tracks how many references point to it.
    Our first goal is to visualise that ref counter for any object from the Python interpreter.

    The principal tool we are going to use is:

    • ctypes._CData.from_address that allows us to create a ctypes mapping on any address we want:

    Example (a bad one)

    import ctypes
    ctypes.c_size_t.from_address(0)
    # segfault! (dereferencing 0 is not a good idea)
    

    So we now have the ability to read any value in our address space. Next step is to get an object's address. For that, we will use a builtin function:

    • id() : returns the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime.
      CPython implementation detail: This is the address of the object in memory.

    All we need to do now is to get the offset of the ref counter in the Python object:

    /** My own comments begin by '**' **/
    /** From: Include/object.h **/
    
    typedef struct _object {
        PyObject_HEAD
    } PyObject;
    
    /* PyObject_HEAD defines the initial segment of every PyObject. */
    #define PyObject_HEAD    \
        _PyObject_HEAD_EXTRA   /** empty in standard build **/ \
        Py_ssize_t ob_refcnt;  /**ref counter **/ \
        struct _typeobject *ob_type; /** pointer to the type **/
    

    We can see that ob_refcnt is the first field of the struct, and therefore addr(ob_refcnt) == addr(object) == id(object).
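    This can be double-checked from ctypes itself: field descriptors expose their offset inside the structure. A quick sketch with the same PyObj shape, using explicitly Py_ssize_t-sized fields:

    ```python
    import ctypes

    class PyObj(ctypes.Structure):
        _fields_ = [("ob_refcnt", ctypes.c_ssize_t),
                    ("ob_type", ctypes.c_void_p)]

    # ob_refcnt sits at the very beginning of every object
    assert PyObj.ob_refcnt.offset == 0
    assert PyObj.ob_type.offset == ctypes.sizeof(ctypes.c_ssize_t)
    ```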

    So our function will simply be:

    def get_ref(obj):
        """ returns a c_size_t, which is the refcount of obj """
        return ctypes.c_size_t.from_address(id(obj))
    

    Let's try it!

    >>> import ctypes
    >>> def get_ref(obj):
    ...     """ returns a c_size_t, which is the refcount of obj """
    ...     return ctypes.c_size_t.from_address(id(obj))
    ...
    # the c_ulong maps the address rather than copying it,
    # so any modification of ob_refcnt is directly visible
    >>> l = [1,2,3,4]
    >>> l_ref = get_ref(l)
    >>> l_ref 
    c_ulong(1L) # there is just one reference on the list (l)
    
    >>> l2 = l
    >>> l_ref 
    c_ulong(2L)# two references on the list (l and l2)
    
    >>> del l
    >>> l_ref
    c_ulong(1L)
    
    >>> del l2
    >>> l_ref
    c_ulong(0L) # no more reference!
    
    >>> another_list = [0, 0, 7]
    >>> l_ref
    c_ulong(1L) # woot: the old list's ob_refcnt has changed
    

    This example is pretty straightforward, but the last two lines need an explanation. Why would creating a new list change the ref count of the old one?
    This is the work of the garbage collector! When the old list's ref count drops to 0, the GC "cleans" the list and puts it in a pool of unused lists that will be reused for the next list created!

    Proof:

    >>> l1 = [1,2,3]
    >>> l2 = [1,2,3]
    >>> hex(id(l1))
    '0xa367e8'
    >>> hex(id(l2))
    '0xa36d40'  # not the same address as l1 (hopefully)
    >>> del l1  # l1 is garbage collected
    >>> l3 = [8,0,0,8,5]
    >>> hex(id(l3))
    '0xa367e8'  # l1 address is reused for the new list l3
    

    Special case : int and str

    In the CPython implementation of int, the objects for values in [-5, 256] are shared. And we have the tools to verify that!

    # with "non-shared" int
    >>> x1 = 257
    >>> x2 = 257
    >>> get_ref(x1)
    c_ulong(1L)
    >>> get_ref(x2)
    c_ulong(1L)
    >>> id(x1) == id(x2)
    False  # logical: these are different objects
    
    >>> x3 = 0
    >>> get_ref(x3)
    c_ulong(409L) # all references to 0 point to the same int(0) object
    

    Same thing happens with strings!

    >>> p = "python"
    >>> get_ref(p)
    c_ulong(8L)
    >>> p2 = "python"
    >>> get_ref(p2)
    c_ulong(9L)
    >>> id(p) == id(p2)
    True  # the two variables are references to the same string object!
    

    Breaking reference counter

    For now, we have just read the value of ob_refcnt. What if we change it?
    We can rewrite ob_refcnt to force premature garbage collection!

    >>> x = [1,2,3,4]
    >>> x_ref = get_ref(x)
    >>> xx = x
    >>> x_ref
    c_ulong(2L)
    >>> x_ref.value = 1 # GC now thinks that we have just one reference to the list
    >>> del xx # ob_refcnt == 0 -> garbage collection!
    >>> x
    [] # garbage collection of a list sets its size to 0
    >>> another_list = [0, 4, 8, 15 ,16, 23, 42]
    >>> x
    [0, 4, 8, 15 ,16, 23, 42]  # always the same "reuse list" tricks!
    

    Of course, this kind of code puts the interpreter into an unstable state that will likely cause crashes!

    Messing with tuple

    Other fun and simple things to mess with are tuples. The documentation says "tuple is an immutable sequence type": let's try to change that!

    If you look at the CPython implementation you can find:

    /** From: Include/tupleobject.h **/
    typedef struct {
        PyObject_VAR_HEAD
        PyObject *ob_item[1]; /** An array of pointer to PyObject **/
    } PyTupleObject;
    
    /** From: Include/object.h **/
    #define PyObject_VAR_HEAD     \
        PyObject_HEAD    /** See: part 1 **/ \
        Py_ssize_t ob_size; /* Number of items in variable part */
    

    So the two important things about tuples are:

    • a tuple is basically just an array of pointers to PyObject, and
    • a tuple object is composed of three machine-word-sized fields (ob_refcnt, ob_type, ob_size) followed by the array of pointers.
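    We can verify that the pointer array really starts three machine words in: read the first slot of a live tuple and compare it with the id of its first element. A CPython-specific sketch, valid on default builds with the head layout described above:

    ```python
    import ctypes

    t = ([],)  # a tuple holding a single list
    HEAD = ctypes.sizeof(ctypes.c_size_t) * 3  # ob_refcnt, ob_type, ob_size
    first_slot = ctypes.c_size_t.from_address(id(t) + HEAD)
    # ob_item[0] holds the address of t[0]
    assert first_slot.value == id(t[0])
    ```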

    Based on that and the memmove function in ctypes, we can build a tuplecpy function:

    def tuplecpy(dst, src, begin_offset):
        """
           Of course this function should NEVER be used in real code
           It  will probably result in segfaults/crashes
           - copy tuple(src) to dst[begin_offset:] tuple
           - remember id(x) -> addressof(x)
        """
        OFFSET = ctypes.sizeof(ctypes.c_size_t) * 3
        PTR_SIZE = ctypes.sizeof(ctypes.c_size_t)
        dst_addr = id(dst) + OFFSET + PTR_SIZE * begin_offset
        src_addr = id(src) + OFFSET
        ctypes.memmove(dst_addr, src_addr, len(src) * PTR_SIZE)
    

    Let's try this new toy!

    >>> x1 = tuple("A" * 4)
    >>> x2 = tuple("B" * 2)
    
    >>> print ("BEFORE -> x1 = {0} | x2  = {1}".format(x1, x2))
    >>> tuplecpy(x1, x2, 2)
    >>> print ("AFTER  -> x1 = {0} | x2  = {1}".format(x1, x2))
    
    #Result:
    #    BEFORE -> x1 = ('A', 'A', 'A', 'A') | x2  = ('B', 'B')
    #    AFTER  -> x1 = ('A', 'A', 'B', 'B') | x2  = ('B', 'B')
    

    It works! But why is this fundamentally bad (besides the fact that it modifies tuples)?
    The answer is in the first section: we have created new references to multiple objects (the ones in the src tuple) without any update of their ref counts:

    >>> x1 = tuple("B" * 4)
    >>> x2 = ([1,2,3,4],)
    
    # problem is : 
    >>> t = get_ref(x2[0])
    >>> t
    c_ulong(1L)
    >>> del x2 # no more references
    >>> x1
    ('B', 'B', 'B', <refcnt 0 at 0xacac68>)  # nice printing of an object with ob_refcnt == 0 :)
    
    >>> l = [0, 42, 69]
    >>> x1 # GC IN ACTION \o/
    ('B', 'B', 'B', [0, 42, 69]) # Same principle as before
    

    We can just fix our function so that it updates the reference counter! (But that doesn't mean we should now use this function in real code...)

    def tuplecpy(dst, src, begin_offset):
        """
           Of course this function should NEVER be used in real code
           It  will probably result in segfaults/crashes
           - copy tuple(src) to dst[begin_offset:] tuple
           - remember id(x) -> addressof(x)
        """
        OFFSET = ctypes.sizeof(ctypes.c_size_t) * 3
        PTR_SIZE = ctypes.sizeof(ctypes.c_size_t)
        for obj in src:
            x = get_ref(obj)
            x.value = x.value + 1 # manually update ob_refcnt
        dst_addr = id(dst) + OFFSET + PTR_SIZE * begin_offset
        src_addr = id(src) + OFFSET
        ctypes.memmove(dst_addr, src_addr, len(src) * PTR_SIZE)
    

    Diving into function and code objects

    Now that we can modify tuples, we are going to see the impact of modifying some important tuples in function objects and code objects.

    Code object

    As explained in the documentation: "Code objects represent byte-compiled executable Python code"

    In Python 2, code objects are stored in the func_code attribute of function objects.

    >>> import dis  # bytecode disassembler module
    >>> def time_2(x):
    ...     return 2 * x
    ... 
    >>> time_2.func_code
    <code object time_2 at 0x9ee230, file "<stdin>", line 1>
    >>> time_2(x=4)
    8
    >>> dis.dis(time_2)
              0 LOAD_CONST               1 (2)
              3 LOAD_FAST                0 (x)
              6 BINARY_MULTIPLY
              7 RETURN_VALUE
    

    If we look at the disassembly of the function (which is pretty easy to read), we can see that the function:

    • LOAD the constant (2),
    • LOAD the variable (x),
    • MULTIPLY the two values,
    • RETURN the result.

    If we focus on the first instruction (LOAD_CONST) we can see the following things:

    • LOAD_CONST is called with an argument 1
    • that argument points to the const 2

    In fact, 1 is just an index into the code object's constants tuple.

    >>> time_2.func_code.co_consts  # tuple of constants of our code object
    (None, 2)
    >>> time_2.func_code.co_consts[1]
    2 # yep we were right!
    # So what if we change this value ?
    >>> tuplecpy(time_2.func_code.co_consts, (10,), 1)
    >>> time_2.func_code.co_consts # tuple of constants of our code object
    (None, 10)
    >>> time_2(4)
    40 # woot! It works!
    

    So we are able to modify the constants used in the function.
    Can we do the same for the LOAD_FAST instruction?

    >>> time_2(4) # works on the modified function!
    40
    >>> time_2(x=4) # call with a keyword argument
    40
    >>> time_2.func_code.co_varnames # tuple of local variables and argnames
    ('x',)
    >>> tuplecpy(time_2.func_code.co_varnames, ('new_arg_name',), 0)
    >>> time_2(x=4)
    # x is not the argument name anymore!
    TypeError: time_2() got an unexpected keyword argument 'x'
    >>> time_2(new_arg_name=4)
    40
    >>> dis.dis(time_2) # see the function changes:
              0 LOAD_CONST               1 (10) # const changed
              3 LOAD_FAST                0 (new_arg_name) # arg name changed
              6 BINARY_MULTIPLY
              7 RETURN_VALUE
    

    So, yeah, we have modified the behaviour of the function pretty well!
    And here is a last example that I find very fun:

    >>> def nop(): pass  # the function that does nothing
    ...
    >>> dis.dis(nop)
             0 LOAD_CONST               0 (None)
             3 RETURN_VALUE
    >>> nop.func_code.co_consts
    (None,)
    >>> l = []
    >>> tuplecpy(nop.func_code.co_consts, (l,), 0)  # the function will always return the same list!
    >>> nop()
    []
    >>> l.append(2)
    >>> nop()
    [2]
    >>> dis.dis(nop)
             0 LOAD_CONST               0 ([2])
             3 RETURN_VALUE
    

    Function closures!

    Finally, we are going to play with closures! Closures appear in function-generating functions. Wikipedia page

    >>> def gen_return_function(x):
    ...     def f():
    ...         return x  # in f(): x is going to be a closure
    ...     return f
    ... 
    >>> ret_42 = gen_return_function(42)
    >>> ret_42()
    42
    >>> dis.dis(ret_42)
              0 LOAD_DEREF               0 (x)
              3 RETURN_VALUE
    

    This new instruction, LOAD_DEREF, is the one that handles closures. The question is: where is the closure stored?
    The answer is simple: closures are in ret_42.func_closure. Why not in the code object? Because this allows all the generated functions to share the same code object while having different closures!

    >>> ret_23 = gen_return_function(23)
    >>> ret_42 = gen_return_function(42)
    >>> ret_42.func_code is ret_23.func_code
    True
    >>> ret_42.func_closure
    (<cell at 0xa54398: int object at 0x95d748>,) # closures are always inside a cell object
    >>> ret_42.func_closure[0].cell_contents
    42
    
    # We are not going to rewrite the tuple but directly the contents of the cell instead.
    # We will still use tuplecpy, but with an offset of (-1), because cells have no ob_size.
    
    >>> x = 1337
    >>> tuplecpy(ret_42.func_closure[0], (x,), -1)
    >>> ret_42()
    1337 # closure modified :)
    

    That's all!

    Conclusion

    I really enjoyed messing with Python (one more time) and I hope you enjoyed it too. I think that this method is a good way to quickly and easily have a look at some of Python's internals and see how they work.

    Lastly, ctypes is also a very powerful module for legitimate work: it can load shared libraries and call C functions. If you have not already used ctypes, I strongly recommend that you read the ctypes Python documentation and give it a try!
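    As a taste of that legitimate side, calling into libc takes only a couple of lines on Unix (CDLL(None) exposes the symbols of the running process, libc included; on Windows you would load a DLL by name instead):

    ```python
    import ctypes

    # dlopen(NULL): a handle to the running process and its loaded libraries
    libc = ctypes.CDLL(None)
    assert libc.abs(-5) == 5  # C abs() from libc, not Python's builtin
    ```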

