5  Metaprogramming

What is Metaprogramming?

Metaprogramming is writing code that generates or modifies other code.

This allows us to work at a higher level of abstraction, often by using (or abusing) Python’s syntax.

Metaprogrammed code can quickly become very hard to read, because it defies the assumptions and guarantees that readers normally rely on.

exec / eval

Most interpreted languages provide a way to dynamically execute strings as code.

In Python we have two built-in functions for this:

exec(source: str, /, globals: dict=None, locals: dict=None, *, closure: tuple=None)

Executes Python code in source with a given context.

eval(source: str, /, globals: dict=None, locals: dict=None)

Evaluates a single Python expression in a given context and returns its value.

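# our own Python REPL: read-eval-print loop
while True:
    stmt = input(">>> ")
    try:
        # expressions have a value that we can print
        print(eval(stmt))
    except SyntaxError:
        # statements (assignments, imports, ...) have no value: exec always returns None
        exec(stmt)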
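The difference between the two functions is easiest to see side by side. This is a minimal sketch, assuming a scratch dictionary called namespace (a name chosen for this example) is passed as the globals for both calls:

```python
# A scratch namespace so the executed code does not touch our own globals.
namespace = {}

# exec runs statements; it always returns None.
exec_result = exec("x = 2\ny = x * 21", namespace)

# eval evaluates one expression and returns its value.
eval_result = eval("x + y", namespace)

print(exec_result)  # None
print(eval_result)  # 44

# eval rejects statements such as assignments.
try:
    eval("z = 1", namespace)
except SyntaxError:
    print("eval only accepts expressions")
```

Names assigned by exec live in the dictionary that was passed in, which is why the later eval call can see x and y.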

Problems

Debugging/tracebacks are very difficult.

Tool support: an editor/IDE cannot evaluate code without risking side effects, so anything done with exec is effectively invisible to autocomplete, type checkers, etc.

Most importantly: security!

Major Security Vulnerability!

Console program that plots various functions using eval

import math
import sys

WIDTH = 80
HEIGHT = 24
BLOCK = "█"


def plot(fn, x_min=-2 * math.pi, x_max=2 * math.pi):
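    # This is where the "magic" happens: the expression
    # provided on the command line is evaluated once for
    # each step of x. The names from the math module are
    # passed in as globals so that sin, cos, etc. resolve.
    y_vals = [
        eval(fn, {**vars(math), "x": x})
        for x in [x_min + (x_max - x_min) * i / (WIDTH - 1) for i in range(WIDTH)]
    ]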
    y_min, y_max = min(y_vals), max(y_vals)

    grid = [[" "] * WIDTH for _ in range(HEIGHT)]

    for col, y in enumerate(y_vals):
        row = round((y_max - y) / (y_max - y_min) * (HEIGHT - 1))
        grid[row][col] = BLOCK

    # axes
    zero_row = round(y_max / (y_max - y_min) * (HEIGHT - 1))
    zero_col = round(-x_min / (x_max - x_min) * (WIDTH - 1))
    if 0 <= zero_row < HEIGHT:
        grid[zero_row] = [
            "─" if c == " " else BLOCK if c == BLOCK else c for c in grid[zero_row]
        ]
    if 0 <= zero_col < WIDTH:
        for r in range(HEIGHT):
            if grid[r][zero_col] == " ":
                grid[r][zero_col] = "│"

    for row in grid:
        print("".join(row))


if __name__ == "__main__":
    plot(sys.argv[1])
  • uv run metaprogramming/calc01.py "sin(x)"
  • uv run metaprogramming/calc01.py "sin(2*x)"
  • uv run metaprogramming/calc01.py "print(__import__('os').listdir())"

There are ways to control the environment that exec runs in, but these are not true security mechanisms. If you are going to execute untrusted code, you need operating-system-level security: sandboxing via a container or similar, so that escaping the Python sandbox doesn’t give access to the full system.
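To illustrate why restricting the environment is not a security boundary, here is a small sketch. It assumes the common approach of passing an empty __builtins__ mapping as the globals; the variable names are illustrative only.

```python
# Attempt to "lock down" eval by removing all built-in functions.
restricted = {"__builtins__": {}}

# Direct use of a builtin is blocked...
try:
    eval("__import__('os')", restricted)
    blocked = False
except NameError:
    blocked = True

# ...but every object still links back to `object`, and from there
# to every class currently loaded in the interpreter.
loaded_classes = eval("().__class__.__base__.__subclasses__()", restricted)

print(blocked)                  # True
print(len(loaded_classes) > 0)  # True
```

Because ordinary attribute access is enough to reach the rest of the interpreter, removing names from the namespace only slows an attacker down.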

Where they are used:

  • Custom Python REPL
  • Spreadsheet-like formula evaluation
  • Code-generation in highly-controlled environments (dataclasses)
    • more performant than other dynamic options, code is compiled once
  • Template engines
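As a rough illustration of the code-generation use case, the sketch below builds an __init__ method from a list of field names and compiles it once with exec. The helper name make_init is hypothetical, and this is a simplified picture of the general idea rather than how dataclasses is actually implemented. The field names must come from trusted code, never from user input.

```python
def make_init(fields):
    """Generate an __init__ that assigns each field to self (hypothetical helper)."""
    args = ", ".join(fields)
    body = "\n".join(f"    self.{name} = {name}" for name in fields)
    source = f"def __init__(self, {args}):\n{body}\n"

    # Compile the generated source once; the result is an ordinary function.
    namespace = {}
    exec(source, namespace)
    return namespace["__init__"]


class Point:
    __init__ = make_init(["x", "y"])


p = Point(3, 4)
print(p.x, p.y)  # 3 4
```

Once generated, the method runs at the same speed as a handwritten one, since no dynamic lookup happens on each call.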

Alternatives

Modifying functions: function decorators

Dynamic execution: getattr/setattr, descriptors

Modifying classes: class decorators / metaclasses

Decorator Review

A decorator is a function that takes a function as an argument, and returns another function.

from typing import Callable

def call_twice(fn: Callable) -> Callable:
    def newfn(*args, **kwargs):
        fn(*args, **kwargs)
        fn(*args, **kwargs)
    return newfn


@call_twice
def myfunc():
    print("hello")

myfunc()
hello
hello

If we want our decorator to take options, we need a third function that returns the decorator itself:

from typing import Callable

# call_n now returns a decorator call_dec that calls N times
def call_n(n: int) -> Callable:
    def call_dec(fn: Callable) -> Callable:
        def newfn(*args, **kwargs):
            for _ in range(n):
                fn(*args, **kwargs)
        return newfn
    return call_dec


@call_n(4)
def myfunc2():
    print("four times now")

myfunc2()
four times now
four times now
four times now
four times now

Register this function with a central resource

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/users", methods=["GET"])
def get_users():
    return jsonify(["alice", "bob"])

@app.route("/users/<int:user_id>", methods=["GET"])
def get_user(user_id):
    return jsonify({"id": user_id, "name": "alice"})

@app.route("/users", methods=["POST"])
def create_user():
    data = request.json
    return jsonify(data), 201

Simplified picture of Flask’s internals:

class Flask:
    def __init__(self):
        self.url_map = {}

    def route(self, path, methods=("GET",)):
        def decorator(fn):
            for method in methods:
                self.url_map[(path, method)] = fn
            return fn   # function returned unchanged!
        return decorator
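To see the registration pattern end to end without installing Flask, here is a self-contained sketch. MiniApp and its dispatch method are invented for this example and are not part of Flask’s API.

```python
class MiniApp:
    """Toy registry in the spirit of the simplified picture above (hypothetical)."""

    def __init__(self):
        self.url_map = {}

    def route(self, path, methods=("GET",)):
        def decorator(fn):
            for method in methods:
                self.url_map[(path, method)] = fn
            return fn  # the function itself is returned unchanged
        return decorator

    def dispatch(self, path, method="GET"):
        # Look up the handler registered for this path and method.
        handler = self.url_map.get((path, method))
        if handler is None:
            return 404, None
        return 200, handler()


app = MiniApp()


@app.route("/users")
def get_users():
    return ["alice", "bob"]


print(app.dispatch("/users"))    # (200, ['alice', 'bob'])
print(app.dispatch("/missing"))  # (404, None)
```

The decorator runs at import time, so simply defining get_users is enough to make it reachable through the registry.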

Alter when/how this function is called

import functools
import time
import warnings

We can decide we don’t actually want to call the function, such as in an authentication or caching decorator:

def cache(fn):
    store = {}
    @functools.wraps(fn)
    def wrapper(*args):
        if args not in store:
            store[args] = fn(*args)
        return store[args]
    return wrapper

@cache
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
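The standard library ships a version of this decorator as functools.cache (Python 3.9+), which also supports keyword arguments and exposes statistics. A short sketch showing how few real calls are made:

```python
import functools


@functools.cache  # on older versions: functools.lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(30))  # 832040

# Each of fib(0) .. fib(30) is computed exactly once; every other call
# is answered from the cache without running the function body.
info = fib.cache_info()
print(info.misses)  # 31
```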

We can call the function more than once:

def retry(times=3, delay=1.0, exceptions=(Exception,)):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(times):
                try:
                    return fn(*args, **kwargs)
                except exceptions as e:
                    if attempt == times - 1:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator

@retry(times=3, delay=0.5, exceptions=(ConnectionError,))
def fetch(url):
    ...
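To see the retry behaviour in action, the sketch below repeats the decorator so the block runs on its own and applies it to a deliberately unreliable function. flaky_fetch and the attempts list are made up for the demonstration, and the delay is set to 0 so it finishes immediately.

```python
import functools
import time


def retry(times=3, delay=1.0, exceptions=(Exception,)):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(times):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == times - 1:
                        raise  # out of attempts: let the error propagate
                    time.sleep(delay)
        return wrapper
    return decorator


attempts = []


@retry(times=3, delay=0, exceptions=(ConnectionError,))
def flaky_fetch(url):
    # Fails twice, then succeeds on the third attempt.
    attempts.append(url)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return f"contents of {url}"


result = flaky_fetch("https://example.com")
print(len(attempts))  # 3
print(result)         # contents of https://example.com
```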

Here we wrap a function with a simple helper to annotate deprecation:

def deprecated(replacement=None):
    def decorator(fn):
        msg = f"{fn.__name__} is deprecated."
        if replacement:
            msg += f" Use {replacement} instead."
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            warnings.warn(msg, DeprecationWarning, stacklevel=2)
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@deprecated(replacement="fetch_user_v2")
def fetch_user(user_id):
    ...
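The examples above all use functools.wraps, which copies the original function’s name and docstring onto the wrapper. The sketch below repeats the deprecated decorator so it runs standalone, gives fetch_user a placeholder body, and captures the warning to show both effects:

```python
import functools
import warnings


def deprecated(replacement=None):
    def decorator(fn):
        msg = f"{fn.__name__} is deprecated."
        if replacement:
            msg += f" Use {replacement} instead."

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            warnings.warn(msg, DeprecationWarning, stacklevel=2)
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@deprecated(replacement="fetch_user_v2")
def fetch_user(user_id):
    """Look up a user by id."""
    return {"id": user_id}  # placeholder body for the demonstration


# Record warnings instead of printing them to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    user = fetch_user(7)

print(fetch_user.__name__)  # fetch_user (would be "wrapper" without functools.wraps)
print(fetch_user.__doc__)   # Look up a user by id.
print(caught[0].message)    # fetch_user is deprecated. Use fetch_user_v2 instead.
```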

Dynamic Attribute Access

One of the most common use cases for metaprogramming is dynamic access to attributes.

Imagine code like this:

df = load_dataframe()

operation = input("Perform operation> ")
if operation == "mean":
    return df.mean()
elif operation == "sum":
    return df.sum()
elif operation == "std":
    return df.std()
elif operation == "median":
    return df.median()

We can shorten this with eval:

df = load_dataframe()

operation = input("Perform operation> ")
if operation in ("mean", "sum", "std", "median"):
    return eval(f"df.{operation}()")

This of course introduces a security risk, and requires us to maintain an allow list.

Instead, we could use hasattr and getattr to dynamically check the attributes:

df = load_dataframe()

operation = input("Perform operation> ")
if hasattr(df, operation):
    return getattr(df, operation)() # trailing () are calling the returned function
else:
    raise AttributeError(f"no such operation: {operation}!")
  • hasattr(obj, name: str) -> bool - check if an attribute exists on an object (anything: class, module, etc.)
  • getattr(obj, name: str[, default]) - get an attribute; raises AttributeError if it doesn’t exist, unless a default is given, in which case the default is returned
  • setattr(obj, name, value) - set an attribute on an object
  • delattr(obj, name) - delete an attribute from an object (rarely used)
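The four functions can be tried out on any object. The sketch below uses types.SimpleNamespace as a stand-in object and the statistics module in place of a dataframe. Note that hasattr alone accepts any attribute name, including ones we never meant to expose, so the summarize helper (a name invented here) still checks user input against an explicit allow list before calling getattr.

```python
import statistics
from types import SimpleNamespace

point = SimpleNamespace(x=1)

print(hasattr(point, "x"))     # True
print(getattr(point, "y", 0))  # 0 (the default, since y does not exist)
setattr(point, "y", 5)
print(point.y)                 # 5
delattr(point, "x")
print(hasattr(point, "x"))     # False

# Dynamic dispatch on user input, restricted to known-safe names.
ALLOWED_OPERATIONS = {"mean", "median", "stdev"}


def summarize(values, operation):
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"no such operation: {operation}!")
    return getattr(statistics, operation)(values)


print(summarize([1, 2, 3, 4], "mean"))  # 2.5
```

Unlike the eval version, the string is only ever used as a lookup key, so it cannot execute arbitrary expressions.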

__getattr__, __setattr__ - Dynamic lookup response

What if we want our classes to respond to attributes that were never explicitly defined?

Attribute access via . can be customized through a protocol consisting of three dunder methods:

__getattr__(self, attr_name) - called when a lookup fails to find attr_name, gives a chance to return a dynamic value for that attribute. Should raise AttributeError if no value is going to be returned.

WARNING: There is also a __getattribute__ but it is called on every lookup and difficult to implement without accidentally triggering infinite recursion.

__setattr__(self, attr_name, value) - called when there is an assignment to any attribute.

__delattr__(self, attr_name) - called when del obj.attr_name is called on any attribute. (rarely used)
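As a small example of __setattr__ and __delattr__, the sketch below defines a read-only object. The class name ReadOnly is made up for this illustration. Because __setattr__ intercepts every assignment, the constructor has to go through object.__setattr__ to store the initial values.

```python
class ReadOnly:
    """Object whose attributes cannot be changed after construction (illustrative)."""

    def __init__(self, **fields):
        for name, value in fields.items():
            # Bypass our own __setattr__ while setting initial values.
            object.__setattr__(self, name, value)

    def __setattr__(self, name, value):
        raise AttributeError(f"cannot set {name!r}: object is read-only")

    def __delattr__(self, name):
        raise AttributeError(f"cannot delete {name!r}: object is read-only")


config = ReadOnly(host="localhost", port=8080)
print(config.port)  # 8080

try:
    config.port = 9090
except AttributeError as exc:
    print(exc)  # cannot set 'port': object is read-only
```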

Example: vector swizzle

A common operation in 3D graphics APIs is the vector swizzle, which returns the elements of a vector in any requested order:

v = Vec3(1, 2, 3)
v.xyz == (1, 2, 3)
v.zyx == (3, 2, 1)
v.yzx == (2, 3, 1)
# and so on..

We could define every possible permutation, but this is also a place where we could use __getattr__ to define a single function that handles all of these cases.

class Vec3:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def __getattr__(self, name):
        # only called when a name lookup fails
        # turns xyz into (self.x, self.y, self.z)
        if not all(c in "xyz" for c in name):
            raise AttributeError(name)
        return tuple(getattr(self, c) for c in name)

    def __repr__(self):
        return f"Vec3({self.x}, {self.y}, {self.z})"
v = Vec3(1, 2, 3)
v.xyz
(1, 2, 3)
v.zzz
(3, 3, 3)
v.xyz
(1, 2, 3)

Example: Mock

class SpyMock:
    """Records every attribute access and call."""
 
    def __init__(self):
        object.__setattr__(self, "_log", [])
 
    def __getattr__(self, name):
        log = object.__getattribute__(self, "_log")
 
        def method(*args, **kwargs):
            log.append({"attr": name, "args": args, "kwargs": kwargs})
 
        return method
 
    @property
    def calls(self):
        return object.__getattribute__(self, "_log")
 
 
spy = SpyMock()
spy.send_email("alice@example.com", subject="Hi")
spy.save(record={"id": 1})
spy.send_email("bob@example.com", subject="Bye")
 
for entry in spy.calls:
    print(entry)
{'attr': 'send_email', 'args': ('alice@example.com',), 'kwargs': {'subject': 'Hi'}}
{'attr': 'save', 'args': (), 'kwargs': {'record': {'id': 1}}}
{'attr': 'send_email', 'args': ('bob@example.com',), 'kwargs': {'subject': 'Bye'}}

importlib + getattr

A common pattern is to dynamically import a class or function based on user input.

Web frameworks like Django will use this pattern to load plugins:

MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]

Each of these dotted paths represents a module ending in a class (or function) name:

import importlib

def import_string(dotted_path):
    # e.g. django.middleware.security & SecurityMiddleware
    module_path, class_name = dotted_path.rsplit(".", 1)
    # import django.middleware.security
    module = importlib.import_module(module_path)
    # access SecurityMiddleware
    return getattr(module, class_name)

Note that getattr works on a module the same way it does on a class or instance: any dotted lookup can be performed with getattr.
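The same helper can be tried without Django installed. This sketch points it at two standard-library paths, chosen only because they are always available:

```python
import importlib


def import_string(dotted_path):
    # Split "package.module.Name" into "package.module" and "Name".
    module_path, attr_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


OrderedDict = import_string("collections.OrderedDict")
dumps = import_string("json.dumps")

print(OrderedDict)           # <class 'collections.OrderedDict'>
print(dumps({"ok": True}))   # {"ok": true}
```

A bad module path raises ImportError and a bad attribute name raises AttributeError, so callers typically catch both when the path comes from configuration.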

Descriptors

Property Review

property(fget=None, fset=None, fdel=None, doc=None)

  • fget is a function to get value of the attribute
  • fset is a function to set value of the attribute
  • fdel is a function to delete the attribute
  • doc is a docstring for the attribute
class Person:
    
    def __init__(self, name, age):
        self.__name = name  #  Assume it has getter/setters 
        self.age = age

    def _get_age(self):
        print("inside get age")
        return self.__age

    def _set_age(self, age):
        if age < 0:
            raise ValueError("Person can't have a negative age!")
        self.__age = age
        
    def __repr__(self):
        return f"Person({self.__name!r}, {self.__age})"
        
    age = property(_get_age, _set_age, doc="age of the person")
p = Person("Wayne", 30)
p.age # will call _get_age

try:
    p.age = -1 # will call _set_age
except Exception as e:
    print(repr(e))

print(p.age)
inside get age
ValueError("Person can't have a negative age!")
inside get age
30

@property

We can also use property as a decorator.

The usage looks a bit strange since we need to decorate multiple functions:

  • Place the @property directly above the function header of the getter function.
  • Place the code @name_of_property.setter above the function header of the setter function. You need to replace the name_of_property with the actual name of the property.
  • The function names for both the setter/getter need to match.
class Person:
    def __init__(self, name, age):
        self.__name = name  #  Assume it has getter/setters 
        # invokes setter
        self.age = age #self.set_age(age)
        self.birth_date = ...

    @property
    def age(self):
        """ returns the age property """
        print('getter called')
        return self.__age
    # same as 
    #age = property(age)
    
    @age.setter
    def age(self, age):
        print('setter called')
        if age < 0:
            raise ValueError("Person can't have a negative age!")
        self.__age = age
        
    def __repr__(self):
        return f"Person({self.__name!r}, {self.__age})"

The existence of properties allows us to start all attributes out as public ones, and convert to properties as needed. The user of the class does not need to know that a change was made, preserving encapsulation without forcing us into calling setter/getters.
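A short sketch of that migration path: AccountV1 exposes a plain public attribute, AccountV2 replaces it with a validating property, and the caller code in deposit works unchanged with both. All three names are invented for this example.

```python
class AccountV1:
    def __init__(self, balance):
        self.balance = balance  # plain public attribute


class AccountV2:
    def __init__(self, balance):
        self.balance = balance  # now goes through the setter below

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        if value < 0:
            raise ValueError("balance can't be negative")
        self._balance = value


def deposit(account, amount):
    # Caller code: identical for both versions of the class.
    account.balance = account.balance + amount
    return account.balance


print(deposit(AccountV1(10), 5))  # 15
print(deposit(AccountV2(10), 5))  # 15
```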

Descriptor Syntax

Attributes on classes follow the descriptor protocol, which consists of four dunder methods.

These methods are typically implemented on their own class, which will then be used by other classes. For example:

class User:
    age = ValidatedInt(min=13, max=100)
    username = ValidatedStr(min_len=2, max_len=10, regex=r"[a-z]+")

User is a normal class, but it relies upon two descriptor classes that work similarly to property, adding an attribute with accessor functions.

The difference between this and property is that by using a class for our descriptors, we can reuse behavior.

Before we implement our descriptors, let’s look at what dunder methods a descriptor class can have:

__set_name__(self, owner, name)

  • self is the instance of our descriptor class.
  • owner is the class that is getting an instance of our descriptor (the class object itself, not its name).
  • name is the name of the attribute on our class.

This function is called at class definition time. It allows our field to know the name assigned to it.

In the above example these would be invoked once, when User is declared:

  • ValidatedInt.__set_name__(self, User, "age")
  • ValidatedStr.__set_name__(self, User, "username")
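The timing is easy to observe with a stripped-down descriptor. In this sketch, Field is a placeholder class that only implements __set_name__ and records what it was given:

```python
class Field:
    def __set_name__(self, owner, name):
        # Runs while the owning class body is being turned into a class.
        print(f"__set_name__ called: owner={owner.__name__}, name={name}")
        self.owner = owner
        self.name = name


class User:
    age = Field()
    username = Field()

# Defining User prints:
# __set_name__ called: owner=User, name=age
# __set_name__ called: owner=User, name=username
```

No User instance has been created at this point; the calls happen once, when the class statement finishes executing.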

Let’s look at how ValidatedInt would be declared to see the remaining methods:

class ValidatedInt:
    def __init__(self, min=None, max=None):
        self.min = min
        self.max = max

    def __set_name__(self, owner, name):
        # set after constructor is called
        self.name = name

    def __get__(self, obj, objtype=None):
        # optional special case: when accessing Class.descriptor as opposed to
        # instance.descriptor
        if obj is None:
            return self
        # store actual data on `obj.__dict__` using name
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        # validate inputs on set
        if not isinstance(value, int):
            raise TypeError(f"{self.name} must be an int, got {type(value).__name__}")
        if self.min is not None and value < self.min:
            raise ValueError(f"{self.name} must be >= {self.min}")
        if self.max is not None and value > self.max:
            raise ValueError(f"{self.name} must be <= {self.max}")
        obj.__dict__[self.name] = value

def __get__(self, obj, objtype=None)

This method is called on the descriptor class whenever the attribute is accessed (getattr), for example:

user = User(...)
print(user.age) # calls `ValidatedInt.__get__`

def __set__(self, obj, value)

This method, unsurprisingly, is called when the attribute is assigned on an instance (directly or via setattr).

user = User(...)
user.age = 12 # will raise an error in `ValidatedInt.__set__`

def __delete__(self, obj)

Like the other del methods, this is rarely needed, but it is called when the attribute is deleted.
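Putting the pieces together, here is one way the full User example might look. ValidatedInt follows the class shown above with a __delete__ added; the ValidatedStr implementation and the User constructor are assumptions made for this sketch, since the original only shows how they are used.

```python
import re


class ValidatedInt:
    def __init__(self, min=None, max=None):
        self.min, self.max = min, max

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        if not isinstance(value, int):
            raise TypeError(f"{self.name} must be an int, got {type(value).__name__}")
        if self.min is not None and value < self.min:
            raise ValueError(f"{self.name} must be >= {self.min}")
        if self.max is not None and value > self.max:
            raise ValueError(f"{self.name} must be <= {self.max}")
        obj.__dict__[self.name] = value

    def __delete__(self, obj):
        # Remove the stored value; later reads return None via __get__.
        obj.__dict__.pop(self.name, None)


class ValidatedStr:
    """Hypothetical implementation matching the keyword arguments used above."""

    def __init__(self, min_len=0, max_len=None, regex=None):
        self.min_len, self.max_len = min_len, max_len
        self.pattern = re.compile(regex) if regex else None

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        if not isinstance(value, str):
            raise TypeError(f"{self.name} must be a str, got {type(value).__name__}")
        if len(value) < self.min_len:
            raise ValueError(f"{self.name} must have at least {self.min_len} characters")
        if self.max_len is not None and len(value) > self.max_len:
            raise ValueError(f"{self.name} must have at most {self.max_len} characters")
        if self.pattern and not self.pattern.fullmatch(value):
            raise ValueError(f"{self.name} must match {self.pattern.pattern}")
        obj.__dict__[self.name] = value


class User:
    age = ValidatedInt(min=13, max=100)
    username = ValidatedStr(min_len=2, max_len=10, regex=r"[a-z]+")

    def __init__(self, username, age):  # assumed constructor
        self.username = username  # calls ValidatedStr.__set__
        self.age = age            # calls ValidatedInt.__set__


user = User("alice", 30)
print(user.username, user.age)  # alice 30

try:
    user.age = 12
except ValueError as exc:
    print(exc)  # age must be >= 13

del user.age     # calls ValidatedInt.__delete__
print(user.age)  # None
```

The two descriptor classes share most of their code, so in practice the common parts could be moved into a base class with a single validate hook.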

In the next section, we’ll see even more powerful metaprogramming tools for classes.