6  Metaclasses

With __getattr__ and __setattr__, and then descriptors, we have seen how to add custom behavior to specific attributes. Now we’ll take the next step: metaprogramming entire classes.

What happens when Python sees a class statement?

If you’ve ever tried to reference a class by its own name within the body you’ve run into an error:

class MyClass:
    # NOT ALLOWED! MyClass isn't defined yet
    def func(a: MyClass) -> None:
        ...

This is because, on the def line, the class is still being defined.

(If you need to do what the example shows, you can write the name as a string, “MyClass”, to create a forward reference, or use typing.Self, available since Python 3.11, as the type.)
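As a minimal sketch of the string forward reference mentioned above (the method name combine is made up for illustration), the quoted annotation is simply stored as text, so nothing needs to be looked up while the class body is still running:

```python
class MyClass:
    # The quoted name is stored as a plain string, so no lookup of
    # MyClass happens while the class body is still executing.
    def combine(self, other: "MyClass") -> "MyClass":
        return self


# The annotation stays a string until a tool chooses to resolve it.
print(MyClass.combine.__annotations__["other"])  # MyClass
```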

A class is created in steps:

First, the code inside the class body executes one statement at a time. The names it defines are gathered into a “namespace”, essentially a dictionary:

class Icon(Widget):
    WIDTH = 32

    def __init__(self):
        ...

    def show(self):
        ...

Would create a namespace resembling:

namespace_dict = {
  "WIDTH": 32,
  "__init__": <function>,
  "show": <function>,
}

This dictionary is then passed to type(), a constructor that creates new types.

# `type` takes three arguments:
#   - name of the new type
#   - a tuple of base classes
#   - the namespace dictionary with all members
type("Icon", (Widget,), namespace_dict)
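To make the call above concrete, here is a small runnable sketch. Widget is an empty stand-in class, and the helper functions _init and _show are assumptions that play the role of the methods a class body would define:

```python
class Widget:
    pass


def _init(self):
    self.visible = False


def _show(self):
    self.visible = True


# Hand-built namespace, standing in for what the class body would produce.
namespace_dict = {"WIDTH": 32, "__init__": _init, "show": _show}

Icon = type("Icon", (Widget,), namespace_dict)

icon = Icon()
icon.show()
print(Icon.__name__, issubclass(Icon, Widget), Icon.WIDTH, icon.visible)
# Icon True 32 True
```

The resulting Icon behaves like a class written with a class statement: it has a name, a base class, class attributes, and methods.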

We can call this function ourselves to create dynamic types:

class Unit:
    def __init__(self, num):
        self.num = num

    def __str__(self):
        return f"{self.num}{self.symbol}"

UNITS = [
  ("Meter", "m"),
  ("Second", "s"),
  ("Watt", "W"),
]

TYPES = {}

# dynamically create subclasses for each unit
for name, symbol in UNITS:
    TYPES[name] = type(name, (Unit,), {"symbol": symbol})

# expose subclasses in the module namespace
# (writing to locals() only works reliably at module level)
globals().update(TYPES)

print(Meter(3))
print(Second(10))
3m
10s

Class Decorators

Class decorators are similar to function decorators: they are functions that take a class and return a class, usually the same one after modifying it.

Part of understanding how this works is recognizing that a class carries the namespace we saw above. It is accessible on a class cls as cls.__dict__, a read-only view; changes go through setattr() or plain attribute assignment.

With this, we can add to, remove, or otherwise manipulate a class definition.
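A short sketch of what that manipulation looks like in practice. Point and its attributes are placeholder names; the point is that the class namespace is visible through __dict__ but is changed with setattr and delattr:

```python
class Point:
    dims = 2


# The class namespace is exposed through a read-only mapping proxy.
print(type(Point.__dict__).__name__)   # mappingproxy
print("dims" in Point.__dict__)        # True

# Direct item assignment on the proxy is rejected...
try:
    Point.__dict__["origin"] = (0, 0)
except TypeError as exc:
    print("read-only:", type(exc).__name__)   # read-only: TypeError

# ...so additions and removals go through setattr/delattr instead.
setattr(Point, "origin", (0, 0))
delattr(Point, "dims")
print(sorted(k for k in Point.__dict__ if not k.startswith("__")))  # ['origin']
```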

Recall that without a __repr__ a class prints an ugly version of itself that isn’t very useful:

class Vector3:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


print(Vector3(1, 2, 3))
<__main__.Vector3 object at 0x7f8b98b2cad0>

# A class decorator, takes a class, returns a class
def autorepr(cls):
    def __repr__(self):
        attrs = ", ".join(
            f"{k}={v!r}" for k, v in self.__dict__.items()
        )
        return f"{cls.__name__}({attrs})"
    cls.__repr__ = __repr__
    return cls


@autorepr
class Vector2:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# @ syntax is still doing the same thing it did with functions:
# Vector2 = autorepr(Vector2)


@autorepr
class Vector3:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


print(Vector2(1, 2))       # Vector2(x=1, y=2)
print(Vector3(1, 2, 3))    # Vector3(x=1, y=2, z=3)
Vector2(x=1, y=2)
Vector3(x=1, y=2, z=3)

__init_subclass__

Python 3.6 added a powerful new dunder method that is invoked on the parent class whenever a subclass is defined (not when it is instantiated).

It is common to want to register child classes with their parent, and this gives a way to do so automatically without an additional call or decorator:

class Serializer:
    _registry = {}

    # __init_subclass__ takes the argument cls, the subclass being created
    # as well as any number of optional kwargs. here we add format as
    # an argument
    def __init_subclass__(cls, format, **kwargs):
        super().__init_subclass__(**kwargs)
        Serializer._registry[format] = cls

    @classmethod
    def get(cls, format):
        if format not in cls._registry:
            raise ValueError(f"No serializer for {format!r}")
        return cls._registry[format]()


# when we subclass Serializer, __init_subclass__ will be called.
# our additional keyword argument (format) is passed in the class statement
class JSONSerializer(Serializer, format="json"):
    def dumps(self, data): ...


class CSVSerializer(Serializer, format="csv"):
    def dumps(self, data): ...


# at this point, Serializer._registry contains two entries,
# one from each __init_subclass__ call made for the classes above
print(f"{Serializer._registry=}")

s = Serializer.get("json")   # returns a JSONSerializer instance
print(f'{Serializer.get("json")=}')
Serializer._registry={'json': <class '__main__.JSONSerializer'>, 'csv': <class '__main__.CSVSerializer'>}
Serializer.get("json")=<__main__.JSONSerializer object at 0x7f8b4c8e9810>

We can of course inspect and modify the cls reference here as well, just like we did in a decorator. This lets us enforce behavior on every subclass of a given type.
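As a hedged sketch of that kind of enforcement (Plugin, Greeter, Broken, and plugin_name are hypothetical names, not part of the Serializer example), __init_subclass__ can both validate and modify each subclass as it is defined:

```python
class Plugin:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # cls is already a finished class, so we can inspect it...
        if not callable(getattr(cls, "run", None)):
            raise TypeError(f"{cls.__name__} must define a run() method")
        # ...and modify it, much as a class decorator would.
        cls.plugin_name = cls.__name__.lower()


class Greeter(Plugin):
    def run(self):
        return "hello"


print(Greeter.plugin_name)  # greeter

try:
    class Broken(Plugin):
        pass
except TypeError as exc:
    message = str(exc)

print(message)  # Broken must define a run() method
```

Plugin itself is never checked, because __init_subclass__ only runs for subclasses.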

Limitations

The biggest limitation with __init_subclass__ is that the method is called after the namespace is created & passed to type(). The received cls is the fully-realized type already.

Sometimes we want to intercept the collected namespace and modify it before it is handed off to the type() constructor, which finally brings us to metaclasses.

Aside: type vs. object

This can be hard to reason about, so let’s review what type and object are:

object

Everything in Python is an instance of object, and every class is a subclass of it. object is the base class that provides common functionality (memory management, that ugly default repr, etc.).

When we create a new class, it is a subclass of object:

# this definition
class MyClass:
    pass

# is the same as
class MyClass(object):
    pass

An instance of MyClass is also an instance of object, since isinstance considers instances of a subclass to be instances of its parent types as well:

class MyClass:
    pass
myobj = MyClass()

# both True!
print(isinstance(myobj, MyClass))
print(isinstance(myobj, object))
True
True

type, on the other hand, is the type of the class itself; an instance of the class is not an instance of type:

# not a type!
isinstance(myobj, type)
False

# is a type!
isinstance(MyClass, type)
True

Where this can be somewhat confusing is that MyClass is also an object; everything in Python is.

As we saw above, the type() function is a constructor that makes a new class, which is itself an instance of type.
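The relationships between type and object can be checked directly. This is a small sketch using only built-ins; Dynamic is a throwaway class name chosen for illustration:

```python
class MyClass:
    pass


# A class is an instance of type, and three-argument type() builds one.
Dynamic = type("Dynamic", (), {})
print(type(MyClass) is type, type(Dynamic) is type)   # True True

# Classes are objects too, so they pass isinstance(..., object).
print(isinstance(MyClass, object))                    # True

# type is itself a class: a subclass of object and an instance of itself.
print(issubclass(type, object), type(type) is type)   # True True

# object, in turn, is an instance of type.
print(isinstance(object, type))                       # True
```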

Metaclasses

Class creation goes through three phases:

  • Prepare: create the namespace the body will execute in.
  • Execute: run the body, storing names into that namespace.
  • Build: call type(name, bases, namespace) to produce the class object

This process expanded out for class Icon(Widget) looks like:

namespace = type.__prepare__("Icon", (Widget,))          # prepare
exec(body, namespace)                                    # execute the body in the namespace
Icon = type.__new__(type, "Icon", (Widget,), namespace)  # build: call the type constructor
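The three phases can be run by hand as a rough approximation. In this sketch the class body is supplied as source text (the interpreter actually uses compiled code, and also stores names such as __module__ and __qualname__), and Widget is an empty stand-in base class:

```python
class Widget:
    pass


# Source text of the class body, standing in for the compiled body.
body = """
WIDTH = 32

def show(self):
    return f"icon {self.WIDTH}px wide"
"""

namespace = type.__prepare__("Icon", (Widget,))   # prepare: an empty dict
exec(body, globals(), namespace)                  # execute: fill the namespace
Icon = type("Icon", (Widget,), namespace)         # build: create the class

print(sorted(namespace))   # ['WIDTH', 'show']
print(Icon().show())       # icon 32px wide
```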

A metaclass replaces type in these operations: it is the class whose __prepare__ and __new__ are called to create the new type.

# simple metaclass that just extends `type`'s existing implementation
class MyMeta(type):
    def __new__(mcs, name, bases, namespace):
        print(f"Building class {name} with attrs: {list(namespace)}")
        return super().__new__(mcs, name, bases, namespace)

class Foo(metaclass=MyMeta):
    x = 1
    y = 2

mcs vs. cls vs. self

The first parameter is conventionally named to help you understand the type:

  • mcs in a metaclass’s __new__, where it is the user-defined metaclass
  • cls in @classmethods, where it is the user-defined class
  • self for instances of classes
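A small sketch showing what each of the three names refers to at run time. Meta and Report are illustrative names only:

```python
class Meta(type):
    def __new__(mcs, name, bases, namespace):
        # mcs is the metaclass itself
        print("mcs  ->", mcs.__name__)
        return super().__new__(mcs, name, bases, namespace)


class Report(metaclass=Meta):
    @classmethod
    def describe(cls):
        # cls is the class that the metaclass built
        return cls.__name__

    def owner(self):
        # self is one instance of that class
        return type(self).__name__


print("cls  ->", Report.describe())
print("self ->", Report().owner())
print(type(Report) is Meta)
# mcs  -> Meta
# cls  -> Report
# self -> Report
# True
```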

__new__

__new__ is a metaconstructor, a function that creates new types.

A typical __new__ implementation modifies the name, base classes, and/or namespace (most often the namespace), then passes them along to super().__new__, our parent type.
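One case where the namespace has to be changed before the class exists is __slots__, which only takes effect at class creation. The following is a sketch under that assumption; SlotsMeta, Point, and the fields attribute are invented for this example:

```python
class SlotsMeta(type):
    def __new__(mcs, name, bases, namespace, **kwargs):
        namespace = dict(namespace)
        # Turn the declared field names into __slots__ before the class exists.
        namespace["__slots__"] = tuple(namespace.pop("fields", ()))
        return super().__new__(mcs, name, bases, namespace, **kwargs)


class Point(metaclass=SlotsMeta):
    fields = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
print(Point.__slots__)          # ('x', 'y')
print(hasattr(p, "__dict__"))   # False

rejected = False
try:
    p.z = 3                     # not a declared slot
except AttributeError:
    rejected = True
print(rejected)                 # True
```

A class decorator could not achieve this by assigning __slots__ afterwards, since the instance layout is fixed when the class is built.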

__prepare__(name, bases, **kwargs)

If defined, __prepare__ runs even earlier. It returns the namespace object used in the rest of the class definition.

Note that it does not receive the class being defined as cls or self, because that class does not exist yet! The language reference recommends writing __prepare__ as a classmethod, in which case its first argument is the metaclass.

__prepare__ allows you to inject names into the namespace that can then be used in the class definition. Here we add a field() name that can be called inside the class body without being imported or defined there:

from dataclasses import dataclass
from pprint import pprint

@dataclass
class _Field:
    kind: type
    default: object = None
    required: bool = False


class SchemaMeta(type):
    @classmethod
    def __prepare__(mcs, name, bases, **kwargs):
        return {"field": _Field}   # inject field as a name in the class body

    def __new__(mcs, name, bases, namespace, **kwargs):
        fields = {k: v for k, v in namespace.items() if isinstance(v, _Field)}
        cls = super().__new__(mcs, name, bases, dict(namespace))
        cls._fields = fields
        return cls


# common to set metaclass on a base class & use inherited classes
class Schema(metaclass=SchemaMeta):
    pass


class UserSchema(Schema):
    # where does this `field` function come from?
    # it comes from the dict returned by __prepare__,
    # so it is in scope only within the class body
    name  = field(str, required=True)
    email = field(str, required=True)
    age   = field(int, default=0)
    bio   = field(str, default="")


pprint(UserSchema._fields)
{'age': _Field(kind=<class 'int'>, default=0, required=False),
 'bio': _Field(kind=<class 'str'>, default='', required=False),
 'email': _Field(kind=<class 'str'>, default=None, required=True),
 'name': _Field(kind=<class 'str'>, default=None, required=True)}
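__prepare__ does not have to return a plain dict. As a further sketch (NoDuplicatesDict, NoDuplicatesMeta, and Handlers are hypothetical names), returning a dict subclass lets the metaclass observe every assignment made by the class body, here to reject a name that is defined twice:

```python
class NoDuplicatesDict(dict):
    """Namespace that refuses to rebind a name already set in the class body."""

    def __setitem__(self, key, value):
        if key in self:
            raise TypeError(f"duplicate name in class body: {key!r}")
        super().__setitem__(key, value)


class NoDuplicatesMeta(type):
    @classmethod
    def __prepare__(mcs, name, bases, **kwargs):
        # The class body stores every name through __setitem__ above.
        return NoDuplicatesDict()

    def __new__(mcs, name, bases, namespace, **kwargs):
        # Hand type() a plain dict; the checking is finished by now.
        return super().__new__(mcs, name, bases, dict(namespace), **kwargs)


try:
    class Handlers(metaclass=NoDuplicatesMeta):
        def on_save(self): ...
        def on_save(self): ...   # would silently overwrite in a normal class
except TypeError as exc:
    message = str(exc)

print(message)  # duplicate name in class body: 'on_save'
```

Neither a class decorator nor __init_subclass__ could detect this, because by the time they run the second definition has already replaced the first.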

Metaclass Example

class StrictTaskMeta(type):
    required = {"run", "email"}
    run_return_type = str

    def __new__(mcs, clsname, bases, namespace):
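        # only enforce on subclasses: the base Task class itself is exempt,
        # and inherited run/email attributes don't satisfy the constraint
        if bases:
            missing = mcs.required - namespace.keys()
            if missing:
                raise TypeError(f"{clsname} must define: {missing}")
        if "run" in namespace:
            # a missing return annotation raises KeyError on this lookup
            returns = namespace["run"].__annotations__["return"]
            if returns != mcs.run_return_type:
                raise TypeError(f"{clsname}.run must return {mcs.run_return_type.__name__}")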
        return super().__new__(mcs, clsname, bases, namespace)

class Task(metaclass=StrictTaskMeta):
    pass

class WorkingTask(Task):
    email = "admin@example.com"
    def run(self) -> str:
        return "ok"

try:
    class NoEmailTask(Task):
        def run(self) -> str:
            return "ok"
except Exception as e:
    print(type(e), e)
<class 'TypeError'> NoEmailTask must define: {'email'}

try:
    class NoAnnotation(Task):
        email = "hi@example.com"
        def run(self):
            return "ok"
except Exception as e:
    print(type(e), e)
<class 'KeyError'> 'return'

How to Decide

We’ve now seen lots of ways to work with dynamic classes and functions. How do we choose between them?

As a rule, metaprogramming techniques should be used sparingly: pick the simplest thing that can work. For classes this means:

  • __getattr__ and __setattr__ if the desire is to customize what happens when attributes are missing or assigned.
  • Descriptors if the logic is field-based and lends itself to reuse (validators, ORMs)
  • Class Decorators if the need is to modify multiple classes in the same way, with modifications that can be done after creation.
  • __init_subclass__ for making changes to all subclasses of a class (simpler than metaclasses for most use cases)
  • Metaclasses when you need to modify the class before it is created, or when you need control of the namespace with __prepare__
  • type() or types.new_class() to generate entirely new classes from data declarations.
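For the last option, here is a brief sketch of types.new_class, reusing the Unit class from earlier in the chapter. The make_unit helper and the Volt unit are assumptions added for illustration. Unlike a bare type() call, types.new_class works out the correct metaclass from the bases and calls its __prepare__, so it also fits classes that use a metaclass:

```python
import types


class Unit:
    def __init__(self, num):
        self.num = num

    def __str__(self):
        return f"{self.num}{self.symbol}"


def make_unit(name, symbol):
    # exec_body receives the prepared namespace and fills it in,
    # playing the role of the class body.
    return types.new_class(
        name, (Unit,), exec_body=lambda ns: ns.update(symbol=symbol)
    )


Volt = make_unit("Volt", "V")
print(Volt(5))                  # 5V
print(issubclass(Volt, Unit))   # True
```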