7 Evolution of Data Types
We often want to pass around multiple related values as arguments to functions and return values.
dict and tuple
When you first learn Python you encounter two common solutions to this problem:
def http_request() -> tuple[int, str]:
    ...
    return status, response_text
This implicit tuple return is OK for two or maybe three values, but any more than that risks hurting readability and introducing hard-to-spot bugs:
def http_request() -> tuple[int, str]:
    ...
    return status, response_text, headers # third param, requires updating
# now this is broken!
status, response_text = http_request(...)
It’s also easy to mix up ordering:
# oops!
response_text, status = http_request(...)
One solution to this is to use a dict with named fields.
def http_request(...) -> dict:
    return {"content": content, "status": status, "headers": headers}
This removes the risk of order-based bugs, and makes it easier to add fields.
Also, remember that accidental mutation is an incredibly common source of bugs in Python.
def mutable_default(k, v, d={}):
    d[k] = v
    return d
mutable_default("a", 1)
mutable_default("b", 2)
{'a': 1, 'b': 2}
It is also easy to accidentally return dictionaries with different keys in different branches, requiring careful use of dict.get or similar.
def http_request(...) -> dict:
    if error:
        return {"status_code": 404}
    else:
        return {"status_code": 200, "body": "...", "headers": {...}}
response = http_request()
# now using response["body"] requires careful checks
namedtuple
collections.namedtuple was introduced in Python 2.6 and has been superseded by typing.NamedTuple, which uses type annotation syntax to define a specialized tuple type whose elements have names as well as numeric positions.
from typing import NamedTuple
class Response(NamedTuple):
    status: int
    text: str
    headers: dict[str, str]
These tuple types can be constructed with positional or named args:
resp1 = Response(200, "<html>...", {"content-type": "text/html"})
resp2 = Response(status=404, text="{}", headers={"content-type": "application/json"})
They can also be accessed as attributes or by index:
print(resp1.status, resp1.text, resp1.headers)
print(resp2[0], resp2[1], resp2[2])
200 <html>... {'content-type': 'text/html'}
404 {} {'content-type': 'application/json'}
They are immutable, which works well for parameters, but sometimes one needs a mutable alternative.
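A small sketch of what that immutability looks like in practice, reusing the Response type from above (redeclared so the snippet runs on its own). Assigning to a field raises AttributeError, while the `_replace` method returns a modified copy:

```python
from typing import NamedTuple


class Response(NamedTuple):
    status: int
    text: str
    headers: dict[str, str]


resp = Response(200, "<html>...", {"content-type": "text/html"})

# Fields cannot be reassigned on an existing instance.
try:
    resp.status = 404
except AttributeError as e:
    print("AttributeError:", e)

# _replace builds a new tuple with the given fields changed.
not_found = resp._replace(status=404, text="")
print(not_found.status, resp.status)  # 404 200

# It is still a tuple, so unpacking and _asdict both work.
status, text, headers = resp
as_dict = resp._asdict()
```

Note that the immutability is shallow: the headers dict stored inside the tuple can still be mutated in place.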
classes
An obvious alternative to using a tuple or dict for data that is frequently being passed around together is to write a class.
This comes with some overhead, but also has the advantage of offering the opportunity for custom behavior.
An application managing complex state with dozens of variables is almost always going to settle on one or more classes, but it is often unclear when it is appropriate to introduce a class as opposed to a dict or tuple.
One potential downside is the dynamic nature of classes. The ability to add attributes at runtime can lead to bugs, and that flexibility isn’t necessary if we know exactly what fields our class is going to have.
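As a rough illustration of both points, here is a hypothetical hand-written Response class (the name and the `ok` property are invented for this sketch). It gains custom behavior, but a misspelled attribute name is silently accepted:

```python
class Response:
    """Hypothetical hand-written container for an HTTP response."""

    def __init__(self, status: int, text: str, headers: dict[str, str]):
        self.status = status
        self.text = text
        self.headers = headers

    @property
    def ok(self) -> bool:
        # Custom behavior: treat any 2xx status as success.
        return 200 <= self.status < 300

    def __repr__(self) -> str:
        return f"Response(status={self.status!r}, text={self.text!r})"


resp = Response(200, "<html>...", {"content-type": "text/html"})
print(resp.ok)  # True

# The dynamic downside: a typo silently creates a new attribute.
resp.staus = 404
print(resp.status)  # 200
```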
__slots__
For these data bundles, one option is to define the __slots__ attribute on a class:
class Point2:
    def __init__(self, x, y):
        self.x = x
        self.y = y
class Point2S:
    # with slots
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y
ptA = Point2(0, 0)
ptA.z = 0 # valid, but probably an error!
try:
    ptB = Point2S(5, 5)
    ptB.z = 0
except Exception as e:
    print('AttributeError', e)
AttributeError 'Point2S' object has no attribute 'z' and no __dict__ for setting new attributes
@dataclass
As type-checking became standardized, the attrs library introduced a new way to think about creating data container classes. This heavily influenced the design of dataclasses, added in Python 3.7.
In their simplest form, they resemble how we declared our NamedTuple before:
from dataclasses import dataclass
@dataclass
class Point2D:
    x: float
    y: float
ptD = Point2D(1, 2)
print(ptD)
ptD.z = 0
Point2D(x=1, y=2)
By default, the dataclass decorator adds an __init__, __eq__, and __repr__, which is exactly what a simple data container needs.
It is also possible to customize the created class. The full signature of the decorator:
@dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)
- init: generate constructor
- repr: generate repr method
- eq: generate __eq__ method using == on all attributes
- order: generate ordering methods (<, >, <=, >=)
- unsafe_hash: generate hash method even if unsafe (will be generated by default if eq and frozen are true)
- frozen: make instances immutable
- match_args: generate __match_args__, a dunder attribute used for customizing match behavior
- kw_only: make constructor parameters keyword only
- slots: generate __slots__, ensuring no additional attributes are added
- weakref_slot: add __weakref__ to slots (see documentation)
https://docs.python.org/3/library/dataclasses.html
Despite the name, dataclass creates ordinary classes by default. Passing frozen=True and/or slots=True produces classes better suited to packaging data.
__post_init__
Dataclasses can also define a “secondary constructor” named __post_init__. If it is defined, the generated __init__ calls self.__post_init__() after the fields have been set.
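One common use is validation or computing derived values. A minimal sketch, with a hypothetical Span class invented for this example:

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical example type: a half-open range [start, end)."""

    start: int
    end: int

    def __post_init__(self):
        # Runs after the generated __init__ has assigned start and end.
        if self.end < self.start:
            raise ValueError(f"end ({self.end}) is before start ({self.start})")
        # Derived value; a plain attribute here, not a dataclass field.
        self.length = self.end - self.start


s = Span(2, 5)
print(s, s.length)  # Span(start=2, end=5) 3

try:
    Span(5, 2)
except ValueError as e:
    print("ValueError:", e)
```

Because length is assigned in __post_init__ rather than declared as a field, it does not appear in the generated __repr__ or __eq__.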
Field Options
Fields on a dataclass are typically just type annotations on class variables. It is also possible to assign a default that will be used in the generated __init__.
Sometimes it is desirable to control more about the field, in which case you’d assign it the result of a call to field():
dataclasses.field(*, default=MISSING, default_factory=MISSING, init=True, repr=True, hash=None, compare=True, metadata=None, kw_only=MISSING, doc=None)
from dataclasses import dataclass, field
@dataclass
class User:
    name: str
    age: int = -1
    aliases: list[str] = field(default_factory=list, repr=False)
f = User("Finn")
j = User("Jake", 30)
r = User("Robert", 30, aliases=["Bob"])
print(f)
print(j)
print(r)
User(name='Finn', age=-1)
User(name='Jake', age=30)
User(name='Robert', age=30)
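The default_factory option ties back to the mutable default argument bug shown earlier. A short sketch (BadUser is a deliberately broken, hypothetical class) showing that dataclass refuses a bare list default, and that default_factory gives each instance its own list:

```python
from dataclasses import dataclass, field

# A bare mutable default is rejected when the class is created.
try:

    @dataclass
    class BadUser:
        name: str
        aliases: list[str] = []

except ValueError as e:
    print("ValueError:", e)


@dataclass
class User:
    name: str
    aliases: list[str] = field(default_factory=list)


a = User("Finn")
b = User("Jake")
a.aliases.append("Finn the Human")

# Each instance got its own list from default_factory.
print(a.aliases, b.aliases)  # ['Finn the Human'] []
```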
Dataclasses: Under the Hood
Dataclasses are implemented using the existing metaprogramming machinery we’ve already seen.
Take a look at the implementation: https://github.com/python/cpython/blob/main/Lib/dataclasses.py
Pydantic
dataclasses do not validate input. This is in line with how Python typically handles types and type hints.
That said, it isn’t uncommon to want validation, especially for data shared over a network. APIs, databases, and data pipelines can all benefit from data validation.
pydantic is a popular library which uses dataclass-like syntax to enable validation:
from datetime import datetime
from pydantic import BaseModel, PositiveInt
class User(BaseModel):
    id: int
    name: str = 'John Doe'
    signup_ts: datetime | None = None
try:
    u = User(id="abc", name=123) # oops transposed arguments!
except Exception as e:
    print(e)
2 validation errors for User
id
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='abc', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/int_parsing
name
  Input should be a valid string [type=string_type, input_value=123, input_type=int]
    For further information visit https://errors.pydantic.dev/2.10/v/string_type
How to Choose
aside: performance test
$ uv run perftest.py
Type            Time (ms)    vs class
--------------------------------------
dict               1761.1    (0.93x)
namedtuple         3277.1    (1.73x)
class              1889.8    ◄ baseline
class+slots        2170.6    (1.15x)
dataclass          1855.1    (0.98x)
Note: These differences are minor; it took 10 million runs to see a notable, consistent difference.
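The perftest.py script itself is not shown here. The following is a hypothetical sketch of how a comparison like this could be written with timeit. What it measures (one construction plus one attribute read per call) and every name in it are assumptions, so it should not be expected to reproduce the numbers above.

```python
import timeit
from collections import namedtuple
from dataclasses import dataclass

PointNT = namedtuple("PointNT", ["x", "y"])


class PointClass:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


@dataclass
class PointDC:
    x: int
    y: int


# Assumed workload: each candidate builds one object and reads one field.
CANDIDATES = {
    "dict": lambda: {"x": 1, "y": 2}["x"],
    "namedtuple": lambda: PointNT(1, 2).x,
    "class": lambda: PointClass(1, 2).x,
    "class+slots": lambda: PointSlots(1, 2).x,
    "dataclass": lambda: PointDC(1, 2).x,
}


def run(number: int = 100_000) -> dict[str, float]:
    """Return total milliseconds per candidate for `number` calls."""
    return {
        name: timeit.timeit(fn, number=number) * 1000
        for name, fn in CANDIDATES.items()
    }


if __name__ == "__main__":
    results = run()
    baseline = results["class"]
    for name, ms in results.items():
        print(f"{name:<12} {ms:>10.1f} ({ms / baseline:.2f}x)")
```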
There should be one– and preferably only one– obvious way to do it. Although that way may not be obvious at first unless you’re Dutch.
- Do you need validation? Pydantic
- Do you want your type to have methods? dataclasses
- Immutable? NamedTuple or dataclass(frozen=True)
- Full set of fields not known? dict
- Complex constructors not based on attributes? class