4 Testing in Practice
If you are newer to Python testing, you may want to start here: Testing in Python. This introduces some fundamental ideas behind testing as well as pytest specifics.
Why Testing is Hard
Typically most of the tests written for a piece of software are unit tests. Unit tests target the smallest “unit of behavior”. A single function can have multiple behaviors, for example:

    def find_user(db: Connection, name: str) -> User | None:
        try:
            row = db.users.find(name=name)
            return User(row[0], row[1], row[2])
        except NotFound:
            return None

This function would need at least two tests:
- test_find_user - the successful case
- test_find_user_missing - the unsuccessful case
It is important for unit tests to be independent, repeatable, and readable.
In practice, following these rules is challenging:
- What constitutes a behavior is not always clear-cut; functions often perform multiple related behaviors.
- If a function contains lots of edge cases, it can require dozens of tests.
- Called functions often take complex data structures, and building them can overshadow the actual test code.
- Applications often modify state outside the function: files, databases, the network, etc. This requires setup (and teardown) and risks making tests fragile.
The difficulty of each of these is compounded by code that doesn’t consider how it will be tested as it is written. This is one of the key insights/motivations of TDD.
Let’s start by looking at some functions and considering how we’d test them:
Example 1: Random Avatar

    def assign_random_avatar(user: User) -> None:
        """Generate a unique random avatar based on username and assign to user."""

Tests: random avatar assigned (varies on username)
Challenges: randomness, no return value
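One way to tame the randomness is to let the caller supply the random source. The sketch below is only an illustration: the User dataclass, the AVATARS list, and the extra rng parameter are all assumptions, and it ignores the "based on username" part of the docstring.

```python
import random
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins: the notes do not show the real User or avatar set.
@dataclass
class User:
    username: str
    avatar: Optional[str] = None


AVATARS = ["cat", "dog", "fox", "owl"]


def assign_random_avatar(user: User, rng: Optional[random.Random] = None) -> None:
    """Assign an avatar; pass a seeded rng to make the choice repeatable."""
    if rng is None:
        rng = random.Random()
    user.avatar = rng.choice(AVATARS)


def test_assign_random_avatar_is_repeatable():
    first, second = User("alice"), User("alice")
    assign_random_avatar(first, random.Random(42))
    assign_random_avatar(second, random.Random(42))
    # There is no return value, so the assertions inspect the side effect.
    assert first.avatar in AVATARS
    assert first.avatar == second.avatar
```

Two generators seeded with the same value make the same choice, so the test never needs to know which avatar was picked.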
Example 2: Login

    def login(username: str, password: str) -> User:
        """Return User if success or raises LoginError."""

- What behaviors should exist on this function?
- What tests would we write?
- What challenges will we encounter with test simplicity and independence?
Tests: success, bad_username, bad_password
Challenges: Database setup w/ valid user. (mock or fixture)
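As a rough sketch of the "fixture" route, each test below builds its own in-memory SQLite database containing one valid user. Everything here is assumed for illustration: the conn parameter, the users table, the make_db helper, and the sha256 hashing (real code should use a dedicated password-hashing scheme). With pytest, the try/except blocks would normally be written with pytest.raises.

```python
import hashlib
import sqlite3
from dataclasses import dataclass


class LoginError(Exception):
    pass


@dataclass
class User:
    username: str


def _hash(password: str) -> str:
    # sha256 keeps the sketch short; it is not a recommendation for real passwords.
    return hashlib.sha256(password.encode()).hexdigest()


def login(conn: sqlite3.Connection, username: str, password: str) -> User:
    """Hypothetical variant of login that receives its database connection."""
    row = conn.execute(
        "SELECT password_hash FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None or row[0] != _hash(password):
        raise LoginError("invalid username or password")
    return User(username)


def make_db() -> sqlite3.Connection:
    """Fresh in-memory database with one valid user, built separately for each test."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, password_hash TEXT)")
    conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", _hash("s3cret")))
    return conn


def test_login_success():
    assert login(make_db(), "alice", "s3cret") == User("alice")


def test_login_bad_username():
    try:
        login(make_db(), "mallory", "s3cret")
    except LoginError:
        pass  # expected
    else:
        raise AssertionError("expected LoginError")


def test_login_bad_password():
    try:
        login(make_db(), "alice", "wrong")
    except LoginError:
        pass  # expected
    else:
        raise AssertionError("expected LoginError")
```

Because every test gets a brand-new database, the tests stay independent and can run in any order.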
Example 3: Messaging

    def send_message(from_user: str, to_user: str, message: str):
        """
        Uses messaging API service to add message to to_user's inbox from from_user.
        If the to_user has notifications on, they should receive a text message.
        If either user does not exist an error is raised.
        - Messages must be <=200 characters.
        - Only one message can be sent each second.
        """

Tests: success, bad user, bad message, rate limited, …
Challenges: Database setup again, testing message was sent (mocking!)
This is heading in the direction of an integration test, not a unit test. Do these interrelated systems work properly together? It will be useful to add unit tests of smaller components as well as this integration test.
Example 4: Cache

    class FilesystemCache(Cache):
        def get_with_cache(self, url: str) -> Response:
            """
            If url has been seen, return old response, otherwise
            make a new request (using `self._request`) and add to cache.
            """

Tests: cache hit, cache miss
Challenges: filesystem (mock/helpers), relies on _request (mock)
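A minimal sketch of how both challenges might be handled with the standard library: a temporary directory stands in for the real cache location, and patch.object replaces _request on one instance. The class below is a simplified, hypothetical version of FilesystemCache that stores plain response text, not the real implementation.

```python
import hashlib
import tempfile
from pathlib import Path
from unittest.mock import patch


class FilesystemCache:
    """Simplified, hypothetical cache: one file per URL holding the response text."""

    def __init__(self, directory: str) -> None:
        self.directory = Path(directory)

    def _request(self, url: str) -> str:
        raise RuntimeError("network access is not expected in tests")

    def _path_for(self, url: str) -> Path:
        # Hash the URL so it is always a safe file name.
        return self.directory / hashlib.sha256(url.encode()).hexdigest()

    def get_with_cache(self, url: str) -> str:
        path = self._path_for(url)
        if path.exists():
            return path.read_text()
        body = self._request(url)
        path.write_text(body)
        return body


def test_cache_miss_then_hit():
    with tempfile.TemporaryDirectory() as tmp:
        cache = FilesystemCache(tmp)
        with patch.object(cache, "_request", return_value="hello") as mock_request:
            assert cache.get_with_cache("https://example.com") == "hello"  # miss
            assert cache.get_with_cache("https://example.com") == "hello"  # hit
        # Only the first lookup should have reached _request.
        mock_request.assert_called_once_with("https://example.com")
```

With pytest, the built-in tmp_path fixture could replace the TemporaryDirectory context manager.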
Writing Testable Code
As we saw above, the signature of a function (and its dependencies) can have a major impact on our ability to test the code.
Avoid Unnecessary State
- never mutate arguments
- prefer return values
- “do one thing”
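To make the first two points concrete, here is a small sketch contrasting a function that mutates its argument with one that returns a new value. Both functions and their names are invented for this illustration.

```python
def add_default_tags_in_place(tags: list) -> None:
    """Harder to test: changes the caller's list and returns nothing."""
    if not tags:
        tags.append("untagged")
    tags.sort()


def with_default_tags(tags: list) -> list:
    """Easier to test: the input is left alone and the result is returned."""
    return sorted(tags) if tags else ["untagged"]


def test_with_default_tags():
    original = ["b", "a"]
    assert with_default_tags(original) == ["a", "b"]
    assert original == ["b", "a"]  # the argument was not mutated
    assert with_default_tags([]) == ["untagged"]
```

The second version can be checked with a single assert on its return value, and the caller's data is never changed behind its back.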
Proper Decomposition
In the example above, testing send_message is nearly impossible unless the text message is sent through a separate function such as send_sms.
With send_sms split out, we can test it independently and then mock it when testing send_message.
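The same idea applies to the rules in the send_message docstring. As a sketch, the length limit and the rate limit can each live in a small pure function that needs no database or messaging service to test. The name message_is_valid is borrowed from later in these notes; is_rate_limited and both signatures are assumptions.

```python
from typing import Optional

MAX_MESSAGE_LENGTH = 200  # "Messages must be <=200 characters."
MIN_SECONDS_BETWEEN_MESSAGES = 1.0  # "Only one message can be sent each second."


def message_is_valid(message: str) -> bool:
    return len(message) <= MAX_MESSAGE_LENGTH


def is_rate_limited(last_sent_at: Optional[float], now: float) -> bool:
    """True if a previous message was sent less than one second before `now`."""
    if last_sent_at is None:
        return False
    return now - last_sent_at < MIN_SECONDS_BETWEEN_MESSAGES


def test_message_is_valid_at_the_boundary():
    assert message_is_valid("x" * 200)
    assert not message_is_valid("x" * 201)


def test_is_rate_limited():
    assert not is_rate_limited(None, now=10.0)
    assert is_rate_limited(10.0, now=10.5)
    assert not is_rate_limited(10.0, now=11.0)
```

Passing the current time in as an argument, instead of reading the clock inside the function, keeps the rate-limit test repeatable.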
Dependency Injection
Another approach is dependency injection, where these functions (or their containing classes) are handed their dependencies instead of creating them:

    from typing import Any, Protocol

    class HTTPClient(Protocol):
        """Typed Protocol for a simple HTTP client."""
        def get(self, url: str, **kwargs: Any) -> "HTTPResponse": ...
        def post(self, url: str, body: Any = None, **kwargs: Any) -> "HTTPResponse": ...

    class FilesystemCache(Cache):
        def __init__(self, http_client: HTTPClient):
            self.http_client = http_client

FilesystemCache no longer provides its own _request, but instead dispatches to http_client.get. This makes testing easier as we can pass in a mocked HTTPClient that performs as we prefer for a given test.
This also makes our code more robust: we have decoupled the behavior of caching from making the request. We could now easily swap out requests for httpx or whatever our preferred library is.
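A hand-written fake is often enough to take advantage of this. The sketch below assumes a cut-down HTTPClient whose get returns plain text, and uses an in-memory InMemoryCache in place of FilesystemCache to stay short; FakeHTTPClient and InMemoryCache are hypothetical names.

```python
from typing import Protocol


class HTTPClient(Protocol):
    def get(self, url: str) -> str: ...


class InMemoryCache:
    """Stand-in for FilesystemCache that keeps responses in a dict."""

    def __init__(self, http_client: HTTPClient) -> None:
        self.http_client = http_client
        self._store = {}

    def get_with_cache(self, url: str) -> str:
        if url not in self._store:
            self._store[url] = self.http_client.get(url)
        return self._store[url]


class FakeHTTPClient:
    """Test double: returns a canned body and records every requested URL."""

    def __init__(self, body: str) -> None:
        self.body = body
        self.requested = []

    def get(self, url: str) -> str:
        self.requested.append(url)
        return self.body


def test_second_lookup_uses_cache():
    client = FakeHTTPClient("hello")
    cache = InMemoryCache(client)
    assert cache.get_with_cache("https://example.com") == "hello"
    assert cache.get_with_cache("https://example.com") == "hello"
    # The injected client was only asked once; the second call was a cache hit.
    assert client.requested == ["https://example.com"]
```

No patching is needed here: the test simply constructs the cache with the dependency it wants.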
Fixtures / Parametrize
Often testing a single behavior is best done through a handful of tests:

    def test_lang_add():
        assert parse_and_execute("3 + 4") == 7
        assert parse_and_execute("3 + -4") == -1
        assert parse_and_execute("0 + 0") == 0
        assert parse_and_execute("3 + 4") == 7

This can be replaced with parametrized tests:

    import pytest

    @pytest.mark.parametrize("expr, expected", [
        ("3 + 4", 7),
        ("3 + -4", -1),
        ("0 + 0", 0),
    ])
    def test_lang_add(expr, expected):
        assert parse_and_execute(expr) == expected

When the test inputs require some setup, fixtures are helpful:

    @pytest.fixture
    def db():
        # setup: runs before each test that requests `db`
        conn = connect("sqlite:///:memory:")
        conn.execute("CREATE TABLE users ...")
        yield conn
        # teardown: runs after the test finishes
        conn.close()

    def test_insert(db):
        db.execute("INSERT INTO users ...")
        assert db.execute("SELECT count(*) FROM users").scalar() == 1

    def test_empty(db):
        assert db.execute("SELECT count(*) FROM users").scalar() == 0

When to use parametrize
Lots of similar inputs with slight variations, a very short (often one-line) actual test.
When not to use fixtures
When it requires a convoluted test harness that makes the test harder, not easier to read.
Mocking
Dependency injection is not always feasible or desired.
Let’s imagine we aren’t in a place to refactor this function:
def send_message(from_user: str, to_user: str, message: str):
# ...
from_number = get_user_number(from_user)
to_number = get_user_number(to_user)
if message_is_valid(message):
send_sms(from_number, to_number, message)
# ...How would you test it without sending actual text messages?

    from unittest.mock import patch

    def test_send_message():
        with patch("yourmodule.send_sms") as mock_sms:
            send_message("alice", "bob", "hello")
            mock_sms.assert_called_once()

This uses a context manager (with) to replace the function yourmodule.send_sms with a Mock.
A Mock is an object that can be treated as a function (Callable) or as an object with arbitrary attributes. It uses metaprogramming techniques (our next topic) to accept any arguments or method calls you make on it; unless otherwise specified, these calls do nothing and return another Mock.
The final line of the test, mock_sms.assert_called_once(), is a method on the Mock that asserts the mock was called exactly once.
Some additional methods of Mock:
- assert_called()
- assert_called_once()
- assert_called_with(*args, **kwargs) - last call only
- assert_called_once_with(*args, **kwargs)
- assert_any_call(*args, **kwargs) - has been called
- assert_not_called()
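A short sketch of how these assertions behave on a bare Mock. The phone numbers and the send_sms name are just placeholders; nothing is being patched here.

```python
from unittest.mock import Mock


def test_mock_assertions():
    send_sms = Mock()

    result = send_sms("555-1111", "555-2222", "hello")
    assert isinstance(result, Mock)  # the default return value is another Mock
    send_sms.assert_called_once_with("555-1111", "555-2222", "hello")

    send_sms("555-1111", "555-3333", "goodbye")
    send_sms.assert_called()
    # assert_called_with only looks at the most recent call...
    send_sms.assert_called_with("555-1111", "555-3333", "goodbye")
    # ...while assert_any_call matches any call made so far.
    send_sms.assert_any_call("555-1111", "555-2222", "hello")
    assert send_sms.call_count == 2

    never_used = Mock()
    never_used.assert_not_called()
```

After the second call, assert_called_once() would raise an AssertionError, which is exactly what makes these methods useful in tests.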
Providing fake data
In practice, we often want a method to return certain values, such as when we’re trying to avoid a database call:

    from unittest.mock import patch

    def test_send_message():
        # a list passed as side_effect is returned one item per call, in order
        with (patch("mymodule.get_user_number", side_effect=["555-1111", "555-2222"]),
              patch("mymodule.send_sms") as mock_sms):
            send_message("alice", "bob", "hello")
            mock_sms.assert_called_once_with("555-1111", "555-2222", "hello")

from unittest to pytest
If you learned Python in the past decade, you have likely used pytest, and you may be surprised to learn that Python ships with a built-in testing library.
unittest
unittest has been part of the standard library since Python 2.1, but has largely fallen out of favor. It is based on a pattern described in Kent Beck’s Simple Smalltalk Testing: With Patterns and popularized in Java by JUnit.
Let’s look at the first example from the unittest docs:

    import unittest

    class TestStringMethods(unittest.TestCase):

        def test_upper(self):
            self.assertEqual('foo'.upper(), 'FOO')

        def test_isupper(self):
            self.assertTrue('FOO'.isupper())
            self.assertFalse('Foo'.isupper())

        def test_split(self):
            s = 'hello world'
            self.assertEqual(s.split(), ['hello', 'world'])
            # check that s.split fails when the separator is not a string
            with self.assertRaises(TypeError):
                s.split(2)

    if __name__ == '__main__':
        unittest.main()

Which in pytest we’d typically write:

    import pytest

    def test_upper():
        assert 'foo'.upper() == "FOO"

    def test_isupper():
        assert 'FOO'.isupper() is True
        assert 'Foo'.isupper() is False

    def test_split():
        s = 'hello world'
        assert s.split() == ['hello', 'world']
        # check that s.split fails when the separator is not a string
        with pytest.raises(TypeError):
            s.split(2)

The latter is more Pythonic because in Python we avoid creating unnecessary classes where modules will do, and there’s nothing about most tests that requires a class.
Also, instead of custom assert methods, we can mostly use the built-in assert, and pytest’s assertion introspection gives helpful error messages. Of course, pytest does come with built-in helpers for things that are hard to do with a bare assert, such as comparing floating-point numbers (pytest.approx) and checking for exceptions (pytest.raises).
Even when some setup/common methods are required, we can put those in the module, or use other pytest features to do this in a more flexible way than classes allow.
With that in mind, we can think of pytest as one of our core tools as Python developers, and one of the theories of this course is that it is important to know your core tools well. What else does pytest have to offer beyond a test runner with improved test semantics?
Testing Antipatterns
Testing Other People’s Code

    from PIL import Image

    def make_thumbnail(path: str, size: tuple) -> Image.Image:
        img = Image.open(path)
        img.thumbnail(size)
        return img

    # This is a bad test -- it is actually testing Pillow
    def test_make_thumbnail():
        result = make_thumbnail("photo.jpg", (128, 128))
        assert result.size == (128, 128)

The example above is in fact simple enough that we may not need a test, but if we did need to test code that relies upon a library we can mock it:

    from unittest.mock import patch, MagicMock

    def test_make_thumbnail():
        mock_img = MagicMock()
        with patch("mymodule.Image.open", return_value=mock_img):
            result = make_thumbnail("photo.jpg", (128, 128))
        # testing that we called the method properly, not what it returns
        mock_img.thumbnail.assert_called_once_with((128, 128))
        assert result == mock_img

Tests That Can’t Fail
When fixing a bug, the typical advice is to write a failing test first, then fix the bug and confirm that the test now passes.
The step of seeing the test actually fail is important to ensure you are testing (and fixing) the right thing!
Focusing on coverage above anything else
coverage, and the associated pytest plugin pytest-cov are incredibly useful tools.
coverage demo
At the same time, coverage is commonly misused in two ways:
- Thinking 100% coverage means no bugs.
- Fixating on 100% coverage at the expense of everything else.
Coverage also cannot tell the difference between pure unit tests and integration tests. If we rely on mocking, we may not properly test units of behavior but still reach 100% coverage.
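A tiny sketch of the first point, using a made-up average function: the single test executes every line, so a line-coverage report would show 100%, yet an obvious input still crashes.

```python
def average(values: list) -> float:
    return sum(values) / len(values)


def test_average():
    # Executes every line of average(), so line coverage is complete.
    assert average([2, 4]) == 3


def average_of_empty_list_crashes() -> bool:
    """The case that full line coverage never exercised."""
    try:
        average([])
    except ZeroDivisionError:
        return True
    return False
```

Coverage measures which lines ran, not which inputs or behaviors were checked.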
Example
For many years, I maintained a large open source project with hundreds of contributors. The project consisted of a data pipeline and 150 or so web scrapers. Our core code was well-tested, but the web scrapers had no tests. Why not?
What would a test of a web scraper look like? What principles would it have a hard time not violating?
Alternative Forms of Testing
Snapshot Testing

    import vcr

    @vcr.use_cassette("fixtures/get_user.yaml")
    def test_get_user_email():
        result = get_user_email(1)
        assert result == "alice@example.com"

The recorded cassette, fixtures/get_user.yaml:

    interactions:
    - request:
        body: null
        headers:
          Accept:
          - '*/*'
          User-Agent:
          - python-requests/2.28.0
        method: GET
        uri: https://api.example.com/users/1
      response:
        body:
          string: '{"id": 1, "email": "alice@example.com", "name": "Alice"}'
        headers:
          Content-Type:
          - application/json
        status:
          code: 200
          message: OK
    version: 1

hypothesis

    from hypothesis import given
    from hypothesis import strategies as st

    @given(st.text())
    def test_encode_decode_roundtrip(s):
        assert decode(encode(s)) == s

Continuous Integration
It is generally accepted that tests should be run on every commit.
Most teams only allow code (at least on main) that is passing all tests.
Tests that aren’t run often aren’t serving their purpose & risk drifting out of sync with the codebase.
- pre-commit
- GitHub Actions (& equivalents)