Static Visualization Project

For the first half of the course you will be building a suite of static visualizations on a topic of your choosing.

The final product will be 6-10 visualizations connected by a narrative.

Examples:

2024 - From last year’s class.
2021 - slightly different requirements, but these capture the spirit of what we’re trying to create here.

The final deliverable will consist of a document (HTML or PDF) that weaves your visualizations into a narrative of some kind. This can take the form of an article, blog post, infographic, poster, or anything else that allows you to combine your visualizations with written explanations.

In the second half of the quarter, you will work on an interactive viz. You may decide to use the same data sources, or pick a different topic altogether.

Milestone 1: Proposal

For your proposal you will:

select a policy area of interest to you;
explore what data is available; and
decide on a goal. What do you hope to show with your suite of visualizations?

Some things that might be useful to consider:

Availability of data. The less time you have to spend cleaning & preparing your data the better. You will need a high quality source of open data, something that does not require scraping or other time-consuming efforts. It is essential that you take some time to verify the assumptions you are making about data availability, because having to pivot mid-quarter will present major challenges.

Uniformity of sources. Be sure that your data sources have a common identifier / unit of analysis. Useful things to look for would be commonly accepted identifiers like zip code or census tract, or data that is provided via a relational API, and thus comes with built-in identifiers for each record.
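
As a minimal sketch of why a shared identifier matters, the following joins two small tables on zip code using only the standard library. The column names (zip, median_income, clinic_count) and all values are made up for illustration; with real sources you would likely reach for a dataframe library instead.

```python
import csv
import io

# Hypothetical extracts from two different sources that share a zip code column.
INCOME_CSV = """zip,median_income
60615,52000
60637,38000
02139,98000
"""

CLINICS_CSV = """zip,clinic_count
60615,4
02139,7
60601,2
"""


def read_by_key(text, key):
    """Index CSV rows by `key`, keeping the key as a string.

    Identifiers such as zip codes should stay strings: converting
    "02139" to an integer would silently drop the leading zero.
    """
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def inner_join(left, right):
    """Merge rows whose key appears in both sources."""
    return {k: {**left[k], **right[k]} for k in left.keys() & right.keys()}


income = read_by_key(INCOME_CSV, "zip")
clinics = read_by_key(CLINICS_CSV, "zip")
joined = inner_join(income, clinics)

# Keys present in only one source are worth inspecting before committing to a topic.
unmatched = sorted(income.keys() ^ clinics.keys())
```

If the unmatched list is long, the two sources may not share a unit of analysis as cleanly as they first appear to.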

Different units of observation. You will need different categories of data to create different visualizations. Consider looking for data that has some combination of:

geospatial features
time-series characteristics
data at different hierarchies (e.g. state/county/city)
networks/connections between people/organizations
variety of categorical and numeric variables

Paying attention to these categories will give you lots to explore, and make it unlikely you run out of good viz ideas.

Avoid too much data. There’s no “right” number of records, but I would probably be most comfortable in the 10,000-1,000,000 range. Reach out if you feel you’ll be significantly outside of that range.

Consider that many datasets can be segmented by year. That can be a convenient way to scale the size of your work by picking an appropriately sized subset.
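
One way to act on this, sketched under assumptions: count records per year, then keep only as many recent years as fit a target size. The year field name, the sample rows, and the max_rows budget are hypothetical; the 1,000,000 default simply mirrors the upper end of the range suggested above.

```python
from collections import Counter


def rows_per_year(rows, year_field="year"):
    """Count how many records fall in each year."""
    return Counter(row[year_field] for row in rows)


def pick_recent_years(counts, max_rows=1_000_000):
    """Select the most recent years whose combined size stays within max_rows."""
    chosen, total = [], 0
    for year in sorted(counts, reverse=True):
        if total + counts[year] > max_rows:
            break
        chosen.append(year)
        total += counts[year]
    return sorted(chosen), total


# Tiny made-up dataset standing in for a much larger one.
rows = (
    [{"year": 2021, "value": i} for i in range(5)]
    + [{"year": 2022, "value": i} for i in range(3)]
    + [{"year": 2023, "value": i} for i in range(4)]
)

counts = rows_per_year(rows)
years, total = pick_recent_years(counts, max_rows=8)
subset = [row for row in rows if row["year"] in years]
```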

Sources of Data

Some possible sources of data:

Data.gov - US Gov data portal
Data is Plural - a newsletter highlighting interesting data sets
Kaggle - curated datasets for ML tasks
City of Chicago Data Portal - most large cities have their own as well!
Census.gov Data - great for supplemental demographics and/or geospatial data
UNC Dataverse
Information is Beautiful
Google Dataset Search

Submission

For the first milestone, you will be asked to share a git repository.

It should contain a milestones/static-proposal.md resembling:

# {Your Name}

## Description

{what are you trying to accomplish?}

## Data Sources

{include 2-4 data sources with the following:}

### Data Source 1: {Name}

URL: {URL}

Size: {Number of Rows} rows, {Number of Columns} columns

{Short Description, including any exploration you've done already}

## Questions

{Numbered list of questions for course staff, if any.}

1.
2.
3.
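
To fill in the Size line of the template above, a quick count is usually enough. The sketch below uses only the standard library and assumes a CSV file with a header row; the function name csv_shape, the file path in the comment, and the sample data are made up for illustration.

```python
import csv
import io


def csv_shape(lines):
    """Return (rows, columns) for CSV input with a header row.

    `lines` can be an open file or any iterable of lines; rows are
    streamed, so the whole file is never held in memory at once.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return 0, 0
    n_rows = sum(1 for _ in reader)
    return n_rows, len(header)


# Hypothetical sample; for a real file you might use:
#   with open("data/some_file.csv", newline="") as f:
#       print(csv_shape(f))
sample = io.StringIO("county_fips,year,population\n17031,2020,1500\n17043,2020,900\n")
rows, cols = csv_shape(sample)
summary = f"Size: {rows} rows, {cols} columns"
```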

Grading Specifications

To earn an S on this assignment it must be complete. If any portion is missing (e.g. for data sources, you include URLs but no description of how you will use the data) you will receive an N.

You are not being graded on having the “right” answers to any of these questions. The goal is to get you thinking about them early, and for us to give you feedback.

Milestone 2: Draft

You will have two weeks to go from your initial proposal to a working draft.

This milestone serves as a check-in to ensure you are on track to complete the assignment and as an opportunity to obtain valuable peer feedback that will enhance your final product.

Specifications

To receive an S, your draft must meet these requirements:

8-12 images (this is higher than the final requirement because some will likely be cut as you develop your narrative)
- >= 5 truly distinct types (e.g. choropleth, scatter plot, network graph, bar chart)
- >= 4 should be using real data. (Ideally, all data acquisition/cleaning would be done.)
Each image should:
- Use appropriate visual encodings.
- Include all essential elements (titles, labels, etc.)
- Show some effort in visual design (e.g. not using default themes).
- Have a few sentences explaining the context; a draft of how the image will fit your final narrative.
- Generally, be in a state where specific feedback would be useful.
Include Python code that generates your graphs. This can be in Python files or notebooks at this point.
Apply a theme to all images that is distinct from the defaults and appropriate to your intended usage.
- You may opt to try multiple themes, for the purpose of obtaining feedback.

Include a milestones/static-draft.md with the following:

# {Project Name}

{Your Name}

## What is your current goal? Has it changed since the proposal?

## Are there data challenges you are facing? Are you currently depending on mock data?

## Describe each of the provided images with 2-3 sentences to give the context and how it relates to your goal.

Tip: The markdown syntax ![](image-name.png) will let you embed images directly, or you can number them and describe them by number in this file.

## What form do you envision your final narrative taking? (e.g. An article incorporating the images? A poster? An infographic?)

Non-Factors

A few things that you will not be graded on at this point:

Quality of narrative, beyond the inclusion of some explanatory sentences as noted above.
Quality of design choices, beyond fulfillment of requirements stated above.
Cohesiveness, which is only a factor in the final submission.
Code quality, comments.

Milestone 3: Critique

You will provide feedback to a group of your peers, and receive feedback on your own work.

Understanding how other people see your work is essential to finding its flaws and strengthening it. Learning to give and receive critique is an important skill in itself.

Critique should:

Not be cruel. Follow guidelines given in class.
Include specific constructive suggestions.
Use context of the project to ground suggestions. (e.g. “Since you mentioned wanting to reach audience X, I think Y.”)

Your group should plan to meet during Week 4 and exchange feedback. All group members will fill out a form describing their participation in the meeting.

Good faith participation in your group’s critique process will earn you an S.

Possible reductions in this grade:

Group feedback that your participation in the critique process did not meet expectations.
If your Milestone 2 is late enough that you are not able to participate meaningfully in critique, it will result in a reduced grade for this assignment.

Milestone 4: Final Submission

Your final submission will be a web page or PDF that incorporates your images into a goal-driven narrative.

By now your project should use real data, incorporate feedback from the critique process, and be pulled together with a consistent visual theme.

You will receive feedback in three different areas:

Grading Specification: Visual Design

This portion of the grade looks at how well you apply the principles of visualization design discussed in class. To receive an S:

At least 6 images.
Of at least 4 distinct types.
No significant presentation errors: missing elements, misleading axes, improper use of graph type, etc.
Demonstration of critique feedback incorporated into final product.

Grading Specification: Narrative

This portion focuses on your incorporation of your images into a larger narrative. To receive an S, your narrative will be evaluated for:

Appropriate use of real data. (No mocked data at this stage.)
Writing and visuals are clear and relevant to stated goal.
Narrative is well-supported by appropriately chosen graphs.
Cohesive visual design between images and supplemental content.
Data citations must appear within the final PDF/HTML.
Cannot rely upon interactivity.
- If you decide to publish as HTML, minimal interactivity is allowed, such as hover effects/tooltips, but this interactivity should not be required to understand the narrative. In other words, a printed copy should convey all pertinent information.

Grading Specification: Code Quality / Documentation

Finally, for your code quality and documentation, an S requires:

Your repository should make use of directories to keep it manageable. I’d suggest:
- data/ - All data (or non-graded data fetching code). If your data will exceed 100MB, please reach out to discuss options on how to share your data with us, as it will not fit in Git.
- scratch/ - Anything in this directory won’t be graded; useful for experimentation.
- src/ - Code that you would like us to include in final evaluation. It should at minimum generate all graphs from included data.
- static-viz/ - Files specific to the final milestone.
Code may be either:
- One or more .py files with clear instructions on how to generate all graphs by running 1-2 (uv run) commands.
- A Jupyter or marimo notebook where executing all cells works and produces only the graphics being submitted. (Move other work to scratch/.)
In either case, code (non-scratch) must be:
- correct (without major errors in how Python is used)
- appropriately commented
- in accordance with the Python Style Guide as amended by the Style Addendum below.
A README.md resembling:

# {project name}

{your name}

## Description

{a short description of your project and its goals}

{REQUIRED: an embedded screenshot of your final project}

## Data Sources

{ citations for your data }

Reminder: Your final PDF/HTML also **must** contain citations for your data.

Example repository structure

├── LICENSE
├── README.md
├── data
│  ├── crown_exxon_ash.json
│  ├── download_census.py
│  └── fips.csv
├── scratch
│  ├── altair_theme_demo.py
│  └── exploration.ipynb
├── src
│  ├── graphs_marimo.py
│  └── theme.py
├── milestones
│  ├── proposal.md
│  ├── draft.md
│  ├── draft1.png
│  ├── draft2.png
│  ├── draft3.png
│  └── draft4.png
└── static-viz
   ├── final.pdf
   └── zine_layout.svg
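
Since the data/ directory is subject to the 100MB guideline mentioned earlier, a quick size check before committing can save trouble. The following is a rough sketch using only the standard library; the helper names are hypothetical, and reading "100MB" as 100 * 1000 * 1000 bytes is an assumption.

```python
import tempfile
from pathlib import Path

# Assumption: "100MB" is read here as 100 * 1000 * 1000 bytes.
SIZE_LIMIT_BYTES = 100 * 1000 * 1000


def directory_size(path):
    """Total size in bytes of all regular files under `path`."""
    return sum(p.stat().st_size for p in Path(path).rglob("*") if p.is_file())


def exceeds_limit(path, limit=SIZE_LIMIT_BYTES):
    """True if the directory's contents are larger than `limit` bytes."""
    return directory_size(path) > limit


# Demonstration on a throwaway directory; in a project you would pass "data".
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "fips.csv").write_bytes(b"x" * 2048)
    (Path(tmp) / "nested").mkdir()
    (Path(tmp) / "nested" / "extra.json").write_bytes(b"y" * 1024)
    demo_size = directory_size(tmp)
    demo_too_big = exceeds_limit(tmp)
    demo_over_small_limit = exceeds_limit(tmp, limit=1000)
```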

Style Addendum

The standards for code written in data science contexts, particularly in notebooks, are somewhat different from those for larger software applications, where code is expected to be maintained for longer periods.

Your final code submission should follow the UChicago Python Style Guide, with these two amendments:

You may use global variables in moderation. A common reason for this is to load & mutate a single master data set that is then used in the remainder of the file. That usage is acceptable, but be sure to use ample commenting to ensure code remains readable.

Notebooks must execute sequentially. Jupyter notebooks allow non-sequential execution of cells, which can be confusing and lead to different results during development vs. subsequent runs. If you are using Jupyter, be absolutely certain that your notebook can run by restarting the kernel and running cells in order.
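
The global-variable amendment might look like the following in practice. This is only an illustrative sketch: the inline CSV, the column names, and the helper functions are invented, and a real project would read from a file under data/ instead.

```python
import csv
import io

# Stand-in for a file on disk so the example is self-contained.
_RAW_CSV = """city,year,permits
Chicago,2022,120
Chicago,2023,150
Evanston,2023,30
"""


def load_permits(text):
    """Parse the CSV once and convert numeric columns to integers."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["year"] = int(row["year"])
        row["permits"] = int(row["permits"])
        rows.append(row)
    return rows


# GLOBAL: the single master dataset, loaded and cleaned once at the top of the file.
# Everything below reads from PERMITS; nothing below modifies it.
PERMITS = load_permits(_RAW_CSV)


def permits_by_year():
    """Total permits per year, e.g. as input for a bar chart."""
    totals = {}
    for row in PERMITS:
        totals[row["year"]] = totals.get(row["year"], 0) + row["permits"]
    return totals


def permits_for_city(city):
    """Rows for one city, e.g. as input for a line chart."""
    return [row for row in PERMITS if row["city"] == city]
```

Keeping all loading and mutation in one clearly commented place near the top also helps with the second amendment, since later cells or functions only read the data and give the same result on every top-to-bottom run.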