graph BT;
A --> B;
B --> C;
C --> D;
C --> E;
A --> W;
W --> F;
D --> F;
E --> F;
git
This is a quick primer on Git, mostly meant as a refresher/reference.
For learning more Git I recommend:
Using Git Solo
There are two ways to initialize a Git repository: locally via git init, or by creating a repo on GitHub and cloning it.
Local Initialization
From within a directory that you want to treat as a repo:
intermediate-git/$ git init
Initialized empty Git repository in /Users/james/repos/intermediate-git/.git/
That’s all it takes. The current directory you’re in is now a Git repository with 0 commits.
It’s a good idea to create a .gitignore file at this point:
.gitignore
If you create your project via GitHub it’ll create a .gitignore file for you. Otherwise, you’d create one yourself.
This file should be a list of files/patterns that you’d like to exclude.
For example:
*.pyc
.vscode
scraped_data/
.DS_Store
This would avoid checking in .pyc files, your local .vscode settings, and the scraped_data directory. MacOS makes .DS_Store files that you probably don’t want to check in either.
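If you want to confirm that the patterns do what you expect, a minimal sketch like the one below can help. It assumes a throwaway directory and made-up file names (app.py, app.pyc, page1.html), and uses git check-ignore to ask Git which paths it would skip.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway directory, not a real project
cd "$repo"
git init -q

# The same patterns as the example .gitignore above.
printf '%s\n' '*.pyc' '.vscode' 'scraped_data/' '.DS_Store' > .gitignore

# Hypothetical files, created only for this demonstration.
mkdir -p scraped_data
touch app.py app.pyc scraped_data/page1.html .DS_Store

# check-ignore prints each given path that matches an ignore pattern.
ignored="$(git check-ignore app.pyc scraped_data/page1.html .DS_Store)"
echo "$ignored"

# Untracked files that Git would still offer to add.
untracked="$(git status --porcelain --untracked-files=all)"
echo "$untracked"
```

Only .gitignore and app.py should show up as untracked; the other files are invisible to git status.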
Adding A Remote
If you want to push your repo to GitHub, you’ll need to add a remote:
$ git remote add origin git@github.com:jamesturk/git-workshop-example.git
GitHub Repo First
GitHub can also provide a reasonable default .gitignore based on the language of your project, but don’t feel like you need everything that they add. A lot of the entries in their list are for editors/IDEs you aren’t using.
Data Model
Git is what we call a leaky abstraction. This means that it is sometimes necessary to understand how it works under the hood in order to use it effectively.
If you read about Git or use some of the more advanced features you’ll eventually see references to some key data structures:
- Blobs
- Trees
- Commits
- Tags
Blobs are essentially the contents of a file at a given point in time. Trees are a collection of blobs in a directory-like hierarchy. We don’t need to worry about these too much for what we’re talking about today but I wanted to mention them.
We do want to talk about commits however.
You’re familiar with making commits, but let’s talk a bit more about what is actually stored:
- Commit ID (a SHA-1 Hash)
- Author information (name, email)
- Committer information (name, email) [can be different from author, we won’t worry about this]
- Commit message
- Timestamp
- A reference to the tree at the time of the commit.
- Parent(s) (zero or more)
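You can look at these fields yourself with git cat-file -p, which prints the raw contents of an object. This is a rough sketch in a throwaway repo; the author name, email, and file names are placeholders I made up.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway repo for the demonstration
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "first commit"

# Raw commit object: a tree line, author and committer lines
# (each with a timestamp), a blank line, then the message.
commit_body="$(git cat-file -p HEAD)"
echo "$commit_body"

# A root commit has no "parent" line; the next commit has exactly one.
echo "more" >> README.md
git commit -q -am "second commit"
parent_count="$(git cat-file -p HEAD | grep -c '^parent ')"
echo "second commit has $parent_count parent(s)"
```

The commit ID itself is not stored inside the object; it is the hash of this content.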
(I’ll draw git diagrams with the root at the bottom and the most recent commit at the top, which is what you’ll usually see by convention.)
Commits form a Directed Acyclic Graph (DAG).
A is a root commit, because it has no parent.
(Typically repos only have one root commit.)
F is a merge commit, because it has more than one parent.
Branching
The simplest Git repo would be one with a purely linear history:
graph BT;
A(initialize) --> B(add feature #1);
B --> C(add feature #2);
C --> D(add feature #3);
But let’s say that we were considering an alternate way to implement our next feature. We might instead create a new branch:
git branch new-feature
All that this has done is create a new pointer to the same commit that main was already pointing to.
$ git log
commit 8ea904f (HEAD -> main, new-feature)
Author: James
Date: Thu Apr 6 17:51:20 2023 -0500
second commit
commit 908ee8c
Author: James
Date: Thu Apr 6 17:48:12 2023 -0500
first commit
graph BT;
A(first commit) --> B(second commit : main, new-feature);
Both main and new-feature are pointing to the same commit.
This is a key concept in Git: branches are mutable labels that point to commits.
So here’s what happens when we make a new commit:
$ ...
$ git commit -m "third commit"
...
$ git log
commit 1337c4a (HEAD -> main)
Author: James
Date: Thu Apr 6 17:52:04 2023
third commit
commit 8ea904f (new-feature)
Author: James
Date: Thu Apr 6 17:51:20 2023
second commit
commit 908ee8c
Author: James
Date: Thu Apr 6 17:48:12 2023
first commit
graph BT;
B --> C(third commit: main);
A(first commit) --> B(second commit : new-feature);
Notice that main moved forward, but new-feature was left behind.
Whenever you git commit, the branch that you’re currently on will move forward to point to the new commit.
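Here is a small sketch of the same sequence that compares commit IDs with git rev-parse instead of reading git log. The repo is a throwaway, and notes.txt and the identity are placeholders.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch main
git config user.name "Example"          # placeholder identity
git config user.email "example@example.com"

echo one > notes.txt
git add notes.txt
git commit -q -m "first commit"
echo two >> notes.txt
git commit -q -am "second commit"

git branch new-feature
# Both labels resolve to the same commit ID.
before_main="$(git rev-parse main)"
before_feature="$(git rev-parse new-feature)"

echo three >> notes.txt
git commit -q -am "third commit"
# main moved forward; new-feature still names the second commit.
after_main="$(git rev-parse main)"
after_feature="$(git rev-parse new-feature)"
echo "main: $before_main -> $after_main"
echo "new-feature: $before_feature -> $after_feature"
```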
To actually use new-feature, we need to switch to it:
$ git switch new-feature
Now commits will move new-feature forward. So typically the workflow for starting a new branch looks like:
git branch new-branch
git switch new-branch
Aside: git checkout
You will also see people use git checkout -b to create a new branch and switch to it in one step.
git checkout -b new-branch
# same as
git branch new-branch
git checkout new-branch
git checkout is an older command, and can do a lot of different things. Feel free to use it, but I prefer to use the newer commands because they are less overloaded with unrelated behavior.
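The newer commands have a one-step form too: git switch -c creates a branch and switches to it. A minimal sketch, assuming Git 2.23 or later (where git switch exists) and a throwaway repo:

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"          # placeholder identity
git config user.email "example@example.com"
echo one > notes.txt
git add notes.txt
git commit -q -m "first commit"

# Create new-branch at the current commit and switch to it in one step.
git switch -q -c new-branch
current="$(git branch --show-current)"
echo "now on: $current"
```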
Finally, git branch without a branch name will list all of the branches in your repo.
$ git branch
main
* new-feature
pr/11
pr/12
experiments
Recap
- Branches are (mutable) labels that point to (immutable) commits.
- git commit moves the branch that you’re currently on forward.
- git switch changes which branch you’re currently on.
- git branch <branchname> creates a new branch.
- git branch without a branch name will list all of the branches in your repo.
Merging
Now that we can create branches, we can work on multiple features at once. Whether we’re working alone or on a large team, we’ll eventually want to combine our work.
graph BT;
A(initial commit : main)
A --> B(wireframe UI);
B --> C(add bootstrap CSS: ui);
C --> D(add profile page: profile-page);
C --> E(add login page);
E --> F(fix login page bug: login-page)
A --> W(backend prototype, very slow : backend);
W --> X(add benchmarks);
X --> Y1(optimized via rpython : try-pypy);
X --> Y2(wrote C version: try-c);
X --> Y3(rewritten in Rust: try-rust);
We have a lot of different branches here:
- main
- ui
- profile-page
- login-page
- backend
- try-pypy
- try-c
- try-rust
Typically, we’ll see branches merge back to their parent, so we can consider the ui and backend branches separately. Let’s look at UI for now:
graph BT;
A(initial commit : main)
A --> B(wireframe UI);
B --> C(add bootstrap CSS: ui);
C --> D(add profile page: profile-page);
C --> E(add login page);
E --> F(fix login page bug: login-page)
Fast-forward merge
Let’s say that we’ve finished the login page, and we want to merge it back into ui.
We can do that with git merge:
Whenever we’re modifying a branch, we want to switch to it first. So just as we do before a git commit, we switch to the destination ui branch.
Then we run git merge login-page.
git switch ui
git merge login-page
Updating e6512d6..d45dee9
Fast-forward
README.md | 3 ++-
...
You’ll see in this example, Git did a “fast-forward” merge. This means that Git was able to move the ui branch forward to the same commit that login-page was already pointing to.
This was possible because no new commits were created on ui since we created login-page.
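If you want to be sure a merge is a fast-forward, git merge --ff-only refuses to do anything else. The sketch below rebuilds a tiny version of the ui/login-page situation in a throwaway repo (file contents are placeholders) and checks that no merge commit was created.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/ui     # start directly on a branch named ui
git config user.name "Example"          # placeholder identity
git config user.email "example@example.com"

echo "# project" > README.md
git add README.md
git commit -q -m "add bootstrap CSS"

git switch -q -c login-page
echo "login" >> README.md
git commit -q -am "add login page"

git switch -q ui
# --ff-only: move the label forward, or fail instead of creating a merge commit.
git merge -q --ff-only login-page

ui_tip="$(git rev-parse ui)"
login_tip="$(git rev-parse login-page)"
merge_commits="$(git rev-list --merges --count ui)"
echo "ui and login-page both point at $ui_tip; merge commits: $merge_commits"
```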
Our updated commit graph:
graph BT;
A(initial commit : main)
A --> B(wireframe UI);
B --> C(add bootstrap CSS);
C --> D(add profile page: profile-page);
C --> E(add login page);
E --> F(fix login page bug: login-page, ui)
(The UI label has moved forward to point to the same commit as login-page.)
Deleting Branches
At this point, we’d likely delete the login-page branch, since it’s no longer needed.
git branch -d login-page
All that this command does is delete the label; the underlying commits are not deleted.
If you try to delete a branch that isn’t yet merged, Git will warn you and prevent you from doing this. If you want to do it anyway, you can use git branch -D.
(Deleting a branch with unmerged commits makes those commits harder to find, but doesn’t remove them right away. They stay in the repository until Git’s garbage collection eventually prunes unreachable commits.)
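A quick sketch of both flags, using a hypothetical experiment branch in a throwaway repo. It records the commit ID before deleting the label and then shows the commit can still be read by ID afterwards.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"          # placeholder identity
git config user.email "example@example.com"
echo one > notes.txt
git add notes.txt
git commit -q -m "first commit"

git switch -q -c experiment             # hypothetical branch name
echo two >> notes.txt
git commit -q -am "unmerged work"
tip="$(git rev-parse HEAD)"
git switch -q main

# -d refuses because experiment has a commit that main does not.
if git branch -d experiment 2>/dev/null; then refused=no; else refused=yes; fi
echo "git branch -d refused: $refused"

# -D deletes the label anyway.
git branch -q -D experiment

# The label is gone, but the commit object is still readable by its ID.
kind="$(git cat-file -t "$tip")"
echo "$tip is still a $kind"
```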
Clean Merges
Let’s continue, and say that it is now time to merge in the profile page.
git switch ui
git merge profile-page
Let’s say profile-page only touched the profile.html file, and login-page only touched login.html. In this case, Git will be able to automatically merge the two branches together.
Auto-merging profile.html
Merge made by the 'recursive' strategy.
profile.html | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Git will automatically create a new commit with two parents, one for each branch.
graph BT;
A(initial commit : main)
A --> B(wireframe UI);
B --> C(add bootstrap CSS);
C --> D(add profile page: profile-page);
C --> E(add login page);
E --> F(fix login page bug)
F --> G(merge commit: ui)
D --> G
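To see the two parents for yourself, here is a sketch that builds the same shape in a throwaway repo (file contents are placeholders) and counts the parent lines on the resulting merge commit.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/ui
git config user.name "Example"          # placeholder identity
git config user.email "example@example.com"
echo base > index.html
git add index.html
git commit -q -m "add bootstrap CSS"

git switch -q -c login-page
echo login > login.html
git add login.html
git commit -q -m "add login page"

git switch -q ui
git switch -q -c profile-page
echo profile > profile.html
git add profile.html
git commit -q -m "add profile page"

git switch -q ui
git merge -q login-page                 # fast-forward, no new commit
git merge -q --no-edit profile-page     # histories diverged: creates a merge commit

parents="$(git cat-file -p HEAD | grep -c '^parent ')"
echo "merge commit has $parents parents"
```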
Merge Conflicts
But things aren’t always so clean, of course. Maybe both branches also modified the same part of a base_template.html file. In this case, Git will be unable to automatically merge the two branches together.
Auto-merging base_template.html
CONFLICT (content): Merge conflict in base_template.html
Automatic merge failed; fix conflicts and then commit the result.
At this point, your repository will be in a “merge conflict” state. Git will have modified the file to show you the conflicts. In this case, each branch added a different CSS file to the HTML:
<title>My Website</title>
<head>
<<<<<<< HEAD
<link rel="stylesheet" href="css/login.css">
=======
<link rel="stylesheet" href="css/profile.css">
>>>>>>> profile-page
</head>
<body>
The <<<<<<< HEAD and >>>>>>> profile-page lines mark the two different versions of the file, separated by =======.
The portion between <<<<<<< HEAD and ======= is the version of the file that was on the current branch, in this case ui.
The portion between ======= and >>>>>>> profile-page is the version of the file that was on the branch we’re merging in, in this case profile-page.
We probably want both of these lines, so we’ll edit the file to look like this:
<title>My Website</title>
<head>
<link rel="stylesheet" href="css/login.css">
<link rel="stylesheet" href="css/profile.css">
</head>
<body>
When we’ve made these changes, we add and commit our changes just like we usually would. The commit that we create from this state will have two parents, just like we saw above.
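The whole cycle (conflict, edit, add, commit) can be sketched end to end. This version uses a cut-down base_template.html in a throwaway repo, and "resolves" the conflict by simply overwriting the file with both lines, which stands in for the manual edit.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/ui
git config user.name "Example"          # placeholder identity
git config user.email "example@example.com"
login='<link rel="stylesheet" href="css/login.css">'
profile='<link rel="stylesheet" href="css/profile.css">'

printf '%s\n' '<head>' '</head>' > base_template.html
git add base_template.html
git commit -q -m "base template"

git switch -q -c login-page
printf '%s\n' '<head>' "$login" '</head>' > base_template.html
git commit -q -am "add login css"

git switch -q ui
git switch -q -c profile-page
printf '%s\n' '<head>' "$profile" '</head>' > base_template.html
git commit -q -am "add profile css"

git switch -q ui
git merge -q login-page                 # fast-forward
# git merge exits non-zero when it stops on a conflict.
if git merge --no-edit profile-page >/dev/null 2>&1; then conflicted=no; else conflicted=yes; fi
unmerged="$(git diff --name-only --diff-filter=U)"
echo "conflicted: $conflicted, unmerged files: $unmerged"

# Resolve by keeping both lines, then add and commit as usual.
printf '%s\n' '<head>' "$login" "$profile" '</head>' > base_template.html
git add base_template.html
git commit -q --no-edit                 # reuses the prepared merge message
parents="$(git cat-file -p HEAD | grep -c '^parent ')"
echo "resolved merge commit has $parents parents"
```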
Aborting a Merge
Sometimes you attempt a merge and discover the conflict will be hard to resolve.
In this case, you can abort the merge with git merge --abort.
This will rewind your repository to the state it was in before you tried to merge, so you can consider other approaches.
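A short sketch of backing out, assuming a throwaway repo where a hypothetical settings.txt was changed differently on main and on a feature branch:

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"          # placeholder identity
git config user.email "example@example.com"
echo "color = blue" > settings.txt
git add settings.txt
git commit -q -m "initial settings"

git switch -q -c feature
echo "color = green" > settings.txt
git commit -q -am "go green"

git switch -q main
echo "color = red" > settings.txt
git commit -q -am "go red"

before="$(git rev-parse HEAD)"
# Both branches changed the same line, so this stops with a conflict.
git merge --no-edit feature >/dev/null 2>&1 || echo "merge stopped with conflicts"

git merge --abort
after="$(git rev-parse HEAD)"
leftover="$(git status --porcelain)"    # empty when the working tree is clean
echo "HEAD before: $before, after abort: $after"
```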
Merging and Testing
Of course, this is a trivial example, and in a real merge conflict it can be necessary to figure out how the changed lines should be combined.
If you’re using VS Code or another editor with Git integration, you can use the editor to resolve the conflict. Otherwise, you’ll need to edit the file manually.
Also, note that merge conflicts only occur when the same section of a file was edited in both branches.
If the edits are in completely different parts of the file, Git will merge them automatically by default. That doesn’t mean that the merged code works: a change to a function in a different file (or another part of the same file) may change how the code behaves.
This is another reason that tests are so important, as running the tests after a merge can provide some peace of mind that the code still works as expected if your test suite is comprehensive.
Remote Branches
So far, we’ve been working with branches that only exist on our local machine. To share branches with other developers, we need to push them to a remote repository.
Pushing
To work with remote branches, you’ll need a remote set up, which we saw in Part 1. (If you created or cloned the repo from GitHub, a remote already exists.)
To push a branch to GitHub:
git push origin ui # push the ui branch to the origin remote
If you’d like to be able to just type git push to push the current branch, you can set up a default remote branch:
git push -u origin ui # push the ui branch to the origin remote, and set it as the default
From then on, you can just type git push to push the ui branch to the remote.
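You can try this without GitHub by using a local bare repository as a stand-in for the remote. That stand-in is my assumption for the sketch, not how the workshop repo is set up; the paths come from mktemp.

```shell
set -eu
work="$(mktemp -d)"
git init -q --bare "$work/origin.git"   # local stand-in for a GitHub remote
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/ui
git config user.name "Example"          # placeholder identity
git config user.email "example@example.com"
echo one > notes.txt
git add notes.txt
git commit -q -m "first commit"
git remote add origin "$work/origin.git"

# -u records origin/ui as the upstream of the local ui branch.
git push -q -u origin ui
upstream="$(git rev-parse --abbrev-ref 'ui@{upstream}')"
echo "upstream of ui: $upstream"

# With an upstream set, a bare "git push" knows where to go.
echo two >> notes.txt
git commit -q -am "second commit"
git push -q
pushed="$(git rev-parse origin/ui)"
local_tip="$(git rev-parse ui)"
```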
Fetch & Pull
If you want to get a branch that exists on the remote but not locally (e.g. to check out a teammate’s work), you can use git fetch:
git fetch origin login-page # fetch the login-page branch from the origin remote
This will create a remote-tracking branch called origin/login-page. Running git switch login-page then creates a local branch from it that you can work with as usual.
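Here is a sketch of the fetch side with two working copies sharing a local bare repository. The bare repo stands in for GitHub, and the directory names a and b are made up: a plays the teammate, b plays you.

```shell
set -eu
work="$(mktemp -d)"
remote="$work/origin.git"
git init -q --bare "$remote"            # local stand-in for a GitHub remote
git --git-dir="$remote" symbolic-ref HEAD refs/heads/main

# Teammate's copy: pushes main first.
git init -q "$work/a"
cd "$work/a"
git symbolic-ref HEAD refs/heads/main
git config user.name "Teammate"         # placeholder identity
git config user.email "teammate@example.com"
echo one > notes.txt
git add notes.txt
git commit -q -m "first commit"
git remote add origin "$remote"
git push -q origin main

# Your copy: cloned before login-page exists.
git clone -q "$remote" "$work/b"

# Teammate creates and pushes login-page.
git switch -q -c login-page
echo login >> notes.txt
git commit -q -am "add login page"
git push -q origin login-page
teammate_tip="$(git rev-parse HEAD)"

# You fetch it, then switch to a local branch based on origin/login-page.
cd "$work/b"
git fetch -q origin login-page
remote_tip="$(git rev-parse origin/login-page)"
git switch -q login-page
local_tip="$(git rev-parse HEAD)"
echo "origin/login-page: $remote_tip, local login-page: $local_tip"
```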
If your intent is to merge all of the changes from the remote branch into your current branch, you can use git pull:
git pull origin login-page # fetch the login-page branch from the origin remote, and merge it into the current branch
Deleting Remote Branches
If you want to delete a remote branch, you can use git push with the --delete flag:
git push origin --delete login-page # delete the login-page branch from the origin remote
(You can also do this from GitHub’s web interface, which is handy if you’re using Pull Requests.)
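A minimal sketch of the delete, again assuming a local bare repository as a stand-in for the remote. git ls-remote lists what the remote has, so it is a handy way to confirm the branch is gone there while your local branch is untouched.

```shell
set -eu
work="$(mktemp -d)"
git init -q --bare "$work/origin.git"   # local stand-in for a GitHub remote
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"          # placeholder identity
git config user.email "example@example.com"
echo one > notes.txt
git add notes.txt
git commit -q -m "first commit"
git remote add origin "$work/origin.git"
git push -q origin main
git branch login-page
git push -q origin login-page

# Remove the branch from the remote only.
git push -q origin --delete login-page

remaining="$(git ls-remote --heads origin login-page)"   # empty once deleted
local_still="$(git rev-parse --verify -q refs/heads/login-page)"
echo "remote login-page: ${remaining:-gone}; local login-page: $local_still"
```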
Conclusion / Misc.
Git Book Chapter 3 https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Handy GitHub Tricks
- Mention an issue in a commit message and it’ll be linked. (e.g. “Fixes #123” will close that issue when the commit is merged to main.)
- GitHub CLI - Command line interface for GitHub, lets you do things like create and review pull requests from the command line.
Tagging
Tags are a way to mark a specific commit as important. A common use is to tag releases. (e.g. v0.6.2 or 2023-04-05)
Tags are distinct from branches in that they do not move when new commits are added, but are similar in that they are just a pointer to a commit.
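A short sketch of that difference in a throwaway repo. v0.6.2 is the example name from above; v0.6.3 and its message are made up for the illustration. git tag with just a name creates a lightweight tag, and -a with -m creates an annotated one with its own message.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"          # placeholder identity
git config user.email "example@example.com"
echo one > notes.txt
git add notes.txt
git commit -q -m "first commit"

git tag v0.6.2                          # lightweight tag on the current commit
tagged="$(git rev-parse 'v0.6.2^{commit}')"

echo two >> notes.txt
git commit -q -am "second commit"
git tag -a v0.6.3 -m "release v0.6.3"   # annotated tag (hypothetical release)

# main moved forward with the new commit; v0.6.2 did not.
still="$(git rev-parse 'v0.6.2^{commit}')"
head="$(git rev-parse HEAD)"
newer="$(git rev-parse 'v0.6.3^{commit}')"
echo "v0.6.2 -> $still, main -> $head"
```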