Git from the Inside Out

Introduction

Git’s entire data model fits in one sentence: an append-only, content-addressable object store of blobs, trees, and commits, with mutable pointers (branches, tags, HEAD) layered on top. That is it. Every command - commit, merge, rebase, reset - is either creating objects, moving pointers, or both.

Most Git tutorials skip this and teach a bag of CLI recipes. The result: developers memorize commands without understanding what they do, and panic when something goes wrong. This post takes the opposite approach. It starts from the internal data model (the plumbing) and builds up to the daily commands (the porcelain). Once you see that a branch is a 41-byte file containing a commit hash, operations like merge and rebase become intuitive pointer manipulations rather than incantations.

By the end of this post, you will be able to:

Explain the three object types (blob, tree, commit) and how they form a content-addressable filesystem
Trace what git add and git commit do at the object level, step by step
Understand branching, merging, and rebasing as pointer operations on a DAG
Use reset, rebase -i, bisect, and cherry-pick with confidence - knowing exactly what each one moves
Follow a professional feature-branch workflow from creation to fast-forward merge

We start with the object model, then build up through references, branches, remotes, history inspection, history rewriting, and the professional workflow that ties it all together. The diagram below is the complete mental model - every section that follows is an operation on this structure.

graph LR
    HEAD["HEAD"] --> BR["branch<br/><small>(refs/heads/main)</small>"]
    BR --> C2["commit"] --> C1["commit"]
    C2 --> T["tree<br/><small>(root dir)</small>"]
    T --> B1["blob<br/><small>(file)</small>"]
    T --> T2["tree<br/><small>(subdir)</small>"]
    T2 --> B2["blob"]
    TAG["tag v1.0<br/><small>(refs/tags/)</small>"] --> C1

    style HEAD fill:#d9534f,stroke:#333,color:#fff
    style BR fill:#f9d71c,stroke:#333,color:#000
    style TAG fill:#f9d71c,stroke:#333,color:#000
    style C2 fill:#7eb8da,stroke:#333,color:#000
    style C1 fill:#7eb8da,stroke:#333,color:#000
    style T fill:#90c695,stroke:#333,color:#000
    style T2 fill:#90c695,stroke:#333,color:#000
    style B1 fill:#e8a87c,stroke:#333,color:#000
    style B2 fill:#e8a87c,stroke:#333,color:#000

Overview: The complete Git mental model. Objects (blobs, trees, commits) form an immutable DAG. References (branches, tags, HEAD) are mutable pointers into it. Every command you learn in this post creates objects, moves references, or both.

1. The Object Model: How Git Actually Stores Data

Every version control system must answer a fundamental question: how do you store the state of a project at a point in time? Some systems store deltas - the difference between consecutive versions. Git takes a radically different approach: it stores snapshots. Every commit captures the complete state of every file in the project. Files that have not changed are not duplicated; Git simply reuses the existing object. This snapshot model is what makes branching and merging fast: a branch is just a pointer to an existing snapshot (no need to replay deltas to reconstruct state), and a three-way merge can directly compare three complete trees rather than reconstructing them from a chain of differences.

Everything Git stores lives inside the .git directory at the root of your project. Delete that directory, and the entire history is gone. Keep it, and you have a complete, self-contained database of every version of every file ever committed.

1.1 Content-Addressable Storage

The key design decision that makes Git work is content-addressable storage. Every object Git stores - every file, every directory listing, every commit - is identified by the SHA-1 hash of its contents. The hash is a 40-character hexadecimal string (160 bits), and it serves as both the object’s name and its address in the database.

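# Sketch of Git's storage model (runnable Python; an in-memory dict stands in for .git/objects/)
import hashlib
import zlib

objects = {}  # object ID (hex SHA-1) -> zlib-compressed bytes

def store(obj: bytes) -> str:
    oid = hashlib.sha1(obj).hexdigest()  # the content determines the address
    objects[oid] = zlib.compress(obj)
    return oid

def load(oid: str) -> bytes:
    return zlib.decompress(objects[oid])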

Git uses the first two characters of the hash as a directory name and the remaining 38 as a filename, all stored under .git/objects/. For example, a hash of ee5941ab3c... is stored at .git/objects/ee/5941ab3c....

This design gives Git three properties for free:

Deduplication. Two files with identical contents produce the same hash, so Git stores only one copy. Rename a file? The blob stays the same; only the parent tree changes.
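Integrity. Any corruption - even a single flipped bit - changes the hash, so the stored content no longer matches its name and Git can detect the damage whenever it verifies the object (for example, during git fsck).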
Immutability. Objects are addressed by their content. You cannot change an object without changing its address. This append-only property is why committed data is almost always recoverable.

Git Is a Content-Addressable Filesystem

At its core, Git is not a “version control system” - it is a content-addressable filesystem with a version control UI built on top. Understanding this is the single most important mental shift for mastering Git. Every command you run - commit, merge, rebase, reset - is ultimately an operation on this object store.

1.2 Blobs: File Contents Without Names

A blob (binary large object) stores the raw contents of a single file - nothing more. No filename, no permissions, no metadata. Just bytes.

type blob = array<byte>

When Git computes the hash of a blob, it prepends a header: the string "blob", a space, the content length in bytes, and a null byte. The SHA-1 of this combined string becomes the blob’s address:

SHA-1("blob 60\0" + file_contents) → ee5941ab3c...

Because blobs store only contents and not names, two files with identical contents - even in different directories, even with different names - map to the same blob. Git stores it exactly once.
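The header-plus-content hashing described above can be reproduced outside Git. The following is a minimal sketch using only Python's standard library; the function names git_blob_id and loose_object_path are illustrative, not part of Git.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the object ID Git assigns to a blob holding these bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def loose_object_path(object_id: str) -> str:
    """Map an object ID to its loose-object path: 2-char directory, 38-char file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"

empty = git_blob_id(b"")
print(empty)                     # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(loose_object_path(empty))  # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391

# The file name never enters the hash, so identical bytes always share one blob.
print(git_blob_id(b"Hello, Git\n") == git_blob_id(b"Hello, Git\n"))  # True
```

The ID printed for the empty blob should match what git hash-object reports for an empty file, which is a quick way to check the sketch against a real repository.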

# Inspect a blob - git cat-file -p shows the raw content
# First, find a blob hash from the current tree
!git cat-file -p HEAD^{tree} | tail -2

100755 blob 1b4e54a00da0021ed03ce4facf67cea88d230300    til.qmd
040000 tree 02a16aeaeaa4ae9113498e37832f78ece2bb27f4    til

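# Decompress the loose object by hand (the path is relative to .git/objects/).
# The NUL byte that ends the header is invisible, so "blob 528" runs straight into the content.
!zlib-flate -uncompress < 1b/4e54a00da0021ed03ce4facf67cea88d230300 | head -2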

blob 528---
title: Today Imad Learned

1.3 Trees: Directory Snapshots

A tree object represents a directory. It contains a list of entries, where each entry maps a name (filename or subdirectory name) to either a blob or another tree, along with a file mode (permissions).

// A directory maps names to blobs or subtrees
type tree = list<(mode, type, hash, name)>

Each line in a tree object looks like:

100644 blob a1b2c3d4...  README.md
040000 tree e5f6a7b8...  src

Trees have several important properties:

Trees do not store their own name. A tree’s name is assigned by its parent tree. The root tree - the top-level directory of the project - has no name at all, which is why renaming your repository’s local directory has zero effect on Git.
Empty directories are invisible. Git tracks files, not directories: the index records only file paths, so a directory with nothing in it never appears in any tree. The common workaround is placing a .gitkeep file inside it.
Renaming is cheap. If you rename a subdirectory, only the parent tree changes. The subtree object and everything below it remain untouched - they have the same hashes and the same addresses.
Trees are themselves hashed. The hash of a tree is computed over its list of entries. Change any entry (add a file, rename something, update a blob hash), and the tree gets a new hash - which propagates up to every parent tree, all the way to the root.
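To see the hash propagation concretely, here is a rough sketch that serializes tree entries in Git's binary layout (mode, space, name, NUL, 20 raw hash bytes) and hashes them. The helper names are made up for illustration, and the entry ordering is simplified: real Git sorts a directory as if its name ended in "/". Inside a tree object the directory mode is written as 40000; git cat-file -p pads it to 040000 for display.

```python
import hashlib

def git_object_id(kind: bytes, body: bytes) -> str:
    """Hash '<kind> <size>\\0<body>' the way Git names any object."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_id(entries) -> str:
    """entries: iterable of (mode, name, object_id_hex) tuples."""
    body = b""
    # Simplified ordering: plain byte order of the names.
    for mode, name, oid in sorted(entries, key=lambda e: e[1].encode()):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(oid)
    return git_object_id(b"tree", body)

main_py = git_object_id(b"blob", b"print('hi')\n")
src = tree_id([("100644", "main.py", main_py)])          # subtree, never modified below
readme_v1 = git_object_id(b"blob", b"v1\n")
readme_v2 = git_object_id(b"blob", b"v2\n")

root_v1 = tree_id([("100644", "README.md", readme_v1), ("40000", "src", src)])
root_v2 = tree_id([("100644", "README.md", readme_v2), ("40000", "src", src)])

print(root_v1 != root_v2)  # True: editing README.md yields a new root tree
print(tree_id([]))         # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (Git's empty tree)
```

Both root trees point at the same src subtree ID, which is the structural sharing that keeps snapshots cheap.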

# Inspect the root tree of the latest commit
!git cat-file -p HEAD^{tree} | head -3

100755 blob bb6df347bcb140d911763ede8d70c72981c6b760    .gitignore
100755 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    .nojekyll
100755 blob 14d0bc4d92bee96cd7b778e97222ed8f18b5b8c5    404.md

1.4 Commits: Snapshots with Context

A commit object ties everything together. It points to a root tree (the snapshot), records who made the change, when, and why, and links to its parent commit(s) to form a history.

type commit = struct {
    tree:    hash           // pointer to the root tree (the snapshot)
    parent:  list<hash>     // zero parents (initial), one (normal), or two+ (merge)
    author:  string         // who wrote the change
    committer: string       // who applied the change
    message: string         // why the change was made (commit message)
}

A commit does not store diffs. When you run git diff between two commits, Git compares their root trees on the fly and computes the difference on demand. This is a key design choice: storing snapshots makes branching and merging fast at the cost of slightly more storage (mitigated by deduplication and packfiles).

Because unchanged files keep the same blob hash, and unchanged directories keep the same tree hash, most of a commit’s tree structure is shared with its parent. A commit that changes one file out of a thousand creates exactly one new blob, one new tree for its parent directory, and new trees up to the root - everything else is reused via identical hashes.
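A commit is hashed the same way as blobs and trees, over a small text body. The sketch below assembles that body and hashes it; the identity string and timestamp are hypothetical placeholders, and signatures and extra headers are left out.

```python
import hashlib

def commit_id(tree, parents, author, committer, message):
    """Build a commit body (tree, parents, identities, blank line, message) and hash it."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# Hypothetical identity: "Name <email> unix-timestamp timezone".
who = "Example Dev <dev@example.com> 1700000000 +0000"
empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

root = commit_id(empty_tree, [], who, who, "Initial commit")
child = commit_id(empty_tree, [root], who, who, "Initial commit")

# Same snapshot, same author, same message: only the parent differs.
print(root != child)  # True
```

Because the parent hash is part of the hashed body, a commit's ID depends on its entire ancestry, not just on its snapshot.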

# Inspect a commit object - shows tree, parent, author, message
!git cat-file -p HEAD

tree 94ab6b930e1b2be4b0e68d6eb14f81671498f14d
parent 84381f8ce551c786c0a2b1564cf9a3d8e48319a4
author ImadDabbura <imad.dabbura@hotmail.com> 1775674279 -0500
committer ImadDabbura <imad.dabbura@hotmail.com> 1775674279 -0500

feat: Add few more tips/tricks

1.5 The Three States: Modified, Staged, Committed

Git manages files through three distinct states, mediated by three areas:

Area	Location	Purpose
Working directory	Your project files on disk	Where you edit files
Staging area (index)	`.git/index`	A draft of the next commit snapshot
Repository (object store)	`.git/objects/`	The permanent, immutable database

The lifecycle of a change:

Modified. You edit a file in your working directory. Git knows it has changed (by comparing the file's cached stat data, and when needed its content hash, against the index) but has not recorded the change anywhere.
Staged. You run git add file. Git computes the SHA-1 of the file, compresses and stores the blob in .git/objects/, and records the hash in the index file. The index is now a draft of what the next commit’s tree will look like.
Committed. You run git commit. Git reads the index, builds tree objects for every directory, creates a commit object pointing to the root tree (with parent, author, and message), and updates the current branch to point at the new commit.

Files also fall into two tracking categories:

Tracked: files that exist in the last commit or in the staging area. They can be modified, staged, or committed.
Untracked: files Git does not know about. They appear in git status but are not included in commits until explicitly added.

1.6 Putting It All Together: A Worked Example

Let’s trace exactly what happens when you create a file and commit it.

Step 1 - Create a file:

echo "Hello, Git" > greeting.txt

The file exists only in your working directory. Git status shows it as untracked.

Step 2 - Stage the file (git add greeting.txt):

Git does three things:

Computes the SHA-1: SHA-1("blob 11\0Hello, Git\n") → ab3f...
Compresses the header and content with zlib and stores the result at .git/objects/ab/3f...
Adds an entry to .git/index: 100644 blob ab3f... greeting.txt

Step 3 - Commit (git commit -m "Add greeting"):

Git does four things:

Reads the index and creates a tree object listing greeting.txt → ab3f... → tree hash d8e7...
Creates a commit object: tree d8e7..., parent <previous HEAD>, author ..., message "Add greeting" → commit hash f1a2...
Stores both the tree and commit as compressed objects in .git/objects/
Updates .git/refs/heads/main to contain f1a2...

The result is a chain: branch → commit → tree → blob(s).

graph LR
    B["main<br/><small>refs/heads/main</small>"] --> C["commit f1a2...<br/><small>tree: d8e7...</small><br/><small>parent: 9c3b...</small><br/><small>msg: Add greeting</small>"]
    C --> T["tree d8e7...<br/><small>greeting.txt → ab3f...</small>"]
    T --> BL["blob ab3f...<br/><small>Hello, Git</small>"]
    C --> PC["commit 9c3b...<br/><small>(parent commit)</small>"]

    style B fill:#f9d71c,stroke:#333,color:#000
    style C fill:#7eb8da,stroke:#333,color:#000
    style T fill:#90c695,stroke:#333,color:#000
    style BL fill:#e8a87c,stroke:#333,color:#000
    style PC fill:#7eb8da,stroke:#333,color:#000

Figure 1: The Git object model. A branch points to a commit, which points to a root tree, which points to blobs (files) and subtrees (directories). Each object is identified by the SHA-1 hash of its contents.
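The whole add/commit flow can be modelled in a few dozen lines. This is a toy in-memory sketch, assuming a flat project with no subdirectories and a simplified text format for trees and commits; MiniRepo and its methods are invented for illustration and do not mirror Git's on-disk formats.

```python
import hashlib
import zlib

class MiniRepo:
    """Toy model of the object store, the index, and one branch."""

    def __init__(self):
        self.objects = {}               # object ID -> compressed bytes (.git/objects/)
        self.index = {}                 # path -> blob ID (.git/index)
        self.refs = {}                  # ref name -> commit ID (.git/refs/...)
        self.head = "refs/heads/main"   # symbolic HEAD

    def _store(self, kind, body):
        data = kind.encode() + b" " + str(len(body)).encode() + b"\0" + body
        oid = hashlib.sha1(data).hexdigest()
        self.objects[oid] = zlib.compress(data)
        return oid

    def add(self, path, content):
        # Staging writes the blob right away and records its ID in the index.
        self.index[path] = self._store("blob", content)

    def commit(self, message):
        # Simplified flat, textual tree built from the index.
        listing = "".join(f"100644 blob {oid}\t{path}\n"
                          for path, oid in sorted(self.index.items()))
        tree = self._store("tree", listing.encode())
        parent = self.refs.get(self.head)
        lines = [f"tree {tree}"] + ([f"parent {parent}"] if parent else []) + ["", message, ""]
        commit = self._store("commit", "\n".join(lines).encode())
        self.refs[self.head] = commit   # the only mutation: move the branch pointer
        return commit

    def read(self, oid):
        # Strip the "<kind> <size>\0" header and return the body.
        return zlib.decompress(self.objects[oid]).split(b"\0", 1)[1]

repo = MiniRepo()
repo.add("greeting.txt", b"Hello, Git\n")
first = repo.commit("Add greeting")
repo.add("notes.txt", b"Hello, Git\n")    # same bytes under a different name
second = repo.commit("Add notes")

print(len(repo.objects))                          # 5: one shared blob, two trees, two commits
print(repo.refs["refs/heads/main"] == second)     # True
```

Two files with identical contents produce a single blob, and the second commit records the first as its parent, so the object count stays at five.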

Objects, Refs, HEAD - That’s All There Is

The entire Git data model consists of three object types (blobs, trees, commits), mutable references (branches, tags, remote-tracking branches), and a single HEAD pointer. Every Git command - no matter how complex - is ultimately a combination of creating objects and moving references. Once this clicks, Git stops being mysterious.

2. References: Human-Readable Pointers

Raw SHA-1 hashes are precise but unwieldy - nobody wants to type f1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0 every time they refer to a commit. References (refs) solve this by providing human-readable names that map to commit hashes. They are stored as plain-text files under .git/refs/ (Git may later consolidate many of them into a single .git/packed-refs file, but the model is the same).

# References are simply a map from names to commit hashes
references = map<string, hash>

def update_reference(name, id):
    references[name] = id

def read_reference(name):
    return references[name]

The next three subsections cover branches, HEAD, and tags; remote-tracking branches, a fourth kind of reference, appear in section 4.

2.1 Branches (Heads)

A branch is a file in .git/refs/heads/ that contains the SHA-1 hash of a single commit - the tip of that branch. When you create a branch, Git creates a 41-byte file (40 hex chars + newline). When you make a commit on that branch, Git updates the file to point to the new commit. That is the entire implementation of branching.

# Creating a branch is literally creating a file
$ cat .git/refs/heads/main
f1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0

This is why branches in Git are so cheap: they are not copies of your codebase, not snapshots, not deltas - they are 41-byte text files. Creating a thousand branches costs roughly 41 KB of disk space.

2.2 HEAD: The Singleton Pointer

HEAD is a special reference that identifies where you currently are. Unlike branches, there is only ever one HEAD - it is a singleton. It lives at .git/HEAD and typically contains a symbolic reference to a branch:

$ cat .git/HEAD
ref: refs/heads/main

This tells Git: “the current branch is main.” When you commit, Git follows the chain: HEAD → refs/heads/main → update that file with the new commit hash.

When you check out a specific commit (not a branch), HEAD points directly at that commit hash instead of a branch name. This is a detached HEAD state - you are no longer on any branch, and new commits will not be reachable from any branch unless you explicitly create one.
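The attached and detached cases can be told apart just by reading the HEAD file. Below is a small sketch that resolves HEAD against a throwaway directory laid out like .git/; resolve_head is an illustrative helper, and it assumes loose ref files only (packed refs are not handled).

```python
import tempfile
from pathlib import Path

def resolve_head(git_dir):
    """Return (ref_name_or_None, commit_id) by reading HEAD and following one symbolic ref."""
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]
        # Assumption: the branch exists as a loose file under refs/.
        return ref, (git_dir / ref).read_text().strip()
    return None, head   # detached: HEAD holds a commit hash directly

# Build a fake .git layout so the sketch runs without a real repository.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "refs" / "heads").mkdir(parents=True)
    commit = "f1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0"
    (git_dir / "refs" / "heads" / "main").write_text(commit + "\n")

    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    attached = resolve_head(git_dir)

    (git_dir / "HEAD").write_text(commit + "\n")
    detached = resolve_head(git_dir)

print(attached)   # ('refs/heads/main', 'f1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0')
print(detached)   # (None, 'f1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0')
```

In the detached case there is no branch file for a new commit to update, which is why such commits end up reachable only through HEAD itself.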

Detached HEAD

In detached HEAD state, any commits you make will become unreachable (and eventually garbage-collected) once you switch to a branch - unless you create a branch pointing to them first. Git warns you when you enter this state; to keep your work, run git switch -c new-branch-name before switching away.

2.3 Tags: Immutable Bookmarks

Tags are references stored in .git/refs/tags/. Like branches, they point to commits - but unlike branches, they are meant to be immutable. A tag marks a fixed point in history (typically a release: v1.0, v2.3.1).

Git supports two kinds of tags:

Lightweight tags are simple files containing a commit hash - identical in structure to a branch, just never updated.
Annotated tags are full Git objects stored in .git/objects/. They contain the tagger’s name and email, a timestamp, a message, and optionally a GPG signature. The tag file in refs/tags/ then points to this tag object, which in turn points to the commit. Annotated tags are recommended because they preserve who tagged, when, and why.

# Lightweight: just a pointer
$ git tag v0.1

# Annotated: a full object with metadata
$ git tag -a v1.0 -m "First stable release"

# Tag a historical commit
$ git tag -a v0.5 ca21323

Tags Are Not Pushed by Default

Running git push does not transfer tags to the remote. You must explicitly push them: git push origin v1.0 (one tag) or git push origin --tags (all tags). This is a deliberate safety measure - tags are meant to be curated, not automatically propagated.

2.4 The Ref Hierarchy

Apart from HEAD, which sits at the top of .git/, references live under .git/refs/ in a clean hierarchy:

.git/refs/
├── heads/          # Local branches
│   ├── main
│   └── feature-x
├── tags/           # Tags
│   ├── v1.0
│   └── v2.0
└── remotes/        # Remote-tracking branches
    └── origin/
        ├── main
        └── feature-x

The following table summarizes the four reference types:

Reference	Location	Mutable?	Points to	Purpose
Branch	`refs/heads/`	Yes - updated on every commit	Latest commit on the branch	Track the moving tip of a line of development
Tag	`refs/tags/`	No - fixed once created	A specific commit (or tag object)	Mark releases and milestones
HEAD	`.git/HEAD`	Yes - changes on checkout	A branch (symbolic) or commit (detached)	Identify where you are right now
Remote branch	`refs/remotes/`	Yes - updated on fetch/pull	Last known commit on the remote	Bookmark the remote’s state

With the object model and reference system in place, we can now see how Git uses them for its most common operation: branching and merging.

3. Branching and Merging

Branching is where Git’s lightweight object model pays off. Because a branch is just a 41-byte file, creating one is instantaneous. Because commits share unchanged objects, branches consume almost no additional storage. This makes it practical to create a branch for every feature, every bug fix, every experiment - and to merge or discard them freely.

3.1 Creating and Switching Branches

Creating a branch copies the current commit hash into a new file:

# Create a new branch (does not switch to it)
$ git branch feature-login

# Create and switch in one step (modern syntax)
$ git switch -c feature-login

# Legacy equivalent (still works, but overloaded)
$ git checkout -b feature-login

When you switch branches, Git does two things: (1) updates HEAD to point at the new branch, and (2) rewrites your working directory to match that branch’s latest commit. Files are added, removed, and modified automatically.

git switch vs git checkout

The switch and restore commands were introduced in Git 2.23 (2019) to replace the overloaded checkout command, which handled both branch switching and file restoration. The modern equivalents: git switch for branches, git restore for files. Both checkout forms still work, but switch/restore are clearer and safer - git switch only ever changes branches, so a mistyped argument cannot silently overwrite a modified file the way git checkout <path> can.

3.2 Fast-Forward Merges

The simplest merge scenario: you created a branch, made commits, and nobody else committed to the base branch in the meantime. The base branch’s tip is a direct ancestor of your branch’s tip - the history is linear.

In this case, Git does a fast-forward merge: it simply moves the base branch pointer forward to your branch’s tip. No new commit is created, no objects are created, no merge logic runs. It is literally updating a 41-byte file.

$ git switch main
$ git merge feature-login --ff-only

graph LR
    A["A"] --> B["B"] --> C["C<br/><small>main (before)</small>"] --> D["D"] --> E["E<br/><small>feature-login</small><br/><small>main (after)</small>"]

    style A fill:#7eb8da,stroke:#333,color:#000
    style B fill:#7eb8da,stroke:#333,color:#000
    style C fill:#7eb8da,stroke:#333,color:#000
    style D fill:#90c695,stroke:#333,color:#000
    style E fill:#90c695,stroke:#333,color:#000

Figure 2: Fast-forward merge. Main’s pointer simply advances to the feature branch’s tip. No merge commit is created.

The --ff-only flag tells Git: “only merge if a fast-forward is possible; otherwise, abort.” This is a good default for teams that value linear history.
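The fast-forward test is an ancestry check followed by a pointer move. Here is a minimal sketch over a dictionary that maps each commit to its parents; the commit names come from Figure 2, and the function names are illustrative.

```python
def is_ancestor(parents, ancestor, commit):
    """True if `ancestor` is reachable from `commit` by following parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return False

def merge_ff_only(parents, refs, target, source):
    """Advance refs[target] to refs[source] if that is a fast-forward; otherwise raise."""
    if not is_ancestor(parents, refs[target], refs[source]):
        raise ValueError("not a fast-forward: branches have diverged")
    refs[target] = refs[source]   # pointer move only; no objects are created

# History from Figure 2: A <- B <- C <- D <- E
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"]}
refs = {"main": "C", "feature-login": "E"}

merge_ff_only(parents, refs, "main", "feature-login")
print(refs["main"])   # E
```

If main had gained a commit of its own after C, the ancestry check would fail and the function would raise, mirroring how --ff-only aborts instead of creating a merge commit.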

3.3 Three-Way Merges

When both branches have diverged - the base branch has commits that your feature branch does not, and vice versa - Git cannot fast-forward. Instead, it performs a three-way merge using three reference points:

The common ancestor - the most recent commit reachable from both branches
The tip of the base branch (e.g., main)
The tip of the feature branch

Git compares each branch’s tip against the common ancestor to determine what changed on each side. If changes do not overlap (different files, or different regions of the same file), Git combines them automatically and creates a merge commit - a commit with two parents - that ties the histories together.

$ git switch main
$ git merge feature-login

graph LR
    A["A"] --> B["B"]
    B --> C["C"] --> D["D<br/><small>main</small>"]
    B --> E["E"] --> F["F<br/><small>feature</small>"]
    D --> M["M<br/><small>merge commit</small>"]
    F --> M

    style B fill:#f9d71c,stroke:#333,color:#000
    style D fill:#7eb8da,stroke:#333,color:#000
    style F fill:#90c695,stroke:#333,color:#000
    style M fill:#e8a87c,stroke:#333,color:#000

Figure 3: Three-way merge. Git finds the common ancestor (B), compares both tips against it, combines the changes, and creates a merge commit (M) with two parents.
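The three-way rule is easiest to see at whole-file granularity. The sketch below assumes each snapshot is a dictionary from path to blob ID and decides every path from the triple (base, ours, theirs); real Git goes one step further and merges line by line inside a file that both sides changed, which this simplification reports as a conflict.

```python
def merge_file(base, ours, theirs):
    """Three-way rule for one path; each value is a blob ID, or None if the file is absent."""
    if ours == theirs:
        return ours                  # both sides agree (or neither changed)
    if ours == base:
        return theirs                # only their side changed
    if theirs == base:
        return ours                  # only our side changed
    raise ValueError("conflict")     # both sides changed it differently

def merge_trees(base, ours, theirs):
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        try:
            result = merge_file(base.get(path), ours.get(path), theirs.get(path))
        except ValueError:
            conflicts.append(path)
            continue
        if result is not None:       # None means the file is deleted in the result
            merged[path] = result
    return merged, conflicts

base   = {"a.txt": "a1", "b.txt": "b1", "c.txt": "c1"}
ours   = {"a.txt": "a2", "b.txt": "b1", "c.txt": "c2"}
theirs = {"a.txt": "a1", "b.txt": "b2", "c.txt": "c3", "d.txt": "d1"}

merged, conflicts = merge_trees(base, ours, theirs)
print(merged)      # {'a.txt': 'a2', 'b.txt': 'b2', 'd.txt': 'd1'}
print(conflicts)   # ['c.txt']
```

The common ancestor is what makes the decision possible: without it, a.txt and b.txt would look just as ambiguous as c.txt.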

3.4 Merge Conflicts

When both branches modify the same lines of the same file, Git cannot determine which version is correct. It pauses the merge, marks the conflicting regions in the file with conflict markers (<<<<<<<, =======, >>>>>>>), and asks you to resolve them manually.

The resolution workflow:

Open the conflicting file(s) and edit the conflict markers to produce the desired result
Stage the resolved files: git add <file>
Complete the merge: git commit (this creates the merge commit)

Alternatively, abort the merge entirely: git merge --abort restores the state before the merge began.

For complex conflicts, visual merge tools like vimdiff, VS Code’s built-in merge editor, or dedicated tools like meld can help. Set your preferred tool with git config --global merge.tool <tool>, then launch it on the conflicted files with git mergetool.

Merge Strategies

Git supports several merge strategies beyond the default ort (formerly recursive). The most useful alternatives: ours (keep our version entirely, discarding their changes - useful for marking a branch as “merged” without taking its content) and octopus (for merging more than two branches at once). To resolve conflicts in favor of the other side, use the strategy option -X theirs (note: this is an option to the default ort strategy, not a standalone strategy). For example: git merge -X theirs feature-branch.

Branching and merging work locally. The next section extends these ideas to collaboration across repositories - remotes, tracking branches, and the fetch/push protocol.

4. Remote Branches and Collaboration

Git is distributed - every clone is a complete, independent repository with its own history, branches, and object store. Collaboration happens by synchronizing objects and references between repositories. The machinery for this is remotes, remote-tracking branches, and the fetch/push protocol.

4.1 The Remote Model

A remote is a named URL pointing to another Git repository. When you clone a repo, Git automatically creates a remote called origin pointing to the source URL. You can have multiple remotes - for example, origin for your fork and upstream for the original project.

# List remotes
$ git remote -v

# Add a second remote
$ git remote add upstream https://github.com/original/repo.git

Remote-tracking branches live under .git/refs/remotes/<remote>/ and act as read-only bookmarks of where each branch was on the remote the last time you communicated with it. You never update them directly with git commit - Git manages them automatically during fetch and push.

4.2 Tracking Branches

A tracking branch (or “upstream branch”) is a local branch that is linked to a remote-tracking branch. This link tells git pull where to fetch from and git push where to push to, without specifying the remote and branch name every time.

Tracking is set up automatically in several cases:

# Cloning: 'main' automatically tracks 'origin/main'
$ git clone https://github.com/user/repo.git

# Switching to a remote branch name creates a tracking branch
$ git switch feature-x  # creates local 'feature-x' tracking 'origin/feature-x'

# Explicit tracking setup
$ git branch --set-upstream-to=origin/feature-x

# Push and set upstream in one step
$ git push -u origin feature-x

You can also track branches from different remotes, or use different local and remote names:

# Track a branch from a different remote
$ git switch -c my-local-name upstream/their-branch

4.3 Cloning: What Actually Happens

When you run git clone https://github.com/user/repo.git, Git performs four steps:

Creates a directory named repo/ with a .git/ subdirectory
Downloads the entire object store - every blob, tree, and commit in the history
Creates remote-tracking branches under refs/remotes/origin/ for every branch on the remote
Creates a local branch for the remote’s default branch (usually main), sets it to track origin/main, and checks it out

The critical implication: cloning downloads all versions of every file ever committed. If someone committed a 500 MB binary three years ago and then deleted it, that blob is still in the history and gets cloned. For repositories with very long histories, git clone --depth N creates a shallow clone with only the last N commits.

Shallow Clones and CI

Shallow clones (--depth 1) are common in CI/CD pipelines where you only need the latest code to build and test. They are dramatically faster to clone but cannot perform operations that require full history (like git log across all time or git bisect to the beginning). Use git fetch --unshallow to convert a shallow clone to a full one when needed.

4.4 Pushing and Pulling

The three synchronization commands serve distinct purposes:

Command	Direction	What it does	Merges?
`git fetch`	Remote → Local	Downloads new objects and updates remote-tracking branches	No
`git pull`	Remote → Local	Runs `fetch`, then merges (or rebases) the tracking branch	Yes
`git push`	Local → Remote	Uploads new objects and updates remote branch pointers	No

Fetch is always safe - it only downloads data and updates bookmarks. It never touches your working directory or local branches.

Pull is fetch + merge (or fetch + rebase if configured). Because it merges, it can create merge commits or conflicts.

Push uploads your commits and asks the remote to update its branch pointer. If the remote has commits you do not have (someone else pushed first), the push is rejected - you must pull and integrate their changes first.

# Fetch all branches from origin
$ git fetch origin

# Pull (fetch + merge) the current tracking branch
$ git pull

# Push the current branch to its upstream
$ git push

# Push with a different remote branch name
$ git push origin local-branch:remote-branch

# Delete a remote branch
$ git push origin --delete old-branch
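The difference between fetch and push, and the reason a push gets rejected, can be sketched with two dictionaries standing in for repositories. This is a toy model: each repository is a parents map plus a refs map, object transfer is reduced to a dictionary update, and none of the names reflect Git's wire protocol.

```python
def is_ancestor(parents, ancestor, commit):
    """True if `ancestor` is reachable from `commit` by following parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return False

def fetch(local, remote, name="origin"):
    local["parents"].update(remote["parents"])             # download missing objects
    for branch, tip in remote["refs"].items():
        local["refs"][f"remotes/{name}/{branch}"] = tip    # move bookmarks only

def push(local, remote, branch):
    new_tip, old_tip = local["refs"][branch], remote["refs"].get(branch)
    if old_tip is not None and not is_ancestor(local["parents"], old_tip, new_tip):
        raise RuntimeError("rejected: remote has commits you do not have")
    remote["parents"].update(local["parents"])             # upload objects
    remote["refs"][branch] = new_tip

# Both sides started at B; a colleague pushed D while we committed C locally.
remote = {"parents": {"A": [], "B": ["A"], "D": ["B"]}, "refs": {"main": "D"}}
local = {"parents": {"A": [], "B": ["A"], "C": ["B"]},
         "refs": {"main": "C", "remotes/origin/main": "B"}}

try:
    push(local, remote, "main")
except RuntimeError as err:
    print(err)                       # rejected: remote has commits you do not have

fetch(local, remote)
print(local["refs"])                 # {'main': 'C', 'remotes/origin/main': 'D'}

# Integrate (a merge commit M with parents C and D), then push again.
local["parents"]["M"] = ["C", "D"]
local["refs"]["main"] = "M"
push(local, remote, "main")
print(remote["refs"]["main"])        # M
```

Note that fetch changed only the remote-tracking entry; the local main stayed at C until the merge was made explicitly.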

Configure fetch.prune

By default, remote-tracking branches for deleted remote branches linger forever in your local repo. Set git config --global fetch.prune true to automatically clean them up on every fetch or pull.

With local and remote operations covered, the next step is learning to navigate the history that these operations produce.

5. Inspecting and Searching History

Git’s immutable, content-addressable history is not just a safety net - it is a powerful investigative tool. Every commit, every line change, every contributor is recorded and searchable. This section covers the tools for navigating that history.

5.1 git log: Viewing History

git log is the primary tool for browsing commit history. Its power comes from filtering and formatting options:

# Visual overview: graph, one line per commit, all branches
$ git log --all --decorate --graph --oneline

# Last 5 commits
$ git log -5

# Custom format
$ git log --pretty=format:'%C(yellow)%h%C(reset) - %an [%C(green)%ar%C(reset)] %s'

# Commits affecting a specific file
$ git log --oneline -- path/to/file.py

# Search commit messages (extended regex, case-insensitive)
$ git log -E -i --grep 'fix.*login'

Two especially powerful search modes:

git log -S "term" (the “pickaxe”): finds commits that changed the number of occurrences of a literal string. If a function was added or removed, this finds the commit.
git log -G "regex": like -S but matches a regex pattern against the diff, finding commits where the patch itself matches.
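The two search modes answer different questions, which a small sketch makes visible. The functions below approximate each test for a single file's before and after text; they use Python's difflib rather than Git's own diff engine, so treat them as an illustration of the idea and not of Git's exact behavior.

```python
import difflib
import re

def pickaxe_s(before, after, term):
    """-S style: does the number of occurrences of a literal string change?"""
    return before.count(term) != after.count(term)

def pickaxe_g(before, after, pattern):
    """-G style: does any added or removed line match the regex?"""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    changed = [line[2:] for line in diff if line.startswith(("- ", "+ "))]
    return any(re.search(pattern, line) for line in changed)

before = "def login(user):\n    check(user)\n"
after = "def login(user, token):\n    check(user)\n"

print(pickaxe_s(before, after, "login"))   # False: "login" occurs once before and after
print(pickaxe_g(before, after, r"login"))  # True: a changed line mentions login
print(pickaxe_s(before, after, "token"))   # True: "token" was introduced here
```

In practice this means -S suits "when was this identifier introduced or removed?", while -G also surfaces commits that merely edited a line containing it.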

git show displays a single commit’s metadata and diff in one view - the quickest way to understand what a commit did:

# Show the latest commit's diff
$ git show

# Show a specific commit
$ git show a1b2c3d

5.2 git blame: Per-Line Attribution

git blame annotates each line of a file with the commit that last modified it, who did it, and when. It is indispensable for understanding why code looks the way it does.

# Full blame
$ git blame path/to/file.py

# Restrict to a line range
$ git blame -L 50,75 path/to/file.py

# Detect code moved or copied from other files
$ git blame -C path/to/file.py

The -C flag is particularly powerful: if a block of code was copied from another file in the same commit, blame -C traces through the copy and attributes the lines to their true origin - not the commit that moved them.

5.3 git grep: Searching Across Time

git grep searches file contents within Git’s tracked universe. Unlike standalone tools like grep or ripgrep, it can search any committed tree - not just the current working directory. This makes it invaluable for answering questions like “when did we last use this deprecated API?” or “does this pattern exist in the v1.0 release?”

# Search current working directory with context
$ git grep -n -p --break --heading "pattern"

# Search a specific commit or tag
$ git grep "deprecated_function" v1.0

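# Search the tips of all remote-tracking branches
# (for-each-ref gives clean ref names; 'git branch -r' output can include "origin/HEAD -> origin/main")
$ git grep "TODO" $(git for-each-ref --format='%(refname)' refs/remotes)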

The -p flag shows the function/method name containing each match - far more useful than bare line numbers when scanning results. Combined with --break and --heading, the output is grouped by file with clear visual separation.

5.4 Commit Ranges: .., ..., and ^

When inspecting history, you often need to specify ranges of commits - for example, “what is on my branch that is not on main?” Git provides several notations for this:

Parent references:

HEAD^ - the parent of HEAD. For merge commits with multiple parents, HEAD^1 is the first parent (the branch you merged into), HEAD^2 is the second parent (the branch you merged from).
HEAD~N - the Nth ancestor following first-parent links. HEAD~3 means “go back 3 commits along the first-parent chain.” Equivalent to HEAD^^^.

Range operators:

Syntax	Meaning	Equivalent	Use case
`A..B`	Commits reachable from B but not A	`B ^A`	“What’s new on B since it diverged from A?”
`A...B`	Commits reachable from A or B but not both	-	“What’s unique to each branch?” (symmetric difference)
`A B ^C`	Reachable from A or B but not C	-	Multi-point exclusion

The most common use: git log main..feature shows the commits on your feature branch that are not yet on main - exactly what a pull request would contain.
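Both range operators reduce to set operations on reachability. The following sketch computes them over a parents dictionary shaped like the history in Figure 3 (before the merge commit); the function names are illustrative.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit`, including the commit itself."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def two_dot(parents, a, b):
    """A..B: reachable from B but not from A."""
    return ancestors(parents, b) - ancestors(parents, a)

def three_dot(parents, a, b):
    """A...B: reachable from exactly one of A and B (symmetric difference)."""
    return ancestors(parents, a) ^ ancestors(parents, b)

# main is at D, feature is at F, and they forked at B.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["B"], "F": ["E"]}

print(sorted(two_dot(parents, "D", "F")))     # ['E', 'F']
print(sorted(three_dot(parents, "D", "F")))   # ['C', 'D', 'E', 'F']
```

Reading main..feature as "feature's ancestors minus main's ancestors" also explains why the result is empty once feature has been merged into main.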

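git diff accepts the same notation but gives it a different meaning, because a diff always compares two endpoints rather than a set of commits: git diff A..B is the same as git diff A B (tip against tip), while git diff A...B compares B against the common ancestor of A and B.

# Diff between two commits
$ git diff HEAD~3..HEAD

# Diff between branches (what would the PR contain?)
$ git diff main...feature

# Same, restricted to a specific file
$ git diff main...feature -- path/to/file.py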

5.5 git bisect: Binary Search for Bugs

When a bug exists in the current commit but not in a commit from weeks ago, somewhere in between is the commit that introduced it. Searching linearly through hundreds of commits is impractical. git bisect performs a binary search, cutting the search space in half at each step.

Manual workflow:

$ git bisect start
$ git bisect bad                  # current commit has the bug
$ git bisect good v1.0            # this tag was known to be good
# Git checks out the midpoint commit
# You test it, then tell Git:
$ git bisect good                 # or 'git bisect bad'
# Repeat until Git identifies the first bad commit
$ git bisect reset                # return to where you started

Automated workflow - the real power of bisect:

# git bisect start <bad-commit> <good-commit>
$ git bisect start HEAD v1.0
$ git bisect run pytest tests/test_login.py

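Git checks out each midpoint commit and runs your test script. Exit code 0 means “good”; exit codes 1-127 mean “bad”, except 125, which tells Git to skip a commit that cannot be tested (for example, one that does not build). Any other exit code aborts the bisect. Git narrows the range automatically until it finds the exact commit that introduced the failure. For a history of 1000 commits, this takes about 10 steps.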

Bisect Is Logarithmic

Binary search through N commits takes at most $\lceil \log_2 N \rceil$ steps. For 1024 commits, that is 10 steps. Combined with an automated test script, git bisect run can pinpoint a regression in seconds - even across months of history. It is one of Git’s most underused yet powerful features.
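The step count can be checked with a small simulation. This is a sketch that assumes a purely linear history; the function name and the predicate are hypothetical, and Git's own midpoint selection on a branching DAG is more involved.

```python
import math

def bisect_first_bad(commits, is_bad):
    """Find the first bad commit in a list ordered oldest to newest.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    Returns (first_bad_commit, number_of_tests_run).
    """
    good, bad = 0, len(commits) - 1   # indexes of the known-good / known-bad ends
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], tests

# 1024 untested commits between a good commit (0) and a bad commit (1024).
commits = list(range(1025))
first_bad, tests = bisect_first_bad(commits, lambda c: c >= 700)
print(first_bad, tests, math.ceil(math.log2(1024)))  # 700 10 10
```

The predicate plays the role of the test script passed to git bisect run: it only has to answer good or bad for one commit at a time.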

Inspecting history is read-only - it does not change anything. The next section covers the tools that do change history: amending, rebasing, reverting, and resetting.

6. Rewriting History

Git’s immutable object model means that “rewriting history” is slightly misleading - you never change existing commits. Instead, you create new commits with different content or parentage, and move branch pointers to the new chain. The old commits still exist in the object store (and are visible via reflog) until garbage collection removes them.

This distinction matters: it means history rewriting is recoverable for as long as the old commits survive - that is, until their reflog entries expire and git gc prunes the unreachable objects. The reflog is your safety net.

Commits Are Immutable

Anything committed in Git can almost always be recovered. Even commits on deleted branches or replaced with --amend remain in the object store and are visible via git reflog until they expire. What is truly at risk is uncommitted work - changes in your working directory that were never committed (staged content may survive for a while as a dangling blob, but recovering it is awkward). This is why frequent, small commits are the safest workflow.

6.1 Amending Commits

The simplest form of history rewriting: fixing the most recent commit. This is useful when you forgot to stage a file, made a typo in the message, or want to add a small correction that belongs with the last commit.

# Change the commit message
$ git commit --amend -m "Better message"

# Add forgotten files to the last commit (keep the same message)
$ git add forgotten-file.py
$ git commit --amend --no-edit

Under the hood, --amend creates an entirely new commit object (new hash) with the same parent as the original. The original commit still exists but is no longer reachable from any branch.

Aborting a Commit

If you are writing a commit message in your editor and decide to cancel, exit with a non-zero status. In Vim: :cq (quit with error). Git receives the error and aborts the commit.

6.2 Rebase

Where merging joins two histories with a merge commit, rebasing replays one history on top of another - producing a linear sequence with no merge commit. This is the mechanism that makes fast-forward merges possible after histories have diverged.

$ git switch feature-branch
$ git rebase main

Here is what Git does under the hood:

Finds the common ancestor of feature-branch and main
Collects the commits unique to feature-branch (from the ancestor to the tip)
Saves the diffs introduced by each of those commits
Moves to the tip of main
Replays each diff as a new commit (new hash, same message) on top of main
Updates feature-branch to point at the newest replayed commit

The result: your feature branch’s commits now sit directly ahead of main, as if you had started your work from main’s current tip. A fast-forward merge is now possible.

graph LR
    A["A"] --> B["B"] --> C["C<br/><small>main</small>"]
    C --> D["D'"] --> E["E'<br/><small>feature (rebased)</small>"]

    style A fill:#7eb8da,stroke:#333,color:#000
    style B fill:#7eb8da,stroke:#333,color:#000
    style C fill:#7eb8da,stroke:#333,color:#000
    style D fill:#90c695,stroke:#333,color:#000
    style E fill:#90c695,stroke:#333,color:#000

Figure 5: Rebase replays feature commits on top of main. The original commits (D, E) are replaced by new commits (D’, E’) with the same diffs but new hashes and a new base.

If a conflict arises during replay, Git pauses and lets you resolve it for that specific commit, then continue with git rebase --continue. To abort entirely: git rebase --abort.
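Why do replayed commits get new hashes even though their diffs and messages are unchanged? A commit's identity depends on its parent. The sketch below uses a deliberately simplified `commit_id` (an assumption for illustration: real commit hashes cover the tree, author, committer, and timestamps as well) to show the effect.

```python
import hashlib

def commit_id(parent, diff, message):
    """Toy commit hash: depends on the parent id, the change, and the message."""
    data = f"{parent}\n{diff}\n{message}".encode()
    return hashlib.sha1(data).hexdigest()

def build(base, changes):
    """Stack (diff, message) pairs on top of base; return the new commit ids."""
    ids, parent = [], base
    for diff, message in changes:
        parent = commit_id(parent, diff, message)
        ids.append(parent)
    return ids

# Shared history A - B; main then adds C while feature adds D and E.
a = commit_id(None, "+readme", "A")
b = commit_id(a, "+setup", "B")
c = commit_id(b, "+ci", "C")
feature_changes = [("+login form", "D"), ("+validation", "E")]

original = build(b, feature_changes)   # D and E, based on B
rebased = build(c, feature_changes)    # D' and E', replayed on top of C

print([h[:7] for h in original])
print([h[:7] for h in rebased])
```

The same two changes produce two different pairs of ids, because the first replayed commit has C rather than B as its parent and every later commit inherits that difference.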

6.3 Interactive Rebase

Interactive rebase (git rebase -i) is the most powerful tool for crafting clean history. It presents a list of commits and lets you reorder, squash, edit, split, or drop any of them.

# Rebase the last 5 commits interactively
$ git rebase -i HEAD~5

# Rebase everything on the feature branch since it diverged from main
$ git rebase -i main

Git opens an editor with one line per commit (oldest first):

pick a1b2c3d Add login form
pick e4f5a6b Fix typo in login
pick 7c8d9e0 WIP: debugging
pick 1f2a3b4 Finalize login validation

Commands you can use:

Command	Effect
`pick`	Keep the commit as-is
`reword`	Keep the commit but edit its message
`squash`	Meld into the previous commit, combining messages
`fixup`	Meld into the previous commit, discarding this message
`edit`	Pause after applying, letting you amend or split the commit
`drop`	Remove the commit entirely
(reorder lines)	Change the order commits are applied

A typical pre-merge cleanup: squash the “Fix typo” and “WIP” commits into their parent, reword the final message to be descriptive, and produce a clean, logical history.
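The effect of a todo list on the resulting history can be modeled in a few lines. This is a simplified sketch: it tracks commit messages only (the diffs are combined in the same way), handles just pick, squash, fixup, and drop, and `apply_todo` is a hypothetical helper rather than anything Git provides.

```python
def apply_todo(todo):
    """Apply a simplified rebase todo list.

    todo is a list of (command, message) pairs, oldest first.
    Returns the messages of the commits that remain.
    """
    result = []
    for command, message in todo:
        if command == "pick":
            result.append(message)
        elif command in ("squash", "fixup"):
            if not result:
                raise ValueError(f"{command} needs a previous commit")
            if command == "squash":
                # Meld into the previous commit and keep both messages.
                result[-1] += "\n\n" + message
            # fixup melds too, but discards this message.
        elif command == "drop":
            continue
        else:
            raise ValueError(f"unsupported command: {command}")
    return result

todo = [
    ("pick", "Add login form"),
    ("fixup", "Fix typo in login"),
    ("fixup", "WIP: debugging"),
    ("pick", "Finalize login validation"),
]
print(apply_todo(todo))  # ['Add login form', 'Finalize login validation']
```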

Never Rewrite Published History

Interactive rebase creates new commits with new hashes. If the original commits have been pushed to a shared branch (especially main), rewriting them forces everyone else to reconcile their divergent history - a painful and error-prone process. Only rebase commits on your own feature branch that nobody else has based work on. Once you merge to main, the history is permanent.

6.4 Cherry-Pick

git cherry-pick takes one or more commits from anywhere in the history and replays them on the current branch, creating new commits with the same diffs but different hashes and parents.

# Pick a single commit
$ git cherry-pick a1b2c3d

# Pick a range of commits
$ git cherry-pick main~3..main

Common use case: you accidentally committed to the wrong branch. Cherry-pick the commits onto the correct branch, then reset the original branch to remove them.

6.5 Squashing Commits

Beyond interactive rebase, there is a quick way to squash the last N commits:

# Squash the last 3 commits into one (keep changes staged)
$ git reset --soft HEAD~3
$ git commit -m "Implement login feature"

This works because --soft moves the branch pointer back 3 commits but leaves the index and working directory untouched. All the changes from those 3 commits are now staged, ready for a single new commit.

6.6 Reverting Published Commits

The history-rewriting tools above - amend, rebase, reset - either create replacement commits or move branch pointers backward. This is fine on local feature branches, but what if a bad commit is already on main and shared with the team? You cannot rewrite published history without forcing everyone to reconcile.

git revert solves this by creating a new commit that exactly undoes the changes from a previous commit. The original commit stays in the history - nothing is rewritten - but its effects are cancelled.

# Revert the most recent commit
$ git revert HEAD

# Revert a specific commit
$ git revert a1b2c3d

# Revert without auto-committing (stage the inverse, let me inspect first)
$ git revert --no-commit a1b2c3d

Under the hood, revert computes the inverse diff of the target commit and applies it as a new commit. If the inverse conflicts with subsequent changes, Git pauses for conflict resolution - just like a merge.

Revert vs Reset

git reset moves a branch pointer backward - it removes commits from the branch’s history. This is destructive to shared history. git revert moves forward - it adds a new commit that undoes an old one. Use reset on local/unpublished branches; use revert on shared/published branches. This is the safe complement to the “never rewrite published history” rule.

6.7 git reset: The Three-Level Undo

git reset is the Swiss army knife of undoing changes. It operates on up to three levels, controlled by its flags:

Flag	Moves branch pointer	Resets index (staging)	Resets working directory
`--soft`	Yes	No	No
`--mixed` (default)	Yes	Yes	No
`--hard`	Yes	Yes	Yes

Think of it as three successive stages:

--soft: move the branch pointer to the target commit. The index and working directory still reflect the old commit. All “removed” commits’ changes appear as staged. Use case: squash commits.
--mixed (default): move the branch pointer and reset the index to match. Changes appear as unstaged modifications. Use case: unstage files.
--hard: move everything - branch pointer, index, and working directory - to match the target commit. Uncommitted changes are permanently lost. Use case: discard everything and start clean.
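The three stages can be written down as a tiny state model. The `Repo` class and `reset` function below are illustrative assumptions: each area simply records which snapshot it currently reflects.

```python
from dataclasses import dataclass

@dataclass
class Repo:
    """Toy repository: each area holds the name of the snapshot it reflects."""
    branch: str     # commit the current branch points to
    index: str      # snapshot held in the staging area
    worktree: str   # snapshot present in the working directory

def reset(repo, target, mode="mixed"):
    """Model git reset --soft / --mixed / --hard as successive stages."""
    if mode not in ("soft", "mixed", "hard"):
        raise ValueError(f"unknown mode: {mode}")
    repo.branch = target              # every mode moves the branch pointer
    if mode in ("mixed", "hard"):
        repo.index = target           # --mixed and --hard also reset the index
    if mode == "hard":
        repo.worktree = target        # only --hard touches the working directory
    return repo

for mode in ("soft", "mixed", "hard"):
    print(mode, reset(Repo("C3", "C3", "C3"), "C1", mode))
```

After a soft reset the index still holds C3 while the branch points at C1, which is exactly why the difference shows up as staged changes.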

Path-specific reset:

# Unstage a file (shorthand for git reset HEAD -- file.py)
$ git reset file.py
# Modern equivalent:
$ git restore --staged file.py

reset --hard Destroys Uncommitted Work

git reset --hard is one of the few common Git commands that can cause permanent data loss (git clean -f, git checkout -- <file>, and git restore <file> are others). It overwrites your working directory and staging area. If you had uncommitted changes, they are gone - reflog cannot help because those changes were never committed. Use with extreme care, and commit or stash your work first.

checkout vs reset:

These two commands appear similar but differ in a critical way: git reset moves what the branch points to (the branch itself advances or retreats). git checkout (or git switch) moves HEAD - it changes which branch you are on without moving any branch pointer. Additionally, checkout is working-directory-safe (it does a trivial merge and refuses to overwrite uncommitted changes), whereas reset --hard overwrites everything.

graph LR
    subgraph soft ["--soft"]
        direction TB
        S1["Branch pointer ✓"] --> S2["Index ✗"] --> S3["Working dir ✗"]
    end
    subgraph mixed ["--mixed (default)"]
        direction TB
        M1["Branch pointer ✓"] --> M2["Index ✓"] --> M3["Working dir ✗"]
    end
    subgraph hard ["--hard ⚠️"]
        direction TB
        H1["Branch pointer ✓"] --> H2["Index ✓"] --> H3["Working dir ✓"]
    end

    soft ~~~ mixed ~~~ hard

    style S1 fill:#90c695,stroke:#333,color:#000
    style S2 fill:#ddd,stroke:#999,color:#666
    style S3 fill:#ddd,stroke:#999,color:#666
    style M1 fill:#90c695,stroke:#333,color:#000
    style M2 fill:#90c695,stroke:#333,color:#000
    style M3 fill:#ddd,stroke:#999,color:#666
    style H1 fill:#90c695,stroke:#333,color:#000
    style H2 fill:#90c695,stroke:#333,color:#000
    style H3 fill:#d9534f,stroke:#333,color:#fff

Figure 6: The three levels of git reset. --soft moves only the branch pointer. --mixed also resets the index. --hard resets everything - including the working directory, permanently discarding uncommitted changes.

6.8 The Reflog: Your Safety Net

The reflog (reference log) records every time a reference (HEAD, branch, etc.) is updated. It is a chronological diary of everything you have done in the repository - commits, checkouts, rebases, resets, merges - including intermediate states.

# Show the reflog for HEAD
$ git reflog

# Show the reflog for a specific branch
$ git reflog show main

# Recover a "lost" commit after a bad reset
$ git reflog
# Find the hash of the commit you want to recover
$ git reset --hard abc123

The reflog is local-only - it is not shared with remotes. After a fresh clone it records only the clone itself. By default, entries expire after 90 days, or after 30 days if they are no longer reachable from the current tip; after that, git gc may remove the associated objects.

When in Doubt, Check the Reflog

If you think you have lost work - a bad rebase, an accidental reset --hard, a deleted branch - check git reflog before panicking. As long as the work was committed at some point, the reflog almost certainly has a reference to it.

History rewriting operates on commits that already exist. The next section zooms in on the step before committing - the staging area - and the tools for partial, selective operations.

7. The Staging Area and Partial Operations

The staging area (index) is one of Git’s most distinctive features - and one of its most misunderstood. Other version control systems commit directly from the working directory. Git inserts an intermediate step: the staging area, where you assemble the exact snapshot you want before committing it.

7.1 The Index File

The index is a binary file at .git/index that holds a sorted list of file paths, each with its blob hash, permissions, and timestamps. It represents the proposed next commit. When you run git add, you update the index; when you run git commit, Git builds trees from the index.

This design lets you do something powerful: commit a subset of your changes. You might have modified ten files, but only three are ready for this commit. Stage those three, commit, then continue working on the rest.

# Stage specific files
$ git add file1.py file2.py

# Stage all changes (tracked files only)
$ git add -u

# Stage everything including untracked files
$ git add -A

# See what's staged vs unstaged
$ git diff --cached     # staged changes (index vs last commit)
$ git diff              # unstaged changes (working dir vs index)
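To make the index less abstract, here is a minimal sketch of what git add records. It assumes a SHA-1 repository and models the index as a plain dictionary; the real index also stores file mode and stat data, and the `stage` helper is hypothetical.

```python
import hashlib

def blob_hash(content: bytes) -> str:
    """Name a blob the way Git does: SHA-1 over a 'blob <size>\\0' header plus the bytes."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def stage(index, path, content):
    """Model git add: record the blob hash for path in the index."""
    index[path] = blob_hash(content)

index = {}                                  # path -> blob hash
stage(index, "search.py", b"print('search')\n")
stage(index, "README.md", b"hello world\n")

for path in sorted(index):                  # the real index keeps paths sorted
    print(index[path], path)
```

The hash printed for README.md should match what `echo 'hello world' | git hash-object --stdin` reports, since both hash the same header and bytes.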

7.2 Interactive Staging (git add --patch)

When you have made multiple unrelated changes to the same file and want to split them into separate commits, git add --patch (or -p) lets you stage individual hunks - contiguous blocks of changes - interactively.

$ git add --patch file.py

Git presents each hunk and asks what to do:

Key	Action
`y`	Stage this hunk
`n`	Skip this hunk
`s`	Split into smaller hunks
`e`	Manually edit the hunk
`q`	Quit (don’t stage remaining hunks)

The same --patch flag works with other commands: git checkout -p, git restore -p, git stash -p - letting you selectively discard, restore, or stash parts of files.

This workflow is essential for maintaining clean, focused commits when you have been doing exploratory work across many areas of the codebase.

7.3 Stashing

Stashing saves your uncommitted changes (both staged and unstaged) onto a stack and restores your working directory to the last commit. It is useful when you need to switch branches for an urgent fix but are not ready to commit your in-progress work.

# Stash current changes
$ git stash

# Include untracked files
$ git stash -u

# List all stashes
$ git stash list

# Apply the most recent stash (keep it on the stack)
$ git stash apply

# Apply and remove the most recent stash
$ git stash pop

# Apply a specific stash
$ git stash apply stash@{2}

# Drop a specific stash
$ git stash drop stash@{0}

Stashes are portable across branches - you can stash on one branch and apply on another. If applying a stash would cause conflicts, Git reports them and you resolve them as you would a merge conflict.

git stash branch: Conflict-Free Recovery

If you are worried about conflicts when applying a stash, use git stash branch new-branch-name. This creates a new branch from the commit where you originally stashed, applies the stash, and drops it. Because you are replaying the stash on the exact commit it was created from, conflicts are impossible.

7.4 Removing and Renaming Files

Deleting or renaming a tracked file requires two steps: the filesystem operation and staging the change. Git provides commands that handle both in one step, keeping the index in sync with the working directory:

# Remove a file from the working directory and stage the deletion
$ git rm file.py

# Remove from tracking (staging area) but keep on disk
$ git rm --cached file.py

# Rename a file and stage the rename
$ git mv old-name.py new-name.py

git rm --cached is particularly useful when you accidentally tracked a file that should be in .gitignore - it stops tracking it without deleting it from your disk. After running it, add the file’s pattern to .gitignore and commit both changes.

Note that Git does not explicitly track renames. Internally, a rename is a delete + add. Git detects renames after the fact by comparing the old and new trees: a deleted path and an added path with the same blob hash are an exact rename, and for the remaining pairs Git scores content similarity and infers a rename above a threshold (50% by default). This is why git log --follow file.py can track a file across renames.
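The idea of inferring renames from content can be sketched as follows. This is an approximation only: it uses Python's difflib as a stand-in similarity measure, which is not the algorithm Git uses, and `detect_renames` is a hypothetical helper.

```python
import difflib

def detect_renames(deleted, added, threshold=0.5):
    """Pair deleted paths with added paths whose content is similar enough.

    deleted and added map path -> file content (str).
    Returns a list of (old_path, new_path, similarity) tuples.
    """
    renames = []
    remaining = dict(added)
    for old_path, old_text in deleted.items():
        best_path, best_score = None, 0.0
        for new_path, new_text in remaining.items():
            score = difflib.SequenceMatcher(None, old_text, new_text).ratio()
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is not None and best_score >= threshold:
            renames.append((old_path, best_path, best_score))
            del remaining[best_path]   # each added file explains one rename at most
    return renames

deleted = {"old-name.py": "def login(user):\n    return check(user)\n"}
added = {
    "new-name.py": "def login(user):\n    return check(user)\n",
    "notes.txt": "unrelated text\n",
}
print(detect_renames(deleted, added))  # [('old-name.py', 'new-name.py', 1.0)]
```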

8. Hooks: Automating Git Events

Git hooks are scripts that run automatically in response to specific Git events. They live in .git/hooks/ and can be written in any language (Bash, Python, Ruby, etc.) as long as the file is executable and has no extension.

Every new repository comes pre-populated with example hooks (files ending in .sample). To activate one, remove the .sample extension. To create a custom hook, place an executable script with the right name in .git/hooks/.

Client-Side Hooks Are Not Shared

Hooks in .git/hooks/ are not copied when a repository is cloned. This means client-side hooks must be set up independently in each clone. Teams typically manage this by storing hooks in a hooks/ directory within the project and using a setup script or Git’s core.hooksPath configuration to link them.

8.1 Client-Side Hooks

These run on your local machine in response to local operations:

Hook	Trigger	Typical use
`pre-commit`	Before commit message editor opens	Run linters, formatters, tests. Abort on non-zero exit.
`prepare-commit-msg`	After default message created, before editor	Pre-populate commit messages (e.g., branch name prefix)
`commit-msg`	After message is written	Validate commit message format
`post-commit`	After commit completes	Notifications, trigger CI
`pre-rebase`	Before rebase starts	Prevent rebase on certain branches
`post-merge`	After merge completes	Restore dependencies (`npm install`)
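As an illustration of a client-side hook, here is a sketch of a commit-msg check written in Python. Git passes the path of the message file as the first argument; the specific rules and the 50-character limit are assumptions chosen for the example, not something Git enforces.

```python
#!/usr/bin/env python3
"""Sketch of a commit-msg hook. Git passes the path of the message file as argv[1]."""
import sys

MAX_SUBJECT = 50  # hypothetical team limit

def check_message(text):
    """Return a list of problems found in a commit message."""
    lines = text.splitlines()
    if not lines or not lines[0].strip():
        return ["subject line is empty"]
    problems = []
    subject = lines[0]
    if len(subject) > MAX_SUBJECT:
        problems.append(f"subject is longer than {MAX_SUBJECT} characters")
    if subject.endswith("."):
        problems.append("subject ends with a period")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line must be blank")
    return problems

def main(argv):
    """Hook entry point: returns the exit status Git will see."""
    with open(argv[1], encoding="utf-8") as handle:
        problems = check_message(handle.read())
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0   # non-zero aborts the commit

# Quick demonstration without touching a repository:
print(check_message("Add search index builder\n"))
print(check_message("Fixed the bug.\nno blank line\n"))
```

To use it as a real hook, replace the demonstration lines with `sys.exit(main(sys.argv))`, save the file as .git/hooks/commit-msg, and make it executable.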

8.2 Server-Side Hooks

These run on the remote repository when receiving pushes:

Hook	Trigger	Typical use
`pre-receive`	Before any refs are updated	Access control, reject non-fast-forwards, validate code
`update`	Like `pre-receive`, but runs once per branch	Per-branch policies
`post-receive`	After all refs are updated	Deploy, notify, update dashboards

The pre-receive hook is the gatekeeper: if it exits non-zero, the entire push is rejected. Hosted platforms such as GitHub enforce branch protection rules with the same kind of server-side check.
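A pre-receive hook reads one line per updated ref from standard input, in the form `<old-hash> <new-hash> <ref-name>`; an all-zero new hash means the ref is being deleted. The sketch below rejects deletion of a protected branch. The `PROTECTED` set and the function names are hypothetical policy choices for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a pre-receive hook: one '<old> <new> <ref>' line per updated ref on stdin."""
import sys

PROTECTED = {"refs/heads/main"}   # hypothetical policy

def rejected_updates(lines):
    """Return the protected refs that this push tries to delete."""
    rejected = []
    for line in lines:
        if not line.strip():
            continue
        old, new, ref = line.split()
        is_deletion = set(new) == {"0"}    # all-zero hash marks a deleted ref
        if ref in PROTECTED and is_deletion:
            rejected.append(ref)
    return rejected

def main(stream):
    """Hook entry point: returns the exit status Git will see."""
    rejected = rejected_updates(stream)
    for ref in rejected:
        print(f"pre-receive: deleting {ref} is not allowed", file=sys.stderr)
    return 1 if rejected else 0            # non-zero rejects the whole push

# Quick demonstration with a simulated push that deletes main:
demo = ["a" * 40 + " " + "0" * 40 + " refs/heads/main\n"]
print(rejected_updates(demo))  # ['refs/heads/main']
```

On a server, the script would end with `sys.exit(main(sys.stdin))` instead of the demonstration lines.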

9. Submodules and Advanced Features

As projects grow in scale and complexity, Git’s core model - objects, references, branches - remains the foundation, but additional tools become necessary. Submodules manage cross-repository dependencies. Worktrees let you work on multiple branches simultaneously without stashing. Sparse checkout and Git LFS address the performance challenges of large monorepos and binary files. And packfiles are the compression layer that keeps Git’s snapshot-based storage surprisingly compact.

9.1 Submodules

Submodules let you embed one Git repository inside another while keeping their histories completely separate. Each submodule is a full Git repository in its own directory, with its own .git, tracked at a specific commit by the parent project.

# Add a submodule
$ git submodule add https://github.com/lib/dependency.git libs/dependency

# Clone a project with submodules
$ git clone --recurse-submodules https://github.com/user/project.git

# Or initialize submodules after cloning
$ git submodule update --init --recursive

# Update a submodule to its latest remote commit
$ git submodule update --remote libs/dependency

Adding a submodule creates two things: a directory containing the cloned repo, and a .gitmodules file mapping submodule paths to URLs. Both must be committed to track the submodule.

The parent repository tracks each submodule at a specific commit hash - not a branch. To update the submodule, you explicitly pull new commits and then commit the updated reference in the parent.

9.2 Worktrees

Worktrees let you check out multiple branches of the same repository simultaneously, each in its own directory, sharing a single .git database. This avoids the need to stash, commit, or clone when you need to work on two branches at once.

# Create a new worktree for a hotfix branch
$ git worktree add ../hotfix-branch hotfix/urgent-fix

# List all worktrees
$ git worktree list

# Remove a worktree when done
$ git worktree remove ../hotfix-branch

Worktrees are ideal for:

Parallel development: review a PR in one worktree while continuing feature work in another
Long-running builds: keep building one branch while developing on another
Bisecting: run git bisect in a separate worktree without disrupting your current work

Unlike cloning the repo again, worktrees share the object store - no additional disk space for the history, and objects created in one worktree are immediately visible to others.

9.3 Sparse Checkout

For large monorepos where you only need a subset of the files, sparse checkout lets you check out just the directories you care about, significantly reducing disk usage and git status overhead.

# Enable sparse checkout
$ git sparse-checkout init --cone

# Check out only specific directories
$ git sparse-checkout set src/my-service tests/my-service

# Add more directories later
$ git sparse-checkout add docs/

# Disable (check out everything again)
$ git sparse-checkout disable

The --cone mode (recommended) restricts patterns to directory-level matching, which is much faster than arbitrary gitignore-style patterns. Sparse checkout works well in combination with shallow clones for CI/CD pipelines that only need to build one service in a monorepo.

9.4 Git LFS (Large File Storage)

Git’s content-addressable model stores every version of every file. For large binary files (datasets, images, videos, model weights), this causes repositories to balloon in size because binary diffs are inefficient. Git LFS solves this by storing large files on a separate server and replacing them with lightweight pointer files in the repository.

# Install and initialize LFS
$ git lfs install

# Track large file patterns
$ git lfs track "*.pth"
$ git lfs track "data/*.parquet"

# This creates/updates .gitattributes - commit it
$ git add .gitattributes
$ git commit -m "Track model weights and datasets with LFS"

After setup, git add, commit, push, and pull work transparently - LFS intercepts operations on tracked files and handles the upload/download to the LFS server. The repository itself only stores small pointer files, keeping clones fast.
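The pointer file that replaces a large file is a few lines of text: a spec version, the SHA-256 of the real content, and its size in bytes. The sketch below builds such a pointer by hand to show how small it is; `lfs_pointer` is a hypothetical helper, and in practice the LFS client generates these files for you.

```python
import hashlib

def lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS pointer file for the given content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )

weights = bytes(5_000_000)          # stand-in for a 5 MB model file
pointer = lfs_pointer(weights)
print(pointer)
print(len(pointer), "bytes in the repository instead of", len(weights))
```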

When to Use LFS

Use LFS for files that are (a) large (> 1 MB), (b) binary (don’t benefit from Git’s delta compression), and (c) versioned (you need history). If you don’t need history for large files, consider .gitignore + external storage instead. Common LFS candidates: trained model weights, compiled binaries, large images, video files, and compressed datasets.

9.5 Packfiles

As a repository accumulates thousands of loose objects, Git periodically combines them into packfiles - single compressed files that store multiple objects with delta compression. A pack index file provides fast lookups by hash.

Packing happens automatically when there are too many loose objects, when you run git gc (garbage collection), or when pushing to a remote. You rarely need to think about packfiles, but understanding them explains why Git repositories are surprisingly compact despite storing full snapshots: packfiles use delta compression between similar objects, similar to how video codecs store keyframes and deltas.

# Manually trigger garbage collection and packing
$ git gc

# See pack statistics
$ git count-objects -v

10. The Professional Workflow

Understanding Git’s internals is necessary but not sufficient. A team of developers who all understand the object model but have no shared workflow will still produce a tangled history. This section describes a feature-branch workflow that produces clean, linear, reviewable history - a workflow used by many teams.

10.1 The Feature Branch Model

The core rule: never commit directly to the main branch. Every change - no matter how small - starts on a feature branch.

# Start a new feature
$ git switch -c feature/add-search

# Make small, focused commits
$ git add search.py
$ git commit -m "Add search index builder"

# Push the branch and set up tracking
$ git push -u origin feature/add-search

This ensures that main always contains reviewed, tested, production-ready code. Feature branches are disposable workspaces where experimentation, refactoring, and work-in-progress commits are welcome - they will be cleaned up before merging.

10.2 Pull Requests: Context Is Everything

Once your feature branch is pushed, open a pull request. The PR is not just a merge request - it is a communication artifact. A well-crafted PR description answers:

Why is this change needed? What problem does it solve?
What approach did you take? Were alternatives considered?
What assumptions were made? What are the risks?
How should reviewers test or verify the change?

For large features that cannot be broken into small PRs, use GitHub’s task lists to show progress so reviewers know not to do in-depth reviews until the feature is complete.

After receiving code review feedback, push additional commits to the feature branch - they are automatically included in the PR. Do not squash during review, as that makes it harder for reviewers to see what changed between rounds. Save the cleanup for the final step.

10.3 The Merge Sequence: Rebase, Squash, Fast-Forward

Before merging, prepare a clean history using this sequence:

Step 1 - Update main and rebase:

# Fetch latest main and rebase your branch on top
$ git switch main && git pull && git switch -
$ git rebase main

This ensures your feature branch’s commits sit ahead of main, making a fast-forward merge possible. If rebase produces conflicts, resolve them commit-by-commit as Git replays each one.

Step 2 - Interactive rebase to clean up:

$ git rebase -i main

Squash WIP and fixup commits, reword messages to be descriptive, and ensure each remaining commit is a logical, self-contained unit. This is the time to craft commit messages that capture the “why” - they will be permanent history.

Step 3 - Force push the cleaned-up branch:

$ git push --force-with-lease

Use --force-with-lease instead of --force - it refuses to push if the remote has commits you have not seen, protecting against accidentally overwriting a colleague’s work.

Step 4 - Fast-forward merge:

$ git switch main
$ git merge feature/add-search --ff-only
$ git push

Step 5 - Clean up:

$ git branch -d feature/add-search           # delete local branch
$ git push origin --delete feature/add-search  # delete remote branch

GitHub auto-closes the PR when it detects that main contains the branch’s commits.

graph LR
    A["A"] --> B["B"] --> C["C<br/><small>main</small>"]
    C --> D["D'"] --> E["E'<br/><small>feature (rebased)</small><br/><small>main (after ff-merge)</small>"]

    style A fill:#7eb8da,stroke:#333,color:#000
    style B fill:#7eb8da,stroke:#333,color:#000
    style C fill:#7eb8da,stroke:#333,color:#000
    style D fill:#90c695,stroke:#333,color:#000
    style E fill:#90c695,stroke:#333,color:#000

Figure 7: The professional merge sequence. Feature commits are rebased onto main, cleaned up with interactive rebase, and fast-forward merged - producing a linear history with no merge commits.

Why Fast-Forward Only?

With fast-forward merges, no merge commit is created on main. Every commit in the history was authored and reviewed on a feature branch before arriving on main. There are no “surprise” commits from Git’s merge algorithm. The history is linear, readable, and bisectable. This is one of the main benefits of the rebase-then-merge workflow: the permanent history on main consists entirely of curated, reviewed commits.
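Whether --ff-only succeeds comes down to one question: is the current tip of main an ancestor of the feature tip? A minimal sketch on toy commit graphs (the dictionaries and function names are assumptions for illustration):

```python
def is_ancestor(parents, ancestor, tip):
    """True if ancestor is reachable from tip by following parent links."""
    stack, seen = [tip], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False

def can_fast_forward(parents, target_tip, source_tip):
    """A fast-forward is possible when the target tip is an ancestor of the source tip."""
    return is_ancestor(parents, target_tip, source_tip)

# Before rebase: feature (E) forked from B, and main has moved on to C.
before = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}
# After rebase: D' and E' sit on top of C.
after = {"A": [], "B": ["A"], "C": ["B"], "D'": ["C"], "E'": ["D'"]}

print(can_fast_forward(before, "C", "E"))    # False: histories have diverged
print(can_fast_forward(after, "C", "E'"))    # True: main can simply move to E'
```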

10.4 Commit Message Craft

Commit messages are documentation that lives forever in the history. A good message explains why a change was made, not what was changed (the diff shows the “what”). Follow these conventions:

Format:

Short summary (50 chars or less)

Longer explanation wrapping at 72 characters. Explain the motivation
for the change, any trade-offs made, and anything a future reader
would need to understand the decision.

Refs: #123

Rules:

Separate subject from body with a blank line. Many tools (GitHub, git log --oneline, email patches) use only the first line.
Limit the subject to 50 characters. Forces concision.
Capitalize the subject, no trailing period.
Use imperative mood in the subject: “Add search feature” not “Added search feature” - it reads like a command, matching git merge and git revert output.
Wrap the body at 72 characters. Terminals, git log, and email all look better with wrapped text.
Explain why, not what. The diff shows what changed; the message should explain the decision.

Sign Your Commits

For teams and open source projects, signed commits provide cryptographic proof of authorship. Git supports both GPG and SSH signing:

$ git config --global commit.gpgsign true
$ git config --global gpg.format ssh
$ git config --global user.signingkey ~/.ssh/id_ed25519.pub

GitHub displays a “Verified” badge on signed commits.

11. Configuration

Git’s configuration is read automatically before every command - no reloading needed. Configuration files are read in order, with later files overriding earlier ones:

Level	File	Flag	Scope
System	`/etc/gitconfig`	`--system`	All users on the machine
User	`~/.gitconfig`	`--global`	All repositories for the current user
Repository	`.git/config`	`--local`	This repository only
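The override order can be modeled as a layered lookup where the most specific level is consulted first. The keys and values below are hypothetical, and the sketch ignores multi-valued keys and other levels Git supports.

```python
from collections import ChainMap

# Hypothetical values at each level.
system = {"core.editor": "vi", "fetch.prune": "false"}
user = {"core.editor": "vim", "user.name": "Ada"}
repository = {"user.name": "Ada (work)"}

# ChainMap searches left to right, so the most specific level goes first.
effective = ChainMap(repository, user, system)

print(effective["user.name"])    # Ada (work): repository overrides user
print(effective["core.editor"])  # vim: user overrides system
print(effective["fetch.prune"])  # false: only set at system level
```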

11.1 Essential Settings

These three settings significantly improve the default Git experience:

# Push only the current branch to its tracked upstream
$ git config --global push.default upstream

# Reject non-fast-forward merges (enforce linear history)
$ git config --global merge.ff only

# Auto-clean stale remote-tracking branches on fetch/pull
$ git config --global fetch.prune true

11.2 Aliases and Subcommands

Git aliases live in the [alias] section of your gitconfig. Single-command aliases are straightforward; multi-command aliases use a ! prefix to invoke the shell:

# Simple alias
$ git config --global alias.co checkout
$ git config --global alias.st status

# Multi-command alias (note the ! prefix)
$ git config --global alias.mup '!git checkout main && git pull && git checkout -'

# Visual log
$ git config --global alias.graph 'log --all --decorate --graph --oneline'

Git subcommands are even more powerful: any executable on your $PATH named git-<name> becomes callable as git <name>. This lets you write complex tooling in any language:

#!/bin/bash
# Save as 'git-cm' on your $PATH, make executable
# Usage: git cm "message" OR git cm (opens editor)
if (( $# > 0 )); then
    # Join all arguments into one message so extra words are not treated as paths
    git commit -m "$*"
else
    git commit -v
fi

11.3 .gitignore Patterns

The .gitignore file tells Git which files to ignore. It uses glob patterns and applies recursively from the directory where it is placed. You can have .gitignore files in subdirectories for directory-specific rules.

Pattern rules:

Pattern	Effect	Example
`*.log`	Ignore all files ending in `.log`, recursively	Build logs, app logs
`build/`	Ignore any directory named `build`	Compiled output
`/TODO`	Ignore `TODO` in the current directory only (no recursion)	Root-level notes
`doc/**/*.pdf`	Ignore PDFs in `doc/` and all subdirectories	Generated docs
`!important.log`	Do not ignore this file (negation)	Exception to `*.log`
`#`	Comment line	-
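The interaction between a pattern and a later negation follows a last-match-wins rule. The sketch below models only that rule for slash-free patterns; it is a simplification and does not reproduce Git's full matching (anchored patterns, directory-only patterns, and `**` are left out), and `is_ignored` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

def is_ignored(path, patterns):
    """Simplified gitignore check: later patterns override earlier ones.

    Only slash-free patterns are modeled; they are matched against the
    file name at any depth. Lines starting with # are comments.
    """
    name = PurePosixPath(path).name
    ignored = False
    for pattern in patterns:
        if not pattern or pattern.startswith("#"):
            continue
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        if fnmatchcase(name, pattern):
            ignored = not negated      # the last matching pattern decides
    return ignored

patterns = ["# logs", "*.log", "!important.log"]
for path in ["app.log", "logs/debug.log", "important.log", "main.py"]:
    print(path, is_ignored(path, patterns))
```

Reversing the order of the last two patterns would ignore important.log again, which is why negations belong after the pattern they carve an exception from.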

Global Gitignore

For files that are personal to your setup (editor configs, OS files like .DS_Store), use a global gitignore rather than polluting project-level .gitignore:

$ git config --global core.excludesfile ~/.gitignore_global

12. Conclusion

In this post, we built Git from the inside out - starting from the content-addressable object model, building through references and branches, exploring history inspection and rewriting tools, and arriving at the professional workflow that ties everything together.

Git’s power comes from a coherent set of design choices - each solving a specific problem with a specific mechanism.

Key Takeaways

Git is a content-addressable filesystem, not a diff tracker. Every commit stores a complete snapshot. Unchanged files are deduplicated by hash. Diffs are computed on demand, not stored. This design makes branching and merging cheap pointer operations rather than expensive file copies.
Three objects and three areas - that’s the whole model. Blobs (file contents), trees (directories), and commits (snapshots with context) are the three core object types (annotated tags add a fourth, but nothing in the model depends on it). The working directory, staging area (index), and object store are the three areas. Every Git command is an operation on these six things.
Branches are 41-byte files, not copies. A branch is a mutable pointer to a commit: a file holding a 40-character hash and a newline. Creating a branch creates that file. A fast-forward merge moves the pointer; a true merge creates one new commit and then moves the pointer. Understanding this is what makes branching feel lightweight rather than scary.
reset, checkout, and rebase are pointer operations. reset moves a branch pointer (and optionally resets the index and working directory). checkout moves HEAD. rebase replays commits with new parents. None of them destroy the original commits - the reflog keeps them recoverable.
Clean history is a professional obligation. The rebase-then-fast-forward workflow produces linear, reviewable, bisectable history. Interactive rebase before merging turns messy development into clean permanent records. Commit messages should explain why, not what.
Committed data is almost always recoverable; uncommitted data is not. The reflog records every update to HEAD and to branch tips, and by default keeps entries for 90 days (30 days for entries no longer reachable from the current tip). But changes that were never committed - unsaved edits, unstaged modifications - are gone when overwritten. Commit early, commit often.
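The recoverability claim can be tried safely in a scratch repository. This is a minimal sketch, assuming default reflog settings; the repository location, identity, file name, and commit messages are all made up for the demo.

```shell
#!/bin/sh
# Sketch: "lose" a commit with reset --hard, then get it back via the reflog.
set -eu

repo="$(mktemp -d)"   # hypothetical scratch location
git init -q "$repo"
cd "$repo"
git config user.name "Example User"          # placeholder identity for the demo
git config user.email "user@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first"

echo two > file.txt
git commit -q -am "second"
lost="$(git rev-parse HEAD)"     # remember the commit we are about to orphan

git reset -q --hard HEAD~1       # the branch now points at "first"
git --no-pager reflog -n 3       # the reflog still lists "second"

# HEAD@{1} is where HEAD pointed one move ago, i.e. the "second" commit
git reset -q --hard "HEAD@{1}"
recovered="$(git rev-parse HEAD)"
echo "$recovered"
```

Nothing was recreated here: the second reset only moved the branch pointer back to an object that never left the store.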

Git’s Design Is Elegant

Strip away the 150+ commands and Git is remarkably simple: a content-addressable object store (blobs, trees, commits), mutable pointers (branches, tags), and a single HEAD. Every feature - branching, merging, rebasing, bisecting, stashing - is built on this foundation. Understanding the foundation makes the entire surface area intuitive.
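You can confirm this structure with a few plumbing commands. The sketch below makes one commit in a scratch repository, then inspects the three object types, the branch file, and HEAD. The branch name `main`, the file name, and the identity are assumptions for the demo, and it assumes refs are stored as loose files, which is the default for a freshly created repository.

```shell
#!/bin/sh
# Sketch: one commit, then look at the objects and pointers it produced.
set -eu

repo="$(mktemp -d)"   # hypothetical scratch location
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main        # name the initial branch "main"
git config user.name "Example User"          # placeholder identity for the demo
git config user.email "user@example.com"

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "add greeting"

# The three object types, reached by walking down from HEAD
commit_type="$(git cat-file -t HEAD)"               # commit
tree_type="$(git cat-file -t 'HEAD^{tree}')"        # tree
blob_type="$(git cat-file -t HEAD:greeting.txt)"    # blob
echo "$commit_type $tree_type $blob_type"

# The branch is a small file holding the commit hash plus a newline
head_hash="$(git rev-parse HEAD)"
ref_bytes="$(wc -c < .git/refs/heads/main | tr -d ' ')"
echo "branch file: $ref_bytes bytes"   # 41 on a SHA-1 repository (40 hex digits + newline)

# HEAD is a pointer to the branch, not to a commit
head_contents="$(cat .git/HEAD)"
echo "$head_contents"                  # prints: ref: refs/heads/main
```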

References & Resources

Pro Git by Scott Chacon and Ben Straub - the definitive, freely available Git book. Chapters 10 (Git Internals) and 7 (Git Tools) are particularly relevant to this post.
Git Ready - practical how-to pages organized by difficulty: “learn a little, learn a lot.”
Git Internals PDF - a deep dive into the object model with more detail than we could cover here.
Thoughtbot Git Guides - opinionated workflow guides from a well-respected consultancy.
GitHub CLI (gh) - interact with GitHub entirely from the command line: PRs, issues, actions, releases.
5 Rules for a Good Git Commit Message - the widely-cited guide to commit message style.
Deliberate Git by Stephen Ball - a talk on crafting intentional, meaningful commit history.
Code Review Culture by Derek Prior - implementing code review as a team practice.
Fugitive - the premier Vim plugin for Git integration.
- Fugitive Vimcasts Series - five-part screencast series.
vim-conflicted - optimized merge conflict resolution in Vim.
Pro tip: add autocmd Filetype gitcommit setlocal spell textwidth=72 to your Vim config for automatic spell-checking and line wrapping in commit messages.