Storage

The first step in creating a project is creating a new Git repository.

A Git "repository" or "repo" represents a single project. You typically have one repository per project unless you are working in a monorepo.

A repo is essentially a directory that contains your project files. It also contains a hidden .git folder where Git stores all of its internal tracking and versioning data for the project.

Creating a Repository

Let's make a demo project called demo:

mkdir demo
cd demo

Once in the project directory, we can create a new git repository with the following command:

git init

This will create a new .git folder in the project directory.

You can verify the contents by running ls on the directory.

ls -a .git
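
As a quick sanity check, the steps above can also be scripted. This is only a minimal sketch: it assumes a POSIX shell with mktemp available, works in a throwaway directory rather than your real project, and the variable names are illustrative.

```shell
# Work in a throwaway directory so nothing else is touched.
workdir="$(mktemp -d)"
cd "$workdir"
mkdir demo
cd demo
git init --quiet

# Ask Git itself whether we are inside a work tree and where its data lives.
inside="$(git rev-parse --is-inside-work-tree)"
gitdir="$(git rev-parse --git-dir)"
echo "inside work tree: $inside"
echo "git dir: $gitdir"
```

When run from the top of the new repository, git rev-parse --git-dir should report the same .git folder that ls -a shows.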

Status

A file in a Git repository can be in one of several states. A few of them are:

  • untracked - A file that is not being tracked by git
  • staged - A file that is ready to be committed
  • committed - A file that has been saved to the git repository

The git status command will show you the current state of your files in the repository.

Untracked

Let's create a new file in the project called contents.md and prefill it with the following content:

# contents

Save the file and then run:

git status
On branch master

No commits yet

Untracked files:
(use "git add <file>..." to include in what will be committed)
contents.md

nothing added to commit but untracked files present (use "git add" to track)

The git status command shows that the file contents.md is untracked. That means Git doesn't know anything about it (in terms of version control).
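
To see the same thing from a script, one option is git ls-files, which lists tracked files by default and untracked ones with --others. The sketch below assumes a throwaway repository created with mktemp; the variable names are placeholders.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
printf '# contents\n' > contents.md

# Files Git is tracking (present in the index): none yet.
tracked="$(git ls-files)"
# Files in the working directory that Git is not tracking.
untracked="$(git ls-files --others --exclude-standard)"

echo "tracked:   [$tracked]"
echo "untracked: [$untracked]"
```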

danger

If you delete an untracked file, it is gone forever. Git never stored a copy of it, so Git cannot restore it.

Staging/Staged

Let's stage the file and then commit it.

The git add command stages the file for commit.

git add contents.md

Check the status again

git status
On branch master

No commits yet

Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: contents.md

The git commit command will save the file to the repository.

git commit -m "Initial Commit"

Check the status again

git status
On branch master
nothing to commit, working tree clean
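
The staged and committed states can be observed in a compact form with git status --short. This is a sketch under a few assumptions: a throwaway repository, and a placeholder identity ("Demo User") passed with -c so the commit does not depend on your global Git configuration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
printf '# contents\n' > contents.md

git add contents.md
staged="$(git status --short)"       # the file is in the index, ready to commit

# Placeholder identity, used only for this demo commit.
git -c user.name="Demo User" -c user.email="demo@example.com" \
  commit --quiet -m "Initial Commit"
committed="$(git status --short)"    # nothing left to report

printf 'staged:    [%s]\ncommitted: [%s]\n' "$staged" "$committed"
```

In the short format, a newly staged file is marked with A, and a clean working tree produces no output at all.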

Git Hashes

Even if we both have exactly the same contents in our repos, our commits will still have different hashes.

The code is the same, but the commit hash also covers additional information:

  • The commit message
  • The author's name and email
  • The date and time
  • Parent (previous) commit hashes

Git uses the SHA-1 hashing algorithm to create the hash. That's why commit hashes are often referred to as SHAs, and why a hash is, for practical purposes, unique to its commit.
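
The effect of that extra information can be checked with a small experiment. The sketch below builds two throwaway repositories with identical file contents but different (made-up) authors and timestamps; make_repo is a hypothetical helper, and the names alice and bob are placeholders.

```shell
# Build a one-commit repo in a fresh temp directory and print its path.
# $1 = author name (placeholder), $2 = date in Git's "<unix seconds> <utc offset>" form
make_repo() {
  dir="$(mktemp -d)"
  (
    cd "$dir" || exit 1
    git init --quiet
    printf '# contents\n' > contents.md
    git add contents.md
    GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$1@example.com" GIT_AUTHOR_DATE="$2" \
    GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$1@example.com" GIT_COMMITTER_DATE="$2" \
      git commit --quiet -m "Initial Commit"
  ) >/dev/null
  echo "$dir"
}

repo_a="$(make_repo alice "1738005321 -0700")"
repo_b="$(make_repo bob "1738005348 -0700")"

commit_a="$(git -C "$repo_a" rev-parse HEAD)"
commit_b="$(git -C "$repo_b" rev-parse HEAD)"
tree_a="$(git -C "$repo_a" rev-parse 'HEAD^{tree}')"
tree_b="$(git -C "$repo_b" rev-parse 'HEAD^{tree}')"

echo "commit A: $commit_a"
echo "commit B: $commit_b"
echo "tree A:   $tree_a"
echo "tree B:   $tree_b"
```

The two tree hashes should match because the files are identical, while the commit hashes differ because the author and date are part of what gets hashed.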

To check hashes, you can review the git log:

git log -n 10
commit c8429a0849a2c4d01ff42f1372d8d3020c812dbc
Author: brock <[email protected]>
Date: Mon Jan 27 12:15:21 2025 -0700

Initial Commit

Storage

Git's data is made up of objects stored in the .git/objects directory.

A commit is just one type of object. Each object is stored in a file, with the first 2 characters of its hash as the directory name and the remaining characters as the file name.

info

This keeps any single directory from holding too many entries, which is a limitation of some file systems: lookups slow down, or hit a hard limit, when a folder/directory gets too big.
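
The mapping from hash to file path is simple enough to sketch in a few lines of shell. The example uses the commit hash from the log above; the variable names are illustrative.

```shell
hash="c8429a0849a2c4d01ff42f1372d8d3020c812dbc"

prefix="$(printf '%s' "$hash" | cut -c1-2)"   # directory name: first 2 characters
rest="$(printf '%s' "$hash" | cut -c3-)"      # file name: the remaining 38 characters
object_path=".git/objects/$prefix/$rest"

echo "$object_path"
# prints .git/objects/c8/429a0849a2c4d01ff42f1372d8d3020c812dbc
```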

Use the git log command to find the hash and then use:

  • ls -al .git/objects to list the contents of .git/objects directory.
    • Look for a directory that matches the first 2 letters of your commit hash.
  • ls -al .git/objects/<first 2 letters of hash> to list the contents of the directory.
ls -al .git/objects/
total 8
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:14 ./
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:36 ../
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:13 5b/
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:13 a6/
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:14 a7/
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:13 ef/
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:14 f3/
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:14 f8/
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:06 info/
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:06 pack/
ls -al .git/objects/c8

total 5
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:13 ./
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:14 ../
-r--r--r-- 1 brock 1049089 136 Jan 27 11:13 429a0849a2c4d01ff42f1372d8d3020c812dbc

Reading the Object File

You can try to read the contents of the object file with the cat command, but the output is not human readable.

```txt 
brock@WPWHPW78 MINGW64 ~/git (master)
$ cat .git/objects/c8/429a0849a2c4d01ff42f1372d8d3020c812dbc
x□□M
□□□□-z□o□□□j-z□ҙ□F□□)kGSDO.-li4□□□gG□\□□0+z□□:□□Z□p□□□□□□□
□xy
g□uH□□A□1 j;#□#□□□o□□uۊz□□˩TBm
```

Git compresses the contents of each object with zlib and stores the resulting raw bytes in the object file, which is why cat prints unreadable output.

There is a built-in plumbing command for reading the contents of an object: git cat-file.

git cat-file -p c8429a0849a2c4d01ff42f1372d8d3020c812dbc
tree 5b21d4f16a4b07a6cde5a3242187f6a5a68b060f
author brock <[email protected]> 1738005321 -0700
committer brock <[email protected]> 1738005321 -0700

Initial Commit

Let's check what the tree is:

git cat-file -p 5b21d4f16a4b07a6cde5a3242187f6a5a68b060f
100644 blob ef7e93fc61a91deecaa551c4707e4c3049af42c9    contents.md

Now the blob

git cat-file -p ef7e93fc61a91deecaa551c4707e4c3049af42c9
# contents
  • A tree is Git's way of storing a directory.
  • A blob is Git's way of storing a file.
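
To confirm which kind of object a hash refers to, git cat-file -t prints the object's type. The sketch below assumes a throwaway repository and a placeholder identity for the commit; it resolves the commit, its tree, and the blob for contents.md, then asks Git for each type.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
printf '# contents\n' > contents.md
git add contents.md
# Placeholder identity, used only for this demo commit.
git -c user.name="Demo User" -c user.email="demo@example.com" \
  commit --quiet -m "Initial Commit"

commit_id="$(git rev-parse HEAD)"              # the commit itself
tree_id="$(git rev-parse 'HEAD^{tree}')"       # the directory snapshot it points to
blob_id="$(git rev-parse HEAD:contents.md)"    # the file stored in that tree

commit_type="$(git cat-file -t "$commit_id")"
tree_type="$(git cat-file -t "$tree_id")"
blob_type="$(git cat-file -t "$blob_id")"

echo "$commit_type $tree_type $blob_type"
```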

How it Works

Let's add some contents to our file contents.md and commit it.

# contents
This is the contents
- git
- repositories

Also create a new file called extra.md with the following content:

extra

Then stage both files and commit:

git add contents.md extra.md
git commit -m "Added more contents"

Now let's check the log:

git log -n 10
commit 47eca7ecdf05992a9dc3b2847d656b0c7e9b9806 (HEAD -> master)
Author: brock <[email protected]>
Date: Mon Jan 27 12:15:48 2025 -0700

Added more contents

commit c8429a0849a2c4d01ff42f1372d8d3020c812dbc
Author: brock <[email protected]>
Date: Mon Jan 27 12:15:21 2025 -0700

Initial Commit

Let's check the contents of the new commit:

git cat-file -p 47eca7ecdf05992a9dc3b2847d656b0c7e9b9806
tree d654c783994e17f9fc1ab7824857f8e7bbbb6b7d
parent c8429a0849a2c4d01ff42f1372d8d3020c812dbc
author brock <[email protected]> 1738005348 -0700
committer brock <[email protected]> 1738005348 -0700

Added more contents

Then inspect the tree it points to:

git cat-file -p d654c783994e17f9fc1ab7824857f8e7bbbb6b7d
100644 blob 28c6fb387f1737bf1e2871e13445e47ec46b40dc    contents.md
100644 blob 0f2287157f7cb0dd40498c7a92f74b6975fa2d57    extra.md

Now there are 2 blobs, and the blob for contents.md has a different hash than before because its contents changed.

git cat-file -p 28c6fb387f1737bf1e2871e13445e47ec46b40dc
# contents
This is the contents
- git
- repositories
git cat-file -p 0f2287157f7cb0dd40498c7a92f74b6975fa2d57
extra
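
A blob's hash depends only on the file's contents, which is why editing contents.md produced a new blob. One way to explore this without committing anything is git hash-object, which computes the ID a blob would get. A small sketch, with illustrative variable names:

```shell
# Same bytes in, same hash out.
original="$(printf '# contents\n' | git hash-object --stdin)"
same_again="$(printf '# contents\n' | git hash-object --stdin)"

# Any change to the bytes yields a different hash.
edited="$(printf '# contents\nThis is the contents\n' | git hash-object --stdin)"

# Even an empty file has a well-defined blob hash.
empty="$(printf '' | git hash-object --stdin)"

echo "original:   $original"
echo "same again: $same_again"
echo "edited:     $edited"
echo "empty:      $empty"
```

The hashes you get for a file on disk can differ from the ones shown in this document if the bytes differ, for example because of a missing trailing newline or different line endings.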

Also notice the parent line in the contents of the commit: it stores a reference to the parent commit. The commit also has a pointer to its tree, and the tree in turn points to the blobs for that snapshot. Because everything is referenced by hash rather than copied, this makes Git efficient.
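
The parent link can be checked directly. The following sketch assumes a throwaway repository; commit is a hypothetical helper that supplies a placeholder identity so the example does not depend on your Git configuration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Hypothetical helper: commit with a placeholder identity.
commit() {
  git -c user.name="Demo User" -c user.email="demo@example.com" commit --quiet -m "$1"
}

printf '# contents\n' > contents.md
git add contents.md
commit "Initial Commit"
first="$(git rev-parse HEAD)"

printf 'extra\n' > extra.md
git add extra.md
commit "Added more contents"

# The second commit records the first one as its parent.
parent_line="$(git cat-file -p HEAD | grep '^parent ')"
# The very first commit has no parent line at all.
root_parents="$(git cat-file -p "$first" | grep -c '^parent ' || true)"

echo "$parent_line"
echo "parent lines in first commit: $root_parents"
```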

Now let's add a new directory and file to the project.

mkdir newdir
echo "# newdir" > newdir/newdir.md
git add newdir
git commit -m "Added new directory"
git log -n 10
commit 97e9e24c36380caea90fddb98119c3c34ca62120 (HEAD -> master)
Author: brock <[email protected]>
Date: Mon Jan 27 12:22:32 2025 -0700

Added new directory

commit 47eca7ecdf05992a9dc3b2847d656b0c7e9b9806
Author: brock <[email protected]>
Date: Mon Jan 27 12:15:48 2025 -0700

Added more contents

commit c8429a0849a2c4d01ff42f1372d8d3020c812dbc
Author: brock <[email protected]>
Date: Mon Jan 27 12:15:21 2025 -0700

Initial Commit

git cat-file -p 97e9e24c36380caea90fddb98119c3c34ca62120
tree f7bc3073a09a103161931ab27698348490caa1eb
parent 47eca7ecdf05992a9dc3b2847d656b0c7e9b9806
author brock <[email protected]> 1738005752 -0700
committer brock <[email protected]> 1738005752 -0700

Added new directory

git cat-file -p f7bc3073a09a103161931ab27698348490caa1eb
100644 blob 28c6fb387f1737bf1e2871e13445e47ec46b40dc    contents.md
100644 blob 0f2287157f7cb0dd40498c7a92f74b6975fa2d57    extra.md
040000 tree 42a788dc03a7461a6f24e00c8507758717efb859    newdir

Since the only new thing is newdir, notice that the hashes of contents.md and extra.md are the same as in the previous commit. Git reuses the existing blobs instead of storing them again.
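
This reuse can be verified by comparing what two commits point to. The sketch below assumes a throwaway repository and a hypothetical commit helper with a placeholder identity; the syntax <commit>:<path> asks git rev-parse for the blob stored at that path in that commit.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Hypothetical helper: commit with a placeholder identity.
commit() {
  git -c user.name="Demo User" -c user.email="demo@example.com" commit --quiet -m "$1"
}

printf '# contents\n' > contents.md
git add contents.md
commit "Initial Commit"

mkdir newdir
printf '# newdir\n' > newdir/newdir.md
git add newdir
commit "Added new directory"

# The untouched file maps to the same blob in both commits...
blob_before="$(git rev-parse 'HEAD~1:contents.md')"
blob_after="$(git rev-parse 'HEAD:contents.md')"
# ...while the top-level tree changed, because it gained the newdir entry.
tree_before="$(git rev-parse 'HEAD~1^{tree}')"
tree_after="$(git rev-parse 'HEAD^{tree}')"

echo "blob before: $blob_before"
echo "blob after:  $blob_after"
echo "tree before: $tree_before"
echo "tree after:  $tree_after"
```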

When Git checks out a commit, it walks the tree and creates the files referenced at each of the hashes.
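
You can watch a similar walk with git ls-tree. Without options it lists one level of a tree; with -r it recurses into subtrees and lists every blob with its full path. A minimal sketch, assuming a throwaway repository and a placeholder identity for the commit:

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
printf '# contents\n' > contents.md
mkdir newdir
printf '# newdir\n' > newdir/newdir.md
git add contents.md newdir
# Placeholder identity, used only for this demo commit.
git -c user.name="Demo User" -c user.email="demo@example.com" \
  commit --quiet -m "Added new directory"

# One level only: a blob for the file and a tree for the directory.
top_level_types="$(git ls-tree HEAD | cut -d' ' -f2)"
# Recursive walk: every file reachable from the commit's tree.
all_paths="$(git ls-tree -r --name-only HEAD)"

echo "$top_level_types"
echo "$all_paths"
```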