Storage
The first step in creating a project is creating a new git repository.
A git "repository" or "repo" represents a single project. You typically have one repository per project, unless you are working in a monorepo.
A repo is essentially a directory that contains your project files.
There is also a hidden .git folder where git stores all of its internal tracking and versioning data for the project.
Creating a Repository
Let's make a demo project called demo.
mkdir demo
cd demo
Once in the project directory, we can create a new git repository with the following command:
git init
This will create a new .git folder in the project directory.
You can verify its contents by running ls on the directory.
ls -a .git
Status
A file can be in one of several states in a git repository. A few of them are:
- untracked: a file that is not being tracked by git
- staged: a file that is ready to be committed
- committed: a file that has been saved to the git repository
The git status command will show you the current state of your files in the repository.
Untracked
Let's create a new file in the project called contents.md and prefill it with the following content:
# contents
Save the file and then run git status.
$ git status
On branch master
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
contents.md
nothing added to commit but untracked files present (use "git add" to track)
The git status command shows that the file contents.md is untracked.
That means git doesn't know anything about it (in terms of version control).
If you delete the file, it will be gone forever, because git has no copy to restore. Losing untracked files is the worst!
Staging/Staged
Let's stage the file and then commit it with a message.
The git add command stages the file for commit.
git add contents.md
Check the status again
git status
On branch master
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: contents.md
The git commit command will save the staged file to the repository.
git commit -m "Initial Commit"
Check the status again
git status
On branch master
nothing to commit, working tree clean
Git Hashes
Even if we both have the same contents in our repos, our commits will still have different hashes.
The code is the same, but the commit hash covers additional info:
- The commit message
- The author's name and email
- The date and time
- Parent (previous) commit hashes
Git uses the SHA-1 hashing algorithm to create the hash.
That's why the hash is unique to the commit, and why commit hashes are often referred to as SHAs.
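To make this concrete, here is a rough Python sketch of how a commit hash could be derived. It assumes the standard object format (a `<type> <size>` header, a NUL byte, then the body) and uses two hypothetical authors, alice and bob, who are not part of this project. The helper names are made up for illustration.

```python
import hashlib


def git_object_hash(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>' + NUL byte + body, as git does for objects."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, author: str, timestamp: int, message: str) -> bytes:
    """Build the text of a parentless commit (the layout `git cat-file -p` prints)."""
    ident = f"{author} {timestamp} -0700"
    return (
        f"tree {tree}\n"
        f"author {ident}\n"
        f"committer {ident}\n"
        f"\n"
        f"{message}\n"
    ).encode()


# Same tree (same code) and same message; only the hypothetical author differs.
TREE = "5b21d4f16a4b07a6cde5a3242187f6a5a68b060f"
alice = git_object_hash(
    "commit", commit_body(TREE, "alice <alice@example.com>", 1738005321, "Initial Commit")
)
bob = git_object_hash(
    "commit", commit_body(TREE, "bob <bob@example.com>", 1738005321, "Initial Commit")
)

print(alice)
print(bob)
print(alice != bob)  # True: identical code, different commit hashes
```

Changing the message, the timestamp, or adding a parent line would change the hash in the same way.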
To check hashes you can review the git log
git log -n 10
commit c8429a0849a2c4d01ff42f1372d8d3020c812dbc
Author: brock <[email protected]>
Date: Mon Jan 27 12:15:21 2025 -0700
Initial Commit
Storage
Git is made up of objects stored in the .git/objects directory.
A commit is just one type of object.
Each object is stored in its own file: the first 2 characters of the hash are used as the directory name and the remaining characters as the file name.
Splitting objects across subdirectories keeps any single directory from getting too big, which some file systems handle poorly.
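The path rule is simple enough to sketch in Python. The function name `loose_object_path` is made up for this example, and it only covers objects stored as individual files (git can also bundle objects into the pack directory).

```python
from pathlib import PurePosixPath


def loose_object_path(object_hash: str, git_dir: str = ".git") -> PurePosixPath:
    """Return where an individually stored object with this hash would live."""
    if len(object_hash) != 40:
        raise ValueError("expected a full 40-character SHA-1 hash")
    # First 2 characters become the directory, the other 38 the file name.
    return PurePosixPath(git_dir) / "objects" / object_hash[:2] / object_hash[2:]


print(loose_object_path("c8429a0849a2c4d01ff42f1372d8d3020c812dbc"))
# .git/objects/c8/429a0849a2c4d01ff42f1372d8d3020c812dbc
```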
Use the git log command to find the hash, and then:
- Use ls -al .git/objects to list the contents of the .git/objects directory.
- Look for a directory that matches the first 2 characters of your commit hash.
- Use ls -al .git/objects/<first 2 letters of hash> to list the contents of that directory.
ls -al .git/objects/
total 8
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:14 ./
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:36 ../
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:13 5b/
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:13 a6/
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:14 a7/
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:13 ef/
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:14 f3/
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:14 f8/
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:06 info/
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:06 pack/
ls -al .git/objects/c8
total 5
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:13 ./
drwxr-xr-x 1 brock 1049089 0 Jan 27 11:14 ../
-r--r--r-- 1 brock 1049089 136 Jan 27 11:13 429a0849a2c4d01ff42f1372d8d3020c812dbc
Reading the Object File
You can try to read the contents of the object file with the cat command.
```txt
brock@WPWHPW78 MINGW64 ~/git (master)
$ cat .git/objects/c8/429a0849a2c4d01ff42f1372d8d3020c812dbc
x□□M
□□□□-z□o□□□j-z□ҙ□F□□)kGSDO.-li4□□□gG□\□□0+z□□:□□Z□p□□□□□□□
□xy
g□uH□□A□1 j;#□#□□□o□□uۊz□□˩TBm
```
Git compresses the object with zlib before writing it to the object file, so cat prints unreadable raw bytes.
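As a rough illustration of what is inside those bytes, the Python sketch below decompresses an object with zlib and splits off the header. The function name `decode_loose_object` is hypothetical, and the input is built in the script as a stand-in for a real object file, so it runs without a repository.

```python
import zlib


def decode_loose_object(raw: bytes):
    """Decompress an object file's bytes and split them into (type, size, body)."""
    data = zlib.decompress(raw)
    # The header is '<type> <size>' followed by a NUL byte, then the body.
    header, _, body = data.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    return obj_type, int(size), body


# Stand-in for the bytes of a file under .git/objects/.
body = b"# contents\n"
raw = zlib.compress(b"blob %d\0" % len(body) + body)

print(decode_loose_object(raw))  # ('blob', 11, b'# contents\n')
```

To try it on a real object, you could pass in the bytes read from one of the files under .git/objects (opened in binary mode).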
There is a built-in plumbing command for reading the contents of an object: the git cat-file command. The -p flag pretty-prints the object.
git cat-file -p c8429a0849a2c4d01ff42f1372d8d3020c812dbc
tree 5b21d4f16a4b07a6cde5a3242187f6a5a68b060f
author brock <[email protected]> 1738005321 -0700
committer brock <[email protected]> 1738005321 -0700
Initial Commit
Let's check what the tree is.
git cat-file -p 5b21d4f16a4b07a6cde5a3242187f6a5a68b060f
100644 blob ef7e93fc61a91deecaa551c4707e4c3049af42c9 contents.md
Now the blob
git cat-file -p ef7e93fc61a91deecaa551c4707e4c3049af42c9
# contents
- tree is a way of storing a directory.
- blob is a way of storing a file.
How it Works
Let's add some contents to our file contents.md:
# contents
This is the contents
- git
- repositories
Also create a new file called extra.md containing:
extra
Then stage both files and commit them.
git add contents.md extra.md
git commit -m "Added more contents"
Now let's check the log
git log -n 10
commit 47eca7ecdf05992a9dc3b2847d656b0c7e9b9806 (HEAD -> master)
Author: brock <[email protected]>
Date: Mon Jan 27 12:15:48 2025 -0700
Added more contents
commit c8429a0849a2c4d01ff42f1372d8d3020c812dbc
Author: brock <[email protected]>
Date: Mon Jan 27 12:15:21 2025 -0700
Initial Commit
Let's check the contents of the new commit
git cat-file -p 47eca7ecdf05992a9dc3b2847d656b0c7e9b9806
tree d654c783994e17f9fc1ab7824857f8e7bbbb6b7d
parent c8429a0849a2c4d01ff42f1372d8d3020c812dbc
author brock <[email protected]> 1738005348 -0700
committer brock <[email protected]> 1738005348 -0700
Added more contents
git cat-file -p d654c783994e17f9fc1ab7824857f8e7bbbb6b7d
100644 blob 28c6fb387f1737bf1e2871e13445e47ec46b40dc contents.md
100644 blob 0f2287157f7cb0dd40498c7a92f74b6975fa2d57 extra.md
Now there are 2 blobs, and the blob for contents.md has a different hash than before because its contents changed.
git cat-file -p 28c6fb387f1737bf1e2871e13445e47ec46b40dc
# contents
This is the contents
- git
- repositories
git cat-file -p 0f2287157f7cb0dd40498c7a92f74b6975fa2d57
extra
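The new hash for contents.md follows from how blob hashes are computed: they depend only on the file's bytes. Here is a minimal Python sketch of that idea, assuming the standard `blob <size>` header. The function name `blob_hash` is made up. The byte strings assume LF line endings and a trailing newline, so the results may not match the hashes shown above.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """SHA-1 of 'blob <size>' + NUL byte + content (what `git hash-object` computes)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


old = blob_hash(b"# contents\n")
new = blob_hash(b"# contents\nThis is the contents\n- git\n- repositories\n")

print(old != new)  # True: the file was edited, so it gets a new blob
print(blob_hash(b"extra\n") == blob_hash(b"extra\n"))  # True: same bytes, same blob
print(blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
```

Note that the file name is not part of the blob. The name lives in the tree entry that points at the blob.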
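Also notice the parent line in the contents of the commit. It stores a reference to the parent commit.
The commit also has a pointer to its tree, and the tree in turn points to the blobs of that commit. Because everything is referenced by hash, unchanged objects can be reused instead of copied. This makes git efficient.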
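Following parent references is roughly how a history listing can be produced. The Python sketch below is a toy model: the `COMMITS` dictionary and the `history` function are made up for illustration, with the hashes and messages copied from the output above. Real commits can have more than one parent (merges), which this ignores.

```python
from typing import Dict, Iterator, Optional

# Toy stand-in for the object store; None marks the first commit (no parent).
COMMITS: Dict[str, dict] = {
    "47eca7ecdf05992a9dc3b2847d656b0c7e9b9806": {
        "parent": "c8429a0849a2c4d01ff42f1372d8d3020c812dbc",
        "message": "Added more contents",
    },
    "c8429a0849a2c4d01ff42f1372d8d3020c812dbc": {
        "parent": None,
        "message": "Initial Commit",
    },
}


def history(head: str) -> Iterator[str]:
    """Yield commit messages from head back to the first commit via parent links."""
    current: Optional[str] = head
    while current is not None:
        commit = COMMITS[current]
        yield commit["message"]
        current = commit["parent"]


print(list(history("47eca7ecdf05992a9dc3b2847d656b0c7e9b9806")))
# ['Added more contents', 'Initial Commit']
```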
Now let's add a new directory and file to the project.
mkdir newdir
echo "# newdir" > newdir/newdir.md
git add newdir
git commit -m "Added new directory"
git log -n 10
commit 97e9e24c36380caea90fddb98119c3c34ca62120 (HEAD -> master)
Author: brock <[email protected]>
Date: Mon Jan 27 12:22:32 2025 -0700
Added new directory
commit 47eca7ecdf05992a9dc3b2847d656b0c7e9b9806
Author: brock <[email protected]>
Date: Mon Jan 27 12:15:48 2025 -0700
Added more contents
commit c8429a0849a2c4d01ff42f1372d8d3020c812dbc
Author: brock <[email protected]>
Date: Mon Jan 27 12:15:21 2025 -0700
Initial Commit
git cat-file -p 97e9e24c36380caea90fddb98119c3c34ca62120
tree f7bc3073a09a103161931ab27698348490caa1eb
parent 47eca7ecdf05992a9dc3b2847d656b0c7e9b9806
author brock <[email protected]> 1738005752 -0700
committer brock <[email protected]> 1738005752 -0700
Added new directory
git cat-file -p f7bc3073a09a103161931ab27698348490caa1eb
100644 blob 28c6fb387f1737bf1e2871e13445e47ec46b40dc contents.md
100644 blob 0f2287157f7cb0dd40498c7a92f74b6975fa2d57 extra.md
040000 tree 42a788dc03a7461a6f24e00c8507758717efb859 newdir
Since the only new thing is newdir, notice that the hashes of contents.md and extra.md are the same as in the previous commit.
To rebuild the files for a commit, git walks the tree and creates the file referenced at each hash.
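The tree walk can be sketched with a small in-memory model in Python. Everything here is an assumption made for illustration: the ids are plain labels rather than real SHA-1 hashes, and `OBJECTS` and `materialize` are made-up names, not git's own implementation.

```python
# Toy object store. Ids are hypothetical labels, not real hashes.
OBJECTS = {
    "tree-root": ("tree", [
        ("blob", "blob-contents", "contents.md"),
        ("blob", "blob-extra", "extra.md"),
        ("tree", "tree-newdir", "newdir"),
    ]),
    "tree-newdir": ("tree", [("blob", "blob-newdir", "newdir.md")]),
    "blob-contents": ("blob", b"# contents\nThis is the contents\n- git\n- repositories\n"),
    "blob-extra": ("blob", b"extra\n"),
    "blob-newdir": ("blob", b"# newdir\n"),
}


def materialize(tree_id: str, prefix: str = "") -> dict:
    """Walk a tree recursively and return a {path: file bytes} mapping."""
    files = {}
    _, entries = OBJECTS[tree_id]
    for kind, object_id, name in entries:
        path = prefix + name
        if kind == "tree":
            # A tree entry is a subdirectory: descend into it.
            files.update(materialize(object_id, path + "/"))
        else:
            # A blob entry is a file: look up its bytes by id.
            files[path] = OBJECTS[object_id][1]
    return files


files = materialize("tree-root")
print(sorted(files))  # ['contents.md', 'extra.md', 'newdir/newdir.md']
```

Because entries are looked up by id, two commits whose trees list the same blob id share that blob rather than storing it twice.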