A Beginner's Guide to Git

git is currently one of the most popular DVCS (Distributed Version Control Systems) in use. Created by Linus Torvalds, known as the "father of Linux," in 2005, git not only offers powerful features but also embodies a spirit of resilience and independence. Its history reflects the humor and defiance of master developers and open-source advocates, something that has always impressed me deeply.

Below is a passage from the Chinese version of Wikipedia that describes the backstory:

In 2002, Linus Torvalds decided to use BitKeeper as the main version control system for maintaining Linux kernel code. Since BitKeeper was proprietary software, this decision was long criticized within the community. Particularly, Richard Stallman and members of the Free Software Foundation argued that an open-source tool should be used for the Linux kernel's version control. Linus considered using existing solutions like Monotone, but these tools had various issues, particularly with performance. Other systems like CVS were dismissed by Linus for their architecture.

In 2005, Andrew Tridgell wrote a simple program that could connect to BitKeeper repositories. Larry McVoy, the owner of BitKeeper, believed that Tridgell had reverse-engineered the protocol used by BitKeeper and decided to withdraw the free usage rights of BitKeeper. Negotiations between the Linux kernel development team and BitMover failed to resolve the differences. As a result, Linus decided to create his own version control system to replace BitKeeper, and in just ten days, he developed the first version of git.

git has since become more than just a version control tool for programmers; it's widely adopted and now serves as an essential collaboration tool for many projects. Whether you're gathering data or writing articles, knowing git is a valuable skill for both work and daily life.

This article follows my usual tutorial style, aiming to provide a quick introduction. For those who want to dive deeper, I highly recommend referring to the official documentation. Here are some useful resources:

Popular git hosting platforms:

Now, let's dive into the content. I hope this guide will be helpful, and feel free to email me if you spot any errors.

Step 1: Setting Up Your Identity

The first thing to do when using git is to set up your personal information, specifically your name and email address. This helps ensure that anyone reviewing your project can reach out if they encounter issues, and it also makes it easier to track contributions within a team. Additionally, it safeguards your work by properly attributing it to you. It's especially satisfying to see your name on open-source projects.

More importantly, the identity recorded in a commit isn't easy to change afterwards (especially once the commit has been pushed to a large or external project). I didn't realize this when I first started using git, so I didn't configure it. As a result, my git logs showed only my computer's name (automatically set by the system), which I found quite frustrating. Here's how to set your identity:

$ git config --global user.name "your name"
$ git config --global user.email "example@example.com"

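You can also configure different information for a specific project by running the same commands with --local inside that repo:

$ git config --local user.name "your name"
$ git config --local user.email "example@example.com"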

For more options, you can refer to the manual:

$ man git-config

To view your current settings, use:

$ git config -l
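
To see how a project-level setting overrides the global one, here is a minimal sketch. It assumes a throwaway directory from mktemp, and the name "Project Alias" and its email are placeholders, so your real configuration is not touched:

```shell
# Throwaway repository so that no real settings are changed.
demo=$(mktemp -d)
git init --quiet "$demo"

# --local writes to $demo/.git/config and applies to this repo only.
git -C "$demo" config --local user.name "Project Alias"
git -C "$demo" config --local user.email "alias@example.com"

# Without a scope flag, git prints the value that actually takes effect.
git -C "$demo" config user.name

# --show-origin lists the file each setting comes from.
git -C "$demo" config --list --show-origin
```

Inside this repo the local value wins over anything set with --global; other repos are unaffected.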

Now that your identity is configured, let's move on to actually using git.

Step 2: Creating or Cloning a Git Repo

You can work with git within a "git repository" (referred to as "repo" here). There are two ways to do this:

Creating a Local Repo

Creating a local repo is straightforward. First, navigate to the directory where you want to create the repo:

$ cd /my/git/repo

Then, run:

$ git init

This creates a .git subdirectory containing all the necessary information. You don't need to worry about what's inside.

Cloning from the Web

If you want to clone an existing repo from the web, start by navigating to the directory where you want to store the repo, and then run:

$ git clone <url>

Either method will set up a git environment, allowing you to begin using git.
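
Both ways can be tried safely in a scratch directory. The sketch below assumes a temporary directory from mktemp, and it uses a local path in place of <url>, since git can clone from a path as well as from a web address:

```shell
work=$(mktemp -d)

# Way 1: turn an ordinary directory into a repo.
mkdir "$work/fresh"
git -C "$work/fresh" init --quiet
git -C "$work/fresh" rev-parse --git-dir    # prints .git

# Way 2: clone. The local path stands in for <url>.
# (git warns that the source is empty; that is expected here.)
git clone --quiet "$work/fresh" "$work/copy"
ls -a "$work/copy"                          # the clone has its own .git directory
```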

Step 3: Recording Changes

Now we get to the core of git: recording changes. Tracking changes is the main purpose of any version control system (VCS). In git, every file in your working directory is either tracked or untracked. Tracked files are under git's control and are in turn unmodified, modified, or staged; untracked files are not recorded until you add them. The following diagram from the official git site clearly illustrates how a file's status changes within a git repo:

The lifecycle of the status of your files (source from git-scm.com)

To check the current status of your files, use:

$ git status

This command will show the status of your files. For a deeper understanding of what these statuses mean, I recommend checking out the official guide. My goal here is simply to help you get started quickly.

Back to the example: let's say we create a new file named README in our repo:

$ touch README

This file is now untracked, so if we want git to track it, we need to run:

$ git add README

This moves the file into the staged phase. To finalize the change, commit it with:

$ git commit

This command opens your default editor so you can write a commit message. Once done, the file moves to the unmodified state.

From this point on, whenever you modify a file, you can repeat these steps to track changes. Typically, you'll follow this pattern:

$ git add <file>
$ git commit

In most cases, you can simplify the process with:

$ git add -A  # Stages all changes (equivalent to --all)
$ git commit -m "one line commit"  # Creates a simple, one-line commit

These are the commands you'll use most often as a typical git user.
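
The whole cycle can be watched in a scratch repo. This is a sketch under a few assumptions: the directory comes from mktemp, and "Demo User" is a placeholder identity set only for this repo. git status --short prints a two-character code in front of each file name:

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Demo User"          # placeholder identity, local to this repo
git config user.email "demo@example.com"

touch README
git status --short        # ?? README   (untracked)

git add README
git status --short        # A  README   (staged)

git commit --quiet -m "Add README"
git status --short        # no output: README is tracked and unmodified

echo "hello" >> README
git status --short        #  M README   (modified, not yet staged)
```

From the last state, another git add and git commit bring the file back to unmodified.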

Step 4: Viewing and Editing History

After making several commits, you'll likely want to review the history of your changes. The git status command only shows the current state, so to view past commits, use:

$ git log

This command displays a list of all commits. If you pay attention, you'll notice each commit has a unique hash code, which acts as its identifier. To revert to a previous commit, you'll need this hash. Use the following command to reset your repo to a specific commit:

$ git reset <hash>

By default this is a gentle (mixed) reset: it moves the branch pointer and resets the staging area, but leaves the files in your working directory alone. For instance, a file that did not yet exist at the target commit becomes untracked again after the reset, but the file itself remains on disk. For a more forceful reset, I often use:

$ git reset --hard <hash>

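This variant also rewrites your working directory to match the target commit: uncommitted changes to tracked files are discarded, and tracked files that did not exist at the target commit are deleted. Be cautious, as discarded uncommitted changes cannot be recovered. Commits you moved away from can usually still be found through git reflog for a while.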
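
The difference between the two resets is easiest to see side by side. The sketch below assumes a scratch repo with two commits; the file names notes.txt and extra.txt and the identity are made up for illustration:

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "one" > notes.txt
git add notes.txt
git commit --quiet -m "First commit"
first=$(git rev-parse HEAD)               # hash of the first commit

echo "two" > extra.txt
git add extra.txt
git commit --quiet -m "Second commit"
second=$(git rev-parse HEAD)

git log --oneline                         # two entries, newest first

# Default reset: history moves back, files stay on disk.
git reset --quiet "$first"
git status --short                        # ?? extra.txt

# Go back to the second commit, then try the hard variant.
git reset --quiet --hard "$second"
git reset --quiet --hard "$first"
ls                                        # only notes.txt remains

# The second commit can still be reached by its hash.
git reset --quiet --hard "$second"
ls                                        # extra.txt is back
```

Saving the hashes in variables is only a convenience for the demo; in daily use you would copy them from git log.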

Step 5: Managing Remote Repositories

When people think of git, they often think of GitHub. As mentioned earlier, GitHub is the world's most popular git hosting platform. If you're working with git, you'll likely need to use such services, making it crucial to understand remote repo management.

Connecting to an Existing Remote Repo

Let's say you have an existing repo on GitHub. First, initialize a local repo (it can be empty) and run:

$ git remote add origin <URL>  # "origin" is the conventional name; you can choose another
$ git remote -v  # Verify the remote connection

Now your local repo is linked to the remote one. Next, pull the content from the remote repo:

$ git pull origin master  # The first part is the remote name, the second is the branch name (newer repos often use main)

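This command fetches the content from the remote repo and integrates it into your current branch. You might wonder how this differs from clone. In short, clone performs all of these steps at once: it creates a new directory, initializes a repo in it, registers the source as a remote named origin, and checks out the default branch. The remote add approach is useful when the local repo already exists.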
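
One way to check how clone relates to remotes is to clone a scratch repo and list its remotes. In this sketch the directories source and copy are hypothetical, with a local repo standing in for one hosted on GitHub:

```shell
work=$(mktemp -d)

# A local repo with one empty commit stands in for a hosted repo.
git init --quiet "$work/source"
git -C "$work/source" -c user.name="Demo User" -c user.email="demo@example.com" \
    commit --quiet --allow-empty -m "Initial commit"

git clone --quiet "$work/source" "$work/copy"

# The clone already knows where it came from.
git -C "$work/copy" remote -v               # lists origin for fetch and push
git -C "$work/copy" remote get-url origin   # prints the path of the source repo
```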

Once you've made changes, you can push them back to the remote repo with:

$ git push origin master

Forcing a Pull to Overwrite Local Files

Reference: Git force pull to overwrite local files

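Sometimes you may want to discard your local state and make your branch match the remote exactly. In such cases, use the following, keeping in mind that reset --hard drops uncommitted changes and removes unpushed local commits from your branch: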

$ git fetch --all
$ git reset --hard origin/master
$ git pull origin master
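
To see what the first two commands do, here is a sketch in which a local repo named server plays the remote. All names and file contents are invented for the demo, and the identity is supplied through git's standard environment variables:

```shell
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User"
export GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User"
export GIT_COMMITTER_EMAIL="demo@example.com"

# A local repo plays the role of the remote server.
git init --quiet "$work/server"
cd "$work/server"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
echo "v1" > file.txt
git add file.txt
git commit --quiet -m "v1"

git clone --quiet "$work/server" "$work/local"

# The server moves on...
echo "v2" > file.txt
git commit --quiet -am "v2"

# ...while the clone gets a local commit and an uncommitted edit.
cd "$work/local"
echo "local edit" > file.txt
git commit --quiet -am "local-only commit"
echo "scratch" >> file.txt

git fetch --quiet --all
git reset --quiet --hard origin/master
cat file.txt                              # v2
```

Both the local commit and the uncommitted edit are gone from the branch afterwards, which is exactly why this sequence deserves care.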

Pushing an Existing Local Repo

If you already have a local repo, you'll need to first create an empty repo on GitHub. Then, link it to your local repo:

$ git remote add origin <URL>
$ git add .
$ git commit -m "Initial commit"
$ git push origin master

Since you'll likely be pushing often, you can add -u (short for --set-upstream), which records origin/master as the upstream of your local branch:

$ git push -u origin master

After that, future pushes require only:

$ git push
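
The effect of -u can be tried without a GitHub account. This sketch assumes a bare repo in a temporary directory as a stand-in for the empty repo created on GitHub; the paths and identity are placeholders:

```shell
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User"
export GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User"
export GIT_COMMITTER_EMAIL="demo@example.com"

# A bare repo stands in for the empty repo on the hosting platform.
git init --quiet --bare "$work/remote.git"

git init --quiet "$work/project"
cd "$work/project"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
echo "hello" > README
git add .
git commit --quiet -m "Initial commit"

git remote add origin "$work/remote.git"
git push --quiet -u origin master

# The upstream is now recorded for this branch.
git rev-parse --abbrev-ref "master@{upstream}"    # origin/master

# Later pushes no longer need the remote and branch names.
echo "more" >> README
git commit --quiet -am "Second commit"
git push --quiet
```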

Conclusion

Congratulations! You've reached the end of this guide. The content covered here is just a basic introduction to git, aimed at helping you get started quickly. However, mastering git is a long journey, and simply digesting this guide doesn't make you an expert. As with many Unix tools, becoming proficient requires continuous learning and practice. I wish you smooth sailing as you explore the world of git!