Create your first pull request, and learn how to collaborate efficiently on software projects with Git remotes and GitHub.
Collaborating with git can be quite intimidating at first...
This is due to several reasons :
After reading this guide, you will have a clearer picture of all this, and you will be able to collaborate smoothly with fellow developers on great projects!
This article is the second part of my series about Git. If you haven't read the first part, Git : Overcome your Fears, please take a look now to make sure you understand the basic concepts of Git.
For this tutorial, I have chosen GitHub as a remote git platform because it is accessible to everybody, and because most open-source projects are on GitHub.
But if you're using a different platform such as GitLab, don't worry, all platforms are very similar.
There's no magic here : a remote repository is simply another git repository.
The main difference with your local repository is that the remote is hosted somewhere on the internet, and not on your local machine. Most often, remotes are either hosted on github.com or on a private GitLab server belonging to an organization.
Apart from that, remotes are standard git repositories: they have commits, branches, tags, and so on.
Remotes are needed to share code.
For example, let's assume that Guido and Bjarne want to collaborate on some new software project called nohtyp. They have set up a remote repository on GitHub that would host the official version of the code.
To provide a new feature, Guido has:
At the same time, Bjarne has implemented another new feature. To contribute, he needs to:
With this simple workflow, Bjarne and Guido can exchange code and build their project together, without ever connecting to the other person's machine. Also, each of them is still in full control of his own local repository.
But this workflow should never be used, because it is not safe: it requires giving every developer the right to push to the official repository. With that right, any developer could wreak havoc in the repository, for example by deleting important branches.
In the next section, we will see how this security issue is avoided by using forks and pull requests.
If we forbid people to push to the official repository, how can they contribute?
With forks and pull requests!
Here is a modified git workflow:
This time, each developer has his own remote repository on GitHub, to which he is free to push.
Here is how these remote repositories were created:
When a new developer wants to start contributing to the project, he starts by "forking" the official repo to his GitHub account. Essentially, this just means that a copy of the official repo is created on his account, and that he takes full ownership of the copy. In this process, GitHub keeps track of the link between the fork and the mother repository, which is used later for pull requests.
After the copy is done, the fork and the official repo can (and will) diverge.
Please note that the GitHub repository URL indicates the name of the repository, and the GitHub account that owns the repository.
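To make the structure of these URLs concrete, here is a minimal Python sketch that splits a GitHub URL into the owning account and the repository name. The helper split_github_url is hypothetical (it is not part of git or of any GitHub tool), and it only assumes the two URL shapes used in this article, SSH and HTTPS.

```python
import re

# Hypothetical helper, not part of git or GitHub tooling: it only
# illustrates where the account and the repository name sit in the URL.
_GITHUB_URL = re.compile(
    r"^(?:git@github\.com:|https://github\.com/)"
    r"(?P<owner>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$"
)


def split_github_url(url):
    """Return (owner, repository) for an SSH or HTTPS GitHub URL."""
    match = _GITHUB_URL.match(url)
    if match is None:
        raise ValueError(f"not a recognised GitHub repository URL: {url}")
    return match.group("owner"), match.group("repo")


# The official repository used later in this article, in both URL forms.
print(split_github_url("git@github.com:cbernet/datafrog_git_test.git"))
print(split_github_url("https://github.com/cbernet/datafrog_git_test"))
```

Both calls give the same account and repository name. A fork keeps the repository name and only changes the account part.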
The official repository is typically owned either by an individual or a GitHub organization. Here, the organization is called "bg". The official repository is administered by people with extended rights (and maybe Guido and Bjarne are administrators).
With this architecture, the development workflow is the following.
Guido:
Official repository administrators:
Bjarne:
So what is a pull request ?
Instead of pushing his new commits from his remote to the official repository, Guido opens a pull request (PR): he tells the administrators of the official repository that he would like them to merge his new commits into the repo.
This is a much less intrusive operation, since the administrators can review and refuse the PR if they so wish.
The details will become clear below when you send your first pull request.
But before that, let's get our tools ready.
Create your GitHub account if you don't already have one.
To interact with your remote repositories, you will need to connect to github with ssh from your local machine.
Follow the official ssh instructions from GitHub to set this up. This is going to take you a bit of time now, but it's really needed.
Test your ssh connection to GitHub:
$ ssh git@github.com
PTY allocation request failed on channel 0
Hi cbernet! You've successfully authenticated, but GitHub does not provide shell access.
Connection to github.com closed.
Finally, check your git configuration, which is in the file .gitconfig in your home directory.
You should make sure that your name, your email, and your github username are correct:
[user]
name = Colin Bernet
email = contact@thedatafrog.com
github = cbernet
It's important to do that, because your name and email will be included in all of your commits, so that your collaborators can see that a commit is from you.
For this exercise, I've provided a test github repository, which will serve as your "official" repository. And I'll be the repository administrator!
In this way you can exercise with your first pull requests in a safe and friendly environment 😉
I actually hope this guide won't attract too many people ... or it will be a lot of work for me.
First, log in to your github account.
Then, go to https://github.com/cbernet/datafrog_git_test
Now, fork the repository by clicking on the fork button at the top right:
This will create your own copy of this repository and bring you to its page.
Note how the page url changed from https://github.com/cbernet/datafrog_git_test to https://github.com/<your_github_username>/datafrog_git_test.
Now that this is done, you will clone your fork to your local machine. This will:
To do this, click on the Code button and select SSH. Then copy the URL and use it to clone the repo like this:
git clone git@github.com:<your_github_username>/datafrog_git_test.git
Finally, enter the working directory of your local repository:
cd datafrog_git_test/
In this directory, you will find a simple script, hello.py, that you're going to modify later:
def great(people):
    for p in people:
        print(f'hello {p}')

if __name__ == '__main__':
    everybody = [
        'colin'
    ]
    great(everybody)
Ok, I should have written greet and not great :-) it would be a pain in the neck to correct this at this stage, so I'll leave it like this. Apologies!
Run the script:
python hello.py
Many people are going to collaborate on this script, so don't be surprised if it has evolved. The code shown above is the original version, from commit bdfe1a4.
Now check the history with git l (the alias that we defined in Git : Overcome your Fears):
git l
* bdfe1a4 17 seconds ago Colin Bernet (HEAD -> main, origin/main, origin/HEAD)
| add colin
|
* b4fb697 17 minutes ago Colin Bernet
Initial commit
List the remotes connected to your local repository:
$ git remote -v
origin git@github.com:<your_github_username>/datafrog_git_test.git (fetch)
origin git@github.com:<your_github_username>/datafrog_git_test.git (push)
We see that git@github.com:<your_github_username>/datafrog_git_test.git is connected as a remote, with local name origin.
This was done automatically by git clone.
You can also establish remote connections easily after the fact.
Add the official repository as another remote, with name colin:
git remote add colin git@github.com:cbernet/datafrog_git_test.git
If you print your remotes again, you now see:
colin git@github.com:cbernet/datafrog_git_test.git (fetch)
colin git@github.com:cbernet/datafrog_git_test.git (push)
origin git@github.com:<your_github_username>/datafrog_git_test.git (fetch)
origin git@github.com:<your_github_username>/datafrog_git_test.git (push)
You could remove this remote with git remote rm colin, but don't do it, we will need it.
And there is nothing special about the origin remote: you could remove it as well, and re-add it with a different name if you prefer, as long as you know its URL (you can pick it up on its GitHub page).
Before starting to work on a new feature, I suggest always updating to the latest version of the official code.
Since you just forked and cloned, your version of the code is probably identical to the version of the official repository. But you can't be sure, so let's go through the update process.
First, we fetch the state of the official repository:
git fetch colin
This command retrieves:
Here is the git fetch documentation.
Important note: git fetch does not modify your working directory or your local branches. It only gathers remote information, by downloading the new commits and updating the remote-tracking branches such as colin/main. So this command is completely safe. Don't be afraid to use it often, so that you know what's going on remotely!
Before doing anything, check the history with git l to see what you're going to merge into your version.
Now that you have all the necessary information from the remote, you can merge the remote branch into your current branch:
git merge colin/main
After that, your current branch contains all commits of colin/main in its history. This is good, you can now start building your new feature on top of the official version.
You could also start developing your new feature before merging, it makes no difference.
Important note 2: some people use git pull, which is the equivalent of a fetch followed by a merge. For example:
git pull colin main
does the following under the hood:
git fetch colin main # fetch only the main branch, with its commits and tags
git merge colin/main # merge into current branch
I only do that if I know exactly what's on the remote branch. Usually, this is the case if I pull from my own remote. When I want to get code from another remote that I don't control, I always do a fetch first, and I only merge once I know what I'm going to be merging.
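The difference between fetching and merging can be pictured with a toy model in Python. This is only a sketch under strong assumptions: branches are reduced to a dictionary mapping a name to a commit id, the ids c1 and c3 are made up, and the merge is assumed to be a simple fast-forward. Real git also copies the commit objects and can create merge commits.

```python
# Toy model only: refs are plain dictionaries mapping a branch name to a
# commit id. The commit ids "c1" and "c3" are invented for illustration.

def fetch(local_refs, remote_name, remote_refs):
    """Copy the remote's branches into remote-tracking refs."""
    updated = dict(local_refs)
    for branch, commit in remote_refs.items():
        updated[f"{remote_name}/{branch}"] = commit
    return updated


def fast_forward(local_refs, branch, tracking_ref):
    """Move a local branch to the commit of a remote-tracking ref."""
    updated = dict(local_refs)
    updated[branch] = local_refs[tracking_ref]
    return updated


official = {"main": "c3"}                   # branches on the remote
mine = {"main": "c1", "colin/main": "c1"}   # refs in the local repository

after_fetch = fetch(mine, "colin", official)
print(after_fetch)   # main is unchanged, only colin/main has moved

after_merge = fast_forward(after_fetch, "main", "colin/main")
print(after_merge)   # main now points to the same commit as colin/main
```

In this picture, a pull is simply the two calls chained together, which is why fetching first leaves you a chance to look before your own branch moves.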
After merging, test the code again. If the package contains unit tests (and it should), run them. If there are executable scripts, run them. Here, we can run our small script:
python hello.py
We're now ready to start developing a new feature.
In this section, you will modify hello.py to add your name.
Open this file with your editor, and add your name to the list everybody, as shown below:
    everybody = [
        'colin',
        'your_name'
    ]
    great(everybody)
Test your changes by running the script. If the package features unit tests, run them.
Advice: if you can, make sure that the tests pass before each commit.
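The test repository does not necessarily come with unit tests, so here is a minimal sketch of what one could look like for the script. Everything here is an assumption made for illustration: the helper captured_output and the test function names are hypothetical, and the function great is copied into the block so that it runs on its own (in a real test file you would import it from hello.py instead).

```python
import contextlib
import io


def great(people):
    # Copy of the function from hello.py, repeated so the sketch runs alone.
    for p in people:
        print(f'hello {p}')


def captured_output(people):
    """Run great() and return everything it printed as one string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        great(people)
    return buffer.getvalue()


def test_one_line_per_person():
    assert captured_output(['colin', 'your_name']) == 'hello colin\nhello your_name\n'


def test_empty_list_prints_nothing():
    assert captured_output([]) == ''


# Call the tests directly; a test runner could also collect them.
test_one_line_per_person()
test_empty_list_prints_nothing()
print('all tests passed')
```

A check like this takes a second to run, and it tells you that adding your name did not break the output format.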
If the tests succeed, you can commit:
git commit -am 'added my name'
Check your commit history with git l:
* 45ae389 2 seconds ago First Last (HEAD -> main)
| added my name
|
* bdfe1a4 4 days ago Colin Bernet (origin/main, origin/HEAD, colin/main)
| add colin
|
* b4fb697 4 days ago Colin Bernet
Initial commit
We see that the last commit is one commit above origin/main, and one commit above colin/main.
Push the main branch to your fork (the origin remote):
git push origin main
And use git l again to check the history:
* 45ae389 31 minutes ago First Last (HEAD -> main, origin/main, origin/HEAD)
| added my name
|
* bdfe1a4 4 days ago Colin Bernet (colin/main)
| add colin
|
* b4fb697 4 days ago Colin Bernet
Initial commit
The push moved origin/main in your fork to the same commit as the local branch main. In the process, the needed commit (45ae389) was copied to your fork.
At this point, the code of origin/main in the remote is exactly the same as the code of the local branch main, since both branches point to the same commit.
But the code of the official repository (colin/main) did not change.
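To see what "one commit above" means, the history can be modelled in Python as a chain of parent links. This is a toy sketch that assumes a linear history with a single parent per commit. The commit ids are the ones from the log above, while the functions ancestors and commits_ahead are hypothetical names. With real git, git log colin/main..main lists the same commits.

```python
# Toy history: each commit id maps to its parent (None for the first one).
parents = {
    "b4fb697": None,
    "bdfe1a4": "b4fb697",
    "45ae389": "bdfe1a4",
}


def ancestors(commit):
    """Return the commit followed by everything reachable from it."""
    seen = []
    while commit is not None:
        seen.append(commit)
        commit = parents[commit]
    return seen


def commits_ahead(branch_tip, other_tip):
    """Commits reachable from branch_tip but not from other_tip."""
    known = set(ancestors(other_tip))
    return [c for c in ancestors(branch_tip) if c not in known]


# Where each branch points after the push.
refs = {"main": "45ae389", "origin/main": "45ae389", "colin/main": "bdfe1a4"}

print(commits_ahead(refs["main"], refs["origin/main"]))  # nothing left to push
print(commits_ahead(refs["main"], refs["colin/main"]))   # what a pull request would offer
```

The second list is exactly what the administrators will be asked to accept.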
We want to make our changes official, so it's time for a pull request to the official repository!
Go to your fork on github. You should see something like this:
In particular you can see the status of the main branch in the fork with respect to cbernet/main, which is the same branch in the official repository, the mother of your fork.
Click on Contribute and open a pull request.
This opens a new window:
Take the time to review carefully all the information on this page:
The pull request goes from the main branch of thedatafrog/datafrog_git_test (the source branch) to the main branch of cbernet/datafrog_git_test (the destination branch). You could change all this, for example if you wanted to do a PR to the fork of a colleague who is also working on this project.
Then click on "Create pull request".
This brings you to a new form in which you must give a title to the pull request and provide some information as shown below. Then click on create pull request.
At this point:
That's it, you're done!
The administrators might ask you to make changes to your pull request. To do this, you can simply update the source branch in your fork by:
When you do that, your new commits will be added to the PR automatically.
You should always remember that repository administrators are people like you and me.
They are certainly quite busy, and they often maintain open source packages in their spare time, as a contribution to the community. Also, it could be that they don't even know you, let alone trust you.
So the first thing to do is to make sure that your PR is even wanted.
In any case, check the official repository to see if there are any instructions explaining how to contribute. If there are, follow them, or your PR will be ignored.
In general, I would suggest doing the following:
When you're done, and before you submit the PR, merge the latest changes from the official repo into your local repo, and check carefully that everything is ok:
After submitting the PR, be kind to the admins, and answer their questions as well as you can. Be responsive.
In this article, you have learnt how to collaborate with others using git remotes. You have :
Although we used GitHub as a remote git platform, the process is essentially the same on other platforms such as GitLab. The main thing that changes is the web interface of the platform.
In the next article, I will tell you about a few advanced features of git, stay tuned!
Please let me know what you think in the comments! I’ll try and answer all questions.
And if you liked this article, you can subscribe to my mailing list to be notified of new posts (no more than one mail per week I promise.)