Mirroring With Gitea

Following on the heels of my last post on why you should (not) self-host your git repos, I went ahead and used Gitea to set up a local mirror of all my repositories, plus all the other repositories I don't want to lose access to.

The results were surprising, and after reading this, you might want to do the same. This post is a quick overview of how I did it, some tips that'll help, and what I learned as a result.

Installation was pretty straightforward. I grabbed my old Raspberry Pi 3B and followed this spectacular step-by-step guide to installing Gitea. Afterwards, a friend pointed out that I could have just installed it on our Synology with a Docker image.

The Goal

My goal was two-fold:

  1. I wanted a backup of all my repos. Many of them only existed on GitHub.
  2. I wanted to delete every fork I'd made just to guarantee I had a copy.

The Process

With that in mind, I started working through my list of repositories.[1] Gitea makes it trivial to mirror a repo. You just choose "new migration" from the "+" menu, and then click on the service you're migrating from. If you're importing a lot of repos,[2] three things will make your life easier here:

  1. Bookmark the form that comes up after choosing a service to migrate from.
  2. If you're going to be importing a lot, you'll probably want to use a personal access token (GitHub docs). That way you won't get rate-limited by the API, unless you're migrating (not mirroring) something with many thousands of issues.
  3. Use a clipboard manager. Gitea doesn't remember your API key, so you'll be pasting it into every form, along with the URL of the repo you're importing. I use Maccy, and I recommend it.
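Everything in this post was done through the web form. If you'd rather script a bulk import, Gitea also offers a migration endpoint in its REST API. What follows is a minimal sketch, assuming a Gitea version that exposes `POST /api/v1/repos/migrate` (check your instance's `/api/swagger` page to confirm the field names). The host name, owner and both tokens are hypothetical placeholders. It covers roughly the same choices the form asks for: the clone URL, the GitHub token, and whether to mirror or do a one-off import.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own instance, user and tokens.
GITEA_URL = "http://raspberrypi.local:3000"
GITEA_TOKEN = "gitea-token-here"
GITHUB_TOKEN = "github-pat-here"
OWNER = "me"


def migration_payload(clone_url: str, owner: str, mirror: bool = True,
                      github_token: str = "") -> dict:
    """Build the JSON body for Gitea's repo-migration endpoint (assumed fields)."""
    # Use the last path component, minus any trailing ".git", as the repo name.
    name = clone_url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[:-4]
    payload = {
        "clone_addr": clone_url,
        "repo_name": name,
        "repo_owner": owner,
        "service": "github",
        "mirror": mirror,  # True: keep polling upstream; False: one-off import
    }
    if github_token:
        payload["auth_token"] = github_token  # helps avoid GitHub API rate limits
    return payload


def migrate(payload: dict) -> int:
    """POST one migration request to Gitea and return the HTTP status code."""
    req = urllib.request.Request(
        f"{GITEA_URL}/api/v1/repos/migrate",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"token {GITEA_TOKEN}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

To use it, loop over your list of clone URLs and call `migrate(migration_payload(url, OWNER, github_token=GITHUB_TOKEN))` for each one. Pass `mirror=False` for archived upstreams that will never change.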

For each repo in my list of repos on GitHub, I'd make a couple of quick decisions.

  • for things I authored

    • If it was something I'm either maintaining or may poke at in the future, I'd just grab the URL and tell Gitea to mirror it.
    • If it was something I wandered off from years ago, and am unlikely to ever touch again, I'd mark it as archived.
  • for the backups of other people's things

    • I'd click through and see if I'd added anything, or if it was just an old copy.

      • If I'd added something, and the original repo hadn't been touched, I'd mirror my fork.
      • If I'd added something, but the original had moved forward significantly, I just abandoned my changes and mirrored theirs.
      • If mine was just an ancient copy of a repo that'd moved on, I'd go to the parent repo.

        • If the parent repo was archived I'd just tell Gitea to import it but not bother mirroring. There aren't going to be updates.
        • If the parent repo indicated that it too was a mirror, I'd follow that back to the source repo and mirror that instead.
        • After successfully mirroring I'd delete my copy.
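The fork checklist above boils down to a small decision function. This is a hypothetical sketch, not something the post actually ran: the `Fork` fields are judgments I'm assuming you make by eye while clicking through each repo, and "moved on" stands in for "moved forward significantly".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fork:
    """Hypothetical summary of one fork, filled in by hand while reviewing it."""
    added_changes: bool        # did I commit anything of my own?
    upstream_moved_on: bool    # has the original moved forward significantly?
    upstream_archived: bool    # is the parent repo archived?
    upstream_mirror_of: Optional[str] = None  # source URL if the parent is itself a mirror


def decide(fork: Fork) -> str:
    """Return the action for one fork, following the checklist."""
    if fork.added_changes:
        # My changes only survive if upstream hasn't left them behind.
        return "mirror parent" if fork.upstream_moved_on else "mirror my fork"
    # Just an old copy: go to the parent repo instead.
    if fork.upstream_archived:
        return "import parent without mirroring"  # no updates are coming
    if fork.upstream_mirror_of:
        return f"mirror {fork.upstream_mirror_of}"  # follow it back to the source
    return "mirror parent"
```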

The Learnings

(In no particular order)

First off, Gitea is really impressive. Those devs have done a great job and should be proud of what they've built.

Along the way, I discovered that many of the repos that I'd forked had moved on significantly. I had copies that were so old they probably wouldn't run with modern versions of things. They were also frequently missing lots of new features and bug fixes.

While this isn't terribly surprising if you think about it, the notable learning is that it's never going to be a problem again, because Gitea is mirroring them and regularly polling them for updates. If they disappear, I'll have the latest copy from just before they went away. GitHub doesn't give me that option.

I'm not the only one forking repos just to have a copy. Multiple times I clicked through to the parent repo and got curious. I'd see hundreds of forks, click into the Insights page, and see that essentially no one was hacking on it except the original authors. I guess lots of folks have been burned when they discovered that some repo they'd starred or bookmarked was gone by the time they went back to check it out.

Mirroring isn't something you can turn on later in Gitea. You must set a repo up as a mirror from the beginning.

Gitea can't mirror everything. Some repos (like this great command-line time zone tool) always error out, and Gitea gives no information about why it failed to clone them. I think I encountered three repos like that. I also can't figure out how to clone anything from Sourcehut.[3] Even when I tried to clone their repos manually, the HTTP clone links always errored out.

I realized that if I mirror not just the tools I author and want backups of, but also the repos of tools whose source code I need to reference regularly, then I have a single place to search that contains only stuff relevant to me. For example, I often know that one of the libraries I use frequently has a piece of functionality, but can't remember which library. Searching Google or GitHub would be frustrating, because there are just too many results. But now I can just search my Gitea install.

The experience of doing this was very cathartic for me. I now know that I have up-to-date copies of all the useful tools I care about. I was also able to delete about 100 repositories from my list of repositories on GitHub. Now, if a future employer checks them out, they're almost all things I have actively worked on to some degree. It's also nice to know that from now on I have a trivially easy way to mirror any useful tool I care about. No more ancient, frozen-in-time forks, and no more bookmarks and stars that cease to exist.

My next step? Going through all the repos I've bookmarked but not forked, and seeing which of those are worth mirroring.

Footnotes


1

Before working through your list of repositories, be sure to sort them by name. Mine were sorted by recency and that made things a little harder towards the end.

2

For a sense of scale I ended up mirroring or migrating 169 repos into Gitea.

3

In addition to having a visual style from the 1990s and UX from hell, Sourcehut appears to also just not fucking work as a git host. I don't understand why anyone uses them, never mind why anyone pays for them.