+++
title = "My personal self-hosting journey"
date = "2023-02-12"
slug = "self-hosting"
+++

In this blog post, I'll delve into my personal experience of discovering and embracing self-hosting, and explore the advantages of this approach and why it might be worth considering for you.

For many years, I relied on free services such as Google Drive, Google Docs, Dropbox, and GitHub, and was very happy with them. I often wondered how companies could offer these services for free. It wasn't until later that I discovered the answer. Companies like Google, Microsoft, and Apple collect the data of their users and sell it to advertisers, who use it to target manipulative ads more accurately at people who are more likely to be influenced by them.

Moreover, users must trust both the service providers and the data purchasers to secure their information, prevent unauthorized access, and refrain from reselling it to entities that may pose a risk to the users' privacy. Historically, there have been numerous instances where both the service providers and data purchasers have proven to be unreliable and untrustworthy in their pursuit of higher profits.

This is where self-hosting comes into play. Instead of relying on corporations to store your data, you store it yourself, which gives you complete control over your information and the ability to secure it to your standards. It also gives you access to features that these corporations often restrict behind paywalls.

After learning about data collection, I was motivated to find a way to stop it. Initially, I tried using other service providers that claimed to prioritize privacy, but I soon realized that they couldn't guarantee their claims and often turned out to be just as problematic as other providers. That's when I began exploring the option of self-hosting and delving into the necessary knowledge, such as Linux servers, the internet, and various internet protocols. With time, I felt confident in my understanding and decided to take the leap into self-hosting.

At 14 years old, I saved up some birthday money and bought 5 Raspberry Pi 4s. I set them up in my room and started trying things. At first, it was rough. I failed at all my attempts, but I was determined to continue, and eventually, I got my first service reachable from the internet. With that experience, I continued setting up more services. However, I eventually faced a problem: the Raspberry Pi models I had purchased only had 1 GB of RAM, and I used 16 GB microSD cards for storage, which was insufficient for hosting some of the services I wanted. The limited RAM also prevented me from using orchestration tools such as Kubernetes, as even the stripped-down variants required more RAM to function properly.

After a while, I decided to expand my cluster, so I bought 5 more Raspberry Pi 4s, this time with 2 GB of RAM, as well as a 24-port network switch and an extremely long Ethernet cable, using money I got from New Year gifts. With the help of my dad, I ran the Ethernet cable from our router to my room. On the router, I installed OpenWrt, a custom firmware for routers, and used its VLAN features to isolate my servers from the rest of the network and protect them from potential attacks related to WiFi and IoT devices. My dad also helped me construct a glass display case, which I used to house all my servers.

The extra power from the new Raspberry Pis allowed me to host even more services, but eventually, I ran into issues where certain services were designed to run on x86 CPUs such as those made by Intel and AMD, while Raspberry Pis use ARM CPUs like the ones often used in mobile devices. Around this time, I built myself a new desktop computer, which meant I no longer needed to use my 2012 Mac Mini with a broken drive. I replaced the old HDD with an SSD, upgraded the RAM to 16 GB, and added it to my cluster.

The addition of an x86 system with lots of RAM and lots of fast storage allowed me to host services that previously would've been difficult or impossible to host. The Mac Mini ran services such as Nextcloud, OnlyOffice, and Gitea to replace Google Drive, Google Docs, and GitHub respectively. These services required lots of RAM and storage, so they could not be run on my Raspberry Pis. It also ran services such as Invidious, which is written in the Crystal programming language and couldn't be compiled for ARM at the time.

However, as my cluster continued to grow, I found it increasingly difficult to manage and keep track of all the devices and the various services they were hosting. The typical solution to this problem is an orchestration tool like Kubernetes. That wasn't an option for me, though, since some of my nodes only had 1 GB of RAM, and even the most lightweight Kubernetes distributions consume a substantial amount of it.

Eventually, while looking for a completely different tool, my friend found and showed me a tool called Nomad. I looked at its documentation and discovered that it was much more lightweight than Kubernetes because it omitted features that were unnecessary for many clusters. I successfully set up a Nomad server on my Mac Mini and effortlessly added my other servers as clients. With minimal setup, everything was up and running. While exploring Nomad, I discovered two additional powerful tools, Consul and Traefik. Consul keeps track of the services running and their metadata, while Traefik acts as a reverse proxy, dynamically communicating with Consul to discover services. I incorporated these tools into my Nomad cluster, enhancing its functionality.
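To make the relationship between a service catalog and a reverse proxy more concrete, here is a minimal Python sketch. It is a toy model and not how Consul or Traefik are actually implemented: the `Catalog` class, the `build_routes` function, the `proxy.enable` tag, the addresses, and the `example.com` domain are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ServiceInstance:
    name: str      # logical service name, e.g. "gitea"
    address: str   # IP of the node the instance runs on (made up here)
    port: int      # port the instance listens on
    tags: list[str] = field(default_factory=list)


class Catalog:
    """Toy stand-in for a service catalog (hypothetical, not Consul's API)."""

    def __init__(self) -> None:
        self._instances: list[ServiceInstance] = []

    def register(self, instance: ServiceInstance) -> None:
        self._instances.append(instance)

    def deregister(self, name: str, address: str) -> None:
        # Drop the instance of `name` that was running at `address`
        self._instances = [
            i for i in self._instances
            if not (i.name == name and i.address == address)
        ]

    def instances(self) -> list[ServiceInstance]:
        return list(self._instances)


def build_routes(catalog: Catalog, domain: str) -> dict[str, list[str]]:
    """Derive a hostname -> backends table from whatever is in the catalog."""
    routes: dict[str, list[str]] = {}
    for inst in catalog.instances():
        if "proxy.enable" not in inst.tags:  # hypothetical opt-in tag
            continue
        host = f"{inst.name}.{domain}"
        routes.setdefault(host, []).append(f"{inst.address}:{inst.port}")
    return routes


catalog = Catalog()
catalog.register(ServiceInstance("gitea", "10.0.0.5", 3000, ["proxy.enable"]))
catalog.register(ServiceInstance("minio", "10.0.0.6", 9000))  # not exposed
routes = build_routes(catalog, "example.com")
```

The point of the sketch is that the proxy holds no hand-written list of backends. It rebuilds its routing table from the catalog, so a service that registers or deregisters changes the routes without anyone editing proxy configuration.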

I quickly began converting the services I was running to Nomad services, fascinated by the extremely high level of automation. I was able to simply write a configuration file describing the service I wanted to run, and Nomad would automatically figure out which server would be optimal for that service, download it on that server, configure it, run it, and then manage it while it ran. Then, Nomad sent the metadata to Consul, from where Traefik was able to pick it up and automatically reconfigure itself to proxy requests to that service and acquire a TLS certificate.
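As a rough illustration of what "figuring out which server is optimal" can involve, here is a small Python sketch of a placement function. It is a simplified bin-packing heuristic written for this post's scenario, not Nomad's actual scheduler; the `Node`, `Job`, and `place` names, the node names, and all memory figures are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    arch: str          # CPU architecture, e.g. "arm64" or "amd64"
    free_mem_mb: int   # memory not yet reserved by other tasks


@dataclass
class Job:
    name: str
    mem_mb: int                  # memory the task asks for
    arch: Optional[str] = None   # required architecture, if any


def place(job: Job, nodes: list[Node]) -> Optional[Node]:
    """Pick a node for the job and reserve its memory; None if nothing fits."""
    feasible = [
        n for n in nodes
        if n.free_mem_mb >= job.mem_mb
        and (job.arch is None or n.arch == job.arch)
    ]
    if not feasible:
        return None
    # Bin-packing heuristic: prefer the node that ends up with the least
    # spare memory, which keeps big nodes free for big jobs.
    best = min(feasible, key=lambda n: n.free_mem_mb - job.mem_mb)
    best.free_mem_mb -= job.mem_mb
    return best


# Hypothetical cluster loosely modelled on the one described in this post
nodes = [
    Node("pi-1", "arm64", 700),
    Node("pi-6", "arm64", 1700),
    Node("mac-mini", "amd64", 14000),
]
small = place(Job("homer", mem_mb=128), nodes)
heavy = place(Job("nextcloud", mem_mb=2048), nodes)
x86_only = place(Job("invidious", mem_mb=512, arch="amd64"), nodes)
```

In this toy cluster the small job lands on the 1 GB Pi, while the memory-hungry and x86-only jobs can only fit on the Mac Mini. A job that fits nowhere gets `None`, which a real scheduler would treat as pending.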

I was impressed by Nomad's resilience, which I discovered when I intentionally shut down one of the servers that was running various services. Within seconds, Nomad stepped in and rescheduled the services on a different server, downloading and configuring them, and updating the Consul data. This prompted Traefik to reconfigure itself, and in no time, everything was back up and running seamlessly with no intervention from me.
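The failover behaviour can be sketched in a few lines of Python. This is only a toy model of the idea, assuming each node has a fixed number of free "slots"; the `reschedule` function, the node names, and the slot counts are hypothetical, and Nomad's real logic is considerably more involved.

```python
from __future__ import annotations

from typing import Optional


def reschedule(
    placements: dict[str, str],
    free_slots: dict[str, int],
    failed_node: str,
) -> dict[str, Optional[str]]:
    """Return new placements with every service moved off the failed node."""
    new_placements: dict[str, Optional[str]] = dict(placements)
    # Only surviving nodes can accept work
    slots = {n: s for n, s in free_slots.items() if n != failed_node}

    for service, node in sorted(placements.items()):
        if node != failed_node:
            continue  # service is unaffected
        candidates = [n for n, s in slots.items() if s > 0]
        if not candidates:
            new_placements[service] = None  # nothing free: stays pending
            continue
        # Most free slots first; ties broken by node name for determinism
        target = min(candidates, key=lambda n: (-slots[n], n))
        slots[target] -= 1
        new_placements[service] = target
    return new_placements


placements = {"searxng": "pi-2", "homer": "pi-2", "gitea": "mac-mini"}
free_slots = {"pi-2": 0, "pi-3": 1, "mac-mini": 1}
after = reschedule(placements, free_slots, failed_node="pi-2")
```

Here both services from the failed `pi-2` get new homes on the surviving nodes, while `gitea` stays where it was. In the real cluster, the updated placements would then flow into Consul, which is what lets Traefik pick up the new backends on its own.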

Over time, I added more nodes, and every time, Nomad added them to the cluster seamlessly and began scheduling tasks on them. I now run lots of services, including SearXNG, Matrix Dendrite, Homer, Woodpecker CI, MinIO, CyberChef, Gitea, and many more. In fact, this site is hosted on that cluster. It's stored in a git repo on my Gitea server. When I update it, Woodpecker rebuilds it and uploads it to MinIO, from where Nomad downloads and deploys it.

I release all the files I use for my services publicly at https://gitea.arsenm.dev/Arsen6331/nomad (mirrored to https://github.com/Arsen6331/nomad).

Beyond the benefits of increased privacy, this experience has been a valuable learning opportunity for me. I have gained knowledge that I otherwise would not have had the chance to acquire. I highly recommend that anyone with an interest in self-hosting should give it a try. You don't have to go for a complex setup like mine to experience the benefits of self-hosting. Many people self-host their services on an old laptop that they no longer use.