While reading many of the blogs and posts here about self hosting, I notice that self hosters spend a lot of time searching for and migrating between VPS or backup hosting. Being a cheapskate, I have a Raspberry Pi with a large disk attached that I leave at a relative’s house, and I rsync my backup drive to it nightly. The problem is that when something goes wrong, I have to walk them through a reboot, troubleshoot over the phone or, worse, wait until a holiday when we all meet.
What would a solution look like for a bunch of random tech nerds who happen to live near each other to cross host each other’s offsite backups? How would you secure it, support it or make it resilient to bad actors? Do you think it could work? What are the drawbacks?
Syncthing. Look no further, just check the “untrusted device” so that you don’t give unencrypted data to your friend’s disk.
Syncthing is not a backup tool and may very well destroy all your data on its own (though this is rare).
I have local incremental backups and rsync to the remote. Doesn’t syncthing have incremental also? You have a good point about syncing a destroyed disk to your offsite backup. I know S3 has some sort of protection, but haven’t played with it.
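On the worry about syncing a destroyed disk over the offsite copy: one common way to keep rsync from clobbering older copies is dated snapshots with `--link-dest`, where unchanged files are hard-linked against the previous run. A minimal sketch that only builds the command; the host name, paths and the `latest` symlink are hypothetical, and it assumes the remote file system supports hard links:

```python
from datetime import date
from pathlib import PurePosixPath


def snapshot_command(source: str, remote: str, snapshot_root: str, today: date) -> list[str]:
    """Build an rsync command that writes a dated snapshot on the remote.

    Unchanged files are hard-linked against the "latest" snapshot, so each
    run only stores what changed and earlier snapshots are left untouched.
    """
    root = PurePosixPath(snapshot_root)
    target = root / today.isoformat()
    return [
        "rsync",
        "-a",                              # archive mode: keep permissions, times, symlinks
        "--delete",                        # deletions only affect the NEW snapshot
        f"--link-dest={root / 'latest'}",  # hard-link unchanged files from the previous run
        source,
        f"{remote}:{target}",
    ]


if __name__ == "__main__":
    # Hypothetical host and paths, printed rather than executed.
    print(" ".join(snapshot_command("/srv/backup/", "pi@relative", "/mnt/disk/snapshots", date.today())))
```

After a successful run you would still need to repoint `latest` at the new snapshot directory on the remote; that step is left out here.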
They do have versioning: https://docs.syncthing.net/v1.27.7/users/versioning
Of course, you actually have to use that, it has to work, and you have to have a strategy for reverting the state (I don’t know if they have an easy way to do that – I’ve never used the versioned side of things).
I have had some situations where Syncthing seems to get confused and doesn’t do its job right. I ran into this particularly with trying to sync runelite configurations and music. There were a few times I had to “force push” … and I vaguely recall one time where I was fighting gigs of “out of sync” in both directions on something and just destroyed the sync and rebuilt it to stop … whatever it was doing.
Don’t get me wrong, it’s a great tool for syncing things between computers; but I would not rely on it for backup (and prefer having a backup solution on top of the synced directories). There are real backup tools out there that are far better suited to this sort of thing. I suggested Kopia; you should get some integrity checking from its built-in sync (it won’t be able to figure out what to sync if your origin is corrupted). You won’t get that with a straight-up rsync or Syncthing; they’re not application-aware enough to know they’re about to screw you over.
Restic has a similar feature but I’ve always found Restic’s approach much more frustrating and not at all friendly for anyone less than a veteran in systems administration. Kopia keeps configuration in the repository itself, has a GUI for desktop use that runs jobs for you automatically, automatically uses the secrets manager appropriate for your operating system, etc … With Restic you kind of have to DIY a lot of basic things and the “quick start tutorial” just kinda ignores these various concerns.
Even if you plan to just use cron jobs, Kopia will do sane things with maintenance. With Restic, last I checked, you still need to run maintenance tasks manually, and if any job (maintenance or otherwise) fails, you need to make sure to unlock the repository (which, if you haven’t set up notifications … well, now you’ve got a silent backup failure and your backups aren’t running).
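One tool-agnostic way to catch that kind of silent failure is to check that the repository directory has actually changed recently. A rough sketch, assuming the backup tool writes at least one new file per successful run (which may not hold for every tool or setup); the path and the 26-hour threshold are made up for illustration:

```python
import time
from pathlib import Path
from typing import Optional


def newest_mtime(directory: Path) -> Optional[float]:
    """Return the most recent modification time of any file under directory."""
    mtimes = [p.stat().st_mtime for p in directory.rglob("*") if p.is_file()]
    return max(mtimes) if mtimes else None


def backup_is_stale(directory: Path, max_age_hours: float, now: Optional[float] = None) -> bool:
    """Return True if no file under directory changed within max_age_hours."""
    now = time.time() if now is None else now
    newest = newest_mtime(directory)
    if newest is None:
        return True  # an empty repository counts as a failed backup
    return (now - newest) > max_age_hours * 3600


if __name__ == "__main__":
    repo = Path("/mnt/disk/repo")  # hypothetical repository location
    if repo.is_dir() and backup_is_stale(repo, max_age_hours=26):
        print(f"WARNING: nothing written to {repo} in the last 26 hours")
```

Run from cron on the machine that holds the repository and wire the warning into whatever notification channel you already use.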
I just kept running into a sea of “oh this could be bad” footguns with Restic that made me uncomfortable trusting it as my primary backup. I’m sure Restic can be a great tool if used in expert hands with everything appropriately setup; but nobody tells you how to do that … and I get the feeling a lot of people are unaware of what they’re getting into.
The folks making Kopia … they seem like they really know what they’re doing and I’ve been very happy with it. We’re moving from rsnapshot to Kopia at work now as well (rsnapshot is also fairly good if you’ve got a bunch of friends with NASes that support hard links and SSH, but it’s CHATTY, has no deduplication or encryption, and data integrity verification is basically left to the file system – so you better be running ZFS – etc).
Duplicati’s developer is back too, so that might be something to keep an eye on … but as it stands, the project has been bit rotting for a while and AFAIK still has some pretty significant performance issues when restoring data.
Oh, fair point. Perhaps rclone.org then! :O
rclone or rsync is probably better, but see my reply a few comments down (the very long one) about protocol-aware cloning vs. just cloning things at the file system level.
I wasn’t aware of the untrusted setting. That sounds like a good option.
Comedy NNTP option here.
It’s an established, stable, understood and very very thoroughly debugged and tested protocol/server solution that’ll run on a potato and has clients for every OS you’ve ever heard of, and a bunch you haven’t.
Setting up your own little mini-network and sharing groups is fairly trivial and it’ll happily shove copies of everyone’s data to every server that’s on the feed.
Just encrypt your shit, post it, and let the software do the rest.
(I mean, if it’s good enough to move 200TB of perfectly legitimate Linux ISOs a day, it’ll handle however much data you could possibly be backing up.)
Disclaimer: it’s not quite that simple, but I mean, it’s pretty close to. Also I’m very much a UNIX boomer and am a big fan of the simplest solution that’s got the longest tested history over shiny new shit, so just making that bias clear.
I’ve done a backup swap with friends a couple times. Security wasn’t much of a worry since we connected to each other’s boxes over ssh or wireguard or similar and used tools that allowed encryption. The biggest challenge for us was that in my selfhosting friend group we all prefer different protocols so we had to figure out what each of us wanted to use to connect and access filesystems and set that up. The second challenge was ensuring uptime and that the remote access we set up for each other stayed up - and that’s what killed the project as we all eventually stopped maintaining the remote access and nobody seemed to care - so if I were to do it again I would make sure all participants have alerts monitoring their shared endpoint.
This reminds me that I need to set up alerts and monitoring. ;-)
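For anyone in the same spot, the "alerts monitoring their shared endpoint" part can start very small. A minimal sketch of a TCP reachability check; the peer names, addresses and ports are hypothetical. Note that WireGuard itself speaks UDP, so this probes a TCP service such as SSH reachable through the tunnel rather than the WireGuard port:

```python
import socket


def endpoint_is_up(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def down_endpoints(endpoints: dict[str, tuple[str, int]], timeout: float = 5.0) -> list[str]:
    """Return the names of all endpoints that could not be reached."""
    return [
        name
        for name, (host, port) in endpoints.items()
        if not endpoint_is_up(host, port, timeout=timeout)
    ]


if __name__ == "__main__":
    # Hypothetical peers: SSH on each friend's backup box, addressed over the VPN.
    peers = {"alice-nas": ("10.0.0.2", 22), "bob-pi": ("10.0.0.3", 22)}
    for name in down_endpoints(peers):
        print(f"ALERT: {name} is unreachable")
```

Run it from cron on each participant's side and send the output to mail or chat, so a dead link gets noticed before the next restore is needed.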
A lot of technical aspects here, but IMHO the biggest drawback is liability. Do you offer free internet-connected storage to a group of “random tech nerds”? Do you trust all of them to use it properly? Are you really sure that none of them will store and distribute illegal stuff with it? Do you know them in person so you can point the police to them in case they come knocking at your door?
Perhaps I’ve been naive.
I think encrypted backups won’t be an issue with this setup. And one would also need to have some friends for this to work.
I attended some LUGs before covid and could see something like this being facilitated there. It also reminds me of the Reddit meetups that I never partook in.
I have exactly the setup you described: a Raspberry Pi with an 8 TB SSD parked at a friend’s place. It connects to my network via WireGuard automatically and just sits there until one of my hosts running Duplicati starts to sync the encrypted backups to it.
Has been running for 2 years now with no issues.
I would propose creating a distributed hash table for this. But I would never host someone else’s data like this, because I’m too afraid they will give me encrypted illegal content and then some obscure law will hold me liable for it. This is just me though.
That’s something that I hadn’t considered!
I don’t have an answer for you, but I’m also interested in this and would like to see the responses
TrueNAS with Tailscale/Headscale/NetBird for software and security. As for hardware, you want storage that is not attached via USB. Either an off-the-shelf NAS or a DIY NAS would work. There are a few YouTubers who have touched on this, Hardware Haven and RaidOwl I think.
I have tailscale mostly set up. What’s the issue with USB drives? I’ve got a raspberry pi on the other end with a RO SD card so it won’t go bad.
Reliability of the connection to the drives, especially during unscheduled power cycles. USB is known for random drops, or for not picking the drive up before all your other services have started, which can mean extra troubleshooting. It can run fine… or not. This is in reference to storage drives, not OS drives.
Backups need to be reliable and I just can’t rely on a community of volunteers or the availability of family to help.
So yeah I pay for S3 and/or a VPS. I consider it one of the few things worth it to pay a larger hosting company for.
It sucked when CrashPlan’s home client went under. If you installed the client on two computers with internet access, it would let you set the remote computer as a target. Encryption was done at the source, and it had dedupe and versioning. It ate a little RAM but it was really nice.
Use object storage for media and backups, then use s3 replication to put a copy somewhere else.
Yes. It’s the “put a copy somewhere else” that I’m trying to solve for without a lot of cost and effort. So far, having a remote copy at a relative’s is good for being off site and cost, but the amount of time to support it has been less than ideal since the Pi will sometimes become unresponsive for unknown reasons and getting the family member to reboot it “is too hard”.
This is kinda the same idea but made for what you originally asked for: https://garagehq.deuxfleurs.fr/
There’s not much cost with S3 object storage. It’s just a file system in Linux, and replication is a protocol standard.
I use Sia for this. It is essentially what you describe, but with a monetary system.
I rent out some of my storage, and use the Siacoin earned to buy storage for backups.
I’ll have to check this out.
Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:
Fewer Letters: More Letters
SSD: Solid State Drive mass storage
SSH: Secure Shell for remote terminal access
VPS: Virtual Private Server (opposed to shared hosting)
ZFS: Solaris/Linux filesystem focusing on data integrity
4 acronyms in this thread; the most compressed thread commented on today has 6 acronyms.
[Thread #948 for this sub, first seen 2nd Sep 2024, 19:35]
You could use kopia for this (but you would need to schedule cron jobs or something similar to do it).
The way this works with kopia… You configure your backups to a particular location, then in-between runs there’s a sync command you can use to copy the backup repository to other locations.
Kopia also has the ability to check a repository for bad blobs via its verify function (so you can make sure the backups stored are actually at least X% viable).
Using zerotier or tailscale (for this probably tailscale because of the multithreading) would let you all create a virtual network between the devices that lets them directly talk to each other. That would allow you to use kopia’s sync functionality with devices in their homes.
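As a rough idea of what the cron side could look like, here is a sketch that builds the two Kopia invocations described above and prints them (dry run by default). It assumes `kopia` is installed and already connected to your primary repository, and that the friend's storage is reachable as a mounted path; the path and the 5% figure are hypothetical. Other storage types use their own `sync-to` subcommands and flags, so check `kopia repository sync-to --help` before relying on this:

```python
import subprocess


def sync_command(dest_path: str) -> list[str]:
    """Command to copy the connected repository to a filesystem location."""
    return ["kopia", "repository", "sync-to", "filesystem", f"--path={dest_path}"]


def verify_command(percent: int) -> list[str]:
    """Command to verify snapshots, reading back a percentage of file contents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return ["kopia", "snapshot", "verify", f"--verify-files-percent={percent}"]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    # Verify the primary repository first, then sync it to the friend's disk.
    run_all([verify_command(5), sync_command("/mnt/friend-nas/kopia-repo")])
```

Verifying before syncing matches the point above: the verify step runs against the repository you are connected to, so a bad origin is caught before it gets copied out.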