Hi,
I’m not sure if this is the right community for my question, but as my daily driver is Linux, it feels somewhat relevant.
I have a lot of data on my backup drives, and recently added 50GB to my already 300GB of storage (I can already hear the comments about how low/high/boring that is). It’s mostly family pictures, videos, and documents since 2004, much of which has already been compressed using self-made bash scripts (so it’s Linux-related ^^).
I have a lot of data that I don’t need regular access to and won’t be changing anymore. I’m looking for a way to archive it securely, separate from my backup but still safe.
My initial thought was to burn it onto DVDs, but that’s quite outdated and DVDs don’t hold much data. Blu-ray discs can store more, but I’m unsure about their longevity. Is there a better option? I’m looking for something immutable, safe, easy to use, and that will stand the test of time.
I read about data crystals, but they seem to be still in the research phase and not available for consumers. What about using old hard drives? Don’t they need to be powered on every few months/years to maintain the magnetic charges?
What do you think? How do you archive data that won’t change and doesn’t need to be very accessible?
Cheers
Wherever you choose to store it, you should still consider following the 3-2-1 backup rule.
Don’t use DVDs or other disc media.
Ugh, sounds icky. Thanks for the advice:)
I am using https://duplicati.com/ and https://www.backblaze.com/ (use their B2 cloud storage; pricing is variable, $6 a month for 1TB or less depending on how much you use) and run a scheduled backup every night for my photos. It’s compressed and encrypted. I save a config file to Google, so if my house and server burn down, I just pull my config from Google, redownload Duplicati, and boom, pull my backup down. The whole setup backs up incrementally, so once you’ve done the first backup, only changes are uploaded. I love the whole setup.
Edit: You can also just pull files you need not the whole backup.
NAS
I used to write to DVDs, but the failure rate was astronomical - like 50% after 5 years, some with physical separation of the silvering. Plus, today they’re so small relative to everything else that they’re not worth using.
I’ve gone through many iterations and currently my home setup is this:
- I have several systems that make daily backups from various computers and save them onto a hard drive inside one of my servers.
- That server has an external hard drive attached to it, powered through a Wi-Fi plug that is controlled by Home Assistant.
- Once a month, a scheduled task wakes up that external hdd and copies the contents of the online backup directory onto it. It then turns it off again and emails me “Oi, minion. Backups complete, swap them out”. That takes five minutes.
- Then I take the USB disk and put it in my safe, removing the oldest of 3 (the classic grandfather-father-son rotation) from there and putting that back on the server for next time.
- Once a year, I turn the oldest HDD into an “Annual backup”, replacing it with a new one. That stops the disks expiring from old age at the same time, and annual backups aren’t usually that valuable.
Having the HDDs in the safe means that total failure/ransomware takes, at most, a month’s worth. I can survive that. The safe is also fireproof and in a different building from the server.
This sort of thing doesn’t need to be high capacity HDDs either - USB drives and microSD cards are very capable now. If you’re limited on physical space and don’t mind slower write times (which when automating is generally ok), microSD cards with clear labelling are just as good. You’re not going to kill them through excessive writes for decades.
I also have a bunch of other stuff that is not critical - media files, music. None of that is unique and all of it can be replaced. All of that is backed up to a secondary “live” directory on the same PC - mostly in case of my incompetence in deleting something I actually wanted. But none of that is essential - I think it’s important to be clear about what you “must save” and what is “nice to save”.
The key thing is to sit back and work out a system that is right for you. And it always, ALWAYS should be as automated as you can make it - humans are lazy sods and easily justify not doing stuff. Computers are great at remembering to do repetitive tasks, so use that.
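To make the monthly swap concrete, here is a minimal sketch of the rotation bookkeeping. It assumes three drives stay in the safe after each swap plus one on the server. The disk labels and the `rotate` helper are made up for illustration, and the yearly retirement of the oldest drive is not modelled.

```python
from collections import deque


def rotate(safe, attached):
    """Monthly swap: the freshly written drive goes into the safe and the
    oldest drive in the safe comes out to be overwritten next month.

    safe     -- deque of drive labels, oldest copy first
    attached -- label of the drive that was just written
    Returns the label of the drive to plug into the server.
    """
    safe.append(attached)   # newest copy goes to the back of the queue
    return safe.popleft()   # oldest copy comes out for reuse


# Hypothetical labels, oldest copy first.
safe = deque(["disk-A", "disk-B", "disk-C"])
attached = "disk-D"
for month in range(1, 4):
    attached = rotate(safe, attached)
    print(month, "in safe:", list(safe), "on server:", attached)
```

Keeping the safe as an oldest-first queue means the drive that gets overwritten is always the one holding the stalest copy.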
Include checks to ensure the backed up data is both what you expected it to be, and recoverable - so include a calendar reminder to actually /read/ from a backup drive once or twice a year.
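One way to automate the “is it what I expected” check is a checksum manifest: hash every file when the backup is written, then re-hash when reading it back. The sketch below is only an illustration, not the setup described above; the function names are invented and SHA-256 is an arbitrary choice.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return (missing, corrupted) lists of relative paths."""
    root = Path(root)
    missing, corrupted = [], []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            missing.append(rel)
        elif sha256_of(target) != expected:
            corrupted.append(rel)
    return missing, corrupted
```

Build the manifest from the source directory, store it next to the backup, and run `verify` against the backup drive when the calendar reminder fires. Two empty lists mean every file was readable and unchanged.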
I use external hard drives. Two of them, and they get rsynced every time something changes, so there’s a copy if one drive should fail. Once a month, I encrypt the whole shebang with gpg and send it off into an AWS bucket.
Don’t overcomplicate it. 3 copies: backup, main, and offsite; 2 different media: HDD and data center; 1 offsite. I like Backblaze, but anything from Google to Amazon will work.
Good advice. My off-site is my brother’s place.
The local-plus-remote strategy is fine for any real-world scenario. Make sure that at least one of the replicas is a one-way backup (i.e., no possibility of mirroring a deletion). That way you can increment it with zero risk.
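A one-way replica can be as simple as a copy that only ever adds files. The following is a rough sketch of that idea, with a hypothetical `add_only_copy` helper. It skips anything that already exists in the archive, so a deletion or a bad edit on the source side never propagates.

```python
import shutil
from pathlib import Path


def add_only_copy(source, archive):
    """Copy files that are not yet in the archive; never delete or overwrite.

    Returns the relative paths that were newly copied.
    """
    source, archive = Path(source), Path(archive)
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = archive / rel
        if dst.exists():
            continue  # keep the archived version untouched
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(rel.as_posix())
    return copied
```

The trade-off is that changed files are not picked up either, which suits data that, as in the original question, will not be modified anymore.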
And now for some philosophy. Your files are important, sure, but ask yourself how many times you have actually looked at them in the last year or decade. There’s a good chance it’s zero. Everything in the world will disappear and be forgotten, including your files and indeed you. If the worst happens and you lose it all, you will likely get over it just fine and move on. Personally, this rather obvious realization has helped me to stress less about backup strategy.
So you would suggest to get bigger and bigger storages?
I really like and can embrace the philosophical part. I do rigorously delete data. At the same time, I once lost data because I was young and stupid and tried to install Suse without a backup. I am still sad not to be able to look at the images of me and my family from that time. I do look at those pictures/videos/recordings from time to time. It gives me a nice feeling of nostalgia. It also grounds me and shows me how much has changed.
Fair enough!
So you would suggest to get bigger and bigger storages?
Personally I would suggest never recording video. We did fine without it for aeons and photos are plenty good enough. If you can stick to this rule you will never have a single problem of bandwidth or storage ever again. Of course I understand that this is an outrageous and unthinkable idea for many people these days, but that is my suggestion.
Never recording videos… That is outrageous ;) Interesting train of thought, though. Video is the main data hog on my drives. It’s easy to mess up the compression. At the same time it combines audio, image and time in one easy-to-consume file. Personally, I would miss it.
There isn’t anything that meets your criteria.
Optical suffers from separation, hard drives break down, SSDs lose their charge, tape is fantastic but has a high cost of entry.
There are a lot of replies here, but if I were you I’d get an LTO machine from the last generation or two at some surplus auction and use that.
People hate being told to use magnetic tape, but it’s very reliable, long lived, pretty cost effective once you have a machine and surprisingly repairable.
What few replies are talking about is the storage conditions. If your archive can be relatively small and disconnected then you can easily meet some easy requirements for long term storage like temperature and humidity stability with a cardboard box, styrofoam cut to shape and desiccant packs (remember to rotate these!). An antifungal/antimicrobial agent on some level would be good too.
People hate being told to use magnetic tape
Because there are still horror stories of them falling apart and not lasting even in properly controlled conditions.
Do unplugged SSDs eventually lose the data?
Yes. They also slowly take longer to access their data with every read.
Wow, I didn’t know reads deteriorate SSDs. What’s the reason? Is the rate significant?
The data is stored in little NAND flash cells. It’s recorded as an analog voltage. There is no difference between analog voltages and digital voltages; I’m just using the word analog to establish that the potential is a quantity that can vary continuously.
When you read the data, the levels of the voltages are checked and translated to the digital information they represent.
To determine the level of a voltage, a small amount of current is allowed to flow between the two points being measured. It’s a very small amount. Microamps and less.
When you draw current from a charge-carrying device, its charge decreases, and with it the voltage, which is the potential between its negative and positive terminals.
When the controller in the SSD responsible for reading voltages and assembling them into porno.mov doesn’t get a clear read, it asks again. As the SSD ages, parts of it can be re-queried hundreds of times just to get commonly read information, like system files, into memory.
So the SSD degrades on read, and the user experiences this as “slowness”.
Would rewriting the data fix this problem? Yes. Rewriting the data with badblocks -n, dd or a program called SpinRite fixes that problem.
Why doesn’t the SSD just do it? Because the SSD only has so many write cycles before it’s toast. Better to rely on the user, or more accurately the host OS, to dictate those writes than to take on that responsibility.
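The tools named above work on whole block devices. For a single file, a rough file-level stand-in (my own sketch, not what those tools do) is to copy the contents to a fresh temporary file and swap it in, so the drive has to write the data out again. The `rewrite_file` helper is hypothetical, and whether this refreshes the cells as well as a block-level rewrite is an assumption.

```python
import os
import shutil
import tempfile


def rewrite_file(path):
    """Rewrite a file by copying it to a temp file in the same directory
    and atomically replacing the original with the copy."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
            shutil.copyfileobj(src, out)
            out.flush()
            os.fsync(out.fileno())   # make sure the new copy reaches the drive
        shutil.copystat(path, tmp)   # keep timestamps and permission bits
        os.replace(tmp, path)        # atomic swap on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

Because every call costs a full write of the file, this is something to run rarely, which fits the point about limited write cycles.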