These days reporters are interviewing me again about the Azimuth Climate Data Backup Project - because we're again facing the possibility that a Trump administration could get rid of the US government's climate data.
From 2016 to 2018, our team backed up 30 terabytes of US government databases on climate change and the environment, saving them from the threat of a government run by climate change deniers. 627 people contributed a total of $20,427 to our project on Kickstarter to pay for storage space and a server.
That project is done now, with the data stored in a secret permanent location. But that data is old, and there's plenty more by now.
As before, I'm hoping that the people at NOAA, NASA, etc. have quietly taken their own precautions. They're in a much better position to do it!
I got interviewed for this New York Times article about the current situation:
• Austyn Gaffney, How Trump's return could affect climate and weather data, New York Times, November 14, 2024. https://archive.is/y5Qb9
For what we actually did, read this:
@johncarlosbaez makes sense to me. Get rid of climate change by getting rid of the data. Kind of like making up data in 80 CE or so then saying that's true. So the inverse logically applies as well.
Isn't this data already shared with responsible global researchers? The US is the only major player plagued by this pseudo-biblical antiscience nonsense globally so I'd think the data should be distributed.
@smxi wrote: "Isn't this data already shared with responsible global researchers?" That's the $64,000 question. I haven't seen any sign that it's true. But they're not dumb, so let's hope they've done it.
@smxi @johncarlosbaez I agree. Having one central, highly secure, vault-like backup seems like a shaky proposition. Safety in numbers seems more relevant here. 30 TB sounds like a lot, and it is, but even a modest regional university likely has a few server racks with this sort of space on them, especially in off-line archive space. Distribute as widely as possible, rather than seeking perfection.
@Syulang I think I have 30TB lying around my house. As storage has gotten much cheaper over the past eight years, distributing copies now seems practical.
We have 28TB here ... and we live in a motorhome! So no, 30TB isn't much these days.
@PeterLG @4raylee @Syulang @johncarlosbaez
30 TiB is still a lot. Because data that isn't backed up doesn't really exist. So you'd also need 2 copies, which boosts it to 60 TiB. Which is a lot. Also I've never trusted this generation of ultra-high-capacity spinning disks in terms of durability. Too fragile.
Another saying is RAID isn't backup. Though a RAID mirror is to some degree. But easier to back up static blocks of data. Then 2 copies is fine. Per site backing it up. Backup is hard.
@smxi You're right that RAID isn't backup, but the good news is that this is a solved problem. The idea would be to have many duplicate fragments covering the dataset, each with its own cryptographic signature to verify integrity (cf Merkle trees). Then the datasets can be distributed on a chunk by chunk basis while still verifying the integrity of each chunk. Like bittorrent, but for data sets.
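The chunk-by-chunk verification described above can be sketched roughly as follows. This is only an illustration, not how any existing backup actually works: it uses plain SHA-256 hashes rather than true cryptographic signatures, a tiny `CHUNK_SIZE` chosen for readability, and hypothetical helper names (`split_chunks`, `merkle_levels`, `merkle_proof`, `verify_chunk`). Odd-sized levels are padded by repeating the last hash, which is a simplification.

```python
import hashlib

CHUNK_SIZE = 8  # bytes; unrealistically small, for illustration only


def _h(data: bytes) -> bytes:
    """SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (at least one, possibly empty)."""
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def merkle_levels(chunks: list[bytes]) -> list[list[bytes]]:
    """Build every level of the tree, from leaf hashes up to the root."""
    levels = [[_h(c) for c in chunks]]
    while len(levels[-1]) > 1:
        cur = list(levels[-1])
        if len(cur) % 2 == 1:
            cur.append(cur[-1])  # pad odd levels by repeating the last hash
        levels[-1] = cur
        levels.append([_h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def merkle_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes needed to recompute the root from one chunk."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify_chunk(chunk: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Check one chunk against a trusted root, without the rest of the data."""
    node = _h(chunk)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root


if __name__ == "__main__":
    data = b"made-up stand-in for a large climate dataset"
    chunks = split_chunks(data)
    levels = merkle_levels(chunks)
    root = levels[-1][0]
    ok = all(verify_chunk(c, merkle_proof(levels, i), root)
             for i, c in enumerate(chunks))
    print("all chunks verify:", ok)
```

Only the root hash needs to come from a trusted source. Each mirror can then hold whatever chunks it has room for, and anyone downloading a chunk can check it using the chunk itself plus its short list of sibling hashes.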
Backup was my biggest headache, after security, back in the day. Working with multiple in-hospital health systems, which are by their nature dynamic (up-to-the-minute dynamic), made keeping data safe in case of hiccough a constant source of ulcers.
Dog! I'm glad I don't do that anymore.
@PeterLG @4raylee @Syulang @johncarlosbaez I did backup for years for a client who believed their data was as important as the data you backed up actually was. Main mirror. Hourly syncs of main mirror in case mirror failed. 2 external off site stored backups lol. And no their data wasn't that important lol.
@smxi - when we paid for commercial servers to back up the US climate data, we were paying for the type of services you delivered. Now we've transferred that data to a location where they'll do that for free, because they are already doing it for lots of other data. We didn't think it was sufficient to buy a 36-terabyte hard drive and stick the data in there, but a lot of people kept suggesting we could just do that.
@smxi @PeterLG @4raylee @johncarlosbaez I worked for a private company once (briefly) in managed services and we had a client with mind-numbingly trivial data, while a university I worked at around the same time had literally no backup for students' work made using the only Mac computer suite. (I mean, kinda figures, but still a bit lame)
@johncarlosbaez You could share it with the European Union’s climate community or any non-US university?
@Mossyrua - we did all the sharing we needed to back in 2018. What the world could use now is someone making new backups. But I'm hoping the people at NOAA, NASA, etc. do this on a regular basis.
@johncarlosbaez
Scientists Scramble to Save Climate Data from Trump—Again
https://www.scientificamerican.com/article/scientists-scramble-to-save-climate-data-from-trump-again/
@maxpool - thanks!!!
@johncarlosbaez
Reminds me of museums in Europe, ca. 1940, burying archaeological treasures to save them from the invading Nazis.
What a shame for the US!
@65dBnoise @johncarlosbaez How is it a shame for the US? Your assumed loss equals your assumed derision. Did you already lose access to the data or just decide to bash the US?
@nickserv @johncarlosbaez
It's not a zero-memory system. Trump has given plenty of reasons (chloroquine injections, sharpied hurricane paths, rollback of EPA rules etc.) for scientists to worry that the worst may happen.
It's a shame for the US, which pioneered open public scientific data and has a vibrant scientific community, that American scientists in American universities fear loss or manipulation of such data because of manifested climate denialism and bigotry, like in the dark ages.
@johncarlosbaez do you have another funding effort I can contribute to and boost?
@3janeTA - thanks so much for your offer, but I'm too burnt out to do that project again. It took quite a bit of work for about a year. I'm sorry! If I discover such a project going on, or some similar useful project, I'll post about it.
@johncarlosbaez no problem! Seemed worthwhile to help if I could. Important work I think
@johncarlosbaez I'm on our home connection that has been janky these past few days, and I have no special archival tools, but I flipped through the list and downloaded the Transportation Energy Data Book.
@bstacey - good.
Downloading really large datasets turned out to be quite slow, even for people who supposedly had very good internet connections.