We're The Backblaze Cloud Team (Managing 750+ Petabytes of Cloud Storage) - Back 7 Years Later - Asks Us Anything!
7 years ago we wanted to highlight World Backup Day (March 31st) by doing an AUA. Here's the original post (https://www.reddit.com/r/IAmA/comments/rhrt4/we_are_the_team_that_runs_online_backup_service/). We're back 7 years later to answer any of your questions about: "The Cloud", backups, technology, hard drive stats, storage pods, our favorite movies, video games, etc...AUA!
(Edit - Proof)
Edit 2 ->
Today we have
/u/glebbudman - Backblaze CEO
/u/brianwski - Backblaze CTO
/u/andy4blaze - Fellow who writes all of the Hard Drive Stats and Storage Pod Posts
/u/natasha_backblaze - Business Backup - Marketing Manager
/u/clunkclunk - Physical Media Manager (and person we hired after they posted in the first IAmA)
/u/yevp - Me (Director of Marketing / Social Media / Community / Sponsorships / Whatever Comes Up)
/u/bzElliott - Networking and Camping Guru
/u/Doomsayr - Head of Support
Edit 3 -> fun fact: our first storage pod in a datacenter was made of wood!
Edit 4 at 12:05pm -> lots of questions - we'll keep going for another hour or so!
Edit 5 at 1:23pm -> this is fun - we'll keep going for another half hour!
Edit 6 at 2:40pm -> Yev here, we're calling it! I had to send the other folks back to work, but I'll sweep through remaining questions for a while! Thanks everyone for participating!
Edit 7 at 8:57am (next day) -> Yev here, I'm trying to go through and make sure most things get answered. Can't guarantee we'll get to everyone, but we'll try. Thanks for your patience! In the meantime here's the Backblaze Song.
Edit 8 -> Yev here! We've run through most of the questions. If you want to give our actual service a spin visit: https://www.backblaze.com/.
YevP636 karma
Yev here -> What 14 Petabytes of storage looks like, 180TB Pod (old school), Opened Storage Pod
Here are a few to get you started...I'll send more later ;)
Edit (above for cleanup, below for more hot server pics)
Here's some good good cables -> Cable Porn, Cabling Porn
YevP210 karma
Good question - no idea. That picture was from a while ago (been a minute since I was in the data center)...let me go find out.
Edit* -> Asked the data center team and they think those are Enterasys (but from a long time ago). We now use a combination of: Arista, Dell, and some older Force10s.
unibrow4o984 karma
Hah for sure. For what it's worth, I started my own (very small) business late last year and signed up for your service, and I think you guys do a great job.
ctrlaltd133753 karma
RMA-able, eh? You can return the goods to my home address, I'll PM you. ;)
YevP103 karma
Yev here -> Great question! Those are NOT Storinators. But here's the funny story - Protocase was our original contract manufacturer for our storage pods. A few years after we open sourced the design, Protocase created a company called 45drives.com, and that's where the Storinators come from! So...it's the reverse: these are our "something custom" pods that begot the Storinators!
Edit - typo
whattheactualfuck013 karma
Did you ever entertain Cleversafe --> IBM COS for your peta --> exa scale object storage? What are/were your thoughts on their tech?
YevP27 karma
Yev here -> We've written all of our own code to handle data at that scale (Zettabyte-scale architecture), so switching to or using another provider would be fairly expensive for us. Plus we're all about cost optimization, so a lot of existing systems are/were out of the question due to cost. One of our Operations Engineers used to work there though, so that's cool!
Javad0g23 karma
The moment I clicked on the first picture, all of my external drives here in my home office spun up.
they know......they know.
x86_64Ubuntu18 karma
Those are some serious cables in the Cable Porn photo. Do the cable origin and termination points have to match up, or will the system figure it out?
brianwski279 karma
How sustainable is your pricing for ‘unlimited’ backup? Are most users only storing a small amount?
If you are curious, here is a "histogram" of the "Personal Backup Customers" backup sizes as of December 31, 2018:
https://i.imgur.com/iVEuwUT.jpg
You will need to zoom in to see the information. As you can see, we lose money on a few customers at the high end (we cannot store 430 TBytes of data for only $6/month), but since most customers just want to be reasonable and back up their laptops, we are profitable and fully sustainable on the "average".
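To make the "profitable on the average" point concrete, here is a minimal sketch of blended per-customer margin under a flat monthly price. Only the $6/month price comes from the answer above; the per-TB storage cost and the example backup sizes are hypothetical placeholders, not Backblaze figures.

```python
from statistics import mean

FLAT_PRICE_PER_MONTH = 6.00  # flat "unlimited" price quoted above


def blended_margin_per_customer(backup_sizes_tb, price_per_month, cost_per_tb_month):
    """Average monthly margin per customer when everyone pays the same flat price.

    cost_per_tb_month is a hypothetical input, not a published Backblaze number.
    """
    if not backup_sizes_tb:
        raise ValueError("need at least one customer")
    # Each customer pays the same price but costs in proportion to what they store.
    margins = [price_per_month - size * cost_per_tb_month for size in backup_sizes_tb]
    return mean(margins)


if __name__ == "__main__":
    # Hypothetical population: 999 laptop-sized backups plus one 430 TB outlier.
    sizes = [0.5] * 999 + [430.0]
    hypothetical_cost = 2.00  # $/TB/month, placeholder only
    margin = blended_margin_per_customer(sizes, FLAT_PRICE_PER_MONTH, hypothetical_cost)
    print(f"average margin per customer: ${margin:.2f}/month")
```

The outlier's own margin is deeply negative, but the mean stays positive as long as the bulk of the distribution is small, which is the shape the histogram describes.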
imzeigen147 karma
Holy Cow, who the heck is uploading 430TB of data? I'm guessing Linus from Linus Media Group?
mazzar403 karma
When you were sponsoring Critical Role, did Sam ever run an ad idea by you beforehand? Was there anything you nixed?
YevP331 karma
Yev here - Bidet Critter! No, nothing was ever off the table, completely trusted Sam to do a great ad. My personal favorite was the infomercial with Marisha and Tal! Sam was amazing to work with. Crazy creative!
RobertLoblawAttorney36 karma
Are you able to share why you guys don't do ads for them anymore? I miss Yev!
YevP41 karma
Yev here -> Hi! Definitely! I posted this over on /r/criticalrole when my last episode aired -> https://www.reddit.com/r/criticalrole/comments/a4c9z3/no_spoilers_backblaze_sponsorship_ending/ebj31d6/. I had an AMAZING time working with G&S (still do w/ LA by Night) and the Critical Role team, and it was a great partnership! The TL/DR is that at some point you reach a saturation level, and have to look at different advertising/sponsorship avenues (plus they are in great hands now). It was a great ride! Hopefully I can still wiggle my way in there every now and again. That other post has more deets!
i_mormon_stuff189 karma
How is your Hard Drive ordering done, do you like just call up Seagate and say you want 2,000 Hard Drives or what?
And finally, how are returns of bad/broken drives still in warranty handled?
YevP193 karma
Yev here -> we asked our purchasing department for a better answer but until they write back here's what I think happens: we call the manufacturers and say, "Hey we need _X_ amount of drives, what's your lowest price?" And then we go with the one who gives us the smallest dollar amount. As for returns they're done through the warranty process, most manufacturers have an RMA portal that can be utilized using the serial numbers on the drives.
Czfsaht80 karma
No more driving around the SF bay buying external HDDs on sale? I miss those days...
YevP51 karma
I did like exploring the bay area! This WAS my map after all -> https://www.backblaze.com/blog/wp-content/uploads/2012/10/Around_the_Bay_trip.jpg (from https://www.backblaze.com/blog/backblaze_drive_farming/).
dogturd2163 karma
I believe you guys wrote the story about the rash of 2 TB drives with high failure rates. Did the vendor treat you fairly and make things right? Or are you avoiding that vendor? I had the same problem on my home system with the same drives.
YevP22 karma
Yev here -> Yes, those were the 3TB Seagate drives (but honestly many drives we were using around that time suffered higher failure rates) - and that vendor is great! We buy tons of Seagate drives (if you look at the hard drive stats posts you'll see them with a high percentage of our fleet) -> https://www.backblaze.com/b2/hard-drive-test-data.html.
GloriousDawn122 karma
Amazon Web Services has just announced pricing for its new Glacier Deep Archive and it seems among the lowest on the market for what I see as a "last line of defense" backup. But I've heard many good things about Backblaze, so can I ask in what way your services and pricing structure are different, and for which use cases you think you have the better value proposition? I'm totally a noob with cloud storage BTW (but considering getting one for my Synology), so feel free to correct any misconceptions I might have.
YevP142 karma
Yev here -> Great question! We saw the news ourselves. Here's some back-of-the-envelope math we sent around the other day when this news was announced:
Assuming 14TB of storage:
14TB with Backblaze - instant ‘retrievability’ - $70 per month (vs. $322 per month for AWS S3).
14TB with AWS Glacier - minutes to 12 hours retrievability - $56 per month (fees apply).
14TB with AWS Deep Glacier - at LEAST 12 hours retrievability - $14 per month (fees apply).
Both Glacier and Deep Glacier also have a lot of retrieval fees/quirks if you want to speed up the process, but if you're willing to wait it's an OK proposition. The trouble comes if you want that data quickly. We charge $0.01/GB to download, so the total(ish - assuming low transaction counts) cost with us would be about $70/month to store the 14TB and $140 to download all of it. And that's all you'd really pay with us.
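For anyone who wants to rerun that math for a different data size, here is a small calculator. The per-GB rates are back-derived from the 14TB totals quoted above, assuming 1 TB = 1,000 GB; they leave out request, retrieval, and early-deletion fees, and list prices change over time, so treat this as a sketch rather than a price sheet.

```python
# Back-of-the-envelope cost comparison based on the 14 TB example above.
# Rates are derived from the quoted monthly totals (assumes 1 TB = 1,000 GB)
# and exclude request, retrieval, and early-deletion fees.
GB_PER_TB = 1_000

STORAGE_RATE_PER_GB_MONTH = {
    "Backblaze": 0.005,                  # $70 / 14,000 GB
    "AWS S3": 0.023,                     # $322 / 14,000 GB
    "AWS Glacier": 0.004,                # $56 / 14,000 GB
    "AWS Glacier Deep Archive": 0.001,   # ~$14 / 14,000 GB
}

BACKBLAZE_DOWNLOAD_RATE_PER_GB = 0.01  # quoted download price


def cost(terabytes: float, rate_per_gb: float) -> float:
    """Dollar cost of `terabytes` priced at `rate_per_gb`, rounded to cents."""
    return round(terabytes * GB_PER_TB * rate_per_gb, 2)


if __name__ == "__main__":
    size_tb = 14
    for service, rate in STORAGE_RATE_PER_GB_MONTH.items():
        print(f"{service}: ${cost(size_tb, rate):,.2f}/month to store")
    full_restore = cost(size_tb, BACKBLAZE_DOWNLOAD_RATE_PER_GB)
    print(f"Backblaze: ${full_restore:,.2f} once to download everything")
```

Storage is a recurring monthly charge while the download is a one-time cost, which is why the two are kept separate rather than summed.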
GloriousDawn43 karma
Great explanation, thanks. Are you considering adding some lower tier of retrievability to compete in that space as well? I ask that as someone more interested in pricing than speed of retrieval (that "last line of defense" backup idea). OTOH I feel your solutions are probably easier to use than AWS, which also commands a premium.
YevP64 karma
Are you considering adding some lower tier of retrievability to compete in that space as well
Not at the moment. We're hyper-focused on our offering and scaling that up to meet the needs of the many. A lot of folks want a Cloud Storage service that will be inexpensive and highly available, so that's where our energy is focused at the moment. Building out a lower tier of storage would mean large-scale architectural changes (a lot of those low-availability services use tape and/or DVDs to house the data) and that's a lot of work!
YevP92 karma
Yev here ->
How many of you are Tims?
At least 3...but we'll never tell who.
FluffyCorgis50 karma
So recently Linus from LinusTechTips made a video about backing up their 1 Petabyte storage servers. They ended up tricking Google Drive into accepting that much data. Would your service work for their needs? Or is 1 petabyte from a single user too much? Any comments about it?
Here's the video: https://youtu.be/y2F0wjoKEhg
YevP82 karma
Yev here ->
is 1 petabyte from a single user too much?
Definitely not. We have a lot of B2 Cloud Storage users with over 1PB of data. If they're just using it for storage/backup/archive we'd definitely work for them. The problem with tricking Google Drive into accepting that amount is that's how you end up with unlimited services shuttering or raising prices (BitCasa, OneDrive Unlimited, Amazon Unlimited Storage, etc...). It makes it unsustainable, so while you technically can do that, we'd recommend using services specifically designed for that type of usage (plus can you imagine downloading or recovering 1PB from G Suite...ooof).
Edit -> typo
FluffyCorgis14 karma
Haha thanks. Is it even possible to recover 1PB?? I’m sure that would take months from any service.
YevP33 karma
From us? Yea. From Google Drive...no idea - that should be a follow-up video :P
matthewscotti8650 karma
Anyone else immediately upvote this because they've sponsored Critical Role?
YevP31 karma
Yev here -> Thanks! That was a fun time...AND I appreciate the upvote! :D <3
Deku78932 karma
Hi, what are some good resources for understanding cloud implementation? Like more technical material that a student interested in pursuing a career in cloud computing could learn from? Thanks in advance!
YevP48 karma
Yev here -> I can't speak to learning about cloud computing in general, but one of the most fascinating things that we've made was this explanation of how our Reed-Solomon Erasure Coding works for our vaults. We made the video with our Cloud Architect a few years ago and it was literally the only time I actually understood Matrix Algebra. Other than that our blog post on how we implemented "Vaults" is pretty interesting and might provide some guidance on different aspects of the cloud that you might find interesting: Backblaze Vaults.
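For readers who would rather see the idea in code than in matrix algebra, here is a toy Reed-Solomon erasure code. It is a teaching sketch, not Backblaze's implementation: it works over the prime field GF(257) using Lagrange interpolation instead of the GF(2^8) matrix arithmetic production libraries use, and the 4 data + 2 parity split in the usage example is an arbitrary choice. The property it demonstrates is the one that matters for vaults: with k data shards and m parity shards, any k surviving shards are enough to rebuild the data.

```python
P = 257  # prime modulus; every byte value 0..255 is a valid field element


def _lagrange_eval(xs, ys, x):
    """Evaluate the unique polynomial of degree < len(xs) through (xs, ys) at x, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Division mod P via Fermat's little theorem: den**(P-2) is den's inverse.
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data_shards, parity_count):
    """Return k data shards followed by parity_count parity shards (lists of ints).

    Byte `pos` of data shard i is the value at x = i of a polynomial; parity
    shard p stores that polynomial's value at x = k + p. Requires k + m <= P.
    """
    k = len(data_shards)
    length = len(data_shards[0])
    if any(len(s) != length for s in data_shards):
        raise ValueError("data shards must all be the same length")
    xs = list(range(k))
    shards = [list(s) for s in data_shards]
    for p in range(parity_count):
        shards.append([
            _lagrange_eval(xs, [s[pos] for s in data_shards], k + p)
            for pos in range(length)
        ])
    return shards


def decode(available, k):
    """Rebuild the k data shards from any k entries of {shard_index: shard}."""
    if len(available) < k:
        raise ValueError("need at least k shards to reconstruct")
    idx = sorted(available)[:k]
    length = len(available[idx[0]])
    data = []
    for d in range(k):
        if d in available:
            data.append(bytes(available[d]))
        else:
            # Any k points pin down the polynomial, so evaluate it at the lost x.
            data.append(bytes(
                _lagrange_eval(idx, [available[i][pos] for i in idx], d)
                for pos in range(length)
            ))
    return data


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl", b"mnop"]
    shards = encode(data, parity_count=2)
    survivors = {i: shards[i] for i in (0, 2, 4, 5)}  # shards 1 and 3 "lost"
    print(decode(survivors, k=4))
```

One wrinkle of the prime-field shortcut: a parity symbol can take the value 256, which does not fit in a byte. That is one reason real implementations do their arithmetic in GF(2^8) instead.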
buthidae28 karma
What's the biggest single restore job someone has requested through Backblaze?
YevP39 karma
Yev here ->
We had a person once do nine 4TB restores to get all their data back, so that'd be about 35TB. Which is...quite a bit. /u/clunkclunk can give more detail!
YevP46 karma
Yev here - Well that number is a month or two old, we're projecting to hit 1 Exabyte by the end of the year. ;-)
I_will_draw_boobs9 karma
Is that usable data, or does that include raw capacity and data under management? How much is duplicated?
YevP14 karma
The 750 is used (active) storage. We're deploying about 20-30 PB per month, and that gets filled up within the next few months. We try not to have too much "unused data" on hand as that is capital intensive and we're largely bootstrapped. We deduplicate data per client (Windows and Mac) on the backup side to avoid re-uploading data excessively from every machine.
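The answer doesn't say how the client-side deduplication works internally. The sketch below assumes one common approach: split data into fixed-size chunks, identify each chunk by its SHA-256 hash, and keep a per-client set of chunks already queued so repeats are never re-uploaded. The class name, method name, and chunk size are hypothetical.

```python
import hashlib

# Hypothetical chunk size; a real client would choose its own granularity.
DEFAULT_CHUNK_SIZE = 4 * 1024 * 1024


def chunk_digests(data: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE):
    """Yield (sha256 hex digest, chunk) for consecutive fixed-size chunks of data."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


class ClientDeduplicator:
    """Tracks which chunks this one client has already queued for upload."""

    def __init__(self):
        self.seen = set()

    def plan_upload(self, data: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE):
        """Return only the (digest, chunk) pairs this client has not queued before.

        Simplification: chunks are marked as seen when queued, not when the
        upload is confirmed, so a failed upload would need to remove them again.
        """
        pending = []
        for digest, chunk in chunk_digests(data, chunk_size):
            if digest not in self.seen:
                self.seen.add(digest)
                pending.append((digest, chunk))
        return pending
```

Keeping the set per client, as the answer describes, means each machine avoids re-sending its own repeated data without needing a global index shared across every customer.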
byho12 karma
What was you guys' favorite ad bit from the man, Sam Riegel, on Critical Role?
YevP16 karma
Yev here -> I am very partial to this one -> https://www.youtube.com/watch?v=hnVAnmTNaHQ because it was friggin' hilarious. Taliesin's "All these wires, I can't take it anymore!" still kills me.
YevP16 karma
Yev here -> Funny story actually. I found Critical Role when they did their one-shot with Vin Diesel. I never played D&D before that and was intrigued by it. So I went back and started watching Season 1. A few weeks into binging the show, it dawned on me that they weren't taking any sponsorships. I run the online ads/sponsorships for Backblaze, and tried to reach out to them to see if they'd take any sponsors. After a few weeks of tweeting at most of the cast, the Twitter algorithm gods smiled upon me and Liam saw my tweet. We started chatting and he got me in touch with Travis who then put me in contact with Geek & Sundry who was producing the show for them.
Smash-cut (like in my commercials) to a month or so later and I was in Los Angeles for the taping of the Umbrasyl episode with Chris Perkins (Season 1 Episode 55) - which was one of the first sponsored episodes of Critical Role! Nowadays they're in very good hands :D
i_mormon_stuff9 karma
What do the drive manufacturers think of your sharing of data with the public? Sometimes you make them look good; other times, when reliability is poor, quite bad.
Also you have spoken a lot about Enterprise vs Consumer drives. Do you think it annoys them?
YevP22 karma
Yev here -> It's a mixed bag, like you said, sometimes they like it other times they don't - but I think over the years they've grown to use the stats as a way to dig into their performance. I did an AUA with u/Seagate_Surfer a few months back -> Seagate Scientist IAmA so we're definitely on good terms with all the manufacturers. Overall I think the release of those stats has been good for the industry and has also been good for consumers (granted, our use case is different from that of 99.99% of people).
IndieDiscovery7 karma
What does your tech stack look like? Do you all use any kind of containers, and if so, container orchestration platform like Kubernetes? Do you host on-prem, through a cloud provider, or mixed? If cloud provider which one(s), and what does your deployment pipeline look like? What are your favorite cocktails? Sorry I'm kind of a DevOps/SRE person so I can ask you as many relevant backend questions as you all care to answer :)
YevP7 karma
Yev here ->
Do you host on-prem, through a cloud provider, or mixed? If cloud provider which one(s), and what does your deployment pipeline look like?
We actually rolled our own cloud, you can read a ton more about the architecture here: Zettabyte-Scale Cloud Storage Architecture. /u/brianwski might be able to speak more to the tech stack as a whole.
What are your favorite cocktails?
I'm a gin fan, so a lot of gimlets or martinis (extra dirty w/ an onion + olives, so kind of a hybrid Gibson) are what I'm drinking a lot of right now!
cornsyrup326 karma
Is it common for the new helium drives to fail prematurely? I purchased 17 of the Seagate 12TB drives and 4 drives failed after 5 minutes. Ended up switching to WD 10TB drives and have had zero issues so far.
Thanks,
Brandon
YevP7 karma
Yev here -> Actually we found that they perform pretty much the same as air-filled drives over time. We wrote an article about it almost a year ago, once we had a bit of experience with helium; you can read it here -> https://www.backblaze.com/blog/helium-filled-hard-drive-failure-rates/.
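Comparisons like helium vs. air-filled are made on annualized failure rate (AFR) rather than raw failure counts, because drive cohorts differ in size and age. Here is a minimal sketch of the AFR formula as the Backblaze Hard Drive Stats posts describe it (failures divided by drive-years of operation, expressed as a percentage); the cohort numbers in the usage example are made up.

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """AFR in percent: failures per drive-year of operation, times 100."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return round(failures / drive_years * 100, 2)


if __name__ == "__main__":
    # Hypothetical cohorts, for illustration only.
    cohorts = {
        "helium cohort": (12, 400_000),
        "air cohort": (30, 900_000),
    }
    for name, (failures, drive_days) in cohorts.items():
        print(f"{name}: {annualized_failure_rate(failures, drive_days)}% AFR")
```

Normalizing by drive-days is what lets a small, young cohort be compared fairly with a large, older one.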
hello_I_am_bad_dev4 karma
Why should I choose your platform over something like GDrive, Dropbox or similar service?
I am actually looking for cloud storage for backup, so this is a good time for an AMA. 😂
YevP6 karma
Yev here! Great question, I actually wrote an article about that Sync vs. Backup vs. Storage.
TL/DR - Sync is great for working out of one folder or making sure you have access to a subset of data at all times, but if you do not EXCLUSIVELY work out of that folder, the other things aren't getting sync'd. Backups are automatic and take place in the background, giving you access to all the data that is on your computer regardless of location. Cloud Storage is more manual, and is usually what any sync or offsite backup service will use as the back-end (you can write to the APIs and build your own services as well).
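To illustrate the backup side of that distinction (every file on the machine, picked up automatically, not just one synced folder), here is a hypothetical sketch of an incremental scan: walk the whole tree, compare each file's size and modification time against the previous scan, and queue anything new or changed. This is not how Backblaze's client works internally; the function name and the size/mtime change test are assumptions.

```python
import os
import tempfile


def scan_for_changes(root, manifest):
    """Walk every file under root and return (changed_paths, new_manifest).

    manifest maps path -> (size, mtime_ns) from the previous scan; any file that
    is new or has a different size/mtime is reported as changed. Deleted files
    simply drop out of the new manifest.
    """
    new_manifest = {}
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; a real client would log it
            key = (st.st_size, st.st_mtime_ns)
            new_manifest[path] = key
            if manifest.get(path) != key:
                changed.append(path)
    return sorted(changed), new_manifest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "Documents", "taxes"))
        with open(os.path.join(root, "Documents", "taxes", "2018.pdf"), "wb") as f:
            f.write(b"placeholder")
        changed, manifest = scan_for_changes(root, {})
        print(len(changed), "file(s) to back up on the first pass")
        changed, manifest = scan_for_changes(root, manifest)
        print(len(changed), "file(s) to back up on the second pass")
```

A sync tool would only ever look inside its one designated folder; the backup-style scan starts from the root of the drive, which is the difference the answer is drawing.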
hkyq3 karma
What is next for you guys? Anything major?
Also what protocols do you guys have for DDoS attacks?
Favourite video game? Favourite open source tool?
YevP4 karma
Yev here ->
What is next for you guys?
We work on a lot of cool stuff all the time, but usually we keep it under wraps until we're ready to release it!
protocols do you guys have for DDOS attacks?
DDoS -> Our CTO had a good DDoS response to a different question.
Favourite video game
Top 3 games of all time for me: Last of Us, Vampire The Masquerade Bloodlines (so hype for the sequel), and Age of Empires (the series, loved it growing up, many hours spent in there).
Favourite open source tool?
Loving Bitwarden right now!
2cats2hats2 karma
Do hard drive manufacturers reach out to you folks?
Just curious if they can learn anything from what you folks do.
As we all know, making a product and being the end-user of a product are two completely different observations/expectations of said product.
YevP5 karma
Yev here -> We're constantly chatting with the different manufacturers, sometimes we run some testing for them on our hardware and other times we just chat about the general state of the backup world. Over time I think that most of the manufacturers found our stats to be insightful, and thus there's no real ill will or bad vibes going on. At the end of the day we're both trying to store data, so being open and helpful is a benefit to all folks!
MasterBet1 karma
Why should I trust you with my personal data? Please don't tell me because you are nice
Somethingcleaver1529 karma
Can you send pretty server porn pictures?
How sustainable is your pricing for ‘unlimited’ backup? Are most users only storing a small amount?
Are you looking at/offering cloud compute, or just storage?