A data backup strategy is intended to help ensure your government agency is prepared in case of a severe outage or disaster. Floods, fires, blackouts, malware, cyberattacks and natural disasters are the ones that worry IT administrators most, but what about a leak in the wrong place in your data center roof? What about human error resulting in a corrupted database?
The problem is that, although disasters come in all shapes and sizes, they have one thing in common: they always catch you less prepared than you want to be.
To be prepared in advance, it’s important to be able to answer the following questions:
- Does your agency have a data backup strategy in place?
- What risks does it cover?
- Which goals does it seek to accomplish?
- Have you documented it?
- Have you tested it?
Over the course of this two-part blog series, we will describe a seven-step plan you can use to inform your organization’s data backup strategy, from initial assessment through ongoing testing. In part one, we will go over the first four steps:
1. Know your risks
Where are your weakest links? If they break, how will your agency be vulnerable?
- Naturally, site-wide disasters are the ones that grab headlines and cause the most concern. An earthquake, flood or tornado doesn't care how expansive your site is. It can still find you and take the whole site offline.
- Platform failures narrow the incident down to the failure of a single element inside your infrastructure. However, they can have knock-on effects because of unseen dependencies among your technology components.
- Application issues can arise from a seemingly benign change like a patch. If you apply a security update that contains a bug, you may have to roll back to your last-known good configuration. Change-management controls exist to help you understand the risk associated with any change before you make it.
- Data loss is now top of mind for most IT admins because of ransomware actors. The first risk is that they steal your data and demand a ransom for it. The second is that they also encrypt the data in your storage. For good measure, they go after your backups and try to delete those as well.
- The human factor runs from inadvertent deletion stemming from everyday human error all the way to The Revenge of the Disgruntled Insider and the mischief of nation-state actors.
Risk assessment also depends on knowing what you have. It's easy to assume you know your entire IT inventory, but trends like virtual machine (VM) sprawl and shadow IT put much of that in doubt. It's more prudent to automate your inventory process so that you don't leave things out.
It may seem like a stretch to occasionally inspect the air conditioning in your data center, the pipes under the floor and what your neighboring organizations are up to. But from a physical-plant perspective, it's better than being caught completely off guard by an incident. Within your data backup strategy, also make room for disaster recovery plans suited to each data center. For example, your data center in Florida has a different risk profile from your data center in Arizona, so take advantage of your distributed data by spreading the risk across sites.
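To illustrate what an automated inventory check might look like, here is a minimal Python sketch. It assumes you can export two lists of asset names: one from automated discovery (for example, your hypervisor or cloud console) and one from your backup catalog. The asset names and both helper functions are hypothetical. The point is the set comparison, which flags machines nobody is backing up and backup jobs that point at machines that no longer exist.

```python
# Hypothetical asset names. In practice these sets would come from automated
# discovery (hypervisor APIs, cloud inventory, network scans) and from your
# backup software's catalog.
def find_unprotected(discovered: set[str], protected: set[str]) -> set[str]:
    """Assets found by discovery that no backup job covers."""
    return discovered - protected


def find_stale_jobs(discovered: set[str], protected: set[str]) -> set[str]:
    """Backup jobs whose target no longer shows up in discovery."""
    return protected - discovered


discovered = {"erp-db-01", "file-srv-02", "test-vm-17", "gis-app-03"}
protected = {"erp-db-01", "file-srv-02", "old-mail-01"}

print(sorted(find_unprotected(discovered, protected)))  # ['gis-app-03', 'test-vm-17']
print(sorted(find_stale_jobs(discovered, protected)))   # ['old-mail-01']
```

Running a comparison like this on a schedule is one way to catch VM sprawl and shadow IT before a disaster does.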
Your SaaS providers should be able to tell you what they’ve put in place to mitigate risk in their infrastructure. Their risk is part of your risk assessment, so be sure to take it into account.
2. Know your data
If you don’t know your data, the odds are that you’ll end up paying to blindly back up data you don’t need to keep around.
Not all data in an agency is equally valuable. In a data backup strategy, that translates into several important dimensions:
- Money — The more data you have, the more you’ll pay to store it somewhere.
- Time — The more data you back up, the longer it will take to restore when the time comes.
- Technology — Different backup storage options are better suited to different types of data.
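The money and time dimensions are simple arithmetic, which makes them easy to estimate before you commit to a design. The following Python sketch uses made-up figures: a price per terabyte-month and a sustained restore throughput. Substitute your own numbers from your storage contracts and your own restore tests.

```python
def monthly_storage_cost(size_tb: float, price_per_tb_month: float, copies: int = 1) -> float:
    """Monthly storage cost for `copies` full copies of the data."""
    return size_tb * price_per_tb_month * copies


def restore_hours(size_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to restore `size_tb` at a sustained throughput in MB/s."""
    size_mb = size_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return size_mb / throughput_mb_s / 3600


# Assumed figures for illustration only: 50 TB, 3 copies, 20 per TB-month, 500 MB/s.
print(monthly_storage_cost(50, 20.0, copies=3))  # 3000.0
print(round(restore_hours(50, 500), 1))          # 27.8
```

With the same assumed throughput, restoring 5 TB takes about 2.8 hours, so shrinking the data set that needs fast recovery shrinks the restore window in proportion.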
The way you think about the technology dimension can make a big difference in your money and time dimensions. Consider these types of data:
- Static — Older data that no longer changes. That doesn't make it unimportant, but it may have been ages since anyone retrieved it, let alone modified it.
- Agency-vital — Data that is essential to the agency, such as constituent records and documents. Without it, the agency cannot function.
- Mission-critical — Data whose loss or unavailability, even for short periods, will damage the agency. For this data you want a very tight recovery point objective (RPO), so that you lose less data and have less to recover in case of disaster.
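One way to make these categories actionable is to map each one to a backup policy. The Python sketch below is hypothetical. The class names mirror the three categories above, but the RPO values and storage tiers are placeholders, not recommendations.

```python
from dataclasses import dataclass
from enum import Enum


class DataClass(Enum):
    STATIC = "static"
    AGENCY_VITAL = "agency-vital"
    MISSION_CRITICAL = "mission-critical"


@dataclass(frozen=True)
class BackupPolicy:
    rpo_hours: float        # maximum tolerable data loss, in hours
    storage_tier: str       # where backup copies land
    snapshot_enabled: bool  # also keep fast local snapshots?


# Hypothetical policy table; the values are placeholders for illustration.
POLICIES = {
    DataClass.STATIC: BackupPolicy(rpo_hours=24 * 7, storage_tier="tape/archive", snapshot_enabled=False),
    DataClass.AGENCY_VITAL: BackupPolicy(rpo_hours=24, storage_tier="disk + offsite", snapshot_enabled=False),
    DataClass.MISSION_CRITICAL: BackupPolicy(rpo_hours=0.25, storage_tier="replicated primary", snapshot_enabled=True),
}


def policy_for(dataset_class: DataClass) -> BackupPolicy:
    """Look up the backup policy for a data category."""
    return POLICIES[dataset_class]


print(policy_for(DataClass.MISSION_CRITICAL).rpo_hours)  # 0.25
```

Keeping the table small forces the conversation about which data really belongs in the expensive tier.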
Of course, that’s an IT perspective. When you ask a non-IT department head how critical their data is and which data they would need to recover quickly after a disaster, they tell you, “Everything.” And when you’re in IT, your first-round answer is, “Okay. We’ll do it.” But after you’ve had a similar conversation with more department heads, you present them with an unaffordable cost estimate. So, the second-round conversation starts with “We cannot afford to do everything. So now tell me which data truly is mission-critical. I can back it up and store it with expensive technologies, then use less-expensive data protection technologies for the rest.”
Why does this need to be a two-pass process? Because the managers know their departments, but they don’t know their data. That’s why this step is important. In IT, you know that it doesn’t make sense to use the same technology for mission-critical data that you use for static data. The exposure and risk are different because the data is used differently. So different backup and storage technologies are called for.
Mind you, not every organization works that way. Some IT groups instead set a single service level agreement (SLA) for the agency and say, "This is the backup and storage technology we use. We can afford to keep the data for X months, no matter what it is. After that, we move it somewhere else or delete it."
Or you may group your applications by importance. If, for example, your database has a high rate of churn, do you need to keep snapshots of it for a year? Probably not, because it will have changed so much in a year that an old snapshot is of little use.
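A retention rule like the ones described above, whether agency-wide or per application, comes down to comparing each backup's age with a window. This Python sketch assumes you can list snapshot timestamps. The 30-day window for a high-churn database is an illustrative assumption, and in practice your backup software would usually enforce the rule for you.

```python
from datetime import datetime, timedelta


def snapshots_to_prune(snapshot_times: list[datetime], now: datetime,
                       retention: timedelta) -> list[datetime]:
    """Snapshots older than the retention window, oldest first."""
    cutoff = now - retention
    return sorted(t for t in snapshot_times if t < cutoff)


now = datetime(2024, 6, 30)
snaps = [now - timedelta(days=d) for d in (1, 7, 30, 45, 200, 365)]

# Assumed 30-day window for a high-churn database.
old = snapshots_to_prune(snaps, now, retention=timedelta(days=30))
print([(now - t).days for t in old])  # [365, 200, 45]
```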
3. Know your goals
The main goal of your data backup strategy, of course, is to recover data in case of disaster. As you break that goal down, you evaluate the importance of each data set. Why? Because the backup and recovery mechanism you use depends on that importance and on how long the agency can operate without the data set.
In the “Know your data” step, you had conversations with department heads about backing up. In this step you have conversations about recovery, downtime and SLAs. The managers will say, “I know what you can back up. How quickly can you recover it?” And it’s up to you to tell them what you can recover and in what time frame.
The conversation revolves around questions such as these:
- What kind of outage might happen?
- How long is the application likely to be offline?
- For how long will we be unable to work?
- For how long will constituents and internal staff be unable to use our systems?
That's how you start to build application-based SLAs with the rest of the agency. You may tell Finance, for example, "Our ERP is a major agency application, so we'll use these techniques to recover it in case of disaster. It will be back up and running for you within x hours."
IT teams routinely work under SLAs with external service providers like Azure, AWS and Google. SLAs are just as applicable to the services that IT provides internally.
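To check whether a promise like "back up and running within x hours" is realistic, you can add up the steps a recovery actually takes and compare the total with the target, often called the recovery time objective (RTO). The step names and durations in this Python sketch are invented for illustration. Real figures should come from timed recovery tests.

```python
def meets_rto(step_hours: dict[str, float], rto_hours: float) -> tuple[bool, float]:
    """Sum the estimated recovery steps and compare the total with the RTO."""
    total = sum(step_hours.values())
    return total <= rto_hours, total


# Hypothetical recovery runbook for the ERP application.
erp_recovery = {
    "provision replacement hosts": 2.0,
    "restore database from backup": 5.5,
    "replay logs and validate": 1.5,
    "repoint clients and smoke-test": 1.0,
}

ok, total = meets_rto(erp_recovery, rto_hours=8)
print(ok, total)  # False 10.0
```

Here the estimated 10 hours exceeds an assumed 8-hour target, so either the recovery technique or the SLA has to change before you commit to it.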
4. Know your tools
The service levels you agree to depend on your technical capabilities, and you have several options when it comes to recovering data:
- File- and folder-based recovery
- Application-aware recovery; for example, restoring assets like SQL databases in such a way that they are available as soon as the machine is recovered
- Machine image-based snapshot recovery, which has long been popular with virtual machines and can also apply to physical machines
- Deduplication and replication for determining how and where you store your data as part of your data backup strategy
- Single-item recovery and high-speed machine recovery, for when the disaster is limited to, say, losing the CTO’s laptop or deleting a couple of very important email messages
- Cloud data recovery, for putting a data set back into the cloud if it's been lost. Have you been replicating that data set to different cloud regions, and can you recover it from there? How long will it take to point your services at the replicated data set?
You face a cost-versus-risk trade-off in data recovery: the more effective the recovery mechanism, the more it will cost you, and the cheaper it gets, the longer it takes to recover the data. With that in mind, here's an example of how data might flow through the tiers of your data backup strategy:
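Deduplication is easier to reason about with a concrete model. This Python sketch uses the simplest form, fixed-size chunking. Each chunk is hashed with SHA-256 and stored only once, and each backup keeps a list of chunk hashes from which it can be rebuilt. Many products use more sophisticated variable-size chunking, so treat this as an illustration of the idea rather than of any particular product.

```python
import hashlib


def dedup_store(data: bytes, store: dict[str, bytes], chunk_size: int = 4096) -> list[str]:
    """Split data into fixed-size chunks, keep each unique chunk once (keyed by
    its SHA-256 digest) and return the ordered digests needed to rebuild it."""
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # later duplicates are not stored again
        recipe.append(digest)
    return recipe


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the original bytes from the chunk store."""
    return b"".join(store[d] for d in recipe)


store: dict[str, bytes] = {}
monday = b"A" * 8192 + b"B" * 4096
tuesday = b"A" * 8192 + b"C" * 4096  # only the last chunk changed

r1 = dedup_store(monday, store)
r2 = dedup_store(tuesday, store)
print(len(r1) + len(r2), len(store))  # 6 3
assert restore(r2, store) == tuesday
```

Two backups reference six chunks but only three are stored, which is where the savings in the money dimension come from.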
At the top level, you take snapshots of data stored on the primary disk, in case you need to recover in bulk. But snapshots typically live on the same storage system as the data, so they won't help if the whole disk system fails. For that, consider asynchronous replication to a secondary location. Replication is useful for recovering large amounts of data, but it usually means buying primary and secondary storage from the same manufacturer. You reduce risk, but it's more expensive.
The next level down contains your backup sets, where the 3-2-1 rule applies: maintain 3 copies of your data on 2 different media types, and store 1 of them at a remote site. So your three copies are the primary, a secondary copy and a third copy in another location. Your two media types might be disk plus tape, or disk plus cloud storage. In any event, one copy is off site. You can then use any of those backups for recovery, depending on the age and size of the data.
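The weakness of asynchronous replication is lag: a write is acknowledged on the primary before it reaches the secondary. The toy Python model below, whose class and method names are hypothetical, shows why writes still in flight when the primary fails are lost. That in-flight amount is what sets how much data you can lose with this technique.

```python
from collections import deque


class AsyncReplica:
    """Toy model: writes commit on the primary immediately and are shipped
    to the secondary later, so the secondary can lag behind."""

    def __init__(self) -> None:
        self.primary: list[str] = []
        self.secondary: list[str] = []
        self.pending: deque[str] = deque()

    def write(self, record: str) -> None:
        self.primary.append(record)
        self.pending.append(record)  # queued for later shipment

    def ship(self, max_records: int) -> None:
        """Copy up to `max_records` queued writes to the secondary, in order."""
        for _ in range(min(max_records, len(self.pending))):
            self.secondary.append(self.pending.popleft())

    def lost_if_primary_fails_now(self) -> list[str]:
        """Writes that exist only on the primary at this moment."""
        return list(self.pending)


r = AsyncReplica()
for n in range(5):
    r.write(f"txn-{n}")
r.ship(max_records=3)
print(r.lost_if_primary_fails_now())  # ['txn-3', 'txn-4']
```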
- You want the snapshots for fast, bulk recovery of large amounts of data.
- The backup copies are your safety net. If your storage system fails, you can recover everything from backups.
- The secondary copy is effective against ransomware. Store it somewhere else, or on tapes separated from your network by an air gap.
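The 3-2-1 rule is concrete enough to check mechanically. Here is a Python sketch, assuming you describe each copy by its location, its media type and whether it is off site. The `Copy` record and the sample locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str  # e.g. "hq-datacenter", "records-vault"
    media: str     # e.g. "disk", "tape", "object-storage"
    offsite: bool


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """At least 3 copies, on at least 2 media types, with at least 1 off site."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


copies = [
    Copy("hq-datacenter", "disk", offsite=False),  # primary
    Copy("hq-datacenter", "disk", offsite=False),  # local backup
    Copy("records-vault", "tape", offsite=True),   # off-site copy
]
print(satisfies_3_2_1(copies))      # True
print(satisfies_3_2_1(copies[:2]))  # False
```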
As you formulate your data backup strategy, weigh the threat of ransomware heavily. Particularly worrisome is the trend toward double extortion ransomware attacks, which not only encrypt your data but also steal it and threaten to publish it. Your backup, being a copy of your entire agency data set, is an appealing target. So your data backup strategy has to ensure that 1) no attack leaves you without access to your backups, and 2) if your backups are exfiltrated, they are useless to the attackers, for example because they are encrypted.
In defending your backup data against ransomware, data immutability is a valuable feature. An immutable backup is a set of backup data that, once written, cannot be changed in any way; not even ransomware can change it. The ideal form of immutable backup is air-gapped storage that is isolated from the rest of your network. One example is performing an encrypted backup to tape, then removing the tape from the drive until it's needed for recovery. That way, the backup data is protected and cannot be deleted or edited.
You have plenty of options beyond a single backup and recovery product. You can tailor your recovery technologies from the very edge, where the data is first written, all the way through to backups that meet different requirements.
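Real immutability has to be enforced by the storage layer or by physical separation, such as the removed tape, not by application code. Still, a small model helps clarify the write-once-read-many (WORM) behavior to look for when you evaluate storage options. In this hypothetical Python sketch, an object cannot be overwritten, and it cannot be deleted until its retention date has passed.

```python
from datetime import datetime, timedelta


class ImmutabilityError(Exception):
    pass


class WormStore:
    """Toy write-once-read-many store (illustrative only): objects cannot be
    overwritten and cannot be deleted before their retention date."""

    def __init__(self) -> None:
        self._objects: dict[str, tuple[bytes, datetime]] = {}

    def put(self, key: str, data: bytes, retain_until: datetime) -> None:
        if key in self._objects:
            raise ImmutabilityError(f"{key} already written")
        self._objects[key] = (data, retain_until)

    def get(self, key: str) -> bytes:
        return self._objects[key][0]

    def delete(self, key: str, now: datetime) -> None:
        _, retain_until = self._objects[key]
        if now < retain_until:
            raise ImmutabilityError(f"{key} is locked until {retain_until:%Y-%m-%d}")
        del self._objects[key]


now = datetime(2024, 6, 30)
store = WormStore()
store.put("backup-2024-06-30", b"backup bytes", retain_until=now + timedelta(days=90))

# An attacker's attempt to overwrite or delete the backup is refused.
for attempt in (
    lambda: store.put("backup-2024-06-30", b"encrypted-by-attacker", retain_until=now),
    lambda: store.delete("backup-2024-06-30", now=now),
):
    try:
        attempt()
    except ImmutabilityError as err:
        print("blocked:", err)
```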