The Vault Inside the Drive: A Tech Industry Interview with FIU’s Professor Zhu

In conversation with
Dr. Weidong Zhu, Assistant Professor, Florida International University

Professor Zhu, before we get into the research itself, could you tell us a little about your background and what drew you to the intersection of hardware and cybersecurity at FIU?

I’m an assistant professor at FIU’s Knight Foundation School of Computing and Information Sciences (KFSCIS), and I did my PhD at the University of Florida. My research centers on system security. What drew me to the hardware layer was a simple frustration: almost all our defenses live in software, which is exactly where attackers have become most effective. Once someone gains root (administrator) privileges, they can disable nearly everything the operating system relies on for protection. The storage device, by contrast, sees every write a system makes and can be made far harder to tamper with. That shifts the question from “How do we stop the attacker?” to “Even if the attacker wins, how do we make sure the data survives?”

Before we get into the mechanics, can you set the scene for us? Ransomware is now hitting organisations that have invested heavily in software defences and still cannot recover their data. Why has the storage layer become such a critical battleground, and why is the moment of an attack so often the moment existing protections fail?

The main reason to look at storage is straightforward: storage is where the data actually lives, so protecting it there is the most direct approach. There’s a second reason that matters just as much: a storage device is naturally isolated from the operating system. The OS is large and complex, with many moving parts, and all that complexity gives attackers more ways in. A drive is the opposite, a much simpler design built from a small but robust combination of hardware and firmware. Building the defense into storage gives you stronger security while still keeping access to the data convenient and manageable.

Today, most data protection runs inside the OS: antivirus, backup agents, snapshot tools, version histories, and so on. Organizations have spent heavily on these, but they all sit on the same foundation, and that foundation is the first thing a serious attacker takes over. That’s why the moment of an attack is so often the moment protection fails. Once ransomware gets root, it doesn’t bother fighting your defenses; it simply turns them off, then encrypts or wipes the data, and increasingly the backups too. The storage device is the one place the attacker still has to go through but cannot fully own. It sees every write the system makes, and it can be sealed off from the host. So we think the real battleground has moved down to the drive: it’s the last place the data physically exists, and it can still be protected after the host itself has been compromised.

Most people assume that when data is deleted it is simply gone, but the reality of how solid-state drives handle deletion is far more complex and far more consequential after an attack. Can you walk us through what is actually happening inside an SSD when ransomware strikes?

This is the counterintuitive part. When you delete or overwrite a file on an SSD, the old data usually isn’t erased right away. Flash memory can’t be overwritten in place, and it can only be erased in large blocks. So the drive writes the new version to a fresh location, marks the old copy invalid, and reclaims that space only later, through garbage collection. For some time after ransomware encrypts your files, the original, unencrypted versions may therefore still be physically sitting on the chips. That lingering is effectively accidental versioning, but today’s drives don’t exploit it well.

Two things work against that retention. First, SSDs use a DRAM cache for speed, and when ransomware overwrites recently written data that is still in that cache, the original is lost before it ever reaches flash. Second, garbage collection has no idea which old versions matter, so it can erase exactly the data you’d want back. My work, LAST, is about not throwing that history away.

You have named your system LAST. Once we understand that those original file versions are still physically present on the drive, what does LAST actually do to turn that accidental lingering into a dependable recovery capability rather than a matter of luck?

We called the system LAST because it lets data last on the drive, giving you a longer protection window against data loss from an attack. Today’s drives throw that retention away in two ways, and LAST fixes both.

First, the DRAM cache normally overwrites data in place on a write hit. So when ransomware encrypts a file whose data is still cached, the original copy is gone before it ever reaches flash. LAST never overwrites data in place in the cache, and it keeps a separate read cache so performance doesn’t suffer.

Second, garbage collection has no idea which old versions are worth keeping. LAST tracks the order in which data was invalidated, and it does so without a memory-hungry index: it physically arranges the data so that the layout itself records the sequence. When the drive needs space, it erases the oldest invalidated data first and leaves the recent history intact. It also keeps a file’s pieces together, so you never end up recovering half a file.

The result is that recovery becomes dependable rather than a matter of luck. On an ordinary drive, recently deleted data may or may not survive; with LAST, users can reliably get it back.

You use the analogy of a vault inside a bank to describe how your system operates independently from the compromised operating system. How important is that independence to the whole approach, and is that what makes this fundamentally different from software-based recovery solutions?

Independence and isolation are the whole point. Almost all data-protection tooling runs inside the operating system, which is a large, exposed target; in 2023 alone, thousands of privilege-escalation vulnerabilities were reported. Once an attacker has root, they can turn off antivirus, backup agents, and version histories, because those defenses share the same ground the attacker now controls. A defense an attacker can turn off isn’t really a defense.

Moving the versioning into the device puts recovery behind a much harder wall. The SSD firmware runs on its own processor and memory, has a far smaller trusted computing base than a full OS, and can be locked down with secure boot and signed firmware. The host sees only a narrow interface: it can send writes, but it can’t reach in and delete the history the device is preserving. In the worst case, where the computer is completely down and cannot boot, a legitimate administrator can physically disconnect the drive and recover the data on a trusted machine. That’s the categorical difference from software recovery: the protection doesn’t rely on the system it protects.

To be clear about its boundaries, LAST is designed to make an attack’s consequences reversible rather than to detect the attack itself. What does your approach deliberately not protect against, and what should organisations still keep in place alongside it?

LAST is a recovery guarantee, not a detector or a prevention mechanism, so it does not try to catch the attack itself. Attack detection is an active research area in its own right, and it doesn’t conflict with the recovery we provide. It helps to think of data protection in two stages: detection first, then recovery. LAST handles the second stage, restoring the data after something gets through. So LAST won’t warn you that you’re under attack or stop the encryption, and it deliberately stays out of anomaly detection.

The thing is, no detection system can promise zero false negatives, and those missed attacks are exactly where LAST still protects your data. So keep your detection and prevention tools in place; LAST sits underneath them as a last line of defense.

There are a few honest limits. LAST works at the block level, so on its own the history it keeps lacks rich file-level context. Under an extreme, sustained write storm, the retention window can shrink, though that kind of activity is loud and easy to spot. LAST also assumes the firmware itself is trustworthy and protected by secure boot. The point of LAST is that when your other defenses fail, the data is still there to recover.

You describe a recovery window of up to 126 days for deleted data. For organisations that currently have little to no reliable recovery capability after an SSD-based attack, what does that number actually change about how they should think about incident response?

It changes the assumption you’re forced to make about when you discover an attack. Detection is often slow because sophisticated ransomware mimics legitimate software, and some intrusions sit quietly for weeks or months. If your only recovery option is a backup from a fixed point in time, you’re betting that you caught the attacker before that backup was poisoned. Against a patient adversary, that’s a bad bet.

A long retention window decouples recovery from the speed of detection. In our evaluation, LAST preserved historical data for up to 126 days, and for about 53 days on average. For an organization with little reliable recovery today, that means incident response can still succeed even when an attack is detected late. It’s a longer fail-safe window against stealthy attacks.

One of the reasons security has historically not been built directly into storage hardware is the computing overhead involved. You have achieved at least a 60 percent improvement in the data protection window with minimal impact on performance. How did you approach that tradeoff, and what does solving it open up for the broader field?

Overhead is the historical reason security rarely lives in storage: keeping more data and tracking it is costly, and naive approaches are punishing. Our first attempt at versioning the DRAM cache directly roughly doubled latency, which is a non-starter. So we attacked each source of overhead individually:

- We added a dedicated read cache so read performance stays fast.
- We evict versioned data to flash in the background during idle periods and lay it out to match the flash’s internal parallelism, keeping the bookkeeping off the critical path.
- We deduplicate to avoid storing redundant copies.
- Instead of memory-hungry structures like Bloom filters, we encode the order of data invalidation directly into how data is physically arranged. That tracks the full history in about 31 MB of metadata, where prior designs needed several hundred megabytes.

The result is roughly 1.5% latency overhead while extending the protection window by at least 61%. If strong recovery can be a near-free default of storage rather than an expensive add-on, it becomes something you can ship in commodity devices, and that’s what makes hardware-level protection practical at scale.

You have framed strong recovery as something that could become a near-free default of storage rather than an expensive add-on. Realistically, what would it take for a capability like this to reach commodity drives, and what has to happen with manufacturers and standards for that to become real?

The good news is that this isn’t new silicon. It’s a firmware change, living in the same part of the drive that already handles caching and garbage collection, with a small metadata footprint of about 31 MB. So there’s no need for dedicated hardware to support it. A manufacturer could simply ship firmware with the defense built in, or treat it as a new business model and offer the protection as a subscription or a one-time purchase.

Beyond that, it’s mostly a question of adoption and standardization. Manufacturers need to fold the logic into their firmware, and the industry needs to agree on an NVMe-style command set so recovery tools can probe versions and roll back the same way across vendors. You also need key management, so that only an authorized administrator can recover data. On top of that come the trusted-computing features better drives already ship with, such as secure boot, signed firmware, and ideally a TPM.

The hardware trend helps as well, since cheaper, higher-capacity flash such as QLC gives you a generous retention budget at no extra cost. Most of the pieces already exist. What’s still missing is manufacturers deciding to make recovery a default, and a standard way to talk to it.
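Professor Zhu describes two ideas: out-of-place writes that leave stale copies on flash, and LAST reclaiming the oldest invalidated data first. A toy model can make both concrete. The Python sketch below is an illustrative simplification under stated assumptions, not LAST’s firmware. The names `ToyRetainingFTL`, `write`, and `recover` are hypothetical, and so is the model in which one slot holds one version. An append-only queue stands in for a physical layout that records invalidation order. The DRAM cache, erase blocks, deduplication, and spare capacity are not modeled.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Version:
    lba: int         # logical block address the host wrote to
    data: bytes      # contents of this version
    written_at: int  # logical timestamp of the write


class ToyRetainingFTL:
    """Hypothetical toy flash translation layer that keeps stale versions.

    Simplification: one physical "slot" holds one version. Real flash
    writes pages, erases whole blocks, and reserves spare capacity.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity            # total physical slots
        self.live: dict[int, Version] = {}  # LBA -> current version
        # Stale versions in the order they were invalidated. Because we
        # only append, position encodes the order and no extra index is
        # needed (a stand-in for a layout that records the sequence).
        self.history: deque[Version] = deque()
        self.clock = 0

    def _used(self) -> int:
        return len(self.live) + len(self.history)

    def write(self, lba: int, data: bytes) -> None:
        # An out-of-place write needs one free slot. Make room by erasing
        # the OLDEST invalidated version first, keeping recent history.
        while self._used() + 1 > self.capacity and self.history:
            self.history.popleft()
        if self._used() + 1 > self.capacity:
            raise RuntimeError("drive full: no stale data left to reclaim")

        self.clock += 1
        old = self.live.get(lba)
        if old is not None:
            # The previous copy is marked stale, not destroyed.
            self.history.append(old)
        self.live[lba] = Version(lba, data, self.clock)

    def recover(self, lba: int, before: int) -> bytes | None:
        """Newest retained contents of `lba` written strictly before `before`."""
        candidates = [v for v in self.history
                      if v.lba == lba and v.written_at < before]
        current = self.live.get(lba)
        if current is not None and current.written_at < before:
            candidates.append(current)
        if not candidates:
            return None
        return max(candidates, key=lambda v: v.written_at).data


# Usage: a file at LBA 0 is overwritten by ransomware at time 3.
ftl = ToyRetainingFTL(capacity=4)
ftl.write(0, b"report-v1")   # time 1
ftl.write(1, b"photo")       # time 2
ftl.write(0, b"ENCRYPTED")   # time 3: the attack
print(ftl.recover(0, before=3))  # b'report-v1'
```

In this toy, an overwrite never destroys the previous contents. The stale copy joins the history queue, and space pressure trims that queue only from its oldest end. Recovery then means picking the newest retained version written before the attack. That is why the length of the retention window, rather than the moment of detection, decides what can be restored.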
