SoftRAID News

by Tim Standing, VP of Engineering at SoftRAID

October 6 2016

A kernel panic is the kind of crash where your computer simply freezes: the mouse stops moving and everything on the screen stands still. A few seconds later, the computer reboots, and you may have lost all the work you had just been doing.

We pride ourselves on fixing 100% of the bugs in SoftRAID that cause kernel panics. We know how disruptive they can be to your everyday life, and we don’t want to be responsible for you losing work.

So I was really troubled 6 months ago when I started getting kernel panics on my development Mac, the one I use to create the SoftRAID product. I had just started using new virtualization software that lets me run several different versions of Mac OS X on the same Mac at the same time. Since the kernel panics began when I started using the new software, it seemed natural to assume the software was causing them.

At first, I encountered the kernel panics every 2–3 weeks, always after I had run the virtualization software. Then they started happening every week, and finally every 2–3 days. It was getting to be a real nuisance.

During this time, Mark James, one of our support engineers, was helping a customer who was also experiencing kernel panics. The customer said they were happening on his 2013 Mac Pro, while using a RAID 4 volume, created using SoftRAID, and made of 4 SSDs. The customer naturally assumed the kernel panics were caused by SoftRAID because they started occurring after he created his SoftRAID volume.
The interesting point was that this customer only encountered the bug when 64 GB of RAM was installed in the Mac Pro.

We tried to reproduce the problem, using the same amount of RAM (64 GB). However, we could not get the kernel panics to happen. A few weeks later, I asked Mark if he had resolved the issue with the customer. “Yes,” he said, “it was bad RAM. Once the customer replaced the RAM, the kernel panics disappeared entirely.”

The next day, I purchased replacement RAM for my development Mac. Since I installed it, 6 weeks ago, I haven’t had a single kernel panic.

So even though we all want to believe that kernel panics are caused by inferior software, sometimes the cause is actually a hardware problem, like bad RAM!

At the August 2016 Flash Memory Summit in Santa Clara, CA, SoftRAID’s VP of Engineering, Tim Standing, talked about the challenges of SSD failure prediction.

September 9 2016

Tim started off talking about SoftRAID’s efforts to make storage more reliable: “In 2010, we added a feature for predicting disk failure, which used the results from a Google study on 100,000 rotating media disk drives. This feature can warn users weeks or months before a disk fails. The feature predicts about 75% of disk drive failures; the other 25% happen without any warning.”

SoftRAID’s success in predicting disk failure in rotating media spurred Tim and his team to develop a similar system for SSDs: “After we saw the power of failure prediction, we wanted to develop the same feature for SSDs.”

For those of us who don’t know why SSDs can’t use the same process as rotating media for failure prediction, Tim explains: “When disks with rotating media are about to fail, they start reallocating sectors. We can use the reallocated sector count as an indicator for impending disk failure; the more sectors reallocated, the nearer the disk is to failure. Unfortunately, this technique doesn’t work with SSDs because SSDs reallocate sectors during everyday use—every time a flash memory block stops working, the controller reallocates another block of flash memory to replace it. It’s not unusual for a healthy SSD to have thousands of reallocated sectors.”
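The heuristic Tim describes can be sketched in a few lines of Python. This is an illustration only: the function name and the two thresholds are assumptions made up for the example, not SoftRAID's actual algorithm or values. The SSD branch simply reflects the point above that the count says little about the health of flash storage.

```python
def reallocation_warning(reallocated_sectors: int, is_ssd: bool,
                         warn_at: int = 1, critical_at: int = 100) -> str:
    """Classify a drive from its SMART reallocated sector count.

    warn_at and critical_at are hypothetical thresholds chosen for
    illustration; they are not values used by SoftRAID.
    """
    if is_ssd:
        # SSDs remap worn flash blocks during normal use, so the count
        # is not a useful sign of impending failure.
        return "not applicable"
    if reallocated_sectors >= critical_at:
        return "failure likely"
    if reallocated_sectors >= warn_at:
        return "warning"
    return "ok"


print(reallocation_warning(0, is_ssd=False))
print(reallocation_warning(12, is_ssd=False))
print(reallocation_warning(3000, is_ssd=True))
```

For a rotating disk, a rising count moves the result from "ok" toward "failure likely". For an SSD, even a count in the thousands is reported as "not applicable".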

So another technique needed to be used for failure prediction in SSDs, and Tim thought his team had found it: “We were excited to discover that SSDs contain a Media Wearout Indicator as one of their SMART parameters.”

Tim then described how the Media Wearout Indicator works: “Remember that SSDs have 10–20% extra flash memory (a 100 GB SSD actually contains 110–120 GB of flash memory). This extra flash memory is used to replace flash memory blocks that wear out as the SSD is used. The Media Wearout Indicator displays the amount of extra flash memory still available in an SSD. It goes from 100% when the SSD is new down to 0% when all this extra flash memory has been used up.”
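The arithmetic behind that description can be sketched as follows. This is a simplified model based only on the explanation above. The function name and its parameters are assumptions for illustration, and real drives may compute and report the SMART value differently from vendor to vendor.

```python
def media_wearout_indicator(spare_total_gb: float, spare_used_gb: float) -> float:
    """Percentage of spare flash still available (100 = new, 0 = exhausted).

    Simplified model of the indicator as described in the talk; the
    parameter names are hypothetical.
    """
    if spare_total_gb <= 0:
        raise ValueError("spare_total_gb must be positive")
    # Clamp so the result always stays between 0 and 100.
    used = min(max(spare_used_gb, 0.0), spare_total_gb)
    return 100.0 * (spare_total_gb - used) / spare_total_gb


# A 100 GB SSD built with 120 GB of flash has 20 GB of spare capacity.
spare = 120 - 100
print(media_wearout_indicator(spare, 0))   # 100.0
print(media_wearout_indicator(spare, 5))   # 75.0
print(media_wearout_indicator(spare, 20))  # 0.0
```

In this model, an SSD that has used 5 GB of its 20 GB of spare flash reports 75%.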

However, as Tim went on to explain, the Media Wearout Indicator didn’t turn out to be quite as useful as expected: “We had high hopes that this indicator would provide us with a predictive indicator for impending failure. Two years ago, we incorporated a mechanism for monitoring it into SoftRAID. Since then, we have seen no SSDs which have failed because all their extra flash memory has been consumed. All the SSDs we have seen fail have failed with the Media Wearout Indicator well above 80%. We are still trying to develop a reliable mechanism for predicting when SSDs will fail.”

After his talk, Tim spoke to Chris Bross of DriveSavers Data Recovery, Inc., who said that their experience was exactly the same. SSDs fail catastrophically and without warning, and the Media Wearout Indicator is not useful in predicting when they will fail.

What customers are saying

Using SoftRaid for Windows Lite Version and have 2 Western Digital M.2 NVMe SSDs in RAID 0 and speeds are incredible. No issues whatsoever. What we love is that SoftRaid for Windows supports TRIM while Windows built-in RAID does not. TRIM is essential for SSDs. Highly recommend SoftRaid for Windows.

Had issues after upgrading from 5.6.7 to 5.7.3, submitted a request for help and Mark James provided quick response and excellent service to resolve the problems.

Delighted to see my experience is far from unique! Looks like SoftRaid gets the Nobel Prize for 4.99 Star average! Sure get my vote. Nothing like having support that responds like you’re family! By far best experience in my decades of computing. In a universe by ourselves!

I wish more companies would take note of how support should be run. Mark offered near immediate response, addressing all my questions in plain to understand English. Thanks SoftRAID!

The new version has performed flawlessly! I was just thinking about you guys today and how reliable the software is interacting with my MacBook Pro. I did have some trouble at first but you guys got me back up and running and at the time it honestly seemed like magic.

Have been using SoftRAID Version 5(.x) since around 2014 with external (Thunderbolt) OWC RAID arrays. Perhaps the best thing I can say is that since then, except for replacing drives from time to time as needed, I’ve barely thought about SoftRAID, everything has just worked.

But what prompted me to post this review tonight was how quick/helpful/friendly tech support was late today (Friday) helping with recovering a volume. My initial email was answered within a few minutes and after maybe 5-6 back and forths (mostly waiting for me as I was also working on something else), everything was wrapped up about 2 hours later.

I was a little surprised at the quick response, but realized then I had used support once before for some type of recovery situation (and questions) and had a similar experience. While I’m sure there are plenty of issues that just don’t lend themselves to quick/same-day resolution like this, my (admittedly limited) experience with the tech support folks has been great, nice folks.