Frequently Asked Questions: Bad Blocks

What are bad blocks?

First you need to understand a little about how disk storage works, and how data is organised on a typical hard disk.

Disks are divided up into blocks; the arrangement of the blocks on the physical surface(s) of the disk doesn’t matter for the purposes of this discussion, but it is worth knowing that each block is in fact a particular sector on a particular track (a concentric ring of sectors at a particular distance from the disk hub) of a particular platter (a physical disk surface). Sometimes you may also see the term cylinder, which refers to a group of tracks, one on each surface of each platter, all at the same distance from the hub (if you visualise this in your mind, the tracks form a cylinder).

To avoid giving bad data back to the user of a disk, each block holds, in addition to the space provided for data storage (usually 512 bytes per sector, though not always), a checksum of some description, often a cyclic redundancy check or CRC. Whenever the disk reads data from a block, it checks the CRC to make sure that the data and CRC match. Whenever it writes data to a block, it computes a new CRC based on that data. A single bit error in the block is enough to ensure that the CRC will not match, so the disk will know that the data it read back was incorrect.
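As a rough illustration of the check described above, here is a minimal Python sketch. It assumes CRC-32 from the standard zlib module as a stand-in for whatever checksum a real disk uses internally (the actual codes are chosen by the manufacturer and are not visible to software), and the function names write_block and read_block are invented for this example.

```python
import zlib

SECTOR_SIZE = 512  # bytes of data per sector, the usual size mentioned above


def write_block(data):
    """Return the data together with a checksum computed at write time."""
    return data, zlib.crc32(data)


def read_block(stored, stored_crc):
    """Return the data if it still matches its checksum, else raise an error."""
    if zlib.crc32(stored) != stored_crc:
        raise IOError("checksum mismatch: block is bad")
    return stored


# Write one sector's worth of stand-in data.
data, crc = write_block(bytes(range(256)) * 2)

# Flip a single bit in the stored copy to simulate a one-bit error.
damaged = bytearray(data)
damaged[100] ^= 0x01

intact_ok = read_block(data, crc) == data
try:
    read_block(bytes(damaged), crc)
    detected = False
except IOError:
    detected = True

print(intact_ok, detected)
```

The undamaged copy reads back normally, while the copy with one flipped bit is rejected.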

A bad block happens when either

  1. the magnetic medium at that location on the disk is defective (a hard failure), or
  2. the CRC for that block does not match the data read back by the disk (a soft failure).

A block that has suffered a hard failure cannot be repaired, since the problem is physical. However, because we address the disk by block number rather than explicitly specifying the sector, track and surface we want, the disk is able to re-assign a currently unused part of one of its surfaces to hold the data for that block number. This process is called remapping or sparing, and with modern disks it happens automatically when you overwrite a block that has failed in this way.

It is possible for a disk to run out of spare blocks, in which case this process will fail. In such a situation, the only solution is to replace the disk itself.

A block that has suffered a soft failure does not require sparing, since the magnetic medium is quite capable of storing data; in this case, writing over the bad block will simply result in the disk writing new data and a new CRC to the block.

In both cases, the bad block will not be readable by ordinary software on your machine.
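The difference between the two kinds of failure, and what overwriting does in each case, can be sketched with a toy model. This is only an illustration under stated assumptions: the ToyDisk class, its fields and the is_readable helper are hypothetical, zlib's CRC-32 stands in for the disk's real checksum, and a real drive does all of this in firmware where ordinary software cannot see it.

```python
import zlib

BLOCK_SIZE = 512


class ToyDisk:
    """Toy model of sparing; every name here is hypothetical."""

    def __init__(self, num_blocks, num_spares):
        # Block number -> physical location; initially the identity mapping.
        self.mapping = {n: n for n in range(num_blocks)}
        # Physical locations held in reserve for sparing.
        self.spares = list(range(num_blocks, num_blocks + num_spares))
        self.defective = set()  # physical locations whose medium has failed
        zeroes = bytes(BLOCK_SIZE)
        self.store = {n: (zeroes, zlib.crc32(zeroes)) for n in range(num_blocks)}

    def write(self, block, data):
        loc = self.mapping[block]
        if loc in self.defective:
            # Hard failure: reassign this block number to a spare location.
            if not self.spares:
                raise IOError("out of spare blocks; the disk needs replacing")
            loc = self.spares.pop(0)
            self.mapping[block] = loc
        # Every write stores a fresh checksum, which also cures a soft failure.
        self.store[loc] = (data, zlib.crc32(data))

    def read(self, block):
        loc = self.mapping[block]
        data, crc = self.store[loc]
        if loc in self.defective or zlib.crc32(data) != crc:
            raise IOError("bad block %d" % block)
        return data


def is_readable(disk, block):
    try:
        disk.read(block)
        return True
    except IOError:
        return False


disk = ToyDisk(num_blocks=4, num_spares=1)

# Hard failure on block 2: unreadable until it is overwritten and remapped.
disk.defective.add(disk.mapping[2])
hard_before = is_readable(disk, 2)
disk.write(2, b"restored from backup")
hard_after = is_readable(disk, 2)

# Soft failure on block 1: the stored data no longer matches its checksum.
data, crc = disk.store[disk.mapping[1]]
disk.store[disk.mapping[1]] = (b"\x01" + data[1:], crc)
soft_before = is_readable(disk, 1)
disk.write(1, b"restored from backup")
soft_after = is_readable(disk, 1)

# A second hard failure finds the spare pool empty, so the write fails.
disk.defective.add(disk.mapping[3])
try:
    disk.write(3, b"restored from backup")
    out_of_spares = False
except IOError:
    out_of_spares = True
```

In this model block 2 ends up at a spare location after the overwrite, block 1 stays where it was and simply gets a new checksum, and the write to block 3 fails because the single spare has already been used.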

How common are bad blocks?

So common that every disk manufactured comes out of the factory with a list of bad blocks that were found during quality control. It would be highly unusual for a newly manufactured disk to have no bad blocks. However, most if not all of them should be caught by the manufacturer and will not affect your data.

Bad blocks that develop later on in the life of the disk can be more common in units that have been mistreated (e.g. exposed to high temperatures, or placed in other environments likely to have an adverse effect on sensitive electronic equipment). But they can happen to any disk, at any time.

Often you won’t notice them because of the sparing mechanism, and because modern disks typically use error correcting codes, so that a certain rate of read errors is tolerable. Indeed, some models now read and re-write data when they are not otherwise busy in an effort to clean up any errors that they do detect.

What are the symptoms?

If you are running iPartition or iDefrag on a disk that has bad blocks, typically the programs will stop and present you with an error message containing the text “errno 5”. Usually this will be accompanied by a message telling you that the program was unable to move some data or a file from one place to another. In the case of iDefrag, the full path of the file will also be given.
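For reference, error number 5 is EIO, the generic input/output error, on both macOS and Linux. The following Python sketch shows one way to check whether a particular file can be read all the way through. It is not how iPartition or iDefrag work internally; the function name find_unreadable_regions and the 4096-byte chunk size are assumptions made for this example, and because the operating system may serve recently read data from its cache, a clean result is not proof that the underlying blocks are good.

```python
import errno
import os


def find_unreadable_regions(path, chunk_size=4096):
    """Read a file chunk by chunk and return the offsets that fail with EIO."""
    bad_offsets = []
    size = os.path.getsize(path)
    # buffering=0 avoids Python's own read-ahead, so each read maps to one
    # request to the operating system.
    with open(path, "rb", buffering=0) as f:
        for offset in range(0, size, chunk_size):
            try:
                f.seek(offset)
                f.read(chunk_size)
            except OSError as e:
                if e.errno != errno.EIO:
                    raise  # some other problem; do not hide it
                bad_offsets.append(offset)
    return bad_offsets


# The numeric value behind the "errno 5" message.
print(errno.EIO)
```

Calling find_unreadable_regions on the path reported in the error message returns an empty list if every chunk could be read, or a list of byte offsets within the file where the read failed.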

In more general terms, the symptoms can vary. Bad blocks are often not handled well by application software, as a lot of developers, like a lot of users, assume that disk storage is reliable. If the bad blocks are in files that are rarely, if ever, accessed on your machine (e.g. language files that you aren’t using, or programs that you never run), it is unlikely that you will notice their presence. If, however, they are in files critical to the correct operation of the system, symptoms could range from not being able to read your disk through to particular types of operation failing (e.g. not being able to print, or not being able to render text in a particular font).

What can I do about the problem?

The best solution is to find the affected file and replace it with a known good copy from a backup. Writing over the original will cause the hard disk either to spare the bad block or to write fresh data and a matching CRC to it.

If the file is something that you will never use, you can simply delete it. An example might be a Korean language file on a machine that only ever runs in U.S. English.

If the file is a temporary file, a cache file or a log file, or some other type of file that will be recreated automatically by the system or by some software you use, you can also just delete it.

If the affected file is part of the structure of your filesystem (e.g. the Catalog file on HFS+, or the MFT on NTFS), you may find that a disk repair tool can help. Note though that it is not going to be able to fix the bad block itself. What it might be able to do is replace it with valid data and (hopefully) still leave all your files accessible. On the Mac, Alsoft’s DiskWarrior would be a good choice if the Disk Utility program provided with the machine cannot effect a repair.

Finally, if the file is a data file containing critical data, and you don’t have a backup, you will need to find someone with the necessary expertise to help you to repair the file.

I’ve found this great program that says it can repair bad blocks

There are, broadly speaking, two groups of software that make this claim. The first simply writes zeroes over the block that has gone bad, which makes the block itself good, but results in damage to the affected file (depending on the file, this might even be silent corruption, which is very bad indeed). The second makes use of erroneous statistical methods based on misconceptions about how hard disk storage actually works; typically these programs try to read the raw data from the sector a few hundred times and then guess at the “correct” value of each bit based on the number of times it was a 1 or a 0. Unfortunately, modern hard disks don’t store bits; in fact, disks never really did (aside possibly from some very early designs, the data has always been encoded somehow). These days, with techniques like PRML (partial-response maximum-likelihood), the data you get back from the disk is already the most likely data (that’s what the ML part of PRML means). If you’re interested in the details of how this all works, you could read up on Viterbi decoders and error correcting codes.

Both kinds of program are rather insidious since there is a fair chance that you will be able to open your file in the program that saved it. That might be useful, sometimes, but the trouble is that it’s quite possible that the file will be corrupted in such a manner that saving it again will result in another corrupted file. Typically, per Murphy’s Law, this will only bite you several years down the road when you have upgraded that piece of software and suddenly discover that you can no longer open the document in question for no apparent reason.

Anyway, a program might be able to make a bad block good, but it probably can’t fix your file for you.
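To see why the first kind of “repair” counts as silent corruption, consider this small Python sketch. It works on an in-memory stand-in for a file rather than on a real disk, and the zero_block function and the 512-byte block size are assumptions made for the illustration.

```python
import hashlib

BLOCK_SIZE = 512  # assumed block size, matching the usual sector size


def zero_block(contents, block_index):
    """Return a copy of contents with one block replaced by zeroes."""
    start = block_index * BLOCK_SIZE
    if start >= len(contents):
        raise IndexError("block index is past the end of the data")
    end = min(start + BLOCK_SIZE, len(contents))
    return contents[:start] + bytes(end - start) + contents[end:]


original = b"important data " * 200  # 3000 bytes of stand-in file contents
repaired = zero_block(original, 2)   # "repair" the third block with zeroes

same_length = len(repaired) == len(original)
same_digest = (
    hashlib.sha256(repaired).digest() == hashlib.sha256(original).digest()
)
print(same_length, same_digest)  # prints: True False
```

The file is the same size as before and can be read without any error, but 512 bytes of its contents are gone, and nothing other than a comparison against a known good copy or checksum will reveal that.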

What if I have a lot of bad blocks?

The best thing to do here is probably to erase your disk, making sure to use the option to zero it (in Disk Utility this is under “Security Options…” on the “Erase” tab). This will overwrite every block on the disk with zeroes, causing the disk itself to spare any hard failed bad blocks and to write a correct CRC on the remainder.

Obviously if you do that, you will lose all the data on the disk unless you have a backup.

I don’t have a backup

That is unfortunate. We specifically advise our customers that they should back up any data they care about.

Please also note that if you already have a bad block on your disk, there is no point trying to back up the affected file and then restoring from that backup. Your backup software won’t be able to read the file properly either, so it will (hopefully) refuse to back it up, though it’s possible that some software might simply write the wrong data to the bad region, or even to the rest of the file if it isn’t watching for errors. Check with your backup software vendor how it handles bad blocks.

It was working fine before I ran your software

This type of problem is a hardware fault and cannot be caused by software. We do have sympathy for users affected by hardware faults, but we cannot accept responsibility for problems with your equipment.

