The disk (or partition, like /dev/sdb3, which is what I used this for) ends up full of nothing but zeros, but you don't get there by writing zeros over all of it. The basic idea is to read the disk one arbitrary-sized chunk at a time, compute a checksum for each chunk, and write zeros only to the chunks that are not already all zeros. Whether that pays off comes down to how expensive a write is compared to a read.
#!/bin/bash
# argv1 => the device or partition you want to erase
DEVICE=${1} # partition
# argv2 => how many blocks to lump together in a single checksum
COUNT=${2} # blocksize multiplier
# get the blocksize from stat, 4096 bytes in most cases
DSKBLKSZE=$(stat --format=%o "$DEVICE")
# figure out the chunk size
CHUNK=$(echo "$DSKBLKSZE * $COUNT" | bc)
# get the total device size
T_BYTES=$(blockdev --getsize64 "$DEVICE") # size in bytes, needs root
# get the total number of chunks
T_CHUNKS=$(echo "$T_BYTES / $CHUNK" | bc)
# get the md5sum for a block of zeros for this size
MD5_ZERO=$(dd if=/dev/zero bs=$DSKBLKSZE count=$COUNT | md5sum)
That's the setup; now for the loop. You should probably echo the dd that writes instead of running it until you're completely sure what this does. It will overwrite everything on the device you pass to the script (the script's input becomes dd's output).
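# chunks are numbered from 0; index T_CHUNKS is the truncated tail, if any
for i in $(seq 0 $T_CHUNKS); do
    # dd counts skip= and seek= in units of bs, so convert the chunk index to blocks
    OFFSET=$(echo "$COUNT * $i" | bc)
    MD5_DEV=$(dd if="$DEVICE" bs=$DSKBLKSZE count=$COUNT skip=$OFFSET 2>/dev/null | md5sum)
    if [[ $MD5_DEV == "$MD5_ZERO" ]]; then
        echo "$MD5_DEV is equal $MD5_ZERO"
    else
        # double buffer dd using pipe; zeros come from /dev/zero, not from the device
        dd if=/dev/zero bs=$DSKBLKSZE count=$COUNT | \
        dd of="$DEVICE" bs=$DSKBLKSZE count=$COUNT seek=$OFFSET iflag=fullblock
        echo "$MD5_DEV != $MD5_ZERO"
    fi
done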
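The advice about echoing the dd before running it could be wired in as a switch. This is only a sketch: DRY_RUN and run are hypothetical names that are not part of the script above, and /dev/sdX is a placeholder device.

```shell
#!/bin/bash
# DRY_RUN and run() are hypothetical names, not part of the original script.
DRY_RUN=1   # 1 = only print commands, 0 = really execute them

run() {
    if [[ $DRY_RUN == 1 ]]; then
        # show what would be executed, touch nothing
        echo "would run: $*"
    else
        "$@"
    fi
}

# placeholder device; in dry-run mode this line only prints the command
DEVICE=/dev/sdX
run dd if=/dev/zero of="$DEVICE" bs=4096 count=256 seek=512
```

Prefixing the write dd in the loop with run, and flipping DRY_RUN to 0 once the printed offsets look right, would be one way to use it. Note that a pipeline of two dd commands would need both sides wrapped, or the pipe replaced by a single dd.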
The last chunk of the disk is probably going to be truncated, in which case its checksum will not match even if it is all zeros, so the last chunk gets overwritten no matter what. dd won't overrun the device, though: if you have multiple partitions and you overwrite the first, it won't bleed into the second, which is a good thing.
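The truncated-chunk effect is easy to reproduce on a scratch file instead of a real device. The sketch below assumes a 4096-byte block size and 4 blocks per chunk. The nonzero_bytes helper is hypothetical and shows a different technique from the checksum comparison: it counts non-zero bytes, so the length of the chunk does not matter.

```shell
#!/bin/bash
# Demonstration on a scratch file, not a real device.
IMG=$(mktemp)
BS=4096
COUNT=4    # chunk = 4 blocks = 16384 bytes

# 10 blocks of zeros = 2.5 chunks, so the third chunk is truncated
dd if=/dev/zero of="$IMG" bs=$BS count=10 2>/dev/null

MD5_ZERO=$(dd if=/dev/zero bs=$BS count=$COUNT 2>/dev/null | md5sum)
MD5_FULL=$(dd if="$IMG" bs=$BS count=$COUNT skip=0 2>/dev/null | md5sum)
MD5_LAST=$(dd if="$IMG" bs=$BS count=$COUNT skip=$((2 * COUNT)) 2>/dev/null | md5sum)

# hypothetical helper: number of non-zero bytes in chunk $2 of file $1
nonzero_bytes() {
    dd if="$1" bs=$BS count=$COUNT skip=$(( $2 * COUNT )) 2>/dev/null | tr -d '\0' | wc -c
}
NZ_LAST=$(nonzero_bytes "$IMG" 2)

[[ $MD5_FULL == "$MD5_ZERO" ]] && echo "full chunk matches the zero checksum"
[[ $MD5_LAST != "$MD5_ZERO" ]] && echo "truncated chunk does not match"
echo "non-zero bytes in truncated chunk: $NZ_LAST"

rm -f "$IMG"
```

Counting non-zero bytes would avoid the needless rewrite of the tail, at the cost of piping every chunk through tr instead of md5sum. I have not measured which is cheaper.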
This is key. If you're doing this on a single disk, or on a RAID setup that does not yield faster sequential reads, then this is a waste: skip reading the disk and just write out zeros. But if your disk is a RAID 5 and you expect most of it to be zero already, this should be a lot faster (about 50% faster in my case). The downside is that for every non-zero chunk you have to read it, compute the checksum, and then write out zeros. Computing the checksum isn't really a big deal, but having to read a chunk and then write it is a lot of overhead, and probably not worth your time unless you have the very specific case that I had (huge RAID 5, mostly zeros).
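The trade-off can be put into rough numbers. The sketch below uses a deliberately simple cost model that is my assumption, not a measurement: plain zeroing costs size / write speed, while scanning costs size / read speed plus a rewrite of the non-zero fraction. The estimate function and the figures passed to it are hypothetical, and checksum time is ignored.

```shell
#!/bin/bash
# Hypothetical helper; the cost model and all numbers are illustrative assumptions.
# usage: estimate READ_MBPS WRITE_MBPS ZERO_FRACTION SIZE_MB
estimate() {
    awk -v r="$1" -v w="$2" -v z="$3" -v s="$4" 'BEGIN {
        plain = s / w                      # just write zeros everywhere
        scan  = s / r + (1 - z) * s / w    # read everything, rewrite the non-zero part
        printf "plain=%.0fs scan=%.0fs\n", plain, scan
    }'
}

# made-up example: reads 4x faster than writes, 90% of the device already zero
estimate 400 100 0.9 1000000
```

Under this model, scanning wins only when the read-to-write time ratio is smaller than the fraction of the device that is already zero. With equal read and write speeds it can never win, which matches the single-disk case above.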