Category: Unix

  • Reblogging: My experience with using cp to copy a lot of files (432 millions, 39 TB)

    One morning I was notified that a disk had failed. No big deal, this happens
    now and then. I called Dell, and the next day I had a replacement disk. While
    rebuilding, the replacement disk failed, and in the meantime another disk had
    also failed. Now Dell’s support wisely suggested that I not simply replace
    the failed disks, as the array may have been punctured. Apparently, and as I
    understand it, disks are only reported as failed when they have sufficiently
    many bad blocks, and if you’re unlucky you can lose data if 3 corresponding
    blocks on different disks become bad within a short time, so that the RAID
    controller does not have a chance to detect the failures, recalculate the data
    from the parity, and store it somewhere else. So even though only two drives
    flashed red, data might have been lost.

    Having almost used up the capacity, we decided to order another storage
    enclosure, copy the files from the old one to the new one, and then get the old
    one into a trustworthy state and use it to extend the total capacity. Normally
    I’d have copied/moved the files at the block level (e.g. using dd or pvmove), but
    suspecting bad blocks, I went for a file-level copy because then I’d know which
    files contained the bad blocks. I browsed the net for other people’s experiences
    with copying many files and quickly decided that cp would do the job nicely.
    Knowing that preserving the hardlinks would require bookkeeping of which files
    had already been copied, I also ordered 8 GB more RAM for the server and
    configured more swap space.
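
    The post doesn’t show the exact invocation, but a minimal sketch of such a
    hardlink-preserving, file-level copy (the mount points here are hypothetical)
    could look like this:

    ```shell
    # -a (archive) implies --preserve=links, so hardlinks among the copied
    # files are recreated on the destination rather than duplicated as
    # separate files. To do this, GNU cp keeps an in-memory table of
    # (device, inode) pairs it has already copied; that table is what
    # consumes gigabytes of RAM at this scale.
    cp -a /mnt/old_enclosure/. /mnt/new_enclosure/
    ```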

    When the new hardware had arrived I started the copying, and at first it
    proceeded nicely at around 300-400 MB/s as measured with iotop. After a while
    the speed decreased considerably, because most of the time was spent creating
    hardlinks, and it takes time to ensure that the filesystem is always in a
    consistent state. We use XFS, and we were probably suffering from not disabling
    write barriers which can be done when the RAID controller has a write cache
    with a trustworthy battery backup. As expected, the memory usage of the cp
    command increased steadily and was soon in the gigabytes.
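
    On kernels of that era, disabling XFS write barriers meant the nobarrier
    mount option, which is safe only when the controller’s write cache has a
    reliable battery backup; the option has since been removed from the XFS
    driver. A sketch, with a hypothetical device and mount point:

    ```shell
    # Only safe with a battery- or flash-backed RAID write cache. On modern
    # kernels this option no longer exists and barriers cannot be disabled.
    mount -o remount,nobarrier /dev/sdX /mnt/new_enclosure
    ```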

    For more, visit: My experience with using cp to copy a lot of files (432 millions, 39 TB).