Given a file, I need to create a copy that is padded with zeros to a specific size.

Create a file as follows:

echo test >testfile

The output of the following command is inconsistent.

cat testfile /dev/zero | dd bs=256k count=1 status=none | od -c

This is the output that I would expect.

0000000   t   e   s   t  \n  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000020  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
1000000

But at random you also get either of the following two outputs.

0000000   t   e   s   t  \n
0000005

0000000   t   e   s   t  \n  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000020  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0400000  \0  \0  \0  \0  \0
0400005

Why does this command have inconsistent behavior?

Even if dd is cutting the pipe off at the end of the first file, the 128k result is strange. I get the same inconsistent results on 16.04, 18.04 and 19.04 systems.

John JJ

2 Answers

You need to specify full blocks. Try:

cat testfile /dev/zero | dd bs=256k iflag=fullblock count=1 status=none | od -c

Documentation

From man dd:

fullblock
        accumulate full blocks of input (iflag only)

Example

Observe that, without fullblock, the byte counts are inconsistent:

$ cat testfile /dev/zero | dd bs=256k count=1 status=none | wc -c
5
$ cat testfile /dev/zero | dd bs=256k count=1 status=none | wc -c
262144
$ cat testfile /dev/zero | dd bs=256k count=1 status=none | wc -c
262144
$ cat testfile /dev/zero | dd bs=256k count=1 status=none | wc -c
5

With iflag=fullblock, I see consistent full byte counts:

$ cat testfile /dev/zero | dd bs=256k iflag=fullblock count=1 status=none | wc -c
262144
$ cat testfile /dev/zero | dd bs=256k iflag=fullblock count=1 status=none | wc -c
262144
$ cat testfile /dev/zero | dd bs=256k iflag=fullblock count=1 status=none | wc -c
262144
$ cat testfile /dev/zero | dd bs=256k iflag=fullblock count=1 status=none | wc -c
262144
$ cat testfile /dev/zero | dd bs=256k iflag=fullblock count=1 status=none | wc -c
262144
$ cat testfile /dev/zero | dd bs=256k iflag=fullblock count=1 status=none | wc -c
262144
$ cat testfile /dev/zero | dd bs=256k iflag=fullblock count=1 status=none | wc -c
262144
$ cat testfile /dev/zero | dd bs=256k iflag=fullblock count=1 status=none | wc -c
262144
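
To tie this back to the original requirement (a padded copy written to a file), here is a minimal sketch. The function name pad_copy_gnu and its argument order are hypothetical, chosen only for illustration, and it assumes GNU dd, since iflag=fullblock and status=none are GNU extensions. If the source file is larger than the requested size, the copy is truncated rather than padded.

```shell
#!/bin/sh
# pad_copy_gnu SRC DST SIZE  (hypothetical helper, not part of any tool)
# Writes SRC followed by zero bytes to DST, SIZE bytes in total.
# Assumes GNU dd: iflag=fullblock and status=none are GNU extensions.
pad_copy_gnu() {
    # One block of SIZE bytes; fullblock makes dd keep reading from the
    # pipe until the block is complete instead of stopping at a short read.
    cat "$1" /dev/zero | dd of="$2" bs="$3" count=1 iflag=fullblock status=none
}
```

For example, pad_copy_gnu testfile padded 256k should leave a 262144-byte file named padded. Note that dd allocates a buffer of the full block size, so very large sizes are better split into a smaller bs and a larger count.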
John1024
  • Thanks. Looking at the history of dd+fullblock and other related posts, this makes sense. I would say this should at least generate a warning or something for a partial block read and not just produce inconsistent results. I guess it's all in the timing. – John JJ May 30 '19 at 13:01
  • @JohnJJ I agree: dd's current default behavior and its lack of warnings are awful. – John1024 May 30 '19 at 18:21
  • GNU dd did throw errors about partial reads in my tests, but inconsistently. The fullblock solution is proper, hence +1, and in fact it was mentioned in the error messages I got. – Sergiy Kolodyazhnyy Dec 09 '19 at 07:28

The core of the issue is two-fold. One part of the problem is a short, or partial, read(). Per the POSIX specification:

A partial input block is one for which read() returned less than the input block size.

This is typical with pipes, and it is exactly what is happening in the question. One solution is to use the GNU extension iflag=fullblock, since GNU dd is the version Ubuntu uses. From the GNU dd manual:

Note if the input may return short reads as could be the case when reading from a pipe for example, ‘iflag=fullblock’ will ensure that ‘count=’ corresponds to complete input blocks rather than the traditional POSIX specified behavior of counting input read operations.

POSIX dd, MirOS dd and FreeBSD dd do not have such an option (although there have been requests to add it to the POSIX spec). So how do we write portable dd scripts that you may want to port from Ubuntu to, say, FreeBSD? The other part of the issue is the count=1 flag: it tells dd how many read() calls to perform, not how many full blocks to copy. Run dd if=/dev/urandom | strace -e read dd of=/dev/null bs=256k count=1 several times and you will see that there is always only one read() on the input, and that it is often partial. (Also, don't be surprised if you see 262144 bytes read instead of 256,000: 256k means 256*1024 = 262144.)
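
If strace is not available, the effect of count=1 counting read() calls can also be observed from the output sizes alone. The sketch below is only an illustration: tally_sizes is a hypothetical helper name, printf stands in for the testfile from the question, and 2>/dev/null is used in place of GNU's status=none. The counts it prints vary from run to run and from machine to machine, which is the point.

```shell
#!/bin/sh
# tally_sizes N  (hypothetical helper)
# Runs a pipeline like the one in the question N times and counts how
# often each output size occurs.
tally_sizes() {
    i=0
    while [ "$i" -lt "$1" ]; do
        # count=1 means one read(); whatever that single read returns
        # (5 bytes, a partial block, or a full block) is all dd copies.
        printf 'test\n' | cat - /dev/zero | dd bs=256k count=1 2>/dev/null | wc -c
        i=$((i + 1))
    done | sort -n | uniq -c
}

tally_sizes 20
```

Each output line is a count followed by a byte size; more than one distinct size means some runs hit a short read.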

The solution is to flip the parameters, that is, make the block size bs=1 and the count 256k (count=256k). That way there are no partial reads: we always read 1 byte, but we do that 256k times. And yes, this is a lot slower and will take far longer with data in the range of gigabytes or terabytes. In my tests, iflag=fullblock was about 100 times faster (5 milliseconds versus 700 milliseconds for the 256k bytes). The advantage is that this is portable and does not rely on a GNU dd extension, which matters because you cannot always install GNU dd.
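
A minimal sketch of the flipped-parameter approach follows. The helper name pad_to_size is hypothetical. Two choices in it are assumptions made for portability rather than anything stated above: the size is passed as a plain byte count (262144 rather than 256k), and dd's statistics are silenced with 2>/dev/null instead of the GNU-only status=none.

```shell
#!/bin/sh
# pad_to_size SRC SIZE  (hypothetical helper)
# Writes SRC followed by zero bytes to standard output, SIZE bytes in total.
pad_to_size() {
    # bs=1: every read() asks for a single byte, so it cannot come up short;
    # count=SIZE then corresponds to exactly SIZE bytes.
    cat "$1" /dev/zero | dd bs=1 count="$2" 2>/dev/null
}
```

For example, pad_to_size testfile 262144 > padded should produce the same 262144-byte result as the fullblock version, only more slowly. As with any count-limited copy, a source file larger than SIZE is truncated.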

Sergiy Kolodyazhnyy