
How do I break a large file into smaller files?

And how can I send the part files to multiple computers through pssh?

And how do I get those files back into the client computer and reassemble them as the original file?

Zanna


You can use the `split` utility on Linux to split a file either by size or by number of lines.

  • split - split a file into pieces

          `split [OPTION]... [INPUT [PREFIX]]`
    

Explanation:

  • Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default size is 1000 lines, and default PREFIX is 'x'. With no INPUT, or when INPUT is -, read standard input.
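Here is a quick way to see these defaults. It is a throwaway demo that runs in a fresh temporary directory and assumes GNU coreutils `split` and `seq` are installed. With no INPUT argument, `split` reads the piped data from standard input and writes 1000-line pieces with the default `x` prefix.

```shell
# Throwaway demo in a fresh temporary directory.
cd "$(mktemp -d)"
seq 2500 | split          # no INPUT: reads stdin, writes xaa, xab, xac
wc -l xa?                 # 1000, 1000 and 500 lines
```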

Options:

  • -a, --suffix-length=N use suffixes of length N (default 2)
  • -b, --bytes=SIZE put SIZE bytes per output file
  • -C, --line-bytes=SIZE put at most SIZE bytes of lines per output file
  • -d, --numeric-suffixes use numeric suffixes instead of alphabetic
  • -l, --lines=NUMBER put NUMBER lines per output file
  • --verbose print a diagnostic just before each output file is opened
  • --help display this help and exit
  • --version output version information and exit

SIZE is an integer, optionally followed by one of the following units: KB 1000, K 1024, MB 1000*1000, M 1024*1024, and so on for G, T, P, E, Z, Y.
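This small demo shows the difference between the decimal and binary units. It assumes GNU coreutils `split`, since other implementations may not accept `KB`. The names `data`, `dec.` and `bin.` are made up for the example.

```shell
# Throwaway demo in a fresh temporary directory.
cd "$(mktemp -d)"
head -c 3000 /dev/zero > data      # a 3000-byte test file
split -b 1KB data dec.             # 1KB = 1000: dec.aa, dec.ab, dec.ac, 1000 bytes each
split -b 1K data bin.              # 1K = 1024: bin.aa, bin.ab 1024 bytes, bin.ac 952
wc -c dec.* bin.*
```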

Examples:

  • $ split --bytes 500M --numeric-suffixes --suffix-length=3 abc abc.

(where the input filename is abc and the last argument is the output prefix)

  • The same with short options:

$ split -b 500M -d -a 3 abc abc.

Both split commands generate pieces named abc.000, abc.001, ...
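To see the numeric-suffix naming without waiting on a large file, here is a small demo. It uses 1000-byte pieces instead of 500M and runs on a generated 2500-byte file of made-up random data.

```shell
# Throwaway demo in a fresh temporary directory.
cd "$(mktemp -d)"
head -c 2500 /dev/urandom > abc    # 2500 bytes of made-up test data
split -b 1000 -d -a 3 abc abc.     # 1000-byte pieces, 3-digit numeric suffixes
ls abc.*                           # abc.000  abc.001  abc.002
```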

To reassemble the generated pieces, you can use, e.g.:

$ cat abc.* > abc_2

(assuming that the shell sorts the results of globbing, and that the number of parts does not exceed the system-dependent limit on command-line arguments)

You can compare the result via:

$ cmp abc abc_2
$ echo $?

(which should output 0)
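The following script runs the whole split, reassemble and compare cycle on made-up random test data. `cmp -s` prints nothing and reports the result only through its exit status, which is the value `echo $?` displays above.

```shell
# Throwaway demo in a fresh temporary directory.
cd "$(mktemp -d)"
head -c 5000 /dev/urandom > abc    # made-up test data
split -b 1000 -d -a 3 abc abc.     # abc.000 ... abc.004
cat abc.* > abc_2                  # glob results are sorted, so the order is kept
if cmp -s abc abc_2; then          # -s: silent, exit status only
  echo "abc_2 is identical to abc"
else
  echo "abc_2 differs from abc"
fi
```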

Alternatively, you can use a combination of find/sort/xargs to reassemble the pieces. This also avoids the argument limit, because xargs runs cat as many times as needed:

$ find . -maxdepth 1 -type f -name 'abc.*' | sort | xargs cat > abc_3
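If the piece names might contain spaces or newlines, a NUL-delimited variant of the same pipeline is safer. This sketch assumes `find -print0`, `sort -z` and `xargs -0`, which GNU tools provide. The file name `my file` is made up and deliberately contains a space.

```shell
# Throwaway demo in a fresh temporary directory.
cd "$(mktemp -d)"
head -c 3000 /dev/urandom > 'my file'
split -b 1000 -d -a 3 'my file' 'my file.'
# NUL separators keep each name intact through sort and xargs.
find . -maxdepth 1 -type f -name 'my file.*' -print0 \
  | sort -z \
  | xargs -0 cat > reassembled
cmp -s 'my file' reassembled && echo "reassembled is identical"
```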

  • You could also do something like this, which creates files of 3000 lines each, named xaa, xab, xac, ...:

$ split -l 3000 filename
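To check what `-l 3000` produces, here is a demo on a generated 10000-line file. It assumes `seq` is available.

```shell
# Throwaway demo in a fresh temporary directory.
cd "$(mktemp -d)"
seq 10000 > filename               # a 10000-line test file
split -l 3000 filename             # xaa, xab, xac: 3000 lines each; xad: 1000
wc -l x*
```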

  • Another option is to split by the size of the output files. -C caps the size of each file but still splits only at line breaks, so no line is cut in half:

$ split -C 50m --numeric-suffixes input_filename output_prefix

This creates files like output_prefix00, output_prefix01, output_prefix02, ..., each at most 50 megabytes.
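This demo shows the line-preserving behaviour of `-C`. It uses a 1000-byte limit on a generated test file so the effect is easy to see. The check relies on `$(...)` stripping a trailing newline, so `$(tail -c 1 file)` is empty exactly when the file ends with one.

```shell
# Throwaway demo in a fresh temporary directory.
cd "$(mktemp -d)"
seq 1000 > input_filename          # 1000 short lines of test data
split -C 1000 --numeric-suffixes input_filename output_prefix
wc -c output_prefix*               # every piece is at most 1000 bytes
for f in output_prefix*; do
  # $(...) drops a trailing newline, so this is empty only if $f ends in one
  if [ -z "$(tail -c 1 "$f")" ]; then
    echo "$f ends on a line break"
  fi
done
```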

pssh:

[feddy@localhost ~]$ pscp.pssh --help
Usage: pscp.pssh [OPTIONS] local remote

Options:
  --version             show program's version number and exit
  --help                show this help message and exit
  -h HOST_FILE, --hosts=HOST_FILE
                        hosts file (each line "[user@]host[:port]")
  -H HOST_STRING, --host=HOST_STRING
                        additional host entries ("[user@]host[:port]")
  -l USER, --user=USER  username (OPTIONAL)
  -p PAR, --par=PAR     max number of parallel threads (OPTIONAL)
  -o OUTDIR, --outdir=OUTDIR
                        output directory for stdout files (OPTIONAL)
  -e ERRDIR, --errdir=ERRDIR
                        output directory for stderr files (OPTIONAL)
  -t TIMEOUT, --timeout=TIMEOUT
                        timeout (secs) (0 = no timeout) per host (OPTIONAL)
  -O OPTION, --option=OPTION
                        SSH option (OPTIONAL)
  -v, --verbose         turn on warning and diagnostic messages (OPTIONAL)
  -A, --askpass         Ask for a password (OPTIONAL)
  -x ARGS, --extra-args=ARGS
                        Extra command-line arguments, with processing for
                        spaces, quotes, and backslashes
  -X ARG, --extra-arg=ARG
                        Extra command-line argument
  -r, --recursive       recusively copy directories (OPTIONAL)

Example: pscp -h hosts.txt -l irb2 foo.txt /home/irb2/foo.txt

Example (for your case):

[feddy@localhost ~]$ touch ko001 ko002 ko003
[feddy@localhost ~]$ pscp.pssh -AH feddy@localhost ko00* Downloads/
Warning: do not enter your password if anyone else has superuser privileges or access to your account.
Password:
[1] 20:26:42 [SUCCESS] feddy@localhost
[feddy@localhost ~]$ ls Downloads/
ko001   ko002   ko003
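Note that `pscp.pssh` copies the same files to every listed host. If the goal is to give each computer only some of the pieces, a plain `scp` loop is one option. The function below is a hypothetical sketch and is not part of pssh. `distribute_parts`, `hosts.txt` and the remote `parts/` directory are made-up names. It also assumes bash, key-based SSH logins, hosts written as `[user@]host` without a `:port`, and that `parts/` already exists on each host. Setting `DRY_RUN=1` prints the `scp` commands instead of running them.

```shell
# Hypothetical sketch: give each host only some of the pieces, round-robin.
# Assumes: bash, key-based SSH, hosts as [user@]host (no :port), and an
# existing parts/ directory in each remote home directory.
distribute_parts() {  # usage: distribute_parts HOSTS_FILE PIECE...
  local hosts_file=$1 line part host i=0
  shift
  local -a hosts=()
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] && hosts+=("$line")      # skip blank lines
  done < "$hosts_file"
  [ "${#hosts[@]}" -gt 0 ] || return 1      # no hosts to send to
  for part in "$@"; do
    host=${hosts[i % ${#hosts[@]}]}         # round-robin over the hosts
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo scp "$part" "$host:parts/"       # preview only
    else
      scp "$part" "$host:parts/" || return 1
    fi
    i=$((i + 1))
  done
}
```

For example, `DRY_RUN=1 distribute_parts hosts.txt abc.*` previews the assignment. Drop `DRY_RUN=1` to copy for real. Round-robin keeps the number of pieces per host within one of each other.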

Finally, once the pieces are back on one computer, you can join them into the original file with cat or find, as described above.
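To bring scattered pieces back to the client, you can run `scp` against each host and then do the `cat` step. This is a hypothetical sketch, and `collect_parts`, `hosts.txt` and the remote `parts/` directory are made-up names. It makes several assumptions:

* bash and key-based SSH logins
* hosts are written as `[user@]host` without a `:port`
* the pieces sit in `parts/` on each host, with names unique across hosts
* every host holds at least one piece

The remote path is quoted so that the glob is expanded on the remote side rather than locally. Setting `DRY_RUN=1` prints the copy commands and reassembles only the pieces already present locally.

```shell
# Hypothetical sketch: fetch pieces from every host, then reassemble.
# Assumes: bash, key-based SSH, hosts as [user@]host (no :port), pieces in
# parts/ on the hosts with names unique across hosts.
collect_parts() {  # usage: collect_parts HOSTS_FILE PREFIX OUTPUT
  local hosts_file=$1 prefix=$2 output=$3 line host
  local -a hosts=()
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] && hosts+=("$line")      # skip blank lines
  done < "$hosts_file"
  for host in "${hosts[@]}"; do
    # Quoted, so the glob is not expanded by the local shell.
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo scp "$host:parts/$prefix*" .     # preview only
    else
      scp "$host:parts/$prefix*" . || return 1
    fi
  done
  # The local glob is sorted, so the pieces are joined in order.
  # OUTPUT must not itself start with PREFIX, or cat could include it.
  cat "$prefix"* > "$output"
}
```

For example, `collect_parts hosts.txt abc. abc_2` followed by `cmp abc abc_2` on the original machine checks the round trip.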

bsdboy