How do I break a large file into smaller files?
And how can I send the part files on multiple computers through pssh
And how do I get those files back into the client computer and reassemble them as the original file?
You can use the split utility in Linux to split a file either by size or by number of lines.
split - split a file into pieces
`split [OPTION]... [INPUT [PREFIX]]`
Explanation:
- Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default size is 1000 lines, and default PREFIX is 'x'. With no INPUT, or when INPUT is -, read standard input.
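To see these defaults in action, the following sketch feeds 2500 numbered lines to split on standard input (INPUT given as `-`) inside a throwaway temporary directory. With the default of 1000 lines per piece and the default prefix `x`, it should produce three pieces. GNU coreutils is assumed.

```shell
# Work in a scratch directory so the x* pieces don't clutter anything.
dir=$(mktemp -d)
cd "$dir" || exit 1

# 2500 lines on stdin; INPUT "-" means "read standard input".
seq 2500 | split -

# Defaults: 1000 lines per piece, prefix "x", two-letter suffixes.
for f in x*; do
    # $(( )) normalizes any padding that wc may add around the number.
    printf '%s %s\n' "$f" "$(( $(wc -l < "$f") ))"
done
```

This should print `xaa 1000`, `xab 1000` and `xac 500`.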
Options:
- -a, --suffix-length=N
use suffixes of length N (default 2)
- -b, --bytes=SIZE
put SIZE bytes per output file
- -C, --line-bytes=SIZE
put at most SIZE bytes of lines per output file
- -d, --numeric-suffixes
use numeric suffixes instead of alphabetic
- -l, --lines=NUMBER
put NUMBER lines per output file
- --verbose
print a diagnostic just before each output file is opened
- --help
display this help and exit
- --version
output version information and exit
SIZE is an integer, optionally followed by one of the following units: KB 1000, K 1024, MB 1000*1000, M 1024*1024, and so on for G, T, P, E, Z, Y.
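The difference between the decimal units (KB, MB, ...) and the binary ones (K, M, ...) is easy to check. This sketch, assuming GNU split, splits a made-up 3000-byte file once with `-b 1K` and once with `-b 1KB` in a temporary directory.

```shell
# Scratch directory for the demo files.
dir=$(mktemp -d)
cd "$dir" || exit 1

# A 3000-byte input file of zero bytes.
head -c 3000 /dev/zero > data.bin

# K = 1024 bytes: pieces of 1024, 1024 and 952 bytes (k_aa, k_ab, k_ac).
split -b 1K data.bin k_

# KB = 1000 bytes: three pieces of exactly 1000 bytes (kb_aa, kb_ab, kb_ac).
split -b 1KB data.bin kb_

wc -c k_* kb_*
```

The last `k_` piece is the 952-byte remainder, while all three `kb_` pieces are 1000 bytes.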
Examples:
$ split --bytes 500M --numeric-suffixes --suffix-length=3 abc abc.
(where the input filename is abc and the last argument is the output prefix)
- the same with short options:
$ split -b 500M -d -a 3 abc abc.
The split commands generate pieces named: abc.000, abc.001 ...
To reassemble the generated pieces, you can use, e.g.:
$ cat abc.* > abc_2
(this assumes that the shell sorts the results of glob expansion, which POSIX shells do, and that the number of parts does not exceed the system-dependent limit on command-line arguments)
You can compare the result via:
$ cmp abc abc_2
$ echo $?
(which should output 0)
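The whole split/join/compare cycle can be tried end to end on sample data. This sketch uses a hypothetical 1 MiB file of random bytes in place of the real `abc` and smaller 300 KiB pieces so that it runs quickly; GNU coreutils is assumed.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# 1 MiB of random bytes stands in for the real file "abc".
head -c 1048576 /dev/urandom > abc

# 300 KiB pieces with 3-digit numeric suffixes: abc.000 .. abc.003
split -b 300K -d -a 3 abc abc.

# The glob expands in sorted order, so the pieces are joined in sequence.
cat abc.* > abc_2

# cmp prints nothing and exits with status 0 when the files are identical.
if cmp abc abc_2; then
    echo "identical"
fi
```

Since 1 MiB is not a multiple of 300 KiB, the last piece (`abc.003`) is shorter than the others, which does not affect the join.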
Alternatively, you can combine find, sort and xargs to reassemble the pieces:
$ find . -maxdepth 1 -type f -name 'abc.*' | sort | xargs cat > abc_3
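If the piece names might contain spaces or other unusual characters, a NUL-delimited variant of the same pipeline is safer. This sketch assumes GNU find, sort and xargs (for `-print0`, `-z` and `-0`) and uses a made-up input file named `my data`; `LC_ALL=C` makes sort order the names bytewise.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# Illustrative input file with a space in its name.
seq 5000 > 'my data'
split -l 1000 -d -a 3 'my data' 'my data.'

# NUL-separated names pass through sort and xargs intact, spaces included.
find . -maxdepth 1 -type f -name 'my data.*' -print0 \
    | LC_ALL=C sort -z \
    | xargs -0 cat > 'my data_3'

cmp 'my data' 'my data_3' && echo "identical"
```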
- You could also do something like this, which creates files of 3000 lines each, named xaa, xab, xac, ...:
$ split -l 3000 filename
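As a quick check of how the line count is distributed, this sketch splits a hypothetical 10000-line file; the last piece simply receives whatever lines are left over.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# 10000 lines of sample input, standing in for "filename".
seq 10000 > filename

split -l 3000 filename

# Three full pieces plus a remainder: xaa xab xac (3000 each), xad (1000).
wc -l xa*
```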
- Another option is to split by the size of the output files (this still splits only at line breaks):
$ split -C 50M --numeric-suffixes input_filename output_prefix
This creates files like output_prefix00, output_prefix01, output_prefix02, ..., each at most 50 MiB (50*1024*1024 bytes).
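To confirm that -C never cuts a line in half, this sketch splits about 590 KB of sample lines into pieces of at most 100 KiB and checks that each piece ends on a newline. The sizes are illustrative, and GNU split is assumed.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# 100000 short lines (roughly 590 KB) as sample input.
seq 100000 > input_filename

# At most 100 KiB per piece, without cutting any line in half.
split -C 100K --numeric-suffixes input_filename output_prefix

for f in output_prefix*; do
    # $(...) strips a trailing newline, so an empty result means the
    # piece ends exactly on a line break.
    if [ -z "$(tail -c 1 "$f")" ]; then
        echo "$f: $(wc -c < "$f") bytes, ends on a line break"
    else
        echo "$f: cut in the middle of a line"
    fi
done
```

Each piece is usually a few bytes under the limit, because split stops before a line that would no longer fit.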
pssh:
[feddy@localhost ~]$ pscp.pssh --help
Usage: pscp.pssh [OPTIONS] local remote
Options:
--version show program's version number and exit
--help show this help message and exit
-h HOST_FILE, --hosts=HOST_FILE
hosts file (each line "[user@]host[:port]")
-H HOST_STRING, --host=HOST_STRING
additional host entries ("[user@]host[:port]")
-l USER, --user=USER username (OPTIONAL)
-p PAR, --par=PAR max number of parallel threads (OPTIONAL)
-o OUTDIR, --outdir=OUTDIR
output directory for stdout files (OPTIONAL)
-e ERRDIR, --errdir=ERRDIR
output directory for stderr files (OPTIONAL)
-t TIMEOUT, --timeout=TIMEOUT
timeout (secs) (0 = no timeout) per host (OPTIONAL)
-O OPTION, --option=OPTION
SSH option (OPTIONAL)
-v, --verbose turn on warning and diagnostic messages (OPTIONAL)
-A, --askpass Ask for a password (OPTIONAL)
-x ARGS, --extra-args=ARGS
Extra command-line arguments, with processing for
spaces, quotes, and backslashes
-X ARG, --extra-arg=ARG
Extra command-line argument
-r, --recursive recusively copy directories (OPTIONAL)
Example: pscp -h hosts.txt -l irb2 foo.txt /home/irb2/foo.txt
Example (for your case):
[feddy@localhost ~]$ touch ko001 ko002 ko003
[feddy@localhost ~]$ pscp.pssh -AH feddy@localhost ko00* Downloads/
Warning: do not enter your password if anyone else has superuser privileges or access to your account.
Password:
[1] 20:26:42 [SUCCESS] feddy@localhost
[feddy@localhost ~]$ ls Downloads/
ko001 ko002 ko003
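With several machines, one way to spread the parts is to send each part to the next host in turn. The sketch below assumes hypothetical hosts `node1` to `node3`, the user `feddy`, the remote directory `Downloads/` and key-based SSH login (add `-A`, as in the example above, to be prompted for a password). It uses only the `-l` and `-H` options from the help output above and, by default, only prints the pscp.pssh commands; set `DRY_RUN=0` to actually run them.

```shell
# Hypothetical target machines and user: replace them with your own.
hosts="node1 node2 node3"
user=feddy
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

dir=$(mktemp -d)
cd "$dir" || exit 1
touch ko001 ko002 ko003 ko004 ko005   # stand-ins for the real pieces

set -- $hosts   # unquoted on purpose: one positional parameter per host
: > parts.map   # records which host received which piece
for part in ko0*; do
    host=$1
    run pscp.pssh -l "$user" -H "$host" "$part" Downloads/
    echo "$part $host" >> parts.map
    # Rotate the host list so the next piece goes to the next host.
    set -- "$@" "$1"
    shift
done
```

In dry-run mode this prints one command per piece, cycling node1, node2, node3, node1, node2. The `parts.map` file records where each piece went, which tells you where to fetch each one from later.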
Finally, once the parts are back on the client computer, you can use cat or find to join them into the original file (as already described above).
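Because the pieces travel through several machines, it can be worth recording a checksum before splitting and verifying it after joining. This sketch, assuming GNU coreutils and illustrative file names (`abc.sha256`, `abc.parts`), saves a SHA-256 sum and the ordered list of pieces, checks that every piece is present, joins them in the recorded order and verifies the result with `sha256sum -c`.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
head -c 500000 /dev/urandom > abc   # sample input

# Before distributing: record the checksum and the ordered list of pieces.
sha256sum abc > abc.sha256
split -b 100K -d -a 3 abc abc.
printf '%s\n' abc.[0-9][0-9][0-9] > abc.parts

# ... pieces are copied out and fetched back here ...
rm abc   # simulate having only the pieces left

# After fetching: make sure no piece is missing before joining.
missing=0
while IFS= read -r part; do
    [ -f "$part" ] || { echo "missing: $part"; missing=1; }
done < abc.parts

if [ "$missing" -eq 0 ]; then
    # Join in the recorded order, then verify against the saved checksum.
    xargs cat < abc.parts > abc
    sha256sum -c abc.sha256
fi
```

If everything arrived intact, `sha256sum -c` prints `abc: OK`.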