
A file looks like:

1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0 0 0 0 0 0 0 0 0 
1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0 0 0 0 0 0 0 0 0 
1663.276205 0.003379552854 0.04643209167 0.0539399155 0 0 0 0 0 0 0 0 0 0 0 0 

Can I use some text processing tool to split it into two files such as:

1:

1140.271257 0.002288454025 0.002763420728 0.004142512599
1479.704769 0.00146621631 0.003190634646 0.003672029231
1663.276205 0.003379552854 0.04643209167 0.0539399155

2:

0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 

Just get the first numbers, the ones which are not 0, and put the rest in another file. It would be cool if the new files could be named like the original file with an x1 and x2 suffix or so.

terdon

6 Answers


With awk. The command below checks every field on every line and writes it to one of two files, out1 and out2 in my example: non-zero fields go to out1, zero fields go to out2. At the end of each input line, a newline is written to both output files.

awk '{for(i=1;i<=NF;i++) {if($i!=0) {printf "%s ",$i > "out1"} else {printf "%s ",$i > "out2"}; if (i==NF) {printf "\n" > "out1"; printf "\n" > "out2"} }}' foo

Example

The input file

cat foo

1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0 0 0 0 0 0 0 0 0 
1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0 0 0 0 0 0 0 0 0 
1663.276205 0.003379552854 0.04643209167 0.0539399155 0 0 0 0 0 0 0 0 0 0 0 0

The command

awk '{for(i=1;i<=NF;i++) {if($i!=0) {printf "%s ",$i > "out1"} else {printf "%s ",$i > "out2"}; if (i==NF) {printf "\n" > "out1"; printf "\n" > "out2"} }}' foo

The output files

cat out1

1140.271257 0.002288454025 0.002763420728 0.004142512599 
1479.704769 0.00146621631 0.003190634646 0.003672029231 
1663.276205 0.003379552854 0.04643209167 0.0539399155 

cat out2

0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0
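If you want the output names derived from the input name, as the question asks, the same per-field test can build them from awk's FILENAME variable. This is only a sketch: the foo.x1/foo.x2 naming is an assumption, the sample input is the question's data with trailing blanks dropped, and the newline is printed once per line after the loop instead of inside it.

```shell
# Sample input: the three lines from the question (trailing blanks dropped)
printf '%s\n' \
  '1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0 0 0 0 0 0 0 0 0' \
  '1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0 0 0 0 0 0 0 0 0' \
  '1663.276205 0.003379552854 0.04643209167 0.0539399155 0 0 0 0 0 0 0 0 0 0 0 0' > foo

# Same test as above; hypothetical naming scheme: foo -> foo.x1 and foo.x2
awk '{
  for (i = 1; i <= NF; i++) {
    if ($i != 0) { printf "%s ", $i > (FILENAME ".x1") }   # non-zero field
    else         { printf "%s ", $i > (FILENAME ".x2") }   # zero field
  }
  print "" > (FILENAME ".x1")   # end the current line in both files
  print "" > (FILENAME ".x2")
}' foo
```

The parentheses around FILENAME ".x1" matter: without them some awks cannot tell where the redirection target ends.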
A.B.

You can indeed use a text processing tool for this, but if the goal is just to separate the first 4 fields from the ones following them, cut is enough:

 cut -d ' ' -f 1-4 infile > outfile1
 cut -d ' ' -f 5- infile > outfile2
user@debian ~/tmp % cat infile
1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0 0 0 0 0 0 0 0 0 
1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0 0 0 0 0 0 0 0 0 
1663.276205 0.003379552854 0.04643209167 0.0539399155 0 0 0 0 0 0 0 0 0 0 0 0 
user@debian ~/tmp % cut -d ' ' -f 1-4 infile
1140.271257 0.002288454025 0.002763420728 0.004142512599
1479.704769 0.00146621631 0.003190634646 0.003672029231
1663.276205 0.003379552854 0.04643209167 0.0539399155
user@debian ~/tmp % cut -d ' ' -f 5- infile 
0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 
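cut treats every single space as a delimiter, so if the columns were ever separated by several spaces or by tabs, the field numbers would shift. A sketch of one way around that, assuming such uneven blanks (the input below is a hypothetical variant of the question's first line): squeeze the blanks with tr before cutting, and name the outputs after the input file (infile.x1 and infile.x2 are assumed names).

```shell
# Hypothetical input: double spaces and a tab between some fields
printf '1140.271257  0.002288454025\t0.002763420728 0.004142512599  0 0 0\n' > infile

# Map tabs to spaces and squeeze runs of spaces into one, then cut as above
tr -s ' \t' ' ' < infile | cut -d ' ' -f 1-4 > infile.x1
tr -s ' \t' ' ' < infile | cut -d ' ' -f 5- > infile.x2
```

This still assumes the lines do not start with a blank; a leading space would turn into an empty first field.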
kos
  • There's no need for all these quotes: cut -d' ' -f5- file and cut -d' ' -f1-4 file work perfectly well. – terdon Sep 09 '15 at 11:33
  • @terdon Indeed. Initially I quoted everything as a mere (and arguable) stylistic choice (though I forgot that cut takes filenames as arguments). However, looking at it again, I agree it doesn't look good, and after all I prefer the minimalistic approach too :). Changed. (I like that Perl stdout / stderr trick, by the way) – kos Sep 09 '15 at 12:02

I would recommend using perl for this. Save your input in input.txt and run the following command:

perl -ane 'foreach (@F) {   # -a has already split each line into the array @F
  if ($_ == 0) {            # print the element to STDOUT if it is "0"
    print $_, " "
  }
  else {                    # print the element to STDERR if it is not "0"
    print STDERR $_, " "
  }
}
print "\n"; print STDERR "\n";' input.txt > x2.txt 2> x1.txt   # newline at the end; STDOUT goes to x2.txt, STDERR to x1.txt

Here it is as a one-liner to copy and paste:

perl -ane 'foreach(@F){if($_ == 0){print $_," "}else{print STDERR $_," "}};print "\n"; print STDERR "\n";' input.txt > x2.txt 2> x1.txt
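Routing data through STDERR means any warning or error Perl prints would end up in x1.txt as well. A sketch of an alternative that opens both output files itself, named after the input file; the .x1/.x2 suffixes and the use of grep/join to filter the fields are my assumptions, not part of the answer above.

```shell
# Sample input: the question's lines (trailing blanks dropped)
printf '%s\n' \
  '1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0 0 0 0 0 0 0 0 0' \
  '1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0 0 0 0 0 0 0 0 0' \
  '1663.276205 0.003379552854 0.04643209167 0.0539399155 0 0 0 0 0 0 0 0 0 0 0 0' > input.txt

perl -lane '
  BEGIN {
    # open both outputs once, named after the first input file (hypothetical naming)
    open($nz, ">", "$ARGV[0].x1") or die "x1: $!";
    open($z,  ">", "$ARGV[0].x2") or die "x2: $!";
  }
  print {$nz} join(" ", grep { $_ != 0 } @F);   # non-zero fields
  print {$z}  join(" ", grep { $_ == 0 } @F);   # zero fields
' input.txt
```

With -l, every print (including to these filehandles) gets a trailing newline, and join leaves no trailing space.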
Wayne_Yux

Just get the first numbers, which are not 0, and then just put the rest in another file

In that case you can use grep with Perl Compatible Regular Expressions (-P):

  • To get the first numbers that are not zero:

    $ grep -Po '^.*\s\d+\.\d+(?=\s0\s.*)' file.txt 
    1140.271257 0.002288454025 0.002763420728 0.004142512599
    1479.704769 0.00146621631 0.003190634646 0.003672029231
    1663.276205 0.003379552854 0.04643209167 0.0539399155
    
    • ^.*\s\d+\.\d+ matches our desired portion, i.e. the line up to and including the last non-zero number

    • (?=\s0\s.*) is a zero-width positive lookahead ensuring that the run of zeros starts right after our desired portion

    To save it as filex1.txt:

    grep -Po '^.*\s\d+\.\d+(?=\s0\s.*)' file.txt >filex1.txt
    
  • To get the rest, i.e. the zeros:

    $ grep -Po '\s\d+\.\d+\s\K0\s.*' file.txt 
    0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 0
    
    • \s\d+\.\d+\s makes sure that a non-zero entry comes right before our desired portion, and \K discards everything matched so far from the output

    • 0\s.* gets us the desired portion, i.e. the zero entries starting from the first one

    To save it as filex2.txt:

    grep -Po '\s\d+\.\d+\s\K0\s.*' file.txt >filex2.txt
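To get the naming from the question for several files at once, the two commands above could be wrapped in a small function. This is a sketch: split_file is a hypothetical name, it assumes the inputs end in .txt, and like the commands above it needs a grep built with -P support (GNU grep).

```shell
# Sample input: the question's lines (trailing blanks dropped)
printf '%s\n' \
  '1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0 0 0 0 0 0 0 0 0' \
  '1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0 0 0 0 0 0 0 0 0' \
  '1663.276205 0.003379552854 0.04643209167 0.0539399155 0 0 0 0 0 0 0 0 0 0 0 0' > file.txt

# Hypothetical helper: for each NAME.txt argument, write NAMEx1.txt and NAMEx2.txt
split_file() {
  for f in "$@"; do
    base=${f%.txt}
    grep -Po '^.*\s\d+\.\d+(?=\s0\s.*)' "$f" > "${base}x1.txt"
    grep -Po '\s\d+\.\d+\s\K0\s.*' "$f" > "${base}x2.txt"
  done
}

split_file file.txt
```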
    
heemayl

Another approach using Perl:

perl -lne '/(.*?)\s(0\s.*)/; print "$1"; print STDERR "$2"' file > filex1 2> filex2

The regular expression will match everything up to the 1st 0 surrounded by whitespace and then everything from that 0 to the end of the line. The parentheses capture those two groups as $1 and $2 respectively. The -l turns on automatic trailing newline removal (chomp) and adds a \n to each print call. So, we print $1 to standard output and $2 to standard error and then redirect each to a different file.

Since this is Perl, there's more than one way to do it. This is the same idea as Wayne_Yux's answer but simplified:

perl -lane '@A=grep{$_==0}@F; @B=grep{$_!=0}@F;print STDERR "@A"; print "@B"' file > filex1 2>filex2

Alternatively, a simpler grep -P:

grep -oP '^.+?(?=\s0\s)' file > filex1
grep -oP ' \K0 .*' file > filex2
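A quick way to check that a split like this lost nothing is to glue the two halves back together with paste and compare the result with the original. This is a sketch for the sample data, assuming fields separated by single spaces.

```shell
# Sample input: the question's lines (trailing blanks dropped)
printf '%s\n' \
  '1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0 0 0 0 0 0 0 0 0' \
  '1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0 0 0 0 0 0 0 0 0' \
  '1663.276205 0.003379552854 0.04643209167 0.0539399155 0 0 0 0 0 0 0 0 0 0 0 0' > file

grep -oP '^.+?(?=\s0\s)' file > filex1
grep -oP ' \K0 .*' file > filex2

# Rejoin line by line with a single space and compare with the original
if paste -d ' ' filex1 filex2 | cmp -s - file; then
  echo "round trip OK"
fi
```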
terdon

Assuming that once you get a 0 all the remaining fields are 0 as well, you can say:

awk -v FS=" 0 " '{print $1 > "f1"; gsub($1 " ",""); print > "f2"}' file

This sets the field separator to the string " 0 " (a 0 surrounded by spaces) and prints the first field (that is, everything up to the first such 0) into the file f1. Then it removes this first field and the space after it from the line and prints the result into the file f2.
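gsub() interprets $1 as a regular expression, so the dots in the numbers match any character, and a hypothetical field such as 1e+05 would not match itself. A sketch that avoids the regex entirely by cutting the line by position with substr(), and names the outputs after the input file (file.x1 and file.x2 are assumed names):

```shell
# Sample input: the question's lines (trailing blanks dropped)
printf '%s\n' \
  '1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0 0 0 0 0 0 0 0 0' \
  '1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0 0 0 0 0 0 0 0 0' \
  '1663.276205 0.003379552854 0.04643209167 0.0539399155 0 0 0 0 0 0 0 0 0 0 0 0' > file

awk -v FS=" 0 " '{
  print $1 > (FILENAME ".x1")                           # everything before the first " 0 "
  print substr($0, length($1) + 2) > (FILENAME ".x2")   # skip $1 and the space after it
}' file
```

length($1) + 1 is the space that starts the separator, so the substring beginning at length($1) + 2 starts at the first 0.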

fedorqui