Parsing a file using text processing tools

Question

A file looks like:

1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0 0 0 0 0 0 0 0 0 
1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0 0 0 0 0 0 0 0 0 
1663.276205 0.003379552854 0.04643209167 0.0539399155 0 0 0 0 0 0 0 0 0 0 0 0

Can I use some text processing tool to split it into two files such as:

1:

1140.271257 0.002288454025 0.002763420728 0.004142512599
1479.704769 0.00146621631 0.003190634646 0.003672029231
1663.276205 0.003379552854 0.04643209167 0.0539399155

2:

0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0

Just get the first numbers, which are not 0, and then just put the rest in another file. It would be cool if the output files could be named like the original file with x1 and x2 added, or something similar.

The count of 0s in the last line of your input file doesn't match the count of 0s in your output — A.B., Sep 09 '15 at 09:25

A.B. · Answer 1 · 2015-09-09T15:40:26.367

With awk. The command below checks every field on every line and writes non-zero fields to one file and zero fields to another (out1 and out2 in my example). At the end of each input line, a newline is written to both output files, so each file has one line per input line.

awk '{for(i=1;i<=NF;i++) {if($i!=0) {printf "%s ",$i > "out1"} else {printf "%s ",$i > "out2"}; if (i==NF) {printf "\n" > "out1"; printf "\n" > "out2"} }}' foo
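A sketch of the same logic spread over several lines, assuming the question's sample data is saved as foo. Instead of printing each field followed by a space, it collects the fields of each line into two strings and prints each string once, so the output lines get no trailing space. Like the one-liner, it sends every non-zero field to out1, wherever it appears on the line.

```shell
# The question's sample data, saved as foo
printf '%s\n' \
  '1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0 0 0 0 0 0 0 0 0' \
  '1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0 0 0 0 0 0 0 0 0' \
  '1663.276205 0.003379552854 0.04643209167 0.0539399155 0 0 0 0 0 0 0 0 0 0 0 0' > foo

awk '{
  nz = ""; z = ""                                    # reset the per-line buffers
  for (i = 1; i <= NF; i++) {
    if ($i != 0) nz = nz (nz == "" ? "" : " ") $i   # collect non-zero fields
    else         z  = z  (z  == "" ? "" : " ") $i   # collect zero fields
  }
  print nz > "out1"                                  # one line per input line,
  print z  > "out2"                                  # joined by single spaces
}' foo
```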

Example

The input file

cat foo

1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0 0 0 0 0 0 0 0 0 
1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0 0 0 0 0 0 0 0 0 
1663.276205 0.003379552854 0.04643209167 0.0539399155 0 0 0 0 0 0 0 0 0 0 0 0

The command

awk '{for(i=1;i<=NF;i++) {if($i!=0) {printf "%s ",$i > "out1"} else {printf "%s ",$i > "out2"}; if (i==NF) {printf "\n" > "out1"; printf "\n" > "out2"} }}' foo

The output files

cat out1

1140.271257 0.002288454025 0.002763420728 0.004142512599 
1479.704769 0.00146621631 0.003190634646 0.003672029231 
1663.276205 0.003379552854 0.04643209167 0.0539399155

cat out2

0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0

@heineman: As you're a reputation 6 user: If this answer helped you, don't forget to click the grey ☑ at the left of this text, which means Yes, this answer is valid! ;-) — Fabby, Sep 09 '15 at 19:17

kos · Answer 2 · score 3 · 2015-09-09T11:58:52.280

You can indeed use a text-processing tool for this, but if the goal is just to separate the first 4 fields from the ones that follow them, cut is enough:

 cut -d ' ' -f 1-4 infile > outfile1
 cut -d ' ' -f 5- infile > outfile2

user@debian ~/tmp % cat infile
1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0 0 0 0 0 0 0 0 0 
1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0 0 0 0 0 0 0 0 0 
1663.276205 0.003379552854 0.04643209167 0.0539399155 0 0 0 0 0 0 0 0 0 0 0 0 
user@debian ~/tmp % cut -d ' ' -f 1-4 infile
1140.271257 0.002288454025 0.002763420728 0.004142512599
1479.704769 0.00146621631 0.003190634646 0.003672029231
1663.276205 0.003379552854 0.04643209167 0.0539399155
user@debian ~/tmp % cut -d ' ' -f 5- infile 
0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0
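cut -d ' ' treats every single space as a delimiter, so if a file separates some columns with more than one space (a hypothetical case; the sample uses single spaces), the field numbers shift. A minimal sketch that squeezes runs of spaces with tr -s first, assuming no line starts with a space (a leading space would become an empty field 1):

```shell
# Hypothetical input with irregular spacing between the columns
printf '1140.271257  0.002288454025 0.002763420728   0.004142512599 0 0 0\n' > infile

# tr -s ' ' collapses each run of spaces into one, so cut sees one delimiter per gap
tr -s ' ' < infile | cut -d ' ' -f 1-4 > outfile1
tr -s ' ' < infile | cut -d ' ' -f 5- > outfile2
```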

edited Sep 09 '15 at 11:58

answered Sep 09 '15 at 09:13

There's no need for all these quotes: cut -d' ' -f5- file and cut -d' ' -f1-4 file work perfectly well. – terdon Sep 09 '15 at 11:33
@terdon Indeed. Initially I quoted everything purely as a debatable stylistic choice (though I forgot that cut takes filenames as arguments). However, looking at it again... I agree it doesn't look good, and after all I prefer the minimalistic approach too :). Changed. (I like that Perl stdout / stderr trick, by the way) – kos Sep 09 '15 at 12:02

score 2 · Answer 3 · answered Sep 09 '15 at 09:10

I would recommend using Perl for this. Save your input in input.txt and run the following command:

perl -ane 'foreach(@F){   #-a has split the line into @F; loop over its fields
  chomp; #remove any trailing newline
  if($_ == 0){   #print the element to STDOUT if it is "0"
    print $_," "
  }
  else{     #print the element to STDERR if it is not "0"
    print STDERR $_," "
  }
};
print "\n"; print STDERR "\n";   #end the current line on both streams
' input.txt > x2.txt 2> x1.txt    #redirect STDOUT to x2.txt and STDERR to x1.txt

Here it is as a one-liner to copy and paste:

perl -ane 'foreach(@F){chomp;if($_ == 0){print $_," "}else{print STDERR $_," "}};print "\n"; print STDERR "\n";' input.txt > x2.txt 2> x1.txt

score 2 · Answer 4 · answered Sep 09 '15 at 09:58

Just get the first numbers, which are not 0, and then just put the rest in another file

In that case you can use grep with Perl-compatible regular expressions (-P):

To get the first numbers that are not zero:

$ grep -Po '^.*\s\d+\.\d+(?=\s0\s.*)' file.txt 
1140.271257 0.002288454025 0.002763420728 0.004142512599
1479.704769 0.00146621631 0.003190634646 0.003672029231
1663.276205 0.003379552854 0.04643209167 0.0539399155

- ^.*\s\d+\.\d+ matches the desired portion: everything up to and including the last decimal number before the zeros
- (?=\s0\s.*) is a zero-width positive lookahead that ensures the run of zeros starts right after the desired portion

To save it as filex1.txt:

grep -Po '^.*\s\d+\.\d+(?=\s0\s.*)' file.txt >filex1.txt

To get the rest, i.e. the zeros:
```
$ grep -Po '\s\d+\.\d+\s\K0\s.*' file.txt 
0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0
```
- \s\d+\.\d+\s makes sure there is a non-zero (decimal) entry right before the desired portion; \K then drops everything matched so far from the reported match
- 0\s.* matches the desired portion, i.e. the zero entries starting from the first one

To save it as filex2.txt:
```
grep -Po '\s\d+\.\d+\s\K0\s.*' file.txt >filex2.txt
```
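One caveat: grep -o prints nothing for a line that does not match, so filex1.txt and filex2.txt can silently end up with fewer lines than the input. A sketch of a line-count check, using a hypothetical sample whose last line has no run of zeros. It assumes a grep that supports -P, such as GNU grep.

```shell
# Hypothetical sample: the last line has no run of zeros
printf '%s\n' \
  '1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0' \
  '1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0' \
  '42.5 1.5' > file.txt

grep -Po '^.*\s\d+\.\d+(?=\s0\s.*)' file.txt > filex1.txt
grep -Po '\s\d+\.\d+\s\K0\s.*' file.txt > filex2.txt

# grep -o skips non-matching lines, so compare each output with the input line count
total=$(grep -c '' file.txt)
for out in filex1.txt filex2.txt; do
  n=$(grep -c '' "$out")
  if [ "$n" -ne "$total" ]; then
    echo "warning: $out has $n lines but file.txt has $total" >&2
  fi
done
```

With this sample, both output files have 2 lines while file.txt has 3, so the loop prints a warning for each.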

terdon · Answer 5 · 2015-09-09T12:16:51.283

Another approach using Perl:

perl -lne '/(.*?)\s(0\s.*)/; print "$1"; print STDERR "$2"' file > filex1 2> filex2

The regular expression will match everything before the first 0 that is surrounded by whitespace, and then everything from that 0 to the end of the line. The parentheses capture those two groups as $1 and $2 respectively. The -l switch turns on automatic trailing newline removal (chomp) and adds a \n to each print call. So, we print $1 to standard output and $2 to standard error and then redirect each to a different file.

Since this is Perl, there's more than one way to do it. This is the same idea as Wayne_Yux's answer but simplified:

perl -lane '@A=grep{$_==0}@F; @B=grep{$_!=0}@F;print STDERR "@A"; print "@B"' file > filex1 2>filex2

Alternatively, a simpler grep -P:

grep -oP '^.+?(?=\s0\s)' file > filex1
grep -oP ' \K0 .*' file > filex2

score 0 · Answer 6 · answered Sep 09 '15 at 13:07

Assuming that once you hit a 0, all the remaining fields are 0 too, you can say:

awk -v FS=" 0 " '{print $1 > "f1"; gsub($1 " ",""); print > "f2"}' file

This sets the field separator to the string " 0 " (a 0 surrounded by spaces) and prints the first field (that is, everything up to the first 0) into the file f1. Then it removes this first field, together with the space after it, from the line and prints the result into the file f2.
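gsub interprets its first argument as a regular expression, so the dots in $1 match any character. That is usually harmless here, but a literal string search sidesteps it. A sketch using index and substr instead, under the same assumption that the zeros start at the first " 0 " (a 0 surrounded by spaces). The sample lines are shortened versions of the question's data.

```shell
# Hypothetical, shortened sample input
printf '%s\n' \
  '1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0' \
  '1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0' > file

awk '{
  p = index($0, " 0 ")                  # position of the first " 0 ", or 0 if absent
  if (p) {
    print substr($0, 1, p - 1) > "f1"   # everything before the first zero
    print substr($0, p + 1)    > "f2"   # the first zero and everything after it
  } else {
    print $0 > "f1"                     # no zero run: whole line goes to f1
    print ""  > "f2"                    # empty line keeps f1 and f2 aligned
  }
}' file
```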
