
I have the following code that works fine:

$ awk 'FILENAME==ARGV[1] {a[FNR]=$0} FILENAME==ARGV[2] {print a[FNR],FS,$0}' tab1 tab2

(tab1 contains the uppercase entries.) The output is:

3A  3B  3C  3D   1a  1b  1c  1d
3A  3B  3C  3D   2a  2b  2c  2d
3A  3B  3C  3D   3a  3b  3c  3d

This method stores all of tab1 in the array a[], which could become very large for the files I want to use. Is there a way to avoid the array? I just want to read the first line of tab1 and the first line of tab2, process and print them, and then move on to the next line of each file.

The number of input files could be as many as five.

steeldriver
  • 3
    You appear to be re-inventing the paste utility... – steeldriver Aug 12 '19 at 16:19
  • 1
    @steeldriver Even though the question is asking about awk, it seems to me that an answer about paste would be appropriate. Would you be willing to post one? – Eliah Kagan Aug 12 '19 at 16:52
  • 1
    I'd forgotten about paste. I might be able to use paste to create an intermediate file and then use awk to post-process the lines. Thanks for that, steeldriver –  Aug 12 '19 at 19:21

1 Answer


Yes, you can read from multiple files at the same time using awk.

In order to do that, you should use the getline command to explicitly control input.

In particular, you want to use getline with a file so that you're not reading from the file(s) passed in as main arguments to awk.

One possibility, in your case:

$ awk '{
      getline line < "tab2"
      print $0, FS, line
  }' tab1

This does no real error handling, and as you can see, the filename is now hardcoded in the awk script. You can fix both issues (for example, by passing the filename in with -v and checking the return value of getline), at the cost of making the script much uglier.
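To give an idea of what that could look like with up to five inputs, here is a minimal sketch rather than a definitive implementation. The function name merge_lines and the variable names are made up for illustration. It assumes the first file drives the loop (extra lines in the other files are ignored, and a file that runs out early contributes an empty string), and it joins the pieces with a single OFS instead of the OFS, FS, OFS spacing used in the question.

```shell
# merge_lines FILE1 FILE2 [FILE3 ...]   (hypothetical helper name)
# Prints line N of FILE1 followed by line N of every other file,
# holding only one line per file in memory at a time.
merge_lines() {
    awk '
    BEGIN {
        # Keep only the first argument as main input. Remember the other
        # filenames and blank them in ARGV so awk skips them itself.
        for (i = 2; i < ARGC; i++) {
            extra[i] = ARGV[i]
            ARGV[i] = ""
        }
    }
    {
        out = $0
        for (i = 2; i < ARGC; i++) {
            # getline returns 1 on success, 0 at end of file, -1 on error.
            status = (getline line < extra[i])
            if (status < 0) {
                printf "cannot read %s\n", extra[i] > "/dev/stderr"
                exit 1
            }
            if (status == 0)
                line = ""    # assumption: pad files that are too short
            out = out OFS line
        }
        print out
    }' "$@"
}
```

You would call it as `merge_lines tab1 tab2`, or with more files after that. Any per-line processing would go where `out` is built, which is also where it starts to get crowded.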

So yes, you can do this from awk.

(Side note: Whether it's a good idea to do this from awk is another story. In my opinion, this starts getting messy really quickly from here, so you're probably better off switching now to a higher-level language such as Python, Perl, Ruby, or Lua, since that kind of language gives you much better support for file handles and objects than awk does. While bash / shell scripting would do somewhat better than awk at dealing with multiple files, I wouldn't recommend that either, since it isn't as good as the aforementioned languages for writing modular, maintainable, testable code. Just my 2c.)

filbranden
  • 1
    Thanks filbranden. That is excellent. As you say, it might get very messy when I have four or five input files. I will do it in Python then. Thanks again –  Aug 12 '19 at 18:56