Since learning some bash syntax, I have been very enthusiastic about using it in daily life. One well-known command is grep. If one wishes to grep for something but ignore several files, the commands below MAY work.

grep_ignore=("token_a", "token_b")
grep -rnw . -e "token2" | grep -v <(printf '%s\n' "${grep_ignore[@]}")

How to reproduce:

  1. Create a dummy folder and enter it: run mkdir dummy && cd dummy

  2. Create the files:

    a. file_token_a.txt: run echo "token1 token2" > file_token_a.txt

    b. file_token_b.txt: run echo "token1 token3" > file_token_b.txt

    c. file_token_c.txt: run echo "token2 token3" > file_token_c.txt

Command run:

grep_ignore=("token_a", "token_b")
grep -rnw . -e "token2" | grep -v <(printf '%s\n' "${grep_ignore[@]}")

Expected output:

./file_token_c.txt:1:token2 token3

Actual output:

./file_token_c.txt:1:token2 token3
./file_token_a.txt:1:token1 token2

2 Answers

There are two issues with your attempt:

  1. your array construction has an erroneous comma, which makes the first pattern "token_a," (with a trailing comma) instead of "token_a"

  2. <(printf '%s\n' "${grep_ignore[@]}") is passed to grep -v as the pattern itself: the shell replaces the process substitution with a file descriptor path such as /dev/fd/63 (see note 1 below), and grep treats that string as the pattern rather than as a list of patterns. To have patterns read from a file (or process substitution), you need to make it the argument to the -f option

Correcting for these:

grep_ignore=("token_a" "token_b")

then

$ grep -rnw . -e "token2" | grep -vFf <(printf '%s\n' "${grep_ignore[@]}")
./file_token_c.txt:1:token2 token3

(the -F says to treat the array elements as fixed strings rather than regular expressions).
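
For anyone who wants to check this end to end, here is a minimal sketch that rebuilds the three test files from the question in a throwaway directory and applies the corrected filter. The variable names workdir and result are only illustrative, and the script assumes bash (for arrays and process substitution).

```shell
#!/usr/bin/env bash
# Recreate the question's test files in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "token1 token2" > file_token_a.txt
echo "token1 token3" > file_token_b.txt
echo "token2 token3" > file_token_c.txt

# Array elements are separated by whitespace, not commas.
grep_ignore=("token_a" "token_b")

# -v inverts the match, -F treats each pattern as a fixed string,
# -f reads one pattern per line from the process substitution.
result=$(grep -rnw . -e "token2" |
  grep -vFf <(printf '%s\n' "${grep_ignore[@]}"))
printf '%s\n' "$result"
```

With the files above, only the line from file_token_c.txt should remain.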


Alternatively, at least in GNU grep, you can use --exclude (and --include) to limit the match to specific file subsets to avoid the second grep altogether. So using your example above:

$ grep -rnw . -e "token2"
./file_token_a.txt:1:token1 token2
./file_token_c.txt:1:token2 token3

but given an array of filename patterns (note the elements are separated by whitespace not commas):

grep_ignore=("*token_a*" "*token_b*")

then

$ grep -rnw . -e "token2" "${grep_ignore[@]/#/--exclude=}"
./file_token_c.txt:1:token2 token3

where the array parameter expansion ${grep_ignore[@]/#/--exclude=} prefixes each element with --exclude= and expands as follows:

$ printf '%s\n' "${grep_ignore[@]/#/--exclude=}"
--exclude=*token_a*
--exclude=*token_b*

Alternatively you could use a brace expansion instead of an array:

grep -rnw . -e "token2" --exclude={"*token_a*","*token_b*"}
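
If you use this often, the array form could be wrapped in a small function. The sketch below is only an illustration: grep_excluding is a made-up name, not a standard command, and it assumes GNU grep (for --exclude) and bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grep_excluding WORD GLOB...
# Searches the current directory recursively for WORD, skipping files
# whose names match any GLOB (relies on GNU grep's --exclude option).
grep_excluding() {
  local word=$1
  shift
  local globs=("$@")
  # Prefix every glob with --exclude= before handing it to grep.
  grep -rnw . -e "$word" "${globs[@]/#/--exclude=}"
}

# Try it on the question's test files in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "token1 token2" > file_token_a.txt
echo "token1 token3" > file_token_b.txt
echo "token2 token3" > file_token_c.txt

result=$(grep_excluding "token2" "*token_a*" "*token_b*")
printf '%s\n' "$result"
```

Quoting the globs at the call site matters: unquoted, the shell would expand *token_a* to matching file names before the function ever sees it.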

  1. Try it with set -x, for example:

     $ grep -rnw . -e "token2" | grep -v <(printf '%s\n' "${grep_ignore[@]}")
     + grep --color=auto -rnw . -e token2
     + grep --color=auto -v /dev/fd/63
     ++ printf '%s\n'
     ./file_token_a.txt:1:token1 token2
     ./file_token_c.txt:1:token2 token3
    

    Note how the grep command has become grep --color=auto -v /dev/fd/63? You can further confirm that it's treating /dev/fd/63 as a pattern rather than a pseudo-file as follows:

     printf '%s\n' /dev/fd/{61..65} | 
       grep -v <(printf '%s\n' "${grep_ignore[@]}")
    

    (you'll see that /dev/fd/63 gets filtered out).
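
Another way to see the same thing, sketched under the assumption of bash on a system that provides /dev/fd: pass the process substitution to a function that just prints its arguments. The name show_args is hypothetical and exists only for this demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical helper that prints each argument it receives on its own line.
show_args() { printf '%s\n' "$@"; }

# Without -f, the command only receives a path name standing in for the
# process substitution (commonly /dev/fd/63 on Linux).
received=$(show_args -v <(printf '%s\n' "token_a" "token_b"))
printf '%s\n' "$received"

# Reading that path as a file yields the actual patterns, which is
# what grep's -f option does with it.
patterns=$(cat <(printf '%s\n' "token_a" "token_b"))
printf '%s\n' "$patterns"
```

The first printout shows -v followed by a path, while the second shows the two tokens, one per line.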

steeldriver

grep -rnw token2 *.txt | grep -E -v "(token_a|token_b)"

This seems to me a simpler approach than handling arrays.

The -E option enables extended regular expressions, so you can use the alternation (OR) operator, as in "(token_a|token_b)".
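
If the tokens already live in an array, the alternation can also be built from it. This is a rough sketch assuming bash; the variable name pattern is illustrative, and the array elements are interpreted as regular expressions here, so any regex metacharacters in them would need escaping.

```shell
#!/usr/bin/env bash
# Recreate the question's test files in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "token1 token2" > file_token_a.txt
echo "token1 token3" > file_token_b.txt
echo "token2 token3" > file_token_c.txt

grep_ignore=("token_a" "token_b")

# "${array[*]}" joins the elements with the first character of IFS;
# the subshell keeps the IFS change from leaking into the rest of the script.
pattern=$(IFS='|'; printf '%s' "${grep_ignore[*]}")

# Filter out any result line matching one of the joined alternatives.
result=$(grep -rnw . -e "token2" | grep -Ev "($pattern)")
printf '%s\n' "$result"
```

Here pattern ends up as token_a|token_b, which gives the same filter as the hand-written one above.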

user unknown