How can I determine the prematch and postmatch using egrep or ksh under Linux?

Question

I am working on an issue at the office on a Linux system. I would like to use either egrep or ksh pattern matching to determine not only the matched string but also the prematch and postmatch strings.

I know I can do this in Perl, but I would also like to be able to do it using egrep or ksh pattern matching.

I did some Google searching and found an egrep command where you can specify the number of prematch and postmatch characters, but that is not good enough: I need the entire prematch and postmatch strings.

An example or two would be helpful.

Comment from steeldriver (Sep 07 '23 at 13:13): GNU grep has a PCRE mode (`grep -P`) that supports lookarounds. Is your "linux system" actually Ubuntu?

Answer by Raffa · 2023-09-08 16:16

With `grep`

You can use Perl's `\K` (enabling Perl-compatible regular expressions with the `-P` option) together with a lookahead pattern, like so:

```
$ echo -e "pre1 line1 post1\npre2 line2 post2" |
grep -Po "pre2.*\Kline.(?=.*post2)"
line2
```

Here the patterns `pre2.*` and `.*post2` must match but are not included in the output: `\K` discards the text matched so far from the reported match, and the lookahead `(?=...)` checks what follows without consuming it. Only the text matched by `line.` is printed, and only when all three patterns match, in that order, on the same input line.
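Since the question asks for the prematch and postmatch strings themselves, the same `\K` and lookahead techniques can also print each part on its own. A minimal sketch, assuming GNU grep built with PCRE support and a single input line (the variable names are only for illustration); the non-greedy `.*?` makes the prematch stop at the first occurrence of `line.`:

```bash
line='pre2 line2 post2'

# Prematch: from the start of the line up to (not including) the first "line."
prematch=$(printf '%s\n' "$line" | grep -Po '^.*?(?=line.)')

# Postmatch: \K drops the matched "line." from the output, leaving the rest of the line
postmatch=$(printf '%s\n' "$line" | grep -Po 'line.\K.*')

printf 'prematch=[%s] postmatch=[%s]\n' "$prematch" "$postmatch"
# prints: prematch=[pre2 ] postmatch=[ post2]
```

Note that `grep -o` only prints non-empty matches, so a match at the very start (or end) of the line produces no prematch (or postmatch) output at all rather than an empty line.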

In the shell

In bash as well as in ksh, zsh and other similar Bourne-like shells, you can do something similar to this:

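```bash
pat="line."
pre="pre2.*"
post=".*post2"
# printf is used here because not every shell's echo supports -e
printf '%s\n' "pre1 line1 post1" "pre2 line2 post2" |
while IFS= read -r line
do
    [[ "$line" =~ $pre$pat$post ]] && echo "$line"
done
```

This outputs `pre2 line2 post2`. You can `echo "$pat"` as well; note that this prints the pattern itself (`line.`), not the text it matched.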
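To get the prematch and postmatch strings themselves, the regular expression can be wrapped in capture groups. In bash, a successful `[[ ... =~ ... ]]` stores the whole match in `BASH_REMATCH[0]` and each parenthesized group in `BASH_REMATCH[1]`, `BASH_REMATCH[2]`, and so on. A minimal, bash-specific sketch (ksh93 documents a `.sh.match` array for submatches of its pattern matching, which is not shown here); the variable names are only for illustration:

```bash
pat='line.'
# Group 1 = prematch, group 2 = match, group 3 = postmatch.
# The ^...$ anchors make the three groups cover the whole line.
re="^(.*pre2.*)($pat)(.*post2.*)$"

# A here-document instead of a pipeline keeps the loop in the current shell,
# so the variables set inside it are still available afterwards
while IFS= read -r line
do
    if [[ $line =~ $re ]]
    then
        prematch=${BASH_REMATCH[1]}
        match=${BASH_REMATCH[2]}
        postmatch=${BASH_REMATCH[3]}
        printf 'pre=[%s] match=[%s] post=[%s]\n' "$prematch" "$match" "$postmatch"
    fi
done <<'EOF'
pre1 line1 post1
pre2 line2 post2
EOF
# prints: pre=[pre2 ] match=[line2] post=[ post2]
```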

Or, to mimic the above `grep -Po` output on a file, use a function like this:

```bash
mygrep () {
    pre="${1}.*"
    pat="$2"
    post=".*${3}"
    file="$4"
    help='Usage: mygrep "prematch" "match" "postmatch" "filename"'
    if [[ $# -lt 4 ]]
    then
        echo "$help"
        return
    fi
    while IFS= read -r line
    do
        if [[ "$line" =~ $pre$pat$post ]]
        then
            # Unquoted $line on purpose: word splitting (and globbing!) happen here
            for word in $line
            do
                [[ "$word" =~ $pat ]] && echo "$word" && break
            done
        fi
    done < "$file"
}
```

... and that will work like so:

```
$ cat file
pre1 line1 post2
pre2 someword line2 otherword post2
pre3 line3 nomatch post3
pre2 match match line4 will match post2
pre2 post2
pre2 nomatch post2
$
$
$ mygrep --help
Usage: mygrep "prematch" "match" "postmatch" "filename"
$
$
$ mygrep "pre2" "line." "post2" "./file"
line2
line4
```

Notice that the unquoted parameter `$line` in the head of the `for` loop is deliberate: it lets the shell's word splitting break the input line into individual words to loop over. Be aware, though, that it also subjects those words to filename globbing, so a word containing glob characters (such as `*` or `?`) may be expanded to matching filenames in the current working directory. In that case it is safer to first read the words of the line (splitting on spaces) into an array and loop over the array elements, quoting the array expansion. Both versions are included for educational reasons; the safer one looks like this:

```bash
mygrep () {
    pre="${1}.*"
    pat="$2"
    post=".*${3}"
    file="$4"
    help='Usage: mygrep "prematch" "match" "postmatch" "filename"'
    if [[ $# -lt 4 ]]
    then
        echo "$help"
        return
    fi
    # read -a (bash) splits each line on spaces into an array;
    # ksh93 uses read -A for the same purpose
    while IFS=' ' read -r -a line
    do
        if [[ "${line[*]}" =~ $pre$pat$post ]]
        then
            # Quoted array expansion: the words are neither re-split nor globbed
            for word in "${line[@]}"
            do
                [[ "$word" =~ $pat ]] && echo "$word" && break
            done
        fi
    done < "$file"
}
```
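The globbing hazard described above is easy to reproduce. A minimal bash sketch, run in a throwaway directory created with `mktemp -d`; the file name `line2.txt` and the sample line are made up for the illustration:

```bash
orig_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch line2.txt                # a file whose name matches the glob below

line='pre2 line2* post2'       # one of the "words" contains a glob character

# Unquoted expansion: word splitting AND pathname expansion happen
unquoted=()
for word in $line; do unquoted+=("$word"); done

# read -a splits on spaces only, and the quoted expansion is never globbed
IFS=' ' read -r -a words <<< "$line"
quoted=()
for word in "${words[@]}"; do quoted+=("$word"); done

printf 'unquoted: %s\n' "${unquoted[*]}"   # unquoted: pre2 line2.txt post2
printf 'quoted:   %s\n' "${quoted[*]}"     # quoted:   pre2 line2* post2

cd "$orig_dir" && rm -rf "$demo_dir"
```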

Notice, as well, that although the shell can match text using either glob patterns or regular expressions, it is not the best tool for the job; use `grep` or a similar tool instead. You might, however, want to read "Can globbing be used to search file contents?"
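Since the question explicitly mentions ksh pattern matching, note that with plain glob patterns the prematch and postmatch can be cut out using parameter expansion alone, with no regular expressions. A minimal sketch, assuming the hypothetical glob `line?` stands in for the match: `${var%%pattern*}` removes the longest suffix that starts with the pattern, leaving the prematch, and `${var#*pattern}` removes the shortest prefix that ends with it, leaving the postmatch. These expansions are POSIX, so they should work in ksh, bash and dash alike:

```bash
line='pre2 line2 post2'
glob='line?'   # hypothetical glob; ? matches any single character

case $line in
    *$glob*)
        prematch=${line%%$glob*}     # drop the longest suffix starting at the first match
        postmatch=${line#*$glob}     # drop the shortest prefix ending at the first match
        rest=${line#"$prematch"}     # strip the prematch (quoted, so taken literally)
        match=${rest%"$postmatch"}   # strip the postmatch, leaving the matched text
        printf 'pre=[%s] match=[%s] post=[%s]\n' "$prematch" "$match" "$postmatch"
        ;;
esac
# prints: pre=[pre2 ] match=[line2] post=[ post2]
```

This relies on the glob matching a fixed number of characters, as `line?` does. With a `*` inside the pattern, the "longest suffix" and "shortest prefix" choices no longer refer to the same occurrence, and the parts will not line up.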
