There's a common core of regular expression syntax, but there are distinct flavors. Your expression appears to contain some features specific to the Perl flavor, in particular the use of complex lookaround assertions describing the start and end of the pattern to be matched, whereas grep defaults to a basic regular expression (BRE) syntax that only supports a simpler set of these zero-length matches, such as line anchors (^, $) and word anchors (\>, \<).
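As a quick illustration of those simpler zero-length matches, here is a minimal sketch that assumes GNU grep (the word anchors are a GNU extension); the input strings are made up for the example:

```shell
# '^' anchors the match to the start of the line ('$' would anchor the end).
printf '%s\n' 'remove foo' 'foo remove' | grep '^remove'
# prints: remove foo

# '\<' and '\>' anchor the match to the start and end of a word.
echo 'foo foobar' | grep -o '\<foo\>'
# prints: foo
```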
You can enable Perl-compatible regular expression (PCRE) support in grep using the -P command line switch (although note that the man page currently describes it as "experimental"). In your case you probably want the -o switch as well, to print only the matched text rather than the whole line, i.e.
cat /var/log/dpkg.log | grep 'remove' | grep -oP '(?<=remove)(.*?)(?=:)'
Be aware that this expression may fail if it encounters packages that do not have the :i386 suffix since it may read ahead to a matching colon in the next word, e.g.
echo "2013-09-07 08:31:44 remove cifs-utils 2:5.1-1ubuntu2 <none>" | grep -oP '(?<=remove)(.*?)(?=:)'
cifs-utils 2
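One possible way around the read-ahead problem, offered as a sketch rather than a fix tested against every dpkg.log, is to match only the run of characters after "remove " that are neither a space nor a colon. The function name removed_packages and the libfoo line are hypothetical, and this assumes a grep built with -P support:

```shell
# Hypothetical helper: print the package name that follows "remove ",
# stopping at the first space or colon so the version field is never reached.
removed_packages() {
    grep -oP '(?<=remove )[^ :]+' "$@"
}

printf '%s\n' \
    '2013-09-07 08:31:44 remove cifs-utils 2:5.1-1ubuntu2 <none>' \
    '2013-09-07 08:32:10 remove libfoo:i386 1.0-1 <none>' \
    | removed_packages
# prints:
# cifs-utils
# libfoo
```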
You may wish to look at awk instead e.g.
cat /var/log/dpkg.log | awk '$3 ~ /remove/ {sub(":.*", "", $4); print $4}'
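To see what that awk command does without touching the real log, here is the same program run on a few sample lines. The first line is the one from the example above, the "startup packages remove" line is the kind of entry that should be skipped, and the libfoo line and the sample_log variable are made up for illustration:

```shell
# Sample input standing in for /var/log/dpkg.log.
sample_log='2013-09-07 08:31:44 remove cifs-utils 2:5.1-1ubuntu2 <none>
2013-09-07 08:32:10 remove libfoo:i386 1.0-1 <none>
2013-09-11 11:11:08 startup packages remove'

# Only lines whose third field matches "remove" are processed; anything from
# the first colon onward is stripped from the fourth field before printing.
printf '%s\n' "$sample_log" | awk '$3 ~ /remove/ {sub(":.*", "", $4); print $4}'
# prints:
# cifs-utils
# libfoo
```

Because the test is on the third field only, the "startup packages remove" line is ignored.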
As well as BRE and PCRE, GNU grep has a further mode called extended regular expression (ERE), specified by the -E command line switch. The man page notes that
In GNU grep, there is no difference in available functionality
between basic and extended syntaxes.
However, you should note that "no difference in available functionality" does not mean that the syntax is the same. For example, in BRE the + character is normally treated as a literal, and only becomes a modifier meaning 'one or more instances of the preceding regular expression' if escaped, i.e.
$ echo "123.456" | grep '[0-9]+\.[0-9]+'
$ echo "123.456" | grep '[0-9]\+\.[0-9]\+'
123.456
whereas for ERE it is exactly the opposite
$ echo "123.456" | grep -E '[0-9]+\.[0-9]+'
123.456
$ echo "123.456" | grep -E '[0-9]\+\.[0-9]\+'
A similar distinction applies for sed invoked without and with the -r switch.
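A minimal sketch of the sed case, assuming GNU sed (the escaped \+ form in BRE is a GNU extension); the replacement text N is arbitrary:

```shell
# Without -r, sed uses BRE: a bare '+' is literal, so nothing matches.
echo '123.456' | sed 's/[0-9]+/N/'
# prints: 123.456

# In BRE the escaped form '\+' means 'one or more'.
echo '123.456' | sed 's/[0-9]\+/N/'
# prints: N.456

# With -r, sed uses ERE and the bare '+' means 'one or more'.
echo '123.456' | sed -r 's/[0-9]+/N/'
# prints: N.456
```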
grep -E '[0-9] remove' dpkg.log | sed -nrs 's/:/ /g;p' | awk '{ print $6 }'. As for grep versus "regex", I understand you need to use grep -E, the usage of egrep being frowned upon, for full regex functionality. – Sep 22 '13 at 12:17
[0-9] is needed to eliminate lines like 2013-09-11 11:11:08 startup packages remove – Sep 22 '13 at 12:30