I searched for ki followed by *, as in the example below, and expected it to return only the first three lines. I am not sure why it also returns the last line, which has no ki in it.

$ grep "ki*" trial_file.txt
kartik,27,Bangalore,Karnataka
pulkit,25,Bangalore,Karnataka
kit,28,Bangalore,Karnataka
kush,24,Pennsylvania,Philadelphia
Eliah Kagan
  • Run it as grep ki trial_file.txt. The * is wrong. – Pilot6 Sep 18 '17 at 18:32
  • @Pilot6 This question involves two misconceptions, and the main one is that grep matches the whole line and requires something special to be done to just match part of a line. Both mjb2kmn's answer and my answer address that, but it would be outside the scope of that question. (That question also has a major part that doesn't apply here--the issue of when the shell expands *--though currently no answers there really address that in detail.) – Eliah Kagan Sep 18 '17 at 18:49

2 Answers

Don't use * for this. Use grep 'ki' trial_file.txt or grep -F 'ki' trial_file.txt.

  1. Unless you pass it the -x/--line-regexp option, grep returns lines that contain a match anywhere, even if the whole line isn't a match. So all you have to do is match part of the line. You don't have to do anything special to indicate there may be more characters.

  2. In a regular expression, * means "zero or more of the previous item." This is entirely different from its meaning in shell pathname expansion (see also this article, man 7 glob, and this section). So, for example:

    • ax*b matches a, followed by any number of xes (even none), followed by b: ab, axb, axxb, axxxb, ...
    • a[xz]*b matches a followed by any number of characters where each is x or z, followed by b: ab, axb, azb, axxb, axzb, azxb, azzb, axxxb, ...
    • a(xyz)*b matches a, followed by zero or more occurrences of the string xyz, followed by b: ab, axyzb, axyzxyzb, axyzxyzxyzb, ...
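
If you want to check those three patterns yourself, here is a minimal sketch. The sample strings are made up for illustration. I use -E because grouping with ( ) is extended-regex syntax (in grep's default basic syntax you would write \(xyz\)* instead), and -x so that only whole-line matches are printed.

```shell
# Hypothetical sample strings, one per line.
samples='ab
axb
axxb
azxb
axyzb
axyzxyzb
acb'

# -x: the whole line must match; -E: extended regular expressions.
star_x=$(printf '%s\n' "$samples" | grep -xE 'ax*b')
star_class=$(printf '%s\n' "$samples" | grep -xE 'a[xz]*b')
star_group=$(printf '%s\n' "$samples" | grep -xE 'a(xyz)*b')

printf 'ax*b:\n%s\n\n' "$star_x"
printf 'a[xz]*b:\n%s\n\n' "$star_class"
printf 'a(xyz)*b:\n%s\n' "$star_group"
```

Note that ab is matched by all three patterns, because "zero occurrences" is always allowed, and acb is matched by none of them.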

In this case, it seems like you're just searching for text. You don't need to use any regular expression metacharacters like ., *, or \ that have special meanings. That's why I suggest passing the -F flag, which makes grep search for "fixed strings" rather than performing regular expression matching.
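
To see what -F changes, here is a small sketch with a pattern that does contain a metacharacter. The two input lines are invented for the demonstration and have nothing to do with the file in the question.

```shell
# Two hypothetical lines: one has a literal dot, the other does not.
lines='v1.0
v100'

# As a regular expression, . matches any single character, so both lines match.
regex_hits=$(printf '%s\n' "$lines" | grep 'v1.0')

# With -F the pattern is a fixed string, so only the literal text "v1.0" matches.
fixed_hits=$(printf '%s\n' "$lines" | grep -F 'v1.0')

printf 'regex:\n%s\n\n' "$regex_hits"
printf 'fixed:\n%s\n' "$fixed_hits"
```

For a pattern like ki, which has no metacharacters, the two forms give the same result.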

If, however, you only want to match starting at the beginning of the line, then you do want to use a regular expression metacharacter: ^, as mjb2kmn suggests. This anchors your match to the start of the line. In that case you would run grep '^ki' trial_file.txt.
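
As a sketch of the difference, this recreates the file from the question in a temporary directory (the directory and variable names are just for the demonstration) and runs both forms:

```shell
# Recreate the question's file in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/trial_file.txt" <<'EOF'
kartik,27,Bangalore,Karnataka
pulkit,25,Bangalore,Karnataka
kit,28,Bangalore,Karnataka
kush,24,Pennsylvania,Philadelphia
EOF

# Lines containing "ki" anywhere.
anywhere=$(grep 'ki' "$workdir/trial_file.txt")

# Lines that start with "ki".
at_start=$(grep '^ki' "$workdir/trial_file.txt")

printf 'anywhere:\n%s\n\nat start:\n%s\n' "$anywhere" "$at_start"
rm -r "$workdir"
```

The first search should print the pulkit and kit lines; the anchored one should print only the kit line.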

For more information on the options grep supports, see man grep and the GNU Grep manual.

Although in general I suggest enclosing regular expressions in ' ' quotes, in this case no quoting is necessary because the shell does not perform any expansions on ki or ^ki before passing them to grep.
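
For contrast, here is a sketch of why quoting is the safer habit once a pattern contains *. It assumes a default shell setup (no nullglob or similar options) and uses a hypothetical file named kiwi in an otherwise empty temporary directory.

```shell
workdir=$(mktemp -d)
touch "$workdir/kiwi"   # hypothetical file whose name matches the glob ki*

# Each command runs in a subshell, so the directory change does not leak out.
unquoted=$(cd "$workdir" && echo ki*)    # the shell expands ki* before echo sees it
quoted=$(cd "$workdir" && echo 'ki*')    # quotes pass the pattern through untouched

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
rm -r "$workdir"
```

With echo replaced by grep, the unquoted form would search for kiwi rather than for the regular expression ki*.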

Eliah Kagan

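I think you're expecting shell-style wildcards here, but what you're getting is a regular expression. When you search for ki*, you are asking for a literal k followed by zero or more i characters. A lone k satisfies that, so every line that contains a k matches, including the kush line.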

The first line doesn't contain "ki" either.
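
One way to see exactly what ki* matched is the -o option, which prints only the matched text (GNU and BSD grep both have it, though it is not in POSIX). This is a sketch that recreates the question's file in a temporary directory and, to keep the output short, looks only at the first comma-separated field of each line.

```shell
workdir=$(mktemp -d)
cat > "$workdir/trial_file.txt" <<'EOF'
kartik,27,Bangalore,Karnataka
pulkit,25,Bangalore,Karnataka
kit,28,Bangalore,Karnataka
kush,24,Pennsylvania,Philadelphia
EOF

# Keep only the name field, then print each match of ki* on its own line,
# prefixed with the number of the line it came from (-n).
matches=$(cut -d, -f1 "$workdir/trial_file.txt" | grep -on 'ki*')

printf '%s\n' "$matches"
rm -r "$workdir"
```

With GNU grep this should print 1:k twice (kartik has two ks, neither followed by an i), then 2:ki, 3:ki, and 4:k. The bare k matches on lines 1 and 4 are why those lines are returned.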

How to do this correctly depends on what exactly you are trying to match.

As commented above, grep "ki" could be what you want, or, if you want to match only lines starting with "ki", you'd need grep "^ki".
^ denotes the beginning of the line.

Zanna
virullius
  • It is not a regular expression when you use grep without -E, but in this case it is really looking for a k followed by 0 or more i's – Pilot6 Sep 18 '17 at 18:39
  • It actually uses basic regular expressions by default; the -E option enables "extended regular expressions". man grep explains this. Edit: sorry, the Linux grep man page does not explain this well; the BSD grep man page does, though. – virullius Sep 18 '17 at 18:42
  • You are correct. I always call only extended regexp the real ones. But it is probably wrong. – Pilot6 Sep 18 '17 at 18:43
  • @Pilot6 Ubuntu has GNU Grep, which supports three regex dialects: pass -G or no flag for POSIX BRE with GNU extensions, -E for POSIX ERE, or -P for PCRE ("Perl"). It also accepts -F for "fixed strings" where the pattern is matched literally. See Matcher Selection. Only -F does not use a regular expression. BRE and ERE are more similar to each other than to PCRE and many other dialects; both omit powerful features that are now common, like lookaround assertions. – Eliah Kagan Sep 18 '17 at 19:37
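
As a rough illustration of the matcher flags from the last comment, the sketch below runs similar patterns under -G, -E, and -F. The two input lines are made up, and -P is left out because not every grep build includes it.

```shell
# Hypothetical input: one line with a literal "+" and "*", one without.
lines='ki+ and ki*
kiii'

bre_hits=$(printf '%s\n' "$lines" | grep -G 'ki+')    # BRE: + is a literal plus sign
ere_hits=$(printf '%s\n' "$lines" | grep -E 'ki+')    # ERE: + means "one or more"
fixed_hits=$(printf '%s\n' "$lines" | grep -F 'ki*')  # fixed string: * is a literal asterisk

printf 'BRE ki+:\n%s\n\n' "$bre_hits"
printf 'ERE ki+:\n%s\n\n' "$ere_hits"
printf 'fixed ki*:\n%s\n' "$fixed_hits"
```

Only the -E run should match kiii; the -G and -F runs should match only the line that literally contains ki+ or ki*.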