To be precise
Some text
begin
Some text goes here.
end
Some more text
and I want to extract the entire block that starts at "begin" and runs through "end".
With awk we can do this with awk '/begin/,/end/' text.
How can I do this with grep?
Updated 18-Nov-2016: grep's behavior has changed, and grep with the -P option no longer supports the ^ and $ anchors when combined with -z (seen on Ubuntu 16.04 with kernel 4.4.0-21-generic). (wrong (non-)fix)
$ grep -Pzo "begin(.|\n)*\nend" file
begin
Some text goes here.
end
Note: for the other commands below, replace the ^ and $ anchors with the newline character \n.
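As a rough sketch of that replacement, here is one of the commands that leaves out "begin" and "end", rewritten with \n in place of the anchors. The temporary file and the trailing tr are my additions, not part of the original answer: because -z also makes grep end each output record with a NUL byte, tr -d '\0' strips it. Anchoring on \nbegin\n assumes "begin" is not on the very first line of the file, and a grep built with -P support is assumed.

```shell
# Recreate the sample text from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' 'Some text' 'begin' 'Some text goes here.' 'end' 'Some more text' > "$sample"

# \n stands in for ^ and $; \K drops "\nbegin\n" from the reported match,
# and the lookahead stops before "\nend\n" without consuming it.
inner=$(grep -Pzo '(?s)\nbegin\n\K.*?(?=\nend\n)' "$sample" | tr -d '\0')

printf '%s\n' "$inner"
rm -f "$sample"
```

The pattern is single-quoted here so the shell leaves $ and \ alone.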
______________________________
With grep command:
grep -Pzo "^begin\$(.|\n)*^end$" file
If you don't want to include the patterns "begin" and "end" in the result, use grep's lookbehind and lookahead support.
grep -Pzo "(?<=^begin$\n)(.|\n)*(?=\n^end$)" file
You can also use the \K notation instead of a lookbehind assertion.
grep -Pzo "^begin$\n\K(.|\n)*(?=\n^end$)" file
\K drops everything matched before it from the reported match, so the "begin" line itself is not printed.
The \n in the pattern keeps empty lines out of the output.
Or, as @AvinashRaj suggests, there are simpler grep commands, as follows:
grep -Pzo "(?s)^begin$.*?^end$" file
grep -Pzo "^begin$[\s\S]*?^end$" file
(?s) tells grep to allow the dot to match newline characters.
[\s\S] matches any character that is either whitespace or non-whitespace.
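To illustrate those two notes, the sketch below runs both forms on a made-up file with two blocks. It uses \n instead of ^ and $ (per the update at the top), and the file contents, variable names, and trailing tr -d '\0' are assumptions of mine. The lazy *? is what makes each match stop at the first following "end" rather than the last one.

```shell
# Hypothetical sample with two begin/end blocks.
sample=$(mktemp)
printf '%s\n' 'Some text' 'begin' 'first block' 'end' 'between' \
              'begin' 'second block' 'end' 'Some more text' > "$sample"

# (?s) lets the dot cross newlines.
dotall=$(grep -Pzo '(?s)begin\n.*?\nend\n' "$sample" | tr -d '\0')

# [\s\S] matches any character, newline included, without (?s).
anychar=$(grep -Pzo 'begin\n[\s\S]*?\nend\n' "$sample" | tr -d '\0')

printf '%s\n' "$dotall"
rm -f "$sample"
```

Both variables should hold the two blocks back to back, without the "between" line.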
To get their output without "begin" and "end", use the following:
grep -Pzo "^begin$\n\K[\s\S]*?(?=\n^end$)" file # or grep -Pzo "(?<=^begin$\n)[\s\S]*?(?=\n^end$)"
grep -Pzo "(?s)(?<=^begin$\n).*?(?=\n^end$)" file
See the full test of all commands here (outdated, since grep's behavior with the -P option has changed).
^ matches the beginning of a line and $ matches the end of a line. They are placed around "begin" and "end" so that these match only when they stand alone on a line.
Where $ is immediately followed by (, I escaped it as \$, because inside double quotes $( starts command substitution ($(command)), which replaces the command with its output.
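A small sketch of why that escape matters, using a harmless echo in place of the regex fragment (the strings and variable names are made up for illustration):

```shell
# Inside double quotes, $( ... ) is run as a command and replaced by its output.
unescaped="begin$(echo X)end"

# Escaping the dollar sign keeps the text literal.
escaped="begin\$(echo X)end"

# Single quotes avoid the problem entirely: nothing is expanded.
single='begin$(echo X)end'

printf '%s\n' "$unescaped" "$escaped" "$single"
```

The first line printed is beginXend; the other two keep $(echo X) as literal text.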
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
-P, --perl-regexp
Interpret PATTERN as a Perl compatible regular expression (PCRE)
-z, --null-data
Treat the input as a set of lines, each terminated by a zero byte (the ASCII
NUL character) instead of a newline. Like the -Z or --null option, this option
can be used with commands like sort -z to process arbitrary file names.
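To see what -z and -o change in practice, here is a minimal sketch on inline input, assuming GNU grep (the input strings are made up). Counting matches of the empty pattern shows how many records grep sees: one per line by default, but a single record with -z when the input contains no NUL bytes. That is why a pattern can span newlines under -z.

```shell
# Default: each newline-terminated line is its own record.
lines=$(printf 'a\nb\nc\n' | grep -c '')

# With -z: records end at NUL bytes, so this input is one record.
records=$(printf 'a\nb\nc\n' | grep -zc '')

# -o prints each matched part on its own output line.
only=$(printf 'abcabc\n' | grep -o 'bc')

printf 'lines=%s records=%s\n' "$lines" "$records"
printf '%s\n' "$only"
```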
grep -Pzo "(?<=begin\n)(.|\n)*(?=\nend)" file to avoid printing the \n character that follows "begin" on its line.
– Avinash Raj
Nov 19 '14 at 12:18
grep -Pzo "(?s)begin.*?end" file
– Avinash Raj
Nov 19 '14 at 12:19
\n but you can post your other solution as your own answer ;)
– αғsнιη
Nov 19 '14 at 12:38
grep -Pzo "begin(.|\n)*\nend" file instead to make sure that end only matches at the beginning of a line and not in things like bend.
– terdon
Nov 19 '14 at 13:18
^ would only match the beginning of the file when using -z but apparently not.
– terdon
Nov 19 '14 at 13:23
^ and $ to match just before and just after a \0 instead. Apparently, they're hard coded to match \n.
– terdon
Nov 19 '14 at 13:27
grep: ein nicht geschütztes ^ oder $ wird mit -Pz nicht unterstützt. In English, the error reads roughly: grep: unescaped ^ or $ not supported with -Pz
– musbach
Nov 15 '16 at 08:01
grep's behavior has changed. I just tested and musbach is right, the ^ and $ don't work with -Pz. It should work as expected if you replace ^ and $ with \n though.
– terdon
Nov 15 '16 at 08:45
grep seems to have changed.
– terdon
Nov 15 '16 at 09:24
grep -Pzo "begin\n(.|\n)*\nend\n" file. If I put a \n before begin (grep -Pzo "\nbegin\n(.|\n)*\nend\n" file), I get a blank line and then the correct output. I guess that \n produces a linefeed, but it looks strange to me. @KasiyA I am on Ubuntu 16.04. Which OS are you on?
– musbach
Nov 16 '16 at 20:24
\n is the newline character. You get an extra newline because with \nbegin you are including the newline character at the end of the previous line, so that's printed as a blank line.
– terdon
Nov 16 '16 at 20:55
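In case your grep doesn't support Perl syntax (-P), you can try joining the lines, matching the pattern, then splitting the lines again as below. Note that this assumes the text itself contains no commas, since every comma is turned back into a newline at the end: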
$ tr '\n' , < foo.txt | grep -o "begin.*end" | tr , '\n'
begin
Some text goes here.
end
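If the text may contain commas, one possible variation is to join the lines with a control character that is unlikely to appear in the file. The sketch below uses \001 for that; the choice of \001, the temporary file, and the sample line with a comma are assumptions of mine. As with the command above, .* is greedy, so with several blocks it would run from the first "begin" to the last "end", and "begin" or "end" are also matched in the middle of a line.

```shell
# Sample text whose block contains a comma.
sample=$(mktemp)
printf '%s\n' 'Some text' 'begin' 'Some text, with a comma.' 'end' 'Some more text' > "$sample"

# Join lines with \001, match across the joined text, then restore the newlines.
block=$(tr '\n' '\001' < "$sample" | grep -o 'begin.*end' | tr '\001' '\n')

printf '%s\n' "$block"
rm -f "$sample"
```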