I would like to delete some lines in a file with more than 100K lines of data.
I only want to delete line which started with MX
and NOT containing the word sum
. How can I do that with sed
?
Original file content:
Expected file content:
I would like to delete some lines in a file with more than 100K lines of data.
I only want to delete line which started with MX
and NOT containing the word sum
. How can I do that with sed
?
Original file content:
Expected file content:
Based on the examples, provided in the article sed
- 25 examples to delete a line or pattern in a file we can compose this command:
sed '/^MX/{/sum/!d}' in-file # just output the result
sed '/^MX/{/sum/!d}' in-file -i.bak # change the file and create a backup copy
sed '/^MX/{/sum/!d}' in-file > out-file # create a new file with different name/path
Here is perl
solution - the source:
perl -ne '/^MX((?!sum).)*$/ || print' in-file
perl -ne '/^MX((?!sum).)*$/ || print' in-file > out-file
The same regular expression will work with grep -P
(more explanations). But, instead of the above construction that literally means if not then print, to preserve the output of the matched lines with grep
we need the -v
option:
grep -vP '^MX((?!sum).)*$' in-file
grep -vP '^MX((?!sum).)*$' in-file > out-file
Here is also awk
solution:
awk '! /^MX/ || /sum/ {print}' in-file
awk '! /^MX/ || /sum/ {print}' in-file > out-file
It is relatively easy to compose your regular expressions by online tools as regextester.com.
Productivity comparison:
$ du -sh in-file
2.4M in-file
$ TIMEFORMAT=%R
$ time grep -vP '^MX((?!sum).)*$' in-file > out-file
0.049
$ time sed '/^MX/{/sum/!d}' in-file > out-file
0.087
$ time awk '! /^MX/ || /sum/ {print}' in-file > out-file
0.090
$ time perl -ne '/^MX((?!sum).)*$/ || print' in-file > out-file
0.099
sum, sum@1, sum@2, sum@3 and I would like to keep all these line. What's should I do with your recomended soluiton?
– Kevin 5059 Nov 14 '19 at 12:45