Remove lines starting with "string1" and NOT containing "string2" with sed

Question

I would like to delete some lines in a file with more than 100K lines of data.

I only want to delete lines that start with MX and do NOT contain the word sum. How can I do that with sed?

Original file content:

Expected file content:

Don't post pictures of text. I would like to cut'n'paste the input text to develop a solution, but I can't. — glenn jackman, Nov 14 '19 at 15:27

pa4080 · Answer 1 · 2019-11-14T20:44:24.303

Based on the examples provided in the article "sed - 25 examples to delete a line or pattern in a file", we can compose this command:

sed '/^MX/{/sum/!d}' in-file            # just output the result
sed '/^MX/{/sum/!d}' in-file -i.bak     # change the file and create a backup copy
sed '/^MX/{/sum/!d}' in-file > out-file # create a new file with different name/path
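To show how the sed command works, here is a minimal sketch run on a small hypothetical sample (the question's real data was only posted as an image). The address /^MX/ selects lines that start with MX, and the braces apply the inner command only to those lines. Inside the braces, /sum/!d deletes the line unless it contains sum. The sketch writes the block as {/sum/!d;} with a semicolon before the closing brace. GNU sed accepts the form above, and some other implementations (e.g. BSD/macOS sed) expect the semicolon.

```shell
#!/bin/sh
# Hypothetical sample input; the question's real data was posted as an image.
input='MX 10 20
MX sum 30
AB 40 50
MX 60 70
MX 80 sum@1'

# /^MX/    -> only lines starting with "MX" enter the { ... } block
# /sum/!d  -> inside the block, delete the line unless it contains "sum"
# Every other line reaches sed's automatic print unchanged.
result=$(printf '%s\n' "$input" | sed '/^MX/{/sum/!d;}')
printf '%s\n' "$result"
```

This prints MX sum 30, AB 40 50 and MX 80 sum@1. The two MX lines without sum are dropped.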

Here is a Perl solution (source):

perl -ne '/^MX((?!sum).)*$/ || print' in-file
perl -ne '/^MX((?!sum).)*$/ || print' in-file > out-file

The same regular expression works with grep -P (more explanations). However, the Perl construction above literally means "if it does not match, then print". With grep, we need the -v option to invert the match so that only the non-matching lines are output:

grep -vP '^MX((?!sum).)*$' in-file
grep -vP '^MX((?!sum).)*$' in-file > out-file
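Because the regex matches the lines to delete, dropping -v gives a handy preview of what will be removed before the file is touched. Here is a minimal sketch on hypothetical sample data. It assumes a grep built with PCRE support: -P is available in GNU grep but not in every grep.

```shell
#!/bin/sh
# Hypothetical sample input.
input='MX 10 20
MX sum 30
AB 40 50
MX 60 70
MX 80 sum@1'

# Without -v, grep prints the lines the regex matches: the ones that would be removed.
removed=$(printf '%s\n' "$input" | grep -P '^MX((?!sum).)*$')
# With -v, the match is inverted and only the lines to keep are printed.
kept=$(printf '%s\n' "$input" | grep -vP '^MX((?!sum).)*$')
printf 'removed:\n%s\nkept:\n%s\n' "$removed" "$kept"
```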

Here is an awk solution as well:

awk  '! /^MX/ || /sum/ {print}' in-file
awk  '! /^MX/ || /sum/ {print}' in-file > out-file
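The awk condition keeps a line if it does not start with MX or if it contains sum. By De Morgan's law, this is the same as "not (starts with MX and lacks sum)", which reads closer to the question. The sketch below compares both spellings on hypothetical sample data. The shorter form also omits {print}, which is awk's default action.

```shell
#!/bin/sh
# Hypothetical sample input.
input='MX 10 20
MX sum 30
AB 40 50
MX 60 70
MX 80 sum@1'

# Original form: print a line if it does NOT start with MX, or if it contains "sum".
a=$(printf '%s\n' "$input" | awk '! /^MX/ || /sum/ {print}')
# Same condition via De Morgan's law: "not (starts with MX and lacks sum)".
# {print} is awk's default action, so it can be left out.
b=$(printf '%s\n' "$input" | awk '!(/^MX/ && !/sum/)')
printf '%s\n' "$b"
[ "$a" = "$b" ] && echo "both forms agree"
```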

Online tools such as regextester.com make it relatively easy to compose and test your regular expressions.

Performance comparison (with TIMEFORMAT=%R, bash's time keyword prints only the elapsed real time, in seconds):

$ du -sh in-file
2.4M    in-file
$ TIMEFORMAT=%R

$ time grep -vP '^MX((?!sum).)*$' in-file > out-file
0.049
$ time sed '/^MX/{/sum/!d}' in-file > out-file
0.087
$ time awk  '! /^MX/ || /sum/ {print}' in-file > out-file
0.090
$ time perl -ne '/^MX((?!sum).)*$/ || print' in-file > out-file
0.099

Hi, thanks for the answer. However, I just realized that some lines contain sum, sum@1, sum@2, or sum@3, and I would like to keep all of these lines. What should I do with your recommended solution? — Kevin 5059, Nov 14 '19 at 12:45
Hi @Kevin5059, you do not need anything additional: all lines that contain "sum" will be kept. — pa4080, Nov 14 '19 at 14:20
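A quick check with hypothetical lines modeled on the comment shows why nothing extra is needed. /sum/ matches the substring sum anywhere in the line, so sum@1, sum@2, and sum@3 all match. The same substring rule means that words such as summary or checksum would also keep an MX line.

```shell
#!/bin/sh
# Hypothetical lines modeled on the comment: sum, sum@1, sum@2, sum@3.
input='MX a sum
MX b sum@1
MX c sum@2
MX d sum@3
MX e other'

# /sum/ is a plain substring match, so every sum@N variant counts as "contains sum";
# only the last line (no "sum" at all) is deleted.
result=$(printf '%s\n' "$input" | sed '/^MX/{/sum/!d;}')
printf '%s\n' "$result"
```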
