1

I would like to delete some lines in a file with more than 100K lines of data.

I only want to delete line which started with MX and NOT containing the word sum. How can I do that with sed?

Original file content:

Original file content

Expected file content:

Expected file content

Zanna
  • 70,465

1 Answers1

2

Based on the examples, provided in the article sed - 25 examples to delete a line or pattern in a file we can compose this command:

sed '/^MX/{/sum/!d}' in-file            # just output the result
sed '/^MX/{/sum/!d}' in-file -i.bak     # change the file and create a backup copy
sed '/^MX/{/sum/!d}' in-file > out-file # create a new file with different name/path

Here is perl solution - the source:

perl -ne '/^MX((?!sum).)*$/ || print' in-file
perl -ne '/^MX((?!sum).)*$/ || print' in-file > out-file

The same regular expression will work with grep -P (more explanations). But, instead of the above construction that literally means if not then print, to preserve the output of the matched lines with grep we need the -v option:

grep -vP '^MX((?!sum).)*$' in-file
grep -vP '^MX((?!sum).)*$' in-file > out-file

Here is also awk solution:

awk  '! /^MX/ || /sum/ {print}' in-file
awk  '! /^MX/ || /sum/ {print}' in-file > out-file

It is relatively easy to compose your regular expressions by online tools as regextester.com.

Productivity comparison:

$ du -sh in-file
2.4M    in-file
$ TIMEFORMAT=%R

$ time grep -vP '^MX((?!sum).)*$' in-file > out-file
0.049
$ time sed '/^MX/{/sum/!d}' in-file > out-file
0.087
$ time awk  '! /^MX/ || /sum/ {print}' in-file > out-file
0.090
$ time perl -ne '/^MX((?!sum).)*$/ || print' in-file > out-file
0.099
pa4080
  • 29,831
  • Hi, Thanks for the answer. However, I just realized that I have some lines with:

    sum, sum@1, sum@2, sum@3 and I would like to keep all these line. What's should I do with your recomended soluiton?

    – Kevin 5059 Nov 14 '19 at 12:45
  • 1
    Hi, @Kevin5059, you do not need nothing additional, all lines that contains "sum" should be kept. – pa4080 Nov 14 '19 at 14:20