101

How to remove all the lines from the text file containing the words "cat" and "rat"?

muru
PersonX
  • 1
    This sounds suspiciously like a homework assignment. Please remember to attribute your answer to the nice folks over at Askubuntu. – zwets Oct 07 '13 at 19:59
  • That is part of a big project; I am new to the Linux environment. – PersonX Oct 07 '13 at 20:05

8 Answers

148

grep approach

To create a copy of the file without lines matching "cat" or "rat", one can use grep in reverse (-v) and with the whole-word option (-w).

grep -vwE "(cat|rat)" sourcefile > destinationfile

The whole-word option makes sure it won't match cats or grateful for example. Output redirection of your shell is used (>) to write it to a new file. We need the -E option to enable the extended regular expressions for the (one|other) syntax.
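To see the effect of the whole-word option, here is a small sketch run against made-up sample data. The temporary directory and the sample lines are assumptions for illustration only; the grep command itself is the one from above.

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'a cat sat' 'two cats' 'grateful' 'a rat ran' 'plain line' > "$tmpdir/sourcefile"

# -v inverts the match, -w matches whole words only,
# -E enables the (one|other) alternation syntax.
grep -vwE "(cat|rat)" "$tmpdir/sourcefile" > "$tmpdir/destinationfile"

# Show what survived: lines with "cats" and "grateful" are kept.
result=$(cat "$tmpdir/destinationfile")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

The lines "a cat sat" and "a rat ran" are dropped, while "two cats" and "grateful" stay because cat and rat only appear there as parts of longer words.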

sed approach

Alternatively, to remove the lines in-place one can use sed -i:

sed -i "/\b\(cat\|rat\)\b/d" filename

The \b sets word boundaries, and the d command deletes each line matching the expression between the forward slashes. cat and rat are matched with the (one|other) alternation syntax; because sed uses basic regular expressions by default, the parentheses and the pipe must be escaped with backslashes.

Tip: run sed without the -i option first to check the output of the command before overwriting the file.
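A minimal sketch of that preview-then-apply workflow, assuming GNU sed (the \b and \| syntax used here are GNU extensions). The sample file and its contents are made up, and the .bak suffix is just one possible choice for keeping a backup.

```shell
# Hypothetical sample file in a throwaway directory.
tmpdir=$(mktemp -d)
file="$tmpdir/filename"
printf '%s\n' 'the cat' 'concat' 'a rat here' 'rational' > "$file"

# 1. Preview: without -i, sed prints the result and leaves the file untouched.
preview=$(sed "/\b\(cat\|rat\)\b/d" "$file")
printf '%s\n' "$preview"

# 2. Apply in place; the suffix after -i keeps the original as filename.bak.
sed -i.bak "/\b\(cat\|rat\)\b/d" "$file"
after=$(cat "$file")
backup_lines=$(wc -l < "$file.bak")

rm -rf "$tmpdir"
```

Keeping a backup suffix costs little and makes it possible to undo the edit if the expression matched more than intended.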

(Based on Sed - Delete a line containing a specific string)

gertvdijk
  • I wonder if there's a way to achieve both the removal from the source file AND generate the file with matches. Probably not, but it would be useful (e.g. when you get a file that is growing too large, you are splitting it based on content). – Sridhar Sarnobat Nov 14 '16 at 20:29
  • 1
    @Sridhar-Sarnobat Oh, you can. Use tee and subshells to copy stdout. In one you filter, in the other you do the reverse. The use of tee and subshells is demonstrated in an unrelated use case here: https://blog.g3rt.nl/luks-smartcard-or-token.html#enhance-security-avoid-temporary-key-storage – gertvdijk Nov 15 '16 at 09:38
  • I was thinking of xargs sed. The grep -v approach is much simpler! – John Jiang Nov 16 '19 at 06:00
  • sed: 1: "filename": invalid command code f

    There are different editions of "sed" out in the wild. MacOS is BSD based. From the manpage of that sed I get: -i extension Edit files in-place, saving backups with the specified extension. If a zero-length extension is given, no backup will be saved. It is not recommended to give a zero-length extension when in-place editing files, as you risk corruption or partial content in situations where disk space is exhausted, etc.

    – Jörg Dec 13 '19 at 23:29
20

To test in terminal only, use:

sed '/[cr]at/d' file_name

To really remove those lines from the file, use:

sed -i '/[cr]at/d' file_name
Radu Rădeanu
  • How do you remove search terms that have the / symbol in them, like a url? I tried replacing / with | because you can do that for other uses of sed but it didn't work. – Wimateeka Jan 08 '20 at 15:58
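On the question of search terms that contain /: in a sed address (unlike in the s command) a different delimiter has to be introduced with a backslash, so \|regex| works where |regex| does not. A small sketch with a made-up file and URL follows; escaping every slash as \/ inside /.../ would be the other option.

```shell
# Hypothetical sample file; the URLs are placeholders.
tmpdir=$(mktemp -d)
file="$tmpdir/urls.txt"
printf '%s\n' 'see http://example.com/a' 'no link here' 'http://example.org/b' > "$file"

# The leading backslash tells sed that | is the address delimiter,
# so the slashes in the URL need no escaping.
kept=$(sed '\|http://example.com/|d' "$file")
printf '%s\n' "$kept"

rm -rf "$tmpdir"
```

Note that the dots in the pattern still match any character; escape them as \. if that matters for your data.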
7

Try using the ex command (part of Vi/Vim):

ex +"g/[cr]at/d" -scwq file.txt

This has an advantage over tools such as sed, whose -i (in-place) option is a non-standard extension (it is not part of POSIX) and may not be available on other operating systems. Secondly, sed is a Stream EDitor, not a file editor.

kenorb
2

Delete the matching lines from all files that contain the match:

grep -rl 'text_to_search' . | xargs sed -i '/text_to_search/d'
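The pipeline above splits file names on whitespace, so it can trip over names containing spaces. A hedged variant, assuming GNU grep, GNU xargs and GNU sed, passes the names NUL-separated instead. The directory and file names below are made up for the demonstration.

```shell
# Hypothetical sample tree; one file name contains a space on purpose.
tmpdir=$(mktemp -d)
printf '%s\n' 'keep me' 'text_to_search here' > "$tmpdir/my notes.txt"
printf '%s\n' 'nothing relevant' > "$tmpdir/other.txt"

# -Z makes grep end each file name with a NUL byte; xargs -0 reads them back.
# -r (GNU xargs) skips running sed when grep finds no files at all.
grep -rlZ 'text_to_search' "$tmpdir" | xargs -0 -r sed -i '/text_to_search/d'

notes=$(cat "$tmpdir/my notes.txt")
other=$(cat "$tmpdir/other.txt")
printf '%s\n' "$notes" "$other"

rm -rf "$tmpdir"
```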
djperalta
1

Using awk to exclude lines containing specific words:

$ awk '!/\<(cat|rat)\>/{print $0}' ./input.txt

awk syntax:

  • !/regex/ Only print lines that do not match regex.
  • | Alternation operator, used to specify alternatives.
  • (...) Grouping, for example grouping alternation operators.
  • \< Matches the empty string at the beginning of a word.
  • \> Matches the empty string at the end of a word.
  • {...} Action statement.
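The \< and \> operators are GNU awk extensions, so the command above may not behave the same under other awk implementations. A rough alternative that avoids them compares each whitespace-separated field against the words instead. This is a sketch under that assumption, with made-up sample data; a word followed by punctuation, such as "cat,", would not be caught by it.

```shell
# Hypothetical sample input in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'the big fat cat' 'concat' 'gray rat ran' 'rational' > "$tmpdir/input.txt"

# Skip the line as soon as one field is exactly "cat" or "rat";
# print it only if no field matched.
filtered=$(awk '{
    for (i = 1; i <= NF; i++)
        if ($i == "cat" || $i == "rat") next
    print
}' "$tmpdir/input.txt")
printf '%s\n' "$filtered"

rm -rf "$tmpdir"
```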
0

cat logs.txt | grep 'your regex' > logs_regex.txt

This will create a new file, logs_regex.txt, which is a copy of your file logs.txt containing only the lines that match your regex.

Sox -
0

Suppose you have a file named file_name and want to search it for mouse, but some of the lines containing mouse also contain other words such as cat and rat, and you don't want to see those lines in your output. One way to do it is:

grep -r mouse file_name | grep -vE "(cat|rat)"
muru
0

portable shell way

Works in /bin/sh (which is dash on Ubuntu) as well as in ksh and bash. It is slightly awkward that you have to write several patterns for each word in the case statement, but it is portable. It handles a word that appears alone on the line, at the beginning, at the end, or in the middle of the line, and ignores the word when it is part of another word.

#!/bin/sh
line_handler(){
    # $1 is the line read; print it to stdout unless it contains cat or rat
    case "$1" in
        cat|cat\ *|*\ cat\ *|*\ cat) true;; # do nothing: cat as a whole word
        rat|rat\ *|*\ rat\ *|*\ rat) true;; # do nothing: rat as a whole word
        *) printf "%s\n" "$1";;
    esac
}

readlines(){
    # $1 is the input file; the words to remove are hard-coded in line_handler
    inputfile="$1"

    while IFS= read -r line
    do
        line_handler "$line"
    done < "$inputfile"
    # handle a last line that lacks a trailing newline
    if [ -n "$line" ]; then
        line_handler "$line"
    fi
}

readlines "$@"

And this is how it works:

$ cat input.txt
the big big fat cat
the cat who likes milk 
jumped over gray rat
concat 
this is catchy
rat
rational
irrational
$ ./dellines.sh input.txt
concat 
this is catchy
rational
irrational
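
The word list in the script above is hard-coded in the case patterns. A minimal sketch of a variant that takes the words as arguments could look like the following. The function names has_word and filter_lines and the demo data are my own assumptions, not part of the original script, and only spaces count as word separators here (tabs and punctuation are not handled).

```shell
#!/bin/sh
# Sketch: print the lines of a file that contain none of the given words.

has_word(){
    # $1 is the line, $2 is the word; succeeds if the word stands alone.
    # Padding the line with spaces covers words at the start and end.
    case " $1 " in
        *" $2 "*) return 0;;
    esac
    return 1
}

filter_lines(){
    # $1 is the input file, the remaining arguments are the words to remove
    inputfile="$1"
    shift
    # "|| [ -n ... ]" also processes a last line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        skip=0
        for word in "$@"; do
            if has_word "$line" "$word"; then
                skip=1
                break
            fi
        done
        if [ "$skip" -eq 0 ]; then
            printf '%s\n' "$line"
        fi
    done < "$inputfile"
}

# Demo with hypothetical sample data.
tmpdir=$(mktemp -d)
printf '%s\n' 'the big big fat cat' 'concat' 'rat' 'rational' 'a mouse ran' > "$tmpdir/input.txt"
out=$(filter_lines "$tmpdir/input.txt" cat rat mouse)
printf '%s\n' "$out"
rm -rf "$tmpdir"
```

Saved as a script, the last block would be replaced by a call such as filter_lines "$@", so that the file name and the words come from the command line.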
Sergiy Kolodyazhnyy