I have a string like "thisissometext". I want to find all text files inside a given directory (recursively) that contain this string, or any variation of it with whitespace and/or newlines in the middle of it. For example, a text file containing "this is sometext", "this\n issometext", or "this\n isso metext" should show up in the search. How can I do this?

4 Answers
With newer versions of GNU grep (those that have the -z option) you can use this one-liner:
find . -type f -exec grep -lz 'this[[:space:]]*is[[:space:]]*some[[:space:]]*text' {} +
This assumes that whitespace can occur only between the words.
If you just want to search all files recursively starting from the current directory, you don't need find; you can just use grep -r (recursive). find is useful when you want to be selective about the files to search, e.g. to exclude certain directories. So, simply:
grep -rlz 'this[[:space:]]*is[[:space:]]*some[[:space:]]*text' .
The main trick here is -z: grep then treats the input as records terminated by ASCII NUL instead of newline, so a newline becomes ordinary data that the pattern can match.
The [[:space:]] character class matches any whitespace character, including space, tab, CR and LF, so it covers whatever whitespace comes between the words.
grep -l prints only the names of the files that contain a match. If you also want to print the matches, use -H instead of -l; note that with -z the matching "line" is the whole NUL-terminated record, which for a typical text file is the entire file.
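To see why reading the whole file as one record matters, here is a minimal Python sketch. It is not part of the grep approach, and the function names are made up for illustration. Matching line by line misses a string that is broken by a newline, whereas matching against the entire content (roughly what -z gives grep for a file without NUL bytes) finds it.

```python
import re

# Same pattern as in the grep command above; \s roughly corresponds
# to [[:space:]] for this purpose.
PATTERN = re.compile(r"this\s*is\s*some\s*text")


def matches_per_line(text):
    """Mimic plain grep: test each line separately."""
    return any(PATTERN.search(line) for line in text.splitlines())


def matches_whole_text(text):
    """Mimic grep -z on a file without NUL bytes: test the content as one record."""
    return PATTERN.search(text) is not None


sample = "foo this\n issometext bar"
print(matches_per_line(sample))    # False
print(matches_whole_text(sample))  # True
```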
On the other hand, if the whitespace can occur anywhere, not just between the words, the pattern loses its good looks:
grep -rlz 't[[:space:]]*h[[:space:]]*i[[:space:]]*s[[:space:]]*i[[:space:]]*s[[:space:]]*s[[:space:]]*o[[:space:]]*m[[:space:]]*e[[:space:]]*t[[:space:]]*e[[:space:]]*x[[:space:]]*t' .
With the -P (PCRE) option you can replace [[:space:]] with \s, which looks much nicer:
grep -rlzP 't\s*h\s*i\s*s\s*i\s*s\s*s\s*o\s*m\s*e\s*t\s*e\s*x\s*t' .
Using @steeldriver's suggestion to let sed generate the pattern for us is the best option (combining a number with the g flag, as in 2g, is a GNU sed feature):
grep -rlzP "$(sed 's/./\\s*&/2g' <<< "thisissometext")" .
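The same pattern can be built in Python, shown here as a rough sketch and not as a replacement for the sed one-liner. The helper name spaced_pattern is hypothetical. Unlike the sed version, it escapes each character with re.escape, so a search string containing regex metacharacters such as . or * is still matched literally.

```python
import re


def spaced_pattern(s):
    """Return a regex matching s with optional whitespace between any two characters."""
    return re.compile(r"\s*".join(re.escape(ch) for ch in s))


pat = spaced_pattern("thisissometext")
print(pat.pattern)  # t\s*h\s*i\s*s\s*i\s*s\s*s\s*o\s*m\s*e\s*t\s*e\s*x\s*t

# Try the variations from the question, plus one text that should not match.
for text in ["this is sometext", "this\n issometext", "this\n isso metext", "this was some text"]:
    print(repr(text), bool(pat.search(text)))
```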

Again, this does not work on "this string, or any variations of it with white spaces and/or newlines in the middle of it", only if they appear between whole words. – Jacob Vlijm May 27 '15 at 21:12
@heemayl you could maybe do something like grep -zP "$(sed 's/./\\s*&/2g' <<< "thisissometext")" to take some of the tedium out of extending your approach to arbitrary amounts of whitespace between any characters – steeldriver May 27 '15 at 21:28
Can you specify which version of grep I need for this to work? (And how do I find my grep version?) Thanks. – a06e May 27 '15 at 22:10
@becko You can find the grep version with grep --version. I am using 2.16; I can't recall which version first included -z. Use man grep | grep -- '--null-data'; you will get output if your grep supports it. – heemayl May 27 '15 at 22:13
Why is this wrapped in a find -exec? Why not just use grep's -r recursive flag? – Oli May 28 '15 at 11:14
@Oli As the situation is right now, grep -r is the way to go. find is used if the OP wants to be more selective about the files. :) – heemayl May 28 '15 at 11:22
The spaces need not be between words. In fact, the text I want to search for is a sequence of characters. There are no "words". The last command using sed seems to work fine. – a06e May 29 '15 at 17:45
@becko Well, then check the last three solutions. I have mentioned that clearly too. :) – heemayl May 29 '15 at 17:46
Yes, I saw that, I +1'd your answer ;). It seems to work fine. If I have no issues I'll accept it. – a06e May 29 '15 at 17:48
You can delete all whitespace and grep the result:
tr -d '[:space:]' < foo | grep thisissometext
Extending this to a recursive search:
find . -type f -exec bash -c 'for i; do tr -d "[:space:]" < "$i" | grep -q thisissometext && printf "%s\n" "$i"; done' _ {} +
The bash command, expanded:
for i
do
    tr -d "[:space:]" < "$i" |
        grep -q thisissometext &&
        printf "%s\n" "$i"
done
This loops over all the file names passed as arguments and applies the above test to each.

The code below walks a directory recursively and, for each file, removes all occurrences of " " and "\n" from its content. If the string exists in the remaining text, there is a match. This means the spaces/newlines can be at any position in the string inside your file(s).
What it does
If it finds matching files, they will be printed in the terminal, including their paths, like:
/home/jacob/Bureaublad/testmap/test2.txt
/home/jacob/Bureaublad/testmap/Naamloze map 2/test1.txt
The try/except is built in to prevent the script from breaking if it runs into an unreadable file.
The script
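#!/usr/bin/env python3
import os
import sys

s = sys.argv[2]  # the string to find
for root, dirs, files in os.walk(sys.argv[1]):  # the directory to search
    for file in files:
        file = os.path.join(root, file)
        try:
            # strip spaces and newlines before testing for the string
            if s in open(file).read().replace(" ", "").replace("\n",""):
                print(file)
        except (OSError, UnicodeDecodeError):
            # skip files that cannot be read as text
            pass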
How to use
- Copy the script into an empty file and save it as find_string.py
- Run it with the directory and the string as arguments:
python3 /path/to/find_string.py <directory> <string_to_find>
- If either the string or the directory contains spaces, use quotes:
python3 /path/to/find_string.py '<directory>' '<string_to_find>'
Note
As it is, the script finds files containing the string with spaces or newlines in it. It can be extended to other characters/strings (e.g. tabs) in the line:
if s in open(file).read().replace(" ", "").replace("\n",""):
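As a sketch of that extension, and assuming it is acceptable to ignore every kind of whitespace, str.split() with no arguments splits on any run of whitespace (spaces, tabs, newlines, carriage returns), so joining the pieces removes them all at once. The function name below is made up for illustration.

```python
def contains_ignoring_whitespace(text, needle):
    """Return True if needle occurs in text once all whitespace is removed from text."""
    return needle in "".join(text.split())


print(contains_ignoring_whitespace("this\t is\r\nsome text", "thisissometext"))  # True
print(contains_ignoring_whitespace("this was some text", "thisissometext"))      # False
```

In the script, the test would then read if contains_ignoring_whitespace(open(file).read(), s): in place of the chained replace() calls.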

You could use grep -i --recursive 'word1\|word2' *, and awk '/word1/,/word2/' can be used to deal with the newline.

This does not work on "this string, or any variations of it with white spaces and/or newlines in the middle of it". – Jacob Vlijm May 27 '15 at 19:50
heemayl's answer seems to work fine. – a06e May 29 '15 at 18:08