
How can I search for *.odt or *.doc files that contain certain text in Ubuntu?

I use grep -rl <text to search for>, but this only works for text files.

Note: a solution that uses grep (such as searchmonkey) will not work, because *.doc and *.odt files are not plain text: .odt files are compressed (ZIP) archives and .doc files use a binary format.

From "How to search for strings inside files in a folder?":

  • Recoll wants to index my home directory, but I want to search *.odt files in specific directories; I couldn't figure out how to do that with this tool.
  • Searchmonkey seems to be a GUI for grep, and as I mentioned, grep doesn't work on *.doc or *.odt files.
  • Regexxer also has the same problem.

From "Searching through ODT documents without opening them?":

  • Like Recoll, I couldn't figure out how to search *.odt files in specific directories with this tool.
Enterprise
  • I am trying to find all files that contain certain text. I could recursively cat all files in a directory. However, I noticed that doing cat on *.odt files does not list all of the words in the file, and it includes a lot of unprintable characters. I guess this is why grep doesn't work either. – Enterprise Jul 23 '17 at 03:25
  • See https://ubuntuforums.org/showthread.php?t=899179&p=10272667#post10272667 and other posts in that thread for searching .odt files. – DK Bose Jul 23 '17 at 03:37
  • You can try searchmonkey and recoll according to these links, https://askubuntu.com/questions/198110/how-to-search-for-strings-inside-files-in-a-folder and https://askubuntu.com/questions/31869/how-to-search-pdf-files-by-their-metadata – sudodus Jul 23 '17 at 05:46
  • Is this question really a duplicate of this one? That question is about searching through ASCII files, while this one is about searching through binary file formats. – Guildenstern Aug 01 '17 at 17:46
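The unprintable characters Enterprise sees with cat are expected: .odt (like the other OpenDocument formats) is a ZIP archive of XML files, and a legacy .doc is an OLE2 compound binary file, so neither is plain text. A minimal sketch for checking which container a file uses, assuming only od and tr from coreutils; container_type is a hypothetical helper name, and it looks at nothing beyond the first four bytes ("magic number"):

```bash
#!/bin/bash
# Hypothetical helper: container_type FILE
# Prints "zip", "ole2" or "unknown" based only on the file's first 4 bytes.
container_type() {
    local magic
    # od: no address column (-An), one hex byte per unit (-tx1), read 4 bytes (-N4)
    magic=$(od -An -tx1 -N4 -- "$1" 2>/dev/null | tr -d ' \n')
    case "$magic" in
        504b0304) echo zip ;;      # "PK\x03\x04": .odt/.ods/.odp and .docx
        d0cf11e0) echo ole2 ;;     # OLE2 compound file: legacy .doc
        *)        echo unknown ;;  # anything else, or an unreadable file
    esac
}
```

For example, container_type report.odt should print zip for a normal OpenDocument file, which is why the answer below runs the files through unzip before grep.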

1 Answer

catdoc appears to work for searching .doc files recursively in Ubuntu 16.04: https://superuser.com/questions/330242/how-to-recursively-find-a-doc-file-that-contains-a-specific-word
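A minimal sketch of that approach, assuming catdoc is installed (sudo apt install catdoc). doc_search is a hypothetical function name, and the search string is passed to grep as a basic regular expression:

```bash
#!/bin/bash
# Hypothetical helper: doc_search STRING [DIR]
# Prints the path of every *.doc file under DIR (default: .) whose text,
# as extracted by catdoc, contains STRING (case-insensitive).
doc_search() {
    local pattern="$1" dir="${2:-.}"
    if [ -z "$pattern" ]; then
        echo "usage: doc_search STRING [DIR]" >&2
        return 1
    fi
    # -print0 with read -d '' keeps file names containing spaces intact
    find "$dir" -type f -iname '*.doc' -print0 |
        while IFS= read -r -d '' f; do
            # catdoc writes the plain text of a binary Word document to stdout
            if catdoc "$f" 2>/dev/null | grep -iq -- "$pattern"; then
                printf '%s\n' "$f"
            fi
        done
}
```

Usage would be something like doc_search "your_string" ~/Documents after sourcing the function into your shell.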

That thread doesn't mention .docx, so you'll need to figure that one out yourself.
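One possible starting point, not taken from the linked thread: a .docx file is itself a ZIP archive whose body text is stored in word/document.xml, so an unzip-based search similar to the OpenDocument script below may work. This is a sketch assuming unzip is installed; docx_search is a hypothetical name, and because Word often splits a phrase across several XML runs, multi-word searches can miss matches:

```bash
#!/bin/bash
# Hypothetical helper: docx_search STRING [DIR]
# Prints every *.docx file under DIR (default: .) whose main document
# XML contains STRING (case-insensitive).
docx_search() {
    local pattern="$1" dir="${2:-.}"
    if [ -z "$pattern" ]; then
        echo "usage: docx_search STRING [DIR]" >&2
        return 1
    fi
    find "$dir" -type f -iname '*.docx' -print0 |
        while IFS= read -r -d '' f; do
            # unzip -p writes the archive member to stdout without extracting files
            if unzip -p "$f" word/document.xml 2>/dev/null | grep -iq -- "$pattern"; then
                printf '%s\n' "$f"
            fi
        done
}
```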

For .ods or .odt files, you could use the following script, courtesy of kaibob at ubuntuforums.org:

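#!/bin/bash

# Stop early if no search string was given
[ "$1" ] || { echo "You forgot the search string!" ; exit 1 ; }

# Check every OpenDocument file (.odt, .ods, .odp, ...) below the current directory
find . -type f -name "*.od*" | while IFS= read -r i ; do
    # unzip -c extracts the archive members to stdout; 2>/dev/null hides errors
    if unzip -ca "$i" 2>/dev/null | grep -iq "$*" ; then
        echo "string found in $i"
    fi
done | nl    # number the list of matching files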

Let's say you call it "libre-search" and have made it executable.

Then running libre-search your_string should list the files containing your_string. It will not show the surrounding text of each match.

unzip -ca "$i" 2>/dev/null extracts the document's XML (which contains the actual text) to stdout and discards error messages, e.g. from corrupt files.
grep -iq makes the search case-insensitive.
nl numbers the output.
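Since the script does not show where the string occurs, here is a hedged sketch of one way to print some context as well, assuming GNU grep and unzip. odf_context is a hypothetical name; it reads only content.xml (the document body, not headers or metadata), replaces XML tags with spaces, and so may miss a word that formatting split in the middle. The search string is treated as a basic regular expression:

```bash
#!/bin/bash
# Hypothetical helper: odf_context STRING [DIR]
# For each .odt/.ods/.odp file under DIR (default: .), prints
# "file: ...context..." with up to 30 characters around every match.
odf_context() {
    local pattern="$1" dir="${2:-.}"
    [ -n "$pattern" ] || { echo "usage: odf_context STRING [DIR]" >&2; return 1; }
    find "$dir" -type f -name '*.od[tsp]' -print0 |
        while IFS= read -r -d '' f; do
            # content.xml holds the body text; turn every XML tag into a space
            unzip -p "$f" content.xml 2>/dev/null |
                sed 's/<[^>]*>/ /g' |
                grep -io -- ".\{0,30\}$pattern.\{0,30\}" |
                while IFS= read -r hit; do
                    # prefix each match with the file it came from
                    printf '%s: %s\n' "$f" "$hit"
                done
        done
}
```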

DK Bose
  • One could also convert ODT/S/P files to PDF with unoconv, then use pdfgrep. – Andrea Lazzarotto Jul 23 '17 at 12:36
  • This looks promising. I will try it and comment back. – Enterprise Jul 24 '17 at 01:30
  • @Andrea Lazzarotto, your idea sounds good too. In my particular case, I have hundreds of files within a directory structure, so I wouldn't want to convert all of them. However, you should post your suggestion as an answer, because it may be useful to someone else searching for this topic. – Enterprise Aug 01 '17 at 02:17
  • @DK Bose, your solution worked the best out of the ones I tried (from the alternate questions above). I did get some errors in the unzip process for some files, but that might be due to corrupt files. I like that your solution is command-line based, works on an arbitrary directory, can be scripted to search through directory trees, and does not require me to index all my files. – Enterprise Aug 01 '17 at 02:20
  • I've modified the code slightly to suppress the unzip errors and to make the search case-insensitive. – DK Bose Feb 04 '18 at 06:41
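Andrea Lazzarotto's unoconv + pdfgrep suggestion could look roughly like the sketch below, assuming both unoconv and pdfgrep are installed. odf_pdfgrep is a hypothetical name; the PDFs are written to a temporary directory that is deleted afterwards, so the original files are left alone, but converting hundreds of documents this way will be much slower than the unzip-based script:

```bash
#!/bin/bash
# Hypothetical helper: odf_pdfgrep STRING FILE...
# Converts each FILE to PDF in a temporary directory with unoconv,
# then prints FILE if pdfgrep finds STRING (case-insensitive) in the PDF.
odf_pdfgrep() {
    if [ "$#" -lt 2 ] || [ -z "$1" ]; then
        echo "usage: odf_pdfgrep STRING FILE..." >&2
        return 1
    fi
    local pattern="$1" tmp f b pdf
    shift
    tmp=$(mktemp -d) || return 1
    for f in "$@"; do
        # unoconv names the output after the input file, with a .pdf extension
        unoconv -f pdf -o "$tmp" "$f" 2>/dev/null || continue
        b=$(basename -- "$f")
        pdf="$tmp/${b%.*}.pdf"
        # pdfgrep, like grep, exits with status 0 when it finds a match
        if pdfgrep -i -- "$pattern" "$pdf" >/dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
    rm -rf -- "$tmp"
}
```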