Searching through ODT documents without opening them?

Question

Let's say I have 2000 .ODT files. Their names are random numbers. How do I go about searching for, let's say, "pricing list"? How do I find the document I need without opening all of them one by one and checking to see if it's the right one?

Is there some program to search through the words in the documents without opening them?

score 8 · Answer 1 · answered Apr 04 '19 at 12:44

This works for anything that openoffice can read; I wanted odt only in this case:

find . -name '*.odt' -exec sh -c 'unoconv --stdout -f text "$1" | grep -i string_to_search_for' sh {} \;

user1562400

If you add -print at the end, the names of the matching files are shown as well. – tillmo Feb 21 '23 at 10:31
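
A minimal sketch of the -print variant suggested in the comment, wrapped in a shell function so the search string and directory can be passed as arguments. The function name odt_find is made up for illustration, and it assumes unoconv is installed. Using grep -q suppresses the matching lines, so only the file names are printed.

```shell
# odt_find PATTERN [DIR]
# Hypothetical helper: prints the path of every .odt file under DIR
# (default: current directory) whose text contains PATTERN, ignoring case.
odt_find() {
    # Inside the child shell: $1 is the pattern, $2 is the file found by find.
    # -print only runs for files where the -exec pipeline succeeded,
    # i.e. where grep found a match.
    find "${2:-.}" -name '*.odt' -exec sh -c \
        'unoconv --stdout -f text "$2" 2>/dev/null | grep -qi -e "$1"' \
        sh "$1" {} \; -print
}
```

For example, odt_find "pricing list" ~/Documents would list the matching documents. Passing the file name as an argument to sh, rather than embedding {} in the command string, keeps file names with quotes or spaces from being interpreted by the shell.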

score 4 · Answer 2 · yuric · 2017-08-02T18:10:04.747

An alternative is Recoll. Once it has indexed your files, it finds the matching files quickly. It also searches inside ODS, ODP and PDF files. Works pretty well.

edited Aug 02 '17 at 18:10

answered Jan 26 '13 at 20:48

yuric

The link in the Software Center image has a typo (“recall” at the end). – Guildenstern Aug 01 '17 at 17:34
@Guildenstern Thanks, I fixed it. You can edit questions/answers for improving them too, if you want to. =) – yuric Aug 02 '17 at 18:13

score 2 · Accepted Answer · answered Aug 13 '12 at 18:22

You would need a full text indexing solution, which has a filter to support indexing the full text of those files.

One option for this is the tracker package in Ubuntu. You'll need to install tracker and tracker-miner-fs for this, and you'll also likely want tracker-gui for the search tool UI.

score 1 · Answer 4 · answered Jan 03 '24 at 16:04

You can use LibreOffice's --cat option to get the text from all the files without opening them (this can take some time, depending on file size). This leads to the following solution:

libreoffice --cat *.odt | grep -i string_to_search_for
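
Piping every file through a single grep shows the matching lines but not which document they came from. A possible variation, sketched below, runs libreoffice --cat once per file and prints the name of each file that matches. The function name odt_cat_grep is made up for illustration, and starting LibreOffice separately for each file is likely to be slower than the one-liner above.

```shell
# odt_cat_grep PATTERN FILE...
# Hypothetical helper: prints the name of each FILE whose text
# contains PATTERN, ignoring case.
odt_cat_grep() {
    pattern=$1
    shift
    for file in "$@"; do
        # grep -q prints nothing; its exit status tells us whether the file matched.
        if libreoffice --cat "$file" 2>/dev/null | grep -qi -e "$pattern"; then
            printf '%s\n' "$file"
        fi
    done
}
```

For example: odt_cat_grep "pricing list" *.odt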

Robin Ryder

score 0 · Answer 5 · answered Feb 08 '24 at 02:56

Install antiword and odt2txt with apt install.

This code will search through all .doc and .odt files in a directory for a given string that may include spaces:

Save it as dgrep, make it executable, and put it in your PATH:

#!/bin/bash
# Usage: dgrep this text
# Searches .doc files (using antiword) and .odt files (using odt2txt)
# in the current directory for the given string.
string="$*"          # join all arguments so the search string may contain spaces
shopt -s nullglob    # skip a loop entirely when no file matches its pattern
for i in *.doc
do
    found=$(antiword "$i" | grep -- "$string")
    if [ -n "$found" ]
    then
        echo "(($i))"
        printf '%s\n' "$found"
    fi
done
for j in *.odt
do
    found=$(odt2txt "$j" | grep -- "$string")
    if [ -n "$found" ]
    then
        echo "(($j))"
        printf '%s\n' "$found"
    fi
done
