
Let's say I have 2000 .ODT files. Their names are random numbers. How do I go about searching for, let's say, "pricing list"? How do I find the document I need without opening all of them one by one and checking to see if it's the right one?

Is there some program to search through the words in the documents without opening them?

Jonas

5 Answers

This works for anything that OpenOffice can read; in this case I only wanted .odt files:

find . -name '*.odt' -exec sh -c 'unoconv --stdout -f text "$1" | grep -i string_to_search_for' sh {} \;
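The command above prints the matching lines but not the names of the files they came from. Here is a minimal sketch that lists only the matching files instead. It assumes unoconv is installed and that file names contain no newlines (true for names made of random numbers). The function name odt_search is made up for illustration:

```shell
# Hypothetical helper: print the path of every .odt file under DIR whose
# text contains PHRASE (case-insensitive). Assumes unoconv is installed.
odt_search() {
  local dir="$1" phrase="$2"
  # Newline-separated file list: fine for ordinary names, not for names containing newlines
  find "$dir" -type f -name '*.odt' |
    while IFS= read -r f; do
      # -q stops at the first match, -F treats the phrase as plain text, not a regex
      if unoconv --stdout -f text "$f" 2>/dev/null | grep -qiF -- "$phrase"; then
        printf '%s\n' "$f"
      fi
    done
}
```

Usage: `odt_search . "pricing list"` prints one path per matching document.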

An alternative is Recoll (install the recoll package). Once it has indexed your files, it finds matching files quite quickly. It also searches inside ODS, ODP and PDF files, and works pretty well.
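Recoll also has a command-line side. Here is a rough sketch of it, with several assumptions:

* The recollindex and recollq tools are installed (on some Ubuntu releases recollq ships in a separate package).
* The default configuration, which indexes your home directory, covers the folder with the .odt files.
* recollq's -b option prints just the URL of each hit.

The wrapper name recoll_find is hypothetical:

```shell
# Build or update the index first (once, then again after files change):
#   recollindex
# Hypothetical wrapper: search the Recoll index for an exact phrase.
recoll_find() {
  # Wrap the words in double quotes so Recoll matches them as one phrase;
  # -b is assumed to request bare output (one URL per matching document).
  recollq -b "\"$*\""
}
```

Usage: `recoll_find pricing list`.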

yuric

You need a full-text indexing solution with a filter that can extract the text of those files.

One option for this is the tracker package in Ubuntu. You'll need to install tracker and tracker-miner-fs for this, and you'll also likely want tracker-gui for the search tool UI.
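Once the miner has indexed your files, you can also search from a terminal. Here is a minimal sketch. It assumes the command-line tool from the Tracker packages is available. Tracker 2.x calls it tracker, while Tracker 3 (newer Ubuntu releases) calls it tracker3. The wrapper name tracker_find is hypothetical, and it simply passes the words through to the search subcommand:

```shell
# Hypothetical wrapper: full-text search through Tracker's index,
# using whichever command-line tool is installed.
tracker_find() {
  if command -v tracker3 >/dev/null 2>&1; then
    tracker3 search "$@"   # Tracker 3
  else
    tracker search "$@"    # Tracker 2.x
  fi
}
```

Usage: `tracker_find pricing list`.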

dobey

You can use LibreOffice's --cat option to dump the text of all the files to standard output without opening them in the editor (this can take a while, depending on file size). That leads to this solution:

libreoffice --cat *.odt | grep -i string_to_search_for
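Because all files are concatenated into one stream, that pipeline does not tell you which file matched. Below is a sketch of a per-file loop, assuming the same libreoffice --cat behaviour. It starts LibreOffice once per file, so expect it to be slower than the single call above. The function name lo_search is hypothetical:

```shell
# Hypothetical helper: print each .odt file in the current directory
# whose text contains the given phrase (case-insensitive).
lo_search() {
  for f in *.odt; do
    [ -e "$f" ] || continue   # no .odt files: skip the unexpanded pattern
    if libreoffice --cat "$f" 2>/dev/null | grep -qiF -- "$1"; then
      printf '%s\n' "$f"
    fi
  done
}
```

Usage: `cd` into the folder and run `lo_search "pricing list"`.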

Install antiword and odt2txt with apt: sudo apt install antiword odt2txt

This script searches all .doc and .odt files in the current directory for a given string, which may include spaces:

dgrep (make it executable and put it in your PATH):

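#!/bin/bash
# USE: dgrep this text
# Run in a directory with .doc / .odt files:
# .doc files are read with antiword, .odt files with odt2txt.

[ $# -gt 0 ] || { echo "usage: dgrep text to find" >&2; exit 1; }
string="$*"   # join all arguments, so the search text may contain spaces

for i in *.doc; do
  [ -e "$i" ] || continue   # no .doc files: skip the unexpanded pattern
  found=$(antiword "$i" | grep -- "$string")
  if [ -n "$found" ]; then
    echo "(($i))"
    echo "$found"
  fi
done

for j in *.odt; do
  [ -e "$j" ] || continue   # no .odt files: skip the unexpanded pattern
  found=$(odt2txt "$j" | grep -- "$string")
  if [ -n "$found" ]; then
    echo "(($j))"
    echo "$found"
  fi
done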

ptzan