7

What's the best way to search my file system on Ubuntu and get results almost instantly? I have used Catfish, Tracker, and the usual search tool provided with Ubuntu.

Tracker finds nothing, the Ubuntu search tool is too slow, and Catfish finds nothing most of the time. I have a lot of PDF and DjVu files that I want to access. On Windows, there is a program called Search Everything that returns results almost instantly. I want a similar Linux tool.

Please provide as detailed an answer as possible, as I'm a newbie to Linux. If such a tool doesn't exist for Ubuntu, what's the chance that I can find one in another Linux distribution, e.g. Mandriva or Red Hat?

Glutanimate
  • 21,393
Nabil
  • 71

7 Answers

10

Recoll can do this for you. It features full-text indexing for almost every document type you can imagine and a result overview sorted by page numbers for PDF documents.

You can install it through the software center (search for Recoll) or get the newest version through the Recoll PPA (including a Unity lens/scope). First add the official Recoll repository:

sudo add-apt-repository ppa:recoll-backports/recoll-1.15-on
sudo apt-get update

If you are on Ubuntu 13.04 and below you will have to install recoll-lens:

sudo apt-get install recoll recoll-lens

For Ubuntu 13.10 and up use unity-scope-recoll instead:

sudo apt-get install unity-scope-recoll

If this is the first time you are installing from a PPA, make sure you read these first:

What are PPAs and how do I use them?

Are PPA's safe to add to my system and what are some "red flags" to watch out for?

You will have to execute Recoll at least once to build your search index before being able to use the Recoll lens/scope.

More extensive documentation on how to use Recoll can be found here.

Glutanimate
  • 21,393
  • Thanks, Glutanimate. I have Recoll and it's useless. I don't want what's inside the documents. All I want is to type part of the file name and get the result fast. All the tools I have tried either return no results at all or work only occasionally. Catfish worked fine once, but after I restarted my computer and tried it again, it returned nothing. – Nabil Sep 28 '12 at 18:08
  • But you can do that as well. Just choose the file type from the drop-down menu (the one that says All Items in the screenshot). – Glutanimate Sep 28 '12 at 18:09
  • I have done that, but it still doesn't work. Maybe because I didn't tell the program where to search? Does it search the whole hard disk? – Nabil Sep 28 '12 at 18:14
  • You have to define where it searches. By default it searches through home only. See the documentation for more details. – Glutanimate Sep 28 '12 at 18:14
  • Well, I have those files on another partition, but I can't find an option to choose a partition like the C, E, F drives familiar from Windows. – Nabil Sep 28 '12 at 18:18
  • OK, I made it search the whole hard disk; however, when I click search it immediately says nothing is found. – Nabil Sep 28 '12 at 18:23
  • Time to go back to windows – Nabil Sep 28 '12 at 18:34
  • Follow the documentation and add all of your partitions in the indexing configuration wizard. Check the option "include file names" (this option should be on by default). Then rebuild the index. – Glutanimate Sep 28 '12 at 18:38
  • If the partitions you want to add were created by windows you will first have to mount them. Open your file manager and click on each partition so that you see the eject symbol next to them. You should be able to find them in recoll under /media/. – Glutanimate Sep 28 '12 at 18:45
  • Anyone have this working in Ubuntu 14.04? I installed recoll from the default repositories, but do not see any relevant results in Dash. I then tried adding the repository above, but cannot install recoll-lens. – Brian Z Jul 28 '14 at 22:00
  • 2
    @BrianZ Thanks for your comment. With Ubuntu 13.10 and up you have to install a scope instead of a lens. I updated my answer accordingly. – Glutanimate Jul 28 '14 at 22:47
  • @Glutanimate Thank you! It's working... this definitely makes Dash a lot more useful to me. – Brian Z Jul 28 '14 at 23:42
  • But I don't suppose there is any way to see the actual filenames, instead of the titles from the metadata? – Brian Z Jul 28 '14 at 23:54
  • @BrianZ No, I don't think there's such an option at the current time. You can always drop a feature request on recoll's bug tracker, though (e.g.: ask for a configurable option). The developer is very friendly and responsive. – Glutanimate Jul 29 '14 at 00:29
4

To search by file name only, ignoring file contents, you can use the locate tool. It searches a prebuilt database, so it is very fast.

locate '*.pdf'

will list all PDF files. See the manual page for more info.

$ locate --help
Usage: locate [OPTION]... [PATTERN]...

Search for entries in a mlocate database.

  -b, --basename         match only the base name of path names
  -c, --count            only print number of found entries
  -d, --database DBPATH  use DBPATH instead of default database (which is
                         /var/lib/mlocate/mlocate.db)
  -e, --existing         only print entries for currently existing files
  -L, --follow           follow trailing symbolic links when checking file
                         existence (default)
  -h, --help             print this help
  -i, --ignore-case      ignore case distinctions when matching patterns
  -l, --limit, -n LIMIT  limit output (or counting) to LIMIT entries
  -m, --mmap             ignored, for backward compatibility
  -P, --nofollow, -H     don't follow trailing symbolic links when checking file
                         existence
  -0, --null             separate entries with NUL on output
  -S, --statistics       don't search for entries, print statistics about each
                         used database
  -q, --quiet            report no error messages about reading databases
  -r, --regexp REGEXP    search for basic regexp REGEXP instead of patterns
      --regex            patterns are extended regexps
  -s, --stdio            ignored, for backward compatibility
  -V, --version          print version information
  -w, --wholename        match whole path name (default)
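
If you want to call locate from a script, a minimal sketch along these lines may help. It uses only options that appear in the help output above (-0, -i, -b, -l). The function names build_locate_command and locate_files are made up for this illustration, and the sketch assumes that locate is installed and that its database has already been built.

```python
import os
import shutil
import subprocess


def build_locate_command(pattern, ignore_case=True, basename_only=True, limit=None):
    """Build an argument list for locate using flags from its --help output."""
    cmd = ["locate", "-0"]  # -0: NUL-separated output, safe for unusual file names
    if ignore_case:
        cmd.append("-i")  # ignore case distinctions
    if basename_only:
        cmd.append("-b")  # match only the base name, not the whole path
    if limit is not None:
        cmd += ["-l", str(limit)]  # stop after LIMIT entries
    cmd.append(pattern)
    return cmd


def locate_files(pattern, **options):
    """Run locate and return the matching paths (hypothetical helper)."""
    if shutil.which("locate") is None:
        raise RuntimeError("locate does not appear to be installed")
    result = subprocess.run(
        build_locate_command(pattern, **options),
        stdout=subprocess.PIPE,
        check=False,  # a non-zero exit status usually just means "no matches"
    )
    # Split on NUL bytes and drop the empty trailing entry.
    return [os.fsdecode(path) for path in result.stdout.split(b"\0") if path]
```

For example, locate_files('*.djvu', limit=20) should return up to 20 paths whose file name ends in .djvu, regardless of case. Results are only as fresh as the last database update.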
Volker Siegel
  • 13,065
  • 5
  • 49
  • 65
Anwar
  • 76,649
  • Note that to update the index you must run:

    sudo updatedb

    Another powerful feature is that you can save an index to a file and then search it with locate <filename> -d <DBPATH>. This is a great option if you want to keep a local index of hard drives that are not always connected.

    – Diego Andrés Díaz Espinoza Apr 04 '16 at 17:43
1

I also do a lot of searching through very large libraries of PDFs. For me, this is the #1 frustration of Linux that makes me miss MS Windows. I've tried it all at this point, and the solution I have settled on for now is to use the following programs in combination.

Unfortunately, none of these seem to be in the Ubuntu repositories at the moment, and they may be unstable. So if Recoll (now in the default repository for Ubuntu 14.04, I believe) or something else works for you, it is better to stick with that.

1) Synapse

Installation: Read this post for details, but basically you can install it by running the following commands in a terminal.

sudo apt-add-repository ppa:synapse-core/testing
sudo apt-get update
sudo apt-get install synapse

Positive

  • Very fast, smart search results
  • If what you want doesn't come up right away, you can press down and tab to find more with "locate".

Negative

  • Only searches filenames, not text inside.
  • Seems to miss a lot, especially before you try "locate".

2) Launchy

Installation: Download the package here.

Positive:

  • Almost as fast as Synapse
  • Results are very comprehensive.

Negative:

  • Also only searches filenames.
  • Probably the buggiest of these three.

3) DocFetcher

Installation: Unless you can find it in a repository somewhere, you are stuck with the portable version. Download it here and follow the instructions.

Positive:

  • Searches inside the text of your PDFs
  • Comprehensive but relevant results, in a logical order (I usually find the results in Recoll or Tracker to be completely screwy in comparison)
  • Full document preview pane so you can see more of the file before you open it (not just a few lines)
  • Reasonably fast

Negative:

  • Hard to install and run natively in Ubuntu (e.g. without Java runtime)
  • Much slower than the apps that only search filenames

Hopefully Dash will catch up and make all of this obsolete, but in the meantime these three are mostly what I am using.

Other options that may be worth trying:

  • Gnome-Do might be a worthy alternative to Synapse, but last I checked it can only index 5000 files, and that is not enough for me
  • pdfgrep is sometimes useful but slow and has no GUI that I am aware of
Brian Z
  • 663
  • 6
  • 17
  • 3
    Very comprehensive answer, +1. My only suggestion would be to give Recoll another try. Out of all solutions it's the most configurable by far. And while it might take some time to tweak to your specific needs and use case I can only say that it's very much worth the effort. Luckily the documentation is very good and exhaustive. – Glutanimate Jul 29 '14 at 00:37
  • 1
    Also, from the sounds of it you are more interested in a lightweight solution that retrieves documents based on their file name rather than their contents. There's a new project I saw a few weeks ago that might fit the bill. It's called PyNeedle and uses Recoll's powerful indexer as its backend. I have yet to try it out but it might be a good alternative for your specific use case. – Glutanimate Jul 29 '14 at 00:39
  • @Glutanimate Thanks... Now that I have the Recoll lens working in Dash since yesterday, I've been using it quite a bit. I will definitely look into your suggestions. – Brian Z Jul 30 '14 at 01:52
0

You can also use gnome-search-tool. Install it with:

sudo apt-get install gnome-search-tool

Raja G
  • 102,391
  • 106
  • 255
  • 328
0

The following Python code returns search results very quickly. Just change the second parameter in fnmatch.fnmatch(file, '*.txt') to whatever you are looking for.

import fnmatch
import os

# List entries in the current directory whose names match the pattern.
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
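
Note that os.listdir('.') only looks at the current directory. Below is a rough sketch of a recursive variant that walks a whole directory tree with os.walk and matches names case-insensitively. The function name find_files is hypothetical. Unlike locate, it scans the disk on every call, so expect it to be noticeably slower on large trees.

```python
import fnmatch
import os


def find_files(root, pattern):
    """Yield paths under root whose file name matches a shell-style pattern."""
    pattern = pattern.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare lowercased names so '*.pdf' also matches 'Paper.PDF'.
            if fnmatch.fnmatchcase(name.lower(), pattern):
                yield os.path.join(dirpath, name)
```

To search your home directory for DjVu files, you could loop over find_files(os.path.expanduser('~'), '*.djvu') and print each path.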
noel
  • 314
0

Another option is Synapse.
Integrates Zeitgeist results.
I have a lot of documents on my system, and was surprised at how fast Synapse was able to find the files I need.

sudo apt-get install synapse

cheers

DrewG
  • 1
0

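For a command-line option, The Silver Searcher (ag) is, in my opinion, simply the best. It is far faster than find and awk and simpler to use. By default it searches file contents for a pattern; add -g to match file names instead:

ag -g <pattern> <path>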

Install on Ubuntu 14.04:

sudo apt-get install silversearcher-ag

Take a look at the speed comparisons in the project's README:

https://github.com/ggreer/the_silver_searcher