9

I'd like to scan my hard drive for all compressed file collections (zip, gzip, bzip, and others) and have their contents searched for certain file types, such as images. Antivirus programs do it, so I believe there should be a way.

6ft Dan

2 Answers

17

The simplest approach would be to list the contents of the archive and look for files of the relevant extension. For example, with a zip file:

$ zip -sf foo.zip | grep -iE '\.png$|\.jpg$'
  file1.jpg
  file1.png
  file2.jpg
  file2.png

The -sf option tells zip to list the files contained in an archive. grep then looks for .png or .jpg at the end of the line ($). The -E option enables extended regular expressions, so | can be used as OR, and -i makes the matching case-insensitive.
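
To see what the pattern accepts and rejects without needing a real archive, here is a small sketch that feeds a made-up listing to the same grep. The function name filter_images and the sample file names are only for illustration.

```shell
#!/bin/sh
# filter_images: hypothetical helper; reads file names on stdin and
# prints those ending in .png or .jpg, ignoring case.
filter_images() {
    grep -iE '\.png$|\.jpg$'
}

# Made-up listing standing in for the output of an archive tool.
# Only names that END in the extension match, so png.txt and
# old.jpg.bak are rejected.
printf '%s\n' 'photo.JPG' 'notes.txt' 'logo.png' 'png.txt' 'old.jpg.bak' |
    filter_images
```

Because the pattern is anchored only at the end of the line, leading whitespace in a tool's listing does not affect the match.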

However, each archive tool has a different command to list the contents. I've written a script that can deal with most of the more popular ones. If you save that script as list_compressed.sh, you could then run:

list_compressed.sh foo.zip | grep -iE '\.png$|\.jpg$|\.jpeg$|\.gif$|\.tif$|\.tiff$'

That would show you the most common image types. Note that this approach assumes that the file type can be determined by the file's extension. It will not find image files that don't have an extension and it will not recognize files with the wrong extension. There is no way to deal with that without actually extracting the files from the archive and running file on each of them.
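
For completeness, a rough sketch of that slower, content-based check might look like the following. It assumes the archive has already been extracted into a directory and that the file utility is installed; images_by_content and foo.zip are placeholder names, not part of any existing tool.

```shell
#!/bin/sh
# images_by_content: hypothetical helper; prints every regular file under
# the directory "$1" whose detected MIME type starts with "image/".
# Assumption: file names do not contain newlines.
images_by_content() {
    find "$1" -type f | while IFS= read -r f; do
        case $(file -b --mime-type "$f") in
            image/*) printf '%s\n' "$f" ;;
        esac
    done
}

# Example use with a zip archive (foo.zip is a placeholder):
#   tmp=$(mktemp -d)
#   unzip -q foo.zip -d "$tmp"
#   images_by_content "$tmp"
#   rm -rf "$tmp"
```

This catches images with a missing or misleading extension, at the cost of unpacking every archive first.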


If you want to find all archives that contain image files on your hard drive, combine the above with find:

find / \( -name '*.gz' -o -name '*.tgz' -o -name '*.zip' \) -print0 |
    while IFS= read -r -d '' arch; do
        list_compressed.sh "$arch" | 
            grep -qiE '\.png$|\.jpg$|\.jpeg$|\.gif$|\.tif$|\.tiff$' &&
                echo "$arch contains image(s)"
    done

The find command searches for all .gz, .tgz or .zip files (you can add as many extensions as you like) and passes each one to my script. The -q suppresses grep's normal output, so grep itself prints nothing. The && echo prints the archive's name only if grep found a match.
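
The listing script itself is not reproduced here. Purely as an illustration of the idea, and not the actual list_compressed.sh, a stand-in could pick the listing command from the file extension. The function name list_archive and the set of extensions handled are assumptions.

```shell
#!/bin/sh
# list_archive: hypothetical stand-in, NOT the list_compressed.sh script
# mentioned above. Prints the member names of the archive "$1",
# choosing the listing command from the file extension.
list_archive() {
    case $1 in
        *.zip)            unzip -Z1 "$1" ;;  # zipinfo mode: names only
        *.tar.gz|*.tgz)   tar -tzf "$1" ;;   # gzip-compressed tar
        *.tar.bz2|*.tbz2) tar -tjf "$1" ;;   # bzip2-compressed tar
        *.tar)            tar -tf "$1" ;;    # plain tar
        *) echo "unsupported archive type: $1" >&2; return 1 ;;
    esac
}
```

A function like this could take the place of list_compressed.sh in the loop above, for example list_archive "$arch" | grep -qiE '\.png$|\.jpg$'.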

terdon
  • According to my original question I'd like to "scan my hard drive for all compressed file collections, like zip, that contain images". You've helped for looking into the archives themselves, but I just want to identify which archives contain images. – 6ft Dan Jun 08 '15 at 14:29
  • @6ftDan sorry, I hadn't seen the original. Please feel free to roll back or re-edit any edit that changes the meaning of your post. See updated answer for how to search the entire file system. – terdon Jun 08 '15 at 14:43
  • Great, but since you're grepping case-insensitively maybe you want to also search case-insensitively? – kos Jun 08 '15 at 14:56
  • @kos hmm, that's easy enough to do: just change -name to -iname. However, there's little point to it; many compression programs (gzip, for example) need the specific extension. GZ won't work. – terdon Jun 08 '15 at 15:13
3

Not as advanced as terdon's, but this will do:

Save the following code as finda.sh (or any other name you like) in the folder where you keep your scripts:

for file in *.*; do
    # 7z exits non-zero when "$file" is not an archive it can read
    if listing=$(7z l -slt "$file" 2>/dev/null); then
        echo "$file:"
        # Print only the "Path = ..." lines of the listing
        printf '%s\n' "$listing" | grep '^Path = '
    fi
done

Then run it in the directory that contains your archives. The output looks like this:

./finda.sh 
one.7z:
Path = one/abradabra.png
Path = one/birb.png
three.rar:
Path = three/blah.png
Path = three/qwa0g.jpg
two.zip:
Path = two/whut.png
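
The listing above includes every member of every archive. As a hedged sketch, the Path lines could be narrowed to common image extensions with a small filter; image_paths is a made-up name and the sample input below is invented to match the output format shown.

```shell
#!/bin/sh
# image_paths: hypothetical helper; reads `7z l -slt` style output on stdin
# and prints only the member paths that end in a common image extension.
image_paths() {
    sed -n 's/^Path = //p' |
        grep -iE '\.(png|jpe?g|gif|tiff?)$'
}

# Invented sample input in the same "Path = ..." format, plus a non-image
# member and an unrelated field that should both be dropped.
printf '%s\n' 'Path = one/birb.png' 'Size = 1234' 'Path = one/readme.txt' |
    image_paths
```

With a real archive this would be used as 7z l -slt one.7z | image_paths.
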
blade19899
  • According to my original question I'd like to "scan my hard drive for all compressed file collections, like zip, that contain images". You've helped for looking into the archives themselves, but I just want to identify which archives contain images. – 6ft Dan Jun 08 '15 at 14:29
  • @6ftDan That, I think, is possible, but may take a while. In the meantime, I added some improvements to my script, with the help of terdon. – blade19899 Jun 08 '15 at 14:35
  • Note that *.* will only match files with an extension. Also, this will list all files in all archives; you're not testing for any file type. – terdon Jun 08 '15 at 14:44