5

I use the following code at the end of one of my scripts to tally up the number of files I have processed and moved into the destination directory.

# Report on Current Status
echo -n "Cropped Files: "
ls "${Destination}" | wc -l

My problem lies with how I handle duplicate files. Right now, I check for the file's presence first (as my script is destructive to the source files I am processing). If a file of that name has already been processed, I alter the filename as follows.

Duplicate file: foo.pdf

Changed name: foo.x.pdf

If there is already a foo.x.pdf, I rename again to foo.xx.pdf, repeating as necessary. I intend to go in later, evaluate each 'version', and select the best one to keep on hand. But herein lies my problem: I would like to count only the files whose names do not contain .x., .xx., and so on. How do I strip these out of the ls output so that wc -l counts the unique files only?

TL;DR: How do I get the count of files in a given directory that do not contain a given substring in their filename?

wjandrea

3 Answers

9

To count the files in a directory whose names do not end in .x.pdf, try:

find "${Destination}" -mindepth 1 ! -name '*.x.pdf' -printf '1' | wc -c

To count the files in a directory whose names do not end in a period, one or more x characters, a period, and pdf, try:

find "${Destination}" -mindepth 1 ! -regex '.*\.x+\.pdf' -printf '1' | wc -c

The above commands search recursively through subdirectories. If you don't want that, add the option -maxdepth 1. For example:

find "${Destination}" -mindepth 1 -maxdepth 1 ! -regex '.*\.x+\.pdf' -printf '1' | wc -c

Note that because -printf '1' emits exactly one character per file and wc -c counts characters rather than lines, this method is safe even if the directory contains files whose names contain newline characters.
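As a rough check of the newline claim, here is a minimal sketch that runs the non-recursive command against a throwaway directory. The scratch directory, the sample file names, and the unique_count variable are made up for illustration, and GNU find is assumed for -regex and -printf.

```shell
# Hypothetical scratch directory standing in for "${Destination}".
Destination=$(mktemp -d)

# Two originals, two renamed duplicates, and one name containing a newline.
touch "${Destination}/foo.pdf" "${Destination}/foo.x.pdf" \
      "${Destination}/foo.xx.pdf" "${Destination}/bar.pdf"
touch "${Destination}/new
line.pdf"

# Print one character per entry that does NOT end in .x.pdf, .xx.pdf, ...
# and count characters, so an embedded newline cannot inflate the total.
unique_count=$(find "${Destination}" -mindepth 1 -maxdepth 1 \
    ! -regex '.*\.x+\.pdf' -printf '1' | wc -c)

echo "Cropped Files: ${unique_count}"

# Clean up the scratch directory.
rm -r "${Destination}"
```

Here foo.pdf, bar.pdf, and the file with the newline in its name are counted, while foo.x.pdf and foo.xx.pdf are skipped.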

John1024
  • Altered your second example and tested it. Works! Thank you. find "${Destination}" -mindepth 1 ! -regex '.*.x+.pdf' -printf '1\n' | wc -l – Aaron Nichols Feb 09 '18 at 21:17
  • @DavidFoerster Yes, that does seem simpler. Answer updated to eliminate \n. Thanks. – John1024 Feb 21 '18 at 18:23
2

Without subdirectories:

echo $(($(for file in *.sh ; do echo -n 1+; done; echo 0;)))

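because the loop prints 1+ once per matching file, followed by a final 0, and the arithmetic expansion then sums the result: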

for file in *.sh ; do echo -n 1+; done; echo 0;
1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+0
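The same loop idea could be applied to the question by counting every entry, counting the entries that match *.x*.pdf, and subtracting. The following is only a sketch under assumptions: the scratch directory, the sample file names, and the total, dups, and glob_count variables are hypothetical, and the pattern *.x*.pdf is broader than "one or more x" (it would also match a name such as foo.xml.pdf).

```shell
# Hypothetical scratch directory standing in for "${Destination}".
Destination=$(mktemp -d)
touch "${Destination}/foo.pdf" "${Destination}/foo.x.pdf" \
      "${Destination}/foo.xx.pdf" "${Destination}/bar.pdf"

total=0   # every entry in the directory (hidden files are skipped by the glob)
dups=0    # entries that look like renamed duplicates

for file in "${Destination}"/*; do
    # If the directory is empty the glob stays unexpanded; skip that case.
    [ -e "$file" ] || continue
    total=$((total + 1))
    # Compare only the file name, not the directory part of the path.
    case "${file##*/}" in
        *.x*.pdf) dups=$((dups + 1)) ;;
    esac
done

glob_count=$((total - dups))
echo "Cropped Files: ${glob_count}"

# Clean up the scratch directory.
rm -r "${Destination}"
```

Because the shell handles each name as a single word, this variant does not depend on parsing ls output.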
user unknown
  • I see how this counts files in a directory but how does it avoid counting the files that the OP doesn't want to count: "But herein lies my problem. I would like to count the number of files that do not contain .x. .xx. and so on"? – John1024 Feb 10 '18 at 00:31
  • @John1024: Count all files, count all files with .x*.pdf, subtract. – user unknown Feb 10 '18 at 00:51
  • user-unknown, OK. Very good. – John1024 Feb 10 '18 at 01:46
0

You can exclude files that match a pattern from the ls output by using the option -I, --ignore=PATTERN, which may be given more than once:

ls -I "*.x*.pdf" "${Destination}" | wc -l

Or you could use the subtraction method in this way:

echo $(($(ls "${Destination}" | wc -l) - $(ls "${Destination}"/*.x*.pdf | wc -l)))
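For a quick sanity check, the -I variant can be tried against a throwaway directory. This is a sketch only: the scratch directory, the sample file names, and the ls_count variable are invented for the example, and GNU ls is assumed for the -I option. Since it counts lines of ls output, a file name containing a newline would be counted more than once.

```shell
# Hypothetical scratch directory standing in for "${Destination}".
Destination=$(mktemp -d)
touch "${Destination}/foo.pdf" "${Destination}/foo.x.pdf" \
      "${Destination}/foo.xx.pdf" "${Destination}/bar.pdf"

# List the directory while ignoring names that match *.x*.pdf,
# then count the remaining lines (one name per line when piped).
ls_count=$(ls -I '*.x*.pdf' "${Destination}" | wc -l)

echo "Cropped Files: ${ls_count}"

# Clean up the scratch directory.
rm -r "${Destination}"
```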
pa4080