
So, I have a lot of files (167k), and they are now in proper order thanks to Serg's script here: https://askubuntu.com/a/686829/462277

Now I need to find gaps between the numbers in the filenames, where the difference is 15 or more. The filenames look like this:

Aaaa.bb - 000002 tag tag_tag 9tag  
Aaaa.bb - 000125 tag tag_tag 9tag  
Aaaa.bb - 000130 tag tag_tag 9tag  

They all start the same and have different endings.
Everything is on an external HDD.

Ceslovas
  • So you want to know if there is a difference between the numbers in the file names? For example, there's a file with number 000125 and the next closest one is 000135. Something like that? Or do you want a list of the missing numbers? A list, I think, would be easiest to implement – Sergiy Kolodyazhnyy Oct 18 '15 at 19:52
  • We need more information, as Serg mentioned. What should happen if the gap is < 15? – Jacob Vlijm Oct 18 '15 at 20:53
  • A list of missing numbers would be best. If it's less than 15 we can ignore it, but if it's 15 or more we need to compile a list of those files. – Ceslovas Oct 18 '15 at 21:32
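Since the asker wants a list of missing numbers whenever the gap is 15 or more, here is a minimal sketch of that computation. It assumes the numbers have already been pulled out of the filenames into a sorted list of integers; the function name `missing_in_large_gaps` and its `min_gap` parameter are hypothetical, not taken from either answer.

```python
def missing_in_large_gaps(numbers, min_gap=15):
    """Return (low, high, missing) for each pair of consecutive numbers whose
    difference is min_gap or more; missing lists the numbers strictly between."""
    gaps = []
    for low, high in zip(numbers, numbers[1:]):
        if high - low >= min_gap:
            gaps.append((low, high, list(range(low + 1, high))))
    return gaps


if __name__ == "__main__":
    # Numbers taken from the three example filenames in the question.
    for low, high, missing in missing_in_large_gaps([2, 125, 130]):
        print(f"{low:06d} -> {high:06d}: {len(missing)} missing")
        # prints "000002 -> 000125: 122 missing"
```

With 167k files the `missing` lists can get long; printing only the range endpoints is a cheaper option if the full list is not needed.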

2 Answers


A version in Python (Python 3, to be precise).

Save the program below as diff_filename.py, make it executable (chmod +x diff_filename.py), and run it like this:

$ ./diff_filename.py the/directory/containing/the/files

The program assumes that the numbers you want to compare are always at the same position in the filename (the slice [10:16], i.e. characters 10 to 15).
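A quick check of that slice on one of the example names, plus a hypothetical alternative (`extract_number`, not part of the answer) that assumes the number is the first run of six digits after " - " and so does not depend on the prefix length:

```python
import re

NAME = "Aaaa.bb - 000125 tag tag_tag 9tag"

# Fixed-position slice, as in the program: characters 10..15 hold the number.
print(NAME[10:16])  # 000125


def extract_number(name):
    """Hypothetical helper: return the six-digit number after " - ",
    or None if the name does not contain one."""
    match = re.search(r" - (\d{6})", name)
    return int(match.group(1)) if match else None


print(extract_number(NAME))  # 125
```

The slice is simpler and faster; the regex only helps if the prefix length varies between files.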

As it stands, it is fairly verbose: it prints each pair of consecutive filenames that respects the minimum difference, together with that difference. As soon as it reaches a pair that does not respect the minimum difference, it prints that pair and stops.

Here's the source code:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

'''
usage: ./diff_filename.py the/directory/containing/the/files
'''

import os
import sys

MIN_DIFF = 15

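# directory to scan, given as the first command-line argument
the_dir = sys.argv[1]
# all names share the same prefix and zero-padded numbers,
# so sorting by name also sorts by number
sorted_files = sorted(os.listdir(the_dir))

last_number = None
last_file = None
for current_file in sorted_files:
    # characters 10..15 hold the six-digit number, e.g. "000125"
    current_number = int(current_file[10:16])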
    if last_number is None:
        last_number = current_number
        last_file = current_file
        continue
    diff = current_number - last_number
    if diff < MIN_DIFF:
        print('fail! "{}" and "{}" do not respect MIN_DIFF={}'.format(
            last_file, current_file, MIN_DIFF))
        break
    else:
        print('ok! "{}" and "{}" diff={}'.format(last_file, current_file, diff))

    last_number = current_number
    last_file = current_file
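The asker's comment asks for every gap of 15 or more rather than a stop at the first pair that is too close. A sketch of such a variant, assuming the same fixed [10:16] position and the same shared prefix (so sorting by name sorts by number). The function `find_gaps`, the `isdigit` filter and the regular-file filter are additions for illustration, not part of the answer above.

```python
#!/usr/bin/python3
import os
import sys

MIN_DIFF = 15


def find_gaps(filenames, min_diff=MIN_DIFF):
    """Return (previous, current, diff) for every pair of consecutive files
    whose numbers (characters 10:16) differ by min_diff or more."""
    gaps = []
    last_number = last_file = None
    for current_file in sorted(filenames):
        field = current_file[10:16]
        if not field.isdigit():
            continue  # skip names that do not follow the expected pattern
        current_number = int(field)
        if last_number is not None and current_number - last_number >= min_diff:
            gaps.append((last_file, current_file, current_number - last_number))
        last_number, last_file = current_number, current_file
    return gaps


def main(the_dir):
    # only regular files; subdirectories are ignored
    names = [n for n in os.listdir(the_dir)
             if os.path.isfile(os.path.join(the_dir, n))]
    for last_file, current_file, diff in find_gaps(names):
        print('gap! "{}" and "{}" diff={}'.format(last_file, current_file, diff))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Usage is the same as before: ./diff_filename.py the/directory/containing/the/files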
find . -maxdepth 1 -type f -regextype posix-awk -iregex ".*[[:digit:]].*" | sort | awk '{ if ( ($3 - previous) >= 15 ) print previous"**"$3 }{ previous=$3 }'

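The command above uses find to list all regular files in the current directory whose names contain a digit (so run it from inside the directory that holds the files), sorts them, and passes the list to awk. awk splits each line on whitespace, so field 3 ($3) is the six-digit number. For each line, awk compares $3 with the number stored in previous from the line before, prints both as previous**current when the difference passes the threshold in the if condition, and then stores the current number in previous.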
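To make the awk stage's behaviour explicit, here is a rough Python rendering of it, written with the "15 or more" threshold the asker asked for. It mirrors the field splitting and the `previous` bookkeeping, including awk's treatment of an unset variable (empty when printed, 0 in arithmetic). The function name `awk_like_gaps` is hypothetical, and skipping lines without a numeric third field is a deviation from awk, which would treat such a field as 0.

```python
def awk_like_gaps(lines, min_gap=15):
    """Mimic the awk stage: read the 3rd whitespace-separated field of each
    line and report "previous**current" when the jump from the previous
    line's number is min_gap or more."""
    previous = ""  # an unset awk variable prints as "" and counts as 0 in arithmetic
    report = []
    for line in lines:
        fields = line.split()
        # Deviation from awk: skip lines whose 3rd field is missing or not
        # purely digits (awk would treat such a field as 0 instead).
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        current = fields[2]
        if int(current) - int(previous or 0) >= min_gap:
            report.append(previous + "**" + current)
        previous = current  # like awk's { previous=$3 }, keep the text as-is
    return report


sample = [
    "./Aaaa.bb - 000002 tag tag_tag 9tag",
    "./Aaaa.bb - 000125 tag tag_tag 9tag",
    "./Aaaa.bb - 000130 tag tag_tag 9tag",
]
print(awk_like_gaps(sample))  # ['000002**000125']
```

Note that the very first line is compared against 0, so a first file numbered 15 or higher is reported as "**NNNNNN"; the awk one-liner behaves the same way.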

Sergiy Kolodyazhnyy