Below is a find, sort and awk one-liner.
The basic idea is to list the files, sort them numerically (which works unless the Aaaaaaa.bbb part or the tags are themselves numbers), and then let awk store the third field of each filename in the prev variable and compare it with the third field of the current line. If they match, print a message.
find . -type f -print | sort --numeric | awk '{if(prev == $3) print $0" is duplicate of "prevEntry}{ prev=$3; prevEntry=$0}'
Below is a small demo:
$ seq 6 10 | xargs printf "%07d\n" | xargs -I {} touch "Aaaaaaa.bbb - {} tag 9tag"
$ seq 00001 00020 | xargs printf "%07d\n" | xargs -I {} touch "Aaaaaaa.bbb - {} tag tag_tag 9tag"
$ find . -type f -print | sort --numeric | awk '{if(prev == $3) print $0" is duplicate of "prevEntry}{ prev=$3; prevEntry=$0}'
./Aaaaaaa.bbb - 0000006 tag tag_tag 9tag is duplicate of ./Aaaaaaa.bbb - 0000006 tag 9tag
./Aaaaaaa.bbb - 0000007 tag tag_tag 9tag is duplicate of ./Aaaaaaa.bbb - 0000007 tag 9tag
./Aaaaaaa.bbb - 0000008 tag tag_tag 9tag is duplicate of ./Aaaaaaa.bbb - 0000008 tag 9tag
./Aaaaaaa.bbb - 0000009 tag tag_tag 9tag is duplicate of ./Aaaaaaa.bbb - 0000009 tag 9tag
./Aaaaaaa.bbb - 0000010 tag tag_tag 9tag is duplicate of ./Aaaaaaa.bbb - 0000010 tag 9tag
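For readability, the same compare-with-previous-line logic can be written out as a small shell function. This is only a sketch under a few assumptions: the function name report_dupes is made up for illustration, the ID is always the third whitespace-separated field, and the input is sorted on that field with sort -k3,3 instead of sort --numeric, so that equal IDs end up on adjacent lines even when the part before the dash differs.

```shell
# report_dupes: hypothetical helper, not part of any standard tool.
# Reads file names on stdin, one per line, and reports every name whose
# third field (the numeric ID) equals the third field of the previous line.
report_dupes() {
    # Sort on field 3 only; LC_ALL=C gives plain byte-order comparison,
    # so ties are broken predictably by the whole line.
    LC_ALL=C sort -k3,3 | awk '
        # Skip the comparison on the first line: prev is still unset there.
        NR > 1 && $3 == prev { print $0 " is duplicate of " prevEntry }
        # Remember the ID and the full line for the next comparison.
        { prev = $3; prevEntry = $0 }
    '
}
```

It would be used the same way as the one-liner, for example: find . -type f -print | report_dupes. Like the original, it breaks on file names containing newlines and compares each line only with the one directly before it.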
Aaaaaaa.bbb - 0000002 tag 9tag would be a duplicate of Aaaaaaa.bbb - 0000002 tag tag_tag 9tag because of 0000002, correct? – kos Nov 05 '15 at 22:35
foo.bar - XXX and the name is foo? Will there always be an extension? Will the space before the - always be the first space in the file name? – terdon Nov 10 '15 at 15:43