
I have a file called "my_file" containing a mix of MD5 hashes and file path names. I'd like to identify any hash value that appears more than once in the first column (not just deduplicate with sort -u), and also display the associated file path from the second column.

Example: the output of # cat my_file

[screenshot of my_file contents]

NOTE: Matching hashes or checksums mean there is a high probability that the entries refer to the same file.
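
A hash-then-path listing in this format can be produced with md5sum, for example (illustrative only, not the actual contents of my_file):

# md5sum writes one "<hash>  <path>" line per file
md5sum ./* > my_file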

Any help would be appreciated

graham

1 Answer


I assume you want to print only the records whose hash in the first column is duplicated, leaving out the records that occur just once:

awk 'a[$1]++' my_file

The array a counts how many times each first-column value (the hash) has been seen. The expression a[$1]++ is 0 (false) on the first occurrence and non-zero (true) on every later one, so awk's default action prints the full record, hash and path, for every repeated hash. The fields are whitespace-separated, so no -F option is needed.

Example:

$ cat shasums 
804951ce256f190e77baba24f29b6b1890b3e9df  ./bell.wav
793e3a485bd29d1e5a87493fa566624d4742f215  ./output.sh
b35bd58dd07d7e3375dea1aee4c5e73e470a928b  ./package-lock.json
804951ce256f190e77baba24f29b6b1890b3e9df  ./bell.wav
b35bd58dd07d7e3375dea1aee4c5e73e470a928b  ./package-lock.json
b35bd58dd07d7e3375dea1aee4c5e73e470a928b  ./package-lock.json1
d1847e16de5717f9a35eab98f974c20a867019eb  ./shasums

$ awk 'a[$1]++' shasums 
804951ce256f190e77baba24f29b6b1890b3e9df  ./bell.wav
b35bd58dd07d7e3375dea1aee4c5e73e470a928b  ./package-lock.json
b35bd58dd07d7e3375dea1aee4c5e73e470a928b  ./package-lock.json1
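
If you also want the first occurrence of each duplicated hash in the output, so that every path sharing a hash is listed, a two-pass variant along these lines should work (the same file is read twice):

# pass 1 counts each hash; pass 2 prints every line whose hash occurs more than once
awk 'NR==FNR {seen[$1]++; next} seen[$1] > 1' my_file my_file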
Gryu