
I have a file called "my_file" containing a mix of MD5 hashes and file path names. I'd like to identify any hash value that appears more than once in the first column (not just deduplicate with sort -u), and also display the associated file path from the second column.

Example: the output of # cat my_file

[screenshot of my_file contents]

NOTE: Matching hashes or checksums mean there is a high probability that the entries refer to the same file.
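
A hash-then-path listing in this format can be produced with md5sum, for example (illustrative only, not the actual contents of my_file):

# md5sum writes one "<hash>  <path>" line per file
md5sum ./* > my_file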

Any help would be appreciated

graham

1 Answer


I assume you want to print only the records whose hash in the first column is duplicated, leaving out the records that occur just once:

awk 'a[$1]++' my_file

The array a counts how many times each first-column value (the hash) has been seen. The expression a[$1]++ is 0 (false) on the first occurrence and non-zero (true) on every later one, so awk's default action prints the full record, hash and path, for every repeated hash. The fields are whitespace-separated, so no -F option is needed.

Example:

$ cat shasums 
804951ce256f190e77baba24f29b6b1890b3e9df  ./bell.wav
793e3a485bd29d1e5a87493fa566624d4742f215  ./output.sh
b35bd58dd07d7e3375dea1aee4c5e73e470a928b  ./package-lock.json
804951ce256f190e77baba24f29b6b1890b3e9df  ./bell.wav
b35bd58dd07d7e3375dea1aee4c5e73e470a928b  ./package-lock.json
b35bd58dd07d7e3375dea1aee4c5e73e470a928b  ./package-lock.json1
d1847e16de5717f9a35eab98f974c20a867019eb  ./shasums

$ awk 'a[$1]++' shasums 
804951ce256f190e77baba24f29b6b1890b3e9df  ./bell.wav
b35bd58dd07d7e3375dea1aee4c5e73e470a928b  ./package-lock.json
b35bd58dd07d7e3375dea1aee4c5e73e470a928b  ./package-lock.json1
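
If you also want the first occurrence of each duplicated hash in the output, so that every path sharing a hash is listed, a two-pass variant along these lines should work (the same file is read twice):

# pass 1 counts each hash; pass 2 prints every line whose hash occurs more than once
awk 'NR==FNR {seen[$1]++; next} seen[$1] > 1' my_file my_file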
Gryu