I am using Ubuntu Server 20.04 for OwnCloud and for backups of multiple networked computers. I know I have many duplicate files on the server, and I would like one database of all files so I can quickly decide which to keep or delete.

I was thinking of using find and then loading the results into MySQL along with path, date, and size info. I'm looking for an easy way to load the data; I'm fairly familiar with SQL from there.
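A minimal sketch of the approach described above, assuming GNU find and a hypothetical `filedb.files` table (the demo directory stands in for the real backup root; the MySQL step is shown commented since table and database names are placeholders):

```shell
#!/bin/sh
# Sketch: dump path, size, and modification date for every regular
# file into a tab-separated file that MySQL can bulk-load.
set -e

# Demo tree (stand-in for the real data root, e.g. /srv/backups):
root=$(mktemp -d)
mkdir -p "$root/picture_backup"
echo "same bytes" > "$root/picture_backup/img1.jpg"
echo "same bytes" > "$root/img1 copy.jpg"

# GNU find's -printf: %p = path, %s = size in bytes,
# %TY-%Tm-%Td = modification date (YYYY-MM-DD)
find "$root" -type f -printf '%p\t%s\t%TY-%Tm-%Td\n' > /tmp/files.tsv
cat /tmp/files.tsv

# Then bulk-load (database/table names are assumptions):
#   CREATE TABLE files (path TEXT, size BIGINT, mdate DATE);
#   mysql --local-infile=1 filedb -e "LOAD DATA LOCAL INFILE
#     '/tmp/files.tsv' INTO TABLE files
#     FIELDS TERMINATED BY '\t' (path, size, mdate);"
```

Tab-separated output keeps `LOAD DATA` simple, though paths containing tabs or newlines would need a more robust delimiter.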

  • How would you define a "duplicate file"? MySQL may not be the best way to list dupes as you may have multiple configuration or profile files for different applications that share the same name. Or are you trying to find duplicate files in the OwnCloud directories so that you can have multiple MySQL entries point to the same source file on disk? – matigo Jan 04 '23 at 02:25
  • Have you seen this similar question? If so, please let us know what you think or if you've considered or tried any of these methods. – mchid Jan 04 '23 at 04:12
  • I tried fdupes and there was another utility but one seemed to take forever with one directory. I might not have given it much of a chance. I'd like to find things like 'picture_backup' vs 'picture backup' vs 'backup pictures', any of which might contain the same data. That's why I was thinking if I had an SQL directory of this system I'd have more flexibility. There are about 50TB of files on this server now, many 400GB backups of pictures with the same data but different dates, some music, large VOB files, typical data dump in ZFS pools that got out of hand. – CncJerry Jan 04 '23 at 04:25
  • Concerning find printing paths, sizes and modification dates of files, you can try find -type f -printf "%p %k %Ts\n" ... see the (-printf format) part in man find – Raffa Jan 04 '23 at 04:32
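The size-first strategy the comments hint at (and that tools like fdupes automate) can also be sketched in plain shell: group files by size first, then hash only same-size candidates, so the 400 GB backups are not all read end-to-end. The demo directory here is a stand-in for the real data root:

```shell
#!/bin/sh
# Sketch: find duplicate files by size, then confirm with md5sum.
set -e

root=$(mktemp -d)   # stand-in for the real data root
echo "identical payload" > "$root/picture_backup.dat"
echo "identical payload" > "$root/backup pictures.dat"
echo "different" > "$root/other.dat"

# File sizes that occur more than once:
dup_sizes=$(find "$root" -type f -printf '%s\n' | sort -n | uniq -d)

# Hash only files whose size is duplicated, then report repeated
# hashes (an md5 digest is 32 hex chars, hence -w32):
for s in $dup_sizes; do
    find "$root" -type f -size "${s}c" -exec md5sum {} +
done | sort | uniq -w32 --all-repeated=separate > /tmp/dups.txt
cat /tmp/dups.txt
```

This only shortlists candidates; differently named copies such as 'picture_backup' vs 'backup pictures' are caught because matching is by content, not by name.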

0 Answers