Friday, October 10, 2008

removing duplicate files under bash using fdupes and accounting for spaces

Basically, I've been lazy.  I've been trying to organize a file structure, been unhappy with it, copied it, tried again, been unhappy, copied . . . and ended up with about six copies of this folder each with various additions made to it at various subfolder levels that I need to account for.

Now, I could simply do a recursive find and look for duplicate file names, but the other part of my being lazy is that a lot of files in various subfolders are named with the very imaginative format of the date I wrote the file, so a matching name doesn't mean matching contents.  What I wanted was to compare md5 checksums to be sure they were truly identical.

While I use Windows at work and could whip up a PowerShell script to handle this on Windows, I run OS X on my personal machine.

I've tinkered with bash scripting before, enough to know that what I want to do should be doable in one line with the help of the great utility fdupes. (fdupes can be installed on OS X easily if you have MacPorts installed: simply run "sudo port install fdupes".)

I whipped up the following and was proud of myself:
for i in $(fdupes -rf ./); do echo deleting: $i; rm $i; done
(of course the "rm" was done after making a backup ;-) )

What I noticed was that this worked fine for all files except those with spaces in the name.  I apparently had some files that were saved with the first sentence as the filename, which the output of this command confirmed:
deleting: some
deleting: file
deleting: with
deleting: spaces.txt
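
What is happening here is the shell's word splitting: the unquoted output of $(...) is cut into separate words at every space, tab, and newline before the loop ever sees it. A minimal sketch of the effect, using a made-up file name in place of real fdupes output:

```shell
# Stand-in for one line of fdupes output (the file name is made up).
name="some file with spaces.txt"

# Unquoted expansion: the shell splits the value at each space.
unquoted=0
for word in $name; do unquoted=$((unquoted + 1)); done

# Quoted expansion: the value is passed through as a single word.
quoted=0
for word in "$name"; do quoted=$((quoted + 1)); done

echo "unquoted: $unquoted words, quoted: $quoted word"
```

The unquoted loop runs four times, once per piece of the name, which matches the four "deleting:" lines above.
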
After some head scratching, and thinking about piping through awk or escaping the spaces, I found this in a Google result:
If you set IFS to $'\n' then it will only split on newlines, not spaces.
Could it be that simple?  YES!  My completed commands:
IFS=$'\n'
for i in $(fdupes -rf ./); do echo deleting: $i; rm $i; done
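
Two things to keep in mind with the IFS approach: the assignment stays in effect for the rest of the shell session unless you set it back, and an unquoted $i is still subject to wildcard expansion if a name contains * or ?. A sketch of an alternative that reads the list one line at a time instead. Here list_dupes is a hypothetical stand-in for "fdupes -rf ./" so the example runs without fdupes installed, and the file names and temp directory are made up for the demonstration:

```shell
# Throwaway directory with made-up files, so nothing real is deleted.
workdir=$(mktemp -d)
touch "$workdir/some file with spaces.txt" "$workdir/plain.txt" "$workdir/keep me.txt"

# Hypothetical stand-in for `fdupes -rf ./`: one path per line,
# with a blank line between duplicate sets.
list_dupes() {
    printf '%s\n' "$workdir/some file with spaces.txt" "" "$workdir/plain.txt"
}

# IFS= applies only to the read command, so the shell's IFS is untouched.
# -r keeps backslashes literal; quoting "$f" prevents splitting and globbing.
list_dupes | while IFS= read -r f; do
    [ -n "$f" ] || continue    # skip the blank separator lines
    echo "deleting: $f"
    rm -- "$f"
done

ls "$workdir"
```

With real data, the list_dupes call would be replaced by the fdupes command itself. File names that contain a newline would still trip up both versions, since the list is line based.
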
