The trap of using Unix find in ordered lists


Merlin Mann and Dan Benjamin have a newish segment on Back To Work entitled Things you already know, unless you don’t already know them. As they say, you’re definitely an intelligent computer operator, so you already know about this topic. In which case, be quiet, and let those without your snark or obvious intellect learn something. I’m shamelessly ripping off this idea.


Today I was reminded that find prints files in the order they appear in the file system’s directory listing. That’s often alphanumeric order, but it’s not guaranteed.

Why is this important? Because this is a valuable word.

Conventional wisdom is you shouldn’t use ls in scripts, in large part due to the potential for filename mangling. This is especially true when copying your favourite K-pop songs with Hangul on an OpenZFS volume, cough.

Using find is generally preferable, but it does have consequences. If I run ls over this particular folder, like a gentleman:

$ ls -1 *DAT

Or use this shell script, while we’re at it:

for _file in *DAT; do
    printf "%s\n" "$_file"
done

And may as well go for broke:

use 5.010;
foreach my $file (<*DAT>) {
    say $file;
}

The result is:

==> AVSEQ01.DAT
==> AVSEQ02.DAT
==> AVSEQ03.DAT

Sorted, done. But if I use find:

$ find . -type f -name "*DAT" -print
==> ./AVSEQ03.DAT
==> ./AVSEQ01.DAT
==> ./AVSEQ02.DAT

The order isn’t alphanumeric. This would explain why my concatenated VCD backups run through ffmpeg have segments in the wrong order! #derp

So just another reminder: if you use find and the order is important, sort its output afterwards.
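For what it's worth, here's a minimal sketch of what that might look like. It recreates the three file names from above as empty stand-ins in a scratch directory (the scratch variable and the mktemp step are my assumptions for the demo, not part of the original backup job), then pipes find through sort:

```shell
#!/bin/sh
# Create empty stand-in files in a throwaway directory, deliberately
# touched out of order.
scratch=$(mktemp -d)
touch "$scratch/AVSEQ03.DAT" "$scratch/AVSEQ01.DAT" "$scratch/AVSEQ02.DAT"

# find emits entries in whatever order the directory hands them back;
# piping through sort imposes a predictable one. LC_ALL=C makes sort
# compare raw bytes, so the result doesn't change with the locale.
sorted=$(cd "$scratch" && find . -type f -name "*DAT" -print | LC_ALL=C sort)

printf '%s\n' "$sorted"

# Clean up the throwaway directory.
rm -rf "$scratch"
```

This assumes file names without embedded newlines, which holds for these VCD tracks. If that's a concern, the GNU and FreeBSD tools accept find -print0 piped to sort -z, though neither flag is in POSIX.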

Author bio and support


Ruben Schade is a technical writer and infrastructure architect in Sydney, Australia who refers to himself in the third person. Hi!
