Bash and Spaces in File Names
On LinkedIn one can list one's skills as a collection of keywords. Contacts can verify these skills by vouching that one has them. A recently added feature lets one take a 15-question multiple-choice test and show a badge if the test result lies in the 0.7 quantile or above. In principle a nice idea.
The tests for C++, Python, R and Git seemed sensible. The one about Bash was rather good for the most part, except for one question:
In order to write a script that iterates through the files in a directory, which of the following could you use?
for $ls; do …; done
for $(ls); do …; done
for i in $(ls); do …; done
for i in $ls; do …; done
Well, the third one will get the job done under a lot of dangerous assumptions on the programmer's part, so it is likely the "correct" answer. But it is terribly brittle and I would never accept it in any code that I review. Let's take a look at this and make it fail.
Take the following test program:
for i in $(ls); do
    echo "Processing '$i'"
done
Next we create four simple files in the current directory, so that the directory tree looks like this:
.
├── a
├── b
├── c
└── d
And when we run that simple Bash script in this directory, the output is exactly what we would expect:
Processing 'a'
Processing 'b'
Processing 'c'
Processing 'd'
Now I just add a subdirectory with some more files to the mix. We then have the following structure:
.
├── a
├── b
├── c
├── d
└── subdirectory
    ├── 1
    ├── 2
    ├── 3
    └── 4
And the output of the script is this:
Processing 'a'
Processing 'b'
Processing 'c'
Processing 'd'
Processing 'subdirectory'
Well, the question asked to iterate through all the files, so we already have one example where we broke this approach. In order to make sure that they are files, we could add some test to it:
for i in $(ls); do
    if ! [[ -f "$i" ]]; then
        continue
    fi
    echo "Processing '$i'"
done
This way we skip everything that is not a regular file, which also excludes FIFOs, sockets and devices. Note that -f follows symbolic links, so a symlink to a regular file still passes the test. One has to be a bit careful how to test. One could also just skip directories if that is the intention.
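How the individual tests behave can be sketched on a scratch directory. The names `plain`, `dir` and `link` as well as the helper `classify` are made up for this illustration, and the sketch assumes that `mktemp -d` is available:

```shell
#!/usr/bin/env bash
# Scratch directory with one regular file, one directory and one symlink.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
touch "$workdir/plain"
mkdir "$workdir/dir"
ln -s plain "$workdir/link"   # symlink pointing at the regular file

# classify is a hypothetical helper, not part of Bash.
classify() {
    # -L has to be checked first because -f and -d follow symlinks.
    if [[ -L "$1" ]]; then
        echo "symlink"
    elif [[ -f "$1" ]]; then
        echo "regular file"
    elif [[ -d "$1" ]]; then
        echo "directory"
    else
        echo "other"
    fi
}

for entry in "$workdir"/*; do
    echo "${entry##*/}: $(classify "$entry")"
done
```

Whether symlinks should count as files depends on what the loop body does with them, so the order of the tests is a design decision rather than a fixed rule.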
Another crucial point is that Bash performs word splitting on whitespace in an awful lot of places, and the output of ls is meant for the human reader. So I have added a file called two words (with a space) to the directory, which now looks like this:
.
├── a
├── b
├── c
├── d
├── subdirectory
│   ├── 1
│   ├── 2
│   ├── 3
│   └── 4
└── two words
When running with the original code we get the following output:
Processing 'a'
Processing 'b'
Processing 'c'
Processing 'd'
Processing 'subdirectory'
Processing 'two'
Processing 'words'
Oops! There are two loop iterations dispatched for the single file. Therefore all file names which contain a space will break this program.
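The splitting can be reproduced without touching the file system at all. The following sketch uses a hand-set variable `name` and a made-up helper `count_args` to count how many words Bash produces from an unquoted and from a quoted expansion, assuming the default IFS:

```shell
#!/usr/bin/env bash
name="two words"

# count_args is a hypothetical helper that prints its argument count.
count_args() {
    echo "$#"
}

unquoted=$(count_args $name)    # unquoted: subject to word splitting
quoted=$(count_args "$name")    # quoted: passed on as one argument

echo "unquoted: $unquoted, quoted: $quoted"
```

This prints `unquoted: 2, quoted: 1`. The same splitting is applied to the result of `$(ls)`.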
But we can get even more malicious. I've added a file named -n .. (a hyphen, the letter n, a space and two dots). These are our files now:
.
├── a
├── b
├── c
├── d
├── -n ..
├── subdirectory
│   ├── 1
│   ├── 2
│   ├── 3
│   └── 4
└── two words
The output is now the following, which still looks innocent:
Processing 'a'
Processing 'b'
Processing 'c'
Processing 'd'
Processing '-n'
Processing '..'
Processing 'subdirectory'
Processing 'two'
Processing 'words'
But let's use a simpler echo statement like this:
for i in $(ls); do
    echo $i was processed
done
The output is not quite what one would expect:
a was processed
b was processed
c was processed
d was processed
was processed.. was processed
subdirectory was processed
two was processed
words was processed
Well, -n .. is certainly a legal file name on an ext4 file system. But the way the code is written, word splitting turns it into the two words -n and .., so in the first of these iterations the line echo $i was processed expands to echo -n was processed, which tells Bash to print "was processed" without a trailing newline. And that is what we get there.
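A common way to sidestep this particular problem is printf with a fixed format string, because printf treats the arguments after the format as data and not as options. This is a minimal sketch under the assumption that printf is acceptable in place of echo; the variable `i` is set by hand here to mimic the offending loop iteration:

```shell
#!/usr/bin/env bash
i="-n"

# Unquoted echo: the expanded value is taken as the -n option.
with_echo=$(echo $i was processed)

# printf interprets only the format string; "$i" is plain data.
with_printf=$(printf '%s was processed\n' "$i")

printf 'echo:   [%s]\n' "$with_echo"
printf 'printf: [%s]\n' "$with_printf"
```

The first line shows `[was processed]`, the second `[-n was processed]`.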
Now think of what might happen when we just have rm $i as our loop body. Then there will be two iterations for this file, one with rm -n and another with rm .. . Luckily rm does not remove directories by default. But if the loop body was rm -rf $i, the second iteration would run rm -rf .. and aim at the parent directory. GNU rm refuses to remove . and .. and only prints an error, but a script should not have to rely on that last safety net.
So how do we fix that? Never parse the output of ls! Also read the Bash Pitfalls to see more dangerous code patterns. We use the following:
for i in ./*; do
    echo "$i" was processed
done
This solves a few issues at the same time:
- Using a glob means that the file names generated from the pattern are not subject to word splitting, so file names with spaces are not a problem.
- The leading ./ prevents file names starting with a hyphen from being interpreted as command line options by the programs. Most programs support -- to signal the end of the command line options, but some don't. So it is easier to just have the leading ./.
- The quotes around "$i" prevent word splitting of the content of the variable. This is a very strange quirk in Bash and one has to look out for it all the time.
- The sorting of the files that ls returns depends on user preferences and the locale. This means that the order can differ depending on the user's platform. With globs it still depends on the locale, but one can try to control that from the script.
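Controlling the glob order from the script can be sketched as follows. The assumption here is that plain byte-wise ordering, as given by the C locale, is acceptable; the file names `a`, `B` and `c` are made up for the illustration:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
touch "$workdir/a" "$workdir/B" "$workdir/c"

# Assumption: byte-wise "C" collation is what the script wants.
# Bash applies the new locale to all globs expanded after this line.
LC_ALL=C

sorted=()
for i in "$workdir"/*; do
    sorted+=("${i##*/}")    # keep only the base name
done

echo "${sorted[*]}"
```

In the C locale uppercase letters sort before lowercase ones, so this prints `B a c` regardless of the locale the user has configured.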
With these precautions the user can use whatever file names they want. Our script stays robust and does not fail unexpectedly.
LinkedIn lets you send feedback if there is a mistake in a question. But you had better do it quickly, because the clock keeps running and the question counts as failed if you have not answered it within 90 seconds. I hope that they will pick up the suggestion and stop showing bad code as the correct answer.
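Put together, a loop over the regular files of a directory might look like the following sketch. It runs in a scratch directory and recreates some of the awkward file names from above. The `nullglob` option is an addition that the text above does not cover: it is assumed here that an empty directory should lead to zero iterations instead of one iteration with the literal pattern.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
touch a "two words" ./"-n .."
mkdir subdirectory

# Assumption: an unmatched pattern should expand to nothing.
shopt -s nullglob

processed=()
for i in ./*; do
    [[ -f "$i" ]] || continue    # skip directories and other non-files
    processed+=("$i")
    printf 'Processing %s\n' "$i"
done

echo "${#processed[@]} files processed"
```

Each of the three files gets exactly one iteration and the directory is skipped. The order of the iterations still depends on the locale.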