Choosing Suitable Filenames

Technically, in the EXT4 file system, the only characters that are not allowed in a file name are / and \0 (the null byte). A colon is allowed as well, but it is best avoided, since : is used to separate paths in variables like $PATH.

I'd like to show you why some of those things are a problem.

Spaces

Spaces are not really special to programs, but they are special to a shell like Bash. When you type a command, the shell splits your input at spaces. The first word is the program, and the following words are its arguments. So if you type the following into Bash:

convert source.png target.jpg

It will parse it to the following:

  1. convert
  2. source.png
  3. target.jpg

The first entry is the program that will be called. This all works fine, since your filenames do not contain multiple words. Now let's see what happens if you have a filename like source file.png. Say you naively type the following:

convert source file.png target.jpg

That will be parsed to:

  1. convert
  2. source
  3. file.png
  4. target.jpg

The program now receives three file names instead of two, and neither source nor file.png exists. The solution is to use quotes:

convert "source file.png" target.jpg

This will then be parsed to:

  1. convert
  2. source file.png
  3. target.jpg

If you use variables in Bash, it will look like the following:

source="source file.png"
convert $source target.jpg

Looks good, right? If you know a programming language like Perl or PHP, this might look fine to you. The problem is that Bash first expands $source and only then splits the result at spaces. That means it will treat source file.png as two words, source and file.png. You have to put the variable in quotes:

source="source file.png"
convert "$source" target.jpg

This might look weird, but in Bash, you have to put quotes around variables! Since that is so weird, a lot of programmers forget it, which leads to various failures in programs. As a result, people became careful with spaces, and developers started relying on careful users and stopped caring about the quotes.
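To make the difference visible, here is a minimal sketch. The function count_args is a hypothetical helper that is not part of any tool; it only prints how many arguments it received. The sketch assumes the default word splitting settings of the shell.

```shell
# count_args is a hypothetical helper: it prints the number of
# arguments it was called with.
count_args() {
    echo "$#"
}

source="source file.png"

# Unquoted: the expanded value is split at the space into two words.
unquoted=$(count_args $source)

# Quoted: the expanded value is passed on as one single argument.
quoted=$(count_args "$source")

echo "unquoted: $unquoted argument(s)"
echo "quoted:   $quoted argument(s)"
```

With the quotes, the function sees one argument; without them, it sees two.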

Length

Various file systems limit the length of file names. The limit is especially low if your home folder is encrypted with ecryptfs, which limits the length to some 140 characters. If you copy files from a file system that allows longer names to one that does not, the file names might be cut off or changed in some way.

DOS can only use names that are 8+3 characters long: eight for the file name and three for the file extension. If you have used the command line (cmd.exe) in Windows and listed a directory with dir, you might have found strange names like Progra~1. This is a short alias for the directory, so that old programs can still access a directory with a longer name.

When I sync my files with Unison, it creates temporary files whose names are based on the file name and incorporate a unique ID. That ID is pretty long, so some files cannot be transferred because the temporary file name is too long. Unison does not make shorter temporary names, so I have to keep my file names short.

So in order to be portable, file names should not be too long.
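If you want to find overly long names before they cause trouble, a check like the following might help. It is only a sketch: name_too_long is a hypothetical helper, and the default limit of 140 is an assumption taken from the rough ecryptfs figure above. The length is measured in bytes, since that is what file systems usually limit.

```shell
# name_too_long is a hypothetical helper. It succeeds (exit status 0)
# when the given name is longer than the limit, measured in bytes.
# Usage: name_too_long NAME [LIMIT]
name_too_long() {
    name=$1
    max=${2:-140}   # assumed default, adjust it for your file system
    # wc -c counts bytes; $(( )) strips the padding some wc versions add.
    length=$(( $(printf '%s' "$name" | wc -c) ))
    [ "$length" -gt "$max" ]
}

# Report every entry in the current directory whose name is too long.
for f in ./*; do
    [ -e "$f" ] || continue   # skip the unmatched pattern in an empty directory
    base=${f##*/}             # strip the leading ./
    if name_too_long "$base"; then
        echo "too long: $base"
    fi
done
```

Keep in mind that a character outside of ASCII takes more than one byte in UTF-8, so a name can hit the limit with fewer characters than you expect.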

Special Symbols

I have not had problems with Unicode characters on my EXT4 file systems, so I can put whatever I want into my file names. The problems start when I upload the files to the web or scp them to my Android device. For some reason, the device has a problem with copying a file that contains a "?" in its name.

Others have reported that Linux and Mac OS X use different ways of encoding the same character in UTF-8 (for example, a precomposed "é" versus an "e" followed by a combining accent). The two file names look the same, but they are actually different.
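The following sketch illustrates how two names can look alike and still differ. The file name is made up for this example, and whether a given system actually produces one form or the other is not shown here. The sketch only builds both byte sequences by hand and compares them.

```shell
# Precomposed form: "é" as one code point (bytes C3 A9 in UTF-8).
composed=$(printf 'caf\303\251.txt')

# Decomposed form: "e" followed by a combining acute accent (bytes 65 CC 81).
decomposed=$(printf 'cafe\314\201.txt')

echo "composed:   $composed"
echo "decomposed: $decomposed"

# The comparison looks at the bytes, not at what the terminal draws.
if [ "$composed" = "$decomposed" ]; then
    echo "same name"
else
    echo "different names"
fi

# Dump the bytes to see the difference that the terminal hides.
printf '%s' "$composed" | od -An -tx1
printf '%s' "$decomposed" | od -An -tx1
```

On a UTF-8 terminal both names are usually drawn the same way, but the comparison reports different names, and the second one is a byte longer.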

Multiple Periods

A big problem I encountered is with LaTeX. When you try to include a PDF file with more than one period in its name, LaTeX does not know which file format it is. So this does not work:

\includegraphics{test.xoj.pdf}

The way I got this working was to rename the file to test.pdf.

Other than that, multiple periods should be safe.
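If you would rather keep the middle part of the name, one possible alternative is to replace every period except the last one with an underscore. This is a sketch of that idea, not the renaming described above; single_period_name is a hypothetical helper.

```shell
# single_period_name is a hypothetical helper. It prints the given
# name with all periods except the last one replaced by underscores.
single_period_name() {
    name=$1
    case $name in
        *.*) ;;                                 # at least one period, go on
        *)   printf '%s\n' "$name"; return ;;   # no period, nothing to do
    esac
    ext=${name##*.}    # text after the last period
    stem=${name%.*}    # text before the last period
    stem=$(printf '%s' "$stem" | tr '.' '_')
    printf '%s.%s\n' "$stem" "$ext"
}

single_period_name "test.xoj.pdf"   # prints test_xoj.pdf
```

To rename an actual file, you could pass the result to mv, for example mv -- "$old" "$(single_period_name "$old")".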

Dates and Numbers

Sorting is done character by character: names are sorted by the first character first, then by the second, and so on. Say you keep a diary with notes, organized by date. Your filenames might be:

  • 10.2.12
  • 1.3.12
  • 9.3.12
  • 10.3.12
  • 11.3.12
  • 12.3.12

Looks good? When I sort them, I get this:

  • 1.3.12
  • 10.2.12
  • 10.3.12
  • 11.3.12
  • 12.3.12
  • 9.3.12

Whenever you write numbers in filenames, use leading zeros! And use a decent date format (see /articles/date-formats/index) as well.
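As a sketch of what that could look like, the following converts the diary names from above into a form that sorts chronologically. The helper to_sortable is hypothetical. It assumes that the names are day.month.year and that the two-digit year means 20xx, which is a guess based on the example.

```shell
# to_sortable is a hypothetical helper. It reads names of the form
# day.month.year from standard input and prints year-month-day with
# leading zeros. The two-digit year is assumed to mean 20xx.
to_sortable() {
    while IFS=. read -r day month year; do
        # Strip one leading zero so that printf does not try to read
        # a value like 08 as an octal number.
        printf '20%02d-%02d-%02d\n' "${year#0}" "${month#0}" "${day#0}"
    done
}

printf '%s\n' 10.2.12 1.3.12 9.3.12 10.3.12 11.3.12 12.3.12 |
    to_sortable |
    LC_ALL=C sort
```

The output starts with 2012-02-10 and ends with 2012-03-12, so plain character by character sorting now matches the order of the dates.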

Conclusions

For Users

Keep in mind that programmers might forget quotes in Bash scripts or assume certain properties of your file names.

So keep filenames ...

  • short.
  • without spaces (use _).
  • without special symbols.
  • with a single period.

Unless you are sure that every single program that you use can cope with that.

For Programmers

  • Put quotes around variables in Bash scripts.
  • Prepare for files that are named like this: foo bar;\nrm -rf ~
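Both points can be tried out with a small sketch like the following. It creates a file with the nasty name from above in a scratch directory and loops over the directory. The variable names are made up for this example, and the name is only ever used as quoted data, so nothing in it gets executed.

```shell
# Create a scratch directory containing one awkwardly named file.
workdir=$(mktemp -d)
nasty='foo bar;
rm -rf ~'
: > "$workdir/$nasty"

count=0
# A glob yields exactly one word per file, no matter which spaces,
# semicolons, or newlines the name contains.
for f in "$workdir"/*; do
    [ -e "$f" ] || continue
    count=$((count + 1))
    # Quote the variable so the name stays one single argument.
    printf 'found: %s\n' "$f"
done

echo "files seen: $count"
rm -r -- "$workdir"
```

The loop sees exactly one file. Parsing the output of ls line by line instead would report two, because of the newline in the name.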