June 27th, 2006
Yesterday, an interesting discussion (via Avi Drissman) about file metadata began on darwin-dev. It all started with a question about the information that tar stores in the ._xxx AppleDouble files, but then quickly digressed into a general discussion of file metadata, and of file creation dates in particular.
File Creation Dates
For those who haven’t followed my blog, I have been occupied with some metadata-related discussion recently. In an earlier post, I gave an overview of the different classes of metadata that are attached to files on Mac OS X, and showed that among others, Apple’s command-line tools and APIs have some serious issues with the preservation of metadata when copying files; the common denominator being that the file creation date is never preserved.
Jordan Hubbard’s Stance
Now, returning to the discussion on darwin-dev: at some point, Jordan K. Hubbard chipped in with a very interesting remark, which probably explains the shortcomings of Apple’s tools. Jordan is famous for being one of the founders of the FreeBSD project back in 1993, and Apple was lucky to hire him a couple of years ago as an engineering manager working on BSD technologies. Here’s an excerpt of what he said:
OK, creation date is admittedly special. Let’s call it “filesystem internal” metadata for the purpose of tracking when a file was created. I mistakenly lumped it into my list of stat(2) data due to a faulty memory of the st_ctimespec field in the stat structure, but that doesn’t change the facts. I believe the creation date is not something you’re intended to spoof – it’s when the file was created. If you create a backup file, that backup file will have its own creation date. If you restore from backup in such a way that the original file is deleted and replaced (e.g. a new inode is allocated) then, by all rights, the new file is not the same as the old file and should have its (newer) creation date reflect this. FWIW, this behavior is not unique to MacOSX.
Basically, Jordan claims there is no problem at all — throwing the creation date away when copying files is just OK.
A User-Friendly Definition of Creation Date Semantics
I strongly disagree with that position, and will bring forward some reasons why preservation of creation dates is the right thing to do.
First of all, I haven’t been around in the computing business for as long as Jordan has, nor am I a professional, nor do I have such great experience in designing operating systems; so please bear with me if I am overlooking something significant.
Essentially, the behavior of creation dates is a matter of definition. Definitions cannot be right or wrong; they can only be sensible or useless. Hence, the question is which behavior is the most useful. Let’s look at the following illustrative example. Say I have a picture taken in 2002, which I downloaded from my camera at that point. So it’s got both its creation date and its modification date (identically) set to that point in 2002. At some point in 2005, I find that the picture is turned 90 degrees because I had turned the camera to portrait orientation, so I go ahead, rotate the picture back by 90 degrees in Photoshop, and save the file. Result: Creation date still 2002, modification date 2005. Fine. Now, in 2006, I start reorganizing my files, maybe copying files from one partition to another, duplicating some files for other projects, etc. According to Jordan’s logic, the copies of my picture should have a creation date of 2006, which is when the new inodes were allocated.
From a user perspective, this doesn’t really make sense. The copy of the picture still contains a picture that was taken in 2002, so a naive user should still expect that to be reflected in its creation date. After all, the contents of the file have not changed by the fact that I made a copy of the original file, and even less so the time at which the contents were created. By a simple look at Finder’s File Info window, one can still tell that the image was taken in 2002 if creation dates are preserved.
Why Creation Dates Shouldn’t Be Second-Class Citizens
Now, Jordan’s definition of creation date as a “filesystem internal” beast is consistent (irrespective of why a new file is created, its creation date is always set to now) and easy to implement, but IMO it doesn’t make sense for most use cases. What is the point of storing the time at which a particular inode was created? What’s the added value of that information for the user? In the “real”, analog world, where it’s impossible to clone things, it makes sense to attach a new creation date to copies of things, because the copy is indeed newer, so it may have a longer remaining lifetime, etc.; but the digital realm brings with it the possibility to clone data such that the copy is indistinguishable from the source, and thus it intuitively makes sense to associate the creation date with the contents of a file, and not with such a technicality as an inode.
Everybody’s been fine without a “filesystem internal”, as Jordan calls it, inode creation time, and I doubt there’s any direct technical use for it. But content-centric creation dates have a direct informational benefit for users. What I’m suggesting is to accept creation dates as first-class citizens in the metadata family, and to have the BSD-level tools support the preservation of creation dates throughout.
Fortunately, Mac OS has a long history of asking what makes sense for the user, and putting the user perspective first, not technicalities. Starting with the first Macintosh file system released in 1984 (MFS, which evolved into HFS a year later), Apple has supported creation dates as full-blown file metadata. I don’t have the hardware to check, but at least for the limited number of OSes that I’ve personally used (going back to OS 7.0), I recall that the Finder has always displayed the file creation date in the File Info box, and also preserved the creation date when copying files. Just what I would consider user-friendly behavior.
Now I’m not an expert in Unix history, but as far as I can tell, support for file creation dates is very recent in most Unix variants, and the file systems traditionally used by Unix-type systems don’t support creation dates either. That seems to be mostly because the Unix founders considered creation dates a flawed concept in itself. Also, creation dates have often been confused with an inode’s ctime, which traditionally denotes the time at which the inode itself (for example, its permissions) was last changed. (See this excellent blog post shedding light on the confusion.) Anyway, in times where cat < a > b may no longer be sufficient for copying files (speaking of extended attributes, resource forks, etc.), the outright damnation of creation dates seems to be fading, and creation dates are making their way into systems such as FreeBSD.
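To make the ctime confusion a bit more tangible, here is a minimal Python sketch. It assumes a Unix-like system where os.stat reports st_ctime as the inode change time; st_birthtime is only reported on some platforms (macOS and FreeBSD among them), so the sketch falls back to None elsewhere. The names file_times and demo are made up for this illustration.

```python
import os
import tempfile
import time


def file_times(path):
    """Collect the timestamps that tend to get mixed up."""
    st = os.stat(path)
    return {
        "mtime": st.st_mtime,  # last change to the file's contents
        "ctime": st.st_ctime,  # on Unix: last change to the inode itself
        # Creation date; only some platforms report it, hence the fallback.
        "birthtime": getattr(st, "st_birthtime", None),
    }


def demo():
    """Change only the permissions and see which timestamp moves."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "picture.jpg")
        with open(path, "wb") as f:
            f.write(b"pretend image data")
        before = file_times(path)
        time.sleep(0.1)        # let the clock advance a little
        os.chmod(path, 0o640)  # touches the inode, not the contents
        after = file_times(path)
    return before, after


if __name__ == "__main__":
    before, after = demo()
    for key in ("mtime", "ctime", "birthtime"):
        print(key, before[key], "->", after[key])
```

On a Unix-like system, the permission change should move ctime forward while mtime stays put, which is exactly why ctime cannot stand in for a creation date.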
While the technical perspective of Unix system programmers probably favors a point of view similar to Jordan’s, I guess it’s correct to say that Unix doesn’t have much tradition in handling creation dates, whereas in Mac OS creation dates have lived a peaceful existence for at least 22 years. So even if Mac OS X’s BSD-level tools were to define things in a way that most other Unix hackers wouldn’t immediately agree with, there are not many expectations to be violated on the Unix side; however, the current neglect of creation dates makes Mac OS users suddenly lose a behavior that they’ve learned to appreciate over a long, long time.
The Finder still preserves creation dates, so I fail to see why the command-line equivalent of a Finder file copy, cp -Rp, should have different semantics. Of course, we’re not even talking about cp preserving creation dates by default, but about cp with the explicit -p option given, which essentially tells cp to preserve as much file metadata as possible. The same is true for ditto (which is fundamentally broken in 10.4, anyway) and rsync -aE (which is also broken, in other ways). Even if nobody hears my pleas about the preservation of file creation dates in normal day-to-day file copying scenarios, it should still be possible to roll a file backup solution with at least one of Apple’s built-in tools. Currently, there is a formidable market for third-party backup tools, because Apple does not provide a robust and complete file cloning engine.
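For anyone who wants to check what a "preserving" copy actually keeps, here is a small Python sketch. It uses Python’s shutil.copy2 as a stand-in for a metadata-preserving copy; that is an assumption made for illustration and says nothing about how Apple’s cp, ditto, or rsync are implemented. The file names and the helpers copy_and_compare and demo are hypothetical.

```python
import os
import shutil
import tempfile
from datetime import datetime, timezone


def copy_and_compare(src, dst):
    """Copy src to dst with metadata, then compare timestamps."""
    shutil.copy2(src, dst)  # contents plus mode and access/modification times
    s, d = os.stat(src), os.stat(dst)
    return {
        "mtime_preserved": int(s.st_mtime) == int(d.st_mtime),
        # Creation dates are only reported on some platforms (None elsewhere).
        "src_birthtime": getattr(s, "st_birthtime", None),
        "dst_birthtime": getattr(d, "st_birthtime", None),
    }


def demo():
    """Backdate a file's modification time to 2002, copy it, and compare."""
    taken_2002 = datetime(2002, 6, 1, tzinfo=timezone.utc).timestamp()
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "picture.jpg")
        dst = os.path.join(tmp, "picture copy.jpg")
        with open(src, "wb") as f:
            f.write(b"pretend image data")
        os.utime(src, (taken_2002, taken_2002))  # (atime, mtime)
        return copy_and_compare(src, dst)


if __name__ == "__main__":
    for key, value in demo().items():
        print(key, value)
```

The modification date survives the copy; on platforms that report a birth time, comparing src_birthtime with dst_birthtime shows whether the creation date did as well.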
I appreciate that it may be a bit hard at first to agree with my arguments if one comes from a geekier background than most Mac users do; also, it may be weird to see that Mac users care about such subtleties (Windows users definitely wouldn’t), but the response to my earlier blog posts and on darwin-dev has shown that quite a few do.
So, Apple, please, have mercy on the old-time Mac users and fix the broken behavior of the BSD-level tools and APIs. Please continue making Mac OS X the most user-friendly Unix system there is! For the record, this is Radar 4506951.
P.S. The irony with Jordan’s view is that the BSD tools currently don’t even behave as he proposes: upon file copying, the creation date is set to the modification date of the file. Now that’s the most illogical solution of all IMO. What are creation dates worth in such an environment? Setting the date to zero would convey even more information about the file’s past…