In this piece, I counter that treating file creation dates as first-class metadata citizens and preserving them upon copying is the more sensible approach, and ultimately the behavior most Mac users expect.
Archive for the ‘unix’ Category
In my article The State of Backup and Cloning Tools under Mac OS X, I investigated the metadata preservation capabilities of several command-line utilities, among them Apple’s asr (Apple Software Restore) command-line tool, which is also used by the Apple Disk Utility GUI. Its behavior in file-by-file copying mode appears to have changed drastically between OS X 10.4.5 and OS X 10.4.6; see the following table, which corresponds to the table in my earlier post. Refer to my earlier post for an explanation of the metadata classes.
|Tool||Vendor (version)||own||SO||perm||BF||FF||lck||MD||CD||FC||RF||EA||ACL||ind|
|ASR (file mode)||Apple (10.4.5)||x||-||x||?||x||x||x||-||x||x||x||x||-|
|ASR (file mode)||Apple (10.4.6)||x||-||x||-||x||-||x||-||x||x||-||-||-|
Basically, the BSD flags (not sure if they were preserved in 10.4.5), the locked flag, HFS+ extended attributes, and ACLs are not preserved.
The conclusion would be that the asr tool, upon which many people rely for backup and machine setup purposes, is badly broken in OS X 10.4.6. This would also invalidate my earlier recommendations made here and here. Can readers confirm this behavior? I’ve filed the bug as Radar #4523878.
Update 2006-04-26: Apple has marked the bug as duplicate, effectively acknowledging it is indeed a bug in
Update 2006-07-11: Apple has not indicated any pertinent fixes in the release notes of OS X 10.4.7, and, indeed, I can attest that the situation is unchanged with respect to 10.4.6.
39 comments April 23rd, 2006
Back in the days of OS 9, backing up files was fairly easy. One would just use the Finder to copy files and directories to another volume, and be done. Unfortunately, that simplicity is gone with OS X. Such a simplistic approach no longer guarantees that all data is preserved faithfully (nor is it a convenient or reliable procedure for regular backups). The trouble on OS X is mostly related to metadata, i.e., data about files and directories (such as modification date, file creator/type, Unix permissions, etc.).
Another problem arises when a complete system partition is to be backed up and must be bootable later on. Making a backup bootable is not trivial.
At first glance, one might have high expectations about the state of backup solutions on Mac OS X, because the underlying BSD Unix core makes all the mature backup and file copying tools developed for Unix systems available on the platform. However, the fly in the ointment is that these tools are generally not aware of Mac OS X metadata and, hence, fail to produce a faithful backup.
This essay will first investigate means to copy files as completely and reliably as possible on Mac OS X, if possible with free and open-source tools. It will conclude with an (incomplete) survey of dedicated backup tools. The tools covered are not only relevant for backup purposes, but also for the case of migrating machines, when the content of one hard drive is to be cloned to another one.
I will not address common features of backup software such as scheduling, backup management, and incremental backups. This piece will be solely about the bare basics of copying files.
The analysis presented here assumes a recent install of OS X 10.4.5 (Tiger) with all updates. The state of backup and cloning tools used to be even worse, so there is no need to shoot ourselves in the foot by using outdated tools.
Copying Files under Mac OS X
Paradoxically, copying a file and being sure that all information has been copied is not easy under Mac OS X. However, achieving this goal (generally termed cloning) is obviously paramount for backup purposes. There is one main culprit for all issues: metadata.
Types of Metadata under Mac OS X
A file consists not only of the file data itself, but also of accompanying information, called metadata. Different operating systems have traditionally supported a wide range of metadata, and many headaches in cross-platform environments stem from differences in metadata support. Mac OS (X) has traditionally supported rich metadata compared with other operating systems. This metadata is supported by the HFS+ file system, the successor of the venerable HFS file system.
Classic Unix metadata — The classic Unix metadata includes the file name (HFS+ supports Unicode strings with a maximum length of 255 characters), the file modification date, file owner/group, and the POSIX permissions and file flags (changeable via chflags, respectively). These types of metadata are accessible to all the usual Unix command-line tools. However, some information, such as POSIX permissions, is not accessible to classic Carbon APIs.
Update 2006-04-23 Owner/group of symlinks — Symbolic links are somewhat special. Their permissions are irrelevant on UNIX systems, so they don’t need to be preserved on copy (nor can they be). However, symlinks do have an owner and group that may differ from those of the file they point to. In most cases, this information is not too important, but (i) it tells who created the symlink, and (ii) some software actually makes use of this information, e.g., Apache when the SymLinksIfOwnerMatch option is switched on. Before OS X 10.4, Darwin offered no way to change the owner/group of a symlink, so it was impossible to preserve this information. Fortunately, Apple added the lchown(2) call in OS X 10.4.
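To illustrate the lchown(2) point, here is a minimal Python 3 sketch that recreates a symlink and then copies the owner and group of the link itself, not of its target. It is my own illustration and is not taken from any of the tools discussed. It assumes `os.lchown` is available, which is the case on Mac OS X 10.4+ and Linux. `copy_symlink_preserving_owner` is a hypothetical helper name.

```python
import os


def copy_symlink_preserving_owner(src: str, dst: str) -> None:
    """Recreate the symlink `src` at `dst`, including the link's own owner/group.

    lstat() and lchown() operate on the link itself rather than on the file
    it points to. Assigning a different owner generally requires root.
    """
    st = os.lstat(src)           # metadata of the link, not of its target
    target = os.readlink(src)    # the (possibly relative) path stored in the link
    os.symlink(target, dst)
    # lchown() changes the owner/group of the new link, leaving the target alone.
    os.lchown(dst, st.st_uid, st.st_gid)
```

A copying tool that uses plain chown() on a symlink would instead change the owner of the file the link points to, which is exactly what had to be avoided before lchown(2) existed.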
Finder Flags — These flags (and data fields) are a relic of OS 9 and are mostly used by the Finder. Apple still actively uses many of these features, although the technology is rather outdated. These fields contain a number of binary flags (file invisible, name locked, etc.). The file creator and file type codes are also part of the Finder flags. Each is a 32-bit constant that specifies the creating program and the file type, respectively. Under OS X, file type and creator are usually no longer used, but they are still honored. Finally, there is more, undocumented data used by the Finder for purposes such as the file label and icon position. The Finder Flags are partially accessible via the SetFile command-line tools, which come with the Apple Developer Tools. Up to OS X 10.3 (Panther), there was no means to access the flags from regular BSD APIs.
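As a rough illustration of how the type and creator codes are laid out, the following Python 3 sketch decodes the start of a 32-byte FinderInfo record for a file. This is the same kind of blob Tiger exposes as the com.apple.FinderInfo pseudo-attribute. The layout assumed here is a 4-byte type, a 4-byte creator, and 16 bits of flags, all big-endian. The flag constants follow Apple's classic Finder headers as I recall them. Treat both as assumptions to check against the headers, and note that folders use a different record layout. The helper names are hypothetical.

```python
import struct
from typing import NamedTuple

# Selected Finder flag bits (assumed values from the classic Finder headers).
K_HAS_CUSTOM_ICON = 0x0400
K_NAME_LOCKED = 0x1000
K_IS_INVISIBLE = 0x4000


class FileFinderInfo(NamedTuple):
    file_type: str
    creator: str
    flags: int

    @property
    def invisible(self) -> bool:
        return bool(self.flags & K_IS_INVISIBLE)

    @property
    def name_locked(self) -> bool:
        return bool(self.flags & K_NAME_LOCKED)


def parse_file_finder_info(blob: bytes) -> FileFinderInfo:
    """Decode type, creator and Finder flags from a 32-byte file FinderInfo."""
    if len(blob) != 32:
        raise ValueError("FinderInfo is expected to be exactly 32 bytes")
    # '>4s4sH': type code, creator code, flags; all big-endian.
    ftype, creator, flags = struct.unpack_from(">4s4sH", blob, 0)
    return FileFinderInfo(ftype.decode("mac_roman"), creator.decode("mac_roman"), flags)


def four_char_code(code: str) -> int:
    """Interpret a four-character code such as 'TEXT' as its 32-bit value."""
    raw = code.encode("mac_roman")
    if len(raw) != 4:
        raise ValueError("type/creator codes are exactly four bytes")
    return struct.unpack(">I", raw)[0]
```

For example, `four_char_code("TEXT")` yields `0x54455854`. This shows that the "32-bit constant" is simply four characters read as a big-endian integer.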
Creation date — Unlike classical Unix file systems, files on HFS(+) volumes have a creation date. The creation date can be accessed via Carbon APIs. Finder displays the creation date in the “Get Info” window.
Finder Comments — Finder comments (nowadays called “Spotlight comments”) are arbitrary comments that can be attached to a file using the “Get Info” window of the Finder. However, these comments are not actually stored together with the corresponding file.
Finder comments have, in fact, experienced somewhat of an odyssey throughout the history of Mac OS. These days, they are stored in an invisible file called .DS_Store in the file’s parent directory. Thus, to preserve Finder comments, it is crucial to keep the .DS_Store files when performing a backup.
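Since losing .DS_Store files silently loses Finder comments, a backup can be checked for them after the fact. The following minimal Python 3 sketch walks a source tree and reports every .DS_Store file that has no counterpart at the same relative path in the backup. `missing_ds_store_files` is a hypothetical helper. It only checks presence, not whether the contents match.

```python
import os
from typing import List


def missing_ds_store_files(source_root: str, backup_root: str) -> List[str]:
    """Return relative paths of .DS_Store files under source_root that are
    absent at the corresponding location under backup_root."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        if ".DS_Store" not in filenames:
            continue
        rel = os.path.relpath(os.path.join(dirpath, ".DS_Store"), source_root)
        if not os.path.exists(os.path.join(backup_root, rel)):
            missing.append(rel)
    return sorted(missing)
```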
Resource Forks — OS 9 (and HFS) has always supported two forks per file. Information could be stored in either the data fork or the resource fork. The normal content of a file, as seen on Unix or Windows, is in the data fork. Apple used the resource fork for storing structured information in a proprietary, database-like format. Although the use of resource forks is somewhat deprecated (cf. the infamous Technote #2034), Apple still uses them, for example for storing custom icons and information about the application to be launched when a file is double-clicked. Thus, a proper backup needs to preserve resource forks, even if no classic OS 9 software is used any more. I haven’t tried, but a clone of an OS X system without resource forks would probably no longer work. The problem for backup purposes is that standard Unix tools are generally unaware of resource forks. Apple has always made them semi-available to BSD APIs through the pseudopath /path/to/file/..namedfork/rsrc, and more recently through a hack to the xattr mechanism (see below).
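To make the pseudopath concrete, here is a minimal Python 3 sketch that reaches a file's resource fork through `..namedfork/rsrc` using nothing but ordinary BSD-level calls. The helper names are hypothetical. `resource_fork_size` deliberately returns 0 whenever the pseudopath cannot be resolved, for example on non-HFS+ volumes or non-Mac systems.

```python
import os

RSRC_SUFFIX = "/..namedfork/rsrc"  # HFS+ pseudopath for a file's resource fork


def resource_fork_path(path: str) -> str:
    """Return the pseudopath through which BSD calls reach the resource fork."""
    return path + RSRC_SUFFIX


def resource_fork_size(path: str) -> int:
    """Size of the resource fork in bytes, or 0 if there is none or the
    file system does not expose one."""
    try:
        return os.stat(resource_fork_path(path)).st_size
    except OSError:
        return 0
```

Reading the fork works the same way, e.g. `open(resource_fork_path(p), "rb").read()` on an HFS+ volume. A tool that never looks at this path simply never sees the fork.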
In fact, with HFS+ came the possibility of storing an arbitrary number of forks for a file, not only two. As far as I know, this feature is not yet used.
HFS+ Extended Attributes — With OS X 10.4 (Tiger), Apple introduced even more metadata, called HFS+ Extended Attributes. These extended attributes are name:data pairs that can essentially carry arbitrary information. The attributes are accessible via BSD APIs, but not via higher-level Carbon, Cocoa, or Core Foundation interfaces. I am not sure to what extent the extended attributes are actually being used today (except for ACLs, see below). Does anyone have examples?
Access Control Lists (ACLs) — ACLs are a finer-grained way of setting file permissions. ACLs were introduced in OS X 10.4 (Tiger). Although ACLs are not in widespread use (yet), it is still desirable that a backup tool preserve these permissions. ACLs are stored in HFS+ Extended Attributes, but they are masked out for the xattr APIs (according to Ars Technica). ACLs must be enabled for a volume before they can be used.
Spotlight Metadata — For the purpose of searching, Spotlight under OS X 10.4 (Tiger) centrally stores key-value pairs of metadata about files. This data is accessible via the mdls command-line tool. Spotlight metadata is orthogonal to HFS+ Extended Attributes: it is extracted from the file contents and stored centrally. To quote Ars Technica:
Yes, Spotlight extracts, stores, and indexes information about file system objects. Yes, this information is properly called file metadata. But this information is extracted from the file contents and traditional file system metadata fields (file name, dates, size, etc.) and is stored in external plain-file indexes.
The only way actual, arbitrarily extensible file system metadata is involved at all is if an application chooses to write extended attributes when it saves a file, and then a Spotlight metadata importer plug-in reads these extended attributes and passes their values off to Spotlight for storage in its index files. At the time of Tiger’s launch, no existing applications or metadata importer plug-ins do this.
Spotlight merely extracts, stores, and indexes file metadata. It does not and cannot be used to add arbitrary metadata to files. It can read a file and add metadata to the Spotlight index on behalf of that file, but the metadata is not “physically” attached to the file itself.
Hence, there is no need (nor is there a way) to back up Spotlight metadata belonging to a specific file.
Update 2006-04-04 As barefootguru rightly points out, Apple uses Spotlight metadata to store data that is not stored in the corresponding file itself. In particular, Safari saves the download location of downloaded files in the kMDItemWhereFroms property. One could argue that this is a design flaw, since the information is lost when the file is copied or moved to another partition. Ideally, all Spotlight metadata for a file should be recoverable from the file’s contents. A more appropriate place for such persistent metadata would be a suitable HFS+ Extended Attribute. I could imagine, however, that this design decision was a trade-off between functionality and security/privacy; perhaps Apple didn’t want to create a metadata security nightmare such as the one that exists for Word documents. The lack of high-level APIs for HFS+ Extended Attributes might also have contributed to the “improper” implementation of this feature.
inode number (a.k.a. file ID) — Each file and directory on a file system is identified in the file system catalog by a unique number called the inode number. Strictly speaking, this number is not really metadata, and ideally it should be irrelevant for backup purposes. However, back in the System 7 days, Apple invented an ingenious concept called the alias, which is a kind of smarter Unix symlink. An alias stores not only the path to the file or directory it points to, but also its inode number. Thus, if the target is moved within the file system (a situation where every Unix symlink fails fatally), the target can still be identified uniquely. Ideally, the alias record is updated with the new path information once the target is moved, so that a complete set of redundant information is available again. However, there is no automatic mechanism for such an update; the Finder sometimes performs one when an alias is followed explicitly.
The problem in the context of backups now is that once a backup is performed and restored to another volume, the original inode numbers have become obsolete. Thus, every alias has effectively been degraded to a symlink. Once the target file is moved, the alias is worthless. I’ve also seen some cases where aliases would randomly associate with different files. An ideal OS X backup would, therefore, preserve inode numbers; however, this is not possible short of a device-level clone of an entire volume. In many cases, one will have to put up with aliases breaking slowly after a backup restore.
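The difference between moving and copying can be observed directly. The following minimal Python 3 sketch (a hypothetical demo function) renames a file within one volume, which keeps its inode number, and then copies it, which allocates a new one. This is the reason an alias survives the former but not a file-level restore.

```python
import os
import shutil


def demo_inode_behaviour(workdir: str) -> dict:
    """Show that a rename keeps the inode number while a copy gets a new one."""
    original = os.path.join(workdir, "original.txt")
    with open(original, "w") as f:
        f.write("data")
    ino_before = os.stat(original).st_ino

    moved = os.path.join(workdir, "moved.txt")
    os.rename(original, moved)       # same volume: the catalog entry keeps its ID
    ino_after_move = os.stat(moved).st_ino

    copied = os.path.join(workdir, "copied.txt")
    shutil.copy2(moved, copied)      # a copy is a brand-new file with a new ID
    ino_copy = os.stat(copied).st_ino

    return {
        "move_keeps_inode": ino_before == ino_after_move,
        "copy_keeps_inode": ino_before == ino_copy,
    }
```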
Analysis of Low-Level Copying Tools
Now, on to an analysis of the available tools for copying files on OS X. Ideally, we would wish to have a universal file copying (or even better, syncing) tool that preserves directory structure (taken for granted), file contents (also taken for granted), and all categories of metadata described above (very hard to achieve).
The following table shows which categories of metadata are preserved by commonly used tools.
|Tool||Vendor (version)||own||SO||perm||BF||FF||lck||MD||CD||FC||RF||EA||ACL||ind|
|cp -Rp||Apple (10.4.6)||x||- [e]||x||x||x||x||x||-||(x)||x||x||x||-|
|CpMac -r -p||Apple||x||-||x||-||x||x||x||x||(x)||x||-||-||-|
|ditto||Apple (10.4.6)||x||- [f]||x||-||x||-||x||-||(x)||x||-||-||-|
|rsync -aE||Apple (10.4.6)||x||x||x||-||x||-||- [b]||-||(x)||x||x||-||-|
|rsync_hfs --eahfs -a||RSyncX||x||-||x||(x) [d]||x||- [c]||x||x||(x)||x||-||-||-|
|ASR (dev mode)||Apple (10.4.5)||x||x||x||x||x||x||x||x||x||x||x||x||x|
|ASR (file mode)||Apple (10.4.5)||x||- [g]||x||?||x||x||x||-||x||x||x||x||-|
Update 2006-04-23: added the BSD flags and symlink owner columns.
own — owner information for regular files and directories.
SO — symlink owner information.
perm — POSIX permissions.
BF — BSD Flags, which can be set via chflags (see man page).
FF — Finder Flags.
lck — Locked flag (this is part of Finder Flags).
MD — Modification date.
CD — Creation date.
FC — Finder comments.
RF — Resource fork.
EA — HFS+ extended attributes.
ACL — ACLs.
ind — inode.
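I cannot say how the results in the table were gathered. The following minimal Python 3 sketch shows one way to automate part of such a comparison between an original and its copy. It only covers the columns visible through lstat(): own, perm, MD, and ind, plus BF and CD where the platform's stat result provides `st_flags` and `st_birthtime`, as it does on Mac OS X. Finder flags, Finder comments, resource forks, extended attributes, and ACLs need other means. The function names are hypothetical.

```python
import os
import stat
from typing import Dict, Tuple


def metadata_snapshot(path: str) -> Dict[str, object]:
    """Collect the lstat()-visible metadata columns for `path`."""
    st = os.lstat(path)
    snap = {
        "own": (st.st_uid, st.st_gid),
        "perm": stat.S_IMODE(st.st_mode),
        "MD": int(st.st_mtime),
        "ind": st.st_ino,
    }
    # BSD flags and the creation date only exist on BSD-derived systems.
    if hasattr(st, "st_flags"):
        snap["BF"] = st.st_flags
    if hasattr(st, "st_birthtime"):
        snap["CD"] = int(st.st_birthtime)
    return snap


def compare_metadata(original: str, copy: str) -> Dict[str, Tuple[object, object]]:
    """Return {column: (original_value, copy_value)} for every differing column."""
    a, b = metadata_snapshot(original), metadata_snapshot(copy)
    return {k: (a[k], b[k]) for k in a if k in b and a[k] != b[k]}
```

An empty result (apart from "ind", which always differs for a file-level copy) would correspond to an "x" in every lstat()-visible column of the table.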
- cp -p is the cp command provided by Apple in Tiger.
- CpMac is contained in the Developer Tools. It was known to be even buggier, but some bugs appear to have been fixed recently.
- Finder stands for a drag-and-drop copy operation in Finder.
- rsync -aE is the rsync command provided by Apple in Tiger. Apple’s version of rsync has a history of being badly buggy. I wouldn’t trust it for serious backup purposes. Update 2006-04-23 In OS X 10.4.6 and with ACLs enabled, Apple’s rsync fails almost completely for me: almost no metadata is preserved, and weird error messages such as “file has vanished” appear.
- rsync_hfs --eahfs -a is part of the RSyncX package available at http://archive.macosxlabs.org/rsyncx/rsyncx.html.
- SuperDuper is a commercial backup solution available at http://www.shirt-pocket.com/SuperDuper/SuperDuperDescription.html.
- psync is available at http://www.dan.co.jp/cases/macosx/psync.html. Update 2006-04-23 I’ve now tested psync.
- ASR is Apple Software Restore, available as the asr command-line utility and with Disk Utility as a graphical frontend. [ Update 2006-04-26: asr’s behavior is fundamentally degraded in OS X 10.4.6; use with care. ]
[a] Finder comments are usually preserved provided that the .DS_Store files are copied. Only the Finder copies individual comments.
rsync_hfs refuses to copy locked files altogether. Update 2006-04-23 I couldn’t observe this problem any more under OS X 10.4.6, but I still wouldn’t trust the tool.
rsync_hfs misbehaves seriously when the uappnd flag is set on directories.
rsync_hfs doesn’t copy the opaque flag.
[e] Filed as Radar #4523881.
[f] Filed as Radar #4523882.
[g] Filed as Radar #4523924.
Preservation of Ownership
To be able to unconditionally preserve file ownership, a copying engine must be run as root. All command-line tools, and all utilities that have explicit authorization facilities, support such a mode. The only exception is the Finder, which usually runs under a non-root user account. Hence, one cannot expect the Finder to preserve file ownership.
Preservation of Creation Date
[clarifying paragraph added 2008-02-07] When the creation time is not explicitly set on a file, it defaults to the modification date. Hence, the non-preservation of creation dates only becomes apparent if the creation date and modification date of the original file were different to begin with. Keep this in mind should you want to reproduce my tests.
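For reproducing such tests, a test file needs a modification date that clearly differs from its creation date. The following minimal Python 3 sketch creates a file and then pushes its modification date one day into the future with `os.utime`, leaving the creation date at the moment of creation. `make_test_file` and the one-day offset are my own assumptions. A copy whose creation date then no longer matches the original's would reveal that the creation date was clobbered.

```python
import os
import time


def make_test_file(path: str, offset_seconds: int = 86400) -> float:
    """Create `path` and set its modification date `offset_seconds` past now,
    so that it differs from the creation date. Returns the mtime that was set."""
    with open(path, "w") as f:
        f.write("creation-date test\n")
    mtime = time.time() + offset_seconds
    # os.utime takes (atime, mtime); the creation date is not touched here.
    os.utime(path, (mtime, mtime))
    return mtime
```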
Tiger and Apple’s BSD Tools
With the advent of Tiger, Apple touted that they had enhanced all BSD file manipulation utilities to be metadata-aware; i.e., all utilities such as tar, etc. should handle metadata transparently. Basically, Apple patched the BSD utilities to rely on the facilities declared in copyfile.h, which is part of Darwin’s libc, for copying metadata. copyfile in turn uses the APIs exposed by the xattr facilities, i.e., the HFS+ extended attributes. Apple also extended the kernel-level file system code to support the “pseudoattributes” com.apple.FinderInfo for extended Finder flags and com.apple.ResourceFork for the resource fork. That is, although neither Finder flags nor resource forks are HFS+ extended attributes, the kernel exposes this data via the extended attributes (xattr) API. This is the only way for BSD tools to directly access this data. However, Apple chose only to expose the 32 bytes of ExtendedFolderInfo structures. The result is that finally all metadata is accessible on a BSD API level except for the creation date. Hence, whenever one uses Tiger’s shiny new tools that are based on copyfile, the creation date gets clobbered. There is no way for a BSD-level tool to preserve the creation date unless Apple fixes the Darwin kernel. Update 2006-04-23: The bug is filed as #4506951.
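To show what "copyfile-based" means in practice, here is a hedged Python 3 sketch that calls libc's copyfile(3) through ctypes with the COPYFILE_ALL flag. The flag values and the C prototype in the comments are transcribed from Darwin's copyfile.h as I remember them, so check them against the header. On other platforms, the sketch falls back to `shutil.copy2`, which copies data, permission bits, and times but none of the Mac-specific metadata. `copy_with_metadata` is a hypothetical helper. As described above, a Tiger-era copyfile would still lose the creation date.

```python
import ctypes
import ctypes.util
import os
import shutil
import sys

# Flag values as declared in Darwin's <copyfile.h> (assumed; verify locally).
COPYFILE_ACL = 1 << 0
COPYFILE_STAT = 1 << 1
COPYFILE_XATTR = 1 << 2
COPYFILE_DATA = 1 << 3
COPYFILE_SECURITY = COPYFILE_STAT | COPYFILE_ACL
COPYFILE_METADATA = COPYFILE_SECURITY | COPYFILE_XATTR
COPYFILE_ALL = COPYFILE_METADATA | COPYFILE_DATA


def _libc_copyfile():
    """Return libc's copyfile() on Mac OS X, or None elsewhere."""
    if sys.platform != "darwin":
        return None
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    fn = libc.copyfile
    # int copyfile(const char *from, const char *to,
    #              copyfile_state_t state, copyfile_flags_t flags);
    fn.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_void_p, ctypes.c_uint32]
    fn.restype = ctypes.c_int
    return fn


def copy_with_metadata(src: str, dst: str) -> str:
    """Copy data, stat info, ACLs and xattrs via copyfile(3) where available.
    Returns the name of the mechanism that was used."""
    fn = _libc_copyfile()
    if fn is None:
        shutil.copy2(src, dst)  # non-Mac fallback: no Mac metadata at all
        return "shutil.copy2"
    if fn(os.fsencode(src), os.fsencode(dst), None, COPYFILE_ALL) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), src)
    return "copyfile"
```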
Update 2006-06-27: After considering comment #30 below, it seems that my diagnosis actually comes to the wrong conclusion. With the setattrlist(2) calls, there actually are BSD-level APIs for accessing the creation date.
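Following up on the setattrlist(2) remark, here is a hedged Python 3 ctypes sketch that sets the creation date of a file. The `struct attrlist` layout and the constants `ATTR_BIT_MAP_COUNT` and `ATTR_CMN_CRTIME` are transcribed from `<sys/attr.h>` as I understand them. For setattrlist, the attribute buffer is assumed to hold just a packed `struct timespec` with no length prefix. All of this should be checked against the man page before relying on it. `set_creation_date` is a hypothetical helper. On Mac OS X, Python's `os.stat(src).st_birthtime` would supply the value to restore on a copy.

```python
import ctypes
import ctypes.util
import os
import sys

ATTR_BIT_MAP_COUNT = 5          # from <sys/attr.h> (assumed)
ATTR_CMN_CRTIME = 0x00000200    # creation-time bit in the common group (assumed)


class AttrList(ctypes.Structure):
    # struct attrlist from <sys/attr.h>
    _fields_ = [
        ("bitmapcount", ctypes.c_ushort),
        ("reserved", ctypes.c_uint16),
        ("commonattr", ctypes.c_uint32),
        ("volattr", ctypes.c_uint32),
        ("dirattr", ctypes.c_uint32),
        ("fileattr", ctypes.c_uint32),
        ("forkattr", ctypes.c_uint32),
    ]


class Timespec(ctypes.Structure):
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_nsec", ctypes.c_long)]


def set_creation_date(path: str, seconds: float) -> None:
    """Set the creation date of `path` via setattrlist(2) (Mac OS X only)."""
    if sys.platform != "darwin":
        raise NotImplementedError("setattrlist(2) is specific to Mac OS X")
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    attrs = AttrList(bitmapcount=ATTR_BIT_MAP_COUNT, commonattr=ATTR_CMN_CRTIME)
    whole = int(seconds)
    ts = Timespec(whole, int((seconds - whole) * 1e9))
    # int setattrlist(const char *path, void *attrList, void *attrBuf,
    #                 size_t attrBufSize, unsigned int options);
    rc = libc.setattrlist(os.fsencode(path), ctypes.byref(attrs), ctypes.byref(ts),
                          ctypes.c_size_t(ctypes.sizeof(ts)), ctypes.c_uint(0))
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
```

Calling something like `set_creation_date(dst, os.stat(src).st_birthtime)` after a copyfile-based copy would be one way for a BSD-level tool to restore the creation date.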
It should be obvious from the above that file copying on Mac OS X is already a fairly complex task even under the simplest circumstances. In real-world situations, many more issues may appear. For example, Apple’s rsync is known to have issues with Spotlight, and weird things may happen when copying files that are being written. I haven’t spent any effort on stress-testing the tools, so there may be more bugs lurking that prevent some of the tools from actually functioning in practice. It is advisable to use only tools that have a good track record of being used for backup purposes and that are actively supported.
The Good, The Bad, and the Ugly
Unfortunately, the state of low-level file copying tools on the Mac is sad. There is no all-round solution. Every tool has its drawbacks.
I’d make the following recommendations:
- Whenever a full device-level backup can be afforded, ASR is the tool of choice, since it is guaranteed to preserve absolutely all metadata, including inode numbers.
- Apple’s new cp tool would be my second choice. The only drawback, as with all copyfile-based tools, is that the creation date is not preserved. Another possible downside is that cp has not traditionally been used for large-scale backups on Mac OS X, so there is relatively little experience with the reliability of this solution.
- If a commercial solution is acceptable and no command-line tool is needed, I would recommend the SuperDuper engine. The engine seems to be rather bug-free and preserves all metadata.
The Bad and the Ugly
- Having a reliable rsync would be highly desirable. However, Apple’s rsync tool has fallen severely short of expectations, with numerous bugs still present even in the sixth update of Tiger (10.4.6). I wouldn’t trust Apple’s rsync for backup purposes. There are numerous efforts to patch various versions of rsync for Mac OS X compatibility, but I have lost track of their current status.
- ditto used to be the workhorse of many backup solutions, among them the venerable CarbonCopyCloner. ditto does not preserve the creation date, just like Apple’s cp. It also doesn’t copy the locked flag (a minor issue). The showstopper, though, is that it does not copy HFS+ extended attributes. Thus, ditto will become increasingly insufficient for backup purposes as extended attributes are put to use.
psync have severe issues that preclude their use for backup purposes.
- Although the Finder copying engine has become good with respect to metadata preservation, it is still not a useful backup tool. The only reason is that file ownership is generally not preserved, unless all files belong to the user running the Finder.
Once a full system backup is restored, it needs to be made bootable. The basic steps are described on Mike Bombich’s site. Most current backup/cloning tools are capable of restoring a bootable backup/clone.
Overview of Dedicated Backup/Cloning Tools
Having reviewed the low-level file copying engines above, I will now give a short overview of available backup/cloning solutions. Many of their properties are a direct function of the underlying engine. This is not intended to be a complete review of backup software functionality; I merely discuss the applications’ basic capability to faithfully clone Macintosh files.
Apple Disk Utility
Disk Utility comes with OS X.
Pros: Free, easy to use
Cons: only allows full volume backups
Recommendation: Disk Utility is easy to use for a quick volume backup. Since it uses the asr engine, its metadata preservation properties are generally good. In device-level copy mode, it can’t get better. [ Update 2006-04-23: this observation, if correct, would invalidate my recommendation for
Engine: SuperDuper proprietary
Pros: Probably the best cloning engine around, slick user interface, relatively bug-free, has scheduling support, incremental update functionality, great support, active development
Cons: Commercial software; it’s cumbersome to selectively clone only parts of a volume; engine not usable as a command-line tool.
Recommendation: The best cloning/backup solution around, but not free.
The veteran among free OS X cloning tools.
Pros: not commercial (donation-ware), easy to deselect subdirectories of volume to be cloned
Cons: all downsides of the ditto engine; written in AppleScript Studio (i.e., doesn’t feel too polished); no active development; closed source
Recommendation: Still acceptable to use, but one should be aware of the limitations of the
Pros: free, easy to deselect subdirectories of volume to be cloned
Cons: all downsides of the rsync_hfs engine; no active development
Recommendation: Don’t use.
I have no experience with Apple Backup. Feedback would be appreciated.
Engine: ?
Pros: ?
Cons: requires .Mac account
Engine: Finder
Pros: quick and easy
Cons: generally does not preserve file ownership
Recommendation: Use the Finder only to back up small amounts of user data. The Finder is not suitable for backing up an entire volume.
Conclusion and Outlook
Many backup tools are still missing from my list above, such as Unison and Retrospect. I have no further information about these tools at this time. Also missing is a discussion of archive formats such as tar, cpio, and disk images. Archives can potentially be useful for encapsulating backups and for storing backups on foreign file systems that do not support OS X metadata.
In my eyes, there is no silver bullet for backup and cloning under Mac OS X. If you have any better solutions than the ones presented above, please let me know in the comments.
Update 2006-04-23 I have now posted an extensive analysis of free and commercial GUI-based tools.
119 comments March 5th, 2006
The current version of cvs2svn fails on Tiger with the error message
ERROR: your installation of Python does not contain a suitable DBM module -- cvs2svn cannot continue. See http://python.org/doc/current/lib/module-anydbm.html to solve.
That’s because there’s no suitable DBM module in Apple’s Python install. However, there is an easy solution that works with the Python 2.3 supplied by Apple:
- Install Berkeley db42 via fink (or however you like).
- Download bsddb3 from http://pybsddb.sourceforge.net.
- Unpack it and run the following inside the unpacked directory:
    python setup.py build
    sudo python setup.py install
This should do the job.
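Before rerunning cvs2svn, it may help to check which DBM-style modules the interpreter in question can actually import. The candidate list below is my own assumption. bsddb3 is the package installed above, and the others are common DBM backends across Python versions. cvs2svn's own check may differ. The sketch avoids newer syntax, so it should also run on an old interpreter such as Apple's Python 2.3.

```python
def available_dbm_modules(candidates=("bsddb3", "bsddb", "dbhash", "gdbm",
                                      "dbm.gnu", "dbm.ndbm")):
    """Return the names from `candidates` that this Python can import."""
    found = []
    for name in candidates:
        try:
            __import__(name)
        except ImportError:
            continue
        found.append(name)
    return found
```

Run it with the same python that cvs2svn uses. After a successful install, the result should include "bsddb3".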
3 comments March 4th, 2006