Blue Blazer

Achieve perfection by constant effort and creative will.
Code every day, no matter what...

Portable Meta-Information again...

Something about portable meta-information storage with file forks/resource forks.

Portable meta-information has been discussed twice recently on planetkde:
http://www.kdedevelopers.org/node/3923
http://zwabel.wordpress.com/2009/03/29/portable-meta-information/

Portable meta-information means that a file's meta-information is not stored centrally but travels with the file (it may still be indexed in a central database to speed up queries).


It brings some benefits:

1. You can move files around without losing their meta-information. For now I am only concerned with moving files locally.

2. When you reinstall your system or your central database breaks, the meta-information is not lost as long as the files are still there.

3. Everything is a file. As long as we standardize the file format, apps/libraries need not learn Nepomuk queries or manipulate a database to perform basic meta-info operations (backup, removal, etc.).


My thoughts are realistic ones:
1. Will Nepomuk really be more usable with portable/distributed meta-information storage?
I don't know much about Nepomuk's server implementation. AFAIK it uses a central database to index/store this information, so I have a question: when the user moves a file to another local folder, is the file's meta-information lost, updated later (at the next indexing run), or updated instantly (using inotify)? The first outcome (lost) is totally unacceptable. The second is OK, but it limits Nepomuk's value/power. The third is good, and in that case we can just stick to central storage.
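To make the question concrete, here is a minimal sketch of why a central index can lose track of a file after a local move. It is not Nepomuk's implementation (which I have not read); `by_path`, `by_id`, `file_id` and `demo_move` are hypothetical names, and the "index" is just a Python dict. It compares an index keyed by path with one keyed by (device, inode).

```python
import os
import tempfile


def file_id(path):
    """Identify a file by (device, inode); a rename on the same filesystem keeps both."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def demo_move():
    """Move a file locally and look it up in two hypothetical central indexes."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "a.jpeg")
        dst_dir = os.path.join(root, "moved")
        os.mkdir(dst_dir)
        dst = os.path.join(dst_dir, "a.jpeg")
        with open(src, "wb") as f:
            f.write(b"fake image data")

        # Both indexes hold the same tag for the same file.
        by_path = {src: {"tag": "holiday"}}
        by_id = {file_id(src): {"tag": "holiday"}}

        os.rename(src, dst)  # a local move within one filesystem

        found_by_path = by_path.get(dst)       # still keyed by the old path
        found_by_id = by_id.get(file_id(dst))  # the inode did not change
        return found_by_path, found_by_id
```

In this sketch the path-keyed lookup comes back empty until something re-indexes the file or reacts to the move, while the inode-keyed lookup still finds the tag. The inode key stops helping as soon as the file is copied to another filesystem, which is exactly where metadata stored with the file would matter.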

2. How to implement it?
Some people have suggested side-car files, others xattrs.
There seems to be a better way to implement it: file forks/resource forks (not process fork),
http://en.wikipedia.org/wiki/Fork_(filesystem)
http://en.wikipedia.org/wiki/Resource_fork
Resource forks were invented by Apple to store things like a text file's encoding, an application's icon ...

Quote:
Apple's HFS, and the original Apple Macintosh file system MFS, were designed to allow a file to have a resource fork to store metadata that would be used by the system's graphical user interface (GUI), such as a file's icon to be used by the Finder or the menus and dialog boxes associated with an application.

A fork is tightly bound to (embedded in) its file, unlike a side-car file, and it also bypasses the size limit of xattrs (extended file attributes): a fork can be as large as the largest file you can create, and you can attach several meta-info forks to one file.
This method is proven in practice, but only some modern filesystems support it:
only Apple's HFS, Microsoft's NTFS and Solaris's ZFS have full support.
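For comparison, this is roughly what the xattr route looks like from user space on Linux. It is a minimal sketch: `try_set_xattr` is a hypothetical helper name, while `os.setxattr` and `os.getxattr` are the real (Linux-only) Python calls. As far as I know the Linux VFS caps a single xattr value at 64 KiB and some filesystems allow less, so the helper reports failure instead of assuming the value fits.

```python
import os


def try_set_xattr(path, name, value):
    """Try to store `value` (bytes) in the xattr `name`; return True on success."""
    if not hasattr(os, "setxattr"):  # os.setxattr only exists on Linux
        return False
    try:
        os.setxattr(path, name, value)
        return True
    except OSError:
        # Typical reasons: the filesystem has no user xattr support,
        # or the value is larger than the filesystem allows.
        return False


def read_xattr(path, name):
    """Read an xattr back as bytes (raises OSError if it is missing)."""
    return os.getxattr(path, name)
```

A small value such as `try_set_xattr("a.jpeg", "user.encoding", b"utf-8")` usually succeeds where user xattrs are enabled, but a whole metadata document can easily exceed the limit, which is the gap a fork would close.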


File forks are used extensively by Apple (to store all sorts of metadata) and by Microsoft (to store system-backup-related information and security control info).


It is worth mentioning that Mac OS X's Unix command-line utilities (cp, mv, ....) handle files with resource/file forks correctly, and that Microsoft calls them alternate data streams (ADS).

You can refer to their developer docs to see how they designed the API.
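As a small illustration of how different the two designs are, here is a sketch of the path syntax each system uses to address a fork. `fork_path` is a hypothetical helper of mine; the `file:stream` form is the NTFS alternate data stream syntax and `file/..namedfork/rsrc` is the Mac OS X way to reach the resource fork. The `platform` values are assumed to match Python's `sys.platform` strings.

```python
def fork_path(path, fork, platform):
    """Return the path that addresses `fork` of `path` on the given platform."""
    if platform == "win32":
        # NTFS alternate data stream: "file:streamname"
        return f"{path}:{fork}"
    if platform == "darwin":
        # Mac OS X only exposes the resource fork through this path form.
        if fork != "rsrc":
            raise ValueError("only the 'rsrc' fork is addressable this way")
        return f"{path}/..namedfork/rsrc"
    # Other systems (Linux included) have no path syntax for forks.
    raise NotImplementedError(f"no fork path syntax on {platform}")
```

So `fork_path("a.jpeg", "nepomuk.xml", "win32")` gives `a.jpeg:nepomuk.xml`, while on Mac OS X only `a.jpeg/..namedfork/rsrc` is available. Solaris goes through a separate open call rather than a path form, so it is left out of this sketch.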


Oops, the main Linux filesystems (ext2/3/4, XFS, JFS...) don't support it well.


3. Is it possible/hard to implement it under Linux? (You can skip the following paragraphs if you're not interested in implementing such things in the kernel.)


In my personal opinion it is not really hard, and I want it :). I assume it would be implemented at the VFS level (so that all filesystems beneath that level gain support).
The simplest way is to add a "dentry" to each "general inode", so that from the filesystem's view (not visible to the user) a regular file can also be associated with a "directory", and all metadata files reside under that "directory",
like this:
/home/xx/xxx/a.jpeg ----> nepomuk.xml
                          user.encoding
                          user.img.source.url
                          ....
Then make the getfattr/setfattr-related system calls look up that dentry too. This method also remains backward compatible with xattrs (extended file attributes) but removes the size limit put on them.
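The layout above can be emulated in user space to get a feel for the interface. This is only a sketch of the idea, not kernel code: the hidden `.meta` directory and the `set_meta`/`get_meta`/`list_meta` names are hypothetical, and each metadata entry is simply a regular file named after its key.

```python
import os

META_DIR = ".meta"  # hypothetical hidden directory next to the data files


def _meta_dir(path):
    """Directory that holds all metadata entries of one data file."""
    parent, name = os.path.split(os.path.abspath(path))
    return os.path.join(parent, META_DIR, name)


def set_meta(path, key, value):
    """Store `value` (bytes) under `key`, with no xattr-style size limit."""
    directory = _meta_dir(path)
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, key), "wb") as f:
        f.write(value)


def get_meta(path, key):
    """Read the bytes stored under `key` for this file."""
    with open(os.path.join(_meta_dir(path), key), "rb") as f:
        return f.read()


def list_meta(path):
    """List the metadata keys of this file, sorted by name."""
    directory = _meta_dir(path)
    return sorted(os.listdir(directory)) if os.path.isdir(directory) else []
```

With this, `set_meta("a.jpeg", "user.encoding", b"utf-8")` followed by `list_meta("a.jpeg")` shows the key. The big difference is that this emulation does not follow the file when it is moved or renamed, which is precisely what attaching the "directory" to the inode in the kernel would give us.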

Or we can add new system calls like getfmeta/setfmeta ....

Of course, we need more analysis/profiling to say something about memory/time performance.
:) Just to make a prediction: the extra memory it requires (an extra "dentry" pointer) is affordable, regular system calls such as open/write need no or very little extra time, and it will be faster than creating side-car files, since we needn't create/open side-car files from user space or pollute a directory with one side-car file per data file; the kernel handles it ..

Security concerns also need some thought.

Anyway, to implement portable meta-information, I think we need support from the underlying library/filesystem/architecture.


4. This method's disadvantages:


a. We have no POSIX standard defining the API, and Apple's HFS, Microsoft's NTFS and Solaris's ZFS use different APIs for similar functionality. We may need to abstract the API to keep KDE cross-platform.

b. No major filesystem under Linux provides file fork/resource fork support now. We need to implement it or file a feature request.

c. We can't directly attach an XML file as a file's meta-info, since XML is not an appendable format: data corruption may happen if we are updating the meta-info when the user ejects the source disk, the power suddenly goes off, etc. These problems need to be solved if we want to support portable/distributed meta-information.
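For point c, one common user-space answer is to never rewrite the metadata in place: write a complete new copy to a temporary file, flush it to disk, then rename it over the old one. Below is a minimal sketch for a regular or side-car metadata file. `write_meta_atomically` is a hypothetical name, and whether a fork could be swapped in the same way is an open question I am not assuming anything about.

```python
import os
import tempfile


def write_meta_atomically(meta_path, data):
    """Replace `meta_path` with `data` (bytes) so readers see old or new, never half."""
    directory = os.path.dirname(os.path.abspath(meta_path))
    # The temp file lives in the same directory so the final rename
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-meta-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new content to disk first
        os.replace(tmp_path, meta_path)  # rename over the old file
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the disk is ejected or the power goes off before the rename, the old XML is still intact; the cost is rewriting the whole document on every update, which an appendable format would avoid.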

---------------------------------------------------------------------------------------

Thanks for reading..


Regards,

4 comments:

Saem said...

This is the most sane suggestion I've heard to date on the matter.

Unknown said...

You should post things like that to the nepomuk kde mailing list.

It seems to get lost here quite fast and discussing things in blog comments doesn't work very well.

So please (also) post it there.

Anonymous said...

IMO this is wrong.

Or else, you at least need a cross-platform abstraction. Probably a POSIX standard.

You'll not want 3 ifs/elses for all the filesystem stuff because KDE is now cross-platform and needs to be supported on Linux/Unix/Windows.

So please, I think going that low-level, without a standard is the wrong way.

So what can we be left with:
1) Cross-platform database like sqlite. Nepomuk already does that. But that's bad because that is centralized.
2) Embed metadata in the data itself. But do all data types have a standard defining room for metadata?
3) associate a foreign file (like .metadata). This, IMO, looks like the most practical and portable.

Wang Hoi said...

I mean use the file forks standard to embed metadata into the file itself.
KDE on mac can use HFS's.
KDE on windows can use NTFS's.
The problem is that their APIs are different, and under Linux it is not even possible to do so......

Anyway, I'll continue to hack on the kernel and the Nepomuk server to add this feature, and at least use it on my box as an experiment to see what happens.