A proposal to handle file encodings

 
 
Peter J. Holzer
 
      12-02-2012
On 2012-11-29 02:22, Martin Gregorie <(E-Mail Removed)> wrote:
> On Tue, 27 Nov 2012 19:51:37 +0100, Peter J. Holzer wrote:
>> The problem is that this just doesn't fit into the Unix system call
>> scheme. There is no "copy" system call. The kernel just sees that a
>> process opens one file for reading and another file for writing. It
>> cannot assume that this process wants to copy the metadata from the
>> first to the second file.
>>

> Of course.
>
>> Of course Linux could introduce such a system
>> call, but then those umpteen utility programs and libraries would still
>> have to be modified to use that new system call.
>>

> I can see two ways of handling it:
>
> (1) introduce a pair of system calls to retrieve and store the metadata
> associated with a file,


There are of course already system calls to do that (how else would you
get at the data?). There are four of them (list, get, set, remove),
however, not two, so ...

> and, yes, programs would need modification, but the amount would be
> trivial because you'd be looking at one extra line of code per file
> involved in the metadata transfer.


... it's 3 extra lines, not 1. Not including error handling, of course.
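
For the Java side of this group, a minimal sketch of the same list/get/set sequence, using the JDK's UserDefinedFileAttributeView. The class and method names (XattrCopy, copyUserAttrs) are invented for illustration, and the sketch assumes that quietly skipping the copy is acceptable when the file store reports no support for user-defined attributes:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.UserDefinedFileAttributeView;

// Hypothetical helper, not part of any library.
class XattrCopy {
    /** Copies all user-defined attributes from src to dst; returns how many were copied. */
    static int copyUserAttrs(Path src, Path dst) throws IOException {
        Class<UserDefinedFileAttributeView> type = UserDefinedFileAttributeView.class;
        // Assumption: do nothing if either file store lacks user-defined attributes.
        if (!Files.getFileStore(src).supportsFileAttributeView(type)
                || !Files.getFileStore(dst).supportsFileAttributeView(type)) {
            return 0;
        }
        UserDefinedFileAttributeView in = Files.getFileAttributeView(src, type);
        UserDefinedFileAttributeView out = Files.getFileAttributeView(dst, type);
        int copied = 0;
        for (String name : in.list()) {                      // "list"
            ByteBuffer buf = ByteBuffer.allocate(in.size(name));
            in.read(name, buf);                              // "get"
            buf.flip();
            out.write(name, buf);                            // "set"
            copied++;
        }
        return copied;
    }
}
```

The loop body is the three extra calls; everything around it is the part that "has to be thought about", such as what to do when the attributes are not supported.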

But I don't think that's the problem. The problem is that a) you have to
do it and b) you have to think about how to do it. Plus there is no
consensus that it should be done at all (user_xattr isn't even enabled
by default on ext*). Microsoft and Apple have it easier: If they say
that some information has to be stored in an alternate stream/resource
fork, programmers will do it. Linux has no central authority which can
force programmers to do anything.


> (2) alternatively it may be possible to do the job by adding a mode or two
> to the file opening operations.


You mean an optional 4th parameter to open(2)?

> If they were defaulted appropriately, many programs could silently
> copy the metadata along with the data


I still don't see how that could work. That implies that the kernel
somehow guesses that you want to use the metadata from some file you
opened for reading for the file you are just opening for writing. While
that would be the right behaviour for "cp" or similar programs, I doubt
it would be right for the majority of programs.

It also raises the question of what the kernel should do if the process
doesn't have the necessary privileges to set some xattrs (or if the file
system doesn't support them). Fail? Silently drop them? I don't think
the kernel should make that decision. It's up to the application to
decide what's sensible ("mechanism, not policy" was a guiding principle
in the design of the Unix system call interface).
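
To illustrate what "up to the application" could look like, here is a rough sketch in which the caller picks the policy and the copying code only provides the mechanism. Everything in it (AttrPolicy, OnFailure, AttrWriter, apply) is a hypothetical name; AttrWriter stands in for whatever call actually sets one attribute on the destination file:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the policy is a parameter, not baked into the mechanism.
class AttrPolicy {
    enum OnFailure { FAIL, DROP }

    /** Stand-in for "set one attribute on the destination" (an assumption). */
    interface AttrWriter {
        void set(String name, byte[] value) throws IOException;
    }

    /**
     * Tries to set every attribute. Under FAIL the first error is rethrown;
     * under DROP the attribute is skipped and its name is reported back.
     */
    static List<String> apply(Map<String, byte[]> attrs, AttrWriter writer,
                              OnFailure policy) throws IOException {
        List<String> dropped = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : attrs.entrySet()) {
            try {
                writer.set(e.getKey(), e.getValue());
            } catch (IOException ex) {
                if (policy == OnFailure.FAIL) {
                    throw ex;
                }
                dropped.add(e.getKey());
            }
        }
        return dropped;
    }
}
```

A backup tool would presumably pass FAIL, while an editor saving a file might pass DROP and log the returned names.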

> and/or automagically apply the appropriate transforms, such as charset
> transforms, during the transfer.


That again makes no sense at the Unix system call interface, which deals
only with byte streams.

It does however make a lot of sense for higher level interfaces. So
it might be a good idea for java.io.FileReader to check the user.charset
xattr of the file and apply the appropriate encoding.
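
java.io.FileReader does not do this today, so the following is only a sketch of how a wrapper might. It assumes the charset name is stored as ASCII text in the user-defined attribute "charset" (which, as far as I know, the JDK maps to the xattr "user.charset" on Linux) and that any problem reading it should fall back to a caller-supplied default. CharsetAwareReader and its methods are invented names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.UserDefinedFileAttributeView;

// Hypothetical wrapper, not an existing JDK class.
class CharsetAwareReader {
    // Assumed attribute name; seen as "user.charset" by Linux xattr tools.
    static final String ATTR = "charset";

    /** Turns an attribute value into a Charset, or returns fallback if unusable. */
    static Charset parse(String name, Charset fallback) {
        try {
            return Charset.forName(name.trim());
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return fallback;
        }
    }

    /** Reads the charset attribute of file; any failure yields fallback. */
    static Charset detect(Path file, Charset fallback) {
        try {
            UserDefinedFileAttributeView view =
                Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
            if (view == null || !view.list().contains(ATTR)) {
                return fallback;
            }
            ByteBuffer buf = ByteBuffer.allocate(view.size(ATTR));
            view.read(ATTR, buf);
            buf.flip();
            return parse(StandardCharsets.US_ASCII.decode(buf).toString(), fallback);
        } catch (IOException | UnsupportedOperationException e) {
            return fallback; // policy choice of this sketch: never fail the open
        }
    }

    /** Opens file with the charset named in its attribute, else with fallback. */
    static BufferedReader open(Path file, Charset fallback) throws IOException {
        return Files.newBufferedReader(file, detect(file, fallback));
    }
}
```

Falling back silently is itself a policy decision; a stricter variant could throw when the attribute names an unknown charset.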


> Thinking about it a little more, (2) is definitely the best solution
> because it would be rather useful to be able to default the metadata
> applied to a new file with a similar mechanism to that used for the
> permission bits.


umask(2) is actually pretty broken IMHO.

hp


--
_ | Peter J. Holzer | Fluch der elektronischen Textverarbeitung:
|_|_) | Sysadmin WSR | Man feilt solange an seinen Text um, bis
| | | (E-Mail Removed) | die Satzbestandteile des Satzes nicht mehr
__/ | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel
 
 
 
 
 
Peter J. Holzer
 
      12-02-2012
On 2012-12-02 19:36, Martin Gregorie <(E-Mail Removed)> wrote:
> On Sun, 02 Dec 2012 13:02:27 +0100, Peter J. Holzer wrote:
>> That again makes no sense at the unix system call interface which deals
>> only with byte streams.
>>

> But, by definition, if you were using metadata to control the character
> encoding (which is where this discussion started) or to define the file
> as containing keyed, fixed field records, you would not be trying to
> write a byte stream.


We were obviously talking past each other. I was only talking about
mechanisms like xattr, alternate streams or resource forks, not about
revamping the whole unix file model.

hp

 