Velocity Reviews - Computer Hardware Reviews


offsets in a FileChannel ...

 
 
qwertmonkey@syberianoutpost.ru
      02-23-2013
What is missing from this code snippet to get the byte offsets in the underlying
FileChannel on which the MappedByteBuffer, and then the CharBuffer, are built?
~
CharBuffer.position() gives you the char position all right, but how do you get
the actual byte offset of particular characters in the data exposed through
the FileInputStream?
~
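import java.io.FileInputStream;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;
// ... inside a method that throws IOException; IFl is the input File
char c;
long lPsx;
// ChrStDkdr: assumed to be a UTF-8 decoder (its declaration was not shown)
CharsetDecoder ChrStDkdr = StandardCharsets.UTF_8.newDecoder();
try (FileInputStream FIS = new FileInputStream(IFl);
     FileChannel FlChnl = FIS.getChannel()) {
  MappedByteBuffer MptbChnlBfr = FlChnl.map(FileChannel.MapMode.READ_ONLY,
                                            0, FlChnl.size());
  CharBuffer cBfrUTF8 = ChrStDkdr.decode(MptbChnlBfr);
  // __
  while (cBfrUTF8.hasRemaining()) {
    lPsx = cBfrUTF8.position(); // char index of c, NOT its byte offset in the file
    c = cBfrUTF8.get();
    System.err.println("// __ |" + lPsx + "|" + c + "|" + (int) c + "|");
  }
} // try-with-resources closes FlChnl and FIS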
~
Or do you know of any other way to basically do the same thing?
~
thanks,
lbrtchx
comp.lang.java.programmer
 
Robert Klemme
      02-23-2013
On 23.02.2013 15:11, (E-Mail Removed) wrote:
> What is missing in this code snippet to get the offsets in the underlying
> FileChannel on which the MappedByteBuffer and then the CharBuffer are built?
> ~
> CharBuffer.position() gives you the position alright, but how about wanting
> to get the actual offset of certain characters in the actual data feed exposed
> through the FileInputStream?
> ~
> char c;
> long lPsx;
> FIS = new FileInputStream(IFl);
> FileChannel FlChnl = FIS.getChannel();
> MappedByteBuffer MptbChnlBfr = FlChnl.map(FileChannel.MapMode.READ_ONLY,
> 0, FlChnl.size());
> CharBuffer cBfrUTF8 = ChrStDkdr.decode(MptbChnlBfr);
> // __
> while(cBfrUTF8.hasRemaining()){
> c = cBfrUTF8.get();
> lPsx = cBfrUTF8.position();
> System.err.println("// __ |" + lPsx + "|" + c + "|" + (int)c + "|");
> }
> // __
> FlChnl.close();
> FIS.close();
> ~
> Or do you know of any other way to basically do the same thing?


UTF-8 is not a fixed-width encoding: a single character takes one to four
bytes, so char positions and byte positions drift apart as soon as the file
contains anything beyond ASCII. You would have to write more complex code
to align char positions with byte positions. Basically you need to read
the file from the beginning and observe the width of every char as it is
being decoded. You could of course apply heuristics if you know more about
the file, but I guess that soon gets messy.
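A minimal sketch of the "read from the beginning and track the width" approach, assuming the input is well-formed UTF-8. Instead of driving a CharsetDecoder one char at a time, it decodes everything and then recomputes each character's UTF-8 width from its code point, which gives the same widths for valid input. The class and method names (`Utf8Offsets`, `byteOffsets`, `utf8Width`) are made up for illustration. Offsets are buffer indexes; for a mapping that starts at file position 0, as in the question, they are also file offsets.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Offsets {

  /** Number of bytes UTF-8 needs for one code point (1 to 4). */
  static int utf8Width(int codePoint) {
    if (codePoint < 0x80) return 1;
    if (codePoint < 0x800) return 2;
    if (codePoint < 0x10000) return 3;
    return 4;
  }

  /**
   * Decodes the remaining bytes as UTF-8 and returns, for every char index of
   * the decoded text, the buffer index where that char's character begins.
   * Both halves of a surrogate pair get the same offset.
   */
  static long[] byteOffsets(ByteBuffer bytes) throws CharacterCodingException {
    long offset = bytes.position();
    // newDecoder() throws on malformed input, so the widths computed below hold.
    CharBuffer chars = StandardCharsets.UTF_8.newDecoder().decode(bytes);
    long[] offsets = new long[chars.remaining()];
    int i = 0;
    while (i < offsets.length) {
      int cp = Character.codePointAt(chars, i);
      int units = Character.charCount(cp); // 2 for a surrogate pair, else 1
      for (int k = 0; k < units; k++) {
        offsets[i + k] = offset;
      }
      offset += utf8Width(cp);
      i += units;
    }
    return offsets;
  }

  public static void main(String[] args) throws CharacterCodingException {
    // 'a' (1 byte), e-acute (2), euro sign (3), U+1F600 as two chars (4), 'b' (1)
    byte[] data = "a\u00e9\u20ac\uD83D\uDE00b".getBytes(StandardCharsets.UTF_8);
    System.out.println(Arrays.toString(byteOffsets(ByteBuffer.wrap(data))));
    // prints [0, 1, 3, 6, 6, 10]
  }
}
```

Applied to the question's code, one could call `byteOffsets(MptbChnlBfr.duplicate())` so the original buffer's position is left alone for the normal decode. A `long` per char is expensive for large files; keeping only the offsets of the characters you care about would be the obvious refinement.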

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
Roedy Green
      02-25-2013
On Sat, 23 Feb 2013 15:39:08 +0100, Robert Klemme
<(E-Mail Removed)> wrote, quoted or indirectly quoted
someone who said :

>UTF8 is not an encoding with a fixed width.


You could use UTF-16. Then you could convert between byte and char
offsets with a simple shift.
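A small sketch of what the shift means, assuming the file is BOM-less UTF-16 (`UTF_16BE` here). Java's `char` is a UTF-16 code unit, and `CharBuffer.position()` counts chars, so byte offset = char index << 1 holds exactly. What it does not give you is one position per Unicode character: U+1F600 below occupies two chars (4 bytes), which is the sense in which UTF-16 is not fixed width. If the file was written with Java's plain "UTF-16" charset, the encoder prepends a 2-byte BOM, so add 2. `Utf16Offsets` and its methods are illustrative names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class Utf16Offsets {

  /** Byte offset of the char at charIndex in BOM-less UTF-16 (BE or LE). */
  static long byteOffsetOf(long charIndex) {
    return charIndex << 1; // each Java char is one 2-byte UTF-16 code unit
  }

  /** Char index of the code unit that starts at an even byte offset. */
  static long charIndexOf(long byteOffset) {
    return byteOffset >> 1;
  }

  public static void main(String[] args) {
    String text = "a\u00e9\u20ac\uD83D\uDE00b"; // U+1F600 is stored as two chars
    ByteBuffer bytes = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_16BE));
    CharBuffer chars = StandardCharsets.UTF_16BE.decode(bytes.duplicate());
    for (int i = 0; i < chars.length(); i++) {
      int off = (int) byteOffsetOf(i);
      // ByteBuffer is big-endian by default, matching UTF-16BE.
      boolean same = bytes.getChar(off) == chars.charAt(i);
      System.out.println("char " + i + " -> byte " + off + " " + same);
    }
    System.out.println(text.length() + " chars, "
        + text.codePointCount(0, text.length()) + " code points");
  }
}
```

Every line prints `true`, and the last line reports 6 chars but only 5 code points.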

You could build a table of interesting byte offsets when you construct
the stream.
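One way to read the "table of byte offsets" idea, sketched under the assumption that records are written one at a time as UTF-8 text lines. The writer encodes each record itself and counts the encoded bytes before handing them to the stream, so the offsets stay correct even if the underlying stream is buffered. `OffsetIndexWriter` and `writeRecord` are hypothetical names, not an existing API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class OffsetIndexWriter {
  private final OutputStream out;
  private final List<Long> index = new ArrayList<>(); // start offset of each record
  private long written = 0;                            // bytes handed to out so far

  OffsetIndexWriter(OutputStream out) {
    this.out = out;
  }

  /** Notes where the record starts, then writes it as UTF-8 plus a newline. */
  void writeRecord(String record) throws IOException {
    index.add(written);
    byte[] encoded = (record + "\n").getBytes(StandardCharsets.UTF_8);
    out.write(encoded);
    written += encoded.length;
  }

  List<Long> index() {
    return index;
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    OffsetIndexWriter w = new OffsetIndexWriter(sink);
    w.writeRecord("abc");
    w.writeRecord("\u00e9t\u00e9"); // 3 chars, 5 bytes of UTF-8
    w.writeRecord("xyz");
    System.out.println(w.index()); // [0, 4, 10]

    // Jump straight to the third record using its recorded byte offset.
    byte[] all = sink.toByteArray();
    int start = w.index().get(2).intValue();
    System.out.print(new String(all, start, all.length - start, StandardCharsets.UTF_8));
  }
}
```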

You could embed binary byte/char counts at the head of phrases.
You build and take the stream apart with ByteArrayOutputStream/ByteArrayInputStream.
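A sketch of length-prefixed phrases, assuming a 4-byte big-endian byte count (via `DataOutputStream.writeInt`) in front of each phrase's UTF-8 bytes. Because the counts are byte counts, a reader can hop from phrase to phrase and know every byte offset without decoding the text; the price is that the result is a binary format, not plain text. `LengthPrefixed`, `encode`, `offsets` and `phraseAt` are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LengthPrefixed {

  /** Each phrase becomes a 4-byte big-endian byte count followed by its UTF-8 bytes. */
  static byte[] encode(List<String> phrases) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    for (String p : phrases) {
      byte[] b = p.getBytes(StandardCharsets.UTF_8);
      out.writeInt(b.length);
      out.write(b);
    }
    out.flush();
    return bos.toByteArray();
  }

  /** Walks the counts and returns the byte offset at which each phrase's count starts. */
  static List<Long> offsets(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    List<Long> result = new ArrayList<>();
    long pos = 0;
    while (pos < data.length) {
      result.add(pos);
      int n = in.readInt();
      in.readFully(new byte[n]); // consume the phrase without decoding it
      pos += 4 + n;
    }
    return result;
  }

  /** Decodes the single phrase whose count starts at byte offset off. */
  static String phraseAt(byte[] data, int off) throws IOException {
    DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(data, off, data.length - off));
    byte[] b = new byte[in.readInt()];
    in.readFully(b);
    return new String(b, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    byte[] data = encode(Arrays.asList("abc", "\u00e9t\u00e9", "xyz"));
    List<Long> offs = offsets(data);
    System.out.println(offs);                                   // [0, 7, 16]
    System.out.println(phraseAt(data, offs.get(2).intValue())); // xyz
  }
}
```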
--
Roedy Green Canadian Mind Products http://mindprod.com
One thing I love about having a website, is that when I complain about
something, I only have to do it once. It saves me endless hours of
grumbling.
 
Robert Klemme
      02-25-2013
On 25.02.2013 13:09, Roedy Green wrote:
> On Sat, 23 Feb 2013 15:39:08 +0100, Robert Klemme
> <(E-Mail Removed)> wrote, quoted or indirectly quoted
> someone who said :
>
>> UTF8 is not an encoding with a fixed width.

>
> You could use UTF-16. Then you could convert between byte and char
> offsets with a simple shift.


I don't. And he doesn't either, since UTF-16 isn't a fixed-width encoding.
http://www.unicode.org/faq/utf_bom.html#gen6
http://www.unicode.org/versions/Unic...h03.pdf#G28070

> You could build a table of interesting byte offsets when you construct
> the stream.


So you would augment the file with an index file. This is certainly not
a general solution as you do not always have the option to transport
that extra data with the file. Plus, aligning offsets while writing
might prove as difficult as when reading (e.g. because of buffering).

> You could embed binary counts in bytes/chars at the head of phrases.
> You build and take the stream apart with ByteArrayStreams.


That's no longer a text document.

robert


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
Roedy Green
      02-26-2013
On Mon, 25 Feb 2013 21:50:18 +0100, Robert Klemme
<(E-Mail Removed)> wrote, quoted or indirectly quoted
someone who said :

>So you would augment the file with an index file. This is certainly not
>a general solution as you do not always have the option to transport
>that extra data with the file.


In one application I wrote, on load I compose a temporary RAF
(RandomAccessFile) from sequential files, with an in-RAM ArrayList of the
offsets where records start. It is a primitive form of hermit crab.
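Roedy doesn't show his code, so the following is only a guess at the general shape, assuming each line of the sequential input files is one record. Records are appended to a scratch RandomAccessFile with no separators; the in-RAM `ArrayList<Long>` of start offsets is what delimits them and allows a direct `seek` to any record. `RecordStore`, `load` and `get` are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordStore implements AutoCloseable {
  private final RandomAccessFile raf;
  private final List<Long> starts = new ArrayList<>(); // byte offset of each record

  RecordStore(Path scratch) throws IOException {
    raf = new RandomAccessFile(scratch.toFile(), "rw");
    raf.setLength(0); // start from an empty scratch file
  }

  /** Appends every line of a sequential UTF-8 text file as one record. */
  void load(Path source) throws IOException {
    for (String line : Files.readAllLines(source, StandardCharsets.UTF_8)) {
      long start = raf.length();
      raf.seek(start);
      raf.write(line.getBytes(StandardCharsets.UTF_8));
      starts.add(start);
    }
  }

  /** Random access: seek directly to record i using the offset table. */
  String get(int i) throws IOException {
    long start = starts.get(i);
    long end = (i + 1 < starts.size()) ? starts.get(i + 1) : raf.length();
    byte[] b = new byte[(int) (end - start)];
    raf.seek(start);
    raf.readFully(b);
    return new String(b, StandardCharsets.UTF_8);
  }

  int size() {
    return starts.size();
  }

  @Override
  public void close() throws IOException {
    raf.close();
  }

  public static void main(String[] args) throws IOException {
    Path a = Files.createTempFile("seqA", ".txt");
    Path b = Files.createTempFile("seqB", ".txt");
    Path scratch = Files.createTempFile("records", ".raf");
    Files.write(a, Arrays.asList("first", "second"), StandardCharsets.UTF_8);
    Files.write(b, Arrays.asList("\u00e9t\u00e9"), StandardCharsets.UTF_8);
    try (RecordStore store = new RecordStore(scratch)) {
      store.load(a);
      store.load(b);
      System.out.println(store.size() + " records; #1 = " + store.get(1));
    } finally {
      Files.delete(a);
      Files.delete(b);
      Files.delete(scratch);
    }
  }
}
```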

Now that I have RAM and address space to burn, I could put the whole
thing in RAM.
--
Roedy Green Canadian Mind Products http://mindprod.com
One thing I love about having a website, is that when I complain about
something, I only have to do it once. It saves me endless hours of
grumbling.
 


Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
offsets in a FileChannel ... | qwertmonkey@syberianoutpost.ru | Java | 1 | 02-23-2013 08:22 PM
offsets in a FileChannel ... | qwertmonkey@syberianoutpost.ru | Java | 0 | 02-23-2013 03:36 PM
Best way to get a few bytes from a java.nio.FileChannel'ed file... | Spendius | Java | 0 | 09-07-2003 11:37 AM
Re: FileChannel.map() gives 'cannot allocate memory' | Roedy Green | Java | 3 | 08-14-2003 08:44 PM
How come 'FileChannel.map()' returns a DirectByteBufferR ?... | Spendius | Java | 4 | 07-03-2003 11:23 PM


