.read() returns a char why?

 
 
JM
Guest
Posts: n/a
 
      12-12-2007
Why do the .read() methods of the Java Reader classes
(File/Buffered/Stream etc.) return an int, not a char?

For example the javadoc for BufferedReader

.... jdk1.6.0_03/docs/api/java/io/BufferedReader.html declares

"public int read()"

and then the javadoc indicates return value as:

"The character read, as an integer in the range 0 to 65535
(0x00-0xffff), or -1 if the end of the stream has been reached"

The only reason I have come up with is that the class wants to
indicate end-of-stream with a -1. Incidentally when did the character
(singular) become two bytes?

I am an engineer, not a computer scientist, so I'd appreciate some
patience in your reply.

Jonathan
 
 
Chris Dollin
Guest
Posts: n/a
 
      12-12-2007
JM wrote:

> The only reason I have come up with is that the class wants to
> indicate end-of-stream with a -1. Incidentally when did the character
> (singular) become two bytes?


Java's chars have always been two bytes, so as to store 16-bit
Unicode characters.

(We'll pass quietly over the problems with Unicode now needing more than
16 bits for an unpacked character.)

--
Chris "whistling, but not in the dark" Dollin

Hewlett-Packard Limited registered office: Cain Road, Bracknell,
registered no: 690597 England Berks RG12 1HN

 
 
Mike Schilling
Guest
Posts: n/a
 
      12-12-2007
JM wrote:
> Why do the Java Reader classes (File/Buffered/Stream) etc .read()
> methods return an int not a char?
>
> For example the javadoc for BufferedReader
>
> ... jdk1.6.0_03/docs/api/java/io/BufferedReader.html declares
>
> "public int read()"
>
> and then the javadoc indicates return value as:
>
> "The character read, as an integer in the range 0 to 65535
> (0x00-0xffff), or -1 if the end of the stream has been reached"
>
> The only reason I have come up with is that the class wants to
> indicate end-of-stream with a -1.


That's exactly right. If it returned a char, there would be no
"illegal" value left to indicate EOF.

> Incidentally when did the character
> (singular) become two bytes?


A char in Java is a 16-bit Unicode value (technically a UTF-16 code
unit), not a byte.
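
A minimal sketch of the usual idiom that follows from this, assuming
nothing beyond the standard java.io classes (the class name
ReadLoopSketch and the helper readAll are made up for illustration):
keep the result of read() in an int, compare it against -1, and only
then narrow it to a char.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class ReadLoopSketch {
    // Reads every char from the reader, using the int return value
    // of read() to detect end-of-stream.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;                           // int, not char: it must be able to hold -1
        while ((c = in.read()) != -1) {  // -1 is outside the char range 0..65535
            sb.append((char) c);         // safe to narrow once -1 has been ruled out
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // '\uFFFF' is the largest char value; read() reports it as 65535,
        // which is still distinct from the -1 end-of-stream marker.
        String text = "A\uFFFFB";
        System.out.println(readAll(new StringReader(text)).equals(text));
    }
}
```

Casting to char before the comparison would defeat the purpose, since
(char) -1 is '\uFFFF', which is a legal char value.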

 
Patricia Shanahan
Guest
Posts: n/a
 
      12-12-2007
JM wrote:
> Why do the Java Reader classes (File/Buffered/Stream) etc .read()
> methods return an int not a char?
>
> For example the javadoc for BufferedReader
>
> ... jdk1.6.0_03/docs/api/java/io/BufferedReader.html declares
>
> "public int read()"
>
> and then the javadoc indicates return value as:
>
> "The character read, as an integer in the range 0 to 65535
> (0x00-0xffff), or -1 if the end of the stream has been reached"
>
> The only reason I have come up with is that the class wants to
> indicate end-of-stream with a -1. Incidentally when did the character
> (singular) become two bytes?


Yes, read returns a wider type than char so that there is a spare value
to represent end-of-stream.

One of the continuing trends in computing has been increasing numbers of
bits to represent a character, from 6 to 7 to 8 to 16... Java char is 16
bits.

Patricia
 
 
Lew
Guest
Posts: n/a
 
      12-12-2007
JM wrote:
> Why do the Java Reader classes (File/Buffered/Stream) etc .read()
> methods return an int not a char?
>
> For example the javadoc for BufferedReader
>
> .... jdk1.6.0_03/docs/api/java/io/BufferedReader.html declares
>
> "public int read()"
>
> and then the javadoc indicates return value as:
>
> "The character read, as an integer in the range 0 to 65535
> (0x00-0xffff), or -1 if the end of the stream has been reached"
>
> The only reason I have come up with is that the class wants to
> indicate end-of-stream with a -1.


It allows any value in the range of char to be represented as a
non-negative value. -1 is therefore guaranteed to be distinct from any
valid value.

If you return a char, you cannot get the value 32768 or larger.

> Incidentally when did the character (singular) become two bytes?


In Java's case, with the invention of Java.

--
Lew
 
 
Lew
Guest
Posts: n/a
 
      12-12-2007
Lew wrote:
> If you return a char, you cannot get the value 32768 or larger.


Oops, that's wrong. If you return a *short* you cannot get such values.

--
Lew
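
To illustrate the correction, here is a small sketch (the class and
method names are hypothetical) of how char and short behave when
widened to int. Both are 16 bits wide, but char is unsigned and short
is signed, so only short loses the values from 32768 upward.

```java
final class CharRangeSketch {
    // Widening a char zero-extends it: the result is always 0..65535.
    static int widenChar(char c) {
        return c;
    }

    // Widening a short sign-extends it: the result is -32768..32767.
    static int widenShort(short s) {
        return s;
    }

    public static void main(String[] args) {
        char maxChar = '\uFFFF';          // all 16 bits set
        short sameBits = (short) 0xFFFF;  // the same bit pattern as a short

        System.out.println(widenChar(maxChar));    // prints 65535
        System.out.println(widenShort(sameBits));  // prints -1
        System.out.println((int) Character.MAX_VALUE);  // prints 65535
        System.out.println(Short.MAX_VALUE);            // prints 32767
    }
}
```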
 
 
Roedy Green
Guest
Posts: n/a
 
      12-12-2007
On Wed, 12 Dec 2007 06:45:36 -0800 (PST), JM wrote, quoted or
indirectly quoted someone who said:

>Incidentally when did the character
>(singular) become two bytes?


With Java 1.0. C++ is in transition from 8 to 16.

It is now much more common to have a document containing multiple
languages. You can't encode it with only 8 bits per char. So Java
from day one used Unicode, which then had 16 bits per char. 16-bit
Unicode was even big enough to include Chinese. However, Unicode has
since been extended beyond 16 bits (code points now run up to
U+10FFFF) to allow Ugaritic (cuneiform), musical symbols, Cypriot etc.
Java has somewhat baling-wire support for these supplementary
characters.

See http://mindprod.com/jgloss/unicode.html
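
As a rough illustration of that support (a sketch only; the class and
method names are made up), a character outside the 16-bit range is
stored in a String as a pair of chars, a so-called surrogate pair. The
example uses U+1D11E, the musical G clef symbol.

```java
final class SupplementarySketch {
    // Number of 16-bit chars (UTF-16 code units) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points, counting a surrogate pair as one.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Build a one-character string for U+1D11E (MUSICAL SYMBOL G CLEF).
        String clef = new String(Character.toChars(0x1D11E));

        System.out.println(charCount(clef));       // prints 2
        System.out.println(codePointCount(clef));  // prints 1
        // The first char on its own is only half of the character.
        System.out.println(Character.isHighSurrogate(clef.charAt(0)));  // prints true
        System.out.println(clef.codePointAt(0) == 0x1D11E);             // prints true
    }
}
```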

Of course 16 bits per char would make documents on average twice as
big as they used to be. So UTF-8 was invented to make simple documents
almost as compact as if they had been encoded with an 8-bit national
encoding.

see http://mindprod.com/jgloss/utf.html
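
The size difference can be checked with a few lines of Java. This is
only a sketch with hypothetical names; it compares the encoded length
of the same text in UTF-8 and in UTF-16 (big-endian, without a byte
order mark).

```java
import java.nio.charset.StandardCharsets;

final class EncodedSizeSketch {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies as UTF-16BE (two bytes per char, no BOM).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";          // plain ASCII
        String accented = "h\u00E9llo";  // one accented Latin letter
        String chinese = "\u4E2D\u6587"; // two Chinese characters

        System.out.println(utf8Length(ascii) + " vs " + utf16Length(ascii));        // prints 5 vs 10
        System.out.println(utf8Length(accented) + " vs " + utf16Length(accented));  // prints 6 vs 10
        System.out.println(utf8Length(chinese) + " vs " + utf16Length(chinese));    // prints 6 vs 4
    }
}
```

The last line shows the trade-off: UTF-8 is compact for mostly-ASCII
text but takes three bytes for each of these Chinese characters.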

Encoding is about how documents are represented as bytes, which is
very complicated and varied, to deal with interchange with other
computer languages and legacy applications. Inside a Java program,
text is stored simply as 16-bit Unicode (UTF-16).

See http://mindprod.com/jgloss/encoding.html
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
 
 
John W. Kennedy
Guest
Posts: n/a
 
      12-13-2007
Patricia Shanahan wrote:
> One of the continuing trends in computing has been increasing numbers of
> bits to represent a character, from 6 to 7 to 8 to 16... Java char is 16
> bits.


Not if you go back far enough, though. The IBM 650 took 14 bits to
represent a character (double bi-quinary), and its market successor, the
707x series, took 10 (double 2-of-5).

--
John W. Kennedy
"The grand art mastered the thudding hammer of Thor
And the heart of our lord Taliessin determined the war."
-- Charles Williams. "Mount Badon"
 
 
Roedy Green
Guest
Posts: n/a
 
      12-13-2007
On Wed, 12 Dec 2007 19:03:54 -0500, "John W. Kennedy" wrote, quoted
or indirectly quoted someone who said:

>Not if you go back far enough, though. The IBM 650 took 14 bits to
>represent a character (double bi-quinary), and its market successor, the
>707x series, took 10 (double 2-of-5).


In the olden days, each site would invent its own private 6-bit
encoding. I recall sitting with Vern Detwiler (later of MacDonald
Detwiler) looking at this newfangled 7-bit ASCII code and playing
with how we might make UBC's 6-bit code somewhat ASCII compatible for
the new IBM 7044. We had to decide what characters to include. Back
then popular characters included the word mark and record mark.

Later with the IBM 360 we had ENORMOUS 8-bit EBCDIC character sets
that came in a zillion variants. You still constrained yourself mainly
to upper case because printers used a rotating chain or band of
pre-formed characters, and extra chars slowed it down drastically.
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
 
 
JM
Guest
Posts: n/a
 
      12-15-2007
On Dec 12, 2:58 pm, "Mike Schilling" wrote:
> JM wrote:
> > Why do the Java Reader classes (File/Buffered/Stream) etc .read()
> > methods return an int not a char?
> >
> > For example the javadoc for BufferedReader
> >
> > ... jdk1.6.0_03/docs/api/java/io/BufferedReader.html declares
> >
> > "public int read()"
> >
> > and then the javadoc indicates return value as:
> >
> > "The character read, as an integer in the range 0 to 65535
> > (0x00-0xffff), or -1 if the end of the stream has been reached"
> >
> > The only reason I have come up with is that the class wants to
> > indicate end-of-stream with a -1.
>
> That's exactly right. If it returned a char, there would be no
> "illegal" value left to indicate EOF.
>
> > Incidentally when did the character
> > (singular) become two bytes?
>
> A char in Java is a 16-bit unicode (technically UTF-16) character, not
> a byte.

Many thanks for everyone's replies. What does not make sense now is
that when I call BufferedWriter.write(int), only one 8-bit byte gets
written.

BufferedWriter bw = new BufferedWriter(new FileWriter("a"));
bw.write(1);
bw.write(256);
bw.close();
System.exit(0);

This creates a file "a" of length 2 bytes, containing
01
3F
rather than 16 bits per character.

Makes no sense to me.

Jonathan
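
A likely explanation, offered as a sketch rather than a diagnosis of
this particular machine: a Writer converts chars to bytes using a
character encoding, and FileWriter as used above picks the platform
default. Assuming that default was a single-byte encoding such as
ISO-8859-1, char 1 fits in one byte, while char 256 (U+0100) has no
representation and is replaced by '?', which is 3F. The class and
method names below are made up, and an in-memory stream stands in for
the file so that the encoding can be chosen explicitly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class WriterEncodingSketch {
    // Writes char 1 and char 256 through a Writer that uses the given
    // encoding, then returns the resulting bytes as hex.
    static String hexOf(Charset cs) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write(1);    // U+0001
            w.write(256);  // U+0100, not representable in ISO-8859-1
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : out.toByteArray()) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        return hex.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hexOf(StandardCharsets.ISO_8859_1));  // prints 01 3F
        System.out.println(hexOf(StandardCharsets.UTF_8));       // prints 01 C4 80
        System.out.println(hexOf(StandardCharsets.UTF_16BE));    // prints 00 01 01 00
    }
}
```

So the 16 bits per char apply to text in memory. What lands in the
file depends on the encoding, and wrapping a FileOutputStream in an
OutputStreamWriter with an explicit charset, as sketched here, is one
way to control it.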
 