Do you know how Java reads character values greater than 128/255?
I have

    BufferedReader bufferedReader =
        new BufferedReader(new FileReader(inputfile_name));

    int c;
    while ((c = bufferedReader.read()) > -1 ) {
        if (c > (int)128) {
            System.err.println(
                (char)c + " " +
                c + " " +
                Integer.toOctalString(c) + " " +
                Integer.toHexString(c)
            );
        }
    }
    bufferedReader.close();

This is fine; it prints all the characters whose ASCII value is greater than 128.

Now I do the same in C:

    if ((fp = fopen("inputfile_name", "r")) == NULL) {
        fprintf(stderr, "Can't open %s\n", argv[1]);
        exit(2);
    }
    int c;
    while ((c = getc(fp)) != EOF) {
        if (c > 128) {
            printf("%c %d %o %x\n", c, c, c, c);
        }
    }
    fclose(fp);

But in C, reading the same file, I don't get any characters with an ASCII value greater than 128 printed.

I just wonder why. How does Java read those characters with ASCII values greater than 128?
Re: Do you know how Java reads character values greater than 128/255?
RC wrote:
> int c;
> while ((c = bufferedReader.read()) > -1 ) {
>     if (c > (int)128) {

128 is already an int, so casting it to int has no effect.

>         System.err.println(
>             (char)c + " " +
>             c + " " +
>             Integer.toOctalString(c) + " " +
>             Integer.toHexString(c)
>         );
>     }
> }
> bufferedReader.close();
>
> This is fine; it prints all the characters whose ASCII value is greater
> than 128.
>
> Now I do the same in C:
>
> if ((fp = fopen("inputfile_name", "r")) == NULL) {
>     fprintf(stderr, "Can't open %s\n", argv[1]);
>     exit(2);
> }
> int c;
> while ((c = getc(fp)) != EOF) {

The C function getc() returns a byte-sized value (an unsigned char converted to int, or EOF), not a 16-bit value as Java's Reader.read() does.

>     if (c > 128) {
>         printf("%c %d %o %x\n", c, c, c, c);
>     }
> }
> fclose(fp);
>
> But in C, reading the same file, I don't get any characters with an ASCII
> value greater than 128 printed.
>
> I just wonder why. How does Java read those characters with ASCII values
> greater than 128?

Java is likely not reading ASCII at all. It is more likely reading UTF-8, or whatever the platform's default encoding is, since that is what FileReader uses. Have you tried the Java program with the InputStreamReader encoding set to "US-ASCII"?

For a fuller answer one would need to know the contents of the file. Check out the API docs for java.io.InputStreamReader and java.nio.charset.Charset.

- Lew
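Lew's suggestion can be tried without touching the file system. The following Java sketch is a minimal illustration, not the OP's program. It feeds a hypothetical five-byte sample (the UTF-8 encoding of "café") through an InputStreamReader with an explicit charset. It reads one char at a time, as the original loop does, and prints the resulting char values for UTF-8, ISO-8859-1 and US-ASCII. The names DecodeDemo and charValues are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DecodeDemo {

    // Read every char the way the original loop does, but through an
    // InputStreamReader with an explicit charset instead of FileReader.
    static int[] charValues(byte[] bytes, Charset cs) {
        List<Integer> values = new ArrayList<>();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), cs)) {
            int c;
            while ((c = reader.read()) != -1) {
                values.add(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // Hypothetical sample data: "caf" followed by e-acute encoded as
        // UTF-8, so the last character is the two bytes 0xC3 0xA9.
        byte[] bytes = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9};
        for (Charset cs : new Charset[] {
                StandardCharsets.UTF_8,
                StandardCharsets.ISO_8859_1,
                StandardCharsets.US_ASCII}) {
            System.out.println(cs + ": " + Arrays.toString(charValues(bytes, cs)));
        }
    }
}
```

With UTF-8, the two bytes 0xC3 0xA9 become the single char 233 (0xE9). ISO-8859-1 maps each byte to the char with the same value (195 and 169). US-ASCII cannot represent either byte, so each one becomes the replacement character U+FFFD (65533). Note that 65533 still passes a `c > 128` test, so forcing US-ASCII would print replacement characters rather than nothing.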
Re: Do you know how Java reads character values greater than 128/255?
"RC" <raymond.chui@nospam.noaa.gov> wrote in message news:elp6m5$riu$1@news.nems.noaa.gov...
>
> But in C, reading the same file, I don't get any characters with an ASCII
> value greater than 128 printed.
>
> I just wonder why. How does Java read those characters with ASCII values
> greater than 128?

I think it's basically because C uses ASCII internally, while Java uses a modified version of UTF-16 internally.

- Oliver
Re: Do you know how Java reads character values greater than 128/255?
"Oliver Wong" <owong@castortech.com> wrote in message news:BFXfh.75354$aJ6.667154@wagner.videotron.net...
> I think it's basically because C uses ASCII internally, while Java uses
> a modified version of UTF-16 internally.

It's because the C code shown was reading bytes, while the Java code shown was reading characters. Java that reads bytes, e.g.

    InputStream strm = new FileInputStream(inputfile_name);

    int b = strm.read();   // 0..255, or -1 at end of stream

would never see anything outside the range of a single byte (0..255 from read(), or -128..127 once stored in a Java byte). C that reads "wide" characters, on the other hand, e.g.

    wint_t c = getwc(stdin);   // from <wchar.h>; WEOF at end of input

can see characters outside that range.
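Mike's distinction between the two kinds of read() can be seen side by side in this Java sketch. It assumes the input is UTF-8, whereas FileReader would use the platform default instead. It reads the same hypothetical bytes (an e-acute followed by a euro sign) twice: once through an InputStream and once through an InputStreamReader. The class and method names are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BytesVsChars {

    // Byte-oriented, like C's getc(): each read() returns 0..255, or -1 at end.
    static int[] readBytes(byte[] data) {
        List<Integer> out = new ArrayList<>();
        try (InputStream in = new ByteArrayInputStream(data)) {
            int b;
            while ((b = in.read()) != -1) {
                out.add(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return toArray(out);
    }

    // Character-oriented, like the original BufferedReader loop: each read()
    // returns a decoded char value (0..65535), or -1 at end. UTF-8 is an
    // assumption here; FileReader would use the platform default instead.
    static int[] readChars(byte[] data) {
        List<Integer> out = new ArrayList<>();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = in.read()) != -1) {
                out.add(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return toArray(out);
    }

    private static int[] toArray(List<Integer> list) {
        return list.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // Hypothetical input: e-acute (U+00E9) followed by the euro sign (U+20AC).
        byte[] data = "\u00e9\u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes: " + Arrays.toString(readBytes(data)));
        System.out.println("chars: " + Arrays.toString(readChars(data)));
        // Stored in Java's signed byte type, 0xC3 reads back as a negative number.
        System.out.println("as a Java byte: " + data[0]);
    }
}
```

On that input, the byte loop returns [195, 169, 226, 130, 172]. These are five values that all fit in 0..255, just as getc() values would. The char loop returns [233, 8364]: two characters, one of them well above 255. The last line prints -61, which is 0xC3 viewed through Java's signed byte type.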
Re: Do you know how Java reads character values greater than 128/255?
These two bits of code do not do the same thing. The Java code has the opportunity to use the file encoding, including multi-byte schemes (e.g. UTF-8), to re-map bytes in the file stream to characters represented as UTF-16 code units. The C code should just be consuming bytes and returning them as unsigned char values.

Question: Do both of them read the same number of characters from the stream?

Question: What does Java think your default file encoding and code page are? You can force it to read US-ASCII or Latin-1 and run again.

On Wed, 13 Dec 2006, RC wrote:
> But in C, reading the same file, I don't get any characters with an ASCII
> value greater than 128 printed.
>
> I just wonder why. How does Java read those characters with ASCII values
> greater than 128?
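Both questions above can be answered with a small diagnostic. This Java sketch is one way to do it, assuming the file fits in memory. It prints three things:

- Charset.defaultCharset(), which is the encoding FileReader uses when none is given (JDK 18 and later default to UTF-8).
- The raw byte count.
- The number of chars a Reader produces under the default charset, US-ASCII and Latin-1 (ISO-8859-1).

CountCheck and countChars are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class CountCheck {

    // Number of chars a Reader produces when decoding the bytes with cs.
    static long countChars(byte[] data, Charset cs) {
        long n = 0;
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), cs)) {
            while (r.read() != -1) {
                n++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        // The charset FileReader falls back to when none is specified.
        System.out.println("default charset: " + Charset.defaultCharset());
        if (args.length == 0) {
            System.err.println("usage: java CountCheck <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        // A C getc() loop on the file opened in binary mode ("rb")
        // would see exactly this many values before EOF.
        System.out.println("bytes: " + data.length);
        for (Charset cs : new Charset[] {
                Charset.defaultCharset(),
                StandardCharsets.US_ASCII,
                StandardCharsets.ISO_8859_1}) {
            System.out.println(cs + " chars: " + countChars(data, cs));
        }
    }
}
```

Run it as `java CountCheck inputfile`. If the byte count and the default-charset char count differ, the default encoding is multi-byte (UTF-8, for instance), which may help explain why the two programs disagree.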
Re: Do you know how Java reads character values greater than 128/255?
"Mike Schilling" <mscottschilling@hotmail.com> wrote in message news:53Yfh.30140$wP1.8682@newssvr14.news.prodigy.net...
> "Oliver Wong" <owong@castortech.com> wrote in message
> news:BFXfh.75354$aJ6.667154@wagner.videotron.net...
>> I think it's basically because C uses ASCII internally, while Java
>> uses a modified version of UTF-16 internally.
>
> It's because the C code shown was reading bytes, while the Java code shown
> was reading characters.

I was referring to the language built-in datatypes known as "char" in C and "char" in Java. Both languages seem to assume that there is a finite number of characters that will ever be used in computing (256 in the case of C, 65536 in the case of Java). When they were shown wrong, libraries needed to be added to support the extra characters.

The OQ (Original Question) was informally phrased. For example, it contrasted C's printing with Java's reading. I would further argue that Java doesn't "read" characters at all in this scenario; it reads bytes and then does some behind-the-scenes conversion to characters. So I was sort of guessing at what the OP was really asking.

- Oliver
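Oliver's 65536 figure is the range of Java's 16-bit char. Java later handled characters beyond U+FFFF not with a new primitive type but with surrogate pairs, plus code-point methods on String and Character. The sketch below shows U+1F600 occupying two chars in a String while counting as one code point. Surrogates and utf16Units are hypothetical names.

```java
import java.util.Arrays;

class Surrogates {

    // The UTF-16 code units (Java chars) needed to store one code point.
    static int[] utf16Units(int codePoint) {
        char[] units = Character.toChars(codePoint);
        int[] values = new int[units.length];
        for (int i = 0; i < units.length; i++) {
            values[i] = units[i];
        }
        return values;
    }

    public static void main(String[] args) {
        // A char is 16 bits wide, so its largest value is 0xFFFF (65535).
        System.out.println("Character.MAX_VALUE = " + (int) Character.MAX_VALUE);

        // U+00E9 fits in a single char...
        System.out.println("U+00E9  -> " + Arrays.toString(utf16Units(0x00E9)));
        // ...but U+1F600 needs a surrogate pair of two chars.
        System.out.println("U+1F600 -> " + Arrays.toString(utf16Units(0x1F600)));

        String s = new String(Character.toChars(0x1F600));
        System.out.println("length() = " + s.length()
                + ", codePointCount = " + s.codePointCount(0, s.length()));
    }
}
```

The pair printed for U+1F600 is [55357, 56832], that is 0xD83D 0xDE00. The String reports a length() of 2 but a codePointCount of 1.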
Re: Do you know how Java reads character values greater than 128/255?
"Oliver Wong" <owong@castortech.com> wrote in message news:XqZfh.77375$aJ6.682653@wagner.videotron.net...
> I was referring to the language built-in datatypes known as "char" in C
> and "char" in Java. Both languages seem to assume that there is a finite
> number of characters that will ever be used in computing (256 in the case
> of C, 65536 in the case of Java). When they were shown wrong, libraries
> needed to be added to support the extra characters.

Yes, but that's an apples-to-oranges comparison. Java has "byte" and "char" for octets and character-set members respectively. C has "char" and "wchar_t" for those purposes. The confusion (if any) arises from the fact that C and Java use the same name ("char") for two different things.

> The OQ (Original Question) was informally phrased. For example, it
> contrasted C's printing with Java's reading. I would further argue that
> Java doesn't "read" characters at all in this scenario; it reads bytes
> and then does some behind-the-scenes conversion to characters.

Any language (or library) that handles multi-byte character sets has to do the same.

> So I was sort of guessing at what the OP was really asking.

I was too.