Re: Why is java.io.FileInputStream.readBytes my performance bottleneck

 
 
Roedy Green
      07-21-2003
On 21 Jul 2003 11:16:09 -0700, (E-Mail Removed) (Harald Kirsch) wrote
or quoted :

>java.io.FileInputStream.readBytes
>seems to be the bottleneck


Try wrapping the FileInputStream in a BufferedInputStream.
see http://mindprod.com/fileio.html for sample code.

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
 
Harald Kirsch
      07-22-2003
Roedy Green <(E-Mail Removed)> wrote in message news:<(E-Mail Removed)>. ..
> On 21 Jul 2003 11:16:09 -0700, (E-Mail Removed) (Harald Kirsch) wrote
> or quoted :
>
> >java.io.FileInputStream.readBytes
> >seems to be the bottleneck

>
> Try wrapping the FileInputStream in a BufferedInputStream.
> see http://mindprod.com/fileio.html for sample code.


Well, yes and no. What I am actually working with in the end
is a Reader. I wrapped the stream in a BufferedInputStream and it changed nothing.
Then I put a BufferedReader in the chain and the speedup is nice,
but still not convincing. Funnily enough, java.io.FileInputStream.readBytes
is still reported by -Xrunhprof:cpu=samples as consuming most of the
processor time.

To investigate it a bit, I wrote this dummy test program:


import java.io.*;

public class Javabug {
    public static void main(String[] argv)
        throws java.io.IOException
    {
        InputStream is = System.in;
        BufferedInputStream bis = new BufferedInputStream(is);
        InputStreamReader isr = new InputStreamReader(bis);
        BufferedReader br = new BufferedReader(isr);

        // a reader with the stream unbuffered
        BufferedReader xbr = new BufferedReader(new InputStreamReader(is));

        int ch;
        int count = 0;
        while( -1!=(ch=br.read()) ) count += 1;
    }
}

By changing the last line, I played with different types of input
and measured throughput with this command line:

cpipe -vw </dev/zero |java Javabug

cpipe (for counting pipe, see freshmeat.net) measures how long it takes
to write output and prints statistics to stderr as shown below:

1) reading from InputStreamReader isr (via BufferedInputStream)
out: 34.307ms at 3.6MB/s ( 3.7MB/s avg) 100.0MB

This means: after writing out 100MB, the average output rate was
3.7MB/s. (The last buffer of 128k took 34.307ms, equivalent to 3.6MB/s.)

2) reading from BufferedReader br:
out: 4.698ms at 26.6MB/s ( 25.2MB/s avg) 100.0MB

3) reading from BufferedReader xbr (stream not buffered)
out: 4.626ms at 27.0MB/s ( 25.3MB/s avg) 100.0MB
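
Cases 1 and 2 can be approximated without a pipe. The following is a minimal sketch, not the code used for the measurements above: the class name ReaderBufferingSketch, the 8 MB in-memory array standing in for /dev/zero, and the ISO-8859-1 charset are all assumptions made for illustration. It reads one character per call, first from a bare InputStreamReader and then from a BufferedReader wrapped around one, and reports the wall-clock time of each pass. The absolute numbers will depend on the JVM and machine.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ReaderBufferingSketch {
    // Counts characters by calling read() once per character,
    // mirroring the loop in the test program.
    public static long countChars(Reader r) throws IOException {
        long count = 0;
        while (r.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: 8 MB of zero bytes held in memory instead of a pipe from /dev/zero.
        byte[] data = new byte[8 * 1024 * 1024];

        long t0 = System.nanoTime();
        long plain = countChars(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1));
        long t1 = System.nanoTime();
        long buffered = countChars(new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1)));
        long t2 = System.nanoTime();

        System.out.println("InputStreamReader: " + plain + " chars, "
                + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("BufferedReader:    " + buffered + " chars, "
                + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Because the first pass also pays for JIT warm-up, swapping the order of the two passes is a cheap sanity check before reading anything into the difference.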


This shows that buffering the reader helps, but buffering the input
stream does not seem to help when using a reader. Finally, for a really
unfair comparison, look at this:

% cpipe -vw </dev/zero |cat >/dev/null
out: 0.233ms at 536.5MB/s ( 524.4MB/s avg) 100.0MB

Of course 'cat' does not have to deal with character encoding.
Well then, let's compare with buffered reading straight
from System.in. I understand no character encoding happens then, right?

4) reading from bis:
out: 3.666ms at 34.1MB/s ( 32.0MB/s avg) 100.0MB

Compared with 'cat' this is still very bad, i.e. a factor of 16 slower.

Nevertheless, I will change from the 1.4.2-beta version to the real thing
and see what happens.

Harald.
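
One reason the comparison with 'cat' is lopsided is that 'cat' copies data in blocks, while the loop above makes one read() call per byte. A closer Java counterpart would read into a byte array. The sketch below is an illustration only: the class name BlockReadSketch, the method countBytes and the 64 KB buffer size are assumptions, not anything measured in this thread.

```java
import java.io.IOException;
import java.io.InputStream;

public class BlockReadSketch {
    // Counts bytes by reading whole blocks instead of one byte per call.
    public static long countBytes(InputStream in, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        // read(byte[]) returns the number of bytes placed in buf, or -1 at end of stream.
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: 64 KB blocks; the best size depends on the platform.
        System.out.println(countBytes(System.in, 64 * 1024));
    }
}
```

It can be driven the same way as the other tests, for example with cpipe -vw </dev/zero |java BlockReadSketch.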
 
dhek bhun kho
      07-22-2003
(E-Mail Removed) (Harald Kirsch), Tue, 22 Jul 2003 03:38:20 -0700:

> Roedy Green <(E-Mail Removed)> wrote in message news:<(E-Mail Removed)>. ..
>> On 21 Jul 2003 11:16:09 -0700, (E-Mail Removed) (Harald Kirsch) wrote
>> or quoted :


> To investigate it a bit, I wrote this dummy test program:
>
>
> import java.io.*;
>
> public class Javabug {
> public static void main(String[] argv)
> throws java.io.IOException
> {
> InputStream is = System.in;
> BufferedInputStream bis = new BufferedInputStream(is);
> InputStreamReader isr = new InputStreamReader(bis);
> BufferedReader br = new BufferedReader(isr);
>
> // a reader with the stream unbuffered
> BufferedReader xbr = new BufferedReader(new InputStreamReader(is));
>
> int ch;
> int count = 0;
> while( -1!=(ch=br.read()) ) count += 1;
> }
> }
>
> By changing the last line, I played with different types of input
> and measured throughput with this command line:
>
> cpipe -vw </dev/zero |java Javabug
>
> cpipe (for counting pipe, see freshmeat.net) measures how long it takes
> to write output and prints statistics to stderr as shown below:
>
> 1) reading from InputStreamReader isr (via BufferedInputStream)
> out: 34.307ms at 3.6MB/s ( 3.7MB/s avg) 100.0MB
>
> This means: after writing out 100MB the average output rate was
> 3.7MB/s. (While the last buffer of 128k took 34.307ms equiv of 3.6MB/s).
>
> 2) reading from BufferedReader br:
> out: 4.698ms at 26.6MB/s ( 25.2MB/s avg) 100.0MB
>
> 3) reading from BufferedReader xbr (stream not buffered)
> out: 4.626ms at 27.0MB/s ( 25.3MB/s avg) 100.0MB
>
>
> This shows that buffering the reader helps, but buffering the input
> stream does not seem to help when using a reader. Finally, for a really
> unfair comparison, look at this:
>


[---snip--]

Is this test program valid? System.in usually blocks until a newline is
entered (that's what I noticed); did you put the terminal into raw mode
before executing the test?

Greets
Bhun.

 
Gordon Beaton
      07-22-2003
On Tue, 22 Jul 2003 15:42:35 GMT, dhek bhun kho wrote:
> Is this test program valid? System.in usually blocks until a newline
> is entered (that's what I noticed); did you put the terminal into
> raw mode before executing the test?


System.in doesn't wait for newline, it's the terminal (or console, or
whatever) that waits before passing any input to the process.

If the process' stdin isn't connected to a terminal (such as in his
example), there is no line-based buffering involved. The terminal mode
has no bearing on this case.

/gordon

--
[ do not send me private copies of your followups ]
g o r d o n . b e a t o n @ e r i c s s o n . c o m
 
Marco Schmidt
      07-23-2003
Harald Kirsch:

> InputStream is = System.in;
> BufferedInputStream bis = new BufferedInputStream(is);


Try FileReader reader = new FileReader(FileDescriptor.in); instead of
System.in (another version of getting standard input). IIRC I got a
speed-up from that once. Depending on operating system, Java version
and implementation, that may be different for you, though.

Regards,
Marco
--
Please reply in the newsgroup, not by email!
Java programming tips: http://jiu.sourceforge.net/javatips.html
Other Java pages: http://www.geocities.com/marcoschmidt.geo/java.html
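
A minimal sketch of the suggestion above, for anyone who wants to try it. The FileReader(FileDescriptor) constructor is part of java.io; the class name FdReaderSketch, the countChars helper and the extra BufferedReader wrapper are assumptions added for illustration and are not part of the original suggestion. Whether this is any faster than System.in will depend on the platform and Java version, as noted.

```java
import java.io.BufferedReader;
import java.io.FileDescriptor;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

public class FdReaderSketch {
    // Counts characters one read() call at a time.
    public static long countChars(Reader r) throws IOException {
        long count = 0;
        while (r.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Standard input opened through its file descriptor rather than System.in.
        // Assumption: a BufferedReader is added on top, as in the earlier tests.
        Reader reader = new BufferedReader(new FileReader(FileDescriptor.in));
        System.out.println(countChars(reader));
    }
}
```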
 
dhek bhun kho
      07-23-2003
Gordon Beaton <(E-Mail Removed)>, Tue, 22 Jul 2003 16:43:03 +0000:

> On Tue, 22 Jul 2003 15:42:35 GMT, dhek bhun kho wrote:
>> Is this test program valid? System.in usually blocks until a newline
>> is entered (that's what I noticed); did you put the terminal into
>> raw mode before executing the test?

> System.in doesn't wait for newline, it's the terminal (or console, or
> whatever) that waits before passing any input to the process.
>
> If the process' stdin isn't connected to a terminal (such as in his
> example), there is no line-based buffering involved. The terminal mode
> has no bearing on this case.
>
> /gordon


Oops. Thanks. Learnt something new.


 
Steve Horsley
      07-23-2003
On Wed, 23 Jul 2003 12:11:44 -0700, Harald Kirsch wrote:

> (E-Mail Removed) (Harald Kirsch) wrote:
>> % cpipe -vw </dev/zero |cat >/dev/null
>> out: 0.233ms at 536.5MB/s ( 524.4MB/s avg) 100.0MB
>>
>> Of course 'cat' does not have to deal with character encoding.
>> Well then, lets compare with buffered reading right
>> from System.in. I understand no character encoding happens then, right?
>>
>> 4) reading from bis:
>> out: 3.666ms at 34.1MB/s ( 32.0MB/s avg) 100.0MB
>>
>> Compared with 'cat' this is still very bad, i.e. a factor of 16 slower.

>
> Nobody pointed out that 'cat' is really the wrong thing to compare to.
> But now I have the real two things to compare against each other:
>
> A) JAVA
> import java.io.InputStream;
> public class Jbug {
>     public static void main(String[] argv) throws java.io.IOException {
>         int count = 0;
>         while( -1!=(System.in.read()) ) count += 1;
>     }
> }
>
> B) plain old C
> #include <stdio.h>
> int
> main(int argc, char **argv) {
>     int count = 0;
>     while( EOF!=getchar() ) count += 1;
>     return 0;
> }
>
>
> Now the surprising part:
>
> JAVA:
> % head -c `expr 1024 \* 1024 \* 400` /dev/zero |cpipe -vw -b 1024 | java Jbug
> out: 29.227ms at 34.2MB/s ( 33.9MB/s avg) 400.0MB
>
> plain old C after compilation with "cc -O2 -W -Wall -ansi Jbug.c -o Jbug":
> % head -c `expr 1024 \* 1024 \* 400` /dev/zero |cpipe -vw -b 1024|./Jbug
> out: 34.247ms at 29.2MB/s ( 27.7MB/s avg) 400.0MB
>
> Zap ... Ouuch! Java is 22% *faster* here. I am impressed/puzzled.
>
> Harald.


I guess Java has the better optimisation in its compiler. I also guess
that a lot of the time taken up by the Java version is VM startup time.
If I'm right, then I predict two things:

* Java will look even better against larger files.

* Java will slow down if you bother to print count at the end, because
the counting can no longer be optimised away.

Steve
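
Both predictions can be checked with a small change to the Java test. This is a sketch under assumptions, not a result: the class name JbugTimed and the countBytes helper are made up for illustration. It prints the count, so the counting cannot be discarded as dead code, and it times only the read loop with System.nanoTime(), so VM startup is left out of the figure.

```java
import java.io.IOException;
import java.io.InputStream;

public class JbugTimed {
    // Same one-byte-per-call loop as Jbug, but the count is returned and used.
    public static long countBytes(InputStream in) throws IOException {
        long count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        long count = countBytes(System.in);
        long elapsedNanos = System.nanoTime() - start;

        // Printing the count keeps the loop's result observable.
        System.out.println("bytes read: " + count);
        // Timing goes to stderr and covers only the read loop, not JVM startup.
        System.err.println("read loop: " + elapsedNanos / 1_000_000 + " ms");
    }
}
```

Comparing the reported loop time with the end-to-end figure from cpipe gives a rough idea of how much of the total is startup.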
 