Improving performance of code
Hi,
I'm reading a file and doing some operations on it. It is a huge file, running into GBs. The code is working correctly but is very slow. How do I optimise it? My code snippet is:

    class Risk {
        public void compare(String infile) throws IOException {
            cnt=0;
            for(i=0;i<qid.size();i++) {
                no=0;
                fr=new FileReader(infile);
                br=new BufferedReader(fr);
                while((str=br.readLine())!=null) {
                    no++;
                    if((str.startsWith("$"))||(str.startsWith("-CONT-")))
                        continue;
                    else {
                        s2=str.substring(0,10);
                        if(s2.equals(qid.elementAt(i))) {
                            cnt++;
                            start=no;
                            end=no+29;
                            quadarray(infile,start,end);
                        }
                        if((cnt==sc) && (i<qid.size())) {
                            System.out.println("qid="+qid.elementAt(i));
                            cnt=0;
                            writesubcase1();
                        }
                    }
                }
                fr.close();
            }
            for(i=0;i<tid.size();i++) {
                no=0;
                fr=new FileReader(infile);
                br=new BufferedReader(fr);
                while((str=br.readLine())!=null) {
                    no++;
                    if((str.startsWith("$"))||(str.startsWith("-CONT-")))
                        continue;
                    else {
                        s2=str.substring(0,10);
                        if(s2.equals(tid.elementAt(i))) {
                            cnt++;
                            start=no;
                            end=no+29;
                            triaarray(infile,start,end);
                        }
                        if((cnt==sc) && (i<tid.size())) {
                            System.out.println("tid="+tid.elementAt(i));
                            cnt=0;
                            writesubcase2();
                        }
                    }
                }
                fr.close();
            }
        }

        public void quadarray(String ifile,int start,int end) throws IOException {
            try {
                fr1=new FileReader(ifile);
                br1=new BufferedReader(fr1);
                line=0;
                k=0;
                x=0;
                while((str1=br1.readLine())!=null) {
                    line++;
                    if((line>=start) && (line<end)) {
                        if(j==0)
                            quad[j][k]=str1;
                        if((k==3) ||(k==17)||(k==20)) {
                            val1=Double.parseDouble(str1.substring(18,36));
                            if(val1>qmax[x]) {
                                qmax[x]=val1;
                                x++;
                            }
                        }
                        if((k==5) ||(k==8)||(k==22)||(k==25)) {
                            val2=Double.parseDouble(str1.substring(54,72));
                            if(val2>qmax[x]) {
                                qmax[x]=val2;
                                x++;
                            }
                        }
                        if((k==11)||(k==14)||(k==28)) {
                            val3=Double.parseDouble(str1.substring(36,54));
                            if(val3>qmax[x]) {
                                qmax[x]=val3;
                                x++;
                            }
                        }
                        k++;
                    }
                }
            }
            catch (Exception e) { }
        }

        public void writesubcase1() throws IOException {
            x=0;
            try {
                fw=new FileWriter("Result.txt",true);
                for(y=0;y<30;y++) {
                    if((y==0)||(y==1)||(y==2)||(y==4)||(y==6)||(y==7)||(y==9)||
                       (y==10)||(y==12)||(y==13)||(y==15)||(y==16)||(y==18)||(y==19)||
                       (y==21)||(y==23)||(y==24)||(y==26)||(y==27)) {
                        fw.write(quad[0][y]+"\n");
                        continue;
                    }
                    else {
                        if((y==3)||(y==17)||(y==20)) {
                            s=quad[0][y];
                            fw.write(s.substring(0,28)+qmax[x]+s.substring(37)+"\n");
                            x++;
                            continue;
                        }
                        if((y==5)||(y==8)||(y==22)||(y==25)) {
                            s=quad[0][y];
                            fw.write(s.substring(0,64)+qmax[x]+"\n");
                            x++;
                            continue;
                        }
                        if((y==11)||(y==14)) {
                            s=quad[0][y];
                            fw.write(s.substring(0,46)+qmax[x]+s.substring(55)+"\n");
                            x++;
                            continue;
                        }
                        if(y==28) {
                            s=quad[0][y];
                            fw.write(s.substring(0,46)+qmax[x]+"\n");
                            x++;
                            break;
                        }
                    }
                }
                fw.close();
            }
            catch(Exception e) {}
        }

        public void triaarray(String ifile,int start,int end) throws IOException {
            try {
                fr1=new FileReader(ifile);
                br1=new BufferedReader(fr1);
                line=0;
                while((str1=br1.readLine())!=null) {
                    line++;
                    if((line>=start) && (line<end)) {
                        if(j==0)
                            tria[j][k]=str1;
                        if(k==2) {
                            val1=Double.parseDouble(str1.substring(37,54));
                            if(val1>tmax[0])
                                tmax[0]=val1;
                        }
                        if(k==5) {
                            val2=Double.parseDouble(str1.substring(19,36));
                            if(val2>tmax[1])
                                tmax[1]=val2;
                        }
                        k++;
                    }
                }
            }
            catch(Exception e) {}
        }

        public void writesubcase2() {
            try {
                fw=new FileWriter("Result.txt",true);
                for(y=0;y<7;y++) {
                    if((y==0)||(y==1)||(y==3)||(y==4)) {
                        fw.write(tria[0][y]+"\n");
                        continue;
                    }
                    if(y==2) {
                        s=tria[0][y];
                        fw.write(s.substring(0,47)+tmax[0]+s.substring(55)+"\n");
                        continue;
                    }
                    if(y==5) {
                        s=tria[0][y];
                        fw.write(s.substring(0,29)+tmax[1]+"\n");
                        break;
                    }
                }
                fw.close();
            }
            catch(Exception e) {}
        }

        public static void main(String args[]) {
            Risk r=new Risk();
            ipfile=args[0];
            try {
                r.compare(ipfile);
            }
            catch (Exception e) { }
        }
    }

The code spends a lot of time in the functions quadarray and triaarray. As you can see, the code in these functions is very simple, but it still takes a lot of time. How do I improve it?
Re: Improving performance of code
ruds wrote:
> How do I improve it?

1. I don't see any need to read the files twice. Read them once each, and look for both subcases on each line. This will double your speed. If the output comes out in the wrong order, sort it later. BTW you should be closing 'br' not 'fr' in this loop.

2. The loops on 'y' in the writesubcaseN() and xxxarray() methods seem pretty pointless, as you do different things depending on the value of 'y'. Unroll these loops. You could use a lookup table to give you the various offsets you need, and just loop over the lookup table. Or else use a switch statement instead of all the tests on 'y'.

3. The triaarray() and quadarray() methods probably spend most of their time catching up to where you already are in the file. Do you really need to do this?
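A rough sketch of the lookup-table idea from point 2, in Java to match the posted code. The names QuadFieldTable and updateMaxima are made up for illustration. The line indexes and column ranges are copied from the tests on 'k' in quadarray(). The sketch assumes one running maximum per table row, which is a guess at the intent and not a reproduction of how the posted code advances 'x'.

```java
import java.util.Arrays;

class QuadFieldTable {
    // Each row: {line index within the block, substring start, substring end}.
    // Values copied from the tests on 'k' in the posted quadarray().
    static final int[][] FIELDS = {
        {3, 18, 36}, {5, 54, 72}, {8, 54, 72}, {11, 36, 54}, {14, 36, 54},
        {17, 18, 36}, {20, 18, 36}, {22, 54, 72}, {25, 54, 72}, {28, 36, 54},
    };

    /**
     * Folds one block of lines into maxima[], one slot per FIELDS row.
     * Assumption: block has at least 29 lines and the listed lines are
     * at least as wide as the column ranges above.
     */
    static void updateMaxima(String[] block, double[] maxima) {
        for (int i = 0; i < FIELDS.length; i++) {
            int lineIndex = FIELDS[i][0];
            String field = block[lineIndex].substring(FIELDS[i][1], FIELDS[i][2]);
            maxima[i] = Math.max(maxima[i], Double.parseDouble(field.trim()));
        }
    }

    public static void main(String[] args) {
        // Synthetic 72-column row, used only to exercise the table.
        String row = String.format("%-18s%18s%18s%18s", "ID", "1.0", "2.0", "3.0");
        String[] block = new String[29];
        Arrays.fill(block, row);

        double[] maxima = new double[FIELDS.length];
        Arrays.fill(maxima, Double.NEGATIVE_INFINITY);
        updateMaxima(block, maxima);
        System.out.println(Arrays.toString(maxima));
    }
}
```

With the offsets in one table, adding or moving a field means editing a row and not another chain of tests, and the same table could drive the writing side as well.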
Re: Improving performance of code
On Apr 7, 10:03 am, Esmond Pitt <esmond.p...@nospam.bigpond.com> wrote:
> ruds wrote:
> > How do I improve it?
>
> 1. I don't see any need to read the files twice. Read them once each,
> and look for both subcases on each line. This will double your speed. If
> the output comes out in the wrong order, sort it later. BTW you should
> be closing 'br' not 'fr' in this loop.
>
> 2. The loops on 'y' in the writesubcaseN() and xxxarray() methods seem
> pretty pointless, as you do different things depending on the value of
> 'y'. Unroll these loops. You could use a lookup table to give you the
> various offsets you need, and just loop over the lookup table. Or else
> use a switch statement instead of all the tests on 'y'.
>
> 3. The triaarray() and quadarray() methods probably spend most of their
> time catching up to where you already are in the file. Do you really
> need to do this?

I understood suggestions 1 and 2, but for point 3 I don't see any other way out, at least from my point of view. If you can suggest something better than this, you're welcome to. I'm a newbie at handling files.

Thanks a lot.
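One possible way around point 3, offered as a hedged sketch and not a drop-in replacement: when a line matches a wanted ID, keep reading the rest of its block from the same reader, so the file is never reopened and rescanned. The names SinglePassScan, scan, blockLines and handler are hypothetical. The sketch assumes the IDs fit in a Set in memory and that blocks do not overlap, and it leaves out the 'cnt==sc' subcase counting from the posted code.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Consumer;

class SinglePassScan {
    /**
     * Reads the input once. When the first 10 characters of a line match a
     * wanted ID, that line plus the following (blockLines - 1) lines are
     * handed to the handler. An incomplete block at end of input is dropped.
     */
    static void scan(BufferedReader reader, Set<String> wantedIds,
                     int blockLines, Consumer<String[]> handler) {
        Iterator<String> lines = reader.lines().iterator();
        while (lines.hasNext()) {
            String line = lines.next();
            // Same skip rules as the posted code, plus a guard for short lines.
            if (line.startsWith("$") || line.startsWith("-CONT-") || line.length() < 10) {
                continue;
            }
            if (!wantedIds.contains(line.substring(0, 10))) {
                continue;
            }
            String[] block = new String[blockLines];
            block[0] = line;
            int filled = 1;
            while (filled < blockLines && lines.hasNext()) {
                block[filled++] = lines.next();
            }
            if (filled == blockLines) {
                handler.accept(block);
            }
        }
    }

    public static void main(String[] args) {
        // Tiny in-memory input with 3-line blocks, just to show the flow.
        String text = "$ header\nQ000000001 a\nb\nc\nQ000000002 d\ne\nf\n";
        Set<String> wanted = new HashSet<>(Arrays.asList("Q000000002"));
        scan(new BufferedReader(new StringReader(text)), wanted, 3,
             block -> System.out.println(String.join(" | ", block)));
    }
}
```

For the real file, the reader would wrap a FileReader inside a try-with-resources statement and blockLines would be 29, matching 'end=no+29'. Looking IDs up in a Set also removes the need for one full pass over the file per ID.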
Re: Improving performance of code
"ruds" <rudranee@gmail.com> wrote in message news:1175917594.394914.13880@d57g2000hsg.googlegroups.com...
> > How do I improve it?

Indent it and comment it, for a start. In its current state, it's unreadable.
Re: Improving performance of code
Mike Schilling wrote:
> > How do I improve it?
>
> Indent it and comment it, for a start. In its current state, it's
> unreadable.

The apparent lack of indentation is a bug in the newsreader you (and I) are using, not a deficiency in the posted source.

-- chris
Re: Improving performance of code
ruds wrote:
> I'm reading a file and doing some operations on it. It is a huge file,
> running into GBs.
> The code is working correctly but is very slow. How do I optimise it?

I found your code difficult to follow. You could improve it by using case statements instead of lots of ifs, by returning from functions as soon as you know there is nothing else to do (rather than having the "real" code buried inside several nested ifs), and above all (as Mike has already mentioned) by commenting it properly.

So, it's quite possible that I've misread or misunderstood what the code is doing, but if I /haven't/ got it wrong, then I'm puzzled by what quadarray() is doing (and the other similar methods). It /looks/ as if it loops over the entire (huge) input file, keeping count of which line it's looking at (in variable 'k' -- /not/ a good name, unless there's something special in the domain which makes 'k' self-explanatory), and only doing anything with certain numbered lines, 20, 14, 28, and so on. But if that's true, then it doesn't do anything at all with lines > 28, so there is no point in looping over the remaining lines in the input file.

If I'm wrong about that (i.e. if you do have to read data from every, or nearly every, line of the big files), and if Daniel's suggestion about reducing the number of passes isn't suitable, then I don't think there's very much you can do to speed it up.

If I /had/ to maximise the speed of something like this, then I'd first try to work out what was the fastest I could possibly scan data from the files, by writing a small test program which read in all the data as /binary/ (so there are no conversion costs), and which didn't do anything with the data. That would give me a baseline so I could tell whether there was any reasonable speedup available even in theory (there might not be).

If that did turn out to be significantly, /and usefully/, faster than my current code, then I'd consider (i.e. do a few experiments with) doing most of the processing as binary. It seems to me that you don't use most of the data on most lines, so if you can scan the data as binary, and only incur the expense of converting the data you actually need into text, then you might be able to save some time. But there again, it might make almost no difference. Only measurement will tell you (or an analytic, numeric, understanding of the performance could tell you too, but that would require data that I don't have here, and I suspect you don't have either).

BTW, this sounds like one of the examples where profiling is unlikely to be very helpful (like many examples of using profiling, in my experience). Profiling is an excellent tool if you have an unexpected hot-spot in your code which you don't realise is there -- it will point out your error with devastating clarity. But that situation's not too likely to happen to competent programmers[*]. The other case where profiling is useful is where you have a reasonable idea of how long things /should/ take, and you can use profiling to attach actual numbers to your mental model of the performance.

Oh, another thing that's often worth a try (if you are on Windows or some other OS which supports transparent compression in the filesystem), is to tell the OS to compress the data. If your program is primarily IO bound, rather than CPU bound (which sounds likely in your case -- and it's easy for you to check), then compressing the data will reduce the amount of data which has to be read off-disk, albeit at the expense of more processing, which can sometimes be a useful saving.

-- chris

[*] but it never hurts to check, even so -- if you have time...
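The baseline measurement described above might look something like the following. This is only a sketch: the class name RawReadBaseline, the method countBytes and the 64 KB buffer size are arbitrary choices for illustration. It reads every byte without decoding or inspecting it and reports the elapsed time.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class RawReadBaseline {
    /** Reads the whole file as raw bytes, doing nothing with them; returns the byte count. */
    static long countBytes(Path file) throws IOException {
        byte[] buffer = new byte[1 << 16]; // 64 KB, an arbitrary choice
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RawReadBaseline <file>");
            return;
        }
        long start = System.nanoTime();
        long bytes = countBytes(Paths.get(args[0]));
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d bytes in %.2f s%n", bytes, seconds);
    }
}
```

Run it more than once on the same file: a second run may be served largely from the OS file cache and so look much faster than a cold read, which is worth keeping in mind when comparing it against the real program.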
Re: Improving performance of code
Mike Schilling wrote:
>> Indent it and comment it, for a start. In its current state, it's
>> unreadable.

Chris Uppal wrote:
> The apparent lack of indentation is a bug in the newsreader you (and I) are
> using, not a deficiency in the posted source.

I'm using Thunderbird. I see the original post's indentation, and that it was done with the TAB character.

No doubt the space character would not have caused such difficulties. Even though I can see the indentation, the TAB character makes it so wide as to damage readability.

So either way, OP, using TABs to indent Usenet posts is a Bad Thing.

-- Lew
Re: Improving performance of code
Lew wrote:
> Mike Schilling wrote:
>>> Indent it and comment it, for a start. In its current state, it's
>>> unreadable.
>
> Chris Uppal wrote:
>> The apparent lack of indentation is a bug in the newsreader you (and
>> I) are using, not a deficiency in the posted source.
>
> I'm using Thunderbird. I see the original post's indentation, and that
> it was done with the TAB character.
>
> No doubt the space character would not have caused such difficulties.
> Even though I can see the indentation, the TAB character makes it so
> wide as to damage readability.
>
> So either way, OP, using TABs to indent Usenet posts is a Bad Thing.

I am not that worried about the indentation, because if I get serious about looking at a posted program I copy it into Eclipse and click Source-Format.

I do think the first step in a performance campaign should be making sure the code is properly commented, as well as having meaningful identifiers, no arbitrary, unexplained constants etc. The big improvements usually depend on understanding the code, so that data structures and algorithms can be changed.

Patricia
Re: Improving performance of code
"ruds" <rudranee@gmail.com> wrote in news:1175917594.394914.13880@d57g2000hsg.googlegroups.com:
> How do I improve it?

1. USE MEANINGFUL VARIABLE NAMES (i.e. more than just a single letter)!

2. Pay attention to horizontal white space -- it makes code a LOT easier to read if there are spaces. Use:

    if ((str.startsWith("$")) || (str.startsWith("-CONT-")))

instead of

    if((str.startsWith("$"))||(str.startsWith("-CONT-")))

3. Declare ALL of your variables before you use them. In quadarray() it appears to me that the variables "j", "quad", "str1", "val1", "qmax", "val2" are used without having been previously declared.

Just a few suggestions that will prevent your name being cursed by those who come after you and maintain your code.

Cheers!

--
Greg R. Broderick gregb+usenet200612@blackholio.dyndns.org

A. Top posters.
Q. What is the most annoying thing on Usenet?
Re: Improving performance of code
Chris Uppal wrote:
> Mike Schilling wrote:
>
>>> How do I improve it?
>> Indent it and comment it, for a start. In its current state, it's
>> unreadable.
>
> The apparent lack of indentation is a bug in the newsreader you (and I) are
> using, not a deficiency in the posted source.

Thunderbird shows all of the tabs, which should have been replaced by two or maybe three spaces each. It certainly was indented.