CSV Parsing algorithms in Java

 
 
Jeffrey Spoon
 
      11-03-2006



Hello, has anybody seen well-known/good practice CSV parsing algorithms
in Java? I've been googling about but can't see anything suitable so
far. I'm not interested in using library functions, rather implementing
the algorithm myself (or at least learning how to).

Any pointers appreciated, thanks.



--
Jeffrey Spoon

 
 
 
 
 
David Segall
 
      11-03-2006
Jeffrey Spoon <(E-Mail Removed)> wrote:

>
>
>
>Hello, has anybody seen well-known/good practice CSV parsing algorithms
>in Java? I've been googling about but can't see anything suitable so
>far. I'm not interested in using library functions, rather implementing
>the algorithm myself (or at least learning how to).
>
>Any pointers appreciated, thanks.

Roedy Green has assembled some useful information on this topic.
<http://mindprod.com/jgloss/csv.html>
 
 
 
 
 
JanTheKing
 
      11-04-2006
You may get some ideas from the Apache POI project.

On Nov 3, 9:32 pm, David Segall <(E-Mail Removed)> wrote:
> Jeffrey Spoon <(E-Mail Removed)> wrote:
>
> >Hello, has anybody seen well-known/good practice CSV parsing algorithms
> >in Java? I've been googling about but can't see anything suitable so
> >far. I'm not interested in using library functions, rather implementing
> >the algorithm myself (or at least learning how to).
> >
> >Any pointers appreciated, thanks.
>
> Roedy Green has assembled some useful information on this topic.
> <http://mindprod.com/jgloss/csv.html>


 
 
Davide Consonni
 
      11-04-2006
Jeffrey Spoon wrote:

> Hello, has anybody seen well-known/good practice CSV parsing algorithms
> in Java? I've been googling about but can't see anything suitable so
> far. I'm not interested in using library functions, rather implementing
> the algorithm myself (or at least learning how to).
>
> Any pointers appreciated, thanks.


Take a look at my project:
http://csvtosql.sourceforge.net

--
Davide Consonni <(E-Mail Removed)> http://csvtosql.sourceforge.net
Linux: enough with the hourglasses on the screen! -- By Zuse

 
 
Jeffrey Spoon
 
      11-04-2006
In message <(E-Mail Removed)>, David Segall
<(E-Mail Removed)> writes
>Roedy Green has assembled some useful information on this topic.
><http://mindprod.com/jgloss/csv.html>



Thanks, I had a look. The reason I'm asking is that I had a graduate
role interview and one of the questions was to write a CSV parser. I
didn't know how to anyway, but in Roedy's code the get() method alone
is 200 lines. Am I really expected to know this stuff off by
heart?


Thanks to the others who made suggestions as well; I'll get around to them.



--
Jeffrey Spoon

 
 
Stefan Ram
 
      11-04-2006
Jeffrey Spoon <(E-Mail Removed)> writes:
>Thanks, I had a look. The reason I'm asking is because I had a graduate
>role interview and they asked this as a question, as in to write one. I
>didn't know how to anyway, but looking at Roedy's, just the get() method
>is 200 lines, am I really expected to know this stuff off by
>heart?


The correct answer would have been:

»There are dozens of different formal languages, all
referred to by the name of "CSV". Some differ only by
minor details, but these are important when one wants to
write a parser. So, I would like to invite you to join me
in a process to figure out the exact specifications of the
language you want me to parse or - if available - please
give me a language specification«.

Once all such questions had been cleared up, I would have
been able to write a parser from scratch, provided the interviewer
had the patience to wait for me to finish it. Having the Java
SE API documentation at hand might help with this.
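
As an illustration of why those details matter, here is a minimal sketch in which one input line yields different fields under two dialects. The `DialectDemo` class and the parameters of `split` are hypothetical names chosen for this example. The quoting rule it implements (a doubled quote inside a quoted field stands for one literal quote) is the RFC 4180 convention, which is only one of the dialects in use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one line under a caller-chosen dialect.
final class DialectDemo {

    /**
     * Splits a single line into fields.
     *
     * @param separator field separator, e.g. ',' or ';'
     * @param quoting   if true, '"' encloses fields and "" is a literal quote;
     *                  if false, '"' is an ordinary character
     */
    static List<String> split(String line, char separator, boolean quoting) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quoting && c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"');    // doubled quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == separator && !inQuotes) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());     // the last field has no trailing separator
        return fields;
    }
}
```

With quoting enabled, `DialectDemo.split("\"Smith, John\",42", ',', true)` returns two fields, `Smith, John` and `42`. With quoting disabled the same line returns three fields, and the quote characters stay in the data. Neither result is wrong in itself; which one is right depends on the specification.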

 
 
Jeffrey Spoon
 
      11-04-2006
In message <(E-Mail Removed)-berlin.de>, Stefan Ram
<(E-Mail Removed)-berlin.de> writes

>
> The correct answer would have been:
>
> ›There are dozens of different formal languages, all
> referred to by the name of "CSV". Some differ only by
> minor details, but these are important, when one wants to
> write a parser. So, I would like to invite you to join me
> in a process to figure out the exact specifications of the
> language you want me to parse or - if available - please
> give me a language specification‹.
>
> After all such questions would have been cleared, I would have
> been able to write a parser from scratch if the interviewer
> would have the patience to wait for me to finish it. The Java
> SE API documentation at hand might be helpful during this.
>


So that's a no then?

They did specify that some of the values may contain double quotes.
I had two other questions to do as well, in 30 minutes. One was a fairly
advanced SQL question (for me anyway) and the other was easy enough,
about client/server stuff. They left me to write the answers down with
no references other than the question sheet. Oh, and there were some
other multiple choice questions, but they were fairly straightforward.




--
Jeffrey Spoon

 
 
Stefan Ram
 
      11-04-2006
Jeffrey Spoon <(E-Mail Removed)> writes:
>So that's a no then?
>They did specify that some of the values may contain double quotes.
>I had two other questions to do as well, in 30 minutes.


Assuming there are only about 10 minutes to write such a
parser on paper without any reference, it is indeed difficult.

Let me see what I can write in 10 minutes without a
reference:

// 2006-11-04T17:48:18+01:00

public class CsvParser
{ private CsvScanner tokenSource;
public CsvParser( final CsvScanner tokenSource )
{ this.tokenSource = tokenSource; }

// 2006-11-04T17:50:09+01:00

public void parseAll()
{ while( tokenSource.isMoreInSource() )parseLine(); }

// 2006-11-04T17:51:26+01:00

public void parseLine()
{ while( tokenSource.isMoreInLine() )parseValue(); }

// 2006-11-04T17:54:43+01:00

public void parseValue()
{ final Token token = tokenSource.getToken();
token.to( new TokenProcessor()
{ public void processNumericStart(){ /* todo */ }
public void processTextStart(){ /* todo */ }
/* here my time limit was reached */

// 2006-11-04T17:58:31+01:00

Sometimes an interviewer might give you an "impossible"
task just to see how you cope with that.
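
For comparison, here is one way the parseAll/parseLine/parseValue skeleton above might be carried through to completion. It is a sketch under stated assumptions, not Stefan's intended design. It assumes an RFC 4180-style dialect: comma separator, double-quoted fields, `""` for a literal quote, and line breaks allowed inside quoted fields. The `CsvScanner` and `Token` abstractions, whose details the post leaves open, are replaced here by one-character lookahead over a `PushbackReader`. The class name `TinyCsvParser` is hypothetical.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical recursive-descent parser: parseAll -> parseLine -> parseValue.
final class TinyCsvParser {
    private final PushbackReader in;

    TinyCsvParser(Reader source) {
        this.in = new PushbackReader(source, 1);
    }

    /** Convenience entry point for in-memory text. */
    static List<List<String>> parse(String text) {
        try {
            return new TinyCsvParser(new StringReader(text)).parseAll();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads records until end of input. */
    List<List<String>> parseAll() throws IOException {
        List<List<String>> records = new ArrayList<>();
        while (peek() != -1) {
            records.add(parseLine());
        }
        return records;
    }

    /** Reads one record: values separated by commas, ended by a line break or EOF. */
    List<String> parseLine() throws IOException {
        List<String> values = new ArrayList<>();
        while (true) {
            values.add(parseValue());
            int c = in.read();
            if (c == ',') continue;                     // another value follows
            if (c == '\r' && peek() == '\n') in.read(); // treat CRLF as one break
            return values;                              // line break or EOF ends the record
        }
    }

    /** Reads one value, quoted or bare, leaving the delimiter unread. */
    String parseValue() throws IOException {
        StringBuilder value = new StringBuilder();
        if (peek() == '"') {
            in.read();                                  // consume the opening quote
            while (true) {
                int c = in.read();
                if (c == -1) throw new IOException("unterminated quoted value");
                if (c == '"') {
                    if (peek() != '"') break;           // closing quote
                    in.read();                          // "" means one literal quote
                }
                value.append((char) c);
            }
        }
        // Bare value, or any stray text after a closing quote.
        for (int c = peek(); c != -1 && c != ',' && c != '\n' && c != '\r'; c = peek()) {
            value.append((char) in.read());
        }
        return value.toString();
    }

    /** Returns the next character without consuming it, or -1 at EOF. */
    private int peek() throws IOException {
        int c = in.read();
        if (c != -1) in.unread(c);
        return c;
    }
}
```

Calling `TinyCsvParser.parse(text)` returns a list of records, each a list of strings. The sketch makes no attempt to distinguish numeric from text values, which the original skeleton hints at with `processNumericStart` and `processTextStart`.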

 
 
Simon Brooke
 
      11-04-2006
in message <(E-Mail Removed)>, Jeffrey Spoon
('(E-Mail Removed)') wrote:

>
> Thanks, I had a look. The reason I'm asking is because I had a graduate
> role interview and they asked this as a question, as in to write one. I
> didn't know how to anyway, but looking at Roedy's, just the get() method
> is 200 lines, am I really expected to know this stuff off by
> heart?
>
> Thanks to the others who suggested as well, I'll get around to them.


Heavens, writing a CSV parser is trivial. It's simply a case of a
StringTokenizer in a for loop:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.StringTokenizer;

public ResultClass parse( InputStream in, String separatorChars)
        throws IOException
{
    ResultClass result = new ResultClass();
    BufferedReader buffy =
        new BufferedReader( new InputStreamReader( in));

    /* read one line at a time until readLine() signals end of input */
    for ( String line = buffy.readLine(); line != null;
            line = buffy.readLine())
    {
        StringTokenizer tok =
            new StringTokenizer( line, separatorChars);

        while ( tok.hasMoreTokens())
        {
            // do something with result and tok.nextToken()
        }
    }
    /* consider (and document) whether it's your or the caller's
     * responsibility to close the stream; since you were passed the
     * stream I suggest it's the caller's */

    return result;
}

As to what that ResultClass object should be, if the first line in your CSV
may be column headers and each value in the first row is distinct then
probably what you want is a vector of maps where the keys of the maps are
the corresponding values from the first line; otherwise I'd probably just
return a vector of vectors.
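
A minimal sketch of the "vector of maps" idea, using `List` and `LinkedHashMap` in place of `Vector`. The `HeaderRows` class and its method names are hypothetical, and the input is assumed to contain no quoted fields. The sketch swaps in `String.split` with a negative limit for the `StringTokenizer` loop, because `StringTokenizer` treats adjacent delimiters as a single delimiter and so silently drops empty fields, which would shift values into the wrong columns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

final class HeaderRows {

    /** Turns lines into maps keyed by the values of the first (header) line. */
    static List<Map<String, String>> toMaps(List<String> lines, char separator) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) return rows;
        String sep = Pattern.quote(String.valueOf(separator));
        String[] header = lines.get(0).split(sep, -1);
        for (String line : lines.subList(1, lines.size())) {
            String[] values = line.split(sep, -1);   // -1 keeps trailing empty fields
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length; i++) {
                // Short rows are padded with empty strings; extra values are ignored.
                row.put(header[i], i < values.length ? values[i] : "");
            }
            rows.add(row);
        }
        return rows;
    }

    /** Counts tokens the way a StringTokenizer loop would see them. */
    static int tokenizerCount(String line, String separatorChars) {
        return new StringTokenizer(line, separatorChars).countTokens();
    }
}
```

For the lines `id,name,city` and `1,,Oslo`, `toMaps` produces one row in which `name` maps to the empty string and `city` to `Oslo`. `tokenizerCount("1,,Oslo", ",")` reports only two tokens for the same data line, so a tokenizer-based loop would pair `Oslo` with `name`.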

Obviously you may not want to schlurp a whole CSV file into core memory at
one go; it may be better to produce a parser to which you can add
callbacks/listeners for the fields or patterns you are interested in. But
the general pattern is as given.
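
The callback approach mentioned above could look roughly like this. `RowListener` and `StreamingCsv` are hypothetical names, and the field splitting is deliberately naive (no quote handling). The point of the sketch is only that each row is handed to the listener as soon as it is read, so the whole file never has to sit in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback: receives each row as it is parsed. */
interface RowListener {
    void onRow(int lineNumber, String[] fields);
}

final class StreamingCsv {

    /** Reads line by line and notifies the listener; the caller closes the reader. */
    static void parse(Reader source, RowListener listener) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        int lineNumber = 0;
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            lineNumber++;
            listener.onRow(lineNumber, line.split(",", -1));
        }
    }

    /** Example use: collect only the second column and keep nothing else. */
    static List<String> secondColumn(String text) {
        List<String> column = new ArrayList<>();
        try {
            parse(new StringReader(text), (lineNumber, fields) -> {
                if (fields.length > 1) column.add(fields[1]);
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return column;
    }
}
```

`StreamingCsv.secondColumn("a,b\nc,d\ne\n")` collects `b` and `d` and skips the third line, which has no second field. A listener that filters on a pattern or writes rows straight to a database would follow the same shape.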

--
(E-Mail Removed) (Simon Brooke) http://www.jasmine.org.uk/~simon/
;; Let's have a moment of silence for all those Americans who are stuck
;; in traffic on their way to the gym to ride the stationary bicycle.
;; Rep. Earl Blumenauer (Dem, OR)
 
 
Karl Uppiano
 
      11-04-2006

"Simon Brooke" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Heavens, writing a CSV parser is trivial. It's simply a case of a
> StringTokenizer in a for loop:



or this:

String[] columnData = rowData.split("[,]");
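
A short sketch of what this one-liner does and does not handle; `SplitDemo` is a hypothetical class name. `String.split` with the default limit discards trailing empty strings, and it has no notion of quoting, so it is only adequate for input known to contain neither empty trailing columns nor quoted commas.

```java
final class SplitDemo {

    /** The one-liner as posted: trailing empty fields are dropped. */
    static String[] naive(String rowData) {
        return rowData.split("[,]");
    }

    /** The same split with a negative limit, which keeps trailing empty fields. */
    static String[] keepEmpty(String rowData) {
        return rowData.split("[,]", -1);
    }
}
```

For the row `a,b,,` the posted form returns two elements while `keepEmpty` returns four, the last two being empty strings. For the row `"x,y",z` both forms return three pieces, because the comma inside the quotes is treated as a separator.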


 
 
 
 