Best way to parse a csv...... a csv that has CRLF in the fields

 
 
sso
      04-24-2009
Any suggestions as to the best way to parse through a csv file that
has carriage returns in some of the fields? It's in an ods file that I
save to csv. I'm lost....
 
Knute Johnson
      04-24-2009
sso wrote:
> Any suggestions as to the best way to parse through a csv file that
> has carriage returns in some of the fields? It's in an ods file that I
> save to csv. I'm lost....


Is the CRLF a delimiter? In any case, you can use the Scanner class to
do that sort of thing.

--

Knute Johnson
email s/nospam/knute2009/

Mark Space
      04-24-2009
Knute Johnson wrote:
> sso wrote:
>> Any suggestions as to the best way to parse through a csv file that
>> has carriage returns in some of the fields? It's in an ods file that I
>> save to csv. I'm lost....
>
> Is the CRLF a delimiter? In any case, you can use the Scanner class to
> do that sort of thing.

I think he's saying the CRLF is part of the data, and the program has to
distinguish between LF as part of a field, and LF when it ends a line.

Not really easy with Scanner. I can't think of a good way to do it
offhand...
 
Roedy Green
      04-24-2009
On Thu, 23 Apr 2009 20:45:09 -0700 (PDT), sso wrote, quoted or
indirectly quoted someone who said:

>Any suggestions as to the best way to parse through a csv file that
>has carriage returns in some of the fields? It's in an ods file that I
>save to csv. I'm lost....

Use my CSVReader class. It has an allowMultilineFields boolean in the
constructor.

See http://mindprod.com/products1.html#CSV

Other possibilities are listed at http://mindprod.com/jgloss/csv.html
--
Roedy Green Canadian Mind Products
http://mindprod.com

"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
~ Charles Darwin
 
sso
      04-24-2009
On Apr 24, 12:03 am, Knute Johnson wrote:
> sso wrote:
> > Any suggestions as to the best way to parse through a csv file that
> > has carriage returns in some of the fields? It's in an ods file that I
> > save to csv. I'm lost....

>
> Is the CRLF a delimiter? In any case, you can use the Scanner class to
> do that sort of thing.


This is definitely working better. Thanks!

Scanner doesn't seem to like my Chinese characters. The delimiter is |.
Example:

AI YE
艾葉
Folium Artemisiae Argyi
Wormwood/ MOXA|
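
One possible cause, offered as a guess rather than a diagnosis: a Scanner built from a file or stream decodes bytes with the platform default charset unless a charset is named, so a UTF-8 file can come out garbled. A minimal sketch, assuming the file is UTF-8 and `|` is the delimiter as stated above; the names `PipeFields` and `readFields` are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class PipeFields {
    /** Reads '|'-delimited fields from a stream, decoding the bytes as UTF-8 (an assumption). */
    static List<String> readFields(InputStream in) {
        List<String> fields = new ArrayList<>();
        // Name the charset explicitly instead of relying on the platform default.
        try (Scanner sc = new Scanner(in, "UTF-8")) {
            sc.useDelimiter("\\|"); // '|' is a regex metacharacter, so it must be escaped
            while (sc.hasNext()) {
                // Line breaks inside a field are kept; only the outer whitespace is trimmed.
                fields.add(sc.next().trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // \u827e\u8449 are the two Chinese characters from the example above.
        String sample = "AI YE\n\u827e\u8449\nFolium Artemisiae Argyi\nWormwood/ MOXA|next field|";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        for (String field : readFields(in)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

If the characters still look wrong on the console, the console encoding may be the culprit rather than the parsing.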


 
sso
      04-24-2009
On Apr 24, 12:52 am, "Peter Duniho" wrote:
> On Thu, 23 Apr 2009 21:30:00 -0700, Mark Space wrote:
>
> > Knute Johnson wrote:
> >> sso wrote:
> >>> Any suggestions as to the best way to parse through a csv file that
> >>> has carriage returns in some of the fields? It's in an ods file that I
> >>> save to csv. I'm lost....
> >>
> >> Is the CRLF a delimiter? In any case, you can use the Scanner class
> >> to do that sort of thing.
> >
> > I think he's saying the CRLF is part of the data, and the program has to
> > distinguish between LF as part of a field, and LF when it ends a line.
>
> Which begs the question, how does he differentiate between a CRLF
> terminating a line of input, and one that's in a field?
>
> The most obvious answer is that the CRLF is quoted. But whatever the
> indicator, I'd guess that a suitable regex could distinguish the
> individual fields without picking up the CRLF as a terminator for the line
> (you'd have to disable the end-of-line processing for the regex, of
> course).
>
> > Not really easy with Scanner. I can't think of a good way to do it
> > offhand...
>
> I'm not familiar with Scanner, but it looks to me as though you can use a
> custom regex to tell it how to break apart the input line. Assuming he
> can come up with an appropriate regex to do the job, it should be
> relatively easy to move from that to using Scanner for the actual input
> processing.
>
> As far as the exact regex goes, well...that'd be for someone else to
> figure out. I'm not good enough with regular expressions to come up with
> that easily, and don't have the time or interest to work it out myself.
>
> Pete


I could regex it, but there are about 400 records in the file.
Perhaps that would be cumbersome? As far as the LF being a delimiter,
well, it is part of the data, but the records always have the same
number of fields. I will try this CSVReader class.
 
Mark Space
      04-24-2009
Peter Duniho wrote:

>
> As far as the exact regex goes, well...that'd be for someone else to
> figure out.



That's what I'm saying. Sure, as long as one can be determined. I
can't. I saw the regex delimiters on Scanner; I just can't come up with
an actual regex to make it work.

I'm at least somewhat interested, because CSV is common and handy.
There are third-party libraries (like Roedy's), but it would be nice if I
didn't have to download any extra jar files. However, that may not be
possible.
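
For what it's worth, a no-jar route does not have to be a regex. Below is a minimal hand-written state machine, not a Scanner delimiter pattern, and it rests on an assumption nobody in the thread has confirmed: that fields containing line breaks are wrapped in double quotes, with a doubled quote standing for a literal quote (the RFC 4180 style). The names `QuotedCsv` and `parse` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsv {
    /** Splits text into records; CR/LF inside double quotes stays part of the field. */
    static List<List<String>> parse(String text, char delim) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (inQuotes) {
                if (ch != '"') {
                    field.append(ch);          // CR and LF are ordinary data here
                } else if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    field.append('"');         // doubled quote means a literal quote
                    i++;
                } else {
                    inQuotes = false;          // closing quote
                }
            } else if (ch == '"') {
                inQuotes = true;               // opening quote
            } else if (ch == delim) {
                record.add(field.toString());  // end of field
                field.setLength(0);
            } else if (ch == '\n') {
                record.add(field.toString());  // end of field and of record
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else if (ch != '\r') {
                field.append(ch);              // an unquoted CR is dropped
            }
        }
        // Flush a final record that is not newline-terminated.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String sample = "a,\"line1\r\nline2\",c\r\nd,e,f\r\n";
        for (List<String> rec : parse(sample, ',')) {
            System.out.println(rec.size() + " fields: " + rec);
        }
    }
}
```

A tested library is still likely the safer choice for real data; the sketch only shows that the quoted-newline case is tractable in a few dozen lines once the quoting rule is pinned down.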
 
Mayeul
      04-24-2009
sso wrote:
> I could regex it, but there are about 400 records in the file.
> Perhaps that would be cumbersome? As far as the LF being a delimiter,
> well, it is part of the data, but the records always have the same
> number of fields. I will try this CSVReader class.


Obvious question:

Is the last field of each record terminated with a delimiter, or is it
guaranteed _not_ to contain a CRLF?
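
The reason the question matters can be shown with a small sketch. It assumes the first case: every field, including the last one of each record, is followed by the `|` delimiter, and every record has the same known number of fields. Under those assumptions the line breaks carry no structure at all and the tokens can simply be counted off. The names `FixedCountRecords` and `group` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class FixedCountRecords {
    /** Groups '|'-terminated fields into records of fieldsPerRecord fields each. */
    static List<List<String>> group(String text, int fieldsPerRecord) {
        List<List<String>> records = new ArrayList<>();
        List<String> current = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            sc.useDelimiter("\\|"); // escape '|' because the delimiter is a regex
            while (sc.hasNext()) {
                // trim() removes the line break that separates one record from the next
                current.add(sc.next().trim());
                if (current.size() == fieldsPerRecord) {
                    records.add(current);
                    current = new ArrayList<>();
                }
            }
        }
        // A lone blank token is just the line break after the final delimiter.
        boolean onlyTrailingBlank = current.size() == 1 && current.get(0).isEmpty();
        if (!current.isEmpty() && !onlyTrailingBlank) {
            throw new IllegalArgumentException("Incomplete final record: " + current);
        }
        return records;
    }

    public static void main(String[] args) {
        String sample = "AI YE\nsecond line|Folium Artemisiae Argyi|Wormwood/ MOXA|\n"
                      + "B1|B2|B3|\n";
        for (List<String> rec : group(sample, 3)) {
            System.out.println(rec);
        }
    }
}
```

If the last field is not delimiter-terminated, this counting trick breaks down, because a line break inside the last field cannot be told apart from the one that ends the record.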

--
Mayeul
 
Stefan Ram
      04-24-2009
sso writes:
>Any suggestions as to the best way to parse through a csv file that
>has carriage returns in some of the fields? It's in an ods file that I
>save to csv. I'm lost....


To write a parser, I need a specification of the language
used.

The name CSV is not such a specification, because there are
several different languages in the world that are referred to
by CSV.

Given a specification, writing a parser often is
straightforward (for those having learned how to write
parsers).

(There are some languages, for example, C++, that are
difficult to parse, even with proper education and a proper
specification. But most languages named CSV should be easy
to parse.)

 
Lew
      04-24-2009
Mark Space wrote:
> I'm at least somewhat interested, because CSV is common and handy. There
> are third-party libraries (like Roedy's), but it would be nice if I didn't
> have to download any extra jar files. However, that may not be possible.


So it's nicer to reinvent the wheel than to use someone else's tried-and-true
solution?

--
Lew
 