"Domenico Discepola" <> writes:
> Hello. I'm trying to parse a text file into a 2-d array using Text::CSV_XS.
> The input file is structured as follows. "Fields" are separated with a
> "\x0d\x0a" (CRLF) and are enclosed in double-quotes. "Records" are
> separated with a "\x0c" (FF). My fields can contain embedded CRLF's hence
> the need for double-quoting. How can I use Text::CSV_XS to solve my
> problem? My code below only outputs the first line in the input file.
> Thanks in advance.
Text::CSV_XS assumes that it's handed a full record at a time, and
expects you to independently figure out where one record ends and the
next one begins.
So you have three choices.
The easiest is to use Text::xSV instead of Text::CSV_XS. This handles
embedded newlines as you'd expect, and in general works quite well.
Unfortunately I've found it's about 6 times slower than Text::CSV_XS.
If you can't afford that kind of slowdown, read on.
The next easiest thing to do is find record boundaries on your own.
In one application I wrote, I found this worked well; the file I had
always had lines ending in a quote followed by a newline, so I just
kept appending lines to a buffer until I found a quote at the end of a
line that wasn't preceded by an escape character, then passed it on to
Text::CSV_XS. This won't work with all data files, so it might not be
for you.
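
A minimal sketch of that buffering approach follows. It is an
illustration only, not the code from the application mentioned above.
The names ends_record and read_records are made up for this example.
It assumes the escape character is the double quote (the Text::CSV_XS
default) and that the parser is created with binary => 1 so that
embedded newlines inside quoted fields are accepted.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# True if this physical line ends in a quote that is not preceded by the
# escape character. (Hypothetical helper, named for this sketch only.)
sub ends_record {
    my ($line, $escape) = @_;
    (my $text = $line) =~ s/\r?\n\z//;        # ignore the line terminator
    return 0 unless $text =~ /"\z/;           # must end in a quote
    return 1 if length($text) < 2;            # a lone closing quote
    return substr($text, -2, 1) ne $escape ? 1 : 0;
}

# Accumulate physical lines into a buffer until a record boundary is seen,
# then hand the whole buffer to Text::CSV_XS as one record.
sub read_records {
    my ($fh, $csv, $escape) = @_;
    my @rows;
    my $buffer = '';
    while (defined(my $line = <$fh>)) {
        $buffer .= $line;
        next unless ends_record($line, $escape);
        $buffer =~ s/\r?\n\z//;               # drop only the final terminator
        $csv->parse($buffer)
            or die "Cannot parse record ending at line $.\n";
        push @rows, [ $csv->fields ];
        $buffer = '';
    }
    warn "Incomplete record at end of input\n" if length $buffer;
    return \@rows;
}

# Usage: two records, the first with a newline embedded in a quoted field.
my $data = qq{"a","line one\nline two"\n"b","c"\n};
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $csv  = Text::CSV_XS->new({ binary => 1 });
my $rows = read_records($fh, $csv, '"');
print scalar(@$rows), " records read\n";
```

Note the weak spot: with the double quote as the escape character, a
record whose last field is empty ("") looks like an escaped quote and
is not seen as a boundary, which is one reason this won't suit every
file.
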
The third option is to take each line, ask Text::CSV_XS to parse it,
and if it fails, append the next line and try again. This should work
with properly formed CSV files, but will behave poorly in the face of
an error; if there's some corruption on the first line, you may not
read anything, since it will keep appending and finding the same
error.
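
The parse-and-retry idea can be sketched roughly as below. Again this
is only an illustration under stated assumptions: the name
read_records_by_retry and the $max_lines cap are hypothetical, and the
parser is assumed to be created with binary => 1. The cap is not part
of the approach as described; it is one possible guard against the
corruption problem, so that a bad line causes an error instead of
silently swallowing the rest of the file.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Try to parse the buffer after every physical line. If parse() fails,
# keep the buffer, append the next line, and try again.
sub read_records_by_retry {
    my ($fh, $csv, $max_lines) = @_;
    $max_lines ||= 1000;                      # hypothetical safety cap
    my @rows;
    my ($buffer, $count) = ('', 0);
    while (defined(my $line = <$fh>)) {
        $buffer .= $line;
        $count++;
        (my $candidate = $buffer) =~ s/\r?\n\z//;   # drop final terminator
        if ($csv->parse($candidate)) {
            push @rows, [ $csv->fields ];
            ($buffer, $count) = ('', 0);      # start the next record
        }
        elsif ($count >= $max_lines) {
            die "Gave up after $count lines; input near line $. looks corrupt\n";
        }
    }
    die "Unparseable data at end of input\n" if length $buffer;
    return \@rows;
}

# Usage: the same kind of input, with a newline inside a quoted field.
my $data = qq{"a","line one\nline two"\n"b","c"\n};
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $csv  = Text::CSV_XS->new({ binary => 1 });
my $rows = read_records_by_retry($fh, $csv);
print scalar(@$rows), " records read\n";
```
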
Good luck!
----ScottG.