Velocity Reviews - Computer Hardware Reviews

Parsing a comma-separated file

 
 
Justin To
      06-09-2008
Hi, I asked earlier about parsing just one line at a time, and now I'm
working on a program to parse multiple items on each line, something
like the following:

name, age, gender
Bob, 32, M
Stacy, 14, F
...
...

How do I parse 'Bob', knowing it's the first element on the line, '32'
is the second, 'M' is the last...I've been reading about regular
expressions. Is this the best way to solve this problem? And how exactly
do you use them?

Thanks!!
--
Posted via http://www.ruby-forum.com/.

 
ThoML
      06-09-2008
Are you looking for this?
http://fastercsv.rubyforge.org/

Ruby also has the csv standard library.

Regards,
Thomas.
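
For the sample data in the question, a minimal sketch with the csv standard library might look like the following. FasterCSV later became Ruby's standard csv library (from Ruby 1.9), so on a current Ruby `require 'csv'` is assumed to be enough. The inline `data` string and the `strip` lambda are illustrative stand-ins, not part of the original posts.

```ruby
require 'csv'

# Sample data from the question, inlined here instead of read from a file.
data = <<~CSV
  name, age, gender
  Bob, 32, M
  Stacy, 14, F
CSV

# The strip lambda removes the space that follows each comma.
strip = ->(field) { field.nil? ? field : field.strip }

# headers: true treats the first line as column names.
table = CSV.parse(data, headers: true,
                        header_converters: strip,
                        converters: strip)

table.each do |row|
  puts "#{row['name']} is #{row['age']} (#{row['gender']})"
end
```

Each row can then be addressed by column name rather than by position, for example `table[0]['name']` for the first record.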
 
Justin To
      06-09-2008
ThoML wrote:
> Are you looking for this?
> http://fastercsv.rubyforge.org/
>
> Ruby also has the csv standard library.
>
> Regards,
> Thomas.


That is great Thomas! Although, I'd like to know how to do it with the
regular expressions as well.

Thanks!
Avdi Grimm
      06-09-2008
On Mon, Jun 9, 2008 at 12:46 PM, Justin To <(E-Mail Removed)> wrote:
> That is great Thomas! Although, I'd like to know how to do it with the
> regular expressions as well.


I'd recommend using String#split. In the simplest case you could just
specify line.split(','); no regular expressions needed. If you wanted
you could use a regular expression argument to #split in order to skip
whitespace:

line.split(/\s*,\s*/)

but you could just as easily trim the values after the fact too:

line.split(',').map{|v| v.strip}


Regular expressions are not the best solution for parsing CSV,
especially once you start dealing with quoted values.
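
As a rough illustration of the three variants above, and of where they stop working, here is a small sketch. The `quoted` line is a made-up example, not data from the thread.

```ruby
line = "Bob, 32, M"

# Plain split keeps the space after each comma.
raw = line.split(',')

# Regex split absorbs the whitespace around each comma.
by_regex = line.split(/\s*,\s*/)

# Splitting first and stripping afterwards gives the same fields.
stripped = line.split(',').map { |v| v.strip }

# A quoted field that contains a comma is where plain splitting breaks down:
# the name is cut in two and the quote characters stay in the data.
quoted = '"Smith, Bob", 32, M'
broken = quoted.split(/\s*,\s*/)

p raw
p by_regex
p stripped
p broken
```

One difference between the two cleanup styles: the regex form leaves any whitespace at the very start or end of the line in place, while the strip form removes it from every field.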

--
Avdi

Home: http://avdi.org
Developer Blog: http://avdi.org/devblog/
Twitter: http://twitter.com/avdi
Journal: http://avdi.livejournal.com

 
Justin To
      06-09-2008
So is the fasterCSV the most effective way of parsing a comma-separated
file?

Avdi Grimm
      06-09-2008
On Mon, Jun 9, 2008 at 2:08 PM, Justin To <(E-Mail Removed)> wrote:
> So is the fasterCSV the most effective way of parsing a comma-separated
> file?


It is the fastest and most robust way.

Charles Walden
      06-09-2008
My experience (at least a year ago) was that FasterCSV was a great way
to go if you had very clean files without errors, odd characters,
etc. Unfortunately, my files were a bit more problematic, so I ended
up either parsing them myself (split, regexes, etc.) and catching and
handling all the errors, or using the parse_line method in the
standard csv library.
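
A sketch of the line-by-line approach with error handling, assuming a current Ruby where `require 'csv'` loads the standard library (the successor to FasterCSV). The `lines` array is hypothetical input; its last entry has an unclosed quote and is not valid CSV.

```ruby
require 'csv'

# Hypothetical input: the third line opens a quote and never closes it.
lines = [
  'Bob,32,M',
  'Stacy,14,F',
  '"Stan,27,M'
]

good = []
bad  = []

lines.each_with_index do |line, index|
  begin
    good << CSV.parse_line(line)
  rescue CSV::MalformedCSVError => e
    # Keep the line number and message so the record can be fixed by hand.
    bad << [index + 1, e.message]
  end
end

puts "parsed #{good.size} lines, rejected #{bad.size}"
```

Parsing one line at a time means a single bad record does not abort the whole file; the rejected lines can be logged or repaired separately.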
On Jun 9, 2008, at 2:09 PM, Avdi Grimm wrote:

> On Mon, Jun 9, 2008 at 2:08 PM, Justin To <(E-Mail Removed)> wrote:
>> So is the fasterCSV the most effective way of parsing a comma-
>> separated
>> file?

>
> It is the fastest and most robust way.
Justin To
      06-09-2008
Great guys, thanks for the help!
Greg Willits
      06-09-2008
> name, age, gender
> Bob, 32, M
> Stacy, 14, F
> ...
> How do I parse 'Bob', knowing it's the first element on the line, '32'
> is the second, 'M' is the last...I've been reading about regular
> expressions. Is this the best way to solve this problem? And how exactly
> do you use them?


This doesn't handle the full CSV format (quoted values, for example), but
if you know you have clean data like you show above, these are the
rudimentary steps without the one-liner tricks, so each step should be
straightforward to understand. Arranging them as methods of a class
would be good.


# read the whole file into a single string
# (File.read returns a String; IO.readlines would return an Array)

file_text = ''
if File.exist?(file_name)
  file_text = File.read(file_name)
end

# normalize line endings so it doesn't matter what they are
# (double quotes are needed for "\n" to mean a newline)

file_text.strip!
file_text.gsub!(/\r\n/, "\n")
file_text.gsub!(/\r/, "\n")

# normalize comma delimiters so it doesn't matter
# if you have one, two or one,two or one , two etc...
# (only spaces and tabs, so a trailing comma can't swallow a newline)

file_text.gsub!(/[ \t]*,[ \t]*/, ',')

# split the text into a single array of lines

lines_array = file_text.split("\n")

# split each line into an array

final_data = []

lines_array.each do |this_line|
  final_data << this_line.split(',')
end

# final_data is now an array of arrays that looks like this:

[
['name', 'age', 'gender'],
['Bob', '32', 'M'],
['Stacy', '14', 'F']
]

So, to get Bob, you'd have to know his line number, and index into the
record array:

final_data[1][0] # Bob
final_data[2][2] # F
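
If indexing by number gets awkward, one possible follow-up is to pair each value with its column name. This is only a sketch; `header`, `rows`, `records` and `bob` are names introduced here for illustration.

```ruby
final_data = [
  ['name', 'age', 'gender'],
  ['Bob', '32', 'M'],
  ['Stacy', '14', 'F']
]

# The first row holds the column names; the rest are records.
header, *rows = final_data

# Pair each value with its column name to get one hash per record.
records = rows.map { |row| header.zip(row).to_h }

# Look a record up by name instead of by line number.
bob = records.find { |r| r['name'] == 'Bob' }
puts bob['age']
```

This prints 32, and the lookup keeps working if the order of the lines in the file changes.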


-- greg willits

James Gray
      06-09-2008
On Jun 9, 2008, at 4:52 PM, Charles Walden wrote:

> My experience (at least a year ago) was that fastercsv was a great
> way to go if you had very clean files without errors, odd
> characters, etc. Unfortunately, I had files that were a bit more
> problematic and so I ended up using a combination of either parsing
> it myself (split, regexs. etc) and catching all the errors and
> handling them or using the parse_line method in the standard csv
> library.


FasterCSV has a parse_line() method as well, just FYI.

James Edward Gray II

 