Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > Parsing excel CVS data on a mac OSX to extract blocks of cells

Reply
Thread Tools

Parsing excel CVS data on a mac OSX to extract blocks of cells

 
 
anne001
Guest
Posts: n/a
 
      11-21-2005
I would like to parse some excel CVS data which has a repetitive block
pattern

"Experiment ID: 1",,,,,,,,,,,,
"Subject ID: 1013938829432171e868c340.
Trial,stimulus,time,type,field1,field2,text_respon se,Abs. time of
response,,,,,
26,undefined,14828,KEY,RETURN,UNUSED,DCS,Sat Oct 15 17:48:04 GMT-0400
2005,,,,,
23,undefined,15078,KEY,RETURN,UNUSED,244,Sat Oct 15 17:48:19 GMT-0400
2005,,,,,
7,nixontrialleft copy.pct [TAG: 1],5953,KEY,1,UNUSED,,Sat Oct 15
17:49:24 GMT-0400 2005,,,,,
8,nixontrialfront copy.pct [TAG: 3],6250,KEY,3,UNUSED,,Sat Oct 15
17:49:31 GMT-0400 2005,,,,,
9,nixontrialright copy.pct [TAG: 2],2469,KEY,2,UNUSED,,Sat Oct 15
17:49:34 GMT-0400 2005,,,,,
#####
more data
######
,,,,,,,,,,4374.347222,,
,,,,,,,,,,,,1.00
,,,,,,,,,,,,0.93
### and a new block starts
"Experiment ID: 3",,,,,,,,,,,,0.92
....

Question 1:
------------------
Arr = IO.readlines(File.expand_path("~/Desktop/FaceRetest.cvs"))
has a length of 1, why?
I noticed that the puts has ^M everywhere
Arr =
IO.readlines(File.expand_path("~/Desktop/FaceRetest.cvs"),sep_string="^M")
seems to split the array into lines.
Arr = Arr.split(",")
gives me a message, private method `split' called
puts Arr.length gets me no response. I expected split to take the 1D
array and transform it into a 2D array.

What is the best way to get the excel CVS data into a Ruby 2D array for
further analysis?

Question 2:
-----------------
Once I have a 2D array, what is the best way to find the index of the
key word Trial (which starts the data I want).

In matlab, I would do something like,
Find the index pair (I,J) at which Array = Trial
If J==1
Extract the array starting at row I+1 to I+82, for columns [2, 3, and
5]
and I would love some pointers at Ruby structures, methods, etc I need
to look at.
- First ruby code -- Thanks for your help

 
Reply With Quote
 
 
 
 
anne001
Guest
Posts: n/a
 
      11-21-2005
PS: I have found a tutorial which should help me through part 1.
It extracts columns from a CVS file, and working my way through it, I
should figure out how to use split.

http://www.linuxdevcenter.com/pub/a/...sv.html?page=1

I would love some pointers to question 2.

 
Reply With Quote
 
 
 
 
Ara.T.Howard
Guest
Posts: n/a
 
      11-21-2005
On Mon, 21 Nov 2005, anne001 wrote:

> Question 1:


require "csv"

path = File::expand_path "~/Desktop/FaceRetest.cvs"

table = [] # 2d array
CSV:pen(path, 'r'){|row| table << row}


that ought to get you going.


-a
--
================================================== =============================
| ara [dot] t [dot] howard [at] gmail [dot] com
| all happiness comes from the desire for others to be happy. all misery
| comes from the desire for oneself to be happy.
| -- bodhicaryavatara
================================================== =============================

 
Reply With Quote
 
James Edward Gray II
Guest
Posts: n/a
 
      11-21-2005
On Nov 20, 2005, at 7:37 PM, anne001 wrote:

> I would like to parse some excel CVS data which has a repetitive block
> pattern
>
> "Experiment ID: 1",,,,,,,,,,,,
> "Subject ID: 1013938829432171e868c340.
> Trial,stimulus,time,type,field1,field2,text_respon se,Abs. time of
> response,,,,,
> 26,undefined,14828,KEY,RETURN,UNUSED,DCS,Sat Oct 15 17:48:04 GMT-0400
> 2005,,,,,
> 23,undefined,15078,KEY,RETURN,UNUSED,244,Sat Oct 15 17:48:19 GMT-0400
> 2005,,,,,
> 7,nixontrialleft copy.pct [TAG: 1],5953,KEY,1,UNUSED,,Sat Oct 15
> 17:49:24 GMT-0400 2005,,,,,
> 8,nixontrialfront copy.pct [TAG: 3],6250,KEY,3,UNUSED,,Sat Oct 15
> 17:49:31 GMT-0400 2005,,,,,
> 9,nixontrialright copy.pct [TAG: 2],2469,KEY,2,UNUSED,,Sat Oct 15
> 17:49:34 GMT-0400 2005,,,,,
> #####
> more data
> ######
> ,,,,,,,,,,4374.347222,,
> ,,,,,,,,,,,,1.00
> ,,,,,,,,,,,,0.93
> ### and a new block starts
> "Experiment ID: 3",,,,,,,,,,,,0.92
> ....
>
> Question 1:
> ------------------
> Arr = IO.readlines(File.expand_path("~/Desktop/FaceRetest.cvs"))
> has a length of 1, why?


Well, if you're on Windows, Ruby is looking for a \015\012 sequence
to end the line. On Unix it would be looking for a \012...

> I noticed that the puts has ^M everywhere


^M (control-M) is just \015. So it's not seeing the line endings.
Some OSes actually used this line ending, like Mac OS 9 and lower,
but it's pretty rare.

> Arr =
> IO.readlines(File.expand_path("~/Desktop/
> FaceRetest.cvs"),sep_string="^M")


That's a fine fix. You should be able to replace sep_string="^M"
with just "\r", I think.

> seems to split the array into lines.
> Arr = Arr.split(",")
> gives me a message, private method `split' called


First, don't start Ruby variable names with a capital letter. This
isn't your problem here, but it's still not a habit to get into. A
capital variable is a constant in Ruby.

The real problem here is that Arr is an Array, and you are calling a
String function on it, split(). Try:

Arr.first.split(",")

# ... or ...

Arr.map { |row| row.split(",") }

But it's better to use a real parser as Ara suggested.

> puts Arr.length gets me no response. I expected split to take the 1D
> array and transform it into a 2D array.
>
> What is the best way to get the excel CVS data into a Ruby 2D array
> for
> further analysis?


Try:

require "csv"
arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))

It gives two-dimensional arrays:

Neo:~/Desktop$ cat data.csv
1,2,3
4,,5
Neo:~/Desktop$ ruby -r pp -r csv -e 'pp CSV.read("data.csv")'
[["1", "2", "3"], ["4", nil, "5"]]

> Question 2:
> -----------------
> Once I have a 2D array, what is the best way to find the index of the
> key word Trial (which starts the data I want).


Hmm, what about something like:

in_section = false

csv.each do |row|
if row.first == "Trial"
in_section = true
next
elsif in_section
# process row here...
end
end

Hope that gets you going.

James Edward Gray II


 
Reply With Quote
 
Gregory Brown
Guest
Posts: n/a
 
      11-21-2005
On 11/20/05, anne001 <(E-Mail Removed)> wrote:
> What is the best way to get the excel CVS data into a Ruby 2D array for
> further analysis?


This might be overkill and more useful when it is more complete, but
Ruby Reports has basic support for stuff like this, via DataSets.
---
require "rubygems"
require "ruportlib"

my_data =3D Report:ataSet.load("foo.csv")
---
this will create a DataSet filled with DataRows which can be indexed
ordinally or by field name. (The first row of the CSV is used to
define field names)

example:
---
my_data.each do |row|
puts row["name"]
end
---
would print the value of the "name" column if it existed.

both DataSet and DataRow are Enumerable, which might make finding your
data easier.

if you want to try this approach,

sudo gem install ruport

should get you the latest version

The rather sparse documentation is housed at:
http://ruport.rubyforge.org/docs/

You'd want to look at (and inspect the source on) Report:ataSet and
Report:ataRow

HTH,
Greg


 
Reply With Quote
 
Gregory Brown
Guest
Posts: n/a
 
      11-21-2005
On 11/21/05, Gregory Brown <(E-Mail Removed)> wrote:
> On 11/20/05, anne001 <(E-Mail Removed)> wrote:
> > What is the best way to get the excel CVS data into a Ruby 2D array for
> > further analysis?

>
> This might be overkill and more useful when it is more complete, but
> Ruby Reports has basic support for stuff like this, via DataSets.


As a side not, you do not need to install DBI or any database related
dependencies to use this set of features.


 
Reply With Quote
 
anne001
Guest
Posts: n/a
 
      11-21-2005
Thank you so much for your help
The data was collected by another lab. I don't know what they use. I
saved their excel file data into a csv file. I don't know if there is a
way of telling excel to do a better job. I am on panther, with
office 2004 for mac.

require "csv"
arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))

I get an error:
/usr/local/lib/ruby/1.8/csv.rb:607:in `get_row':
CSV::IllegalFormatError (CSV::IllegalFormatError)

Ara's formulation does the same thing.

I think there is a way to replace special characters in emacs but what
I tried so far does not work.

I thought I could use readlines and a puts and then copy the ruby
output, which would not longer have the ^M, but somehow, ruby gets
mixed up, and suddenly stop parsing the lines, so I can't relie on ruby
readline either. It sounds like I need to figure out how to format the
input file better. And then I will be able to use your suggestions.

thank you

 
Reply With Quote
 
James Edward Gray II
Guest
Posts: n/a
 
      11-21-2005
On Nov 21, 2005, at 6:47 AM, anne001 wrote:

> Thank you so much for your help
> The data was collected by another lab. I don't know what they use. I
> saved their excel file data into a csv file. I don't know if there
> is a
> way of telling excel to do a better job. I am on panther, with
> office 2004 for mac.


Excel might be legacy enough that it's still using \r as a return
here, for the old Mac OS. Try this:

ruby -p -i.bak -e 'sub("\r", "\n")' your_csv_file_here.csv

> require "csv"
> arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))
>
> I get an error:
> /usr/local/lib/ruby/1.8/csv.rb:607:in `get_row':
> CSV::IllegalFormatError (CSV::IllegalFormatError)
>
> Ara's formulation does the same thing.


I think this is a line ending issue. The CSV format does not allow a
naked \r outside of a quoted field. Try my fix above and see if it
gets you going, or you can switch the line ending for CSV's readlines():

require "csv"
arr = CSV.readlines(File.expand_path("~/Desktop/FaceRetest.cvs"), "\r")

If you're still having trouble and it's not a privacy concern, you
are welcome to email me the CSV file off list, and I will try to
figure out how to read it.

James Edward Gray II



 
Reply With Quote
 
Gene Tani
Guest
Posts: n/a
 
      11-21-2005

anne001 wrote:
> Thank you so much for your help
> The data was collected by another lab. I don't know what they use. I
> saved their excel file data into a csv file. I don't know if there is a
> way of telling excel to do a better job. I am on panther, with
> office 2004 for mac.
>
> require "csv"
> arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))
>
> I get an error:
> /usr/local/lib/ruby/1.8/csv.rb:607:in `get_row':
> CSV::IllegalFormatError (CSV::IllegalFormatError)
>
> Ara's formulation does the same thing.
>
> I think there is a way to replace special characters in emacs but what
> I tried so far does not work.
>
> I thought I could use readlines and a puts and then copy the ruby
> output, which would not longer have the ^M, but somehow, ruby gets
> mixed up, and suddenly stop parsing the lines, so I can't relie on ruby
> readline either. It sounds like I need to figure out how to format the
> input file better. And then I will be able to use your suggestions.
>
> thank you


There's a few ways to handle converting newlines between O/S:

- require 'ptools'; File#nl_convert here:
http://ruby-miscutils.sourceforge.net/ptools.html

- specify $/ input record separator and $\ output separators for
IO#gets, readline, each_line etc.

- Komodo has a Code / Clean Line Endings which will convert to the O/S
format you're
running it on

but there's no automatic conversion like python
open("filename.ext","U"), I think

 
Reply With Quote
 
Ara.T.Howard
Guest
Posts: n/a
 
      11-21-2005
On Mon, 21 Nov 2005, anne001 wrote:

> Thank you so much for your help
> The data was collected by another lab. I don't know what they use. I
> saved their excel file data into a csv file. I don't know if there is a
> way of telling excel to do a better job. I am on panther, with
> office 2004 for mac.
>
> require "csv"
> arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))
>
> I get an error:
> /usr/local/lib/ruby/1.8/csv.rb:607:in `get_row':
> CSV::IllegalFormatError (CSV::IllegalFormatError)
>
> Ara's formulation does the same thing.
>
> I think there is a way to replace special characters in emacs but what
> I tried so far does not work.
>
> I thought I could use readlines and a puts and then copy the ruby
> output, which would not longer have the ^M, but somehow, ruby gets
> mixed up, and suddenly stop parsing the lines, so I can't relie on ruby
> readline either. It sounds like I need to figure out how to format the
> input file better. And then I will be able to use your suggestions.


run dos2unix on it. if it still fails you'll need see why the file is badly
formatted.

regards.


-a
--
================================================== =============================
| ara [dot] t [dot] howard [at] gmail [dot] com
| all happiness comes from the desire for others to be happy. all misery
| comes from the desire for oneself to be happy.
| -- bodhicaryavatara
================================================== =============================

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Excel Parsing : extract values of cells with formulae Geeta Sachdeva Ruby 4 10-16-2007 09:21 PM
Eclipse and CVS, Eclipse wipes out CVS directories xyzzy12@hotmail.com Java 1 08-31-2007 10:19 AM
Eclipse 3.2 and CVS - where and how to get and install CVS? ddog Java 3 08-04-2007 06:41 PM
procs/blocks - blocks with procs, blocks with blocks? matt Ruby 1 08-06-2004 01:33 AM



Advertisments