R and Perl

 
 
ccc31807
      09-02-2011
Has anyone used R with Perl for statistical programming?

Has anyone used R with Perl to output graphical files?

Is it any more complicated than writing R scripts and calling the R
interpreter using system() or the like?

I'm about to embark on a major project with R, and really, really need
Perl to munge my data files. I would like to automate the entire
thing, but if I can't I can use Perl to generate the input data for R
and manually generate the output files.

Thanks, CC.
 
azrazer
      09-06-2011
Hello,
On 03/09/2011 00:26, ccc31807 wrote:
> Has anyone used R with Perl for statistical programming?
> Has anyone used R with Perl to output graphical files?

I think so, too.
>
> Is it any more complicated than writing R scripts and calling the R
> interpreter using system() or the like?

Using system() raises no issues, in my opinion.
You can definitely write your R script and call R from the command line.

> I'm about to embark on a major project with R, and really, really need
> Perl to munge my data files. I would like to automate the entire
> thing, but if I can't I can use Perl to generate the input data for R
> and manually generate the output files.

Could you be a bit more precise about what you want to do?
From experience, the best thing to do would be to format your data
using Perl without modifying it (i.e. if you have long, large tables of
numbers, don't do any mathematics on them in Perl), and just FORMAT it
as a well-structured table.

Then do all the filtering, mathematical operations, etc. on your
data using R.

(I am not saying that Perl is not suitable for such operations, but I
think it is better to launch your Perl script once and then work on the
data using R, if that is the software you want to use! Raw data
usually provides more information than modified data.)

Could you be more precise about why you cannot use Perl to generate the
input data for R? And if you can, why is calling system() a problem?
> Thanks, CC.


cheers.

 
ccc31807
      09-07-2011
On Sep 6, 3:03 am, azrazer wrote:
> Could you be a bit more precise about what you want to do.


I have multiple data files that I will retrieve from a database query.
These will be on the order of 150K rows, and an indeterminate number
of columns. The columns will include both dates and status codes, and
I will need to build a data structure containing the cumulative count
of status codes over several months, day by day. Then, I need to build
graphical files with line charts.
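
A minimal sketch of that cumulative count, assuming each database row has already been reduced to a status code and a yyyymmdd date. The sample rows and the name cumulative_counts are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical input: one [status_code, yyyymmdd] pair per database row.
my @rows = (
    [ 'S', '20110601' ], [ 'S', '20110601' ], [ 'S', '20110602' ],
    [ 'F', '20110602' ], [ 'S', '20110603' ],
);

# Build $cumulative{status}{date} = running total up to and including date.
sub cumulative_counts {
    my @rows = @_;
    my (%daily, %dates);
    for my $row (@rows) {
        my ($status, $date) = @$row;
        $daily{$status}{$date}++;    # per-day count for this status
        $dates{$date} = 1;           # remember every date seen
    }
    my %cumulative;
    for my $status (keys %daily) {
        my $running = 0;
        # yyyymmdd strings sort chronologically with a plain string sort
        for my $date (sort keys %dates) {
            $running += $daily{$status}{$date} // 0;
            $cumulative{$status}{$date} = $running;
        }
    }
    return %cumulative;
}

my %cumulative = cumulative_counts(@rows);
print "$_ => $cumulative{S}{$_}\n" for sort keys %{ $cumulative{S} };
```

Every status gets an entry for every date seen, so a day with no new rows simply repeats the previous total, which keeps the lines in a chart continuous.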

This is currently done by hand in Excel, and I have been tasked with
automating the process.

Munging the data and getting the cumulative count per status code per
day is a snap in Perl, and while I've generated charts in Perl using
GD::Graph, using R is certainly a lot easier, and besides, I am
motivated to learn R.

> AFAIK, from experience, the best thing to do would be to format your
> data using Perl without making any modification on your data


The raw data needs to be processed. The 'data' that I will use will be
contained in hashes, the keys will be status codes, the sub keys will
be dates, and the values will be integers, sort of like this:

$hash{S}{20110601} = 10;
$hash{S}{20110602} = 13;
$hash{S}{20110603} = 21;
$hash{S}{20110604} = 19;
$hash{S}{20110605} = 25;
$hash{S}{20110606} = 29;
$hash{S}{20110607} = 28;

So, I can print out the hash in an R compatible data frame and use it
directly to generate a PDF.
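
Printing the hash in that shape could look roughly like the sketch below. It assumes a wide layout (one row per date, one column per status code); the sub name hash_to_csv and the second status code F are hypothetical. Missing combinations are written as NA, which R's read.csv() treats as a missing value by default.

```perl
use strict;
use warnings;

# Sample data in the same shape as the hash above (F is a made-up status).
my %hash = (
    S => { 20110601 => 10, 20110602 => 13, 20110603 => 21 },
    F => { 20110601 => 2,  20110603 => 5 },
);

# Return CSV text: a header row, then one row per date, one column per status.
sub hash_to_csv {
    my ($data) = @_;
    my @statuses = sort keys %$data;
    my %dates;
    for my $status (@statuses) {
        $dates{$_} = 1 for keys %{ $data->{$status} };
    }
    my @lines = ( join ',', 'date', @statuses );
    for my $date (sort keys %dates) {
        # Write NA where a status has no value for this date
        push @lines, join ',', $date,
            map { $data->{$_}{$date} // 'NA' } @statuses;
    }
    return join '', map { "$_\n" } @lines;
}

print hash_to_csv(\%hash);
```

The keys here contain no commas or quotes, so plain join() is enough; for arbitrary field values a CSV module such as Text::CSV would be the safer choice.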

> Could you be more precise about why you cannot use perl to generate the
> input data for R ? --and if so, why calling system() is a problem


I will use Perl to munge the data and produce as output an input file
for R. I want to be able to push a button and have the computer do all
the work.

Thanks for your reply, CC.
 
 
azrazer
      09-08-2011
On 08/09/2011 00:26, ccc31807 wrote:
> On Sep 6, 3:03 am, azrazer wrote:
>> Could you be a bit more precise about what you want to do.

>
> I have multiple data files that I will retrieve from a database query.
> These will be on the order of 150K rows, and an indeterminate number
> of columns. The columns will include both dates and status codes, and
> I will need to build a data structure containing the cumulative count
> of status codes over several months, day by day. Then, I need to build
> graphical files with line charts.

Well yes, this is easily done in R; you just have to aggregate the data,
don't you? (For example, with aggregate() or plyr's ddply().)
>
> This is currently done by hand in Excel, and I have been tasked with
> automating the process.
>
> Munging the data and getting the cumulative count per status code per
> day is a snap in Perl, and while I've generated charts in Perl using
> GD::Graph, using R is certainly a lot easier, and besides, I am
> motivated to learn R.

Yes, don't worry, this will be a piece of cake too once your data is
well organised.
>
>> AFAIK, from experience, the best thing to do would be to format your
>> data using Perl without making any modification on your data

>
> The raw data needs to be processed. The 'data' that I will use will be
> contained in hashes, the keys will be status codes, the sub keys will
> be dates, and the values will be integers, sort of like this:
>
> $hash{S}{20110601} => 10
> $hash{S}{20110602} => 13
> $hash{S}{20110603} => 21
> $hash{S}{20110604} => 19
> $hash{S}{20110605} => 25
> $hash{S}{20110606} => 29
> $hash{S}{20110607} => 28
>
> So, I can print out the hash in an R compatible data frame and use it
> directly to generate a PDF.

Yup, just generate a CSV file for R to load and that will be it,
don't you think?
>
>> Could you be more precise about why you cannot use perl to generate the
>> input data for R ? --and if so, why calling system() is a problem

>
> I will use Perl to munge the data and produce as output an input file
> for R. I want to be able to push a button and have the computer do all
> the work.

Looks like a decent way of doing things: let the computer work!
Have fun,
>
> Thanks for your reply, CC.


 
 
Jon Du Kim
      09-09-2011
If you have existing R code that you would like to
interface with, then some sort of Perl/R bridge makes sense.
But you do know that Perl has a fantastically awesome
set of libraries known as the Perl Data Language (PDL)?
http://pdl.perl.org/
I have used the PDL Stats modules and they worked well
for what I was up to. Check them out too.
http://pdl-stats.sourceforge.net/
Not sure what you are using R for, but you can keep it
all Perl if you want to...

 
 
Ted Byers
      09-14-2011
Actually, while the other responses are correct, there is a simpler
way still. Well, actually two, though it may be blasphemy to say so in
this forum. As long as your DB is one of the common ones (e.g. MS SQL
Server, MySQL, PostgreSQL, etc.), there are drivers that let your R
script connect directly to the DB (equivalent to Perl's DBI). There is
therefore no need to waste time on making CSV files. Given that, you
can either do the data manipulation in SQL, or load the raw data into R
and use one of its packages to do the sort of manipulation you'd
otherwise do in SQL. Either of these options will be faster than
getting Perl involved in some of the data manipulation. Trust me, I
have tried all the variations: having Perl get and manipulate the data,
having the DB do the manipulation up to the point where my models can
do their various analyses, and importing raw data directly from the DB
into R and having R do it all. In my experience, the last turned out to
be the fastest. Using SQL's data manipulation capability is faster if
the R script and the DB are on different machines communicating over a
slow network.

HTH

Ted

This reduces Perl's role to simply invoking the R script (e.g., the only
way I could make my R programs scheduled tasks was to write a simple
Perl script that starts them).
 
 
ccc31807
      09-15-2011
On Sep 14, 2:16 pm, Ted Byers wrote:
> Actually, while the other responses are correct, there is a simpler
> way still. Well, actually two; but it may be blasphemy to say so in
> this forum. Understand, as long as your DB is one of the common
> ones (e.g. MS SQL Server, MySQL, PostgreSQL, &c.) there are drivers
> that let your R script connect directly to the DB (equivalent to
> Perl's DBI).


My database is a Unidata database from IBM. Aside from the fact that
there isn't a DBD for Pick, it uses a non-SQL query language,
UniQuery, and beyond that, you really can't manipulate data with it,
just select it.

My challenge lies between the output file of my Perl script, the CSV
file, and the invocation of R. I haven't worked on this since my post,
but if the simplest way works, I'll keep it, 'simplest' being defined
as having to write the least amount of code to get the output that I
need, which appears to be calling the R executable from system() or
the like.
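
A rough sketch of that system() call, assuming R's Rscript front end is on the PATH. The file names make_charts.R, counts.csv and charts.pdf are placeholders, and run_r and rscript_command are hypothetical helper names:

```perl
use strict;
use warnings;

# Build the argument list for Rscript. The script and file names passed in
# are placeholders, not names taken from this thread.
sub rscript_command {
    my ($r_script, $csv_file, $pdf_file) = @_;
    return ('Rscript', '--vanilla', $r_script, $csv_file, $pdf_file);
}

# Run the command and die with a useful message if it fails.
sub run_r {
    my @cmd = @_;
    # Passing a list to system() avoids shell quoting problems with file names.
    my $rc = system(@cmd);
    die "could not start $cmd[0]: $!\n" if $rc == -1;
    die sprintf("%s exited with status %d\n", $cmd[0], $? >> 8) if $rc != 0;
    return 1;
}

my @cmd = rscript_command('make_charts.R', 'counts.csv', 'charts.pdf');
print "would run: @cmd\n";
# run_r(@cmd);    # uncomment once R and the R script are in place
```

On the R side, the script can pick up the two file names with commandArgs(trailingOnly = TRUE), so the same R script works for any input and output file.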

> Trust me, I have tried it
> in all variations (having perl get/manipulate the data, having the DB
> do the manipulation up to the point where my models can do their
> various analyses, to importing raw data directly from the DB into R
> and having R do it all. In my experience, the latter turned out to be
> the fastest. using SQL's data manipulation capability is faster if
> the R script and the DB are on different machines communicating over a
> slow network.


I can see how it would. However, I'm an old web guy, and I think in
terms of connecting the interface and the database with Perl scripts,
and I don't really have the motivation to change at this point. Who
knows, maybe I'll get another job and do my work like this.

Thanks, CC.
 