I have multiple data files that I will retrieve from a database query.
These will be on the order of 150K rows, and an indeterminate number
of columns. The columns will include both dates and status codes, and
I will need to build a data structure containing the cumulative count
of status codes over several months, day by day. Then, I need to build
graphical files with line charts.
This is currently done by hand in Excel, and I have been tasked with
automating the process.
Munging the data and getting the cumulative count per status code per
day is a snap in Perl. While I've generated charts in Perl using
GD::Graph, doing it in R is certainly a lot easier, and besides, I am
motivated to learn R.
The raw data needs to be processed. The data I will use will be
contained in a hash of hashes: the keys will be status codes, the
sub-keys will be dates, and the values will be integers, sort of like
this:
$hash{S}{20110601} => 10
$hash{S}{20110602} => 13
$hash{S}{20110603} => 21
$hash{S}{20110604} => 19
$hash{S}{20110605} => 25
$hash{S}{20110606} => 29
$hash{S}{20110607} => 28
So, I can print the hash out as an R-compatible data frame and use it
directly to generate a PDF.
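To make that concrete, here is a rough sketch of the R side, assuming
the Perl script writes a whitespace-delimited file with status, date,
and count columns (the file name and column names below are only
placeholders):

# read the table Perl wrote out; columns assumed to be status, date, count
counts <- read.table("status_counts.txt", header = TRUE,
                     colClasses = c("character", "character", "integer"))
counts$date <- as.Date(counts$date, format = "%Y%m%d")

# one line per status code, written to a PDF
pdf("status_counts.pdf", width = 10, height = 7)
plot(range(counts$date), range(counts$count), type = "n",
     xlab = "Date", ylab = "Cumulative count")
codes <- unique(counts$status)
for (i in seq_along(codes)) {
  d <- counts[counts$status == codes[i], ]
  lines(d$date, d$count, col = i)
}
legend("topleft", legend = codes, col = seq_along(codes), lty = 1)
dev.off()

The only thing Perl then has to do is write that file and kick off the
R script.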
I will use Perl to munge the data and produce an input file for R. I
want to be able to push a button and have the computer do all the
work.
Thanks for your reply, CC.
Actually, while the other responses are correct, there is a simpler
way still. Well, actually two; but it may be blasphemy to say so in
this forum. ;-) Understand, as long as your DB is one of the common
ones (e.g. MS SQL Server, MySQL, PostgreSQL, &c.) there are drivers
that let your R script connect directly to the DB (equivalent to
Perl's DBI). There is therefore no need to waste time on making CSV
files. And, given that, you can either do any data manipulation using
SQL or you can load the raw data into R and use one of its packages to
do the sort of manipulations you'd otherwise do using SQL. Either of
these options will be faster than getting Perl involved in some of the
data manipulation. Trust me, I have tried it in all variations (having
Perl get and manipulate the data, having the DB do the manipulation up
to the point where my models can do their various analyses, and
importing raw data directly from the DB into R and having R do it
all). In my experience, the last of these turned out to be the
fastest. Using SQL's data manipulation capability is faster if the R
script and the DB are on different machines communicating over a slow
network.
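To give you a concrete idea, here is a rough sketch using the DBI
package with RPostgres as the back end; the connection details, table,
and column names are all invented, and you'd swap in whichever driver
matches your actual DB (RMySQL, odbc, etc.):

library(DBI)

# connect straight to the database -- no intermediate CSV needed
# (driver, connection details, table and column names are all made up)
con <- dbConnect(RPostgres::Postgres(), dbname = "mydb",
                 host = "dbhost", user = "me", password = "secret")

raw <- dbGetQuery(con, "
  SELECT   status, event_date, COUNT(*) AS n
  FROM     events
  GROUP BY status, event_date
  ORDER BY status, event_date")
dbDisconnect(con)

raw$n <- as.numeric(raw$n)  # make sure the count is plain numeric

# cumulative count per status code, day by day, done in R
raw$cum <- ave(raw$n, raw$status, FUN = cumsum)

From there you feed raw into whatever plotting code you like; the
charts come out the same whether the numbers arrived via a file or
straight from the DB.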
HTH
Ted
This reduces Perl's role to simply invoking the R script (e.g., the
only way I could run my R programs as scheduled tasks was to write a
simple Perl script that starts them).