Velocity Reviews - Computer Hardware Reviews

Perl bioinformatics

 
 
ccc31807
Guest
Posts: n/a
 
      10-26-2009
I'm not changing jobs, but I've been contacted about some contract
opportunities that (reportedly) are difficult but seem easy enough to
me, manipulating genome files to produce various kinds of reports,
graphs, etc. I have zero experience in this, so I'm just wondering ...

1. What are the career opportunities in bioinformatics using Perl?

2. Looking for books, I found the following:
a. Beginning Perl for Bioinformatics by James Tisdall
b. Mastering Perl for Bioinformatics by James D. Tisdall
c. Building Bioinformatics Solutions: with Perl, R and MySQL by
Conrad Bessant
d. Perl Programming for Biologists by D. Curtis Jamison
e. Genomic Perl: From Bioinformatics Basics to Working Code by Rex A.
Dwyer

Looking at the tables of contents, reviews, and reader comments, I
believe that c. is probably the best value, but it's real hard to tell
without buying and reading the book. Anybody have any experiences
using any of these books? I'd like to conserve both time and money by
starting with the 'best' book.

Thanks, CC.
 
 
 
 
 
Jürgen Exner
Guest
Posts: n/a
 
      10-26-2009
ccc31807 <> wrote:
>I'm not changing jobs, but I've been contacted about some contract
>opportunities that (reportedly) are difficult but seem easy enough to
>me, manipulating genome files to produce various kinds of reports,
>graphs, etc. I have zero experience in this, so I'm just wondering ...


The usual problem is the huge volume of data that needs processing.
Therefore typically the standard algorithms don't work any more and you
need a really strong background in data processing.
Perl is not necessarily the best choice here. Perl's powerful features
make it easy to write code that seems to do the job, but it won't scale
from the small test samples to the huge actual data set where you really
need special methods and optimizations.

A little while ago there was someone posting questions here regularly
about how to deal with genome sequences. I don't know if he is still
around, but maybe you can check the archives and contact him.

jue
 
 
 
 
 
Bradley K. Sherman
Guest
Posts: n/a
 
      10-26-2009
In article <0f055c16-6bca-4c4d-94d8->,
ccc31807 <> wrote:
>
>Looking at the tables of contents, reviews, and reader comments, I
>believe that c. is probably the best value, but it's real hard to tell
>without buying and reading the book. Anybody have any experiences
>using any of these books? I'd like to conserve both time and money by
>starting with the 'best' book.
>


The 'best' book is the one that engages you. It's hard to
predict.

For $22.95 you can get access to *all* the O'Reilly books
<http://my.safaribooksonline.com/>
including several on bioinformatics. There's a free trial!

You might want to check the used book stores for a textbook like
_The Molecular Biology of the Gene_, so that you can pick up some
biology.

--bks

 
 
Bradley K. Sherman
Guest
Posts: n/a
 
      10-26-2009
In article <>,
Jürgen Exner <> wrote:
> ...
>The usual problem is the huge volume of data that needs processing.
>Therefore typically the standard algorithms don't work any more and you
>need a really strong background in data processing.
>Perl is not necessarily the best choice here. Perl's powerful features
>make it easy to write code that seems to do the job, but it won't scale
>from the small test samples to the huge actual data set where you really
>need special methods and optimizations.
> ...


This is not really fair. Most of bioinformatics is data wrangling
and Perl is exactly the right choice for that.

See, e.g.
<http://www.foo.be/docs/tpj/issues/vol1_2/tpj0102-0001.html>

--bks

 
 
ccc31807
Guest
Posts: n/a
 
      10-26-2009
On Oct 26, 10:45 am, b...@panix.com (Bradley K. Sherman) wrote:
> >The usual problem is the huge volume of data that needs processing.
> >Therefore typically the standard algorithms don't work any more and you
> >need a really strong background in data processing.


>
> This is not really fair. Most of bioinformatics is data wrangling
> and Perl is exactly the right choice for that.


In my day job, I deal with data files on the order of several hundred
thousand records. The scripts I write to produce reports from these
data files sometimes take a second (or several seconds) to run. The
data file I have for the bioinformatics project is much larger, but is
a lot simpler (it's a dotplot file).

Sometimes, data files can be so huge that the script just breaks.
Sometimes, the script just runs longer than you might expect.
Obviously, the longer time really isn't a problem ... there's no
difference between a script that runs in microseconds and one that
runs in minutes (say, between 60 and 120) ... as long as the script
runs to completion.

I'm sympathetic to jue's observation about the scaling problem, but
after having looked at the data, the fact that it's genomic or
biological is totally irrelevant. It's really the amount of data
rather than the kind of data that seems to be significant.

You seem to have a handle on what's going on. Is using Perl for
bioinformatics totally off the wall, or a reasonable option for data
mangling?

CC
 
 
Uri Guttman
Guest
Posts: n/a
 
      10-26-2009
>>>>> "JE" == Jürgen Exner <> writes:

JE> ccc31807 <> wrote:
>> I'm not changing jobs, but I've been contacted about some contract
>> opportunities that (reportedly) are difficult but seem easy enough to
>> me, manipulating genome files to produce various kinds of reports,
>> graphs, etc. I have zero experience in this, so I'm just wondering ...


JE> The usual problem is the huge volume of data that needs processing.
JE> Therefore typically the standard algorithms don't work any more and you
JE> need a really strong background in data processing.
JE> Perl is not necessarily the best choice here. Perl's powerful features
JE> make it easy to write code that seems to do the job, but it won't scale
JE> from the small test samples to the huge actual data set where you really
JE> need special methods and optimizations.

JE> A little while ago there was someone posting questions here regularly
JE> about how to deal with genome sequences. I don't know if he is still
JE> around, but maybe you can check the archives and contact him.

i will disagree on this. first off, perl is major in the biotech world
for several reasons. one it is the best at text processing and most
large genetic files are just plain text formats. secondly, there is a
large package called bioperl (with its own mailing list and community)
that does tons of standard things on those files and more. finally, if
you look back a bit, there is a great article called 'how perl saved the
human genome project'. when that project was initially running it was
distributed over many labs worldwide. and they created many new
incompatible file formats for the data. the author of cgi.pm (who is
really an MD and genetic researcher) designed perl modules to convert
those formats to a common set of core formats so they could easily
exchange data. so perl has a strong tie to the biotech industry that is
not likely to be broken for a long while.

as for jobs, i don't see many leads in that industry but they are
usually looking for direct experience in it (hard to get from the
outside) and/or higher degrees in related fields because you would be
working in such an environment where you need it.

so if the OP can learn enough from books and practice to get a job in
the field, i say go for it. there may be other hurdles to jump but i
can't predict what they will be.

uri
perlhunter.com (so i know something about the perl job market)

--
Uri Guttman ------ -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
 
 
Bradley K. Sherman
Guest
Posts: n/a
 
      10-26-2009
In article <56bcf5e0-d0cf-4de0-bbef->,
ccc31807 <> wrote:
> ...
>You seem to have a handle on what's going on. Is using Perl for
>bioinformatics totally off the wall, or a reasonable option for data
>mangling?
>


I think that Perl is the primary language for bioinformatics.
I can't back that up with numbers but I have been working in
bioinformatics since 1992. Some of the younger bioinformaticians
might want to make a case for Python, but I'm skeptical.

My philosophy is to use Perl until it becomes necessary to
write something in C. It rarely becomes necessary.

Learning databases and statistics is also of great importance.

--bks

 
 
Jochen Lehmeier
Guest
Posts: n/a
 
      10-26-2009
On Mon, 26 Oct 2009 17:00:49 +0100, ccc31807 <> wrote:

> You seem to have a handle on what's going on. Is using Perl for
> bioinformatics totally off the wall, or a reasonable option for data
> mangling?


I have no idea about bioinformatics, but Perl is easy enough that you
should be able to get a book, jot down a quick & dirty test script and
just sic it on your biggest and meanest data set.

Then you get a quick handle on how long basic stuff takes. If it works
fast enough, fine; if not, feel free to ask here. And if you find that
it's just not the right tool, then you won't have lost much.
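
A minimal sketch of such a quick & dirty timing test, assuming the data
set is a line-oriented text file. The helper name time_pass() is made up
for illustration; Time::HiRes is a core module. It only measures a bare
read pass, which gives a lower bound on what any real report script
will need.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Read a file line by line, count the lines, and measure the elapsed time.
# time_pass() is a hypothetical helper name, not part of any module.
sub time_pass {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $start = time();
    my $lines = 0;
    $lines++ while <$fh>;    # streams the file; never holds it all in memory
    close $fh;
    return ($lines, time() - $start);
}

# Run as: perl timing.pl biggest_file.txt
if (@ARGV) {
    my ($lines, $secs) = time_pass($ARGV[0]);
    printf "%d lines in %.3f s\n", $lines, $secs;
}
```

If the bare read pass is already too slow, no amount of clever report
logic will fix that; if it is fast, add the real processing step by step
and time it again.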

IMO, the deal breaker will be if you have to handle data in an O(n^2)
fashion (or worse), i.e. where one would really use some very special
index structure, especially if the whole data set does not fit into RAM.
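
To illustrate the O(n^2) point, here is a small sketch that finds the IDs
two lists have in common, once by comparing every pair and once through
a hash used as a simple in-memory index. The function names and the gene
IDs are invented for the example. A plain hash only helps while the index
fits into RAM; beyond that, some on-disk index would be needed, which
this sketch does not cover.

```perl
use strict;
use warnings;

# Naive approach: compare every left ID against every right ID, O(n*m).
sub shared_ids_pairwise {
    my ($left, $right) = @_;
    my @shared;
    for my $l (@$left) {
        for my $r (@$right) {
            if ($l eq $r) { push @shared, $l; last; }
        }
    }
    return @shared;
}

# Indexed approach: build a hash once, then one lookup per left ID, O(n+m).
sub shared_ids_indexed {
    my ($left, $right) = @_;
    my %seen = map { $_ => 1 } @$right;
    return grep { $seen{$_} } @$left;
}

my @left_ids  = qw(geneA geneB geneC geneD);
my @right_ids = qw(geneC geneX geneA);
print join(',', shared_ids_indexed(\@left_ids, \@right_ids)), "\n";
# prints "geneA,geneC"
```

Both functions return the same result; only the second stays usable when
the lists grow to millions of entries.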

Good luck!
 
 
Keith Bradnam
Guest
Posts: n/a
 
      10-26-2009
On Oct 26, 7:17 am, ccc31807 <carte...@gmail.com> wrote:
> I'm not changing jobs, but I've been contacted about some contract
> opportunities that (reportedly) are difficult but seem easy enough to
> me, manipulating genome files to produce various kinds of reports,
> graphs, etc. I have zero experience in this, so I'm just wondering ...
>
> 1. What are the career opportunities in bioinformatics using Perl?
>
> 2. Looking for books, I found the following:
> a. Beginning Perl for Bioinformatics by James Tisdall
> b. Mastering Perl for Bioinformatics by James D. Tisdall
> c. Building Bioinformatics Solutions: with Perl, R and MySQL by
> Conrad Bessant
> d. Perl Programming for Biologists by D. Curtis Jamison
> *e. Genomic Perl: From Bioinformatics Basics to Working Code by Rex A.
> Dwyer
>
> Looking at the tables of contents, reviews, and reader comments, I
> believe that c. is probably the best value, but it's real hard to tell
> without buying and reading the book. Anybody have any experiences
> using any of these books? I'd like to conserve both time and money by
> starting with the 'best' book.
>
> Thanks, CC.


I co-teach a Unix & Perl course at UC Davis that is aimed at teaching
graduate students how to learn the basics of Perl in a biological
context. We have specifically tried to assume no prior knowledge of
programming as many people who take our course are new to this.

We have made our course materials (data & documentation) freely
available to anyone else who is interested:

http://korflab.ucdavis.edu/Unix_and_Perl/index.html

There is a corresponding Google Group for discussion of issues arising
from the course. We also make regular updates to the documentation.
Hope this might be of use to you.

Keith
 
 
Xho Jingleheimerschmidt
Guest
Posts: n/a
 
      10-27-2009
Jürgen Exner wrote:
> ccc31807 <> wrote:
>> I'm not changing jobs, but I've been contacted about some contract
>> opportunities that (reportedly) are difficult but seem easy enough to
>> me, manipulating genome files to produce various kinds of reports,
>> graphs, etc. I have zero experience in this, so I'm just wondering ...

>
> The usual problem is the huge volume of data that needs processing.
> Therefore typically the standard algorithms don't work any more and you
> need a really strong background in data processing.


Isn't that exactly Perl's strength?

> Perl is not necessarily the best choice here. Perl's powerful features
> make it easy to write code that seems to do the job, but it won't scale
> from the small test samples to the huge actual data set where you really
> need special methods and optimizations.


If you think about scalability as you write the code, Perl will not
present any special scalability issues versus other languages. If you
do not think about scalability, no language choice will protect you.
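
As one example of thinking about scalability up front, the sketch below
streams a FASTA file line by line and keeps only running counts per
record, so memory use does not grow with sequence length. It assumes
standard FASTA input (header lines starting with ">"); gc_by_record() is
an illustrative name, not an existing module function.

```perl
use strict;
use warnings;

# Stream a FASTA file one line at a time and return [id, GC fraction]
# pairs in file order. Only per-record counters are kept in memory.
sub gc_by_record {
    my ($fh) = @_;
    my (@order, %gc, %len, $id);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {          # header line starts a new record
            $id = $1;
            push @order, $id;
            $gc{$id} = $len{$id} = 0;
        }
        elsif (defined $id) {
            $gc{$id}  += ($line =~ tr/GCgc//);     # tr counts, does not modify
            $len{$id} += ($line =~ tr/A-Za-z//);
        }
    }
    return map { [ $_, $len{$_} ? $gc{$_} / $len{$_} : 0 ] } @order;
}

# Run as: perl gc.pl sequences.fasta
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    printf "%s\t%.3f\n", @$_ for gc_by_record($fh);
    close $fh;
}
```

Slurping the whole file into an array first would give the same numbers
on a small test sample and then fail on a genome-sized input.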

I certainly would not implement a heavy duty multiple alignment
algorithm directly in Perl, but I certainly might implement (and have
implemented) things like that in Inline::C or just link pre-existing C
code in via XS, using Perl to handle the book-keeping, memory management,
IPC, pre-processing and parsing, post-processing, packing, unpacking, etc.

Based on the description of "produce various kinds of reports", I
wouldn't think they expect this to cover Smith-Waterman type of things
anyway, but only the kind of reports that are very similar to what you
would find in non-bioinformatics type work.

Xho
 
 
 
 