python in parallel for pattern discovery in genome data

 
 
BalyanM
Guest
Posts: n/a
 
      07-30-2003
Hi,

I am new to Python. I am using it on Red Hat Linux 9.
I would like to run Python on a Sun machine (Sun E420R, OS: Solaris)
with 4 CPUs for a pattern discovery/search program on biological
sequences (genomic sequence). I want to write the Python code so that it
uses all 4 CPUs. Also, do I need any other libraries?
Kindly advise.

Thanks

Sincerely,

Manoj

--

****************************************************************
Manoj Balyan
Scientist- Bioinformatics
Centre for Cellular and Molecular Biology(CCMB)
Uppal Road,
Hyderabad-500007
Andhra Pradesh,INDIA
TEl:+91-040-27192772,27160222,27192777
FAX:+91-040-27160591,27160311
EMAIL:,
manoj_balyan@hotmail
WWW:http://www.ccmb.res.in
****************************************************************
If you weep for the setting sun, you will miss the stars: Tagore
****************************************************************




 
 
 
 
 
Stephan Diehl
Guest
Posts: n/a
 
      07-30-2003
BalyanM wrote:

> Hi,
>
> I am new to Python. I am using it on Red Hat Linux 9.
> I would like to run Python on a Sun machine (Sun E420R, OS: Solaris)
> with 4 CPUs for a pattern discovery/search program on biological
> sequences (genomic sequence). I want to write the Python code so that it
> uses all 4 CPUs. Also, do I need any other libraries?
> Kindly advise.
>
> Thanks
>
> Sincerely,
>
> Manoj
>


A normal Python interpreter on its own won't help, because of the GIL (Global
Interpreter Lock): only one thread runs Python code at a time.
Going by your description, the following module might be something for you:
http://poshmodule.sourceforge.net/
It allows object sharing between different Python processes.
As I have never worked with it, I can't say whether it's any good.

Stephan
 
 
 
 
 
Andrew Dalke
Guest
Posts: n/a
 
      07-31-2003
BalyanM:
> I would like to run Python on a Sun machine (Sun E420R, OS: Solaris)
> with 4 CPUs for a pattern discovery/search program on biological
> sequences (genomic sequence). I want to write the Python code so that it
> uses all 4 CPUs.


*oomphh*

There are a lot of details buried in your lines.

It looks like you will be writing your own pattern matching code.
Why? There are plenty of tools for that already. A quick web
search finds http://genome.imb-jena.de/seqanal.html and many
of those tools are freely available.

Okay, suppose you do have the tool or library for it. Do you
want to do high-throughput searches? Then you can just break
your N jobs into 4 batches of about N/4 each, one per CPU. The easiest
way in Python is to run 4 Python programs, each with a little server going
(see the xmlrpclib module for an example) and have your code
call them (see Aahz's excellent example of master/slave
programming using threads). Other options for the communications
are Twisted and Pyro.
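
As a rough illustration of breaking the jobs into batches, here is a minimal sketch. The function name split_into_batches and the sample query patterns are made up for the example; the only idea taken from the text is dealing N jobs out across 4 workers.

```python
def split_into_batches(jobs, n_workers=4):
    """Deal jobs out round-robin so each worker gets about len(jobs)/n_workers."""
    return [jobs[i::n_workers] for i in range(n_workers)]


# Hypothetical search patterns standing in for the real jobs.
queries = ["ACGT", "TTGACA", "TATAAT", "GGCC", "AATAAA",
           "GATC", "CAAT", "GCGC", "TGA", "ATG"]

batches = split_into_batches(queries)
for worker_id, batch in enumerate(batches):
    print(worker_id, batch)
```

Round-robin dealing keeps the batch sizes within one job of each other, which is usually good enough when the jobs take similar time.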

You will not be able to do this with one Python process because
Python has what's called the "global interpreter lock" that
prevents core Python from effectively using multiple processors.
You can write a C extension which does the search and gives
up the lock, but you seem to want to do this in raw Python.

(The suggestion to look at POSH won't work - it has some
Intel-specific assembly instructions in the C extension.)

Depending on the type of pattern search, you instead can assign
1/4 of the genome to each process, with overlap if needed. This
will speed up a single search, which is good for interactivity.
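
To make the "with overlap" part concrete, here is a small sketch for the simplest case, an exact fixed-length pattern. All names (overlapping_chunks, find_all, chunked_search) are invented for the example, and the loop runs the pieces one after another; in a real setup each piece would be handed to a separate process. An overlap of len(pattern) - 1 is enough so that a match straddling a boundary is still seen whole by one piece.

```python
def overlapping_chunks(sequence, n_chunks, overlap):
    """Yield (offset, piece) pairs; each piece runs `overlap` letters into the next."""
    size = max(1, -(-len(sequence) // n_chunks))  # ceiling division, at least 1
    for start in range(0, len(sequence), size):
        yield start, sequence[start:start + size + overlap]


def find_all(sequence, pattern):
    """Return every start index of pattern in sequence, overlapping hits included."""
    hits = []
    pos = sequence.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = sequence.find(pattern, pos + 1)
    return hits


def chunked_search(sequence, pattern, n_chunks=4):
    """Search each overlapping piece separately and merge the hit positions."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1
    hits = set()
    for offset, piece in overlapping_chunks(sequence, n_chunks, overlap):
        # Each piece is independent, so this call could run in its own process.
        hits.update(offset + i for i in find_all(piece, pattern))
    return sorted(hits)


genome = "ACGTACGTTTACGAACGTAC"  # toy sequence, not real data
print(chunked_search(genome, "ACGT"))
```

For patterns without a fixed length (regular expressions, profiles) the overlap has to be at least the longest possible match minus one, which is an assumption you would need to check for your own search.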

These work for a single "user" of the code. Might you have
many people trying to do pattern searches? If so, you may
need some way to throttle how many searches are done per
machine. For in-house use this likely isn't a problem - besides,
you should get your code working first.

There are other approaches. You could use shared memory or
CORBA for the communications, or PVM or MPI. Still, given
your experience, you should:
1) get your algorithm working on one machine
2) get it working as a client/server using XML-RPC (see the
SimpleXMLRPCServer and xmlrpclib modules),
3) get your client to work with multiple servers,
using multiple threads in the client
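
The three steps above might look roughly like the sketch below. It is written for Python 3, where SimpleXMLRPCServer lives in xmlrpc.server and xmlrpclib has become xmlrpc.client. The count_hits function is a hypothetical stand-in for the real search, and the servers are started as threads inside one process only so the example runs on its own; to actually use 4 CPUs, each server would be a separate Python process.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def count_hits(sequence, pattern):
    """Hypothetical stand-in for the real search: count non-overlapping matches."""
    return sequence.count(pattern)


def start_server():
    """Start one XML-RPC search server on a free local port and return its URL."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(count_hits)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return "http://%s:%d" % (host, port)


def search_all(urls, sequence, patterns):
    """Give each server a batch of patterns, using one client thread per server."""
    results = {}
    lock = threading.Lock()

    def worker(url, batch):
        proxy = xmlrpc.client.ServerProxy(url)  # one proxy per thread
        for pattern in batch:
            n = proxy.count_hits(sequence, pattern)
            with lock:
                results[pattern] = n

    threads = [threading.Thread(target=worker, args=(url, patterns[i::len(urls)]))
               for i, url in enumerate(urls)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


sequence = "ACGTACGTTTACGAACGTAC"  # toy sequence, not real data
patterns = ["ACGT", "TT", "CG", "GA", "AC"]
urls = [start_server() for _ in range(4)]
results = search_all(urls, sequence, patterns)
print(results)
```

The client threads spend their time waiting on the network, so the global interpreter lock is not a bottleneck there; the CPU work happens in the servers.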

(It's a bit of my experience too - I really should try Pyro
for this sort of work. Well, I need a break so maybe I'll
try it out tonight.)

There are a lot of skills to learn before it all works, so don't
get too discouraged too quickly.

Andrew



 
 
 
 