DNA String Compression For Storing in Data Structure

 
 
Gundala Viswanath
      01-17-2009
Hi all,

I am new to C/C++. I am wondering whether there is an
existing implementation that compresses a string like the one
below into a shorter format (e.g. base 64).

AAAAAAAAAAAAGTCGCGCCGCCGCGGGGAGGAA

The reason I want to do this is that there are ~10 million such
tags that I want to process into a matrix. Therefore I need
to compress each string to make it manageable.

For example, an implementation in R gives this:

> seq2id("AAAAAAAAAAAAGTCGCGCCGCCGCGGGGAGGAA")

[1] "IAAAAtmWWaooA

The R code can be viewed here: http://dpaste.com/110009/

But I am not sure how to implement this in C/C++.
Thanks in advance.


- GV
 
Gert-Jan de Vos
      01-17-2009
On Jan 17, 8:12 am, Gundala Viswanath wrote:
> I am new to C/C++. I am wondering whether there is an
> existing implementation that compresses a string like the one
> below into a shorter format (e.g. base 64).
>
> AAAAAAAAAAAAGTCGCGCCGCCGCGGGGAGGAA


I am no expert in DNA, but I understand there are only 4 possible
symbols: A, C, G, T. In that case 2 bits are enough to encode each
symbol. Assuming 8-bit chars, this makes a 2-bit encoded sequence
4 times smaller than the equivalent char string. A fixed 2 bits per
symbol also makes it quite easy to index a sequence at random
positions and to insert or extract symbols. I suggest you write a
class that uses a vector<unsigned> to store the encoded symbol bits
and give it a vector-like interface that indexes individual symbols
as plain chars.
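
A minimal sketch of the kind of class described above, not a definitive implementation. The class name PackedDna and its member names are hypothetical (the thread names none), the A=0, C=1, G=2, T=3 mapping is an arbitrary choice, and anything other than upper-case A, C, G, T is rejected with an exception:

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical class: stores a DNA sequence at 2 bits per symbol
// inside a std::vector<unsigned>.
class PackedDna {
public:
    PackedDna() = default;

    // Pack every character of a plain string such as "GATTACA".
    explicit PackedDna(const std::string& s) {
        for (char c : s) push_back(c);
    }

    // Number of symbols stored (not the number of words).
    std::size_t size() const { return size_; }

    // Append one symbol, starting a new word when the last one is full.
    void push_back(char base) {
        const unsigned code = encode(base);
        if (size_ % kPerWord == 0) words_.push_back(0u);
        words_.back() |= code << shift(size_);
        ++size_;
    }

    // Read symbol i as a plain char. Like std::vector, no bounds check.
    char operator[](std::size_t i) const {
        return "ACGT"[(words_[i / kPerWord] >> shift(i)) & 3u];
    }

    // Overwrite symbol i: clear its 2 bits, then write the new code.
    void set(std::size_t i, char base) {
        const unsigned code = encode(base);
        unsigned& w = words_[i / kPerWord];
        w = (w & ~(3u << shift(i))) | (code << shift(i));
    }

    // Unpack back to an ordinary string.
    std::string str() const {
        std::string out;
        out.reserve(size_);
        for (std::size_t i = 0; i < size_; ++i) out.push_back((*this)[i]);
        return out;
    }

private:
    // Symbols per word: 16 when unsigned has 32 value bits.
    static constexpr std::size_t kPerWord =
        std::numeric_limits<unsigned>::digits / 2;

    // Bit offset of symbol i inside its word.
    static unsigned shift(std::size_t i) {
        return static_cast<unsigned>(2 * (i % kPerWord));
    }

    // Assumed mapping: A=0, C=1, G=2, T=3.
    static unsigned encode(char base) {
        switch (base) {
            case 'A': return 0u;
            case 'C': return 1u;
            case 'G': return 2u;
            case 'T': return 3u;
            default: throw std::invalid_argument("expected A, C, G or T");
        }
    }

    std::vector<unsigned> words_;
    std::size_t size_ = 0;
};
```

With this sketch, PackedDna tag("AAAAAAAAAAAAGTCGCGCCGCCGCGGGGAGGAA") packs the tag from the question, tag[0] gives back 'A', and tag.str() restores the original string. The packed words are raw bits rather than printable text, so this is a different representation from the base-64 string the R function returns.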
 