Velocity Reviews


PeroMHC 02-12-2010 04:06 PM

concatenate fasta file
 
Hi All, I have a simple problem that I hope somebody can help with. I
have an input file (a fasta file) that I need to edit..

Input file format

>name 1

tactcatacatac
>name 2

acggtggcat
>name 3

gggtaccacgtt

I need to concatenate the sequences.. make them look like

>concatenated

tactcatacatacacggtggcatgggtaccacgtt

thanks. Matt

Roy Smith 02-12-2010 04:23 PM

Re: concatenate fasta file
 
In article
<62a50def-e391-4585-9a23-fb91f2e2edc8@b9g2000pri.googlegroups.com>,
PeroMHC <macmanes@gmail.com> wrote:

> Hi All, I have a simple problem that I hope somebody can help with. I
> have an input file (a fasta file) that I need to edit..
>
> Input file format
>
> >name 1
>
> tactcatacatac
> >name 2
>
> acggtggcat
> >name 3
>
> gggtaccacgtt
>
> I need to concatenate the sequences.. make them look like
>
> >concatenated
>
> tactcatacatacacggtggcatgggtaccacgtt
>
> thanks. Matt


Some quick ideas. First, try something along the lines of (not tested):

    import sys

    data = []
    for line in sys.stdin:
        # Skip the FASTA header lines, which start with '>'
        if line.startswith('>'):
            continue
        data.append(line.strip())
    print(''.join(data))

Second, check out http://biopython.org/wiki/Main_Page. I'm sure somebody
has solved this problem before.
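
A slightly fuller sketch of the first idea, which also writes the ">concatenated" header line asked for in the original post. The function name `concatenate_fasta` and its `header` parameter are made up for illustration; the sample text is the input from the question.

```python
import sys


def concatenate_fasta(lines, header="concatenated"):
    """Join every sequence line of FASTA-style input under a single header."""
    parts = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the per-record header lines.
        if not line or line.startswith(">"):
            continue
        parts.append(line)
    return ">%s\n%s\n" % (header, "".join(parts))


# Sample input copied from the original post.
example = """>name 1

tactcatacatac
>name 2

acggtggcat
>name 3

gggtaccacgtt
"""

sys.stdout.write(concatenate_fasta(example.splitlines()))
```

To run it on a real file, pass an open file object instead of `example.splitlines()`, since iterating over a file yields its lines.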

Jean-Michel Pichavant 02-12-2010 04:49 PM

Re: concatenate fasta file
 
PeroMHC wrote:
> Hi All, I have a simple problem that I hope somebody can help with. I
> have an input file (a fasta file) that I need to edit..
>
> Input file format
>
> >name 1
>
> tactcatacatac
> >name 2
>
> acggtggcat
> >name 3
>
> gggtaccacgtt
>
> I need to concatenate the sequences.. make them look like
>
> >concatenated
>
> tactcatacatacacggtggcatgggtaccacgtt
>
> thanks. Matt
>

A solution using regexp:

    import re

    found = []
    for line in open('seqfile.txt'):
        # Keep only lines made up entirely of nucleotide letters
        found += re.findall('^[acgtACGT]+$', line)

    print(found)
    # ['tactcatacatac', 'acggtggcat', 'gggtaccacgtt']

    print(''.join(found))
    # tactcatacatacacggtggcatgggtaccacgtt
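
The same pattern can also be applied to the whole file contents in one call by passing `re.MULTILINE`, so that `^` and `$` match at every line boundary. This is only a sketch: `join_sequences` is a made-up name and the sample string stands in for the contents of the input file.

```python
import re


def join_sequences(text):
    # With re.MULTILINE, ^ and $ anchor at the start and end of each line,
    # so header lines and blank lines never match the pattern.
    return "".join(re.findall(r"^[acgtACGT]+$", text, flags=re.MULTILINE))


# Sample input copied from the original post.
sample = (
    ">name 1\n\ntactcatacatac\n"
    ">name 2\n\nacggtggcat\n"
    ">name 3\n\ngggtaccacgtt\n"
)
print(join_sequences(sample))
```

Note that the character class is strict: a sequence line containing anything other than a, c, g or t (an ambiguity code such as N, for example) is dropped without warning, so widen the class if the data needs it.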



JM

Grant Edwards 02-13-2010 03:14 PM

Re: concatenate fasta file
 
On 2010-02-12, PeroMHC <macmanes@gmail.com> wrote:
> Hi All, I have a simple problem that I hope somebody can help with. I
> have an input file (a fasta file) that I need to edit..
>
> Input file format
>
> >name 1
>
> tactcatacatac
> >name 2
>
> acggtggcat
> >name 3
>
> gggtaccacgtt
>
> I need to concatenate the sequences.. make them look like
>
> >concatenated
>
> tactcatacatacacggtggcatgggtaccacgtt


(echo ">concatenated"; grep '^[actg]*$' inputfile | tr -d '\n'; echo) > outputfile

--
Grant


