concatenate fasta file

 
 
PeroMHC
 
      02-12-2010
Hi All, I have a simple problem that I hope somebody can help with. I
have an input file (a FASTA file) that I need to edit.

Input file format

>name 1

tactcatacatac
>name 2

acggtggcat
>name 3

gggtaccacgtt

I need to concatenate the sequences so that they look like this:

>concatenated

tactcatacatacacggtggcatgggtaccacgtt

thanks. Matt
 
 
 
 
 
Roy Smith
 
      02-12-2010

Some quick ideas. First, try something along the lines of (not tested):

import sys

data = []
for line in sys.stdin:
    # skip the '>' header lines
    if line.startswith('>'):
        continue
    data.append(line.strip())
print(''.join(data))

Second, check out http://biopython.org/wiki/Main_Page. I'm sure somebody
has solved this problem before.
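
A slightly fuller sketch of the first idea, assuming the output should also carry the single ">concatenated" header asked for in the question. The function name `concatenate_fasta` and its `name` parameter are made up for illustration, and nothing here uses Biopython.

```python
def concatenate_fasta(lines, name="concatenated"):
    """Join all sequence lines into one FASTA record (hypothetical helper)."""
    parts = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the original '>' header lines.
        if not line or line.startswith(">"):
            continue
        parts.append(line)
    # One header line, then the joined sequence on a single line.
    return ">%s\n%s\n" % (name, "".join(parts))


sample = [
    ">name 1\n", "\n", "tactcatacatac\n",
    ">name 2\n", "\n", "acggtggcat\n",
    ">name 3\n", "\n", "gggtaccacgtt\n",
]
record = concatenate_fasta(sample)
```

To use it as a filter, pass `sys.stdin` as `lines` and write the returned string to `sys.stdout`.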
 
 
 
 
 
Jean-Michel Pichavant
 
      02-12-2010
A solution using regexp:

import re

found = []
for line in open('seqfile.txt'):
    # keep only lines made up entirely of a, c, g, t (either case)
    found += re.findall('^[acgtACGT]+$', line)

print(found)
# ['tactcatacatac', 'acggtggcat', 'gggtaccacgtt']

print(''.join(found))
# tactcatacatacacggtggcatgggtaccacgtt


JM
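
One thing to watch with the regexp approach: any line containing a character outside `acgtACGT` (for example the ambiguity code `N`, which turns up in real FASTA files) is dropped without warning. A small sketch that shows this on in-memory data; the `extract` helper and its `alphabet` parameter are hypothetical names, not part of the post above.

```python
import re


def extract(text, alphabet="acgtACGT"):
    """Return every line made up only of characters from `alphabet`."""
    # re.MULTILINE makes ^ and $ match at the start and end of each line.
    pattern = re.compile(r"^[%s]+$" % re.escape(alphabet), re.MULTILINE)
    return pattern.findall(text)


text = ">name 1\ntactcatacatac\n>name 2\nacggNtggcat\n"
strict = extract(text)                 # the line containing N is silently skipped
relaxed = extract(text, "acgtnACGTN")  # a widened alphabet keeps it
```

If losing lines silently is a concern, skipping the ">" header lines and keeping everything else may be the safer filter.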
 
 
Grant Edwards
 
      02-13-2010

(echo ">concatenated"; grep '^[actg]*$' inputfile | tr -d '\n'; echo) > outputfile

--
Grant
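
For comparison, a rough Python equivalent of the pipeline that, like `tr`, streams the data instead of collecting it in a list first. The function name `concat_stream` is hypothetical. It skips ">" header lines rather than filtering on `[actg]`, which is an assumption about what is wanted for sequences containing other letters.

```python
import io


def concat_stream(src, dst, name="concatenated"):
    """Copy sequence lines from src to dst as a single FASTA record."""
    dst.write(">%s\n" % name)
    for line in src:
        if line.startswith(">"):
            continue             # drop the original headers
        dst.write(line.strip())  # strip() removes the newline; blank lines add nothing
    dst.write("\n")              # final newline, like the trailing `echo`


# In-memory demonstration; real use would pass open file objects instead.
src = io.StringIO(">name 1\n\ntactcatacatac\n>name 2\n\nacggtggcat\n")
dst = io.StringIO()
concat_stream(src, dst)
result = dst.getvalue()
```

With real files, the same call would sit inside `with open("inputfile") as src, open("outputfile", "w") as dst:`.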

 