Memory error due to the huge/huge input file size

 
 
tejsupra@gmail.com
      11-10-2008
Hello Everyone,

I need to read a .csv file that is 2.26 GB in size. I wrote a Python
script that needs to read this file, and my computer has only 2 GB of
RAM. Please see the code below:

"""
This program has been developed to retrieve all the promoter sequences
for the specified
list of genes in the given cluster

So, this program will act as a substitute to the whole EZRetrieve
system

Input arguments:

1) Cluster.txt or DowRatClust161718bwithDummy.txt
2) TransProCrossReferenceAndSequences.csv -> This is the file that has
all the promoter sequences
3) -2000
4) 500
"""

import time
import csv
import sys
import linecache
import re
from sets import Set
import gc

print time.localtime()

fileInputHandler = open(sys.argv[1],"r")
line = fileInputHandler.readline()

refSeqIDsinTransPro = []
promoterSequencesinTransPro = []
reader2 = csv.reader(open(sys.argv[2],"rb"))
reader2_list = []
reader2_list.extend(reader2)

for data2 in reader2_list:
    refSeqIDsinTransPro.append(data2[3])
for data2 in reader2_list:
    promoterSequencesinTransPro.append(data2[4])

while line:
    l = line.rstrip('\n')
    for j in range(1,len(refSeqIDsinTransPro)):
        found = re.search(l,refSeqIDsinTransPro[j])
        if found:
            """promoterSequencesinTransPro[j] """
            print l

    line = fileInputHandler.readline()


fileInputHandler.close()


The error that I got is as follows:
Traceback (most recent call last):
  File "RefSeqsToPromoterSequences.py", line 31, in <module>
    reader2_list.extend(reader2)
MemoryError

I understand that this is a memory error and that it is caused by the
line reader2_list.extend(reader2). Is there an alternative way to read
the .csv file line by line?

Sincerely,
Suprabhath
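
To illustrate why the reader2_list.extend(reader2) line fails, here is a minimal sketch comparing the peak memory of loading every row into a list against iterating over the reader one row at a time. It is written for Python 3 (the script above is Python 2), and the helper names make_csv, peak_bytes, load_all and stream, as well as the five-column row layout, are assumptions made up for the demonstration, not part of the original script.

```python
import csv
import io
import tracemalloc


def make_csv(n_rows):
    """Build a small CSV text in memory (hypothetical 5-column layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for i in range(n_rows):
        writer.writerow([i, "a", "b", "NM_%06d" % i, "ACGT" * 25])
    return buf.getvalue()


def load_all(fd):
    """Mimic the original script: pull every row into one list."""
    rows = []
    rows.extend(csv.reader(fd))
    return len(rows)


def stream(fd):
    """Iterate over the reader; only one row is alive at a time."""
    count = 0
    for _row in csv.reader(fd):
        count += 1
    return count


def peak_bytes(func, text):
    """Return the peak traced allocation while func consumes the CSV."""
    fd = io.StringIO(text)
    tracemalloc.start()
    try:
        func(fd)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


text = make_csv(20000)
peak_all = peak_bytes(load_all, text)
peak_stream = peak_bytes(stream, text)
print("all rows in a list:", peak_all, "bytes")
print("one row at a time: ", peak_stream, "bytes")
```

The exact numbers depend on the interpreter, but the list version has to hold every parsed row as Python objects at once, which for a 2.26 GB file on a 2 GB machine cannot fit.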
 
James Mills
      11-10-2008
On Tue, Nov 11, 2008 at 7:47 AM, <> wrote:
> refSeqIDsinTransPro = []
> promoterSequencesinTransPro = []
> reader2 = csv.reader(open(sys.argv[2],"rb"))
> reader2_list = []
> reader2_list.extend(reader2)


Without testing, this looks like you're reading the _ENTIRE_
input stream into memory! Try this:

def readCSV(file):

    if type(file) == str:
        fd = open(file, "rU")
    else:
        fd = file

    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(fd.readline())
    fd.seek(0)

    reader = csv.reader(fd, dialect)
    for line in reader:
        yield line

for line in readCSV(open("foo.csv", "r")):
    ...

--JamesMills

-- "Problems are solved by method"
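
For readers on Python 3, the generator above can be sketched roughly as follows. This is an adaptation under stated assumptions, not the code from the reply: the name read_csv, the sample_size parameter, the restriction of candidate delimiters to comma, tab and semicolon, and the fallback to csv.excel when sniffing fails are all choices made for this example.

```python
import csv
import io


def read_csv(source, sample_size=4096):
    """Yield rows one at a time from a path or an open text-mode file."""
    if isinstance(source, str):
        # Python 3's csv module expects files opened with newline="".
        fd = open(source, newline="")
        close_when_done = True
    else:
        fd = source
        close_when_done = False
    try:
        # Sniff the dialect from a small sample, then rewind.
        sample = fd.read(sample_size)
        fd.seek(0)
        try:
            dialect = csv.Sniffer().sniff(sample, delimiters=",\t;")
        except csv.Error:
            dialect = csv.excel  # assumption: fall back to plain commas
        for row in csv.reader(fd, dialect):
            yield row
    finally:
        # Only close files this function opened itself.
        if close_when_done:
            fd.close()


# Usage with an in-memory file standing in for the real .csv:
demo = io.StringIO("id,name\n1,foo\n2,bar\n")
rows = list(read_csv(demo))
print(rows)
```

Because read_csv is a generator, a caller that loops over it directly never holds more than one parsed row at a time; the list(...) call in the usage lines is only there to show the result for a tiny input.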
 
 
 
 
 
John Machin
      11-10-2008
On Nov 11, 8:47 am, tejsu...@gmail.com wrote:

> import linecache


Why???

> reader2 = csv.reader(open(sys.argv[2],"rb"))
> reader2_list = []
> reader2_list.extend(reader2)
>
> for data2 in reader2_list:
>     refSeqIDsinTransPro.append(data2[3])
> for data2 in reader2_list:
>     promoterSequencesinTransPro.append(data2[4])



All you need to do is replace the above by:

reader2 = csv.reader(open(sys.argv[2],"rb"))

for data2 in reader2:
    refSeqIDsinTransPro.append(data2[3])
    promoterSequencesinTransPro.append(data2[4])
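
The loop above still keeps two full columns of the file in memory. If that is also too much, one possible variation is to read the (small) cluster file first and test each CSV row as it streams past, keeping only the hits. The sketch below is Python 3 and rests on assumptions: the function name matching_promoters is made up, columns 3 and 4 are taken from the original script, and it uses exact ID matching through a set instead of the re.search substring match in the original, which is a change in behavior.

```python
import csv
import io


def matching_promoters(wanted_ids, csv_file, id_col=3, seq_col=4):
    """Yield (refseq_id, promoter_sequence) for rows whose ID is wanted.

    Assumption: IDs are compared for exact equality, unlike the
    re.search() call in the original script.
    """
    wanted = set(wanted_ids)
    for row in csv.reader(csv_file):
        # Skip short or malformed rows instead of raising IndexError.
        if len(row) > max(id_col, seq_col) and row[id_col] in wanted:
            yield row[id_col], row[seq_col]


# In-memory stand-ins for the cluster file and the big .csv file.
cluster = io.StringIO("NM_0002\nNM_0004\n")
table = io.StringIO(
    "a,b,c,NM_0001,AAAA\n"
    "a,b,c,NM_0002,CCCC\n"
    "a,b,c,NM_0003,GGGG\n"
    "a,b,c,NM_0004,TTTT\n"
)

wanted_ids = [line.strip() for line in cluster if line.strip()]
hits = list(matching_promoters(wanted_ids, table))
print(hits)
```

With this shape the large file is read exactly once, and memory use grows with the number of matches, not with the size of the file.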
 
 
tejsupra@gmail.com
      11-20-2008


Thanks a lot, James Mills. It worked.

 