Processing large files with TextFieldParser

 
 
Jon Spivey
11-30-2009
ASP.NET 3.5 / SQL Server 2008

I've got a large (1.1m rows) CSV file which needs parsing and inserting into
SQL Server. The job needs doing every day, and if anything the CSV will get
larger over time. At present I'm using a TextFieldParser to parse the CSV
line by line and add each row to the database. This fails probably 2 times in
3; when it falls over it's usually at around 200,000 lines. I'm looking for
suggestions on how to do this robustly on a shared server which doesn't allow
bulk insert. It's fair to assume the server is a factor in the failures, but
I can't upgrade just yet.

Would I be better off breaking the CSV into, say, 5 separate files and
processing each individually, or processing it in chunks, e.g.

If TextFieldParser.LineNumber < 200000 Then
    ' process first chunk
End If

If TextFieldParser.LineNumber >= 200000 AndAlso TextFieldParser.LineNumber < 400000 Then
    ' process next chunk
End If

etc.

Or something else entirely?
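To illustrate, this is roughly what I'm picturing for the chunked version, committing each chunk separately so a failure doesn't lose the whole run - the file path, connection string, table, column names and chunk size are all placeholders:

Imports Microsoft.VisualBasic.FileIO
Imports System.Data.SqlClient

Module ChunkedImportSketch
    Sub Main()
        Dim connectionString As String = "...placeholder connection string..."
        Dim sql As String = "INSERT INTO ImportTable (Col1, Col2) VALUES (@c1, @c2)"

        Using parser As New TextFieldParser("C:\data\import.csv")
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")

            Using conn As New SqlConnection(connectionString)
                conn.Open()
                Dim tran As SqlTransaction = conn.BeginTransaction()
                Dim rowsInChunk As Integer = 0

                While Not parser.EndOfData
                    ' Assumes each row has at least two fields.
                    Dim fields() As String = parser.ReadFields()

                    Using cmd As New SqlCommand(sql, conn, tran)
                        cmd.Parameters.AddWithValue("@c1", fields(0))
                        cmd.Parameters.AddWithValue("@c2", fields(1))
                        cmd.ExecuteNonQuery()
                    End Using

                    ' Commit every 10,000 rows so a failure only costs one chunk.
                    rowsInChunk += 1
                    If rowsInChunk >= 10000 Then
                        tran.Commit()
                        tran = conn.BeginTransaction()
                        rowsInChunk = 0
                    End If
                End While

                tran.Commit()
            End Using
        End Using
    End Sub
End Module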

Cheers,
Jon




 
Gregory A. Beamer
11-30-2009

Jon Spivey wrote:

> I've got a large (1.1m rows) CSV file which needs parsing and inserting
> into SQL Server. The job needs doing every day, and if anything the CSV
> will get larger over time. At present I'm using a TextFieldParser to
> parse the CSV line by line and add each row to the database. This fails
> probably 2 times in 3; when it falls over it's usually at around 200,000
> lines. I'm looking for suggestions on how to do this robustly on a shared
> server which doesn't allow bulk insert. It's fair to assume the server is
> a factor in the failures, but I can't upgrade just yet.


Having done this numerous times, I find the best way is to use a
StreamReader and read the file line by line, especially with large files,
as trying to hold everything in memory (whether a DataSet or objects) gets
unwieldy.

With a good regex you can split out the fields, even when there is a text
qualifier (usually some form of quote). I have written my own, but I would
not be surprised if there are others out there.
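For illustration (this isn't the exact regex I use, and the file path is a placeholder), something along these lines reads line by line and splits on commas that sit outside quotes, assuming quotes are balanced and fields contain no embedded line breaks:

Imports System.IO
Imports System.Text.RegularExpressions

Module CsvStreamSketch
    Sub Main()
        ' Split on a comma only when it is followed by an even number of
        ' quote characters, i.e. the comma is not inside a quoted field.
        Dim splitter As New Regex(",(?=(?:[^""]*""[^""]*"")*[^""]*$)")

        Using reader As New StreamReader("C:\data\import.csv")
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                Dim fields() As String = splitter.Split(line)
                ' ... insert fields into SQL Server here, one row at a time ...
                line = reader.ReadLine()
            End While
        End Using
    End Sub
End Module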

Another way to attack this, since you are storing in SQL Server, is to use
SSIS (or DTS in older versions). SSIS can read a CSV file directly.

If this is a file format you can set up a BCP format file for, you can also
bulk load the rows into SQL Server. Note, however, that this will not work
if you have to split the CSV flat file out into multiple tables.

In the past, I architected a system that handled multi-gigabyte files which
had to be manipulated. The solution was to leave the data in flat files and
transform it into files that mimicked the SQL Server tables; I then seeded
the flat files with incremented IDENTITY values. This required many passes
and some file sorts to get everything into SQL Server, so it is overkill if
your file is predictable and/or does not require extensive manipulation.

Peace and Grace,


--
Gregory A. Beamer (MVP)

Twitter: @gbworld
Blog: http://gregorybeamer.spaces.live.com

*******************************************
| Think outside the box! |
*******************************************
 
Jon Spivey
12-01-2009
Hi Greg,
Thanks for your reply. I've gone with a StreamReader as you suggested,
tested it with the first CSV, and it worked perfectly. I'm going to test it
a few more times to be sure, but it certainly seems to be the solution.

Cheers,
Jon
 
Gregory A. Beamer
12-01-2009

Jon Spivey wrote:

> Thanks for your reply. I've gone with a StreamReader as you suggested,
> tested it with the first CSV, and it worked perfectly. I'm going to test
> it a few more times to be sure, but it certainly seems to be the solution.


The stream carries only the overhead of its buffer, so it works very well
for data that can be processed as a stream. With files you are generally
working one row at a time anyway, so it is a perfect fit when you are simply
grabbing records and putting them somewhere else.
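As a minimal sketch of that pattern (the table and column names are placeholders), the reader holds nothing but its buffer while each row is parsed, written out, and discarded:

Imports System.IO
Imports System.Data
Imports System.Data.SqlClient

Module RowAtATimeSketch
    Sub ImportCsv(ByVal path As String, ByVal connectionString As String)
        Using reader As New StreamReader(path)
            Using conn As New SqlConnection(connectionString)
                conn.Open()

                Dim sql As String = "INSERT INTO ImportTable (Col1, Col2) VALUES (@c1, @c2)"
                Using cmd As New SqlCommand(sql, conn)
                    ' Prepare the command once and reuse it for every row.
                    cmd.Parameters.Add("@c1", SqlDbType.NVarChar, 255)
                    cmd.Parameters.Add("@c2", SqlDbType.NVarChar, 255)
                    cmd.Prepare()

                    Dim line As String = reader.ReadLine()
                    While line IsNot Nothing
                        ' Naive split; use a CSV-aware splitter if fields can contain commas.
                        Dim fields() As String = line.Split(","c)
                        cmd.Parameters("@c1").Value = fields(0)
                        cmd.Parameters("@c2").Value = fields(1)
                        cmd.ExecuteNonQuery()
                        line = reader.ReadLine()
                    End While
                End Using
            End Using
        End Using
    End Sub
End Module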

Peace and Grace,

--
Gregory A. Beamer (MVP)

Twitter: @gbworld
Blog: http://gregorybeamer.spaces.live.com

*******************************************
| Think outside the box! |
*******************************************
 