Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > Changing seperator in large CSV files?

Reply
Thread Tools

Changing seperator in large CSV files?

 
 
dumbledad@gmail.com
Guest
Posts: n/a
 
      01-18-2006
Hi All,

I have a series of large CSV files (the largest is several GB) that I
need to load into SQL. To perform the load I'm hoping to use the SQL
command BULK INSERT. But I have a problem, I have lines in the CSV file
like the second one here:

12,some text,34
56,"some text, with a comma in",78

The BULK INSERT command fails to understand the significance of the
quotation marks and treats the second comma in the second line as a
delimiter.

Thus I would like to replace each occurrence of a comma in the files,
when and only when used as a separator, with a different separator ("/"
for example). Does anyone have a Perl script they can share with me
that will do this, i.e. that will find and replace commas which are not
enclosed in text within quote marks?

Cheers,

Tim.

 
Reply With Quote
 
 
 
 
usenet@DavidFilmer.com
Guest
Posts: n/a
 
      01-18-2006
wrote:
> Thus I would like to replace each occurrence of a comma in the files...


You might benefit from a recent thread in this group:

http://tinyurl.com/9ut7w

The thread discussed an aspect of Damian Conway's (outstanding) book,
"Perl Best Practices," but it just happens to also address your
question.

 
Reply With Quote
 
 
 
 
Dr.Ruud
Guest
Posts: n/a
 
      01-18-2006
schreef:

> The BULK INSERT command fails to understand the significance of the
> quotation marks and treats the second comma in the second line as a
> delimiter.


Which SQL-environment?

http://dev.mysql.com/doc/refman/5.0/en/load-data.html

I prefer Tab as the field delimiter.

--
Affijn, Ruud

"Gewoon is een tijger."
 
Reply With Quote
 
xhoster@gmail.com
Guest
Posts: n/a
 
      01-18-2006
"" <> wrote:
> Hi All,
>
> I have a series of large CSV files (the largest is several GB) that I
> need to load into SQL.


SQL is a language. You cannot insert data into a language. What is the
SQL-processing database server you are trying to use?


> To perform the load I'm hoping to use the SQL
> command BULK INSERT. But I have a problem, I have lines in the CSV file
> like the second one here:
>
> 12,some text,34
> 56,"some text, with a comma in",78
>
> The BULK INSERT command fails to understand the significance of the
> quotation marks and treats the second comma in the second line as a
> delimiter.


I don't know "BULK INSERT", but I do know Oracles sqlldr and MySQL's
"LOAD DATA" or mysqlimport. Both of them allow you specify a "Fields
enclosed by" character, such they can take your large data files just as
they are. I would be very surprised if "BULK INSERT" is not similarly
configurable.

> Thus I would like to replace each occurrence of a comma in the files,
> when and only when used as a separator, with a different separator ("/"
> for example). Does anyone have a Perl script they can share with me
> that will do this, i.e. that will find and replace commas which are not
> enclosed in text within quote marks?


Text::CSV_XS

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
 
Reply With Quote
 
Tad McClellan
Guest
Posts: n/a
 
      01-18-2006
<> wrote:


> i.e. that will find and replace commas which are not
> enclosed in text within quote marks?



Your Question is Asked Frequently, though it's not as easy to
find as it ought to be:

How can I split a [character] delimited string except when inside
[character]?


--
Tad McClellan SGML consulting
Perl programming
Fort Worth, Texas
 
Reply With Quote
 
dumbledad@gmail.com
Guest
Posts: n/a
 
      01-18-2006
Hi All,

Thanks all for your help, that certainly gives me scripts to try, a
library to pick over, new search terms to try, and I've ordered the
recommended book by Conway.

Xho asked:
>>> I don't know "BULK INSERT", but I do know Oracles sqlldr and MySQL's "LOAD DATA" or mysqlimport. Both of them allow you specify a "Fields enclosed by" character, such they can take your large data files just as they are. I would be very surprised if "BULK INSERT" is not similarly configurable <<<


I'm using Microsoft's SQL Server 2005. It has a tool for doing this
called SSIS, but because of problems with my installation I wanted to
get back to basics and do it in SQL using BULK INSERT. Unfortunately it
looks like the tantalisingly useful "fields enclosed by" character that
Xho refers to is not configurable:
http://msdn.microsoft.com/library/de...ba-bz_4fec.asp
( http://tinyurl.com/55ysl )

Cheers,

Tim.

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
ASP.NET 2.0 menu seperator lstuyck73@gmail.com ASP .Net 3 07-16-2008 04:24 PM
Thousand Seperator ewanfisher@gmail.com Python 5 03-14-2008 06:20 PM
Gridview Seperator Bar Based on Data Values xmldig ASP .Net 0 11-30-2005 07:07 PM
Seperator............. grasshopper Firefox 6 01-13-2005 02:46 PM
input record seperator (equivalent of "$|" of perl) les_ander@yahoo.com Python 35 01-04-2005 01:38 PM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57