Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > tab delimited file processing problem

Reply
Thread Tools

tab delimited file processing problem

 
 
Domenico Discepola
Guest
Posts: n/a
 
      07-17-2003
Hi all. I have constructed a script that uses Win32::OLE to save an Excel
workbook as a tab-delimited text file (TSV file). This works fine. My next
step is to perform formatting on each field per line in the TSV file while
retaining the # of fields. The problem lies with "empty" cells in the 1st
column of the Excel file.

Example Excel file row:
col A's value=<empty>
col B's value = "1"
col C's value = "2"
<end of row>

When you use Win32::OLE to "tell" Excel to save this as a TSV file (using
the SaveAs method), a hex-dump of the resultant TSV file reveals row1 as:
/^\t12$/ (using regex notation). In other words, I lose the existence of
col A (which I need).

I was thinking of the following solution:
s/^\t/\s\t/, $my_line;
but could there be a 'better' way to handle it?

Any suggestions on how to best solve this problem would be appreciated.

Thanks in advance.
 
Reply With Quote
 
 
 
 
Greg Bacon
Guest
Posts: n/a
 
      07-17-2003
In article <> ,
Domenico Discepola <> wrote:

: [...]
: Example Excel file row:
: col A's value=<empty>
: col B's value = "1"
: col C's value = "2"
: <end of row>
:
: When you use Win32::OLE to "tell" Excel to save this as a TSV file (using
: the SaveAs method), a hex-dump of the resultant TSV file reveals row1 as:
: /^\t12$/ (using regex notation). In other words, I lose the existence of
: col A (which I need).

How have you lost column A? Consider the example below:

C:\Temp>type try
#! perl

use warnings;
use strict;

use Data:umper;

my $data = "\t12";
my @fields = split /\t/, $data;

print Dumper \@fields;

C:\Temp>perl try
$VAR1 = [
'',
'12'
];

Are you sure there wasn't a TAB between the 1 and the 2? Even so,
you're still happy; note that the first element of @fields is empty:

#! perl

use warnings;
use strict;

use Data:umper;

my $data = "\t1\t2";
my @fields = split /\t/, $data;

print Dumper \@fields;

C:\Temp>perl try
$VAR1 = [
'',
'1',
'2'
];

How were you trying to extract the fields in your TSV file?

Technical side note: what you're calling tab-delimited is really
tab-separated. Using [TAB] to make things stand out, a tab-delimited
record would look like

[TAB]field_1[TAB]field_2[TAB]...[TAB]field_n[TAB]

Tab-*separated*, however, would look like

field_1[TAB]field_2[TAB]...[TAB]field_n

Hope this helps,
Greg
--
It remains true today as it did in fascist Italy, socialist Germany, New
Deal America, and socialist Russia: freedom has no greater opponents than
those who despise and demonize commercial society.
-- Lew Rockwell
 
Reply With Quote
 
 
 
 
Domenico Discepola
Guest
Posts: n/a
 
      07-18-2003

"Greg Bacon" <> wrote in message
news:...
> In article <> ,
> Domenico Discepola <> wrote:
>
> : [...]
> : Example Excel file row:
> : col A's value=<empty>
> : col B's value = "1"
> : col C's value = "2"
> : <end of row>
> :
> : When you use Win32::OLE to "tell" Excel to save this as a TSV file

(using
> : the SaveAs method), a hex-dump of the resultant TSV file reveals row1

as:
> : /^\t12$/ (using regex notation). In other words, I lose the existence

of
> : col A (which I need).
>
> How have you lost column A? Consider the example below:
>
> C:\Temp>type try
> #! perl
>
> use warnings;
> use strict;
>
> use Data:umper;
>
> my $data = "\t12";
> my @fields = split /\t/, $data;
>
> print Dumper \@fields;
>
> C:\Temp>perl try
> $VAR1 = [
> '',
> '12'
> ];
>
> Are you sure there wasn't a TAB between the 1 and the 2? Even so,
> you're still happy; note that the first element of @fields is empty:
>
> #! perl
>
> use warnings;
> use strict;
>
> use Data:umper;
>
> my $data = "\t1\t2";
> my @fields = split /\t/, $data;
>
> print Dumper \@fields;
>
> C:\Temp>perl try
> $VAR1 = [
> '',
> '1',
> '2'
> ];
>
> How were you trying to extract the fields in your TSV file?
>
> Technical side note: what you're calling tab-delimited is really
> tab-separated. Using [TAB] to make things stand out, a tab-delimited
> record would look like
>
> [TAB]field_1[TAB]field_2[TAB]...[TAB]field_n[TAB]
>
> Tab-*separated*, however, would look like
>
> field_1[TAB]field_2[TAB]...[TAB]field_n
>
> Hope this helps,
> Greg
> --
> It remains true today as it did in fascist Italy, socialist Germany, New
> Deal America, and socialist Russia: freedom has no greater opponents than
> those who despise and demonize commercial society.
> -- Lew Rockwell


1st off, you're right - my regex describing my output from Excel was
incorrect - it should have read (as you suggested) /^\t1\t2$/
2nd, I did mean tab separated (instead of tab delimited). 3rd, thanks for
mentioning the Data:umper module, I hadn't heard of that - I'll check it
out. Right off the bat it seems to help diagnosing my problem.






 
Reply With Quote
 
Bart Lateur
Guest
Posts: n/a
 
      07-18-2003
Domenico Discepola wrote:

>Example Excel file row:
>col A's value=<empty>
>col B's value = "1"
>col C's value = "2"
><end of row>
>
>When you use Win32::OLE to "tell" Excel to save this as a TSV file (using
>the SaveAs method), a hex-dump of the resultant TSV file reveals row1 as:
>/^\t12$/ (using regex notation). In other words, I lose the existence of
>col A (which I need).


No you don't. The first column comes before the first tab.

--
Bart.
 
Reply With Quote
 
Bill Smith
Guest
Posts: n/a
 
      07-18-2003

"Domenico Discepola" <> wrote in message
news: m...
> Hi all. I have constructed a script that uses Win32::OLE to save an

Excel
> workbook as a tab-delimited text file (TSV file). This works fine.

My next
> step is to perform formatting on each field per line in the TSV file

while
> retaining the # of fields. The problem lies with "empty" cells in the

1st
> column of the Excel file.


I remember having similar problems in the distant past. I used native
EXCEL commands to create the TSV file. Details of the file format
depended on the version of EXCEL. This is probably not a problem to
you, but beware!

Bill



 
Reply With Quote
 
Domenico Discepola
Guest
Posts: n/a
 
      07-18-2003
"Bill Smith" <> wrote in message
newsxWRa.31700$ et...
>
> "Domenico Discepola" <> wrote in message
> news: m...
> > Hi all. I have constructed a script that uses Win32::OLE to save an

> Excel
> > workbook as a tab-delimited text file (TSV file). This works fine.

> My next
> > step is to perform formatting on each field per line in the TSV file

> while
> > retaining the # of fields. The problem lies with "empty" cells in the

> 1st
> > column of the Excel file.

>
> I remember having similar problems in the distant past. I used native
> EXCEL commands to create the TSV file. Details of the file format
> depended on the version of EXCEL. This is probably not a problem to
> you, but beware!
>
> Bill
>
>


As an added note, if one particular row in Excel has fewer columns than
columns in surrounding rows, Excel will not pad the remaining columns with
tabs... In other words, you can end up with a TSV file having a different
number of fields:

Example Excel file:
row1: x, y, z
row2: a, b, c, d

resultant TSV file (in regex format):
row1: /^x\ty\tz$/
row2: /^a\tb\tc\td$/

So, as you see, if you are trying to manipulate a TSV file and expect the
same number of fields per line, you must account for it programmatically as
Excel does not do it for you.


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Perl/unix script to convert a fixed width file to a tab delimited file Srikant Perl Misc 1 09-29-2007 02:32 PM
convert non-delimited to delimited RyanL Python 6 08-28-2007 12:06 AM
Problem reading tab delimited file iamreallynewhere@sbcglobal.net Java 5 05-10-2005 05:39 PM
Is there any perl script for converting XML file to tab-delimited file ? gongwuming@hotmail.com Perl Misc 1 04-14-2004 05:15 PM
Problem while downloading a tab delimited file to excel using Servlet Niraj Agarwal Java 3 01-13-2004 07:27 PM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57