Text::ParseWords

 
 
ccc31807
 
      03-30-2010
See the script and output below. The problem is that DATA contains a
single quote in the name O'Toole. Is there any way to get this to
work? Or do I have to roll my own?

Or (horrors) do I have to munge DATA to escape every single quote?

Thanks, CC.

---------------------script---------------------
use strict;
use warnings;
use Text::ParseWords;

while (<DATA>)
{
    chomp;
    #my ($id, $first, $last, $csz) = split /,/;
    my ($id, $first, $last, $csz) = parse_line(',', 0, $_);
    #my ($id, $first, $last, $csz) = quotewords(',', 0, $_);
    ###my ($id, $first, $last, $csz) = shellwords(',', 1, $_); never works
    ###my ($id, $first, $last, $csz) = nested_quotewords(',', 1, $_); never works
    print "$id, $first, $last, $csz\n";
}

exit(0);

__DATA__
1234,John,Smith,"New York, NY"
2345,Karl,Tomas,"Boston, MA"
98765,Sean,O'Toole,"Dublin, Ireland"
34567,Lewis,Uberville,"Nashville, TN"

---------------output---------------------------------

D:\PerlLearn\ParseWords>perl test_1.plx
1234, John, Smith, New York, NY
2345, Karl, Tomas, Boston, MA
Use of uninitialized value in concatenation (.) or string at test_1.plx line 13, <DATA> line 3.
Use of uninitialized value in concatenation (.) or string at test_1.plx line 13, <DATA> line 3.
Use of uninitialized value in concatenation (.) or string at test_1.plx line 13, <DATA> line 3.
Use of uninitialized value in concatenation (.) or string at test_1.plx line 13, <DATA> line 3.
, , ,
34567, Lewis, Uberville, Nashville, TN
 
 
Jürgen Exner
 
      03-30-2010
ccc31807 <(E-Mail Removed)> wrote:
>See the script and output below. The problem is that DATA contains a
>single quote in the name O'Toole. Is there any way to get this to
>work? Or do I have to roll my own?
>
>Or (horrors) do I have to munge DATA to escape every single quote?
>
>Thanks, CC.
>
>---------------------script---------------------
>use Text::ParseWords;

[...]
>
>__DATA__
>1234,John,Smith,"New York, NY"
>2345,Karl,Tomas,"Boston, MA"
>98765,Sean,O'Toole,"Dublin, Ireland"
>34567,Lewis,Uberville,"Nashville, TN"


This looks like a standard CSV format. Is there a specific reason why
you are not using one of the existing CSV modules to parse this data?

jue
 
 
ccc31807
 
      03-30-2010
On Mar 30, 10:48 am, Jürgen Exner <(E-Mail Removed)> wrote:
> This looks like a standard CSV format. Is there a specific reason why
> you are not using one of the existing CSV modules to parse this data?


This runs on a server that isn't mine. I provided the script, and the
user who runs the script noticed the error (and it is an error). I am
constrained by the Perl distribution on this particular machine, which
is ActiveState 5.8.something which includes Text::ParseWords.

In desperation I had done what Tad suggested, replacing each
apostrophe with \', but I thought it was a hack. It worked well
enough, but I still don't like it, which is why I posted this morning.
At least someone else thinks it's a viable solution, which is a small
comfort.
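
For reference, a minimal sketch of that workaround, assuming the data never contains a backslash of its own and never uses single quotes as real field quoting. The helper name parse_csv_line is made up for this illustration; parse_line is the real Text::ParseWords function.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper (not part of Text::ParseWords): escape every
# apostrophe so parse_line() treats it as a literal character instead
# of the start of a single-quoted string.
sub parse_csv_line {
    my ($line) = @_;
    (my $escaped = $line) =~ s/'/\\'/g;    # O'Toole becomes O\'Toole
    # keep = 0 strips the surrounding double quotes and the added backslashes
    return parse_line(',', 0, $escaped);
}

my @fields = parse_csv_line(q{98765,Sean,O'Toole,"Dublin, Ireland"});
print join(' | ', @fields), "\n";    # prints: 98765 | Sean | O'Toole | Dublin, Ireland
```

The escaping is done on a copy, so the original line stays untouched for logging or error messages.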

Thanks for the suggestions, Tad, Ben, and jue.

CC.
 
Jürgen Exner
 
      03-30-2010
ccc31807 <(E-Mail Removed)> wrote:
>On Mar 30, 10:48 am, Jürgen Exner <(E-Mail Removed)> wrote:
>> This looks like a standard CSV format. Is there a specific reason why
>> you are not using one of the existing CSV modules to parse this data?

>
>This runs on a server that isn't mine. I provided the script, and the
>user who runs the script noticed the error (and it is an error). I am
>constrained by the Perl distribution on this particular machine, which
>is ActiveState 5.8.something which includes Text::ParseWords.


Then I would (in this order)
- try (with the help of that user) to persuade the admin of that machine
to install the module
- have that user install the module in his user space
- ship the module together with my script to be copied into the same
directory and loaded from there
- include (at least the relevant portion of) the module verbatim as
source code in my script
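
A minimal sketch of the third option, assuming the module is pure Perl and its .pm file has been copied next to the script. Text::CSV_PP is only an example name here; the thread does not say which CSV module is meant. FindBin and lib are both core modules.

```perl
use strict;
use warnings;
use FindBin qw($Bin);   # $Bin is the directory containing the running script
use lib $Bin;           # search that directory when loading modules

# Assumption for illustration: Text/CSV_PP.pm was copied under $Bin.
# require is wrapped in eval so the script can report a missing copy
# instead of dying at load time.
if (eval { require Text::CSV_PP; 1 }) {
    print "Loaded Text::CSV_PP from $INC{'Text/CSV_PP.pm'}\n";
}
else {
    print "Text::CSV_PP not found; copy Text/CSV_PP.pm under $Bin\n";
}
```

Because the path comes from FindBin rather than the current working directory, the script finds its bundled module no matter where the user runs it from.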

jue
 
ccc31807
 
      03-30-2010
On Mar 30, 12:02 pm, Jürgen Exner <(E-Mail Removed)> wrote:
> Then I would (in this order)
> - try (with the help of that user) to persuade the admin of that machine
> to install the module


I have discovered that, to the usual Windows admin, the command 'ppm'
is as terrifying as the command 'brick_server'.

> - have that user install the module in his user space


I don't think that the user has privileges to install software, but
this is a good idea.

> - ship the module together with my script to be copied into the same
> directory and loaded from there


Good idea.

> - include (at least the relevant portion of) the module verbatim as
> source code in my script


Also a good idea. I often try to do stuff the hard way, mostly as a
learning exercise, and I have been known to shamelessly copy code from
other people, including PM shipped with Perl. I've wondered about the
ethics of this, but my conscience is eased by the facts that (1) I
don't claim authorship, (2) I don't make commercial use of the
software, and (3) the source is freely available for appropriate uses.
Unfortunately, I find some of the code is above my present ability to
understand (which is why I do this as a learning exercise, and yes, I
do learn from it.)

CC.
 
John Bokma
 
      03-30-2010
ccc31807 <(E-Mail Removed)> writes:

> On Mar 30, 12:02 pm, Jürgen Exner <(E-Mail Removed)> wrote:


[ Missing Perl module ]

>> - have that user install the module in his user space

>
> I don't think that the user has privileges to install software, but
> this is a good idea.


A user can *always* install a module in a directory he has access to.

>> - ship the module together with my script to be copied into the same
>> directory and loaded from there

>
> Good idea.
>
>> - include (at least the relevant portion of) the module verbatim as
>> source code in my script

>
> Also a good idea.


No. It's an option, but there is a reason why it's listed last.

> I often try to do stuff the hard way, mostly as a
> learning exercise, and I have been known to shamelessly copy code from
> other people, including PM shipped with Perl. I've wondered about the
> ethics of this, but my conscience is eased by the facts that (1) I
> don't claim authorship, (2) I don't make commercial use of the
> software, and (3) the source is freely available for appropriate uses.
> Unfortunately, I find some of the code is above my present ability to
> understand (which is why I do this as a learning exercise, and yes, I
> do learn from it.)


It's called cargo cult coding, at least that's how it sounds. While it's
not bad to copy a piece of code verbatim out of a context that you can't
use directly, at least make sure you understand what it's doing.

--
John Bokma j3b

Hacking & Hiking in Mexico - http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development
 
sln@netherlands.com
 
      03-30-2010
On Tue, 30 Mar 2010 09:47:24 -0500, Tad McClellan <(E-Mail Removed)> wrote:

>ccc31807 <(E-Mail Removed)> wrote:
>> See the script and output below. The problem is that DATA contains a
>> single quote in the name O'Toole. Is there any way to get this to
>> work? Or do I have to roll my own?
>>
>> Or (horrors) do I have to munge DATA to escape every single quote?

>
>
>Where is the horror in that?
>
>
>> ---------------------script---------------------
>> use strict;
>> use warnings;
>> use Text::ParseWords;
>>
>> while (<DATA>)
>> {
>> chomp;
>> #my ($id, $first, $last, $csz) = split /,/;

>
>
> s/'/\\'/g; # that doesn't seem horrible to me...
> # unless you have single-quoted 'strings' in DATA

^^^^
98765,Sean,O'Toole,"O'Dublin, Ireland"

That's a big restriction there, hardly a workaround solution.

It's too bad though, with a little extra work,
they could have got it right.

-sln

========================
Output:

c:\temp>perl parse_line.pl

1234, John, Smith, "New York, NY"
2345, Karl, Tomas, "Boston, MA"
98765, Sean, O'Toole, "Dublin, Ireland"
34567, Lewis, Uberville, "Nashville, TN"

c:\temp>

## parse_line.pl
##
use strict;
use warnings;

my $PERL_SINGLE_QUOTE = 0;

#use Text::ParseWords;

print "\n";
while (<DATA>)
{
    chomp;
    my ($id, $first, $last, $csz) = parse_line(',', 1, $_);
    print "$id, $first, $last, $csz\n";
}

exit(0);


## -----------------------------------------
## sub parse_line()
## Copyright @ 4/30/2010, by sln
## All rights reserved
## -----------------------------------------
sub parse_line {
    my($delimiter, $keep, $line) = @_;
    my($word, @pieces);

    no warnings 'uninitialized'; # we will be testing undef strings

    while (length($line)) {
        # This pattern is optimised to be stack conservative on older perls.
        # Do not refactor without being careful and testing it on very long strings.
        # See Perl bug #42980 for an example of a stack busting input.
        $line =~ s/^
            (?:
                (?:
                    # double quoted string
                    (")                             # $quote
                    ((?>[^\\"]*(?:\\.[^\\"]*)*))"   # $quoted
                |   # --OR--
                    # single quoted string
                    (')                             # $quote
                    ((?>[^\\']*(?:\\.[^\\']*)*))'   # $quoted
                |   # --OR--
                    # unquoted string
                    (                               # $unquoted
                        (?:\\.|[^\\"'])*?
                    )
                    # followed by
                    (                               # $delim
                        \Z(?!\n)                    # EOL
                    |   # --OR--
                        (?-x:$delimiter)            # delimiter
                    |   # --OR--
                        (?!^)(?=["'])               # a quote
                    )
                )
            |   # --OR--
                (['"])                              # $unquoted quote
            )
        //xs or return; # extended layout

        # $5 can be a defined but false string such as "0", so test $7 with
        # defined() instead of testing $5 for truth.
        my ($quote, $quoted, $unquoted, $delim) =
            (($1 ? ($1,$2) : ($3,$4)), (defined $7 ? $7 : $5), $6);

        return() unless( defined($quote) || length($unquoted) || length($delim));

        if ($keep) {
            $quoted = "$quote$quoted$quote";
        }
        else {
            $unquoted =~ s/\\(.)/$1/sg;
            if (defined $quote) {
                $quoted =~ s/\\(.)/$1/sg if ($quote eq '"');
                $quoted =~ s/\\([\\'])/$1/g if ( $PERL_SINGLE_QUOTE && $quote eq "'");
            }
        }
        $word .= substr($line, 0, 0); # leave results tainted
        $word .= defined $quote ? $quoted : $unquoted;

        if (length($delim)) {
            push(@pieces, $word);
            push(@pieces, $delim) if ($keep eq 'delimiters');
            undef $word;
        }
        if (!length($line)) {
            push(@pieces, $word);
        }
    }
    return(@pieces);
}

__DATA__
1234,John,Smith,"New York, NY"
2345,Karl,Tomas,"Boston, MA"
98765,Sean,O'Toole,"Dublin, Ireland"
34567,Lewis,Uberville,"Nashville, TN"

 