
Probably a dumb s/// question.

 
 
Tad McClellan
 
      03-17-2005
Ted Zlatanov <(E-Mail Removed)> wrote:
> On 16 Mar 2005, (E-Mail Removed) wrote:
>
>> my $string = 'the quick brown fox jumped over the lazy dogs.';
>> my $String = join ' ', map {ucfirst lc} split ' ', $string;
>>
>> That forces your string to lower case first then capitalizes the first
>> letter. It won't preserve whitespace though.

>
> It's probably better to do something like this:
>
> perl -p -e's/(\w+)/ucfirst($1)/eg'



Try it on this string:

it's a wonderful life!
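
For anyone reading along without a shell handy: \w does not match the apostrophe, so the "s" in "it's" gets treated as a word of its own. A minimal sketch of what happens (the sub name capitalize_words is made up for illustration; the substitution is the one from the one-liner):

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the one-liner above.
sub capitalize_words {
    my ($text) = @_;
    # Every run of \w characters is passed through ucfirst().
    $text =~ s/(\w+)/ucfirst($1)/eg;
    return $text;
}

print capitalize_words("it's a wonderful life!"), "\n";
# prints: It'S A Wonderful Life!
```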


--
Tad McClellan SGML consulting
(E-Mail Removed) Perl programming
Fort Worth, Texas
 
 
Ted Zlatanov
 
      03-17-2005
On Wed, 16 Mar 2005, (E-Mail Removed) wrote:

Ted Zlatanov <(E-Mail Removed)> wrote:

>> It's probably better to do something like this:
>>
>> perl -p -e's/(\w+)/ucfirst($1)/eg'

>
> Try it on this string:
>
> it's a wonderful life!


Like I said, Text::Capitalize is better than any of the other
solutions, unless the OP really did mean the original requirements.
Natural language, such as it is, can be (and often is) very tricky!

Ted
 
 
Ted Zlatanov
 
      03-17-2005
On Wed, 16 Mar 2005, (E-Mail Removed) wrote:

> \u _is_ ucfirst(), it respects locales too.


I see this in perldoc perlre:

" \u uppercase next char (think vi)
....
If "use locale" is in effect, the case map used by "\l", "\L",
"\u" and "\U" is taken from the current locale. See
perllocale."

This is not, however, the same as ucfirst. It's
uc($char1) . $rest
not
ucfirst($char1 . $rest)

In fact, ucfirst() should do a Titlecase, as it's called in Unicode,
although I don't know if the internal Perl implementation does exactly
that. Titlecase is not the uppercasing of the first character,
although in English it usually works that way.
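
One way to see the difference between uppercase and titlecase is a character whose two mappings differ. The sketch below assumes U+01C6 (LATIN SMALL LETTER DZ WITH CARON) as the test character, which is my choice and not something from the thread, and prints code points rather than the characters themselves to avoid output-encoding issues:

```perl
use strict;
use warnings;

# U+01C6 is a single character that looks like "dz" with a caron.
# Unicode gives it an uppercase mapping (U+01C4) and a separate
# titlecase mapping (U+01C5).
my $dz = "\x{01C6}";

printf "uc:      U+%04X\n", ord uc $dz;       # U+01C4
printf "ucfirst: U+%04X\n", ord ucfirst $dz;  # U+01C5
```

On a Unicode-aware perl the two lines differ, which suggests that ucfirst() applies the titlecase mapping and not the plain uppercase one.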

Why is this important? If you look at the Unicode standard, there is
a good example:
(from http://www.unicode.org/reports/tr21/tr21-3.html)

"Characters may also have different case mappings, depending on the context.

For example, 03A3 capital sigma lowercases to 03C3 small sigma if it
is followed by another letter, but lowercases to 03C2 small final
sigma if it is not."

and later

"Converting to Titlecase

Map each character to its titlecase or lowercase. If the preceeding
letter is cased, chose the lowercase mapping; otherwise chose the
titlecase mapping (in most cases, this will be the same as the
uppercase, but not always)."

The Unicode standard feels strongly enough about titlecasing to define
different terms for it and to explicitly warn about just uppercasing
the first character. That's why I brought it up. Sorry I didn't
expand on it earlier.

Sorry to get into technicalities like this, but I feel the point is
important for anyone interested in Unicode programming. Often, things
that seem obvious in English are radically different in other
languages and writing systems.

> For this application, we don't need to ensure that it is a letter at all:
>
> s/(^|\s)(.)/$1\u$2/g;


Yes, noted by others too. As I said, Text::Capitalize is probably
what the OP wants. My point was just that [a-z] is a good sign of trouble.
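
For comparison, here is a minimal sketch of the quoted substitution applied to the apostrophe example from earlier in the thread. The sub name capitalize_after_space is made up for illustration, and the sketch only exercises single separators between words:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the quoted substitution.
sub capitalize_after_space {
    my ($text) = @_;
    # Capitalize whatever character follows start-of-string or a
    # whitespace character; the whitespace itself is put back as-is.
    $text =~ s/(^|\s)(.)/$1\u$2/g;
    return $text;
}

print capitalize_after_space("it's a wonderful life!"), "\n";
# prints: It's A Wonderful Life!
```

Because the whitespace is captured and written back, a tab between words stays a tab, unlike the split/join approach.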

Ted
 
 
Joe Smith
 
      03-28-2005
Mark Healey wrote:

> the quick brown fox jumped over the lazy dogs.
> to
> The Quick Brown Fox Jumped Over the Lazy Dogs.


After doing the Titlecase conversions as others have mentioned,
you then have to do some postprocessing to undo some of it.

To lowercase 'the' when it is not at the beginning of the line:
s/(\s)(of|at|the|and)\b/$1\l$2/gi;
But that does not do the right thing if there is an end-of-sentence
period (as opposed to an end-of-abbreviation period) preceding 'the'.
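
Put together, the two passes might look like the sketch below. It assumes the whitespace is captured separately from the word so that \l lands on the word's first letter, and it uses /i because the first pass has already capitalized the words. The sub name title_case is made up for illustration, and the word list is just the four words above, not a complete list:

```perl
use strict;
use warnings;

# Hypothetical helper: capitalize every word, then undo it for a few
# small words that are not at the start of the string.
sub title_case {
    my ($text) = @_;
    # Pass 1: capitalize the character after start-of-string or whitespace.
    $text =~ s/(^|\s)(.)/$1\u$2/g;
    # Pass 2: lowercase the listed words when preceded by whitespace.
    $text =~ s/(\s)(of|at|the|and)\b/$1\l$2/gi;
    return $text;
}

print title_case('the quick brown fox jumped over the lazy dogs.'), "\n";
# prints: The Quick Brown Fox Jumped Over the Lazy Dogs.
```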

-Joe
 