Why is this sub removing newlines??

 
 
John Black
      12-05-2013
This sub is just supposed to strip off whitespace (at both the beginning and end of a
string). But it's also stripping off newlines at the end of the string! Why would that be?
\s does not include newline, right?

sub trim
{
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

John Black

 
Rainer Weikusat
      12-05-2013
John Black writes:
> This sub is just supposed to strip off whitespace (at both the beginning and end of a
> string). But it's also stripping off newlines at the end of the string! Why would that be?
> \s does not include newline, right?
>
> sub trim()
> {
> my $string = shift;
> $string =~ s/^\s+//;
> $string =~ s/\s+$//;
> return $string;
> }


[rw@sable]~#perl -e 'print "\n" =~ /\s/, "\n"'
1
 
Charlton Wilbur
      12-05-2013
>>>>> "JB" == John Black writes:

JB> \s does not include newline, right?

perldoc perlrecharclass:

"\s" matches any single character considered whitespace.

and the following table:

Code point  Name                  \h  \v  \s
0x00009     CHARACTER TABULATION  h       s
0x0000a     LINE FEED (LF)            v   s
0x0000b     LINE TABULATION           v
0x0000c     FORM FEED (FF)            v   s
0x0000d     CARRIAGE RETURN (CR)      v   s
0x00020     SPACE                 h       s
0x00085     NEXT LINE (NEL)           v   s   [1]
0x000a0     NO-BREAK SPACE        h       s   [1]

So yes, newline *is* considered whitespace.

Charlton


--
Charlton Wilbur

hymie!
      12-05-2013
In our last episode, the evil Dr. Lacto had captured our hero,
John Black, who said:
>\s does not include newline, right?


perldoc perlre

"\s" means the five characters "[ \f\n\r\t]"

--hymie! http://lactose.homelinux.net/~hymie (E-Mail Removed)
-------------------------------------------------------------------------------
 
Jim Gibson
      12-05-2013
John Black wrote:

> This sub is just supposed to strip off whitespace (at both the beginning and
> end of a string). But it's also stripping off newlines at the end of the
> string! Why would that be?
> \s does not include newline, right?


'perldoc perlre' contains these excerpts:

Character Classes and other Special Escapes
....
In addition, Perl defines the following:

Sequence   Note   Description
....
\s         [3]    Match a whitespace character
....
[3] See "Backslash sequences" in perlrecharclass for details.
(end)

Following that reference to 'perldoc perlrecharclass' yields:

Whitespace

"\s" matches any single character that is considered whitespace. The
exact set of characters matched by "\s" depends on whether the source
string is in UTF-8 format and the locale or EBCDIC code page that is in
effect. If it's in UTF-8 format, "\s" matches what is considered
whitespace in the Unicode database; the complete list is in the table
below. Otherwise, if there is a locale or EBCDIC code page in effect,
"\s" matches whatever is considered whitespace by the current locale or
EBCDIC code page. Without a locale or EBCDIC code page, "\s" matches
the horizontal tab ("\t"), the newline ("\n"), the form feed ("\f"),
the carriage return ("\r"), and the space. (Note that it doesn't match
the vertical tab, "\cK".) Perhaps the most notable possible surprise
is that "\s" matches a non-breaking space only if the non-breaking
space is in a UTF-8 encoded string or the locale or EBCDIC code page
that is in effect has that character. See "Locale, EBCDIC, Unicode and
UTF-8".
(end)

So, yes, \s does include the newline.
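
A quick way to check this on whatever perl is installed is to test each candidate character directly. The following is only a sketch: the `%candidates` and `%is_space` names are made up for illustration. No result is claimed for the vertical tab, since the 5.12 documentation quoted above says \s does not match it, while Perl 5.18 and later do match it, as far as I know.

```perl
use strict;
use warnings;

# Candidate characters, keyed by a readable label (names are illustrative).
my %candidates = (
    'space'           => " ",
    'horizontal tab'  => "\t",
    'newline'         => "\n",
    'carriage return' => "\r",
    'form feed'       => "\f",
    'vertical tab'    => "\cK",   # result depends on the Perl version
    'letter a'        => "a",
);

# Record whether each candidate matches \s on this perl, and report it.
my %is_space;
for my $label (sort keys %candidates) {
    $is_space{$label} = $candidates{$label} =~ /\s/ ? 1 : 0;
    printf "%-16s %s\n", $label, $is_space{$label} ? 'matches \s' : 'no match';
}
```

The newline, tab, carriage return, form feed and space should all be reported as matching, which is why the trim sub eats the trailing newline.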

--
Jim Gibson
 
gamo
      12-05-2013
On 05/12/13 19:50, John Black wrote:
> This sub is just supposed to strip off whitespace (at both the beginning and end of a
> string). But it's also stripping off newlines at the end of the string! Why would that be?
> \s does not include newline, right?
>
> sub trim()
> {
> my $string = shift;
> $string =~ s/^\s+//;
> $string =~ s/\s+$//;
> return $string;
> }
>
> John Black
>


This is absurd, but maybe it does just what you want:

:~/test$ cat test.trim
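#!/usr/bin/perl -W

my $s = " only this:
";
print trim($s);


sub trim{
    my $string = shift;
    my $space = ' ';
    $string =~ s/^$space+//;     # leading spaces only; ^ keeps the match at the start
    $string = reverse $string;
    $string =~ s/^$space+//;     # spaces at the very end, which are now at the front
    $string = reverse $string;
    return $string;
}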

:~/test$ perl test.trim
only this:
:~/test$

Best regards

 
John Black
      12-05-2013
Henry Law says...
>
> On 05/12/13 18:50, John Black wrote:
> > \s does not include newline, right?

>
> John, I would have agreed with you. Plainly we're both wrong, as the
> follow-ups, not to mention the documentation, have shown, but what is it
> we're (mis)remembering? There's some circumstance in which newline \n
> behaves differently from the other white space characters.


Now that I see that \s includes vertical and horizontal types of characters, it makes more
sense. Up to this point, I've been using \s as a shortcut for spaces or tabs. I'll have to
keep this in mind. I had wanted that trim function to leave newlines alone (not strip them,
and not add one if there wasn't one). It should not be hard to work around. Thanks, all.
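
One possible workaround, as a minimal sketch: spell out the characters to strip instead of using \s. The name `trim_horizontal` is hypothetical, and this version assumes that only spaces and tabs count as trimmable and that any newlines at the very end should survive untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: strip spaces and tabs from both ends of a string,
# but leave any newlines at the end exactly as they were.
sub trim_horizontal {
    my $string = shift;
    $string =~ s/^[ \t]+//;            # leading spaces and tabs
    $string =~ s/[ \t]+(\n*)\z/$1/;    # trailing spaces and tabs, keeping final newlines
    return $string;
}

my $with_newline    = trim_horizontal("  \tfoo bar \t\n");   # "foo bar\n"
my $without_newline = trim_horizontal("  foo bar  ");        # "foo bar"

print "[$with_newline]\n";
print "[$without_newline]\n";
```

On Perl 5.10 or later, \h (horizontal whitespace) could stand in for [ \t], though it covers more characters than just space and tab.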

John Black
 
Rainer Weikusat
      12-05-2013
Henry Law writes:
> On 05/12/13 18:50, John Black wrote:
>> \s does not include newline, right?

>
> John, I would have agreed with you. Plainly we're both wrong, as the
> follow-ups, not to mention the documentation, have shown, but what is
> it we're (mis)remembering? There's some circumstance in which newline
> \n behaves differently from the other white space characters.


Guess: There's a circumstance where it behaves differently from other
characters, namely, a . won't match \n unless the s-flag is used
together with the match operator.
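
A small sketch of that difference, using a made-up two-line test string: the /s flag changes what "." matches, while \s matches the newline with or without it.

```perl
use strict;
use warnings;

my $text = "a\nb";

# Without /s, "." refuses to match the newline between "a" and "b".
my $plain_dot = $text =~ /a.b/ ? 1 : 0;

# With /s, "." matches any character, newline included.
my $dot_with_s = $text =~ /a.b/s ? 1 : 0;

# \s matches the newline either way; the /s flag only affects ".".
my $backslash_s = $text =~ /a\sb/ ? 1 : 0;

print "a.b   without /s: $plain_dot\n";
print "a.b   with /s:    $dot_with_s\n";
print "a\\sb  without /s: $backslash_s\n";
```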
 
Jürgen Exner
      12-05-2013
John Black wrote:
>Henry Law says...
>>
>> On 05/12/13 18:50, John Black wrote:
>> > \s does not include newline, right?

>>
>> John, I would have agreed with you. Plainly we're both wrong, as the
>> follow-ups, not to mention the documentation, have shown, but what is it
>> we're (mis)remembering? There's some circumstance in which newline \n
>> behaves differently from the other white space characters.

>
>Now that I see that \s includes vertical and horizontal types of characters, it makes more
>sense. Up to this point,


Try looking at it from a programming language point of view. Most modern
programming languages are free-format, i.e. in the program code a single
space is as good as 20 tabs or as 5 newlines. Therefore there is some
sense in including all of them in \s.

jue
 
Jim Gibson
      12-06-2013
Ben Morrow wrote:

> Quoth Jim Gibson:
> >
> > Following that reference to 'perldoc perlrecharclass' yields:
> >
> > Whitespace
> >

<snipped>

> That's a pretty old copy of that documentation. Since 5.14 the Unicode
> Bug has been fixed, and character-class matching no longer depends on
> the internal format of the string.
>
> Ben


Thanks. It's from 5.12.4, which is what I am using.
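
For anyone on 5.14 or later, the /u and /a regex modifiers make the choice explicit rather than leaving it to how the string happens to be stored. A minimal sketch, assuming a no-break space as the test character (the variable names are only for illustration):

```perl
use 5.014;    # the /a and /u modifiers arrived in 5.14
use strict;
use warnings;

my $nbsp = "\xA0";    # NO-BREAK SPACE, written here as a single code point

# /u: Unicode rules apply, regardless of the string's internal format.
my $matches_under_u = $nbsp =~ /\s/u ? 1 : 0;

# /a: \s is restricted to ASCII whitespace, so the no-break space is excluded.
my $matches_under_a = $nbsp =~ /\s/a ? 1 : 0;

# The newline counts as whitespace under both sets of rules.
my $newline_under_a = "\n" =~ /\s/a ? 1 : 0;

print "NBSP under /u:    $matches_under_u\n";
print "NBSP under /a:    $matches_under_a\n";
print "newline under /a: $newline_under_a\n";
```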

--
Jim Gibson
 