How to get length of string? length() problems
Simplified a bit, I'm parsing HTML documents to get sentences, e.g.

    my $html = get($URL);
    # remove all HTML TAGs...blah blah blah
    @sentences = split(/\./, $html);

Then I'm trying to determine the number of characters in the sentence. However, although the sentences look fine when I print them, when I use length($sentence[0]) I get values in the hundreds for small sentences. Most documentation I found said "length() returns the number of chars"; however, some said "length() returns the number of bytes". To get the number of chars in this case, can I just divide by 8 or something?

Thanks for your help.

Mitchua
Re: How to get length of string? length() problems
"Mitchua" <mitchua@yahoo.com> wrote in message
news:V5XQa.53675$1aB1.35315@news02.bloor.is.net.cable.rogers.com...
> Simplified a bit, I'm parsing HTML documents to get sentences e.g.
> my $html = get($URL);
> # remove all HTML TAGs...blah blah blah
> @sentences = split(/\./, $html);
> then I'm trying to determine the number of characters in the sentence.
> However, although when I print the sentences they look fine, when I use
> length($sentence[0]) I get values in the hundreds for small sentences. Most
> documentation I found said "length() returns the number of chars" however,
> some said "length() returns the number of bytes". To get the number of
> chars in this case, can I just divide by 8 or something?
>

Would something like sprintf("%20s", $sentence[0]) work to crop the sentence to 20 characters?

--Mitchua
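On the sprintf question, a minimal sketch, assuming the goal is to cut a string down to at most 20 characters. In a format like `%20s` the number is a minimum field width, so short strings are padded and long ones pass through at full length. A precision (`%.20s`) or a plain `substr` does the cropping. The sample sentence is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample sentence (not from the original post).
my $sentence = "The quick brown fox jumps over the lazy dog";

# %20s is a minimum width: a longer string is left at full length.
my $padded  = sprintf("%20s", $sentence);

# %.20s is a precision: at most 20 characters are kept.
my $cropped = sprintf("%.20s", $sentence);

# substr performs the same crop without a format string.
my $cut     = substr($sentence, 0, 20);

print length($padded), "\n";
print length($cropped), "\n";
print "[$cut]\n";
```

The two can be combined (`%-20.20s`) when the result should be exactly 20 characters wide, padded or cropped as needed.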
Re: How to get length of string? length() problems
Mitchua wrote:

> "Mitchua" <mitchua@yahoo.com> wrote in message
> news:V5XQa.53675$1aB1.35315@news02.bloor.is.net.cable.rogers.com...
>> Simplified a bit, I'm parsing HTML documents to get sentences e.g.
>> my $html = get($URL);
>> # remove all HTML TAGs...blah blah blah
>> @sentences = split(/\./, $html);
>> then I'm trying to determine the number of characters in the sentence.
>> However, although when I print the sentences they look fine, when I use
>> length($sentence[0]) I get values in the hundreds for small sentences. Most
>> documentation I found said "length() returns the number of chars" however,
>> some said "length() returns the number of bytes". To get the number of
>> chars in this case, can I just divide by 8 or something?
>>
>
> Would something like sprintf("%20s", $sentence[0]) work to crop the
> sentence to 20 characters?
>
> --Mitchua

perldoc -f length:

"length EXPR
length
Returns the length in characters of the value of EXPR..."

BUT length() returns the length in bytes when the bytes pragma is used, e.g.:

    $x = chr(400);
    print "Length is ", length $x, "\n";     # "Length is 1"
    printf "Contents are %vd\n", $x;         # "Contents are 400"
    {
        use bytes;
        print "Length is ", length $x, "\n"; # "Length is 2"
        printf "Contents are %vd\n", $x;     # "Contents are 198.144"
    }

See perldoc bytes for more info.

Cheers,
--
Rich
scriptyrich@yahoo.co.uk
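A related sketch, assuming the fetched text is still a string of undecoded UTF-8 octets: in that state length() counts octets, and decoding with the core Encode module gives the character count. The sample string is an assumption, not data from the thread. Note that a UTF-8 character takes at most four octets, so this kind of mismatch on its own would not turn a short sentence into a length in the hundreds.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: "cafe" with an e-acute (U+00E9), as a character string.
my $chars  = "caf\x{e9}";

# The same text as raw UTF-8 octets, as it might arrive over the network.
my $octets = encode('UTF-8', $chars);

print length($chars), "\n";    # counts characters: 4
print length($octets), "\n";   # counts octets: 5, the e-acute takes two

# Decoding the octets recovers the character count.
my $decoded = decode('UTF-8', $octets);
print length($decoded), "\n";  # 4 again
```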
Re: How to get length of string? length() problems
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

"Mitchua" <mitchua@yahoo.com> wrote in
news:V5XQa.53675$1aB1.35315@news02.bloor.is.net.cable.rogers.com:

> Simplified a bit, I'm parsing HTML documents to get sentences e.g.
> my $html = get($URL);
> # remove all HTML TAGs...blah blah blah
> @sentences = split(/\./, $html);
> then I'm trying to determine the number of characters in the sentence.
> However, although when I print the sentences they look fine, when I
> use length($sentence[0]) I get values in the hundreds for small
> sentences. Most documentation I found said "length() returns the
> number of chars" however, some said "length() returns the number of
> bytes". To get the number of chars in this case, can I just divide by
> 8 or something?

Only if your characters are 8 bytes wide!

Do you have an example of input data that exhibits this length() discrepancy? Can you include the output of something like:

    print "[[[$string]]] ", length($string), "\n";

- --
Eric
$_ = reverse sort qw p ekca lre Js reh ts p, $/.r, map $_.$", qw e p h tona e; print

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

iQA/AwUBPxTBu2PeouIeTNHoEQIJcgCeNrC1lDNYKBtdGsL5Bw0bxdIM2BMAnRAr
vTZutckih5KT81pj/63k5mDZ
=1LLa
-----END PGP SIGNATURE-----
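The bracketed print is useful because it exposes characters that are invisible in ordinary output. A minimal sketch of one plausible cause, which is an assumption since the thread never shows the actual input: after the tags are stripped, the layout whitespace of the page (newlines, indentation) is still in the string and is counted by length(). The sample string below is invented to imitate that.

```perl
use strict;
use warnings;

# Hypothetical sentence as it might look after stripping tags:
# the visible text is short, but layout whitespace is left behind.
my $string = "\n\n      Hello world" . (" " x 40) . "\n";

# Brackets make leading and trailing whitespace visible.
print "[[[$string]]] ", length($string), "\n";

# Collapse each whitespace run to one space, then trim both ends.
(my $clean = $string) =~ s/\s+/ /g;
$clean =~ s/^ | $//g;

print "[[[$clean]]] ", length($clean), "\n";
```

Here the raw string is 60 characters long while the cleaned one is 11, even though both look like "Hello world" on screen.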
Re: How to get length of string? length() problems
"Eric J. Roode" <REMOVEsdnCAPS@comcast.net> wrote in message
news:Xns93B9EB73EF613sdn.comcast@206.127.4.25...
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> "Mitchua" <mitchua@yahoo.com> wrote in
> news:V5XQa.53675$1aB1.35315@news02.bloor.is.net.cable.rogers.com:
>
> > Simplified a bit, I'm parsing HTML documents to get sentences e.g.
> > my $html = get($URL);
> > # remove all HTML TAGs...blah blah blah
> > @sentences = split(/\./, $html);
> > then I'm trying to determine the number of characters in the sentence.
> > However, although when I print the sentences they look fine, when I
> > use length($sentence[0]) I get values in the hundreds for small
> > sentences. Most documentation I found said "length() returns the
> > number of chars" however, some said "length() returns the number of
> > bytes". To get the number of chars in this case, can I just divide by
> > 8 or something?
>
> Only if your characters are 8 bytes wide!
>
> Do you have an example of input data that exhibits this length()
> discrepancy?

Check out Rich's reply. My problem was that I was using length($sentence) instead of length $sentence. Once I changed that, it was all good. Thanks for the reply.

Mitchua
Re: How to get length of string? length() problems
"Mitchua" <mitchua@yahoo.com> wrote in
news:7XkRa.89580$sI91.77734@news04.bloor.is.net.cable.rogers.com:

> Checkout Rich's reply. My problem was that I was using
> length($sentence) instead of length $sentence. Once I changed that,
> it was all good. Thanks for the reply.

Hmmm. I fail to see how that could possibly make a difference. But hey, whatever works is good.

--
Eric
$_ = reverse sort qw p ekca lre Js reh ts p, $/.r, map $_.$", qw e p h tona e; print
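Eric's doubt is easy to check: with a single scalar argument, the parentheses after length are optional and both spellings are the same call. A quick sketch with a made-up sentence:

```perl
use strict;
use warnings;

# Hypothetical sample sentence.
my $sentence = "A short sentence";

my $with_parens    = length($sentence);
my $without_parens = length $sentence;

# Both forms report the same count.
print "$with_parens $without_parens\n";
```

If the numbers changed between two runs of the real script, something else in the code or the input most likely changed along with the parentheses.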