Best way to search for a string which has N% in a character class?

 
 
Peng Yu
 
      03-02-2012
Hi,

Suppose that I want to search for a substring in which, say, 50% of
the letters are in a letter class, say [A-D]. Note that there is some
ambiguity at the two ends of the substring, but other than that, the
problem is well defined.

It seems that this problem cannot (or cannot easily; please let me
know if there is a way) be formulated as a regex. Since Perl is strong
at string processing, I think there might be a good way to search
for such strings in Perl. Does anybody know a good way to search for
this type of substring?

Regards,
Peng
 
 
 
 
 
J. Gleixner
 
      03-02-2012
On 03/02/12 10:29, Peng Yu wrote:
> Hi,
>
> Suppose that I want to search for a substring which has say 50%
> letters are in a letter class say [A-D]. Note that there is some
> ambiguity at the two ends of the substring. But other than that, this
> problem is well defined.
>
> It seems that this problem can not (or can not easily, please let me
> know if there is a way) be formulated in regex. Since perl is strong
> in processing string, I think that there might be a good way to search
> for such strings in perl. Does anybody have some good way in search
> this type of substring?


What have you tried?????????????????

Using 'tr' and 'length' would probably help you.

From perldoc perlop:

y/SEARCHLIST/REPLACEMENTLIST/cds
[...]Transliterates all occurrences of the characters found in the
search list with the corresponding character in the replacement list.
It returns the number of characters replaced or deleted.

Using that you can get the number of characters in the class.
e.g. $cnt = tr/[A-D]/[A-D]/;

Using 'length' you can find how many characters are in the string.

perldoc -f length

Divide one by the other, multiply by 100 and you have the percent.
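
The counting step described above can be sketched as follows. This is a minimal illustration, not a full answer to the question: the helper name percent_in_class is hypothetical, and the search list is written as A-D without brackets because tr treats brackets as literal characters.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of characters in $str that are A, B, C or D.
# tr/A-D// with an empty replacement list only counts; $str is not modified.
sub percent_in_class {
    my ($str) = @_;
    return 0 if length($str) == 0;      # avoid dividing by zero
    my $cnt = ($str =~ tr/A-D//);       # number of characters in the range
    return 100 * $cnt / length($str);
}

printf "%.1f%%\n", percent_in_class("ABxyCDzz");   # 4 of the 8 characters match
```

This measures one whole string; it does not by itself locate a substring inside a longer one.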
 
 
 
 
 
Tim McDaniel
 
      03-02-2012
In article <4f510c5c$0$75670$(E-Mail Removed)>,
J. Gleixner <(E-Mail Removed)> wrote:
>On 03/02/12 10:29, Peng Yu wrote:
>> Suppose that I want to search for a substring which has say 50%
>> letters are in a letter class say [A-D]. Note that there is some
>> ambiguity at the two ends of the substring. But other than that,
>> this problem is well defined.
>>
>> It seems that this problem can not (or can not easily, please let
>> me know if there is a way) be formulated in regex. Since perl is
>> strong in processing string, I think that there might be a good way
>> to search for such strings in perl. Does anybody have some good way
>> in search this type of substring?

>
>What have you tried?????????????????
>
>Using 'tr' and 'length' would probably help you.
>
> From perldoc perlop:
>
> y/SEARCHLIST/REPLACEMENTLIST/cds
> [...]Transliterates all occurrences of the characters found in the
>search list with the corresponding character in the replacement list.
>It returns the number of characters replaced or deleted.
>
>Using that you can get the number of characters in the class.
>e.g. $cnt = tr/[A-D]/[A-D]/;


"man perlop" continues

Note that "tr" does not do regular expression character classes
such as "\d" or "[:lower:]". The "tr" operator is not equivalent
to the tr(1) utility. If you want to map strings between
lower/upper cases, see "lc" in perlfunc and "uc" in perlfunc, and
in general consider using the "s" operator if you need regular
expressions.

The expression
tr/[A-D]/[A-D]/;
will translate [ to [ and ] to ], so they will be included in the
count. A-D works because that's a special case in tr. Also,

If the "/d" modifier is used, the REPLACEMENTLIST is always
interpreted exactly as specified. Otherwise, if the
REPLACEMENTLIST is shorter than the SEARCHLIST, the final
character is replicated till it is long enough. If the
REPLACEMENTLIST is empty, the SEARCHLIST is replicated. This
latter is useful for counting characters in a class or for
squashing character sequences in a class.

So if you really want a range of characters like A thru D,
tr/A-D//
works. If you want all digits, or all alphabetics, or some other
character class, you need to use s/// instead.

--
Tim McDaniel, http://www.velocityreviews.com/forums/(E-Mail Removed)
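
A small sketch of the difference described above, using a made-up sample string. The last line shows one possible way to count a regex character class such as \d by counting matches in list context; that idiom is an addition for illustration, not something proposed in the thread.

```perl
use strict;
use warnings;

my $str = "[AB]xyz12";   # sample input, chosen only for illustration

# Brackets are literal in tr, so this counts '[', ']', 'A' and 'B'.
my $with_brackets = ($str =~ tr/[A-D]//);

# Only characters in the range A-D are counted here.
my $range_only = ($str =~ tr/A-D//);

# For a regex class such as \d, count the matches in list context.
my $digits = () = $str =~ /\d/g;

print "$with_brackets $range_only $digits\n";
```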
 
 
J. Gleixner
 
      03-02-2012
On 03/02/12 13:06, Tim McDaniel wrote:
> In article<4f510c5c$0$75670$(E-Mail Removed)>,
> J. Gleixner<(E-Mail Removed)> wrote:
>> On 03/02/12 10:29, Peng Yu wrote:
>>> Suppose that I want to search for a substring which has say 50%
>>> letters are in a letter class say [A-D]. Note that there is some
>>> ambiguity at the two ends of the substring. But other than that,
>>> this problem is well defined.
>>>
>>> It seems that this problem can not (or can not easily, please let
>>> me know if there is a way) be formulated in regex. Since perl is
>>> strong in processing string, I think that there might be a good way
>>> to search for such strings in perl. Does anybody have some good way
>>> in search this type of substring?

>>
>> What have you tried?????????????????
>>
>> Using 'tr' and 'length' would probably help you.

[...]
> So if you really want a range of characters like A thru D,
> tr/A-D//
> works. If you want all digits, or all alphabetics, or some other
> character class, you need to use s/// instead.
>


Thanks for the correction.
 
 
Peng Yu
 
      03-02-2012
On Mar 2, 12:07 pm, "J. Gleixner" <(E-Mail Removed)>
wrote:
> On 03/02/12 10:29, Peng Yu wrote:
>
> > Hi,

>
> > Suppose that I want to search for a substring which has say 50%
> > letters are in a letter class say [A-D]. Note that there is some
> > ambiguity at the two ends of the substring. But other than that, this
> > problem is well defined.

>
> > It seems that this problem can not (or can not easily, please let me
> > know if there is a way) be formulated in regex. Since perl is strong
> > in processing string, I think that there might be a good way to search
> > for such strings in perl. Does anybody have some good way in search
> > this type of substring?

>
> What have you tried?????????????????
>
> Using 'tr' and 'length' would probably help you.
>
> From perldoc perlop:
>
>   y/SEARCHLIST/REPLACEMENTLIST/cds
>   [...]Transliterates all occurrences of the characters found in the
> search list with the corresponding character in the replacement list.
> It returns the number of characters replaced or deleted.
>
> Using that you can get the number of characters in the class.
> e.g. $cnt = tr/[A-D]/[A-D]/;
>
> Using 'length' you can find how many characters are in the string.
>
> perldoc -f length
>
> Divide one by the other, multiply by 100 and you have the percent.


I don't think that you understand my question.

Suppose that I have a string $str which is the concatenation of $str1,
$str2 and $str3, where both $str1 and $str3 have less than 50% of
[A-D] and $str2 has more than 50% of [A-D].

I need to discover from $str where $str2 starts and ends. I don't
see how tr and length alone can address this question.
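
One way to make the question concrete is sketched below, under a stated assumption: "$str2" is taken to be the longest substring in which strictly more than half of the characters are in [A-D]. That definition is an interpretation, not something the post specifies, and the helper name longest_majority_span is hypothetical. Each character scores +1 if it is in the class and -1 otherwise, so a substring qualifies exactly when its score sum is positive.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (start, length) of the longest substring in
# which strictly more than half of the characters are in A-D, or an empty
# list if no such substring exists. Brute force over prefix sums, O(n^2).
sub longest_majority_span {
    my ($str) = @_;
    my @prefix = (0);
    # $prefix[$i] = (count in A-D) - (count of others) over the first $i chars
    for my $ch (split //, $str) {
        push @prefix, $prefix[-1] + ($ch =~ /[A-D]/ ? 1 : -1);
    }
    my $n = length($str);
    my ($best_start, $best_len) = (0, 0);
    for my $i (0 .. $n - 1) {
        # Try end positions from the far right; only spans longer than the
        # current best are worth checking.
        for (my $j = $n; $j - $i > $best_len; $j--) {
            if ($prefix[$j] - $prefix[$i] > 0) {
                ($best_start, $best_len) = ($i, $j - $i);
                last;
            }
        }
    }
    return $best_len ? ($best_start, $best_len) : ();
}

my $sample = "xxxxABxCDAyyyy";
my ($start, $len) = longest_majority_span($sample);
print substr($sample, $start, $len), "\n" if defined $start;
```

Under this definition the reported span tends to absorb some low-density characters at its ends, which is the boundary ambiguity mentioned in the original question.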
 
 
sln@netherlands.com
 
      03-03-2012
On Fri, 2 Mar 2012 12:53:18 -0800 (PST), Peng Yu <(E-Mail Removed)> wrote:

>On Mar 2, 12:07 pm, "J. Gleixner" <(E-Mail Removed)>
>wrote:
>> On 03/02/12 10:29, Peng Yu wrote:

[snip]
>> Using 'tr' and 'length' would probably help you.
>>

[snip]
>>
>> Divide one by the other, multiply by 100 and you have the percent.

>
>I don't think that you understand my question.
>
>Suppose that I have a string $str which the concatenation of $str1,
>$str2 and $str3, where both $str1 and $str3 have less than 50% of [A-
>D] and $str2 have more than 50% of [A-D].
>
>I need to discovered from $str where $str2 starts and ends. I don't
>see how tr and length alone can address this question.


50% of what? Without boundary conditions, the type of regex solution
you're thinking of is impossible.

The way you state your problem, [A-D] characters can occur anywhere,
either in runs or scattered between [^A-D] characters.

The only things you state as known are the total length of the
random-length strings after concatenation and, before concatenation,
whether each one is over or under 50%.

You can slide a regex frame over the final string, but there is not enough
information about boundary conditions to get real information.
There are simply more unknowns than equations.

For instance,
- if the length of each substring were the same, it could be
solved, but that approach would not need a regex.
- if the [A-D] characters were adjacent, the start/end still could not be
determined. A run that is more than 50% [A-D] only tells you that the
match lies inside the substring that needs to be found; it gives no
begin/end information about it.

I think it was a nice try though, futile, but nice.

-sln
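
The equal-length case mentioned above can be sketched without a regex, as follows. The assumptions are that the piece length is known in advance and that pieces start at offset 0; the helper name dense_pieces and the sample string are hypothetical.

```perl
use strict;
use warnings;

# Assumption: $str consists of consecutive pieces of known, equal length
# (the last piece may be shorter). Returns the 0-based indexes of the pieces
# in which more than half of the characters are in A-D.
sub dense_pieces {
    my ($str, $piece_len) = @_;
    die "piece length must be positive\n" if $piece_len < 1;
    my @dense;
    for (my $pos = 0; $pos < length($str); $pos += $piece_len) {
        my $piece = substr($str, $pos, $piece_len);
        my $cnt   = ($piece =~ tr/A-D//);          # count only, no change
        push @dense, $pos / $piece_len if 2 * $cnt > length($piece);
    }
    return @dense;
}

print join(",", dense_pieces("xxAxABCDyyyB", 4)), "\n";
```

With pieces of unknown length, this approach does not apply, which is the point made in the post.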



 
 
 
 