complex regex

 
 
carlbernardi@gmail.com
Guest
Posts: n/a
 
      10-10-2007
Hi,

I am new to the java.util.regex package, which I am using to find every
occurrence of the javascript tag in an HTML file and delete it. I tried
the following code to find tags such as the ones below, but instead it
matches everything from the first occurrence of "<" to the last
occurrence of ">", which is not what I am looking for.

<script>
<script src="script.js">
</script>

String mat = "<html><script><p><font></script>";
String pat = "<*[\\x00-\\x7f]*jscript*[a-z0-9]*>";
Pattern pattern = Pattern.compile(pat);
Matcher matcher = pattern.matcher(mat);
while (matcher.find()) {
    System.out.println("Match: " + matcher.group()
            + " Start:" + matcher.start() + " End:" + matcher.end());
}

output:
Match: <html><script><p><font><script> Start:0 End:39

I would be looking for an output of:
Match: <script> Start:6 End:18
Match: <script> Start:27 End:39

Appreciate any input,

Carl
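
A note on why the whole string matched: `[\\x00-\\x7f]*` accepts any ASCII character, including ">", and `*` is greedy, so the match runs on to the last ">" it can reach. Below is a minimal sketch of one way to match only the script tags, assuming they look like the three examples in the post. The class name `ScriptTagFinder`, the method `findScriptTags`, and the pattern itself are illustrative and not from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptTagFinder {
    // Assumed pattern: "<", an optional "/", the word "script",
    // then any run of characters other than ">", then ">".
    // [^>]* cannot cross a ">", so each match stays inside one tag.
    private static final Pattern SCRIPT_TAG =
            Pattern.compile("</?script\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns one "tag Start:n End:m" entry per script tag found.
    static List<String> findScriptTags(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = SCRIPT_TAG.matcher(html);
        while (m.find()) {
            found.add(m.group() + " Start:" + m.start() + " End:" + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        String mat = "<html><script><p><font></script>";
        for (String s : findScriptTags(mat)) {
            System.out.println("Match: " + s);
        }
        // Prints:
        // Match: <script> Start:6 End:14
        // Match: </script> Start:23 End:32
    }
}
```

The `\\b` word boundary keeps the pattern from matching a tag such as `<scripts>`, while still allowing attributes as in `<script src="script.js">`.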

 
 
carlbernardi@gmail.com
Guest
Posts: n/a
 
      10-10-2007
Funny, I think I found my answer. This way seemed to do the trick. Is
it possible to do the same thing with just Matcher.replaceAll()?


String mat = "(<html><script><p><font><script>";
String pat = "<[^>]*>";  // one tag: "<", anything except ">", then ">"
StringBuffer sb = new StringBuffer(mat);   // untouched copy of the input
StringBuffer sb2 = new StringBuffer(mat);  // copy the script tags are deleted from
Pattern pattern = Pattern.compile(pat);
Matcher matcher = pattern.matcher(mat);
int start, end = 0;
int newStart = 0;  // number of characters deleted from sb2 so far
while (matcher.find()) {
    start = matcher.start();
    end = matcher.end();
    System.out.println("old string --- " + sb.substring(start, end));
    if (sb.substring(start, end).indexOf("script") > -1) {
        // shift the offsets left by the amount already deleted
        System.out.println("new string --- "
                + sb2.delete(start - newStart, end - newStart).toString());
        newStart = sb.length() - sb2.length();
    }
    System.out.println(start + " " + end + " " + newStart);
}
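
On the replaceAll() question: a minimal sketch of how the same deletion could be done in one call, assuming the goal is to drop only the tags themselves and keep everything else. The class name `ScriptTagStripper`, the method `stripScriptTags`, and the tag-specific pattern are illustrative and not from the original post.

```java
import java.util.regex.Pattern;

class ScriptTagStripper {
    // Assumed pattern: an opening or closing tag whose name is "script",
    // with optional attributes, matched case-insensitively.
    private static final Pattern SCRIPT_TAG =
            Pattern.compile("</?script\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Replaces every script tag with the empty string.
    static String stripScriptTags(String html) {
        return SCRIPT_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String mat = "(<html><script><p><font><script>";
        System.out.println(stripScriptTags(mat)); // prints "(<html><p><font>"
    }
}
```

Matcher.replaceAll() handles the offset bookkeeping internally, so the `newStart` counter is not needed. One difference from the loop above: the loop deletes any tag that contains the text "script" anywhere (for example in an href attribute), whereas this pattern only removes tags that are named script. Like the loop, it leaves any text between an opening and closing script tag in place.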





 