
parse HTML

 
 
VitaminB
 
      04-25-2006
Hello,

I want to parse an HTML document to get the URLs of all the frames in a
frameset. I get a "NullPointerException" in the System.out.println...

Thanks a lot for your help.

Regards,
Marcus


##################
Java Code:
##################

URL urlobj = new URL(str);

HttpURLConnection uc = null;
uc = (HttpURLConnection)urlobj.openConnection();
uc.setUseCaches(false);
DataInputStream is = new DataInputStream(uc.getInputStream());

HTMLEditorKit hKit = new HTMLEditorKit();
HTMLDocument hDoc = new HTMLDocument();
hKit.read(is, hDoc, 0);
HTMLDocument.Iterator it = hDoc.getIterator(HTML.Tag.FRAME);

AttributeSet attSet = it.getAttributes();
String s = (String)attSet.getAttribute(HTML.Attribute.SRC);
System.out.println(s);





##################
Example HTML page:
##################

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN">
<html>
<head>

<script language="JavaScript" type="text/javascript">
<!--
self._domino_name = "_Main";
// -->
</script>
</head>

<frameset cols="45%,55%">

<frame
src="/Test/HET/PerformanceTestDB.nsf/ContentDeliveryMeasurement?OpenForm">


<frameset rows="1*,1*">

<frame src="/Test/HET/PerformanceTestDB.nsf/DocsInserted?OpenView">

<frame name="docPreviewFrame"
src="/Test/HET/PerformanceTestDB.nsf/select?OpenForm">
</frameset>
</frameset>
</html>
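
For collecting every frame URL rather than just one, the following is a minimal sketch, assuming the page really does contain frame tags and no charset directive. The class name FrameSources and the SAMPLE markup (a shortened version of the page above) are made up for illustration. As far as I can tell, the iterator returned by getIterator is already positioned on the first match, so the loop tests isValid() before each read and calls next() only afterwards. It also guards against getAttributes() returning null, which can happen when no frame was found.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class; not part of the original post.
class FrameSources {

    // Shortened stand-in for the example page posted above.
    static final String SAMPLE =
        "<html><head></head>"
        + "<frameset cols=\"45%,55%\">"
        + "<frame src=\"/Test/HET/PerformanceTestDB.nsf/ContentDeliveryMeasurement?OpenForm\">"
        + "<frameset rows=\"1*,1*\">"
        + "<frame src=\"/Test/HET/PerformanceTestDB.nsf/DocsInserted?OpenView\">"
        + "<frame name=\"docPreviewFrame\" src=\"/Test/HET/PerformanceTestDB.nsf/select?OpenForm\">"
        + "</frameset></frameset></html>";

    // Parses the markup and returns the SRC value of every FRAME element found.
    static List<String> frameSources(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = new HTMLDocument();
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("could not parse HTML", e);
        }

        List<String> sources = new ArrayList<>();
        HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.FRAME);
        // isValid() is false as soon as there is no (further) FRAME element.
        while (it != null && it.isValid()) {
            AttributeSet attrs = it.getAttributes();
            Object src = (attrs == null) ? null : attrs.getAttribute(HTML.Attribute.SRC);
            if (src != null) {
                sources.add(src.toString());
            }
            it.next();
        }
        return sources;
    }

    public static void main(String[] args) {
        for (String src : frameSources(SAMPLE)) {
            System.out.println(src);
        }
    }
}
```

The SRC values come back exactly as written in the page, so relative paths like the ones above would still need to be resolved against the page URL, for example with new URL(baseUrl, src).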

 
 
 
 
 
Amfur Kilnem
 
      04-25-2006

"VitaminB" wrote:
> AttributeSet attSet = it.getAttributes();
> String s = (String)attSet.getAttribute(HTML.Attribute.SRC);
> System.out.println(s);


attSet.getAttribute must've returned null.



 
 
 
 
 
VitaminB
 
      04-25-2006
Why?

 
 
Oliver Wong
 
      04-25-2006
"VitaminB" wrote:
>
> I want to parse an HTML document to get the URLs of all the frames in a
> frameset. I get a "NullPointerException" in the System.out.println...

[...]
>
> ##################
> Java Code:
> ##################
>
> URL urlobj = new URL(str);
>
> HttpURLConnection uc = null;
> uc = (HttpURLConnection)urlobj.openConnection();
> uc.setUseCaches(false);
> DataInputStream is = new DataInputStream(uc.getInputStream());
>
> HTMLEditorKit hKit = new HTMLEditorKit();
> HTMLDocument hDoc = new HTMLDocument();
> hKit.read(is, hDoc, 0);
> HTMLDocument.Iterator it = hDoc.getIterator(HTML.Tag.FRAME);
>
> AttributeSet attSet = it.getAttributes();
> String s = (String)attSet.getAttribute(HTML.Attribute.SRC);
> System.out.println(s);


I don't see how you could have gotten an NPE from the
System.out.println statement. Are you sure you didn't get it from the line
above, or possibly somewhere else? See the section titled "If you get an
error message, repeat it exactly." at
http://riters.com/JINX/index.cgi/Sug...n_20Newsgroups

- Oliver
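
To illustrate the point about where the NPE can come from, here is a small sketch. The class and method names are invented for the demonstration, and an in-memory PrintStream stands in for System.out so the output can be inspected. It shows that printing a null String just writes the text "null", while calling getAttribute on a null AttributeSet reference is what throws.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import javax.swing.text.AttributeSet;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.html.HTML;

// Hypothetical demo class; not part of the original thread.
class NullSrcDemo {

    // Looks up SRC and prints it to an in-memory stream, returning the printed text.
    static String printSrc(AttributeSet attSet) {
        // Throws NullPointerException on this line if attSet itself is null.
        String s = (String) attSet.getAttribute(HTML.Attribute.SRC);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buf, true);
        out.println(s); // a null String is written as the four characters "null"
        return buf.toString().trim();
    }

    // Reports whether printSrc fails with a NullPointerException for this input.
    static boolean throwsNpe(AttributeSet attSet) {
        try {
            printSrc(attSet);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // An attribute set without SRC: getAttribute returns null, println copes with it.
        System.out.println(printSrc(new SimpleAttributeSet()));
        // A null attribute set: the failure happens before println is ever reached.
        System.out.println(throwsNpe(null));
    }
}
```

So if the exception really is an NPE in this fragment, the more likely suspect is a null attSet (or a null iterator), which would point to the parser not having found any frame in the page it was given.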

 
 
VitaminB
 
      04-25-2006
OK, I have now reworked my code and get another exception, but again I
don't know why.

Here is the failure:
javax.swing.text.ChangedCharSetException
    at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(DocumentParser.java:19
    at javax.swing.text.html.parser.Parser.startTag(Parser.java:401)
    at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1875)
    at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1910)
    at javax.swing.text.html.parser.Parser.parse(Parser.java:2076)
    at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:135)
    at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:107)
    at javax.swing.text.html.HTMLEditorKit.read(HTMLEditorKit.java:262)
    at javax.swing.text.DefaultEditorKit.read(DefaultEditorKit.java:163)
    at Stress.urlRequest(Stress.java:76)
    at Stress.run(Stress.java:40)



Here is the code:

public long[] urlRequest(String str) {
    Cal starttime = new Cal();
    long[] read = new long[2];
    try {
        int c = 0;
        byte[] rc = new byte[1024];
        URL urlobj = new URL(str);

        HTTPRequest request = new HTTPRequest(str, user, pass);
        DataInputStream is = new DataInputStream(request.get());

        HTMLEditorKit hKit = new HTMLEditorKit();
        HTMLDocument hDoc = new HTMLDocument();
        hKit.read(is, hDoc, 0);

        HTMLDocument.Iterator it = hDoc.getIterator(HTML.Tag.FRAME);
        it.next();
        AttributeSet attSet = it.getAttributes();
        String s = (String) attSet.getAttribute(HTML.Attribute.SRC);
        System.out.println(s);

        //System.out.println(attSet.getAttributeCount());

        while ((c = is.read(rc)) != -1) {
            read[0] = read[0] + c;
        }
        Cal endtime = new Cal();
        read[1] = endtime.getTimeInMillis() - starttime.getTimeInMillis();
        return read;
    }
    catch (Exception e) {
        e.printStackTrace();
    }
    return read;
}

}
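
The trace ends in DocumentParser.handleEmptyTag, which is where Swing's parser reacts to a META tag that declares a content type with a charset. Assuming the page being fetched carries such a tag (the example page posted earlier does not, so this is a guess), a commonly used workaround is to set the "IgnoreCharsetDirective" property on the document before reading. The sketch below uses a made-up class name and a made-up one-paragraph page to show the difference; it is an illustration under those assumptions, not a drop-in fix for the Stress class.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.ChangedCharSetException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical demo class; not part of the original post.
class CharsetDirectiveSketch {

    // Invented page with a charset declaration in a META tag.
    static final String PAGE =
        "<html><head>"
        + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=ISO-8859-1\">"
        + "</head><body><p>hello</p></body></html>";

    // Returns true if the markup can be read, false if the parser aborts
    // with a ChangedCharSetException.
    static boolean parses(String html, boolean ignoreCharsetDirective) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = new HTMLDocument();
        if (ignoreCharsetDirective) {
            // Tells HTMLEditorKit.read to keep the reader's encoding and carry on.
            doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        }
        try {
            kit.read(new StringReader(html), doc, 0);
            return true;
        } catch (ChangedCharSetException e) {
            return false;
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("unexpected parse failure", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without property: " + parses(PAGE, false));
        System.out.println("with property:    " + parses(PAGE, true));
    }
}
```

With the property set, the text is decoded with whatever encoding the stream or reader was opened with, so it is still worth opening the connection with the right charset. Two smaller points about the method above: calling it.next() before the first getAttributes() appears to skip the first frame, and hKit.read has normally consumed the stream by the time the byte-counting loop runs, so read[0] will most likely stay at 0.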

 