Parsing text file with ASP

 
 
SROSeaner
Guest
Posts: n/a
 
      09-26-2004
I have a text file that is the result of using the XMLHTTP object to pull
back a page of search results from a search engine.

So I have the entire results page in HTML, and I want to break out each hit
result from the text file as a separate item and do what I want with each
one.

Are there any suggested algorithms or other techniques I could be directed
to?
 
 
 
 
 
Ray Costanzo [MVP]
Guest
Posts: n/a
 
      09-26-2004
What exactly is a "hit result"? As for what you want to do, it all depends
on what the HTML looks like and how consistent it remains. Do you have
control over this remote source, or is it some other site that can change
on any given day without warning?

Ray at home



 
 
 
 
 
SROSeaner
Guest
Posts: n/a
 
      09-28-2004
Actually, all I really need to do is pull out any text in the HTML that is a
web site address, that is, anything of the form http://www._____.__ or
starting with www.

I think I know how to find that by using InStr and passing it http: (for
example) as the text to look for, but that will only give me the starting
point of the address, correct?

 
 
Ray Costanzo [MVP]
Guest
Posts: n/a
 
      09-28-2004
Yes, that'd give you the starting point. The best you can do is have your
code make an educated guess about things when you have no idea what kind of
data will be thrown at it.
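
To illustrate the point about starting positions, here is a minimal sketch in Python rather than the VBScript discussed in this thread; `str.find` plays the role of `InStr`. The function name `find_starts` and the sample string are made up for illustration. It only reports where each `http:` begins; deciding where each address ends is a separate problem.

```python
def find_starts(text, marker="http:"):
    """Return the 0-based offset of every occurrence of marker in text."""
    starts = []
    pos = text.find(marker)          # -1 when the marker is absent
    while pos != -1:
        starts.append(pos)
        # Resume just past this hit, much like passing a start position
        # to InStr (note that InStr is 1-based, Python offsets are 0-based).
        pos = text.find(marker, pos + 1)
    return starts


# Hypothetical input, not taken from any real results page.
sample = 'See <a href="http://example.com">one</a> and http://example.org too.'
print(find_starts(sample))
```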

If the string contains:

<a href="http://something.com">click me</a>

should it be ignored because there's no WWW? Should your code assume that
as soon as it finds a ", that is the end of the domain? What about a
carriage return? What about a < character? What about when it's in a
sentence in the document, e.g.:

Most Web site addresses start with http://www.

Should that be found?

There are lots of variables to deal with, and all you can really do is hope
for accuracy.
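
One possible set of answers to those questions can be written as a regular expression. The sketch below is in Python, not VBScript, and every rule in it is an assumption chosen for illustration: an address starts with http://, https:// or www., ends at whitespace, a quote, or an angle bracket, loses trailing sentence punctuation, and is dropped if no dot remains in the host part. The names `URL_RE` and `extract_addresses` are hypothetical.

```python
import re

# Assumed rule: start at http://, https:// or www. and run until
# whitespace, a quote character, or an angle bracket.
URL_RE = re.compile(r"""(?:https?://|www\.)[^\s"'<>]+""", re.IGNORECASE)


def extract_addresses(html):
    """Return candidate addresses in order of first appearance, no repeats."""
    found = []
    for match in URL_RE.finditer(html):
        # Drop punctuation that more likely belongs to the sentence.
        url = match.group(0).rstrip(".,;:!?)")
        # Skip bare prefixes such as "http://www" that have no dot left.
        host = url.split("//", 1)[-1]
        if "." not in host:
            continue
        if url not in found:
            found.append(url)
    return found


# Hypothetical input built from the examples in this post.
page = (
    '<a href="http://something.com">click me</a>\n'
    "Most Web site addresses start with http://www.\n"
    "Visit www.example.org, or http://something.com again."
)
print(extract_addresses(page))
```

With these rules the link without WWW is kept and the bare "http://www." in the sentence is skipped. Different choices are just as defensible, so this remains an educated guess.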

Ray at work



 
 
Patrice
Guest
Posts: n/a
 
      09-28-2004
You have DOM parsers available, but your code will break if the structure
of the page changes. I would rather use an API or a "service" if one is
available.
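
As a rough illustration of the parser route, here is a sketch using Python's standard `html.parser` module. It is an event-based parser rather than a full DOM, and it is not the VBScript discussed in this thread. It assumes the addresses of interest sit in the href attribute of <a> tags, which may not hold for a given results page; `LinkCollector` and `collect_links` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lowercases
        # tag and attribute names before calling this method.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def collect_links(html):
    """Return all <a href> values in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical input; the plain-text address is deliberately not collected.
snippet = (
    '<p><a href="http://something.com">click me</a> '
    '<A HREF="http://example.org/x">y</A> http://plain.text</p>'
)
print(collect_links(snippet))
```

Addresses that appear only as plain text are missed by this approach, so it complements text scanning rather than replacing it.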

Patrice



 
 
SROSeaner
Guest
Posts: n/a
 
      09-28-2004
Thanks for your help, guys. I figure I will just have to code it in a way
that takes care of all the variables in such a situation.

 
 
 
 