Velocity Reviews


boris 09-06-2011 10:03 PM

wikipedia parser...
 
hi all,
can anyone recommend a working Wikipedia parser (to text or HTML)? I've
tried to get wikitext working, but it looks like it has some problems.
thanks.

Travers Naran 09-07-2011 04:11 AM

Re: wikipedia parser...
 
On 06/09/2011 3:03 PM, boris wrote:
> hi all,
> can anyone recommend working! wikipedia parser (to text or html). I've
> tried to get wikitext working, but it looks like it has some problems..
> thanks.


So when you tried "java wikitext parser" in Google, what results did you
get and which parsers did you try?


Roedy Green 09-08-2011 04:29 AM

Re: wikipedia parser...
 
On Tue, 06 Sep 2011 18:03:29 -0400, boris
<boris@localhost.localdomain> wrote, quoted or indirectly quoted
someone who said :

>hi all,
>can anyone recommend working! wikipedia parser (to text or html). I've
>tried to get wikitext working, but it looks like it has some problems..
>thanks.


If there is only a set of things you are trying to extract, just
download the page with http://mindprod.com/products1.html#HTTP

Then pick out what you want with regexes and indexOf.

See http://mindprod.com/jgloss/regex.html
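
A minimal sketch of the "regexes and indexOf" approach described above, assuming the page has already been downloaded into a String. The class name PageScrape, the helpers between() and links(), and the sample markup are all hypothetical and chosen for illustration only; this is not code from mindprod.com. Real Wikipedia HTML is far messier, so treat it as a starting point rather than a parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageScrape {
    // Matches the href value of anchors that use double-quoted attributes.
    // Assumption: single-quoted or unquoted hrefs are out of scope here.
    private static final Pattern HREF =
        Pattern.compile("<a\\s+[^>]*href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the text between the first occurrence of
    // 'start' and the next occurrence of 'end', or null if either is missing.
    public static String between(String page, String start, String end) {
        int from = page.indexOf(start);
        if (from < 0) {
            return null;
        }
        from += start.length();
        int to = page.indexOf(end, from);
        if (to < 0) {
            return null;
        }
        return page.substring(from, to);
    }

    // Hypothetical helper: collects every href found by the regex, in page order.
    public static List<String> links(String page) {
        List<String> out = new ArrayList<>();
        Matcher m = HREF.matcher(page);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for a downloaded page; the download step is omitted.
        String page = "<html><head><title>Java - Example</title></head>"
            + "<body><a href=\"/wiki/Compiler\">Compiler</a> and "
            + "<a class=\"x\" href=\"/wiki/Bytecode\">Bytecode</a></body></html>";
        System.out.println(between(page, "<title>", "</title>"));
        System.out.println(links(page));
    }
}
```

indexOf is enough when the markers around the target are fixed strings; the regex only earns its keep when the surrounding attributes vary, as with the second link in the sample.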
--
Roedy Green Canadian Mind Products
http://mindprod.com
The modern conservative is engaged in one of man's oldest exercises in moral philosophy; that is,
the search for a superior moral justification for selfishness.
~ John Kenneth Galbraith (born: 1908-10-15 died: 2006-04-29 at age: 97)

Roedy Green 09-08-2011 04:31 AM

Re: wikipedia parser...
 
On Tue, 06 Sep 2011 18:03:29 -0400, boris
<boris@localhost.localdomain> wrote, quoted or indirectly quoted
someone who said :

>hi all,
>can anyone recommend working! wikipedia parser (to text or html). I've
>tried to get wikitext working, but it looks like it has some problems..
>thanks.


have a look at http://mindprod.com/applet/americantax.html
There are screenscrapers for each state to extract sales tax
information. You can use one as a starting point for what you need.

Arne Vajhøj 09-08-2011 11:57 PM

Re: wikipedia parser...
 
On 9/8/2011 12:29 AM, Roedy Green wrote:
> On Tue, 06 Sep 2011 18:03:29 -0400, boris
> <boris@localhost.localdomain> wrote, quoted or indirectly quoted
> someone who said :
>> can anyone recommend working! wikipedia parser (to text or html). I've
>> tried to get wikitext working, but it looks like it has some problems..
>> thanks.

>
> If there is only a set of things you are trying to extract, just
> download the page with http://mindprod.com/products1.html#HTTP


He is not saying anything about needing help with HTTP
requests.

> Then pick out what you want with regexes and indexOf.
>
> See http://mindprod.com/jgloss/regex.html


It will obviously work, but it is a DIY way.

Arne


Arne Vajhøj 09-08-2011 11:58 PM

Re: wikipedia parser...
 
On 9/8/2011 12:31 AM, Roedy Green wrote:
> On Tue, 06 Sep 2011 18:03:29 -0400, boris
> <boris@localhost.localdomain> wrote, quoted or indirectly quoted
> someone who said :
>> can anyone recommend working! wikipedia parser (to text or html). I've
>> tried to get wikitext working, but it looks like it has some problems..
>> thanks.

>
> have a look at http://mindprod.com/applet/americantax.html
> There are screenscrapers for each state to extract sales tax
> information. You can use one as a starting point for what you need.


Do any of them use wiki markup?

Arne


All times are GMT.