Re: Open source web crawler with mysql integration

 
 
Philip Semanchuk
      04-10-2009

On Apr 10, 2009, at 12:33 PM, bruce wrote:

> phillip...
>
> lots of code is opened source "as is"!!!
>
> when you get right down to it, a good deal of "open source" code from
> sourceforge/hotscripts/freshmeat/etc.. is pretty poor, but it is open
> sourced.
>
> you could simply toss your code out into the open source pool, and not be
> worried about supporting it, or even touching it again...


You're right, and I like your enthusiasm. But I don't want to invite
people to use my code if it's just going to be frustrating to 90% of
them. It's bad for my reputation. And for the reputation of open
source in general, but I'm more concerned about me.




> -----Original Message-----
> From: python-list-bounces+bedouglas=
> [mailto:python-list-bounces+bedouglas=] On Behalf Of Philip Semanchuk
> Sent: Friday, April 10, 2009 8:10 AM
> To: Python (General)
> Subject: Re: Open source web crawler with mysql integration
>
>
>
> On Apr 10, 2009, at 10:28 AM, Support Desk wrote:
>
>> Sounds interesting. When it's done would you care to share it?

>
> Hi Michael,
> The coding has been done (as much as software is ever "done") for a
> couple of years now. It's mothballed now, sitting on my hard drive.
> The problem with open sourcing it isn't that the code is incomplete,
> the problem is that it's insufficiently documented, features a
> byzantine install procedure and contains a lot of code & assumptions
> that were relevant to my business but would not be of interest to most
> people looking to download a general-purpose spider. I'd love to open
> source it and if someone wants to pay me to make it open source-able,
> let's talk! But if I have to do it on my own time for free it will be
> a while (maybe never, although I hope not) before I can make the time.
>
> Regards
> Philip
>
>
>
>
>> -----Original Message-----
>> From: Philip Semanchuk [mailto]
>> Sent: Thursday, April 09, 2009 9:46 PM
>> To: Python
>> Subject: Re: Open source web crawler with mysql integration
>>
>>
>> On Apr 9, 2009, at 7:37 PM, Daniel Fetchinson wrote:
>>
>>>> I'm looking for a crawler that can spider my site and toss the results
>>>> into mysql so, in turn, that database can be indexed by Sphinx Search.
>>>>
>>>> Since I don't want to reinvent the wheel, is anyone aware of any open
>>>> source projects or code snippets that can already handle this?
>>>
>>> Have a look at http://nikitathespider.com/python/

>>
>>
>> As the author of Nikita, I can say that (a) she used Postgres and (b)
>> the code wasn't open sourced except for a couple of small parts. The
>> service is now defunct. It wasn't making money. Ideally I'd like to
>> open source the code one day, but it would take a lot of documentation
>> work to make it installable by others, and I won't have the time to do
>> that for the foreseeable future.
>>
>> At the URL provided there's a nice module for parsing robots.txt files
>> (better than the one in the standard library IMHO) but that's about it.
>>
>> FYI, I wrote my spider in Python because I couldn't find a decent one
>> written in Python. There's Nutch, but that's not Python (Java I think).
>>
>> Good luck
>> Philip
>>
>>
>>

>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
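
On the robots.txt point in the quoted message: for anyone who just needs a basic check before fetching a page, here is a minimal sketch using the standard library parser (urllib.robotparser in Python 3). This is not the module from the Nikita site, whose API isn't shown in this thread. The robots.txt content, the URLs, and the "example-spider" user agent are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would download this
# from the target site instead of hard-coding it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_text):
    """Return a RobotFileParser loaded from robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed(parser, url, user_agent="example-spider"):
    """Return True if the rules let user_agent fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
print(allowed(parser, "http://example.com/index.html"))
print(allowed(parser, "http://example.com/private/page.html"))
```

A spider would call something like allowed() on each URL before requesting it, and skip the ones that come back False.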


 