Velocity Reviews - Computer Hardware Reviews



web crawler in python or C?

 
 
abhinav
      02-16-2006
Hi guys. I have to implement a topical crawler as part of my
project. Which language should I implement it in,
C or Python? Python has a fast development cycle, but I am also concerned
about speed. I want to strike a balance between development speed and
crawler speed. Since Python is an interpreted language, it is rather
slow. The crawler will be working on a huge set of pages, so it should be
as fast as possible. One possible implementation would be to write it
partly in C and partly in Python so that I get the best of both
worlds, but I don't know how to approach that. Can anyone guide me on
which parts I should implement in C and which in Python?

 
 
 
 
 
websnarf@gmail.com
      02-16-2006
abhinav wrote:
> Hi guys. I have to implement a topical crawler as part of my
> project. Which language should I implement it in,
> C or Python? Python has a fast development cycle, but I am also concerned
> about speed. I want to strike a balance between development speed and
> crawler speed.


Web crawling is an inherently network-limited activity. The way to
speed up crawling is through parallel downloading. The language
performance is not going to have a relevant effect. Python does
support threads (the threading module); its global interpreter lock
keeps them from running Python code in parallel, but that hardly
matters while they are waiting on the network. It also supports weak
coroutines. (Of course, C does not support any kind of multithreading,
except by platform-specific extensions -- but these extensions are
widespread.)
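
To make the parallel-downloading point concrete, here is a minimal sketch in present-day Python (it uses concurrent.futures, which did not exist when this was posted). The names fetch and fetch_all, the timeout, and the worker count are assumptions for illustration, not anything from the original post. The threads spend nearly all their time blocked on the network, so the interpreter's speed barely matters.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, max_workers=16):
    """Fetch many URLs concurrently.

    Returns a dict mapping each URL to its body, or to the exception
    that was raised while fetching it.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front so the downloads overlap.
        futures = {url: pool.submit(fetcher, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep crawling
                results[url] = exc
    return results
```

The fetcher parameter is there only so the download step can be swapped out (for a stub in tests, or for a politer fetcher with rate limiting); raising max_workers is the knob that trades politeness for bandwidth use.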

For the problem of parsing and handling data structures for this
activity, however, Python is *FAR* superior to C in terms of
development speed.
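
As an illustration of how little code the parsing side needs, the following is a rough sketch of link extraction using only Python's standard library. LinkExtractor and extract_links are hypothetical names chosen for this example, and a real topical crawler would add relevance filtering on top.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute targets of all <a href="..."> tags on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop the #fragment so the
                # same page is not queued twice.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.add(absolute)


def extract_links(html, base_url):
    """Return the set of absolute link targets found in html."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

The set takes care of duplicate links for free, which is exactly the kind of data-structure bookkeeping that takes far longer to write in C.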

> [...] Since Python is an interpreted language, it is rather
> slow. The crawler will be working on a huge set of pages, so it should be
> as fast as possible. One possible implementation would be to write it
> partly in C and partly in Python so that I get the best of both
> worlds, but I don't know how to approach that. Can anyone guide me on
> which parts I should implement in C and which in Python?


Actually, I have done it this way myself in the past (before
Python had weak coroutines). I wrote a command-line tool in C that
pulls down a collection of URLs listed in a control file (the URLs
are downloaded in a multithreaded manner), then I drove this tool
from a Python program. Asymptotically, this pegs my download
bandwidth for the majority of the runtime, which puts it within
striking distance of the theoretical optimum.
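
The Python side of that split can be sketched roughly as follows. The original tool's name, arguments, and control-file format are not given here, so everything below is an assumption: the sketch supposes a downloader invoked as "command control_file output_dir" that reads one URL per line and writes the fetched pages into the output directory.

```python
import subprocess
import tempfile
from pathlib import Path


def run_downloader(urls, command, out_dir):
    """Drive an external downloader from Python.

    Writes urls (one per line) to a temporary control file, runs
    `command control_file out_dir`, and returns the files it produced.
    The calling convention is hypothetical; adapt it to the real tool.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as control:
        control.write("\n".join(urls) + "\n")
        control_path = control.name
    try:
        # check=True raises CalledProcessError if the tool exits non-zero.
        subprocess.run([*command, control_path, str(out_dir)], check=True)
    finally:
        Path(control_path).unlink()
    return sorted(out_dir.iterdir())
```

A call might look like run_downloader(batch, ["./fetchurls"], "pages"), where fetchurls is a made-up name for the C program. Python then parses the files in pages, picks the next batch of URLs, and repeats.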

The problem is that you've picked completely the wrong newsgroup to ask
this question. Unfortunately, there is no clue to this fact in the
name of this newsgroup. This is actually a newsgroup that discusses
only the ANSI/ISO C standard as it exists, and none of the
platform-specific extensions (including sockets and multithreading). Nor is
the discussion of the development of real applications considered
on-topic in this newsgroup. Neither is performance considered on-topic
-- by the standard, apparently you can't know even the *relative* speed
of anything in C. comp.programming would probably have been a better
place to post this.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

 




