Async Client with 1K connections?

 
 
Aahz
 
      02-11-2004
William Chang wrote:
>
>One of the intended uses is indeed a next-gen web spider. I did the
>math, and yes I will need about 10 cutting-edge PCs to spider like
>you-know-who.


Note that while you-know-who makes extensive use of Python, I don't
think they're using it for spidering/searching. I do have some
background writing a spider in Python, using Verity's engine for
indexing/retrieval, but we were using threading rather than
asyncore-style operations.
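
For comparison with the threaded design, here is a minimal sketch of the event-loop alternative. It uses `asyncio` rather than `asyncore` (which was removed from the standard library in Python 3.12), so it is a swapped-in technique and not what anyone in this thread ran. `fake_fetch`, `crawl`, and the 1000-connection cap are hypothetical names and values chosen for illustration; `asyncio.sleep` stands in for the real network download.

```python
import asyncio

MAX_CONNECTIONS = 1000  # assumed cap, borrowed from the thread title ("1K connections")


async def fake_fetch(url, limiter, stats):
    # Hypothetical placeholder for an HTTP download: the sleep simulates latency.
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)
        stats["active"] -= 1
    return url


async def crawl(urls, max_connections=MAX_CONNECTIONS):
    # The semaphore bounds how many downloads are in flight at once.
    limiter = asyncio.Semaphore(max_connections)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(fake_fetch(url, limiter, stats) for url in urls)
    )
    return results, stats["peak"]


if __name__ == "__main__":
    urls = [f"http://example.invalid/page{i}" for i in range(3000)]
    pages, peak = asyncio.run(crawl(urls))
    print(len(pages), "pages fetched; peak simultaneous fetches:", peak)
```

All of the simulated connections share one thread; the semaphore, not the thread count, decides how many are open at a time.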
--
Aahz <*> http://www.pythoncraft.com/

"Argue for your limitations, and sure enough they're yours." --Richard Bach
 
 
 
 
 
William Chang
 
      02-13-2004
Aahz wrote:
> Note that while you-know-who makes extensive use of Python, I don't
> think they're using it for spidering/searching. I do have some
> background writing a spider in Python, using Verity's engine for
> indexing/retrieval, but we were using threading rather than
> asyncore-style operations.


Interesting, did you try maxing out the number of threads/connections?
On an UltraSparc with hardware thread/lwp support, a thousand threads
can co-exist reliably, at least for computations and disk I/O. Linux
is another matter entirely.
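
A rough way to try the "thousand threads" question on a given machine is sketched below. It only checks that the threads can be started and parked; it says nothing about throughput. `park_threads` is a hypothetical helper, and blocking on an `Event` is an assumed stand-in for a thread waiting on a socket.

```python
import threading


def park_threads(count=1000):
    """Start `count` threads that block on an Event; return how many were alive."""
    release = threading.Event()
    started = threading.Semaphore(0)

    def worker():
        started.release()  # report that this thread is running
        release.wait()     # park until the main thread lets everyone go

    threads = []
    try:
        for _ in range(count):
            t = threading.Thread(target=worker)
            t.start()      # raises RuntimeError if the OS refuses another thread
            threads.append(t)
        for _ in threads:
            started.acquire()  # wait until every started thread has checked in
        alive = sum(t.is_alive() for t in threads)
    finally:
        release.set()      # always unblock the workers, even after a failure
        for t in threads:
            t.join()
    return alive
```

Calling `park_threads(1000)` returns the number of threads that were parked at the same moment; if the platform cannot create that many, `Thread.start()` raises `RuntimeError` instead.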

--William
 
 
 
 
 
William Chang
 
      02-13-2004
Paul Rubin wrote:
> "William Chang" writes:
> > ... Throughput per PC would be on
> > the order of 1MB/s assuming 200x5KB downloads/sec using 1,000-2,000
> > simultaneous connections. (That's 17M pages per day per PC.)

>
> That's orders of magnitude less than you-know-who.


Do you know how frequently you-know-who refreshes its entire index? A year
ago things were pretty dire, easily over 10% dead links, if I recall correctly.
10 PCs at 17M/day each will refresh 3B pages in 18 days, easily world-class.
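
The arithmetic behind these figures can be checked in a few lines. All inputs come from the posts above (200 downloads/sec per PC, 5KB per page, 10 PCs, a 3B-page index); the variable names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

downloads_per_sec = 200        # per PC
page_size_kb = 5               # average page size in KB
pcs = 10
index_size = 3_000_000_000     # pages to refresh

# 200 pages/sec * 5 KB = 1000 KB/s, i.e. about 1 MB/s per PC
throughput_kb_per_sec = downloads_per_sec * page_size_kb

# 200 pages/sec * 86,400 sec/day = 17,280,000 pages/day per PC
pages_per_day_per_pc = downloads_per_sec * SECONDS_PER_DAY

# Time for all PCs together to cover the whole index once
refresh_days = index_size / (pcs * pages_per_day_per_pc)

print(f"{throughput_kb_per_sec} KB/s per PC")
print(f"{pages_per_day_per_pc:,} pages/day per PC")
print(f"{refresh_days:.1f} days to refresh the index")
```

The exact rate gives about 17.4 days; rounding the per-PC rate down to 17M/day gives the 18 days quoted above.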

> ... Also, don't forget
> how many queries you have to take from users, and the amount of disk seeks
> needed for each one.


Sure, that's what I do. However, spidering and querying are independent tasks,
generally speaking.

> 10 MB of internet connectivity is at least a few K$/month all by itself.


Yes, $2500 to be specific.

There's no reason to be intimidated (if I may use that word) by you-know-who's
marketing message (80,000 machines). Back in '96 Infoseek could handle 10M
queries per day on a single Sun E4000 with 8 CPUs (<200MHz), 4GB RAM, 20x4GB RAID.
Sure the WWW is much bigger now, but so are the disk drives!

-- William
 