Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > design choice: multi-threaded / asynchronous wxpython client?

design choice: multi-threaded / asynchronous wxpython client?

 
 
bullockbefriending bard
04-27-2008
I am a complete ignoramus and newbie when it comes to designing and
coding networked clients (or servers for that matter). I have a copy
of Goerzen (Foundations of Python Network Programming) and once
pointed in the best direction should be able to follow my nose and get
things sorted... but I am not quite sure which is the best path to
take and would be grateful for advice from networking gurus.

I am writing a program to display horse racing tote odds in a desktop
client program. I have access to an HTTP source of XML data (open one
of several URLs and I get back an XML doc with some data... not
XML-RPC) which I am able to parse and munge with no difficulty at all.
I have written and successfully tested a simple command line program
which allows me to repeatedly poll the server and parse the XML. Easy
enough, but the real world production complications are:

1) The data for the race about to start updates every (say) 15
seconds, and the data for earlier and later races updates only every
(say) 5 minutes. There is no point for me to be hammering the server
with requests every 15 seconds for data for races after the upcoming
race... I should query for this perhaps every 150s to be safe. But for
the upcoming race, I must not miss any updates and should query every
~7s to be safe. So... in the middle of a race meeting the situation
might be:
race 1: done with, no longer querying
race 2: done with, no longer querying
race 3: about to start; data on server updating every 15s, my client querying every 7s
races 4-8: data on server updating every 5 mins, my client querying every 2.5 mins

2) After a race has started and betting is cut off and there are
consequently no more tote updates for that race (it is possible to
determine when this occurs precisely because of an attribute in the
XML data), I need to stop querying (say) race 3 every 7s and remove
race 4 from the 150s query group and begin querying its data every 7s.

3) I need to dump this data (for all races, not just the about-to-start
race) to text files, store it as BLOBs in a DB *and* update a real-time
display in a wxPython windowed client.
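To make the rotation in (1) and (2) concrete, here is a minimal sketch of the kind of bookkeeping involved. The class name, intervals and methods are illustrative assumptions, not anything from the tote feed:

```python
FAST_INTERVAL = 7      # upcoming race: poll every ~7s
SLOW_INTERVAL = 150    # later races: poll every ~150s

class RaceScheduler:
    """Tracks which race is 'next off' and when each race is due a poll."""

    def __init__(self, races):
        self.pending = list(races)              # races not yet run, in order
        self.last_polled = {r: 0.0 for r in races}

    def interval_for(self, race):
        # The first pending race gets the fast cycle, the rest the slow one.
        return FAST_INTERVAL if race == self.pending[0] else SLOW_INTERVAL

    def due(self, now):
        """Races whose polling interval has elapsed at time `now`."""
        return [r for r in self.pending
                if now - self.last_polled[r] >= self.interval_for(r)]

    def mark_polled(self, race, now):
        self.last_polled[race] = now

    def race_closed(self, race):
        # Betting cut off (point 2): stop polling this race; the next race
        # is automatically promoted to the fast cycle.
        self.pending.remove(race)
```

When `race_closed(3)` fires, race 4 becomes `pending[0]` and starts getting the 7s interval with no extra bookkeeping.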

My initial thought was to have two threads for the different update
polling cycles. In addition I would probably need another thread to
handle UI stuff, and perhaps another for dealing with file/DB data
write out. But, I wonder if using Twisted is a better idea? I will
still need to handle some threading myself, but (I think) only for
keeping wxpython happy by doing all this other stuff off the main
thread + perhaps also persisting received data in yet another thread.

I have zero experience with these kinds of design choices and would be
very happy if those with experience could point out the pros and cons
of each (synchronous/multithreaded, or Twisted) for dealing with the
two differing sample rates problem outlined above.

Many TIA!




 
Eric Wertman
04-27-2008
Hi, that does look like a lot of fun... You might consider breaking
that into 2 separate programs. Write one that's threaded to keep a db
updated properly, and write a completely separate one to handle
displaying data from your db. This would allow you to later change or
add a web interface without having to muck with the code that handles
data.
 
David
04-27-2008
>
> 1) The data for the race about to start updates every (say) 15
> seconds, and the data for earlier and later races updates only every
> (say) 5 minutes. There is no point for me to be hammering the server
> with requests every 15 seconds for data for races after the upcoming


Try using an HTTP HEAD request instead to check whether the data has
changed since last time.
 
bullockbefriending bard
04-27-2008
On Apr 27, 10:05 pm, "Eric Wertman" <(E-Mail Removed)> wrote:
> Hi, that does look like a lot of fun... You might consider breaking
> that into 2 separate programs. Write one that's threaded to keep a db
> updated properly, and write a completely separate one to handle
> displaying data from your db. This would allow you to later change or
> add a web interface without having to muck with the code that handles
> data.


Thanks for the good point. It certainly is a lot of 'fun'. One of
those jobs which at first looks easy (XML, very simple to parse data),
but a few gotchas in the real-time nature of the beast.

After thinking about your idea more, I am sure this decoupling of
functions and making everything DB-centric can simplify a lot of
issues. I quite like the idea of persisting pickled or YAML data along
with the raw XML (for archival purposes + occurs to me I might be able
to do something with XSLT to get it directly into screen viewable form
without too much work) to a DB and then having a client program which
queries most recent time-stamped data for display.

A further complication is that at a later point, I will want to do
real-time time series prediction on all this data (viz. predicting
actual starting prices at post time x minutes in the future). Assuming
I can quickly (enough) retrieve the relevant last n tote data samples
from the database in order to do this, then it will indeed be much
simpler to make things much more DB-centric... as opposed to
maintaining all this state/history in program data structures and
updating it in real time.
 
Jorge Godoy
04-27-2008
bullockbefriending bard wrote:

> A further complication is that at a later point, I will want to do
> real-time time series prediction on all this data (viz. predicting
> actual starting prices at post time x minutes in the future). Assuming
> I can quickly (enough) retrieve the relevant last n tote data samples
> from the database in order to do this, then it will indeed be much
> simpler to make things much more DB-centric... as opposed to
> maintaining all this state/history in program data structures and
> updating it in real time.


If instead of storing XML and YAML you store the data points, you can do
everything from inside the database.

PostgreSQL supports Python stored procedures / functions and also supports
using R in the same way for manipulating data. Then you can work with
everything and just retrieve the resulting information.

You might try storing the raw data and the XML / YAML, but I believe that
keeping those sync'ed might cause you some extra work.


 
bullockbefriending bard
04-27-2008
On Apr 27, 10:10 pm, David <(E-Mail Removed)> wrote:
> > 1) The data for the race about to start updates every (say) 15
> > seconds, and the data for earlier and later races updates only every
> > (say) 5 minutes. There is no point for me to be hammering the server
> > with requests every 15 seconds for data for races after the upcoming

>
> Try using an HTTP HEAD instruction instead to check if the data has
> changed since last time.


Thanks for the suggestion... am I going about this the right way here?

import urllib2
request = urllib2.Request("http://get-rich.quick.com")
request.get_method = lambda: "HEAD"
http_file = urllib2.urlopen(request)

print http_file.headers

Output:
Age: 0
Date: Sun, 27 Apr 2008 16:07:11 GMT
Content-Length: 521
Content-Type: text/xml; charset=utf-8
Expires: Sun, 27 Apr 2008 16:07:41 GMT
Cache-Control: public, max-age=30, must-revalidate
Connection: close
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Via: 1.1 jcbw-nc3 (NetCache NetApp/5.5R4D6)

The Date header is the time of the server's response, not of the last
data update, and bears no relation to when the live XML was refreshed. I
know this for a fact because right now there is no active race meeting
and any data still available is static and many hours old. Nor would I
feel confident rejecting incoming data as duplicate based only on a
same-content-length criterion. Am I missing something here?

Actually there doesn't seem to be too much difficulty performance-wise
in fetching and parsing (minidom) the XML data and checking the
internal (it's an attribute) update time stamp in the parsed doc. If
timings got really tight, presumably I could more quickly check each
doc's time stamp with SAX (time stamp comes early in data as one might
reasonably expect) before deciding whether to go the whole hog with
minidom if the time stamp has in fact changed since I last polled the
server.
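That SAX early-exit idea might look something like the following sketch; the `updated` attribute name is an assumption standing in for whatever the real feed uses, and parsing is simply aborted with an exception once the timestamp has been seen:

```python
import xml.sax

class _Done(Exception):
    """Raised to abort parsing as soon as the timestamp is found."""

class TimestampHandler(xml.sax.ContentHandler):
    # The attribute name "updated" is a placeholder for the feed's real
    # timestamp attribute, which comes early in the document.
    def __init__(self):
        super().__init__()
        self.timestamp = None

    def startElement(self, name, attrs):
        if "updated" in attrs:
            self.timestamp = attrs["updated"]
            raise _Done  # stop here; no need to parse the rest

def peek_timestamp(xml_text):
    """Return the update timestamp without parsing the whole document."""
    handler = TimestampHandler()
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), handler)
    except _Done:
        pass
    return handler.timestamp
```

Only if `peek_timestamp` returns a value newer than the last one seen would the full minidom parse be worth doing.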

But if there is something I don't get about the HTTP HEAD approach,
please let me know, as a simple check like this would obviously be a
good thing for me.
 
Jorge Godoy
04-27-2008
bullockbefriending bard wrote:

> 3) I need to dump this data (for all races, not just current about to
> start race) to text files, store it as BLOBs in a DB *and* update real
> time display in a wxpython windowed client.


Why in a BLOB? Why not into specific data types and normalized tables? You
can still save the BLOB for backup or auditing, but on its own it won't
allow you to use your DB to the best of its capabilities... it will just
act as a data container, much like a network share (one which would not
penalize you much for opening and closing connections).
 
Jarkko Torppa
04-27-2008
On 2008-04-27, David <(E-Mail Removed)> wrote:
>>
>> 1) The data for the race about to start updates every (say) 15
>> seconds, and the data for earlier and later races updates only every
>> (say) 5 minutes. There is no point for me to be hammering the server
>> with requests every 15 seconds for data for races after the upcoming

>
> Try using an HTTP HEAD instruction instead to check if the data has
> changed since last time.


A GET with If-Modified-Since is better still
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html, section 14.25).
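A conditional GET along those lines could be sketched as below. It is shown with the modern urllib.request (in 2008 the same pattern used urllib2 with an identical Request/urlopen shape), and it only helps if the server actually honours If-Modified-Since, which is worth verifying first:

```python
import urllib.request
import urllib.error

def fetch_if_modified(url, last_modified=None):
    """Conditional GET: return (body, last_modified), or (None, last_modified)
    when the server answers 304 Not Modified."""
    req = urllib.request.Request(url)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    try:
        resp = urllib.request.urlopen(req)
    except urllib.error.HTTPError as e:
        if e.code == 304:   # unchanged since last poll: nothing to parse
            return None, last_modified
        raise
    # Remember the server's Last-Modified for the next poll.
    return resp.read(), resp.headers.get("Last-Modified", last_modified)
```

Each polling cycle then passes back the Last-Modified value it got previously, and skips the XML parse entirely on a 304.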

--
Jarkko Torppa
 
Björn Lindqvist
04-27-2008
I think Twisted is overkill for this problem. Threading, ElementTree
and urllib should more than suffice: one thread polls the server for
each race at the desired polling interval. Each time some data changes,
that thread sends a signal containing information about what changed.
The GUI listens for the signal and, if needed, updates itself with the
new information. The database handler also listens for the signal and
updates the db.
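A rough sketch of that thread-per-race arrangement, with a plain queue standing in for the "signal" (on the wx side, a wx.Timer or wx.CallAfter would drain the queue on the main thread). The fetch function and intervals here are stand-ins, not the real HTTP/XML client:

```python
import threading
import queue
import time

updates = queue.Queue()  # (race_no, data) tuples; GUI and DB writer consume

def poll_race(race_no, fetch, interval, stop):
    """Poll one race until told to stop, publishing only real changes."""
    last = None
    while not stop.is_set():
        data = fetch(race_no)        # stand-in for HTTP GET + XML parse
        if data != last:             # suppress duplicate samples
            updates.put((race_no, data))
            last = data
        stop.wait(interval)          # sleep, but wake early when stopped

def run_one_poll(race_no, fetch):
    """Start a poller briefly, then shut it down (kept short for the sketch)."""
    stop = threading.Event()
    t = threading.Thread(target=poll_race, args=(race_no, fetch, 0.01, stop))
    t.start()
    time.sleep(0.05)
    stop.set()
    t.join()
```

Because only the queue is shared, the polling threads never touch wx objects directly, which is the main thing wxPython cares about.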



2008/4/27, bullockbefriending bard <(E-Mail Removed)>:
> [snip original post]



--
mvh Björn
 
bullockbefriending bard
04-27-2008
On Apr 27, 11:12 pm, Jorge Godoy <(E-Mail Removed)> wrote:
> bullockbefriending bard wrote:
> > A further complication is that at a later point, I will want to do
> > real-time time series prediction on all this data (viz. predicting
> > actual starting prices at post time x minutes in the future). Assuming
> > I can quickly (enough) retrieve the relevant last n tote data samples
> > from the database in order to do this, then it will indeed be much
> > simpler to make things much more DB-centric... as opposed to
> > maintaining all this state/history in program data structures and
> > updating it in real time.

>
> If instead of storing XML and YAML you store the data points, you can do
> everything from inside the database.
>
> PostgreSQL supports Python stored procedures / functions and also supports
> using R in the same way for manipulating data. Then you can work with
> everything and just retrieve the resulting information.
>
> You might try storing the raw data and the XML / YAML, but I believe that
> keeping those sync'ed might cause you some extra work.


Tempting thought, but one of the problems with this kind of horse
racing tote data is that a lot of it is for combinations of runners
rather than single runners. Whilst there might be (say) 14 horses in a
race, there are 91 quinella price combinations (1-2 through 13-14,
i.e. the 2-subsets of range(1, 15)) and 364 trio price combinations.
It is not really practical (I suspect) to have database tables with
columns for that many combinations?
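For what it's worth, the combination counts above check out, and the usual normalized answer is one row per combination rather than one column per combination. The row layout and the combo_key helper below are illustrative assumptions, not an existing schema:

```python
from itertools import combinations

# 14 runners: every unordered pair is a quinella combination,
# every unordered triple a trio combination.
runners = range(1, 15)
quinellas = list(combinations(runners, 2))   # C(14, 2) = 91
trios = list(combinations(runners, 3))       # C(14, 3) = 364

# Instead of 91 (or 364) columns, store one row per combination, e.g.
# (race_id, pool, combo, price) -> (3, "QIN", "1-2", 57.5)
def combo_key(combo):
    """Stable text key for a combination, usable in a row's primary key."""
    return "-".join(str(r) for r in combo)
```

Querying "last n samples for combination 1-2 of race 3" is then a simple indexed lookup, regardless of how many combinations a pool has.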

I certainly DO have a horror of having my XML / whatever else formats
getting out of sync. I also have to worry about the tote company later
changing their XML format. From that viewpoint, there is indeed a lot
to be said for storing the tote data as numbers in tables.
 