![]() |
Writing an emulator in python - implementation questions (forperformance)
Hi. I'm trying to port (just for fun), my old Sinclair Spectrum emulator, ASpectrum, from C to Python + pygame. Although the Sinclair Spectrum has a simple Z80 8 bit 3.5Mhz microprocessor, and no aditional hardware (excluding the +2/+3 model's AYsound chip), I'm not sure if my loved scripted language, python, will be fast enought to emulate the Sinclair Spectrum at 100% speed. There are Java Spectrum emulators available, so it should be possible. Anyway, this message is directed to prepare the DESIGN so that the code can run as fast as possible. I mean, I want to know the best way to do some emulation tasks before starting to write a single line of code. My questions: GLOBAL VARIABLES VS OBJECTS: ================================== I need the emulation to be as fastest as possible. In my C program I have an struct called "Z80_Processor" that contains all the registers, memory structures, and so on. I pass that struct to the Z80Decode(), Z80Execute() or Z80Dissassemble() functions, i.e.. This allows me (in my C emulator) to emulate multiple z80 processors If I want. As Python is scripted and I imagine that emulation will be slower than the emulator written in C, I've thought of not creating a Z80_Processor *object* and declare global variables such as reg_A, reg_B, reg_PC, array main_memory[] and so on, and let the z80 functions to directly access that global variables. I'm doing this to avoid OOP's extra processing... but this makes the program less modular. Do you think using processor "objects" would make the execution slower, or I'm doing right using global variables and avoiding objects in this type of program? Should I start writing all the code with a Z80CPU object and if performance is low, just remove the "object" layer and declare it as globals, or I should go directly for globals? HIGH AND LOW PART OF REGISTERS: ================================= - In C, I have the following structs and code for emulating registers: typedef union { struct { unsigned char l, h; } B; unsigned short W; } eword; eword reg_A; This means that reg_A is a 16 bit "variable" that I can directly access with reg_A.w=value, and I can access also the LOW BYTE and HIGH BYTES with reg_A.B.h and reg_A.B.l. And more importante, changing W modifies l and h, and changing l or h modifies W. How can I implement this in Python, I mean, define a 16 byte variable so that high and low bytes can be accessed separately and changing W, H or L affects the entire variable? I would like to avoid doing BIT masks to get or change HIGH or LOW parts of a variable and let the compiled code to do it by itself. I know I can write an "object" with set and get methods that implement that (that could be done in C too), but for emulation, I need the FASTEST implementation possible (something like the C's Union trick). MEMORY (ARRAYS): =========================== To emulate Spectrum's memory in C, I have a 64KB array like: unsigned char memory_array[65535]. Then I can access memory_array[reg_PC] to fetch the next opcode (or data for an opcode) and just increment reg_PC. Is python's array module the best (and fastest) implementation to "emulate" the memory? MEMORY (PAGES): ============================= The Sinclair Spectrum 8 bit computer can address 64KB of memory and that memory is based on 16KB pages (so it can see 4 pages simultaneously, where page 0 is always ROM). Programs can change "pages" to point to aditional 16KB pages in 128KB memory models. I don't know how to emulate paging in python... My first approach would be to have eight 16KB arrays, and "memcpy()" memory to the main 64KB array when the user calls page swapping. I mean (C + pseudocode): main_memory[65536]; memory_blocks[8][16384]; // Initial settings current_pages[4] = [0, 1, 2, 3] // User swaps last memory page (3) to block 7, so I: page_to_swap_from = 3 page_to_map = 7 // Save the contents of current page (page being unmapped): memcpy( main_memory, // Source 16384*page_to_swap_from, // Starting at memory_blocks[current_pages[page_to_swap_from], // To 16384 ); // 16K // Now map page 7 to memory block 3: memcpy( memory_blocks[page_to_map], // Source 0, // Starting at main_memory[page_to_swap_from*16384], // To 16384 ); // 16K current_pages[page_to_swap_from] = page_to_map; Memcpy is very fast in C, but I don't know if doing the same in python with arrays would be fast enough, or if there is a better approach to simulate paging of 16KB blocks in a 64KB memory windows (4 mappable memory blocks). Maybe another approach based in pointers or something like that? |
Re: Writing an emulator in python - implementation questions (forperformance)
On Nov 12, 4:35*am, Santiago Romero <srom...@gmail.com> wrote:
> *Hi. > > *I'm trying to port (just for fun), my old Sinclair Spectrum emulator, > ASpectrum, from C to Python + pygame. The answer to your question is, "Use numpy". More details below. [snip] > *Should I start writing all the code with a Z80CPU object and if > performance is low, just remove the "object" layer and declare it as > globals, Yes, but I'd suggest to index a numpy array rather than structures. See below. [snip] > *How can I implement this in Python, I mean, define a 16 byte variable > so that high and low bytes can be accessed separately and changing W, > H or L affects the entire variable? I would like to avoid doing BIT > masks to get or change HIGH or LOW parts of a variable and let the > compiled code to do it by itself. You can do clever memory slicing like this with numpy. For instance: breg = numpy.zeros((16,),numpy.uint8) wreg = numpy.ndarray((8,),numpy.uint16,breg) This causes breg and wreg to share the same 16 bytes of memory. You can define constants to access specific registers: R1L = 1 R1H = 2 R1 = 1 breg[R1H] = 2 print wreg[R1] [snip] > *Is python's array module the best (and fastest) implementation to > "emulate" the memory? I'd use numpy for this as well. (I doubt the Z80 had a 16-bit bus, but if it did you could use the same memory sharing trick I showed you with the registers to simulate word reads and writes.) Note that, when performing operations on single values, neither numpy nor array module are necessarily a lot faster than Python lists, might even be slower. But they use a LOT less memory, which is important for largish arrays. [snip] > *The Sinclair Spectrum 8 bit computer can address 64KB of memory and > that memory is based on 16KB pages (so it can see 4 pages > simultaneously, where page 0 is always ROM). Programs can change > "pages" to point to aditional 16KB pages in 128KB memory models. > > *I don't know how to emulate paging in python... numpy again. This would mean you'd have to fiddle with addresses a bit, but that shouldn't be too big a deal. Create the physical memory: mem = numpy.zeros((128*1024,),numpy.uint8) Then create the pages. (This is a regular Python list containing numpy slices. numpy slices share memory so there is no copying of underlying data.) page = [mem[0:16*1024], mem[16*1024:32*1024], mem[32*1024:48*1024], mem[48*1024:64*1024]] To access the byte at address 42432, you'd have use bit operations to get a page number and index (2 and 9664 in this case), then you can access the memory like this: page[2][9664] = 33 p = page[3][99] To swap a page, reassign the slice of main memory, page[2] = mem[96*1024:112*1024] Now, accessing address 42432 will access a byte from a different page. If you don't want to fiddle with indirect pages and would just rather copy memory around when a page swap occurs, you can do that, too. (Assigning to a slice copies the data rather than shares.) I don't know if it's as fast as memset but it should be pretty quick. .. Hope these brief suggestions help. If you don't want third party libraries, then numpy will be of no use. But I guess if you're using pygame third party modules are ok. So get numpy, it'll make things a lot easier. It can be a bit daunting to learn, though. Carl Banks |
Re: Writing an emulator in python - implementation questions (forperformance)
> > *I'm trying to port (just for fun), my old Sinclair Spectrum emulator,
> > ASpectrum, from C to Python + pygame. > > The answer to your question is, "Use numpy". *More details below. Let's see :-) > > *How can I implement this in Python, I mean, define a 16 byte variable > > so that high and low bytes can be accessed separately and changing W, > > H or L affects the entire variable? I would like to avoid doing BIT > > masks to get or change HIGH or LOW parts of a variable and let the > > compiled code to do it by itself. > > You can do clever memory slicing like this with numpy. *For instance: > > breg = numpy.zeros((16,),numpy.uint8) > wreg = numpy.ndarray((8,),numpy.uint16,breg) > > This causes breg and wreg to share the same 16 bytes of memory. *You > can define constants to access specific registers: > > R1L = 1 > R1H = 2 > R1 = 1 > > breg[R1H] = 2 > print wreg[R1] And how about speed? Assuming a 16 bit register named BC which contains 2 8 bit regiters (B and C)... Will the above be faster than shifts and bit operations (<<, and, >> ) with new B and C values to "recalculate" BC when reading or changing either B, C or BC? > > *Is python's array module the best (and fastest) implementation to > > "emulate" the memory? > > I'd use numpy for this as well. *(I doubt the Z80 had a 16-bit bus, > but if it did you could use the same memory sharing trick I showed you > with the registers to simulate word reads and writes.) Z80 has a 16 bit ADDRESS bus, 8 bit DATA bus. This means you can address from 0 to 65535 memory cells of 8 bytes. Z80 has 16 bit bus operations, but currently in C I write 16 bit values as two 8 bit consecutive values without using (unsigned short *) pointers. But it seems that numpy would allow me to do it better than C in this case... > Note that, when performing operations on single values, neither numpy > nor array module are necessarily a lot faster than Python lists, might > even be slower. *But they use a LOT less memory, which is important > for largish arrays. Well, we're talking about a 128KB 1-byte array, that's the maximum memory size a Sinclair Spectrum can have, and always by using page swapping of 16KB blocks in a 64KB addressable space... If you really think that python lists can be faster that numpy or array modules, let me know. Maybe I'll make a "benchmark test", by making some millions of read / write operations and timing the results. I wan't to write the emulator as "pythonic" as possible... > > *I don't know how to emulate paging in python... > > numpy again. *This would mean you'd have to fiddle with addresses a > bit, but that shouldn't be too big a deal. *Create the physical > memory: > > mem = numpy.zeros((128*1024,),numpy.uint8) A 128K array of zeroes... > Then create the pages. *(This is a regular Python list containing > numpy slices. numpy slices share memory so there is no copying of > underlying data.) > > page = [mem[0:16*1024], > * * * * mem[16*1024:32*1024], > * * * * mem[32*1024:48*1024], > * * * * mem[48*1024:64*1024]] Those are just like pointers to the "mem" numpy array, pointing to concrete start indexes, aren't they? > To access the byte at address 42432, you'd have use bit operations to > get a page number and index (2 and 9664 in this case), then you can > access the memory like this: Do you mean: page = address / 16384 index = address MOD 16384 ? Or, better, with: page = address >> 14 index = address & 16383 ? > page[2][9664] = 33 > p = page[3][99] > > To swap a page, reassign the slice of main memory, > > page[2] = mem[96*1024:112*1024] > > Now, accessing address 42432 will access a byte from a different page. But the above calculations (page, index) wouldn't be slower than a simple play 64KB numpy array (and make no paging mechanism when reading or writing) and copy memory slices when page swappings are requested? > If you don't want to fiddle with indirect pages and would just rather > copy memory around when a page swap occurs, you can do that, too. > (Assigning to a slice copies the data rather than shares.) I don't > know if it's as fast as memset but it should be pretty quick. That's what I was talking about. With "page and index" I'm slowing down EACH memory read and write, and that includes opcode and data fetching... With "memory copying in page swaps", memory is always read and write quickly, and if "slice copies" are fast enough, the emulation will be >100% of speed (I hope) for a 3.5Mhz system ... > Hope these brief suggestions help. *If you don't want third party > libraries, then numpy will be of no use. *But I guess if you're using > pygame third party modules are ok. *So get numpy, it'll make things a > lot easier. *It can be a bit daunting to learn, though. Yes, you've been very helpful! How about numpy portability? Is array more portable? And finally, do you think I'm doing right by using global variables for registers, memory, and so, or should I put ALL into a single object and pass it to functions? Ummm ... I think I'm going to do some tests with some "for i in range (1,100000000000)" + time :-) Thanks a lot for your help :-) Any other suggestion is welcome. |
Re: Writing an emulator in python - implementation questions (forperformance)
Santiago Romero wrote: >>> I'm trying to port (just for fun), my old Sinclair Spectrum emulator, >>> A >> <snip> > Do you mean: > > page =ddress / 16384 > index =ddress MOD 16384 > > ? > > Or, better, with: > > page =ddress >> 14 > index =ddress & 16383 > > ? > <snip> > How about page, index = divmod(address, 16384) |
Re: Writing an emulator in python - implementation questions (forperformance)
> How about > * * page, index = divmod(address, 16384) Surely, much better and faster :-) Thanks a lot. |
Re: Writing an emulator in python - implementation questions (forperformance)
> You can do clever memory slicing like this with numpy. *For instance:
> > breg = numpy.zeros((16,),numpy.uint8) > wreg = numpy.ndarray((8,),numpy.uint16,breg) > > This causes breg and wreg to share the same 16 bytes of memory. *You > can define constants to access specific registers: What I'm doing wrong? [sromero@compiler:~]$ cat test.py #!/usr/bin/python import numpy # Register array breg = numpy.zeros((16,),numpy.uint8) wreg = numpy.ndarray((8,), numpy.uint16, breg ) reg_A = 1 reg_F = 2 reg_AF = 1 reg_B = 3 reg_C = 4 reg_BC = 3 breg[reg_B] = 5 breg[reg_C] = 10 print breg[reg_B] print breg[reg_C] print wreg[reg_BC] [sromero@compiler:~]$ python test.py 5 10 0 ? |
Re: Writing an emulator in python - implementation questions (forperformance)
Oops, numpy arrays start with index=0 :-) |
Re: Writing an emulator in python - implementation questions (forperformance)
Carl Banks wrote:
> You > can define constants to access specific registers: > > R1L = 1 > R1H = 2 > R1 = 1 > > breg[R1H] = 2 > print wreg[R1] But keep in mind that named "constants" at the module level are really global variables, and therefore incur a dictionary lookup every time they're used. For maximum speed, nothing beats writing the numeric literals directly into the code, unfortunately. Generally, I think you're going to have quite a battle on your hands to get a pure Python implementation to run as fast as a real Z80, if it's even possible at all. And if you do succeed, the code will be pretty awful (due to things such as not being able to use named constants). I would be taking a different approach -- develop a prototype in Python, concentrating on clarity rather than speed, and later reimplement the core of the emulator as an extension module, using Pyrex or Cython or otherwise. -- Greg |
Re: Writing an emulator in python - implementation questions (forperformance)
On Fri, 13 Nov 2009 15:29:03 +1300, greg wrote:
> Generally, I think you're going to have quite a battle on your hands to > get a pure Python implementation to run as fast as a real Z80, if it's > even possible at all. And if you do succeed, the code will be pretty > awful (due to things such as not being able to use named constants). I don't know about that... Python on a 2GHz processor (which is more or less entry-level for desktop PCs these days), emulating something which used to run at 2.5MHz? Even if the Python code does 1000 times more work, the modern processor is nearly 1000 times faster, so your Python code won't be much slower than the first generation Z80. The latest Z80 processors operate at 50MHz. That still gives you a factor of 40 to work with. Write your Python carefully, optimizing only the bits that *need* optimizing, and perhaps using a few C extensions or Psyco, and I think you have a good chance of being close enough to the speed of a real Z80. -- Steven |
Re: Writing an emulator in python - implementation questions (forperformance)
On Fri, 13 Nov 2009 15:33:53 +1300, greg wrote:
> Santiago Romero wrote: >>>How about >>> page, index = divmod(address, 16384) >> >> Surely, much better and faster :-) > > Not necessarily, because it involves a function call, and constructing > and deconstructing a result tuple. If you time them, you may well find > that the explicit shift and mask operations turn out to be faster. It's easy enough to test: >>> from timeit import Timer >>> t1 = Timer('a = n>>14; b = n & 16384', 'n=2137902') >>> t2 = Timer('a,b = divmod(n, 16384)', 'n=2137902') >>> min(t1.repeat(repeat=5)) 0.32850909233093262 >>> min(t2.repeat(repeat=5)) 0.54839301109313965 The shift and mask are a little faster on my machine, but that's certainly what I would call a micro-optimization. Unless the divmod call is the bottleneck in your code -- and it almost certainly won't be -- I don't think it's worth the obfuscation to use shift/mask. What should be done is write the code in the clearest way you can, then, only if it is it too slow, profile it to see where it actually needs optimizing. -- Steven |
| All times are GMT. The time now is 12:31 PM. |
Powered by vBulletin®. Copyright ©2000 - 2013, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.