On Jul 25, transformer wrote:
> I have to build a signed (unsigned) multiplier. IEEE single precision
> and 32 bit. I know from the IEEE Trans. Comp. Jan. 1999 p. 15. paper
> that the MUX based Pekmestzi multiplier is superior to the commonly
> used modified booth multiplier.
> However, from 1999 to now no other paper was released about the MUX
> based one and no other comparison is available. All papers still use
> the modified booth encoding.
>
> Has anybody implemented the MUX based one in FPGA/ASIC? Is there
> really an advantage over the modified booth one in terms of delay and
> area?
I remember I considered that article but in the end I went for another
solution for one multiplier we built. Honestly I don't remember the
details anymore: it was about 5 years ago!
The greek thing was a nice idea, but the logic depth wasn't less than a
booth encoded one: probably they account for over two gates for one XOR,
which is in most cases far too pessimistic - as I wrote, it was a lot of
time ago and I don't remember exactly.
There was another later article on the Journal of Solid-State Circuits
about a regular multiplier (full-adder based), mabe in 2000, try looking
for it: there might be comparisons.
Some years ago there was also one english startup who claimed they can
outperform current design by a large factor (two). I think it was 2001
and the company was named AutoPD. I'm quite sure they exaggerated their
claim, but that's usual marketing and it's a part of the game.
Nevertheless, their basic idea was to dynamically generate the compressor
blocks, and this is ok! The best previous method I know of was by
allocating skewed full-adders (Oklobdzjia) and this leaves a bit of place
for optimizations, so I'm pretty sure that there is a way to get a bit
more speed (maybe 10-15% on a 32b signed MAC, for sure not 200%, unless
you use *very strange* basic cells

)
I think after 2001 there was generally a lack of interest (and money -
good bye new economy ...) to squeeze the last picoseconds out of an adder
tree structure: this also makes sense, there is much more to win in other
parts of the datapath and careful layout and transistor sizing really take
a lot of time - no matter how good you are at doing "black magic"
The skewed full-adders appoach (TDM + Booth + some refinement) is a good
framework and leads to good results, probably the optimum can be achieved
by tuning the compressor blocks.
So, I wouldn't go for the MUX-based thing. Of course if you work on
"strange" stuff (bad cells, bad XOR implementation, strange process,
strange FPGA switch blocks ...) things might change a bit.
--
Lorenzo Di Gregorio