On 7/20/2013 7:57 PM, lokesh kumar wrote:

> Yes, I am also thinking there must be some ways to know about the number of loops. We can calculate small numbers like 5-bit or 6-bit by using our hand. But it is very hard to calculate for 163-bit.

>

> In your last message you told that, 3 loops are used to reduced. But it is not correct. After the first loop, you are getting two zeros in the beginning. So we do not need to apply the reduction polynomial value.So the 2nd loop is not needed. We can directly consider 1111101 and XOR with 100101 to get the 5-bit number.
I think you didn't look at my example carefully enough. If a = 11111

then c = 101010101 which is 9 bits. The IP is used once which removes

the one in the msb and as you say, the next bit is a zero which is not

changed. So we get two bits in the first step. But that still leaves

us with a seven bit number and each further application of the IP only

removes a single bit from the resulting value of c.

> http://www-brs.ub.ruhr-uni-bochum.de...deepS/diss.pdf

>

> I am quite not sure if you can access this link.
Yes, I downloaded it and can open it, just not in Acrobat, I have an

older copy which won't open this file, I have to use SumatraPDF reader.

> If you are unable, then please google "elliptic curve cryptography for constrained devices" and check out the first link. On page number 132 (figure 8..2), the multiplier circuit for 163 bit is given. The reduction polynomial for 163 bit is, F(x) = x^163 + x^7 + x^6 + x^3 +1
I think you mean figure 8.1 on page 132? That is what I am seeing.

> So if you closely take a look on the circuit, then you can see that almost all the parts are similar. But there are 3 extra XOR gates are connected. The first extra XOR gate is connected after C2 register ( it is for x^3 which is a part of irreducible polynomial). The second extra XOR gate is connected after C5 register ( for x^6 which is a part of irreducible polynomial) and the third extra XOR gate is connected after C6 register ( for x^7 term in the polynomial). The output of C162 is directly connected to the XOR gate before C0. Because we do not need to add XOR gates for x^162 and 1.

>

> Similarly for a 5-bit number, the reduction polynomial is F(x) = x^5 + x^2 + 1

> So in this case we need to connect only one extra XOR gate after C1 (or before C0). And the output from c4 is directly connected to the XOR gate before C0.

>

> In this case the reduction polynomial is attached. So the result if we take both A and B are of 5-bit numbers, then we will get the 5-bit number as an output.

> This is the basic operation for the multiplier.

>

> Now please look at the squaring circuit on page 133 (figure 8.3). I am unable to understand the circuit. I am not sure if the reduction polynomial is attached here. Please have a look on it and let me know if you understand.
I assume you mean figure 8.2 on page 133.

Sorry, I can follow your directions above, but I understand this less

than you do. However, I think there is a lot not shown in Figure 8.2 on

page 133. I guess most, if not each output bit has at least one XOR

gate. I am guessing that they have manually "unrolled" the loop of the

calculation and it distills to a series of XOR gates on each output. In

fact, I seem to recall doing exactly that with a CRC computation once.

We had to calculate CRC on some 32 bits of input at a time. I took the

single bit calculation and applied it 32 times algebraically to produce

a fairly messy, but accurate result. We can do that in VHDL without

manually calculating the coefficients. We just apply the IP ANDed with

the corresponding bit of the temp variable and the calculations will be

implemented.

So if you want a fully combinatorial circuit like they are describing in

the squaring circuit of Figure 8.2 I will write a loop of combinatorial

logic. I can only think the complication of the code you provided where

they copy other higher order bits into the "zero" bits of c must be a

way to simplify the circuit and save the logic of the zero bits.

Although things like zero inputs are often optimized automatically in

FPGA design tools making this unnecessary.

--

Rick