# Variable Block Adder (1A)

•

•

Copyright (c) 2022 - 2010 Young W. Lim.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

Please send corrections (or suggestions) to youngwlim@hotmail.com.

This document was produced by using OpenOffice and Octave.



Any kill or generate condition results in divided (broken) critical paths

All FA's in R-2 groups must have the propagate condition

The maximal delay  $\Delta$  of a Carry Skip Adder is encountered when carry is generated in the least-significant bit position,

- rippling through *k-1* bit positions,
- skipping over R-2 = N/k-2 groups in the middle,
- rippling to the k-1 bits of most significant group and
- being assimilated in the *N-th* bit position to produce the sum  $S_N$ :

$$\begin{split} \Delta_{\text{CSA}} &= (k-1) \, \Delta_{\text{rca}} + (R-2) \, \Delta_{\text{SKIP}} + (k-1) \, \Delta_{\text{rca}} \\ &= \, 2 \, (k-1) \, \Delta_{\text{rca}} + (R-2) \, \Delta_{\text{SKIP}} \\ &= \, 2 \, (k-1) \, \Delta_{\text{rca}} + (N/k-2) \, \Delta_{\text{SKIP}} \end{split}$$

$$\begin{split} \Delta_{\text{CSA}} &= (k-1) \, \Delta_{\text{rca}} + (R-2) \, \Delta_{\text{SKIP}} + (k-1) \, \Delta_{\text{rca}} \\ &= \, 2 \, (k-1) \, \Delta_{\text{rca}} + (R-2) \, \Delta_{\text{SKIP}} \\ &= \, 2 \, (k-1) \, \Delta_{\text{rca}} + (N/k-2) \, \Delta_{\text{SKIP}} \end{split}$$

Carry Skip Adder is faster than RCA at the expense of a few relatively simple modifications.

The delay is still linearly dependent on the size of the adder N, however this linear dependence is reduced by a factor of 1/k

 $N = R \cdot k$ 



however, <u>unlike</u> the <u>carry select</u> structure, the <u>variable block</u> adder must also worry about the <u>delay</u> from the <u>Cin</u> input through the block's <u>ripple chain</u>

Thus, after the carry chain passes the <u>midpoint</u> of the logic, the blocks begin <u>decreasing</u> in length.

This <u>balances</u> the path delays in the system and improves performance

The division of the overall structure into blocks depends on the details of the logic structure and the length of the entire computation





https://en.wikipedia.org/wiki/Carry-lookahead\_adder





Delay d1 from A, B to P — parallel, t

- parallel, the <u>same</u> delay in each group

Delay d2 from A, B to C

- serial, the <u>accumulated</u> delay from Isb

Delay d3 from A, B, Ci to Co - ripple carry delay





Delay d3 from A, B, Ci to Co - ripple carry delay







$$N = R \cdot k$$
  $d1 \propto k$ 

 $\frac{d2}{} \propto R \left( = \frac{N}{k} \right)$ 



$$\begin{array}{ccc}
\mathbf{N} = \mathbf{R} \cdot \mathbf{k} & d1 & \infty & \mathbf{k} \\
d2 & \infty & \mathbf{R} & \left( = \frac{\mathbf{N}}{\mathbf{k}} \right)
\end{array}$$





the organization of the blocks in the variable block carry structure bears some similarity to the carry select structure

the early stages of the structure grow in length, with short blocks for the low order bits, building in length further in the chain in order to equalize the arrival time of the carry from the block with that of the previous block

https://en.wikipedia.org/wiki/Carry-lookahead\_adder



11/26/22













17

All carries propagated more quickly by varying the block sizes Accordingly the initial blocks of the adder are made smaller, so as to quickly detect carry generates that must be propagated

The middle blocks are made larger because they are not the problem case,

And then the most significant blocks are made smaller so that the late arriving carry inputs can be processed quickly

https://en.wikipedia.org/wikik/Carry-skip adder



The longest path length through the carry skip block is potentially much shorter than the path from carry-in to carry-out through a ripple carry block.

However, the carry skip block has a slightly longer path from the least significant <g,t> input to carry output

Hence, this adder will only be faster when skipping groups makes up for the extra gate overhead accumulated by going from generate/transfer to carry-out

The maximum path length through a one block wide carry skip adder is the same as though a ripple carry adder, since the bottom block in a skip adder is a ripple carry





Binary Adders, T W Lynch, Master Thesis, University of Texas at Austin 1996

#### Two separate ripple carry adders

Cin signal is used to <u>determine</u> which adder's outputs should be used

if the Cin signal is true, the output (carries) are selected from the true-Cin-adder

if the Cin signal is false, the output (carries) are selected from the false-Cin-adder

redundant hardware <u>removes</u> Cin data dependency

first start redundant computation later select the correct one



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

### Timing in broken carry chains



the length of the adders and the breakpoint are carefully chosen such that the **adders** finish computations just as their Cin become available

Cin

These computations can take place <u>before</u> the completion of the <u>previous columns</u>, since they do <u>not</u> depend on the <u>actual value</u> of the <u>Cin signal</u>

High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry

 $T_2 \leq T_1$ 

Time

true Cin'

false Cin'

true Cin

false Cin

### Carry Select Fast Carry Logic



High Performance Carry Chains for FPGAs, S. Hauck, M. M. Hosler, T. W. Fry



We use a block length <u>from low order</u> to <u>high order cells</u> of 2, 2, 4, 5, 7, 5, 4, 2, 1 for a normal 32 bit structure

8+17+7

The first and last block in each adder is a simple ripple carry chain, while all other blocks use the variable block structure.



https://en.wikipedia.org/wiki/Carry-lookahead\_adder

Delay values of the variable block carry chain relative to other carry chains

The idea behind Variable Block Adder (VBA) is to minimize the longest critical path in the carry chain of a Carry Skip Adder, while allowing the groups to take different sizes.

Such an optimization in general does <u>not</u> result in an <u>increased complexity</u> as compared to the Carry Skip Adder

The <u>first</u> and last blocks are <u>smaller</u>, and the <u>intermediate</u> blocks are <u>larger</u>.

That compensates for the critical paths originating from the ends by <u>shortening</u> the length of the path used for the carry signal to ripple in the end groups, allowing carry to <u>skip over larger</u> groups in the middle.

There are two important consequences of this optimization:

- (a) the total delay is reduced as compared to a Carry Skip Adder
- (b) the delay dependency is <u>not</u> a <u>linear function</u> of the adder size N as in a <u>Carry Skip Adder</u>.

This dependency follows a square root function of N instead

For an optimized VBA, it is possible to obtain a <u>closed form</u> solution expressing this delay dependency

It is also possible to extend this approach to multi-levels of carry skip as done in a determination of the optimal sizes of the blocks on the first and higher levels of skip blocks is a linear programming problem, which does <u>not</u> yield a <u>closed form</u> solution.

Such types of problems are solved with the use of **dynamic programming** techniques.

The speed of such a mult-level VBA adder surpasses single-level VBA and that of fixed group Carry-Lookahead Adder (CLA).

For an optimized VBA, it is possible to obtain a closed form solution expressing this delay dependency which is given as: where: c1, c2, c3 are constants.

$$\Delta_{VBA} = c_1 + \sqrt{c_2 N + c_3}$$

It is also possible to extend this approach to multiple levels of carry skip as done.

- (1) the speed of the logic gates used for CMOS implementation depends on the output load: fan-out, as well as the number of inputs: fan-in.
- (2) CLA implementation is characterized with a <u>large fan-in</u> which <u>limits</u> the available <u>size</u> of the groups.

On the other hand VBA implementation is simple.

Thus, it seems that CLA has passed the point of diminishing returns as far as an efficient implementation is concerned.

This example also points to the importance of modeling and incorporating appropriate technology parameters into the algorithm.

Most of the computer arithmetic algorithms developed in the past use a simple constant gate delay model.

(2.) a fixed-group CLA is not the best way to build an adder.

It is a sub-optimal structure which after being optimized for speed, consists of groups that are different in size yielding a largely irregular structure

There are other advantages of VBA adder that are direct result of its simplicity and efficient optimization of the critical path.

Those advantages are exhibited in the lower area and power consumption while retaining its speed.

Thus, VBA has the lowest energy-delay product as compared to the other adders in its class.

### Delay model

Oklobdzija addition VLSI

On implementing addition in VLSI technology

Delay dependency: Fan-out, Fan-in,

Delay estimates:

$$D_NAND = 1 + 0.3F0 + 0.5(Fi-2)$$
  
 $D_NOR = 1 + 0.5F0 + 0.5(Fi-2)$ 

$$D_NAND = 0.7 + 0.3Fo$$

*t* denote the time required for a carry signal to ripple across a bit *T* denote the time required for the signal to skip over a group of bits *m* denotes the optimal number of groups for an n-bit carry chain *m* is the smallest positive integer satisfying

$$n \le m + \frac{1}{2}mT + \frac{1}{4}m^2T + (1 - (-1)^m)\frac{1}{8}T$$

### Delay model

n: the number of bits in a carry skip adder m: the number of groups into which the bits are divided  $x_1, \ldots, x_m$ : the sizes of the groups beginning with the most significant bit

T: the time required for a carry signal to skip over a group of bits

To be precise we should write T = T(x) to indicate that T depends on the size x of the group over which the carry is skipped However, T changes very slowly with x over the range of group sizes So we assume that T is constant

For a given n, the following three step procedure gives An optimal way of dividing an n bit adder into groups of bits



- total n = 32 bits
- m = 9 groups
- i-th group has x<sub>i</sub> bits (size)
- constant skip delay  $T = T(x_i)$



t denote the time required for a carry signal to ripple across a bit
 T denote the time required for the signal to skip over a group of bits
 m denotes the optimal number of groups for an n-bit carry chain

## X<sub>i</sub> and Y<sub>i</sub> and constraints (1)

 $x_3$  delay<sub>ripple</sub> ripple delay of a group

- *n* bits
- **m** groups

$$y_3 = min\{1+3T, 1+(m+1-3)T\}$$
 
$$\min \{ delay1_{skip}, delay2_{skip} \}$$
 
$$skip delay over a group$$

$$0 \le x_i \le y_i, i=1,...,m$$

$$delay_{ripple} \le delay1_{skip}$$

$$delay_{ripple} \le delay2_{skip}$$

# X<sub>i</sub> and Y<sub>i</sub> and constraints (2)



$$0 \le x_i \le y_i, \quad i=1,\ldots,m$$

#### Determining *m*

**Method 1** – using a histogram

Let *m* be the <u>smallest</u> positive integer such that

$$n \leq \sum_{i=1}^{m} y_i$$

$$y_i = min\{1+iT, 1+(m+1-i)T\}, i = 1,...,m$$

**Method 2** – using a closed formula

Let *m* be the <u>smallest</u> positive integer such that

$$n \le m + \frac{1}{2}mT + \frac{1}{4}m^2T + (1 - (-1)^m)\frac{1}{8}T$$

#### Determining $x_i$ (1)

$$y_i = min\{1+iT, 1+(m+1-i)T\}, i = 1,...,m$$

construct a histogram whose i-th column has height  $y_i$ 

so these  $y_i$ 's are <u>at least n unit squares</u> in the histogram, starting with the first row, shade in n of the squares, <u>row by row</u>

let  $x_i$  denote the number of shaded squares in column i of the histogram,

$$i = 1, ..., m$$



# Determining $x_i$ (2)





The  $x_i$ 's can be computed iteratively as follows:

Initially take  $x_1 = x_m = 0$ 

At each iteration, increase as many of the  $x_i$ 's as possible by one unit, without violating the constraints

$$0 \le x_i \le y_i, i = 1,...,m$$

Thus, at some iteration, we have the algorithm terminates

$$\sum_{i=1}^{m} x_i = n \quad \text{and} \quad$$

# Determining *m* (method 1 : using a histogram)

Let *m* be the <u>smallest</u> positive integer such that

$$n \leq \sum_{i=1}^m y_i$$

$$y_i = min\{1+iT,1+(m+1-i)T\}, i = 1,...,m$$

$$y_1 = 1 + 1 \cdot T$$
  $y_m = 1 + 1 \cdot T$   
 $y_2 = 1 + 2 \cdot T$   $y_{m-1} = 1 + 2 \cdot T$   
 $y_3 = 1 + 3 \cdot T$   $y_{m-2} = 1 + 3 \cdot T$   
 $\vdots$   $\vdots$ 

if 
$$T = 3$$
 $m = 7 \pmod{d}$ 
 $m = 8 \pmod{d}$ 
 $y_1 \mid 4 \mid y_7 \mid y_6 \mid y_2 \mid 7 \mid y_6 \mid y_7 \mid y_6 \mid y_5 \mid 10 \mid y_6 \mid y_5 \mid 16 \mid 19$ 

$$\sum_{i=1}^{7} y_i = 55$$

$$\sum_{i=1}^{8} y_i = 68$$

#### $y_i$ 's for a given m

Let *m* be the <u>smallest</u> positive integer such that

$$n \leq \sum_{i=1}^{m} y_i$$

$$y_i = min\{1+iT, 1+(m+1-i)T\}, i = 1,...,m$$

$$y_i = min\{1+iT,1+(3-i)T\}, i = 1,...,2, \leftarrow m = 2$$
  
 $y_i = min\{1+iT,1+(4-i)T\}, i = 1,...,3, \leftarrow m = 3$   
 $y_i = min\{1+iT,1+(5-i)T\}, i = 1,...,4, \leftarrow m = 4$   
 $y_i = min\{1+iT,1+(6-i)T\}, i = 1,...,5, \leftarrow m = 5$   
 $y_i = min\{1+iT,1+(7-i)T\}, i = 1,...,6, \leftarrow m = 6$   
 $y_i = min\{1+iT,1+(8-i)T\}, i = 1,...,6, \leftarrow m = 7$   
 $y_i = min\{1+iT,1+(9-i)T\}, i = 1,...,7, \leftarrow m = 7$ 

# Histogram – $y_i$ 's for a given m



# Example 1 - (1)

For a 48 bit adder we have, from Figure

$$x_1 = x_7 = 4, x_2 = x_6 = 7, x_3 = 8, x_4 = x_5 = 9$$

The maximum delay is experienced by a signal generated in the second bit position and terminating in the 47<sup>th</sup> bit position

the delay is mT = 21

- total n = 48 bits
- m = 7 groups
- *i*-th group has x<sub>i</sub> bits (size)
- constant skip delay  $T = T(x_i) = 3$

#### Example 1 - (2)





# Example 2 - (1)

consider a 54 bit adder

From 2(i), we see that again m=7.

If we shade 54 squares in Figure, we see that

$$x_1 = x_7 = 4, x_2 = x_6 = 7, x_3 = x_5 = 10, x_4 = 12$$

Yields an optimal division of the adder.

Again the maximum delay is mT = 21

- total n = 54 bits
- m = 7 groups
- *i*-th group has x<sub>i</sub> bits (size)
- constant skip delay  $T = T(x_i) = 3$

#### Example 2 - (2)





10

47

4

## Example 3 - (1)

Consider a 64 bit adder From 2(i) we compute m=8.

the corresponding histogram is shown in Figure

The optimal gorup sizes are:

$$x_1 = x_8 = 4$$
,  $x_2 = x_7 = 7$ ,  $x_3 = x_6 = 10$ ,  $x_4 = x_5 = 11$ 

The delay of the longest signal is mT = 24

- total n = 64 bits
- m = 8 groups
- *i*-th group has x<sub>i</sub> bits (size)
- constant skip delay  $T = T(x_i) = 3$

## Example 3 - (2)





# Determining m (method 2 – using a closed formula)

Let *m* be the <u>smallest</u> positive integer such that

$$n \le \sum_{i=1}^{m} y_i$$
  $n \le m + \frac{1}{2}mT + \frac{1}{4}m^2T + (1 - (-1)^m)\frac{1}{8}T$ 

$$y_i = min\{1+iT, 1+(m+1-i)T\}, i = 1,..., m$$

$$y_1 = 1 + 1 \cdot T$$
  $y_m = 1 + 1 \cdot T$   
 $y_2 = 1 + 2 \cdot T$   $y_{m-1} = 1 + 2 \cdot T$   
 $y_3 = 1 + 3 \cdot T$   $y_{m-2} = 1 + 3 \cdot T$   
 $\vdots$ 

$$n = 48$$
  $n = 54$   $n = 64$   $m = 7$   $m = 8$   $T = 3$   $T = 3$ 

# Odd m and even m cases (1)

(I) Let m be the smallest positive integer such that

$$n \le m + \frac{1}{2}mT + \frac{1}{4}m^2T + (1-(-1)^m)\frac{1}{8}T$$

$$n \leq m + \frac{1}{2}mT + \frac{1}{4}m^2T \qquad (even m)$$

$$n \le m + \frac{1}{2}mT + \frac{1}{4}m^2T + \frac{1}{4}T \pmod{m}$$

- total n = 48 bits
- m = 7 groups
- i-th group has x<sub>i</sub> bits (size)
- constant skip delay  $T = T(x_i) = 3$

## Odd m and even m cases (2)

even 
$$m: n \le m + \frac{1}{2}mT + \frac{1}{4}m^2T$$
  
odd  $m: n \le m + \frac{1}{2}mT + \frac{1}{4}m^2T + \frac{1}{4}T = m + \frac{1}{4}(m+1)^2T$ 

$$m = 5$$
:  $5 + \frac{5}{2}T + \frac{25}{4}T + \frac{1}{4}T = 5 + \frac{35}{4}T + \frac{1}{4}T = 5 + 9.3 = 32$ 

$$m = 6$$
:  $6 + \frac{6}{2}T + \frac{36}{4}T = 6 + \frac{48}{4}T = 6 + 12 \cdot 3 = 41$ 

$$m = 7$$
:  $7 + \frac{7}{2}T + \frac{49}{4}T + \frac{1}{4}T = 7 + \frac{63}{4}T + \frac{1}{4}T = 7 + 16 \cdot 3 = 55$ 

$$m = 8 : 8 + \frac{8}{2}T + \frac{64}{4}T = 8 + \frac{80}{4}T = 8 + 20.3 = 68$$

$$32 < n \le 41 \quad \Rightarrow \quad m = 6$$

$$41 < n \le 55 \quad \Rightarrow \quad m = 7$$

$$55 < n \le 68 \quad \Rightarrow \quad m = 8$$

## Example 1 - (3)

$$n \le m + \frac{1}{2}mT + \frac{1}{4}m^2T + (1-(-1)^m)\frac{1}{8}T$$

$$48 \leq m + \frac{3}{2}m + \frac{3}{4}m^{2} + (1 - (-1)^{m})\frac{3}{8}$$

• total 
$$n = 48$$
 bits

- m = 7 groups
- i-th group has x<sub>i</sub> bits (size)
- constant skip delay  $T = T(x_i) = 3$

| 1  | 4   |
|----|-----|
| 2  | 8   |
| 3  | 15  |
| 4  | 22  |
| 5  | 32  |
| 6  | 42  |
| 7  | 55  |
| 8  | 68  |
| 9  | 84  |
| 10 | 100 |
|    |     |





#### Example 1 - (4)

n =48 m =7 T =3



$$y_i = min\{1+iT, 1+(m+1-i)T\}, i = 1,...,m$$

$$y_1 = min\{1+1\cdot T, 1+7\cdot T\} = 1+1\cdot T = 4$$
  $0 \le x_1 \le 1+1\cdot T = 4$   $0 \le x_2 \le 1+2\cdot T = 7$   $0 \le x_2 \le 1+2\cdot T = 7$   $0 \le x_3 \le 1+3\cdot T = 10$   $0 \le x_4 \le 1+4\cdot T = 13$   $0 \le x_4 \le 1+4\cdot T = 13$   $0 \le x_4 \le 1+4\cdot T = 13$   $0 \le x_5 \le 1+3\cdot T = 10$   $0 \le x_6 \le 1+2\cdot T = 7$   $0 \le x_7 \le 1+3\cdot T = 10$   $0 \le x_7 \le 1+3\cdot T = 10$ 

$$0 \le x_i \le y_i, i = 1, \dots, m$$

#### Example 2 - (3)

$$n \le m + \frac{1}{2}mT + \frac{1}{4}m^2T + (1-(-1)^m)\frac{1}{8}T$$

$$54 \le m + \frac{3}{2}m + \frac{3}{4}m^2 + (1 - (-1)^m)\frac{3}{8}$$

• total 
$$n = 54$$
 bits

- m = 7 groups
- i-th group has x<sub>i</sub> bits (size)
- constant skip delay  $T = T(x_i) = 3$

| 1  | 4   |
|----|-----|
| 2  | 8   |
| 3  | 15  |
| 4  | 22  |
| 5  | 32  |
| 6  | 42  |
| 7  | 55  |
| 8  | 68  |
| 9  | 84  |
| 10 | 100 |
|    |     |





#### Example 2 - (4)





$$y_i = min\{1+iT, 1+(m+1-i)T\}, i = 1,...,m$$

$$y_1 = min\{1+1\cdot T, 1+7\cdot T\} = 1+1\cdot T = 4$$
  $0 \le x_1 \le 1+1\cdot T = 4$   $0 \le x_2 \le 1+2\cdot T = 7$   $0 \le x_2 \le 1+2\cdot T = 7$   $0 \le x_3 \le 1+3\cdot T = 10$   $0 \le x_4 \le 1+4\cdot T = 13$   $0 \le x_4 \le 1+4\cdot T = 13$   $0 \le x_4 \le 1+4\cdot T = 13$   $0 \le x_5 \le 1+3\cdot T = 10$   $0 \le x_6 \le 1+2\cdot T = 7$   $0 \le x_7 \le 1+3\cdot T = 10$   $0 \le x_7 \le 1+3\cdot T = 10$ 

$$0 \le x_i \le y_i, i = 1, \dots, m$$

## Example 3 - (3)

$$n \le m + \frac{1}{2}mT + \frac{1}{4}m^2T + (1-(-1)^m)\frac{1}{8}T$$

$$64 \leq m + \frac{3}{2}m + \frac{3}{4}m^2 + (1 - (-1)^m)\frac{3}{8}$$

- total n = 64 bits
- m = 7 groups
- i-th group has x<sub>i</sub> bits (size)
- constant skip delay  $T = T(x_i) = 3$

| 1  | 4   |
|----|-----|
| 2  | 8   |
| 3  | 15  |
| 4  | 22  |
| 5  | 32  |
| 6  | 42  |
| 7  | 55  |
| 8  | 68  |
| 9  | 84  |
| 10 | 100 |
|    |     |





#### Example 3 - (4)





$$y_i = min\{1+iT, 1+(m+1-i)T\}, i = 1,...,m$$

Oklobdzija: High-Speed VLSI arithmetic units: adders and multipliers

 $0 \le x_i \le y_i, i = 1, ..., m$ 

# Closed formula for $y_1 + y_2 + ... + y_m$

$$\sum_{i=1}^{m} y_i = m + \frac{1}{2}mT + \frac{1}{4}m^2T + (1 - (-1)^m)\frac{1}{8}T$$

$$\sum_{i=1}^{m} y_i = m + \frac{1}{2}mT + \frac{1}{4}m^2T \qquad (even m)$$

$$\sum_{i=1}^{m} y_i = m + \frac{1}{2}mT + \frac{1}{4}m^2T + \frac{1}{4}T \quad (odd \ m)$$

$$y_i = min\{1+iT, 1+(m+1-i)T\}, i = 1,..., m$$

$$y_1 = 1 + 1 \cdot T$$
  $y_m = 1 + 1 \cdot T$   
 $y_2 = 1 + 2 \cdot T$   $y_{m-1} = 1 + 2 \cdot T$   
 $y_3 = 1 + 3 \cdot T$   $y_{m-2} = 1 + 3 \cdot T$   
 $\vdots$ 

## Closed formula for even m case (1)

$$m = 2k$$

$$y_i = min\{1+iT, 1+(m+1-i)T\}, i = 1,...,m$$
  $m = \frac{1}{2} \cdot k(k+1)$ 

$$y_{1} = \min\{1+1 \cdot T, \qquad 1+(m-0) \cdot T\} \qquad 0 \leq x_{1} \leq 1+1 \cdot T$$

$$y_{2} = \min\{1+2 \cdot T, \qquad 1+(m-1) \cdot T\} \qquad 0 \leq x_{2} \leq 1+2 \cdot T$$

$$y_{3} = \min\{1+3 \cdot T, \qquad 1+(m-2) \cdot T\} \qquad 0 \leq x_{3} \leq 1+3 \cdot T$$

$$y_{k} = \min\{1+k \cdot T, \qquad 1+(k+1) \cdot T\} \qquad 0 \leq x_{k} \leq 1+k \cdot T$$

$$y_{k+1} = \min\{1+(k+1) \cdot T, \qquad 1+k \cdot T\} \qquad 0 \leq x_{k+1} \leq 1+k \cdot T$$

$$y_{m-2} = \min\{1+(m-2) \cdot T, \qquad 1+3 \cdot T\} \qquad 0 \leq x_{m-2} \leq 1+3 \cdot T$$

$$y_{m-1} = \min\{1+(m-1) \cdot T, \qquad 1+2 \cdot T\} \qquad 0 \leq x_{m-1} \leq 1+2 \cdot T$$

$$y_{m-0} = \min\{1+(m-0) \cdot T, \qquad 1+1 \cdot T\} \qquad 0 \leq x_{m-0} \leq 1+1 \cdot T$$

$$0 \le x_i \le y_i, i = 1, \dots, m$$

$$\frac{1}{2} \cdot k(k+1)$$

# Closed formula for even m case (2)

$$1 + 2 + \dots + k = \frac{1}{2}k(k+1)$$

$$\frac{n(a+l)}{2}$$

$$m + 2 \cdot \frac{1}{2}k(k+1)T$$

$$= m + k(k+1)T$$

$$= m + \frac{m}{2}(\frac{m}{2}+1)T$$

$$\frac{n(a+l)}{2}$$

$$= k$$

even 
$$m : m + \frac{1}{2}mT + \frac{1}{4}m^2T$$

odd 
$$m : m + \frac{1}{2}mT + \frac{1}{4}m^2T + \frac{1}{4}T$$

Oklobdzija: High-Speed VLSI arithmetic units: adders and multipliers

 $= m + \frac{m}{2}T + \frac{m^2}{4}T$ 

## Closed formula for odd m case (1)

$$0 \le x_i \le y_i, i = 1, ..., m$$

# Closed formula for odd m case (2)

$$1 + 2 + \dots + k = \frac{1}{2}k(k+1) \qquad \frac{n(a+l)}{2}$$

$$m + 2 \cdot \frac{1}{2}k(k+1)T + (k+1)T \qquad m = 2k+1$$

$$= m + k(k+1)T + (k+1)T$$

$$= m + (k+1)^{2}T \qquad \frac{m+1}{2} = k+1$$

$$= m + \left(\frac{m+1}{2}\right)^{2}T$$

even 
$$m : m + \frac{1}{2}mT + \frac{1}{4}m^2T$$

odd 
$$m : m + \frac{1}{2}mT + \frac{1}{4}m^2T + \frac{1}{4}T$$

Oklobdzija: High-Speed VLSI arithmetic units: adders and multipliers

 $= m + \frac{m^2}{4}T + \frac{m}{2}T + \frac{1}{4}T$ 

#### Round and truncate values

$$round(x) = truncate(x+0.5)$$

$$round(3.2) = truncate(3.2+0.5) = 3$$
  
 $round(3.7) = truncate(3.7+0.5) = 4$ 

$$round\left(\frac{m}{2} + \frac{m^2}{4}\right) = truncate\left(\frac{m}{2} + \frac{m^2}{4} + \frac{1}{4}\right)$$

$$\left(\frac{m}{2} + \left(\frac{m}{2}\right)^2 + \frac{1}{4}\right) = \frac{1}{4}(2m + m^2 + 1) = \frac{1}{4}(m + 1)^2$$

|    | 10    | ( (0) 10 |                 | 4.14      |
|----|-------|----------|-----------------|-----------|
| m  | m/2   | (m/2)^2  | sum             | sum + 1/4 |
| 1  | 0+0.5 | 0+0.25   | 0+0.75          | 1         |
| 2  | 1     | 1        | 2               | 2.25      |
| 3  | 1+0.5 | 2+0.25   | 3+0.75          | 4         |
| 4  | 2     | 4        | 6               | 6.25      |
| 5  | 2+0.5 | 6+0.25   | 8+0.75          | 9         |
| 6  | 3     | 9        | 12              | 12.25     |
| 7  | 3+0.5 | 12+0.25  | <b>15</b> +0.75 | 16        |
| 8  | 4     | 16       | 20              | 20.25     |
| 9  | 4+0.5 | 20+0.25  | 24+0.75         | 25        |
| 10 | 5     | 25       | 30              | 30.25     |
| 11 | 5+0.5 | 30+0.25  | 35+0.75         | 36        |
| 12 | 6     | 36       | 42              | 42.25     |

even 
$$m : m + \frac{1}{2}mT + \frac{1}{4}m^2T$$

odd 
$$m : m + \frac{1}{2}mT + \frac{1}{4}m^2T + \frac{1}{4}T$$

#### Values of m<sup>2</sup>/4

#### Isd = least significant digit



## Procedure (4)

 $X_i$  delay<sub>ripple</sub> ripple delay of a group

$$y_i = min\{1+iT,1+(m+1-i)T\}$$

$$\min \{ delay1_{skip}, delay2_{skip} \}$$

$$skip delay over a group$$

$$0 \le x_i \le y_i, \quad i=1,...,m$$

$$\text{delay}_{\text{ripple}} \le \text{delay1}_{\text{skip}}$$

$$\text{delay}_{\text{ripple}} \le \text{delay2}_{\text{skip}}$$

Oklobdzija: High-Speed VLSI arithmetic units: adders and multipliers

- *n* bits
- **m** groups

find the smallest positive integer m such that

$$n \le m + \frac{1}{2}mT + \frac{1}{4}m^2T + (1 - (-1)^m)\frac{1}{8}T$$

$$n \leq \sum_{i=1}^{m} y_i$$

so these are <u>at least *n* unit squares</u> in the histogram, starting with the first row, row by row

Let  $x_i$  denote the number of shaded squares in column i of the histogram, i = 1, ..., m

#### Procedure (1)

(I) Let m be the smallest positive integer such that

$$n \le m + \frac{1}{2}mT + \frac{1}{4}m^2T + (1-(-1)^m)\frac{1}{8}T$$

- total n = 48 bits
- m = 7 groups
- i-th group has x<sub>i</sub> bits (size)
- constant skip delay  $T = T(x_i) = 3$

(II) Let

$$y_i = min\{1+iT, 1+(m+1-i)T\}, i = 1,...,m$$

and construct a histogram whose *i-th* column has height  $y_i$  for example, for T=3, and n=48, we have m=7

(III) It is easily verified that the area of the histogram in (II) is

$$m + \frac{1}{2}mT + \frac{1}{4}m^2T + (1-(-1)^m)\frac{1}{8}T \ge n$$

so these are <u>at least n unit squares</u> in the histogram starting with the first row, shade in n of the squares, <u>row by row</u> Let  $x_i$  denote the number of shaded squares in column i of the histogram,

$$i = 1, ..., m$$