# **Evaluation of Parallel Multipliers in VHDL using Programmable Logic Devices**

Carlos A. F. da Silva, Tiago M. Araki, Milene Händel, Renato P. Ribas, André Reis

{csilva, tmaraki, milene, andreis, rpribas}@inf.ufrgs.br

Instituto de Informática - UFRGS

Av. Bento Gonçalves, 9500 - Cpx 15064

CEP 91501-970 Porto Alegre - RS, Brasil

## Abstract

The objective of this study is to evaluate different kinds of unsigned parallel multipliers on reprogrammable devices according to area and speed criteria. The results of compiling the multipliers show that, in most cases, the built-in multiplication operator of reprogrammable circuits yields a well-designed, high-performance multiplier.

## 1 – Introduction

Multiplication is an important arithmetic operation because it is widely used in signal-processing circuits, vector computation and cryptography, so multiplier circuits should perform as well as possible. Optimizing size, speed and power consumption is fundamental in any system that uses this operation. In multiplication, adding the partial products is the most time-consuming step, and this study discusses techniques for reducing the partial products to the final product efficiently. The techniques evaluated are the array multiplier, the Wallace-tree multiplier, the modular multiplier and the library multiplier. The library multiplier is instantiated automatically when the ' \* ' operator is used, and each manufacturer provides its own multiplication circuit. The operand sizes evaluated are 4x4, 8x8, 16x16 and 32x32 bits. The available PLD technologies are the FPGA (Field-Programmable Gate Array) and the CPLD (Complex Programmable Logic Device). FPGAs were used because they provide more logic elements than CPLDs.
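The partial-product step described above can be sketched behaviorally in Python. The study's designs are written in VHDL; Python is used here only as an executable model. The model assumes that row j of an unsigned n-by-n multiplication is the multiplicand ANDed with bit j of the multiplier, shifted left by j positions. The helper names `partial_products` and `multiply_by_partial_products` are hypothetical.

```python
def partial_products(a: int, b: int, n: int) -> list[int]:
    """Return the n shifted partial products of an unsigned n-by-n multiplication.

    Row j is the multiplicand ANDed with bit j of the multiplier,
    shifted left by j positions (hypothetical helper, behavioral model).
    """
    mask = (1 << n) - 1
    a, b = a & mask, b & mask
    rows = []
    for j in range(n):
        bit = (b >> j) & 1
        # Each bit of the row is a_i AND b_j, so the whole row is either a or 0.
        rows.append((a if bit else 0) << j)
    return rows


def multiply_by_partial_products(a: int, b: int, n: int) -> int:
    # Summing the n rows is the costly step; the multiplier structures
    # evaluated in this study differ in how they arrange the adders for it.
    return sum(partial_products(a, b, n))


if __name__ == "__main__":
    for row in partial_products(13, 11, 4):
        print(f"{row:08b}")
    print(multiply_by_partial_products(13, 11, 4), 13 * 11)
```

For 4-bit operands the model produces four rows. Every structure in Section 2 computes this same sum of rows and differs only in how the adders that perform it are organized.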

## 2 – Multipliers

### 2.1 – Array multiplier

Of all the structures analyzed, this circuit is the simplest to design. It is composed of an array of full adders, and its worst-case delay grows linearly with the operand size. It is the slowest and largest multiplier analyzed. This multiplier is shown in *figure1*.
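A bit-level sketch of such an array may help show why its size grows with the operand width. It is written as a Python model rather than VHDL. It assumes a ripple-carry arrangement in which each of the n-1 adder rows adds one partial product to the running sum and the carry ripples along the row. Carry-save arrays followed by a final adder are a common alternative. The function names are hypothetical.

```python
def full_adder(x: int, y: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ cin
    cout = (x & y) | (x & cin) | (y & cin)
    return s, cout


def array_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """Bit-level model of an unsigned n-by-n ripple-carry array multiplier.

    Each of the n-1 rows holds n full adders that add one new partial
    product to the running sum. Returns (product, number_of_full_adders).
    """
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]
    product_bits = []
    acc = [a_bits[i] & b_bits[0] for i in range(n)]  # first partial product
    top = 0  # carry out of the previous row (bit n of the running sum)
    adders = 0
    for j in range(1, n):
        product_bits.append(acc[0])  # lowest bit is final; retire it
        upper = acc[1:] + [top]      # running sum aligned with row j
        row = [a_bits[i] & b_bits[j] for i in range(n)]
        carry, acc = 0, []
        for i in range(n):           # the carry ripples along the row
            s, carry = full_adder(upper[i], row[i], carry)
            acc.append(s)
            adders += 1
        top = carry
    product_bits += acc + [top]
    product = sum(bit << k for k, bit in enumerate(product_bits))
    return product, adders


if __name__ == "__main__":
    print(array_multiply(13, 11, 4))
```

For 4x4 bits the model uses 4 x 3 = 12 full adders. In general it uses n(n-1), so the adder count grows quadratically with the operand width.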



### 2.2 – Wallace-tree multiplier

A Wallace tree is a bit-slice adder that adds all the bits in the same bit position. Its building block is a 3-bit slice adder, which is in fact a 3-input, 2-output carry-save full adder. This multiplier is the best in speed and area for 8x8 bits or less. This multiplier is shown in *figure2*.
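The column-wise reduction can be sketched as a Python model, not the VHDL used in the study. The sketch assumes that only the 3-input, 2-output full-adder cell described above is used, and that leftover bits pass to the next stage unchanged. Classic Wallace trees also use half adders, so their stage counts can be lower than this sketch reports. The function names are hypothetical.

```python
from collections import defaultdict


def full_adder(x: int, y: int, cin: int) -> tuple[int, int]:
    """3-input, 2-output carry-save cell: returns (sum, carry)."""
    return x ^ y ^ cin, (x & y) | (x & cin) | (y & cin)


def wallace_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """Behavioral model of an unsigned n-by-n Wallace-tree multiplier.

    Partial-product bits are grouped by column (bit position). In every
    stage, each column feeds its bits three at a time into full adders:
    the sum stays in the column and the carry moves to the next column.
    Stages repeat until every column holds at most two bits; the two
    remaining rows then go to a final carry-propagate adder, modeled
    here with integer addition. Returns (product, reduction_stages).
    """
    columns = defaultdict(list)
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    stages = 0
    while max(len(bits) for bits in columns.values()) > 2:
        nxt = defaultdict(list)
        for pos in sorted(columns):
            bits = columns[pos]
            k = 0
            while len(bits) - k >= 3:
                s, c = full_adder(bits[k], bits[k + 1], bits[k + 2])
                nxt[pos].append(s)
                nxt[pos + 1].append(c)
                k += 3
            nxt[pos].extend(bits[k:])  # 0, 1 or 2 leftover bits pass through
        columns = nxt
        stages += 1

    # Split the (at most) two bits per column into two rows and add them.
    row0 = sum(bits[0] << pos for pos, bits in columns.items() if len(bits) > 0)
    row1 = sum(bits[1] << pos for pos, bits in columns.items() if len(bits) > 1)
    return row0 + row1, stages


if __name__ == "__main__":
    print(wallace_multiply(13, 11, 4))
```

Each stage reduces a column of height h to roughly 2h/3 bits. The number of stages therefore grows roughly logarithmically with the operand width. The array multiplier of Section 2.1, by contrast, needs one adder row per partial product.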





### 2.3 – Modular multiplier

To multiply large operands, a modular structure is needed that generates and sums sub-products and can be organized recursively. Multiply modules are array multipliers that perform fast multiplication on short or moderate-length operands. Several n-by-n multiply modules are shown in *figure3*.
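One way to organize such a recursive structure is sketched here in Python, under stated assumptions. Each n-bit operand is split into high and low halves of h = n/2 bits, so that $a \cdot b = a_H b_H 2^{2h} + (a_H b_L + a_L b_H) 2^h + a_L b_L$. The same split is then applied to each sub-product until the operands fit a base module. The 4-bit base module and the helper names are hypothetical. The base module is a shift-and-add stand-in for an array module.

```python
def base_module(a: int, b: int, m: int) -> int:
    """Hypothetical m-by-m multiply module (shift-and-add stand-in for an array)."""
    product = 0
    for j in range(m):
        if (b >> j) & 1:
            product += a << j
    return product


def modular_multiply(a: int, b: int, n: int, m: int = 4) -> tuple[int, int]:
    """Recursive modular multiplier for unsigned n-bit operands.

    Splits each operand into high and low halves of h = n // 2 bits and
    combines four half-size sub-products:
        a*b = aH*bH * 2^(2h) + (aH*bL + aL*bH) * 2^h + aL*bL
    Recursion stops at m-bit operands, which go to one base module.
    Assumes n is m times a power of two. Returns (product, modules_used).
    """
    mask_n = (1 << n) - 1
    a, b = a & mask_n, b & mask_n
    if n <= m:
        return base_module(a, b, m), 1
    h = n // 2
    mask = (1 << h) - 1
    a_hi, a_lo = a >> h, a & mask
    b_hi, b_lo = b >> h, b & mask
    hh, c1 = modular_multiply(a_hi, b_hi, h, m)
    hl, c2 = modular_multiply(a_hi, b_lo, h, m)
    lh, c3 = modular_multiply(a_lo, b_hi, h, m)
    ll, c4 = modular_multiply(a_lo, b_lo, h, m)
    product = (hh << (2 * h)) + ((hl + lh) << h) + ll
    return product, c1 + c2 + c3 + c4


if __name__ == "__main__":
    print(modular_multiply(12345, 54321, 16))
```

With 4-bit modules, a 32x32 multiplication uses (32/4)^2 = 64 modules. The model does not count the adders that combine the sub-products.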



## 3 – PLD Platform

The analysis was carried out on FPGAs from two manufacturers, Altera and Xilinx. The Altera device chosen was a Flex 10K (FPGA, device EPF10k70, with 70 K gates and 3744 logic elements). The Xilinx device chosen was the v200ebg352-6 (FPGA, Virtex-E family, 4704 logic elements).

## 4 – Results

The results of compiling the multipliers for the Altera FPGA with MAX+PLUS II are listed in *table1*, and those for the Xilinx FPGA in *table2*.
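To make the comparison in the tables easy to reproduce, the sketch below transcribes the values of table1 and table2, writing decimal commas as points. For each device and operand size, it reports which multipliers use the fewest logic elements and which have the shortest delay, keeping ties. The device labels and function names are assumptions made for illustration.

```python
# Values transcribed from table1 (Altera Flex 10K) and table2 (Xilinx Virtex-E):
# (logic elements, delay in ns) per multiplier and operand size in bits.
RESULTS = {
    "Altera": {
        32: {"Wallace-Tree": (2348, 283.4), "Array": (3676, 298.9),
             "Modular": (3151, 282.8), "Library": (3107, 247.5)},
        16: {"Wallace-Tree": (593, 151.0), "Array": (858, 150.4),
             "Modular": (727, 151.4), "Library": (748, 128.5)},
        8: {"Wallace-Tree": (153, 71.5), "Array": (178, 77.5),
            "Modular": (160, 74.0), "Library": (169, 76.3)},
        4: {"Wallace-Tree": (28, 34.6), "Array": (37, 40.1),
            "Modular": (28, 34.6), "Library": (32, 37.8)},
    },
    "Xilinx": {
        32: {"Wallace-Tree": (2101, 111.0), "Array": (2015, 109.6),
             "Modular": (1499, 31.3), "Library": (1057, 25.5)},
        16: {"Wallace-Tree": (536, 57.2), "Array": (495, 57.0),
             "Modular": (355, 24.8), "Library": (265, 20.3)},
        8: {"Wallace-Tree": (128, 31.2), "Array": (119, 30.6),
            "Modular": (79, 18.7), "Library": (65, 16.2)},
        4: {"Wallace-Tree": (30, 18.3), "Array": (27, 16.9),
            "Modular": (15, 12.5), "Library": (15, 12.5)},
    },
}


def winners(entries: dict[str, tuple[int, float]], index: int) -> list[str]:
    """Names of the multipliers with the minimum value in column `index` (ties kept)."""
    best = min(values[index] for values in entries.values())
    return sorted(name for name, values in entries.items() if values[index] == best)


def summarize() -> dict[tuple[str, int], tuple[list[str], list[str]]]:
    """Map (device, size) to (fewest-logic-element multipliers, fastest multipliers)."""
    return {
        (device, size): (winners(entries, 0), winners(entries, 1))
        for device, sizes in RESULTS.items()
        for size, entries in sizes.items()
    }


if __name__ == "__main__":
    for (device, size), (smallest, fastest) in summarize().items():
        print(f"{device} {size}x{size}: fewest L.E = {smallest}, fastest = {fastest}")
```

On these values, the Wallace-tree multiplier uses the fewest logic elements on the Flex 10K at every size, tied with the modular multiplier at 4x4 bits. The library multiplier is the fastest there at 16x16 and 32x32 bits. On the Virtex-E, the library multiplier is best in both area and delay at every size, tied with the modular multiplier at 4x4 bits.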

## 5 – Conclusion

From the results in *table1* (Altera), we conclude that the library multiplier is the fastest multiplier for operands of 16x16 bits and above, although the Wallace-tree multiplier uses fewer logic elements at these sizes. For 4x4 and 8x8 bits, the Wallace-tree multiplier is the best parallel multiplier in both area and speed, tying with the modular multiplier at 4x4 bits. This occurs because the library multipliers are designed specifically for their device, and the manufacturer knows how to optimize multipliers of 16x16 bits and larger.

The results in *table2* (Xilinx) show that the library multiplier is the most efficient in both area and speed for all operand sizes analyzed, tying with the modular multiplier at 4x4 bits. This is because the synthesizer infers a multiplier from the ' \* ' operator in the VHDL code and optimizes its synthesis.


JCI3C 9.2004.

*table1*: Flex 10k20RC270-2 (Altera). Each cell gives logic elements (L.E) / delay Td (ns).

| Multiplier      | 32x32 bits   | 16x16 bits  | 8x8 bits   | 4x4 bits  |
|-----------------|--------------|-------------|------------|-----------|
| Wallace-Tree    | 2348 / 283.4 | 593 / 151   | 153 / 71.5 | 28 / 34.6 |
| Array           | 3676 / 298.9 | 858 / 150.4 | 178 / 77.5 | 37 / 40.1 |
| Modular         | 3151 / 282.8 | 727 / 151.4 | 160 / 74   | 28 / 34.6 |
| Library (`a * b`) | 3107 / 247.5 | 748 / 128.5 | 169 / 76.3 | 32 / 37.8 |

*table2*: v200ebg352-6 (Xilinx Virtex-E). Each cell gives logic elements (L.E) / delay Td (ns).

| Multiplier      | 32x32 bits   | 16x16 bits  | 8x8 bits   | 4x4 bits  |
|-----------------|--------------|-------------|------------|-----------|
| Wallace-Tree    | 2101 / 111.0 | 536 / 57.2  | 128 / 31.2 | 30 / 18.3 |
| Array           | 2015 / 109.6 | 495 / 57.0  | 119 / 30.6 | 27 / 16.9 |
| Modular         | 1499 / 31.3  | 355 / 24.8  | 79 / 18.7  | 15 / 12.5 |
| Library (`a * b`) | 1057 / 25.5  | 265 / 20.3  | 65 / 16.2  | 15 / 12.5 |