# **EVALUATION OF DIFFERENT XOR GATES**

Alberto W. Júnior, Felipe dos S. Marranghello, Renato P. Ribas, André I. Reis {awjunior,fsmarranghello,rpribas,andreis}@inf.ufrgs.br Universidade Federal do Rio Grande do Sul

# ABSTRACT

This paper presents a new topology for the XOR function and compares it with other implementations found in the literature. All topologies are simulated in HSPICE as a preliminary analysis, and the layout of some cells is generated using a commercial tool for the PTM 45nm [1] technology. The evaluation considers the characteristics that are desirable for a cell in a standard cell library: not only cell area, delay and power consumption, but also the output signal quality and the way the inputs connect to the cell. The new topology is suitable for use in the standard cell library design flow and presents some advantages over previous options.

# **1. INTRODUCTION**

The standard cell library flow is the most widely used methodology for IC design. Thus, the quality of the cells in the library directly affects the quality of the manufactured circuit.

Cell libraries usually have a range of basic cells, such as NORs, NANDs, INVs, AOIs and OAIs, which are normally designed using the Complementary Series Parallel (CSP) CMOS style. In this style, there is a pull-up and a pull-down network, made only of PMOS and NMOS transistors respectively, that are complementary to each other and use only series-parallel connections to perform a given logic function. This logic style is the most commonly used in standard cells, though it places several restrictions on the transistor arrangement.

Nevertheless, these libraries usually also include extra cells such as XORs, full adders (FA), half adders (HA) and multiplexers (MUX), which are usually built in other styles because the standard CMOS design leads to poor performance for these cells.

In this context, XORs play a major role in several circuits, including comparators, parity checkers, and full and half adders. Published implementations use 3, 4, 6, 8, 9 and 10 transistors [2], [3], [4], [5], [6].

This paper analyzes some of the proposed implementations with regard to their use in a standard cell library. A new topology for this logic function is also proposed. The cells are simulated, and a cell library is built and characterized with a commercial EDA tool.

The remainder of this paper is organized as follows. Section 2 gives background on standard cell libraries. Section 3 presents several XOR topologies, including the new proposal. Section 4 presents the experimental results, and Section 5 presents the final conclusions.

# 2. BACKGROUND

A Standard Cell Library is a set of cells that can be used during logic synthesis. Therefore, the electrical behavior of each cell must be known. This includes the propagation delay and dynamic power consumption for each input transition, for several input slopes and output loads, as well as the static power consumption, the area and the input capacitance.

Even though many commercial libraries are handcrafted, there are tools, both commercial and academic, that automate the cell creation task and provide libraries that may be as good as the handcrafted ones in less time.

Usually, all of a cell's inputs are connected only to one or more transistor gates. Some topologies have one or more inputs directly connected to the source or drain of a transistor. This can complicate the characterization task, since the pin's input capacitance then depends on the input value. That is a major issue because the input capacitance is represented within the library by only one number, so using a cell with a drain/source input leads to delay and power estimation errors, since the input capacitance will not be precise. Therefore, cells with source/drain inputs are not used in standard cell libraries.

Another problem is a PMOS conducting a logic zero and/or an NMOS conducting a logic one. A PMOS (NMOS) driving the logic value 0 (1) degrades the output value, adding (subtracting) the threshold voltage to (from) the signal. Therefore, the logical information may be lost if this signal goes through several transistors, and the signal must be restored at some point. Pass Transistor Logic (PTL) is a design style that uses this transistor configuration. This kind of topology may also lead to an increased leakage current: if an inverter is added after a PMOS (NMOS) conducting a 0 (1), neither of the inverter's transistors will be completely off due to the signal degradation, and therefore the static current will be greater than usual. One workaround for this problem is to use transmission gates instead of single transistors. The circuit area increases, but the electrical problems are solved.
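The threshold loss described above can be illustrated with a first-order model. The sketch below is only an approximation: it ignores the body effect, and the supply and threshold values are hypothetical placeholders, not parameters taken from the PTM 45nm model.

```python
def pass_level(v_in, vdd, vth_n, vth_p, device):
    """First-order voltage passed by one fully-on pass device (body effect ignored)."""
    if device == "nmos":
        # An NMOS with its gate at VDD cannot pull its output above VDD - Vth_n.
        return min(v_in, vdd - vth_n)
    if device == "pmos":
        # A PMOS with its gate at 0 V cannot pull its output below |Vth_p|.
        return max(v_in, abs(vth_p))
    if device == "tgate":
        # A transmission gate (NMOS and PMOS in parallel) passes both levels fully.
        return v_in
    raise ValueError(f"unknown device: {device}")


# Hypothetical values chosen only for illustration.
VDD, VTH_N, VTH_P = 1.0, 0.4, -0.4

weak_one = pass_level(VDD, VDD, VTH_N, VTH_P, "nmos")   # degraded logic 1
weak_zero = pass_level(0.0, VDD, VTH_N, VTH_P, "pmos")  # degraded logic 0
full_one = pass_level(VDD, VDD, VTH_N, VTH_P, "tgate")  # restored by the PMOS half
```

With these placeholder numbers, the NMOS delivers a logic one well below the supply and the PMOS a logic zero well above ground, which is why an inverter driven by either level is never fully off.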

Only the cells available in the library may be used in a design. Consequently, it is usual to have several gates that implement the same logic function but have different driving capabilities; this is known as the cell drive strength. The drive strength is roughly proportional to the transistor width, so doubling a cell's drive strength means doubling the width of all transistors inside it. Nevertheless, a cell only has drive capability if it draws current from the power net. Thus, a transmission gate has no drive capability.
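The width-scaling rule can be written down in a few lines. This is a minimal sketch: the function name and the dictionary layout are hypothetical, and the base widths are the PMOS and NMOS widths that the experiments in Section 4 use.

```python
# Base transistor widths in nanometers (the values used in Section 4).
BASE_WIDTHS_NM = {"pmos": 205, "nmos": 90}


def scale_widths(widths_nm, strength):
    """Return the widths of a cell whose drive strength is `strength` times the base."""
    # Drive strength is treated as proportional to width, so every
    # transistor in the cell is scaled by the same factor.
    return {name: width * strength for name, width in widths_nm.items()}


x2_widths = scale_widths(BASE_WIDTHS_NM, 2)
```

Here `x2_widths` holds the widths of a cell with twice the base drive strength. The proportionality is only approximate, as the text notes.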

# **3. TOPOLOGIES**

This section presents several ways to implement the XOR function found in the literature, together with a new proposal. They are classified according to the number of transistors needed to implement the cell; the smallest number of transistors used is three and the largest is ten. Each topology is briefly analyzed, considering the advantages and disadvantages of implementing it in a standard cell library. An XOR2\_nT is a cell that implements the XOR function using n transistors.

### 3.1. XOR2\_3T

The topology proposed in [2] uses only three transistors. This implementation has several electrical problems. When the input vector is A='1' and B='0' there is a short circuit. Thus, the topology itself does not guarantee the cell's correct functional behavior: the transistors must be sized appropriately to make the gate logically correct. A degraded logic zero is also conducted through a PMOS. This topology is shown in figure 1.



Figure 1: XOR2\_3T Topology

### 3.2. XOR2\_4T

The 4T XOR, as proposed in [3], is shown in figure 2. It uses the gate-diffusion-input (GDI) style. This topology has no short-circuit problem and works logically regardless of the transistor sizes, but it has a drain input and conducts a zero through the PMOS when A='0' and B='0' and when A='1' and B='0'.



Figure 2: XOR2\_4T Topology

### 3.3. XOR2\_6T

This topology is presented in [4]. It is a four-transistor XNOR cell with an inverter at the output. It has a drain/source input and degraded conduction through the NMOS or PMOS, which may lead to static power consumption, since neither of the inverter's transistors is completely off for some input combinations (e.g., A='1' and B='0'). The topology is presented in figure 3.



Figure 3: XOR2\_6T Topology

### 3.4. XOR2\_8T

This topology, as presented in [5], is shown in figure 4. It is a PTL structure that uses transmission gates to guarantee that a 1 (0) always passes through a PMOS (NMOS). Note that, besides the four transistors shown in the figure, one inverter for each input is also required.



Figure 4: XOR2\_8T Topology
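A switch-level sketch can help to see how two transmission gates and two input inverters produce the XOR function. The wiring assumed below (input A selecting between B and its complement) is one common arrangement and is an assumption here; the exact connections in figure 4 may differ.

```python
from itertools import product


def tgate(data, enable):
    """Ideal transmission gate: passes `data` when enabled, else floats (None)."""
    return data if enable else None


def xor_tg(a, b):
    """Switch-level XOR built from two transmission gates and two inverters."""
    a_n, b_n = 1 - a, 1 - b  # the two input inverters
    # Assumed wiring: A = 0 passes B, A = 1 passes the complement of B.
    branches = [tgate(b, a_n), tgate(b_n, a)]
    driven = [value for value in branches if value is not None]
    # Exactly one branch may drive the output, otherwise the cell would
    # either float or have a conflict between the two branches.
    assert len(driven) == 1
    return driven[0]


TRUTH_TABLE = {(a, b): xor_tg(a, b) for a, b in product((0, 1), repeat=2)}
```

In this model the output is always driven through a complete transmission gate, so neither logic level suffers a threshold loss. The output value itself comes from B or from the B inverter rather than from a supply connection inside the gates.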

### 3.5. XOR2\_9T

This is the new topology proposed in this paper. It looks like a multiplexer implementation, but it shares logic between the two planes. Therefore, the two inverters required to negate the inputs can be implemented by adding only one extra transistor. It has no major electrical problems.



Figure 5: New XOR2\_9T Topology

### 3.6. FREE STANDARD CELL LIBRARY

These two topologies are described in [6] and are shown in figure 6. They are designed to be used in a standard cell library and therefore have no electrical problems. One has nine transistors (xor2v0) and the other has ten (xor2v2). Within this library, the 9T cell is the smallest and the 10T cell is the fastest.



Figure 6: XOR2V2 and XOR2V0 Topologies

Of all the topologies analyzed here, only three (xor2\_9t, xor2v0 and xor2v2) have all the desirable characteristics for cells used in a standard cell library.

# 4. RESULTS

This section presents how the topologies were validated and compared to each other. All tasks described here use the PTM 45nm[1] technology. All the transistors have the same gate length of 50 nm (which is the minimum value allowed by the technology). All the PMOS transistors have a width of 205 nm and all NMOS transistors have a width of 90 nm.

All topologies were logically verified through electrical simulation in HSPICE. All except the 3T performed the XOR function as expected. To make the logical behavior of the 3T cell correct, the PMOS had to be made 10 times larger than the NMOS.

In addition, the cells discussed in Section 3 were characterized using HSPICE. Seven input slopes from 3 picoseconds to 150 picoseconds and seven output loads from 0.4 femtofarads to 10 femtofarads are used, totaling 49 simulations for each cell. For each pair of input slope and output capacitance, all eight possible delays are measured. Instead of connecting an ideal source to each cell input, the source is connected through an inverter chain, as shown in figure 7.



Figure 7: HSPICE Characterization Circuit
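The size of the characterization sweep can be sketched as follows. Only the end points of the slope and load ranges are given in the text, so the geometric spacing of the intermediate points is an assumption, as is the way the eight delays are enumerated (two inputs, two static values of the other input, two edge directions).

```python
from itertools import product


def geometric_points(first, last, count):
    """Return `count` geometrically spaced points from `first` to `last` (assumed spacing)."""
    ratio = (last / first) ** (1 / (count - 1))
    return [first * ratio ** i for i in range(count)]


SLOPES_PS = geometric_points(3.0, 150.0, 7)  # input slopes, picoseconds
LOADS_FF = geometric_points(0.4, 10.0, 7)    # output loads, femtofarads

# One simulation per (slope, load) pair.
GRID = list(product(SLOPES_PS, LOADS_FF))

# One delay per switching input, static value of the other input, and edge.
ARCS = [
    (pin, other_value, edge)
    for pin in ("A", "B")
    for other_value in (0, 1)
    for edge in ("rise", "fall")
]

TOTAL_DELAY_MEASUREMENTS = len(GRID) * len(ARCS)
```

Under these assumptions each cell needs 49 simulations, and each simulation yields eight delay values.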

The measured delay is the XOR delay plus the delay of two inverters. Also, the power consumption (defined here as the current drawn from the power net) is the consumption of all cells, and it is measured at an operating frequency of 200 MHz. This methodology helps to obtain more reliable results since, according to the definition used in this paper, a topology that is not connected to the power net has no power consumption of its own but may increase the consumption of the previous cell. Table 1 shows the delay results.

**Table 1: HSPICE Delay Simulation**

| Delay(ps) |        |        |        |
|-----------|--------|--------|--------|
| Topology  | MIN    | MAX    | MEAN   |
| xor2_3T   | 13.952 | 243.99 | 61.820 |
| xor2_4T   | 18.933 | 192.57 | 53.498 |
| xor2_6T   | 18.933 | 287.80 | 77.068 |
| xor2_8T   | 35.449 | 177.81 | 66.029 |
| xor2_9T   | 40.558 | 228.98 | 76.594 |
| xor2v0    | 39.618 | 232.06 | 76.882 |
| xor2v2    | 31.543 | 182.29 | 70.505 |
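As a quick check on Table 1, the topologies can be ordered by their worst-case (MAX) delay, which is the value that limits the operating frequency. The sketch below copies the table values; the delays are assumed to be in picoseconds, and the variable names are illustrative.

```python
# (MIN, MAX, MEAN) delays copied from Table 1, assumed to be in picoseconds.
DELAY_PS = {
    "xor2_3T": (13.952, 243.99, 61.820),
    "xor2_4T": (18.933, 192.57, 53.498),
    "xor2_6T": (18.933, 287.80, 77.068),
    "xor2_8T": (35.449, 177.81, 66.029),
    "xor2_9T": (40.558, 228.98, 76.594),
    "xor2v0": (39.618, 232.06, 76.882),
    "xor2v2": (31.543, 182.29, 70.505),
}

# Sort by the MAX column (index 1) and keep the three smallest worst cases.
by_worst_case = sorted(DELAY_PS, key=lambda name: DELAY_PS[name][1])
fastest_three = by_worst_case[:3]
```

Ordering by the MEAN column instead would give a different ranking, so the choice of column matters when comparing cells.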

For all topologies, the delay varies considerably from the minimum value to the maximum value. This is mostly because all transistors have the same size regardless of the stack they are in; therefore, the longer stacks have a greater delay. Nevertheless, this table gives a reasonable idea of the performance of each cell, since transistor sizing could be applied to all topologies. Since the worst case determines the maximum operating frequency of the cell, the fastest topologies are the xor2\_4t, xor2\_8t and xor2v2. Table 2 shows the power consumption results.

**Table 2: HSPICE Power Simulation** 

| Power(uW) |        |        |        |
|-----------|--------|--------|--------|
| Topology  | MIN    | MAX    | MEAN   |
| xor2_3T   | 0.2618 | 50.156 | 15.978 |
| xor2_4T   | 0.2393 | 7.4924 | 3.0113 |
| xor2_6T   | 0.4203 | 10.106 | 4.9623 |
| xor2_8T   | 0.4157 | 2.7570 | 0.7758 |
| xor2_9T   | 0.4356 | 2.7374 | 0.7729 |
| xor2v0    | 0.4623 | 2.7742 | 0.7942 |
| xor2v2    | 0.4233 | 2.7216 | 0.7617 |

As expected, the xor2\_3t has a higher consumption than the others because of the short circuit. The xor2\_6t is also penalized in this analysis because of the degraded signal that arrives at the output inverter.

**Table 3: DelayXPower Metric**

| Delay*Power(aJ) |         |           |          |
|-----------------|---------|-----------|----------|
| Topology        | MIN     | MAX       | MEAN     |
| xor2_3T         | 4.0711  | 8493.7008 | 973.2765 |
| xor2_4T         | 6.1952  | 1213.6135 | 214.3105 |
| xor2_6T         | 15.437  | 2771.1110 | 464.9963 |
| xor2_8T         | 20.134  | 788.6398  | 103.2706 |
| xor2_9T         | 14.058  | 562.4677  | 74.8082  |
| xor2v0          | 14.3838 | 584.6796  | 74.2956  |
| xor2v2          | 15.8430 | 447.6430  | 67.2251  |
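The unit of Table 3 follows from the units of the two factors: one picosecond times one microwatt is one attojoule. The sketch below shows this conversion and ranks the cells by the tabulated MEAN column. Note that multiplying the MEAN delay by the MEAN power does not reproduce the tabulated MEAN; a plausible reason, stated here as an assumption, is that the product is taken per simulation point before averaging.

```python
def energy_aj(delay_ps, power_uw):
    """Delay-power product in attojoules: 1 ps * 1 uW = 1e-12 s * 1e-6 W = 1e-18 J."""
    return delay_ps * power_uw


# MEAN column copied from Table 3 (aJ).
MEAN_DELAY_POWER_AJ = {
    "xor2_3T": 973.2765,
    "xor2_4T": 214.3105,
    "xor2_6T": 464.9963,
    "xor2_8T": 103.2706,
    "xor2_9T": 74.8082,
    "xor2v0": 74.2956,
    "xor2v2": 67.2251,
}

# The four cells with the lowest mean delay-power product.
best_four = sorted(MEAN_DELAY_POWER_AJ, key=MEAN_DELAY_POWER_AJ.get)[:4]

# Product of the xor2_9T MEAN delay and MEAN power from the HSPICE tables,
# which differs from the tabulated MEAN of the per-point products.
product_of_means_9t = energy_aj(76.594, 0.7729)
```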

The four topologies with the best results for the delayXpower metric are the xor2\_8t, xor2\_9t, xor2v0 and xor2v2. Of these, only the xor2\_8t should not be used in a standard cell library. These four cells were created using a commercial tool for library creation, which constructs a layout and characterizes each cell according to the topology defined in its SPICE file. The technology, operating conditions and set of input slopes and output loads are the same as those used during the SPICE simulation. The cells were compared with regard to area, power and delay, and the rest of this section presents the results obtained. Table 4 presents the cell layout area.

**Table 4: Cell Layout Area**

| Topology | Area(um<sup>2</sup>) |
|----------|----------------------|
| xor2_8t  | 1.596                |
| xor2_9t  | 1.596                |
| xor2v0   | 1.862                |
| xor2v2   | 2.128                |

From this table it is possible to see that a cell with a larger number of transistors does not necessarily have a larger area, and that two cells with the same number of transistors do not necessarily have the same area: both the xor2\_9t and xor2v0 topologies have nine transistors, while the xor2\_8t has eight. Table 5 presents the static power consumption of each cell.

**Table 5: Characterized Cell Leakage Power** 

| Topology | Leakage(nW) |
|----------|-------------|
| xor2_8t  | 11.33       |
| xor2_9t  | 12.56       |
| xor2v0   | 12.56       |
| xor2v2   | 14.38       |

None of these topologies has a static power consumption problem. Therefore, the leakage simply increases with the number of transistors in the cell.
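The trend in Table 5 can be checked by dividing each leakage value by the transistor count of the cell. The sketch below uses the transistor counts given in Section 3 and the leakage values from Table 5; the variable names are illustrative.

```python
# (number of transistors, leakage in nW) per cell, from Section 3 and Table 5.
CELLS = {
    "xor2_8t": (8, 11.33),
    "xor2_9t": (9, 12.56),
    "xor2v0": (9, 12.56),
    "xor2v2": (10, 14.38),
}

# Leakage per transistor, in nW.
LEAKAGE_PER_TRANSISTOR_NW = {
    name: leakage_nw / transistors
    for name, (transistors, leakage_nw) in CELLS.items()
}

spread_nw = max(LEAKAGE_PER_TRANSISTOR_NW.values()) - min(
    LEAKAGE_PER_TRANSISTOR_NW.values()
)
```

The per-transistor values stay within a narrow band of roughly 1.4 nW, which is consistent with leakage that scales with the number of transistors rather than with any topology-specific static current path.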

**Table 6: Characterized Cell Dynamic Power**

| Topology | Power |      |       |
|----------|-------|------|-------|
|          | MIN   | MAX  | MEAN  |
| xor2_8t  | -4.85 | 1.66 | 0.150 |
| xor2_9t  | 0.76  | 1.83 | 1.045 |
| xor2v0   | 1.13  | 2.00 | 1.145 |
| xor2v2   | 1.25  | 2.30 | 1.300 |

This table considers only the current drawn by the cell itself. The consumption is measured by adding the current that flows through each input to the current drawn from the power net. The negative value for the xor2\_8t is consistent because current may flow in both directions through its inputs. For the other three topologies (xor2\_9t, xor2v0, xor2v2), the current in each input is near zero. These results differ from the HSPICE simulation because the methodologies used in each case are different: the simulation considered only the current drawn from the power net (therefore it is impossible to obtain a negative number), and it also took into account a cell that drives the XOR cell. Table 7 presents the delay of the characterized cells.

**Table 7: Cell Delay**

| Topology | Delay(ps) |        |       |
|----------|-----------|--------|-------|
|          | MIN       | MAX    | MEAN  |
| xor2_8t  | 16.33     | 198.28 | 62.58 |
| xor2_9t  | 25.04     | 249.10 | 79.77 |
| xor2v0   | 28.17     | 251.06 | 83.14 |
| xor2v2   | 26.33     | 205.61 | 74.22 |

Analyzing the power, area and delay results, it is possible to conclude that the new topology is an improvement over the previous nine-transistor version and presents good results when compared to the others.

# **5. CONCLUSIONS**

This paper compares several cell topologies for the XOR function, designed with different logic styles and transistor counts, and proposes a new one. The HSPICE simulations are set up so that they estimate the impact of using each cell in a circuit, instead of looking at the cell as a standalone object. Also, creating a layout for several topologies gives more accurate results for delay and area. It is shown that a smaller number of transistors in a cell guarantees neither a smaller area in the final layout nor a faster cell. The new proposal appears to be a good option for implementation in a standard cell library.

# **6. REFERENCES**

[1] W. Zhao, Y. Cao, "New generation of Predictive Technology Model for sub-45nm early design exploration", IEEE Transactions on Electron Devices, vol. 53, no. 11, pp. 2816-2823, November 2006.

[2] Sreehari Veeramachaneni and Hyderabad, "New improved 1-bit adder cells", CCECE/CCGEI, Niagara Falls, Canada, May 5-7 2008, pp. 735-738.

[3] Dan Wang, Maofeng Yang, Wu Cheng, Xuguang Guan, Zhangming Zhu, Yintang Yang, "Novel low power full adder cells in 180nm CMOS technology", 4th IEEE Conference on Industrial Electronics and Applications (ICIEA 2009), Xi'an, May 25-27 2009, pp. 430-433, ISBN 978-1-4244-2799-4.

[4] Jyh-Ming Wang, Sung-Chuan Fang, Wu-Shiung Feng, "New Efficient Designs for XOR and XNOR Functions on the Transistor Level", IEEE Journal of Solid-State Circuits, vol. 29, no. 7, July 1994.

[5] N. Weste and K. Eshraghian, *Principles of CMOS VLSI Design*, Addison-Wesley, Reading, MA, 1985.

[6] Graham Petley, www.vlsitechnology.org, last access: June 7th, 2010.