# A 1 GHz 64 b Serial Peripheral Interface for 40 nm Bulk CMOS Technology

Rodrigo N. Wuerdig, Filipe Baumgratz, Sergio Bampi

Informatics Institute - PGMICRO - Federal University of Rio Grande do Sul (UFRGS) - Porto Alegre - Brazil {rodrigo.wuerdig, fdbaumgratz, bampi}@inf.ufrgs.br

*Abstract*— This work proposes a simple, high-speed Serial Peripheral Interface (SPI) that reaches up to 125 MB/s of throughput while operating at 1 GHz. It also describes the steps required to turn a hardware description of the SPI into a GDSII file. The SPI architecture relies on a robust and simple cascade of master-slave flip-flops and can operate over a broad range of frequencies. This frequency scalability makes the proposed architecture suitable for different applications, such as IoT, where a lower operating frequency can meet energy-saving requirements. The architecture is logically and physically synthesized within the Cadence commercial framework using the TSMC 40 nm technology node and standard cell library, and is simulated at every process, supply voltage, and temperature (PVT) corner.

## I. INTRODUCTION

With the rising popularity of IoT over the last decade, which requires simple, low-power, and robust solutions, the Serial Peripheral Interface (SPI) remains one of the most popular interfaces for inter- and intra-chip communication [1]. SPI is often compared to the I<sup>2</sup>C protocol, and each has its key advantages. While I<sup>2</sup>C requires fewer pins, SPI can achieve greater throughput [2]. This work proposes an SPI that reaches up to 125 MB/s while operating at 1 GHz.
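The 125 MB/s figure follows from the clock rate if one assumes, as is usual for SPI, that one bit is shifted per clock cycle on a single data line. The short sketch below spells out that arithmetic. The function names and the one-bit-per-cycle assumption are illustrative and are not taken from the design files.

```python
def spi_throughput_mb_per_s(clock_hz: float, bits_per_cycle: int = 1) -> float:
    """Raw serial throughput in megabytes per second (1 MB = 10**6 bytes)."""
    bits_per_second = clock_hz * bits_per_cycle
    return bits_per_second / 8 / 1e6


def word_transfer_time_ns(clock_hz: float, word_bits: int = 64) -> float:
    """Time to shift one word through the interface, in nanoseconds."""
    return word_bits * 1e9 / clock_hz


if __name__ == "__main__":
    # Assumption: one bit per clock cycle at the 1 GHz constraint.
    print(spi_throughput_mb_per_s(1e9))   # throughput in MB/s
    print(word_transfer_time_ns(1e9))     # time per 64-bit word in ns
```

Under the same assumption, a full 64-bit word takes 64 clock cycles to transfer, and lowering the clock scales the throughput linearly.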

To make circuits easier to design and to achieve a more favorable yield, most of the industry resorts to the digital cell-based design method. Cell-based design relies on a collection of cells with different logical functions and drive strengths, called standard cells. Circuits composed of standard cells are more likely to work, since each cell is fully validated.

This document describes the steps taken to turn a hardware description into a polygon description file (e.g., GDSII or OASIS). GDSII is a database file format that is the de facto industry standard for exchanging integrated circuit (IC) layout artwork, and generating it is one of the last steps before making the masks for the lithographic process. The flow translates the HDL (Hardware Description Language) file into a physical arrangement of cells.

This paper is organized as follows. Section II covers the multi-corner logical synthesis steps. Section III discusses physical synthesis, including precautions and details specific to modern technology nodes. Finally, Sections IV and V present the results and draw the main conclusions from the designed SPI.

## II. LOGICAL SYNTHESIS

Logical synthesis is one of the steps taken to develop an IP core. Starting from an HDL file, the main objective of this step is to refine the description, translating it from the RTL (Register Transfer Level) to logic gates (cells). This translation can be done manually or with the aid of software such as Cadence Genus or Synopsys Design Compiler. This project used the Cadence framework. To provide enough drive current to attach this IP to PAD pins, specific constraints were set to mimic a specific output capacitance. The circuit is synthesized with a clock constraint of 1 GHz.

Fig. 1: SPI Logical Schematic
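As a behavioral illustration of the flip-flop cascade on which the SPI relies, the sketch below models a 64-bit shift register that captures one serial bit per clock edge and exposes the accumulated word as a parallel value. It is a software model under stated assumptions (MSB-first bit order, a single data line, and class and function names invented here), not the synthesized RTL.

```python
class ShiftRegister:
    """Behavioral model of a cascade of flip-flops (assumed MSB-first)."""

    def __init__(self, width: int = 64):
        self.width = width
        self.mask = (1 << width) - 1
        self.value = 0  # parallel view of the flip-flop contents

    def clock(self, serial_in: int) -> int:
        """Advance one clock edge: shift in one bit, return the bit shifted out."""
        serial_out = (self.value >> (self.width - 1)) & 1
        self.value = ((self.value << 1) | (serial_in & 1)) & self.mask
        return serial_out


def transfer(reg: ShiftRegister, word: int) -> int:
    """Shift a full word in, MSB first, and collect the word shifted out."""
    shifted_out = 0
    for i in reversed(range(reg.width)):
        out_bit = reg.clock((word >> i) & 1)
        shifted_out = (shifted_out << 1) | out_bit
    return shifted_out


if __name__ == "__main__":
    reg = ShiftRegister()
    transfer(reg, 0xDEADBEEFCAFEF00D)
    print(hex(reg.value))  # parallel word after 64 clock edges
```

Because every stage only passes its bit to the next stage on each edge, the model has no frequency-dependent behavior, which mirrors the frequency scalability claimed for the architecture.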

### A. Multi-Mode Multi-Corner Flow

Multi-mode multi-corner (MMMC) analysis refers to performing STA (Static Timing Analysis) across multiple operating modes, PVT (Process, Voltage, and Temperature) corners, and parasitic interconnect corners at the same time. Cadence Genus<sup>TM</sup> supports this feature. To enable MMMC, a file must be written that describes every operating condition (Table I) and associates each one with the corresponding Liberty file (*.lib*).

TABLE I: Utilized views for the multi-corner synthesis of the SPI.

| Corner       | Temperature (°C) | Supply Voltage (V) | Process |
|--------------|------------------|--------------------|---------|
| Worst Case   | 125              | 0.81               | Worst   |
| Nominal Case | 25               | 0.90               | Nominal |
| Best Case    | -40              | 0.99               | Best    |

MMMC synthesis also makes it possible to generate reports for every view during logical and physical synthesis. These multi-corner reports give a rough indication of how the circuit will operate under different conditions. They also make it possible to extract separate delay files (*.sdf*) for later HDL simulation.
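The views in Table I can be captured in a small data structure, which is roughly the information an MMMC description encodes before each view is bound to its Liberty file. The sketch below is plain Python rather than tool syntax. The class name, field names, and helper function are assumptions made for illustration, and the observation that the supply corners sit at 10% below and above the 0.90 V nominal value is derived from Table I, not from the authors' scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corner:
    name: str
    temperature_c: int
    supply_v: float
    process: str


# Values copied from Table I.
CORNERS = [
    Corner("Worst Case", 125, 0.81, "Worst"),
    Corner("Nominal Case", 25, 0.90, "Nominal"),
    Corner("Best Case", -40, 0.99, "Best"),
]


def supply_deviation_percent(corner: Corner, nominal_v: float = 0.90) -> float:
    """Supply deviation from the nominal voltage, in percent."""
    return round((corner.supply_v - nominal_v) / nominal_v * 100, 1)


if __name__ == "__main__":
    for corner in CORNERS:
        print(corner.name, supply_deviation_percent(corner))
```

Keeping the corners in one place like this makes it straightforward to iterate over them when collecting per-view reports or delay files.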

## III. PHYSICAL SYNTHESIS

Physical synthesis takes the netlist (usually a Verilog file) generated by logical synthesis and implements it as a layout. That netlist describes an interconnection of library cells that performs the function described at the RTL level before logical synthesis. This work employs the Cadence Innovus<sup>TM</sup> implementation tools.



Fig. 2: SPI Floorplan

The floorplan (Fig. 2) aspect ratio was set to fit the write and read buses (64 bits each) at the bottom of the IP. This aspect ratio also aims to facilitate the addition of PAD cells, considering that the top side groups every output pin.

The SPI powerplan (Fig. 3) is quite simple, given the single voltage domain. This step consists only of connecting every power net, generating the power rings, and performing the special routing. Usually, during the powerplan, vertical metal stripes are attached to the supply rings, providing a more homogeneous $V_{DD}$ across the design and countering the IR drop. The proposed design does not use vertical stripes, considering the trade-off between the voltage drop across the supply rails and the small IP width.



Fig. 3: SPI Powerplan & Welltaps

Modern technology nodes (under 65 nm) are commonly implemented with tapless library cells, which do not have built-in well taps for bulk connection in order to minimize cell height. To prevent latch-up, special tap cells then connect the n-wells to $V_{DD}$ and the p-substrate to $V_{SS}$, based on tap rules defined in the DRC file. This work places well taps at a pre-defined 40 µm spacing to prevent latch-up. Well tap spacing also affects the threshold voltage of cell transistors. Cells that are closer to tap cells have a higher threshold, which increases their transition delays and reduces their leakage current. The opposite effects are seen in cells that are far from tap cells.

Cell placement (Fig. 4) takes the cells specified in the netlist and tries to place them at optimal positions on the floorplan. After the initial placement, two more incremental placements are run to avoid cell placement problems that could cause DRC (Design Rule Check) violations.



Fig. 4: SPI After Placement

Clock tree synthesis (CTS) is a critical step in the physical synthesis flow. An optimized clock tree (CT) helps avoid serious issues further down the flow, such as excessive power consumption, routing congestion, and a prolonged timing closure phase [3]. Many elements affect how a clock tree is built, so designers often have to run many experiments to optimize it. Clock gating arrangements, CTS targets, clock library cell types, and even the placement of spare cells have a direct impact on the quality of a clock tree.

After CTS, the routing process determines the paths for the interconnections. Routing covers the standard cells and the pins (the pins on the block boundary or the pads at the chip boundary).

In the routing stage (Fig. 5), metal and vias are used to create the electrical connections in the layout so as to complete all connections defined by the input netlist.



Fig. 5: SPI After Route

Fillers can be divided into two main types: cell fillers and metal fillers. The IP proposed in this paper uses both. Cell fillers are responsible for providing polysilicon homogeneity across the floorplan.



Fig. 6: SPI After Fillers

Another concern regarding filler insertion is guaranteeing that the metal fillers are connected to $V_{SS}$. Attaching the metal fillers to the ground net allows them to act as an electromagnetic interference (EMI) shield for the circuit, and the fillers also increase the planarity of the circuit.

## IV. RESULTS

Reports for area, power dissipation, and delay were generated under multiple operating conditions. The results give an indication of circuit functionality.

TABLE II: SPI Area Results

| Design | Cell Instances | Area (µm²) |
|--------|----------------|------------|
| SPI    | 271            | 610.7      |

The circuit area does not change across corners (Table II), since the circuit is synthesized at the worst corner and then analyzed at every other corner, as is common practice in industry.



Fig. 7: Estimated slack values for each PVT corner.

Slack values greater than zero (Fig. 7) indicate that the circuit can meet timing at every corner. Power dissipation (Fig. 8) shows a realistic, almost linear variation across operating conditions, ranging from 1.25 mW at the worst corner up to 1.88 mW at the best corner.
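Two first-order relations help in reading these figures. Setup slack is the time left over in a clock period after the data arrives and the setup time of the capturing flip-flop is met, so any positive value means the 1 ns period implied by the 1 GHz constraint is satisfied. Dynamic power scales roughly with the square of the supply voltage at a fixed frequency, which can be compared against the reported 1.25 mW and 1.88 mW endpoints. The sketch below is only an illustrative check: the function names are invented here, the arrival and setup times are hypothetical placeholders (the values of Fig. 7 are not reproduced), and leakage and temperature effects are ignored.

```python
def setup_slack_ns(period_ns: float, arrival_ns: float, setup_ns: float) -> float:
    """Setup slack = clock period - data arrival time - setup time."""
    return period_ns - arrival_ns - setup_ns


def dynamic_power_ratio(v_high: float, v_low: float) -> float:
    """First-order ratio of dynamic power at two supplies (P ~ C * V^2 * f)."""
    return (v_high / v_low) ** 2


# Hypothetical timing numbers, chosen only to show the sign convention.
example_slack = setup_slack_ns(period_ns=1.0, arrival_ns=0.8, setup_ns=0.05)

# Supply voltages from Table I and the reported power range.
predicted_ratio = dynamic_power_ratio(0.99, 0.81)
reported_ratio = 1.88 / 1.25

if __name__ == "__main__":
    print(f"example slack: {example_slack:.2f} ns (positive means timing is met)")
    print(f"predicted V^2 ratio: {predicted_ratio:.3f}")
    print(f"reported power ratio: {reported_ratio:.3f}")
```

Under these assumptions the predicted ratio (about 1.49) is within roughly 1% of the reported ratio (about 1.50), which is consistent with switching power dominating the total.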

Fig. 8: Estimated power dissipation values for each PVT corner.

The circuit is simulated using a Verilog netlist along with the extracted standard delay format (SDF) files of every synthesis step and operating corner to verify circuit functionality. The simulations were also run within the Cadence framework, using the Incisive Enterprise<sup>TM</sup> simulator.

Running at transfer speeds as high as 125 MB/s may make little sense for IoT applications, but this work supports the possibility of a single, simple, and efficient architecture that can run across a broad range of frequencies and transfer speeds. The architecture can work in an energy-saving mode by reducing its frequency, since Fig. 8 shows that switching power accounts for the most significant part of the total power consumption, while the same architecture can also run in a high-speed scenario.
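The frequency trade-off can be sketched with a first-order model in which switching power scales linearly with clock frequency and the remaining power stays constant. The model below is an assumption-laden illustration: the 0.9 switching fraction is a hypothetical placeholder (the paper states only that switching power is the significant part), the function names are invented here, and one bit per clock cycle is assumed when converting power to energy per bit.

```python
def scaled_power_mw(power_at_ref_mw: float, switching_fraction: float,
                    freq_hz: float, ref_freq_hz: float = 1e9) -> float:
    """Total power at freq_hz, scaling only the switching part with frequency."""
    switching = power_at_ref_mw * switching_fraction * (freq_hz / ref_freq_hz)
    static = power_at_ref_mw * (1.0 - switching_fraction)
    return switching + static


def energy_per_bit_pj(power_mw: float, bit_rate_hz: float) -> float:
    """Energy per transferred bit in picojoules (one bit per clock assumed)."""
    return power_mw * 1e-3 / bit_rate_hz * 1e12


if __name__ == "__main__":
    # 1.25 mW is the reported worst-corner power at 1 GHz.
    print(energy_per_bit_pj(1.25, 1e9))
    # Hypothetical: 90% of that power is switching, clock lowered to 100 MHz.
    print(scaled_power_mw(1.25, 0.9, 100e6))
```

In this model the total power falls with frequency but not proportionally, because the non-switching part does not scale.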

## V. CONCLUSION

A robust and simple SPI design was described in this paper, to be used as an infrastructure IP for 40 nm CMOS analog/mixed-signal designs. Synthesizing the SPI for TSMC 40 nm was a valuable exercise in learning new skills related to the digital design flow. This work does not cover the whole process necessary to guarantee a fully functional circuit, but it has at least introduced and motivated the search for the right practices for logical and physical synthesis.

## VI. ACKNOWLEDGMENT

The authors would like to thank CNPq, Capes, and Fapergs Brazilian agencies for financial support to our research.

## REFERENCES

- [1] A. K. Oudjida, M. L. Berrandjia, A. Liacha, R. Tiar, K. Tahraoui, and Y. N. Alhoumays, "Design and test of general-purpose SPI master/slave IPs on OPB bus," in *2010 7th International Multi-Conference on Systems, Signals and Devices*, June 2010, pp. 1–6.
- [2] F. Leens, "An introduction to I2C and SPI protocols," *IEEE Instrumentation & Measurement Magazine*, vol. 12, no. 1, pp. 8–13, February 2009.
- [3] M. Keating, D. Flynn, R. Aitken, A. Gibbons, and K. Shi, *Low Power Methodology Manual: For System-on-Chip Design* (Integrated Circuits and Systems). Springer, 2007.