# Cell Library Design for Ultra-Low Power Internet-of-Things Applications

Michael Lopes de Oliveira Graduate Program in Electrical Engineering Universidade Federal de Minas Gerais Belo Horizonte, Brazil michael-lopes@ufmg.br Frank Sill Torres Cyber-Physical Systems DFKI GmbH Bremen, Germany Keyliane da Silva Fernandes Universidade Federal de Minas Gerais Belo Horizonte, Brazil

Abstract: A wide range of applications in the field of the Internet-of-Things (IoT) have only severely limited energy resources. This raises the need for Ultra-Low Power (ULP) designs that enable processing with the lowest possible energy consumption. This, however, comes at a considerable performance cost. Consequently, there is a need for flexible design flows that enable the designer to select the best tradeoff between energy consumption and performance for the target application. Towards this end, we present in this paper a complete flow for the design of a ULP standard cell library based on the subthreshold methodology. The flow considers both the selection of the appropriate technology parameters, like supply voltage and base sizing, and the actual design and characterization of the standard cells. Simulation results for an ARM-based processor and ITC'99 and EPFL benchmark circuits realized in a commercial 130 nm technology indicate a possible reduction of power consumption by an average factor of 18 to 37, depending on the supply voltage, while performance degrades by two to three orders of magnitude.

Keywords: Ultra-low power, Internet-of-Things, Energy consumption, Standard cell library, Subthreshold design

#### I. INTRODUCTION

Recent technology advances have enabled applications in the field of the Internet-of-Things (IoT) with severely limited energy resources, for example in wireless sensor networks [1], in biomedical systems [2] or in radio frequency identification (RFID) [3]. In these Ultra-Low Power (ULP) applications, the main focus is on energy consumption, at the expense of overall system performance. Towards this end, supply voltages below the transistor threshold voltage are used, exploiting the quadratic impact of the supply voltage on the energy consumption [4]. However, this so-called subthreshold design methodology requires a redesign of existing circuits.

While for analog blocks this redesign is mostly done manually, digital blocks can still be realized via the standard cell methodology [5]. Here, after a synthesis step, logic functions are mapped onto a library of standard cells, which are then automatically placed and routed by existing design tools. Existing standard cells cannot be used, though, as CMOS circuits behave differently in the subthreshold regime [4]. Hence, there is a need for new flows for the design of standard cell libraries for ULP applications. The authors in [4] propose an analytical solution for the determination of optimum supply voltage and subthreshold voltage with emphasis on energy consumption. However, this approach is design-driven and ignores the specific requirements of a standard cell library. In contrast, the authors of [6] propose multiobjective optimization for sizing the elements of a standard cell library in the subthreshold regime. The presented approach leads to promising results, but there are no reports on the selection of an appropriate supply voltage and the characterization of the cells. In [7], an extensive flow for the characterization of standard cells operating with non-standard supply voltages is presented. However, this work focuses on given standard cell libraries and avoids resizing the standard cells.

In contrast, we present in this work a complete flow for the design of standard cell libraries operating in the subthreshold regime. The proposed flow considers the selection of an appropriate supply voltage as well as the sizing and characterization of the elements of the library. That is, we enable the design of cell libraries that are tailored specifically to each ULP project in order to explore the best tradeoff between performance and energy consumption for each individual application.

The remainder of this work is structured as follows. Section II reviews the theoretical concepts in order to keep this work as self-contained as possible. Section III introduces the proposed flow and presents the related steps. Section IV discusses the application of the flow to a commercial 130 nm technology and to selected benchmark designs as well as an ARM-based processor. Finally, Section V concludes this work.

#### II. THEORETICAL BACKGROUND

In this section, the theoretical concepts are reviewed. First, an overview of the main concepts regarding energy dissipation in CMOS circuits and the differences between dynamic and leakage dissipation is presented. Then, subthreshold circuit design is explained, describing the advantages and drawbacks of this methodology. Finally, we give a brief overview of the related state of the art.

#### A. Energy Dissipation in Integrated Circuits

In electronic circuits, the total energy consumption *E* follows from:

$$E = \int P(t)dt \tag{1}$$

where $P(t)$ denotes the power dissipated at time $t$.
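In practice, (1) is usually evaluated numerically on a sampled power trace, for example one exported from a circuit simulator. The following is a minimal sketch using the trapezoidal rule; the function name and the sample values are illustrative assumptions and not part of the presented flow.

```python
def energy_from_trace(times, powers):
    """Approximate E = integral of P(t) dt with the trapezoidal rule.

    times  -- sample instants in seconds, strictly increasing
    powers -- dissipated power in watts at each instant
    """
    if len(times) != len(powers) or len(times) < 2:
        raise ValueError("need two or more matching samples")
    energy = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Area of one trapezoid between two neighbouring samples.
        energy += 0.5 * (powers[i] + powers[i - 1]) * dt
    return energy


# Hypothetical triangular power pulse of a single output transition.
t = [0.0, 1e-9, 2e-9]   # s
p = [0.0, 4e-6, 0.0]    # W
e_transition = energy_from_trace(t, p)   # energy in joules
```

The same routine can be applied separately to each simulated transition in order to obtain an energy-per-transition figure.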

The total power dissipation of a digital CMOS circuit can be divided into dynamic dissipation  $P_{dyn}$  and leakage dissipation  $P_{leak}$ , with:

$$P(t) = P_{dyn}(t) + P_{leak}(t) \tag{2}$$

Further,  $P_{dyn}$  can be separated into the two components switching dissipation  $P_{sw}$  and short-circuit dissipation  $P_{sc}$  [7]:

$$P_{dyn}(t) = P_{sw}(t) + P_{sc}(t) \tag{3}$$

In current technologies,  $P_{sw}$  is the dominating source for power dissipation [8]. It depends on the number of switching events, which follows from the switching activity  $\alpha$ , the clock frequency  $f_{clk}$ , the output load  $C_L$  and the supply voltage  $V_{DD}$ , i.e.:

$$P_{sw} \propto \alpha f_{clk} C_L V_{DD}^2 \tag{4}$$

 $P_{sc}$  results from short periods of time when both PMOS and NMOS networks are conducting during switching. Typically, the short-circuit dissipation represents less than 10 % of the total dynamic dissipation. In the subthreshold region, the short-circuit power reduces exponentially and usually is not considered in ULP designs [9].

Leakage power dissipation results from parasitic transistor currents. The dominating currents are subthreshold current  $I_{sub}$ , which is the drain-source current in switched-off devices, the gate-oxide current  $I_{gate}$ , which is the tunneling current through the gate oxide, and the junction leakage  $I_{diode}$ , which is the current between drain-bulk and source-bulk diodes [8][10][11]. The leakage power dissipation  $P_{leak}$  can be approximated with:

$$P_{leak} \propto V_{DD} \left( I_{gate} + I_{sub} + I_{diode} \right) \tag{5}$$

The propagation delay $t_p$ of a standard cell denotes the maximum time it takes for an output to change after a new input has arrived. For supply voltages above the transistor threshold voltage $v_{th}$, a first order approximation of $t_p$ is as follows:

$$t_p \propto \beta \frac{C_L V_{DD}}{\left(V_{DD} - v_{th}\right)^{\gamma}} \tag{6}$$

Where  $\beta$  is a technology dependent parameter and  $\gamma$  represents the velocity saturation index.

#### B. Subthreshold Circuit Operation

From (4) and (5) it follows that the supply voltage has a quadratic impact on dynamic power dissipation and a linear impact on static power dissipation. Consequently, $V_{DD}$ reduction is a widely applied strategy for low power applications, e.g. in the form of dynamic voltage scaling [8].
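As a rough illustration of these two dependencies, the sketch below evaluates the first-order reduction factors implied by (4) and (5). It assumes that switching activity, clock frequency, load and leakage currents stay unchanged, which is a simplification; the voltages are example values only.

```python
def dynamic_power_ratio(v_nom, v_low):
    """Reduction factor of P_sw per (4) at fixed alpha, f_clk and C_L."""
    return (v_nom / v_low) ** 2


def leakage_power_ratio(v_nom, v_low):
    """Reduction factor of P_leak per (5), assuming unchanged leakage currents."""
    return v_nom / v_low


V_NOM = 1.2  # V, example nominal supply voltage
for v_low in (0.35, 0.25):
    print(f"{v_low:.2f} V: P_sw / {dynamic_power_ratio(V_NOM, v_low):.1f}, "
          f"P_leak / {leakage_power_ratio(V_NOM, v_low):.1f}")
```

In a real design the achievable clock frequency and the leakage currents also change with $V_{DD}$, so these factors are only a starting point for estimation.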

In conventional designs, though, $V_{DD}$ always remains above the threshold voltage $v_{th}$. This ensures that devices operate in the superthreshold or strong inversion region, in which the delay depends nearly linearly on $V_{DD}$, as can be seen in (6).

Fig. 1. Delay of seven-stage ring oscillator versus supply voltage in 65 nm technology [14]

However, for applications with strong energy limitations one can reduce $V_{DD}$ below $v_{th}$, such that the devices operate in the subthreshold or weak inversion region [12]. In this region, the minority carrier concentration is small and the channel between drain and source terminals has no horizontal electric field. A longitudinal electric field appears due to the drain-to-source voltage $V_{ds}$ and the carriers move by diffusion between the source and the drain, creating the subthreshold current $I_{sub}$, which follows from [13]:

$$I_{sub} = I_0 e^{\frac{V_{gs} - v_{th}}{nV_T}} \left( 1 - e^{-\frac{V_{ds}}{V_T}} \right) \text{ with } I_0 = \frac{W\mu_0 C_{ox} V_T^2 e^{1.8}}{L} \tag{7}$$

where $V_T$ denotes the thermal voltage, $W$ and $L$ represent the transistor width and length, $C_{ox}$ is the oxide capacitance, $\mu_0$ is the carrier mobility and $n$ is the subthreshold slope factor, which determines how much the gate-to-source voltage $V_{gs}$ must drop in order to decrease the leakage current by an order of magnitude [8].
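The role of $n$ can be made concrete with a small numerical sketch of the subthreshold current, written in the usual form with the drain term $1 - e^{-V_{ds}/V_T}$. All parameter values below are assumptions chosen for illustration and were not extracted from any technology; the prefactor $I_0$ is treated as a single lumped constant.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin=300.0):
    """V_T = kT/q."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def subthreshold_current(v_gs, v_ds, v_th, i_0, n, v_t):
    """I_sub with exponential gate dependence and drain term 1 - exp(-V_ds/V_T)."""
    return i_0 * math.exp((v_gs - v_th) / (n * v_t)) * (1.0 - math.exp(-v_ds / v_t))


def gate_swing_per_decade(n, v_t):
    """Change in V_gs that alters I_sub by a factor of ten: n * V_T * ln(10)."""
    return n * v_t * math.log(10.0)


# Illustrative (assumed) parameters.
V_T = thermal_voltage()          # roughly 26 mV at 300 K
N, V_TH, I_0 = 1.5, 0.41, 1e-7   # slope factor, threshold [V], prefactor [A]
swing = gate_swing_per_decade(N, V_T)
i_hi = subthreshold_current(0.25, 0.25, V_TH, I_0, N, V_T)
i_lo = subthreshold_current(0.25 - swing, 0.25, V_TH, I_0, N, V_T)
# i_hi / i_lo equals ten by construction of the swing.
```

Lowering $V_{gs}$ by `swing` reduces the current by exactly one decade in this model, which is the property the slope factor captures.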

In the subthreshold region of operation, a first order approximation of the delay $t_{p,sub}$ results from [12]:

$$t_{p,sub} \propto \beta C_L V_{DD} e^{-\frac{V_{DD}}{nV_T}} \tag{8}$$

It follows that in the subthreshold region of operation, the delay increases exponentially with decreasing $V_{DD}$.

Fig. 1 demonstrates the relationship between the supply voltage and the delay of a seven-stage ring oscillator based on an inverter chain in a 65 nm technology node [14]. As can be seen, in the subthreshold region the delay depends exponentially on $V_{DD}$, while in the superthreshold region the relation is nearly linear.
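The contrast between the two regimes can be sketched numerically. The code below evaluates only the voltage-dependent shape of the superthreshold model (6) and of a subthreshold model in which the delay is proportional to $V_{DD} e^{-V_{DD}/(nV_T)}$; $\beta$ and $C_L$ cancel in the ratios. The parameter values are assumptions, and the two models use different prefactors, so only ratios within one regime are meaningful.

```python
import math


def delay_superthreshold(v_dd, v_th, gamma):
    """Shape of (6): t_p ~ V_DD / (V_DD - v_th)^gamma, valid for V_DD > v_th."""
    if v_dd <= v_th:
        raise ValueError("(6) only applies above the threshold voltage")
    return v_dd / (v_dd - v_th) ** gamma


def delay_subthreshold(v_dd, n, v_t):
    """Subthreshold shape: t_p,sub ~ V_DD * exp(-V_DD / (n * V_T))."""
    return v_dd * math.exp(-v_dd / (n * v_t))


# Assumed example parameters.
V_TH, GAMMA, N, V_T = 0.41, 1.3, 1.5, 0.026

# Relative slowdown when lowering the supply within each regime.
sub_ratio = delay_subthreshold(0.25, N, V_T) / delay_subthreshold(0.35, N, V_T)
super_ratio = (delay_superthreshold(1.0, V_TH, GAMMA)
               / delay_superthreshold(1.2, V_TH, GAMMA))
```

With these assumed parameters, the step from 0.35 V to 0.25 V causes a far larger relative slowdown than the step from 1.2 V to 1.0 V, which mirrors the qualitative behavior described for Fig. 1.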

Since carrier transport in the subthreshold region is dominated by diffusion rather than by drift, which dominates in strong inversion, carrier mobility has a lower impact [13]. Consequently, the difference in carrier mobility between NMOS and PMOS devices has less impact on the circuit sizing. An additional characteristic of the subthreshold regime is that currents in switched-off and switched-on devices are in similar ranges, so that ratio issues must be considered during circuit sizing [13]. Given these particularities, circuits operating in the subthreshold regime demand an appropriate transistor sizing that differs from sizing in strong inversion.

#### C. Standard Cell Methodology

A standard cell library is a set of combinational and sequential cells to be used in *Electronic Design Automation* (EDA) flows. A typical standard cell library contains a library database, which consists of a number of views of each cell, e.g. layout, schematic, logical and abstract as well as timing, power and noise information [15].

The abstract view is a simplified version of the cell layout and is used by the place-and-route tools. Information regarding the cell name, border limits and metal layers of the cells is presented in this format. The logical views are logic descriptions of each cell in a hardware description language, such as VHDL or Verilog.

Characterization is the process in which all this information is extracted for each cell. This process consists of the simulation of the cell model under various operating conditions and process parameter variations. The characterization provides information on timing and energy consumption under those conditions for each cell.

#### D. State of the Art and Related Work

In many applications, the advantage of energy saving can surpass the drawback of having a higher delay. This section shows some of the most recent works regarding this kind of circuit.

The work presented in [16] discusses the design and implementation of ULP processors for ubiquitous sensing and computing. The main goal of this work is the exploration of ULP design in order to provide processors with high energy efficiency and flexibility. Two application scenarios are implemented with the proposed processors: high-throughput, compute-intensive processing for multimedia and low-throughput, low-cost processing for the Internet of Things (IoT). The processor is fabricated in a 130 nm process technology, runs at 216 MHz and has a power consumption of 414 mW.

In [17], the development of a standard cell library in a 65 nm technology is described. The authors apply multi-objective optimization for cell sizing at a predefined supply voltage. Further, all cells have been characterized, but not redesigned, for supply voltages ranging from 250 mV to 1.2 V. Results for an AES hardware accelerator indicate a reduction of the energy consumption by more than a factor of 9, while performance declined by a factor of nearly 6000.

In [18] the impact of the inverse narrow width effect (INWE) on the transistor drain is investigated for different technology nodes. Further, an INWE-aware sizing method is proposed. This method is applied to the design of a standard cell library, which enables considerable reductions of delay, energy consumption and area. Using this library, a baseband processor was designed that operates at a frequency of 6 MHz at 0.5 V and has a power consumption of nearly 4 $\mu$W.

Fig. 2. General view of proposed flow for cell library design for ULP applications

#### III. STANDARD CELL LIBRARY DESIGN FLOW

This section describes the proposed flow that enables the generation of a standard cell library for ULP applications. The flow considers a technology exploration as well as a final characterization of all elements.

#### A. Flow Structure

The flow is divided into two main sections, where the first one focuses on the estimation of the technology related parameters, while the second section relates to the cell design and its characterization. The final result of the flow is a characterized standard cell library for ULP applications that enables the EDA of digital IoT designs.

A general overview of the flow is presented in Fig. 2. It starts with the data of the chosen technology and the design constraints and ends with a fully characterized cell library. The left-hand side lists the steps related to technology exploration, while the right-hand side presents the steps with emphasis on cell design and characterization.

#### B. Technology Exploration

During technology exploration, the final supply voltage and the base sizing parameters are determined. This flow section starts with *1.1 Testbenches design*. That is, a set of test circuits for selected logic cells is generated that enables the variation of the supply voltage and the control of input signals. Further, it is mandatory to add appropriate input and output loads, created by a fanout-of-4 configuration<sup>1</sup>, to the cells to be analyzed. Given the similarity of the logic cell behavior, the following three test cells should be chosen: Inverter, NAND2 and NOR2. These circuits contain single transistors and stacks of two devices, which is a requirement for the later sizing analysis. The initial cell sizing can be freely chosen but should follow standard sizing methods [8].

The next step, *1.2 VDD Exploration*, relates to a general analysis of the relation between supply voltage and performance for the given technology. For this purpose, the supply voltage of each testbench is varied and the maximum values for energy dissipation per transition and transition delay are extracted. Although the transistor sizes remain constant, these values give a good insight into the trade-off between energy dissipation and delay. By comparing delay against energy consumption, one can select a specific supply voltage for the next steps. This choice is driven by the rough approximation that the extracted relations between supply voltage and delay/energy for the chosen test circuits are transferable to larger designs. That is, one can approximate the gain in energy consumption and the loss of performance for the final application.
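The comparison of delay against energy described above can be sketched as a small post-processing routine over the sweep results. The selection rule used here, the largest energy gain within a tolerated slowdown, is only one possible criterion and is an assumption, as are the function names and the sweep data, which are made up for illustration.

```python
def tradeoff_table(samples, v_nom):
    """Return {v_dd: (energy_gain, delay_penalty)} relative to the nominal supply.

    samples -- {v_dd: (max_delay, max_energy_per_transition)} for one testbench
    """
    d_nom, e_nom = samples[v_nom]
    return {
        v_dd: (e_nom / energy, delay / d_nom)
        for v_dd, (delay, energy) in sorted(samples.items())
    }


def lowest_energy_within(samples, v_nom, max_delay_penalty):
    """Pick the supply with the largest energy gain whose slowdown is acceptable."""
    table = tradeoff_table(samples, v_nom)
    feasible = {v: gain for v, (gain, penalty) in table.items()
                if penalty <= max_delay_penalty}
    return max(feasible, key=feasible.get)


# Made-up sweep results (seconds, joules) purely for illustration.
inverter_sweep = {
    1.20: (50e-12, 4.0e-15),
    0.60: (400e-12, 1.0e-15),
    0.35: (5e-9, 0.30e-15),
    0.25: (40e-9, 0.15e-15),
}
choice = lowest_energy_within(inverter_sweep, v_nom=1.20, max_delay_penalty=200)
```

A designer would typically inspect the full table rather than rely on a single threshold, since the acceptable slowdown depends on the target application.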

The step 1.3 Sizing analysis focuses on the estimation of base sizing parameters that shall be applied as initial values during the later cell design phase. These parameters define the gate length, the minimum width of NMOS and PMOS devices and the relation of transistor sizes in simple stack structures.

The analysis starts with the selection of the gate length. For this purpose, maximum values for delay and energy are estimated for the chosen supply voltage while the transistor length is increased up to at most twice its initial value, as indicated in [6]. This is repeated for all test circuits. Next, one can select the length to be applied based on the comparison of delay versus energy.
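One way to support this comparison is to discard every gate length that is worse than another candidate in both delay and energy, leaving only the non-dominated options for the designer to choose from. The sketch below does this; the Pareto filter, the step size and the measurement values are assumptions and not prescribed by the flow.

```python
def candidate_lengths(l_init, step, max_factor=2.0):
    """Gate lengths from l_init up to max_factor * l_init in increments of step."""
    count = int(round((max_factor * l_init - l_init) / step))
    return [l_init + i * step for i in range(count + 1)]


def pareto_front(points):
    """Keep lengths that no other length beats in both delay and energy.

    points -- {length: (max_delay, max_energy)}
    """
    front = []
    for length, (d, e) in points.items():
        dominated = any(
            (d2 <= d and e2 <= e) and (d2 < d or e2 < e)
            for other, (d2, e2) in points.items() if other != length
        )
        if not dominated:
            front.append(length)
    return sorted(front)


lengths = candidate_lengths(l_init=120, step=40)   # nm, step is an assumption
# Made-up measurements (arbitrary units) standing in for simulation results.
measured = {120: (10.0, 5.0), 160: (8.0, 4.5), 200: (7.5, 4.6), 240: (9.0, 4.7)}
front = pareto_front(measured)
```

The final choice among the remaining lengths still depends on how delay and energy are weighted for the given application.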

In the following, the width of all devices is set to the minimum technology value. Then, the width of all PMOS devices is increased until the maximum transition delays of falling and rising output slopes are equal within a margin of  $\pm 10$  %. The chosen margin follows a proposal from [20]. If this is not possible, then the width of the NMOS device is raised by 10 % and the width of the PMOS devices is increased again starting from its initial value. This procedure is repeated for all test circuits.
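The width search described above can be sketched as two nested loops. The `measure` callback stands in for a simulation of one testbench and is hypothetical, as are the step size, the upper PMOS width limit, the round limit and the choice to express the 10 % margin relative to the falling delay. The toy delay model, in which delay is inversely proportional to width, is only there to make the sketch executable.

```python
def balance_widths(measure, w_min, w_pmos_max, step=10.0, margin=0.10,
                   max_nmos_rounds=20):
    """Search NMOS/PMOS widths with balanced rising and falling delays.

    measure -- callable (w_nmos, w_pmos) -> (t_rise, t_fall), e.g. a wrapper
               around a circuit simulation of one testbench (hypothetical)
    """
    w_nmos = w_min
    for _ in range(max_nmos_rounds):
        w_pmos = w_min                  # PMOS restarts from its initial value
        while w_pmos <= w_pmos_max:
            t_rise, t_fall = measure(w_nmos, w_pmos)
            if abs(t_rise - t_fall) <= margin * t_fall:
                return w_nmos, w_pmos
            w_pmos += step
        w_nmos *= 1.10                  # no match: raise NMOS width by 10 %
    raise RuntimeError("no balanced sizing found within the search limits")


def toy_delays(pmos_weakness):
    """Crude stand-in for simulation: delay is inversely proportional to width."""
    return lambda w_n, w_p: (pmos_weakness / w_p, 1.0 / w_n)


# PMOS weaker than NMOS: widening the PMOS alone reaches the margin.
weak_pmos = balance_widths(toy_delays(2.0), w_min=160.0, w_pmos_max=800.0)
# PMOS stronger than NMOS: the NMOS width has to be raised instead.
strong_pmos = balance_widths(toy_delays(0.8), w_min=160.0, w_pmos_max=800.0)
```

The second call exercises the fallback branch, in which the NMOS width is increased because no PMOS width satisfies the margin.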

The extracted supply voltage, gate length and transistor widths for different configurations are the input parameters for the second main section of the proposed flow (see also Fig. 2).

#### C. Cell Design and Characterization

This section starts with *2.1 Cell Design*, which relates to the generation of the schematics and the layout of the standard cells of the final library. The transistor sizing during schematic design is initialized with the values obtained in step 1.3. Next, sizing can be executed manually, via multiobjective optimization [6] or by commercial tools like MunEDA WiCkeD. It is important to note that all these approaches can be executed considerably faster due to the appropriate initial values. This step is concluded with the extraction of the parasitic capacitances and resistances for each cell.

The next step, *2.2 Library characterization*, focuses on the characterization of the designed and extracted cells. For this purpose, tools like Cadence Liberate or Synopsys SiliconSmart can be applied.

Finally, in step *2.3 Verification*, the resulting library is verified by executing a fully automated design flow for selected combinational and sequential projects. This includes logical and functional verification as well as physical design checks.

#### IV. EXEMPLARY REALIZATION

In order to verify the proposed flow, we implemented a standard cell library for ULP applications and compared performance, area and energy costs for several benchmarks and an ARMv2 based processor.

#### A. Cell Library Design

We chose a commercial 130 nm technology, having in mind a future realization of the ULP processor for IoT applications. The technology has a standard supply voltage of 1.2 V and a nominal threshold voltage of 410 mV. The minimum width is 160 nm and the minimum length is 120 nm.

We implemented testbenches for the inverter, NAND2 and NOR2 circuits and determined 0.2 V as the minimum supply voltage at which all circuits still produced correct outputs. Next, we varied the supply voltage from 0.2 V to 1.2 V and determined the maximum delay and the maximum energy consumption per transition. We allowed both parameters to come from different input combinations. Fig. 3 depicts the results for the inverter and the NAND2 circuits, which clearly indicate that energy consumption can be reduced at the cost of performance. In both examples, delay changes by up to three orders of magnitude, e.g. the delays of the fastest and slowest inverter differ by a factor of 2,500, while energy differs by up to a factor of 40. Based on these results, we chose the supply voltages 0.25 V and 0.35 V for further analysis.

The subsequent gate length analysis indicated different values for PMOS and NMOS devices in order to obtain the best compromise between delay and energy dissipation. For the NMOS devices, a recommended gate length of 360 nm was determined, while the PMOS devices should have a gate length of 240 nm. Following the proposed flow, we continued with the selection of the initial widths for all devices of the testbenches. For both supply voltages, we determined a minimum width of $W_{PMOS} = 350$ nm for the PMOS devices, while the corresponding value for the NMOS devices was $W_{NMOS} = 200$ nm.

<sup>1</sup> Fanout-of-4 means that the output load of the cell consists of 4 copies of the cell, where all cells have the same transistor sizing [19].

TABLE I. DIMENSIONS OF IMPLEMENTED CELLS ($L_{NMOS} = 360 \text{ nm}$, $L_{PMOS} = 240 \text{ nm}$)

| Cell Type | W<sub>NMOS</sub> [nm] | W<sub>PMOS</sub> [nm] |
|-----------|-----------------------|-----------------------|
| INV       | 200                   | 350                   |
| NAND2     | 350                   | 350                   |
| NOR2      | 200                   | 700                   |
| AND2      | 200 (200<sup>a</sup>) | 350 (350<sup>a</sup>) |
| OR2       | 200 (200<sup>a</sup>) | 700 (350<sup>a</sup>) |
| XOR2      | 200                   | 350                   |
| AOI22     | 350                   | 700                   |
| FlipFlop  | 200 (350<sup>b</sup>) | 350 (700<sup>b</sup>) |

<sup>a</sup> inverter stage, <sup>b</sup> final inverter stages

After concluding the technology exploration, we continued with the cell design of the standard cell libraries. For this purpose, we selected eight combinational and sequential cells and sized them using Cadence Virtuoso. The final results are listed in Table I.

Next, we generated the layouts and extracted the parasitics using Cadence QRC. This was followed by the library characterization using Cadence Liberate. Verification was done via synthesis of selected ISCAS85 circuits [21], their conversion into a schematic representation and their subsequent simulation. Further, we designed a physical implementation of the circuits using Cadence Innovus and executed physical design checks.

#### B. Simulation Results

After library design, we implemented several benchmark circuits and a processor using the two designed libraries and the commercial library for the same technology process. For a fair comparison, we reduced the commercial library to the same cell types available in the designed cell libraries.

We evaluated benchmarks taken from the ITC'99 suite [22], which comprises a set of sequential circuits, the EPFL benchmark suite [23], which contains representative combinational circuits, and the AMBER23 core [24]. The latter is an open-source microprocessor with a 32-bit ARM architecture and an ARMv2 instruction set. The synthesis goal was set to delay minimization without any restrictions in terms of area, power and delay. The results shown in Table II list the obtained maximum path delay (column *Delay*), the area of all applied cells (column *Cell Area*) and the power dissipation, given as total power dissipation (column *Total*), dynamic power dissipation (column *Dynamic*) and leakage power dissipation (column *Leakage*).

The results clearly indicate that using supply voltages below the threshold voltage leads to a notable reduction of the power dissipation, and thus of the energy dissipation. The average gain compared to the standard supply voltage is a factor of 37 (for $V_{DD} = 0.25$ V) and a factor of 18 (for $V_{DD} = 0.35$ V), which is close to the expected values of roughly 28 (for $V_{DD} = 0.25$ V) and 14 (for $V_{DD} = 0.35$ V) taken from the results in Fig. 3. Furthermore, the delay of the subthreshold circuits increased on average by a factor of 432 ($V_{DD} = 0.25$ V) and 79 ($V_{DD} = 0.35$ V), which is close to the expected values.
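The averages quoted above can be cross-checked against the per-circuit values in Table II. The sketch below assumes that the averages are arithmetic means of the per-circuit ratios over all thirteen circuits, including the AMBER23 core; this averaging method is an assumption, since it is not stated explicitly. The data are the total power and delay columns of Table II.

```python
# Total power [uW] from Table II as (0.25 V, 0.35 V, 1.2 V).
TOTAL_POWER = {
    "b18": (217, 332, 6828), "b20": (33, 81, 1561), "b20_1": (31, 85, 1642),
    "b21": (34, 84, 1708), "b21_1": (33, 90, 1772), "b22": (47, 72, 1369),
    "b22_1": (47, 71, 1366), "Multiplier": (6, 13, 288),
    "Mem_Ctrl": (13, 28, 396), "Adder": (0.5, 0.9, 11), "Square": (5, 10, 219),
    "Bar": (1, 2, 41), "AMBER23": (1093, 1243, 4431),
}

# Maximum path delay [ns] from Table II as (0.25 V, 0.35 V, 1.2 V).
DELAY = {
    "b18": (2828, 406, 20), "b20": (13753, 2328, 38), "b20_1": (14255, 2822, 39),
    "b21": (13410, 2585, 39), "b21_1": (13327, 2609, 39), "b22": (14173, 2430, 38),
    "b22_1": (14881, 2420, 39), "Multiplier": (36089, 6945, 56),
    "Mem_Ctrl": (15515, 2608, 17), "Adder": (6504, 1511, 40),
    "Square": (16813, 3177, 53), "Bar": (4769, 875, 4), "AMBER23": (4308, 790, 70),
}


def mean_ratio(rows, num, den):
    """Arithmetic mean over all circuits of rows[c][num] / rows[c][den]."""
    return sum(r[num] / r[den] for r in rows.values()) / len(rows)


power_gain_025 = mean_ratio(TOTAL_POWER, 2, 0)   # 1.2 V power over 0.25 V power
power_gain_035 = mean_ratio(TOTAL_POWER, 2, 1)
delay_factor_025 = mean_ratio(DELAY, 0, 2)       # 0.25 V delay over 1.2 V delay
delay_factor_035 = mean_ratio(DELAY, 1, 2)
```

Under this assumption the computed means land close to the reported factors; small deviations are to be expected because the table entries are rounded.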



Fig. 3. Comparison of maximum delay and maximum energy dissipation per transition for Inverter (a) and NAND2 (b) test circuits.

The nearly identical delays of several ITC'99 benchmarks follow from the similarities of their critical paths. Area costs decreased on average by 18 %, which is mainly due to the chosen optimization criterion that favors delay.

The results for the AMBER23 core are in line with the results obtained for the ITC'99 benchmarks. That is, using a standard cell library in the subthreshold regime reduces the power dissipation by a factor of nearly 5, while performance declines by a factor of 960.

#### V. CONCLUSIONS

For many applications in the field of the Internet-of-Things, energy is the main limiting factor, while performance plays only a minor role. Hence, there is a need for integrated systems that rigorously trade off performance against energy consumption. In the case of digital systems, this requires specific standard cell libraries that are designed for these so-called Ultra-Low Power (ULP) applications. Following this observation, we presented here a complete flow for the design of ULP standard cell libraries in the subthreshold regime. The flow considers the actual characteristics of the chosen technology as well as the specific cell sizing strategies that differ from those for circuits with standard supply voltages. Based on the proposed flow, two standard cell libraries for supply voltages below the threshold voltage have been designed and characterized in a commercial 130 nm technology. Simulation results for standard benchmark circuits and an ARM-based processor indicate that working in the subthreshold regime enables an improvement of power consumption by several factors, while performance declines by two to three orders of magnitude.

TABLE II. Results for selected ITC'99 and EPFL benchmarks and the AMBER23 core using two designed cell libraries with a supply voltage below the threshold voltage (0.25 V and 0.35 V) and a reduced commercial cell library (1.2 V)

| Circuit | Delay [ns] | | | Cell Area [mm<sup>2</sup>] | | | Total Power [µW] | | | Dynamic Power [µW] | | | Leakage Power [µW] | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| V<sub>DD</sub> | 0.25 V | 0.35 V | 1.2 V | 0.25 V | 0.35 V | 1.2 V | 0.25 V | 0.35 V | 1.2 V | 0.25 V | 0.35 V | 1.2 V | 0.25 V | 0.35 V | 1.2 V |
| b18 | 2828 | 406 | 20 | 502 | 423 | 589 | 217 | 332 | 6828 | 215 | 329 | 6809 | 2.0 | 2.5 | 19.5 |
| b20 | 13753 | 2328 | 38 | 76 | 76 | 99 | 33 | 81 | 1561 | 32 | 81 | 1557 | 0.3 | 0.5 | 3.5 |
| b20_1 | 14255 | 2822 | 39 | 79 | 78 | 104 | 31 | 85 | 1642 | 31 | 85 | 1638 | 0.3 | 0.5 | 3.6 |
| b21 | 13410 | 2585 | 39 | 83 | 80 | 105 | 34 | 84 | 1708 | 34 | 87 | 1704 | 0.3 | 0.5 | 3.7 |
| b21_1 | 13327 | 2609 | 39 | 81 | 82 | 109 | 33 | 90 | 1772 | 33 | 90 | 1768 | 0.3 | 0.5 | 3.8 |
| b22 | 14173 | 2430 | 38 | 120 | 117 | 157 | 47 | 72 | 1369 | 46 | 72 | 1637 | 0.5 | 0.7 | 5.4 |
| b22_1 | 14881 | 2420 | 39 | 120 | 119 | 161 | 47 | 71 | 1366 | 47 | 70 | 1360 | 0.5 | 0.8 | 5.6 |
| Multiplier | 36089 | 6945 | 56 | 179 | 180 | 249 | 6 | 13 | 288 | 6 | 12 | 280 | 0.7 | 1.1 | 8.5 |
| Mem_Ctrl | 15515 | 2608 | 17 | 316 | 324 | 304 | 13 | 28 | 396 | 12 | 26 | 387 | 1.1 | 1.7 | 9.3 |
| Adder | 6504 | 1511 | 40 | 10 | 8 | 7 | 0.5 | 0.9 | 11 | 0.4 | 0.8 | 11 | 0.1 | 0.1 | 0.3 |
| Square | 16813 | 3177 | 53 | 123 | 125 | 179 | 5 | 10 | 219 | 4 | 10 | 216 | 0.5 | 0.7 | 6.2 |
| Bar | 4769 | 875 | 4 | 17 | 17 | 22 | 1 | 2 | 41 | 0.9 | 2 | 41 | 0.1 | 0.1 | 0.7 |
| AMBER23 | 4308 | 790 | 70 | 211 | 213 | 264 | 1093 | 1243 | 4431 | 1093 | 1241 | 4422 | 0.9 | 1.3 | 9.1 |

#### ACKNOWLEDGMENT

The authors would like to thank the Brazilian agencies CNPq, FAPEMIG and CAPES for the financial support for this project.

#### REFERENCES

- [1] J. Myers et al., "An 80nW retention 11.7pJ/cycle active subthreshold ARM Cortex-M0+ subsystem in 65nm CMOS for WSN applications," in ISSCC, 2015.
- [2] F. Goodarzy and S. Skafidas, "Ultra-low-power wireless transmitter for neural prostheses with modified pulse position modulation", Healthcare Technology Letters, Volume: 1, Issue: 1, 1 2014.
- [3] P. Kamalinejad, K. Keikhosravy, R. Molavi, S. Mirabbasi and V. Leung, "An Ultra-Low-Power CMOS Voltage-Controlled Ring Oscillator for Passive RFID Tags", New Circuits and Systems Conference (NEWCAS), 2014.
- [4] B. Calhoun, A. Wang and A. Chandrakasan, "Modeling and sizing for minimum energy operation in subthreshold circuits", IEEE Journal Of Solid-State Circuits, Vol. 40, No. 9, 2005.
- [5] R. Sharma, "Characterization and Modeling of Digital Circuits", CreateSpace Independent Publishing Platform, 2015.
- [6] M. Blesken, S. Lütkemeier and U. Rückert, "Multiobjective Optimization for Transistor Sizing of Subthreshold CMOS Logic Standard Cells", Proceedings of 2010 IEEE International Symposium on Circuits and Systems, 2010.
- [7] M. Gibiluka, M. Moreira, W. Neto and N. Calazans, "A standard cell characterization flow for non-standard voltage supplies", 29th Symposium on integrated Circuits and Systems Design (SBCCI), 2016.
- [8] N. H. E. Weste and D. M. Harris, "CMOS VLSI Design: A Circuits and Systems Perspective", Addison Wesley, 4th ed., 2011.
- [9] Scott Hanson et al, "Ultralow-voltage, minimum-energy CMOS". IBM Journal of Research and Development, 50(4-5):469–490, 2006.

- [10] A. Agarwal, S. Mukhopadhyay, C.H. Kim, A. Raychowdhury and K. Roy, "Leakage power analysis and reduction: models, estimation and tools", Proc. IEE, v.152, n.3, p 353-368, 2005.
- [11] B. Calhoun, A. Wang and A. Chandrakasan, "Subthreshold Design for Ultra Low-Power Systems", Springer, 2006.
- [12] M. Alioto, "Ultra-Low Power VLSI Circuit Design Demystified and Explained: A Tutorial," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 59, no. 1, pp. 3-29, Jan. 2012.
- [13] P. Butzen and R. Ribas, "Leakage current in submicrometer CMOS gates", UFRS, 2007.
- [14] H. Kanitkar, "Subthreshold circuits: Design, implementation and application", Rochester Institute of Technology, RIT Scholar Works, 2008.
- [15] H. Poormina and K. Chethana, "Standard Cell Library Design and Characterization using 45nm technology", IOSR Journal of VLSI and Signal Processing, Volume 4, Issue 1, Ver. I, 2014.
- [16] M. Ning, "Ultra-low power circuit techniques for miniaturized sensor nodes", Doctoral Thesis in Electronic and Computer Systems, Sweden 2015.
- [17] M. Vohrmann et al, "A 65 nm standard cell library for ultra low-power applications", European Conference on Circuit Theory and Design (ECCTD), 2015.
- [18] J. Zhou, S. Jayapal, L. Huang, B. Büsze and J. Stuyt, "A 40 nm Dual-Width Standard Cell Library for Near/Sub-Threshold Operation", IEEE Transactions On Circuits And Systems, 2014.
- [19] D. Harris, R. Ho, G. Wei and M. Horowitz. "The Fanout-of-4 Inverter delay metric", Stanford University, 2015.
- [20] K. H. Stangherlin, "Energy and speed exploration in digital CMOS circuits in the near-threshold regime for very-wide voltage-frequency scaling", Federal University of Rio Grande do Sul, 2013
- [21] M. Hansen, H. Yalcin, and J. P. Hayes, "Unveiling the ISCAS-85 Benchmarks: A Case Study in Reverse Engineering," IEEE Design and Test, vol. 16, no. 3, pp. 72-80, July-Sept. 1999.
- [22] F. Corno, M. S. Recorda and G. Squillero, "RT-level ITC'99 Benchmarks and First ATPG Results", IEEE Design & Test of Computers, Volume 17, Issue: 3, 2005.
- [23] L. Amaru, P.-E. Gaillardon, and G. De Micheli, "The EPFL Combinational Benchmark Suite," Int'l Workshop on Logic Synth., 2015.
- [24] Amber Core Specification. Technical report, Amber Open Source Project, 2011.