# SAT-Based Post-Processing for Regional Capture Power Reduction in At-Speed Scan Test Generation

Stephan Eggersglüß\*<sup>†</sup>

Kohei Miyase<sup>‡</sup>

Xiaoqing Wen<sup>‡</sup>

\*Institute of Computer Science University of Bremen 28359 Bremen, Germany segg@informatik.uni-bremen.de

<sup>†</sup>Cyber-Physical Systems DFKI GmbH 28359 Bremen, Germany <sup>‡</sup>Kyushu Institute of Technology Iizuka, Fukuoka 820-8502 Japan {k\_miyase,wen}@cse.kyutech.ac.jp

Abstract—With more and more sophisticated low-power design techniques being applied to modern LSI chips for aggressive functional power reduction, the risk of fault-free chips falsely failing production test grows due to excessively high test power compared with functional power. Existing low-power ATPG methods, however, suffer from severe test data inflation and often use unfocused global test power reduction. This paper proposes a novel optimization-SAT-based at-speed scan test generation method that is explicitly targeted at eliminating high-capturepower test vectors in a pre-generated compact test set. This method employs layout information in reducing capture switching activity in a focused regional manner. Experiments demonstrate that the proposed method can effectively eliminate a large number of high-capture-power test vectors with neither test data inflation nor fault coverage loss.

# I. INTRODUCTION

The use of sophisticated low-power design techniques becomes more and more dominant in modern LSI chips to reduce the functional power in order to meet the power specification. In turn, this increases the risk of fault-free chips falsely failing the production test resulting in yield loss. Generally, the power dissipation during test is much higher than the functional power. Therefore, there is a strong need to prevent tests to exceed the functional power budget and cause false failures [1].

Typically, there are two types of power dissipation during at-speed scan testing. Shift power is the power dissipated during loading the test vectors in shift cycles. Capture power is the power dissipated during the capture cycles when the test vectors are applied at speed. This work is concerned with capture power only.

Several techniques exist for the reduction of capture power during at-speed scan testing. A widely-used approach to reducing the capture power is X-filling [2]–[6]. Here, don't care bits are filled with specific values in order to prevent transitions and, by this, reduce capture power. The disadvantage of Xfilling techniques is that bit assignments done by ATPG for fault detection cannot be changed anymore. Therefore, their effectiveness is limited and strongly depends on the ATPG process. On the other hand, dedicated ATPG techniques to produce low-capture-power tests as proposed in [7], [8] suffer from degraded ATPG performance and test data inflation.

A recent approach [9] integrates information about the applied test compression hardware into a compression-constrained ATPG process for low transition counts. Other techniques [10], [11] use hardware structures to disable certain parts of the circuit, e.g. scan chains and clock gating. However, they often suffer from increased design difficulty,

fault coverage degradation and/or test data inflation. The work in [12] uses functional test programs to determine a maximal threshold for test power consumption.

Conventionally, global capture power reduction is performed by representing the switching activity of the complete circuit with a single metric such as *Weighted Switching Activity* (WSA) [13]. However, the benefit of global reduction is questionable since it cannot prevent concentrated switching activity in neighboring areas which may still cause failures. Additionally, global capture power reduction typically leads to severe test data inflation [14], since the search space is highly constrained.

Post-ATPG approaches exist which identify capture-powerrisky test vectors, i.e. tests whose WSA values exceed a certain global WSA threshold. These capture-power-risky vectors are either modified or replaced with new capture-power-safe tests. The approach in [15] discards all (global) capture-power-risky tests and tries to detect the remaining faults by specifying the X-bits of the capture-power-safe vectors. Afterwards, the remaining undetected faults are targeted by a commercial ATPG tool.

More sophisticated methods take the neighboring areas of paths or layout information into account in capture power reduction. By this, the switching activity can be reduced in a more focused manner. The work in [16] takes the powergrid into account and identifies localized switching activity in a given test set. Violating vectors are removed and a PODEMbased low-power ATPG algorithm is used to generate new patterns for these faults. This process results in extra patterns as well as fault coverage loss due to the increased complexity.

The approaches in [14], [17] try to reduce the switching activity of neighboring areas of long paths by using X-filling or X-restoration techniques. The work in [18] uses layout information and arranges the gates in regions. A compaction heuristic is then used to merge test cubes so that the switching activity is evenly distributed. The approach in [5] uses layout information to identify regions with high switching activity for a given test set. An X-filling method is applied which takes spatial information into consideration to reduce switching activity region-by-region. Similarly, the approach in [19] takes layout information into account to minimize the effect of power supply noise using LP-based X-filling techniques.

This paper proposes a new ATPG procedure which is able to eliminate *High-Capture-Power* (HCP) test vectors of a given highly compacted test set in a post-ATPG process. This is done by replacing the HCP test with a newly generated test with reduced switching activity below a certain threshold. The application as a post-ATPG process has the advantage that this approach can be used in addition to regular ATPG using all dedicated and fast techniques to produce a compact test set.

Although the approach works also for global capture power reduction, the method particularly focuses on regional information which is integrated into the ATPG process. By this, the switching activity is reduced in a more focused manner compared to the unfocused global reduction. In comparison, global WSA reduction often cannot reduce regional highly concentrated switching activity.

Regional HCP tests, i.e. tests with excessively high switching activity in some circuit areas, are identified and effectively retargeted by the proposed ATPG. The problem is formulated as an optimization-SAT problem so that robust solving engines can be used. In contrast to conventional X-filling methods, the proposed ATPG engine is able to modify certain bits assigned for detecting faults to obtain more focused reduction in capture-power violating regions. Since a larger search space is used than in the X-filling-based approaches, more effective capture power reduction is expected.

Using this novel region-based ATPG procedure, a large number of HCP tests can be eliminated without any increase of the pattern count and without any loss of fault coverage. Therefore, this can be flexibly used to improve the power characteristics of a pre-generated test set.

The rest of the paper is organized as follows. Preliminaries are given in Section II. The integration of layout information is presented in Section III. The overall low-capture-power ATPG flow and problem formulation is described in detail in Section IV. Experimental results are given in Section V and Section VI draws conclusions.

# II. PRELIMINARIES

At-speed delay testing necessitates the consideration of two consecutive time frames  $t_1$  and  $t_2$ . A first test vector  $v_1$  is applied to provide initial values. A second test vector  $v_2$  is then applied at-speed and the values are captured at the outputs. For launch-on-capture scan testing,  $v_2$  is a direct result of  $v_1$ . Therefore, we refer to a test vector v. In order to measure the switching activity during test, the WSA metric is widely used. Each signal in the circuit is associated to a WSA node. The WSA value of a node is the number of state changes between  $t_1$  and  $t_2$  multiplied by (1 + N), where N is the number of fanouts. The WSA value of the circuit is the sum of all WSA of all nodes.

Each test vector v in a generated delay test set can be evaluated with respect to the amount of switching activity, i.e. WSA. Given a WSA threshold WT, a test vector  $v_i$  is said to be a *High-Capture-Power* (HCP) test vector if the WSA value of  $v_i$  exceeds WT, i.e. WSA $(v_i) > WT$ .

Next, *Boolean Satisfiability* (SAT) techniques are briefly introduced, since these techniques are used in order to eliminate HCP test vectors.

In the last decade, SAT-based algorithms gained much attention in the field of testing, e.g. [20]–[22]. The advantage of the application of a SAT-based algorithm is the effectiveness of the underlying reasoning engine, i.e. a SAT solver, in solving hard problems. In order to apply a SAT solver to a problem, the problem has to be formulated as a Boolean formula (SAT instance) in *Conjunctive Normal Form* (CNF). A CNF  $\Phi$  is a

| у6 | R <sub>x1y6</sub> | R <sub>x2y6</sub> | R <sub>x3y6</sub> | R <sub>x4y6</sub> | R <sub>x5y6</sub> | R <sub>x6y6</sub> | R <sub>x7y6</sub> |
|----|-------------------|-------------------|-------------------|-------------------|-------------------|-------------------|-------------------|
| y5 | R <sub>x1y5</sub> | R <sub>x2y5</sub> | R <sub>x3y5</sub> | R <sub>x4y5</sub> | R <sub>x5y5</sub> | R <sub>x6y5</sub> | R <sub>x7y5</sub> |
| y4 | R <sub>x1y4</sub> | R <sub>x2y4</sub> | R <sub>x3y4</sub> | R <sub>x4y4</sub> | R <sub>x5y4</sub> | R <sub>x6y4</sub> | R <sub>x7y4</sub> |
| у3 | R <sub>x1y3</sub> | R <sub>x2y3</sub> | R <sub>x3y3</sub> | R <sub>x4y3</sub> | R <sub>x5y3</sub> | R <sub>x6y3</sub> | R <sub>x7y3</sub> |
| y2 | R <sub>x1y2</sub> | R <sub>x2y2</sub> | R <sub>x3y2</sub> | R <sub>x4y2</sub> | R <sub>x5y2</sub> | R <sub>x6y2</sub> | R <sub>x7y2</sub> |
| y1 | R <sub>x1y1</sub> | R <sub>x2y1</sub> | R <sub>x3y1</sub> | R <sub>x4y1</sub> | R <sub>x5y1</sub> | R <sub>x6y1</sub> | R <sub>x7y1</sub> |
|    | x1                | x2                | x3                | x4                | x5                | x6                | x7                |

Fig. 1. Example Circuit Partitioning

conjunction of *n* clauses  $\omega_1, \ldots, \omega_n$  and a clause is disjunction of *m* literals  $\lambda_1, \ldots, \lambda_m$ . A literal is a Boolean variable *x* in its positive (*x*) or negative form ( $\overline{x}$ ) form. The SAT solver computes a solution of the CNF. A satisfying solution is an assignment which satisfies each clause  $\omega_i$ . A clause is satisfied if at least one literal  $\lambda_j$  in clause  $\omega_i$  is satisfied. If no such solution exists, the SAT solver proves the SAT instance is unsatisfiable.

In addition to pure SAT-based algorithms, solvers for optimization-SAT are also available. While a SAT solver computes an arbitrary solution, i.e. the first found solution, optimization-SAT solvers accept an optimization function  $\mathcal{F}$ . This function is used to assess the quality of the solution and, by this, guide the search towards a solution which optimizes  $\mathcal{F}$ . Generally, the optimization function is given in the form of a linear sum. The function is formulated over a set of Boolean variables  $x_1, \ldots, x_k$  included in the SAT instance  $\Phi$ . Each variable  $x_i$  is associated with a constant value  $c_i$ , i.e. a weight:

$$\mathcal{F} = \sum_{i=1}^{k} c_i \cdot x_i$$

The optimization function is evaluated in an arithmetic way, i.e. it sums up all constant values associated with variables assigned to *true*. An optimization-SAT solver generates solutions in an iterative way by improving the result of  $\mathcal{F}$ . Effective learning techniques are used to quickly traverse the complete search space until the optimal solution is found and proven to be optimal. If the solver runs into a timeout, the best solution found so far is still available.

# III. INTEGRATION OF REGIONAL INFORMATION

Global WSA is often used to decide whether a test vector v is a HCP test vector. However, the switching activity during application of v can be unevenly distributed across the circuit. A non-HCP test vector may therefore be able to cause concentrated switching activity in some circuit regions. This may lead to problems during the test, e.g. high local IR-drop causing excessive path delay. Several approaches exist, which integrate layout information in order to prevent high switching activity in neighboring areas, e.g. [5], [16], [18], [19].

Similar to these approaches, the circuit die is partitioned into a set of regions R in our approach. An example of such a partitioning is shown in Figure 1. After placement and routing, layout information is collected from a *Design Exchange*  *Format* (DEF) file commonly used in industrial design flows. Each region  $R_{xy} \in R$  is defined by two coordinates x, y. A coordinate  $x_i (y_i)$  denotes a range on the circuit die from  $x_{i-1} + 1$  to  $x_i (y_{i-1} + 1 \text{ to } y_i)$  with  $x_0 = -1$  and  $y_0 = -1$ . Using the data collected from the DEF file, each gate/cell can be mapped directly to one region in the circuit. The mapping needs only to be done once in a pre-process.

After a test set has been generated, the WSA values for each test vector v are determined. Besides the global WSA value of v, a regional WSA analysis is also used to determine the WSA value of each region  $R_{xy} \in R$  denoted by *rWSA*. Similar to the global WSA which is defined as the sum of the WSA values of all nodes in the circuit, the regional WSA *rWSA*<sub>Rxy</sub> is defined as the sum of the WSA values of all nodes in the region  $R_{xy}$ .

Analogously to a global WSA threshold WT, a Regional WSA Threshold RWT can be defined. The RWT can be either defined for all regions or individually for each region. In this work, we use the same RWT for each region. However, the extension to individual RWTs is straight-forward. We further define that a HCP region of v is a region whose regional WSA under v exceeds the RWT: rWSA > RWT. Any test vector v with at least one HCP region under v is therefore a (regional) HCP test vector.

#### IV. OPTIMIZATION-BASED LOW-CAPTURE-POWER ATPG

This section presents the proposed optimization-based *Low-Capture-Power* (LCP) ATPG approach. The approach is applied as a post-process applied to a pre-generated transition fault test set, e.g. by a conventional ATPG tool. Section IV-A shows the overall flow of the proposed methodology to improve the power characteristics of a test set. Section IV-B gives the details concerning the formulation as an optimization-SAT problem which is the key of the effectiveness of the approach.

# A. Overall Retargeting Process

The aim of the process is to eliminate HCP test vectors as much as possible from a given pre-generated test set by replacement. This method aims for the improvement of highly compacted test sets. Therefore, no test data inflation is caused by the proposed methodology. In addition, the fault coverage is retained as well.

The overall flow is depicted in Figure 2. First, the given test set T is analyzed with respect to capture-power violations. HCP tests, i.e. tests with switching activity exceeding a certain threshold, are extracted and denoted by  $T_p$  in the following. The identification of the HCP tests can be done globally and regionally. The goal is to replace each test  $t_p \in T_p$  by a newly generated test  $t_p^*$  with reduced capture power and without fault coverage loss.

All HCP tests  $t_p \in T_p$  are processed in a loop. One test  $t_p$  is selected at a time. As a first step, all essential faults  $F_e^{t_p}$  of  $t_p$  are identified. An essential fault  $f_e$  is a fault which is detected by  $t_p$  but not by any other test in T [23]. That means, if  $t_p$  is to be replaced without any fault coverage loss, a new test has to detect all faults  $f_e \in F_e^{t_p}$ . Essential faults can be determined by fault simulation techniques and maintaining detection counters for each fault.

Additionally, all HCP regions  $R_p$  are identified, which violate their switching activity threshold, i.e. those regions which cause  $t_p$  to be a HCP test vector.

Both essential faults  $F_e^{t_p}$  and HCP regions  $R_p$  are given to the new optimization-based LCP ATPG engine. This engine uses optimization-SAT techniques as underlying reasoning engine. The problem is formulated such that the solution space consists of potential tests while an optimization function is used to guide the reasoning engine to find the best test in this solution space. The information given to the ATPG engine is used in the following way. A detailed description of the problem formulation is given in Section IV-B.

- Essential faults  $F_e^{t_p}$  The detection of all essential faults is given as a necessary condition. A SAT instance  $\Phi_{F_e}$  is formulated such that any satisfying solution is a test which detects all essential faults  $f_e \in F_e^{t_p}$ .
- HCP regions  $R_p$  The violated regions are incorporated into an optimization function  $\mathcal{F}_r$  as target regions. The task of the optimization function is to assess the quality of each solution, i.e. test, by a single value. Here, the quality is given by the amount of switching activity in the violated regions, i.e. the WSA value. The lower the switching activity is, the higher the quality of the solution.

The ATPG engine yields a test  $t_p^*$ , which detects all former essential faults and, at the same time, reduces the switching activity of the HCP regions as much as possible. In contrast to X-filling methods, the ATPG engine is able to explicitly select assignments which are beneficiary for WSA reduction in a very focused manner. In order to prevent new violated regions from appearing, the values not determined by the ATPG, i.e. the remaining X-values, are copied from the previous test  $t_p$ .

After the new improved test  $t_p^*$  has been generated, the test is analyzed with respect to switching activity. If it improves the power characteristics compared to  $t_p$ , i.e. no or a smaller number of regions are violated,  $t_p$  is replaced by  $t_p^*$ . Note that each new test has to be fault simulated to update the fault detection counters in order to obtain the correct set of essential faults.

It has also been observed that an iterative application of this loop is able to further eliminate HCP test vectors. This is possible since the set of essential faults for each test can be changed because of newly generated tests and their additional fault detections.

*Example 1:* The problem is illustrated in Figure 3. Although two time frames are generally necessary for transition testing, only one is depicted for the sake of simplicity. The analysis of one HCP test vector provides six essential faults  $f_1, \ldots, f_6$  which are marked with an "X" in the figure. Furthermore, two regions  $R_1$  and  $R_2$  are violated. All six faults and the HCP regions  $R_1$  and  $R_2$  are given to the LCP ATPG. A test is generated which detects  $f_1, \ldots, f_6$  and, at the same time, minimizes the amount of switching in  $R_1$  and  $R_2$ . The necessary values are updated as shown at the inputs of the circuits.

# B. Optimization-SAT formulation

This section describes the optimization-SAT problem formulation for the proposed LCP ATPG in detail. The formulation is partially similar to the multiple-target ATPG formulation which has also been used in [24] to improve test compaction.



Fig. 2. Low-Capture-Power ATPG Retargeting Flow



Fig. 3. Illustration of Target Regions

The formulation is divided into two parts: a SAT instance  $\Phi_t$  representing the solution space of all tests detecting a given fault set  $F_e$  and an optimization function  $\mathcal{F}_r$  which is used to guide the reasoning engine to find the best solution.

First, it is described how to construct a SAT instance  $\Phi_t$ for multiple-target fault detection for all faults  $f_e \in F_e$ . Since transition fault testing is conducted, two time frames  $t_1, t_2$ of the circuit have to be considered. That means, two Boolean variables  $x_1$  and  $x_2$  are assigned to each signal representing the logic value of  $t_1$  and  $t_2$ , respectively. Based on these variables, a CNF representing the logic function is created for each gate gfor both time frames.<sup>1</sup> Additionally, fault detection constraints for all target faults  $f_e \in F_e$  are added in CNF to  $\Phi_t$ . Detailed information on how to formulate these constraints can be found in [24].

The SAT instance  $\Phi_t$  is guaranteed to be satisfiable. This is because the target faults have already been shown to be compatible in the original test vectors. Typically, there is not only one solution to  $\Phi_t$ , i.e. one test, which detect all faults but multiple solutions. The challenge is to determine the best solution for the desired application, i.e. capture power reduction. Therefore, the optimization function  $\mathcal{F}_r$  is used to rate a solution and, by this, directs the search to a better solution until the best solution is found. Optimization-SAT solvers internally use effective techniques, e.g. learning, to traverse the search space.

In order to be able to automatically assess the quality during the solving process, the SAT instance  $\Phi_t$  has to be augmented by additional implications. The ultimate goal of the search is the reduction of the switching activity in the violated target regions  $R_p$ . First, all signals  $x_r$  in the target regions are identified and denoted by  $S_r$ . In order to automatically measure the switching activity during the reasoning, an additional Boolean variable  $x_S$  is assigned to each signal  $x_r$  in  $S_r$ . The variable denotes whether there is a transition on this signal between  $t_1$  and  $t_2$  or not. The following implications have to be added in CNF to ensure the correct assignment of each signal  $x_S \in S_r$ .

- $x_1 = 0 \land x_2 = 0 \rightarrow x_S = 0$
- $x_1 = 1 \land x_2 = 1 \rightarrow x_S = 0$
- $x_1 = 0 \land x_2 = 1 \rightarrow x_S = 1$
- $x_1 = 1 \land x_2 = 0 \rightarrow x_S = 1$

The implications are unidirectional and do not alter the original solution space. In fact, the assignments of the  $x_S$  variables are solely derived by implication and represent additional information which can be stored implicitly in the variable assignments. Note that it may happen that there are violated regions which are not included in the relevant circuit part of the target faults. Then, the circuit part or the SAT instance, respectively, has to be extended by those parts necessary for justification of these regions.

The variables  $x_S^1, \ldots, x_S^n$  are then used to formulate the optimization function in the form of a linear sum:

$$\mathcal{F}_r = c_1 * x_S^1 + c_2 * x_S^2 + \ldots + c_{n-1} * x_S^{n-1} + c_n * x_S^n$$

In this function, each variable  $x^S$  is associated with a constant value c. The result of the function is the accumulation of all

<sup>&</sup>lt;sup>1</sup>Note that only signals and gates relevant for target fault detection have to be considered, which is typically only a small portion of the complete circuit.

TABLE I. EXPERIMENTAL RESULTS - LCP ATPG

|            |      |         |        |          | Global LCP |        |       |             |             | Region/Global LCP |       |           | Region LCP |       |           |
|------------|------|---------|--------|----------|------------|--------|-------|-------------|-------------|-------------------|-------|-----------|------------|-------|-----------|
|            |      |         | WSA T  | hreshold | global     | region |       | Rem. global | Rem. region |                   |       | Remaining |            |       | Remaining |
| Circ       | #Pat | #Region | global | region   | #HCP       | #HCP   | Time  | #HCP new    | #HCP new    | #HCP              | Time  | #HCP new  | #HCP       | Time  | #HCP new  |
| s15850_lay | 157  | 64      | 2500   | 155      | 56         | 23     | 11.2  | 0           | 12          | 23                | 4.5   | 1         | 23         | 0.6   | 0         |
| s35932_lay | 129  | 196     | 14000  | 140      | 32         | 40     | 30.2  | 0           | 42          | 40                | 44.8  | 14        | 40         | 8.9   | 1         |
| s38417_lay | 197  | 156     | 11800  | 200      | 32         | 35     | 25.5  | 0           | 35          | 35                | 26.5  | 1         | 35         | 10.5  | 2         |
| s38584_lay | 325  | 140     | 7500   | 160      | 51         | 74     | 39.2  | 0           | 58          | 74                | 85.2  | 4         | 74         | 20.0  | 2         |
| b11_lay    | 147  | 16      | 800    | 105      | 20         | 22     | 0.1   | 3           | 15          | 22                | 0.2   | 3         | 22         | 0.1   | 1         |
| b12_lay    | 452  | 36      | 900    | 110      | 17         | 21     | 0.2   | 0           | 18          | 21                | 0.2   | 2         | 21         | 0.9   | 1         |
| b13_lay    | 54   | 30      | 500    | 50       | 2          | 0      | 0.1   | 0           | 0           | 0                 | -     | 0         | 0          | -     | 0         |
| b14_lay    | 934  | 132     | 9500   | 220      | 52         | 176    | 22.9  | 0           | 148         | 176               | 121.0 | 5         | 176        | 27.9  | 1         |
| b15_lay    | 2018 | 225     | 6500   | 280      | 397        | 76     | 245.0 | 3           | 73          | 76                | 51.1  | 7         | 76         | 10.1  | 7         |
| b17_lay    | 2192 | 441     | 20500  | 340      | 72         | 73     | 100.5 | 0           | 67          | 73                | 291.8 | 14        | 73         | 45.0  | 11        |
| b20_lay    | 1030 | 289     | 17800  | 280      | 103        | 126    | 97.8  | 0           | 122         | 126               | 156.9 | 5         | 126        | 73.3  | 0         |
| b21_lay    | 1074 | 289     | 17800  | 280      | 133        | 151    | 108.8 | 0           | 129         | 151               | 212.0 | 7         | 151        | 83.8  | 0         |
| b22_lay    | 1065 | 342     | 20000  | 290      | 350        | 249    | 592.2 | 7           | 197         | 249               | 601.1 | 16        | 249        | 345.8 | 5         |
| total      |      |         |        |          | 1317       | 1066   |       | 13          | 916         | 1066              |       | 79        | 1066       |       | 31        |

constants  $c_i$  whose associated variable  $x_S^i$  is assigned with 1. As a constant value, the number of fanouts of the associated signal is used to optimize the assignment with respect to WSA.

Using this optimization function  $\mathcal{F}_r$ , the quality of the solution corresponds to the WSA value of the target regions. The optimization function of an optimization-SAT solver is typically used for minimization. That means, the result of the solving process is the solution which provides the smallest WSA value and, at the same time, detects all target faults  $F_e$ . Obviously, the best solution would be to assign the value 0 to all  $x_S$  variables in the sum. However, this is typically not possible due to the fault detection constraints which necessitate a certain amount of switching to detect all target faults. Therefore, the test vector causing the minimum of switching activity in the target regions is directly computed by the underlying reasoning engine.

#### V. EXPERIMENTAL RESULTS

This section presents the experimental results of the proposed approach. The approach was implemented in C++ and the publicly available optimization-SAT solver clasp [25] was used as underlying reasoning engine. The experiments were performed on Intel Xeon E3-1240 (GNU/Linux, 64bit, 32GByte RAM, 3.4 GHz) in single-threaded mode. The circuits were routed and placed using a commercial tool to obtain the necessary layout information. The proposed approach was carried out on a pre-generated highly compacted test set generated by a SAT-based ATPG algorithm [24] using preferred-fill techniques [2]. The transition fault model (launch-on-capture) was used as fault model. Layout information in DEF was used to arrange the signals/gates in regions.

Table I gives the experimental data. The circuit name is given in column *Circ*. Note that the netlist may differ from the original benchmark netlist due to optimization done by the layout compiler. The number of tests of the target test set is presented in column *#Pat*. The number of regions (*#Region*), the global WSA threshold (*global*) as well as the region WSA threshold (*region*) are user-defined parameters. In order to obtain suitable thresholds, functional simulation of thousands of clock cycles was carried out on the netlists. Based on the WSA values obtained by this simulation, the thresholds were determined. Three different configurations were used during the experiments:

- *Global LCP ATPG* All test vectors exceeding the global WSA threshold *WT* were retargeted by using global WSA minimization, i.e. the number of regions is 1.
- *Region/Global LCP ATPG* All test vectors exceeding the region WSA threshold *RWT* were retargeted.

However, global WSA minimization is carried out on these tests for reasons of comparison.

• *Region LCP ATPG* – All tests exceeding the region WSA threshold *RWT* were retargeted. For eliminating the violated HCP regions, these are targeted directly in a focused manner as described in the previous section.

For the Global LCP ATPG, the number of the global HCP test vectors (column global #HCP) are given as well as the number of the region HCP test vectors (column region #HCP). The latter values are only given for reasons of analysis, but the tests are not targeted during the LCP ATPG. The LCP ATPG run time is given in column Time in CPU seconds. The results show that the global WSA minimization is very effective. Column Rem. global #HCP new presents the number of HCP test vectors after the LCP ATPG application. Nearly all global HCP test vectors, which violate the global WSA threshold WT can be eliminated and be replaced by a nonviolating test vector. The LCP ATPG is able to reduce the global switching activity. However, it is also shown that the switching activity reduction is done in an unfocused manner. Column Rem. region #HCP new presents the number of tests (after LCP ATPG application), which violate the regional WSA threshold RWT of at least one region. The number of regionviolating HCP test vectors is still very high. Only a few of them could be eliminated. This is due to the fact that many regionviolating HCP tests do not exceed the global WSA threshold. The general switching activity of these tests is therefore low but regionally concentrated which could lead to problems during testing. For s35932\_lay, there is even an increase of the region-violating HCP test vectors.

For example, the analysis of the test set for b22\_lay shows that the region HCP test with the lowest global WSA has a global WSA value of only 12262 which is far below average compared to the other tests of this circuit. However, one region is violated with a region WSA value of 321 using this test. Results for this kind of detailed analysis cannot be provided due to page limitation.

Next, the result of the *Region/Global LCP ATPG* and the *Region LCP ATPG* is discussed. Here, the region-violating HCP tests are retargeted by global WSA minimization. In contrast to the *Global LCP ATPG*, a large number of the region HCP tests can be eliminated by the proposed procedure, since these are directly targeted. This clearly shows that a regional power analysis of the test set is needed. However, the results of the global WSA minimization can be further improved by the proposed *Region-based LCP ATPG*. If the regional information is used during the LCP ATPG process, the WSA reduction can be applied in a more focused manner.

Only 3% of the initial 1066 HCP test vectors remain above the threshold *RWT*. The region-based LCP ATPG not only eliminates more HCP test vectors but also needs substantially less run time. This is because the optimization function of the global WSA minimization is much larger than the regional one. This leads to a significant run time increase. Generally, this approach scales very well and the run time depends on the number of HCP test vectors.

In summary, the experiments show the effectiveness of the LCP ATPG approach. Most of the global as well as region high-capture-power test vectors can be eliminated without increasing the pattern count or decreasing the fault coverage. It is further shown that the global switching activity is not a good indicator for region HCP test vectors and an LCP ATPG approach should be applied targeting violating regions directly. Although the majority of HCP test vectors could be eliminated, very few remain. Future work will be the development of a LCP ATPG approach, which is able to directly target the remaining faults detected by these tests and produce a compact set of additional capture-power-safe tests as replacement. Another work is the integration of more specific target metrics such as PWSA as used in [26] and the application in a test-compression based design.

# VI. CONCLUSIONS

This paper presents a new Low-Capture-Power (LCP) ATPG approach for retargeting pre-generated compact test sets to improve their power characteristics. Previous approaches typically suffer from test data inflation while reducing the switching activity. The proposed ATPG formulation as an optimization-SAT problem is able to considerably reduce the switching activity either globally or regionally. Regional information from the layout is directly incorporated in the ATPG process in order to target HCP regions directly. In contrast to conventional X-filling methods, the ATPG procedure is able to assign bits more freely due to the larger search space. By this, most of the High-Capture-Power (HCP) tests can be eliminated without any pattern count increase and without any fault coverage loss. It can also be concluded that the global unfocused switching activity reduction is not able to prevent concentrated switching activity in neighboring areas. Focused switching power reduction as done in this work is clearly needed to obtain power-safe test sets. Future work is the integration of more specific metrics as well as the development of an LCP ATPG approach to directly produce capture-powersafe tests.

#### VII. ACKNOWLEDGMENT

The work of S. Eggersglüß has been supported by the Institutional Strategy of the University of Bremen, funded by the German Excellence Initiative and by the German Research Foundation (DFG) under contract number EG 290/5-1. The work of K. Miyase and X. Wen has been supported by JSPS Grant-in-Aid for Scientific Research (B) #25280016 and JSPS Grant-in-Aid for Scientific Research on Innovative Areas #15K12003

#### References

- P. Girard, "Survey of low-power testing of VLSI circuits," *IEEE Design* & Test of Computers, vol. 19, no. 3, pp. 82–92, 2002.
- [2] S. Remersaro, X. Lin, Z. Zhang, S. M. Reddy, I. Pomeranz, and J. Rajski, "Preferred fill: A scalable method to reduce capture power for scan based designs," in *International Test Conference*, 2006, pp. 1–10.
- [3] E. K. Moghaddam, J. Rajski, S. M. Reddy, and M. Kassab, "At-speed scan test with low switching activity," in VLSI Test Symposium, 2010, pp. 177–182.

- [4] X. Wen, K. Miyase, T. Suzuki, S. Kajihara, Y. Ohsumi, and K. K. Saluja, "Critical-path-aware X-filling for effective IR-drop reduction in at-speed scan testing," in *Design Automation Conference*, 2007, pp. 527–532.
- [5] W.-W. Hsieh, S.-L. Chen, I.-S. Lin, and T. Hwang, "A physical-locationaware X-filling method for IR-drop reduction in at-speed scan test," *IEEE Transactions on Computer-Aided Design of Integrated Circuits* and Systems, vol. 29, no. 2, pp. 289–298, 2010.
- [6] S. Eggersglüß, "Dynamic X-filling for peak capture power reduction for compact test sets," *Journal of Electronic Testing: Theory and Applications*, vol. 30, no. 5, pp. 557–567, 2014.
- [7] X. Wen, S. Kajihara, K. Miyase, T. Suzuki, K. Saluja, L.-T. Wang, K. Abdel-Hafez, and K. Kinoshita, "A new ATPG method for efficient capture power reduction during scan testing," in *VLSI Test Symposium*, 2006, pp. 58–65.
- [8] M.-F. Wu, K.-S. Hu, and J.-L. Huang, "LPTest: a flexible low-power test pattern generator," *Journal of Electronic Testing: Theory and Applications*, vol. 25, no. 6, pp. 323–335, 2009.
- [9] A. Kumar, M. Kassab, E. Moghaddam, N. Mukherjee, J. Rajski, S. M. Reddy, J. Tyszer, and C. Wang, "Isometric test compression with low toggling activity," in *International Test Conference*, 2014, pp. 1–7.
- [10] K. Chakravadhanula, V. Chickermane, B. Keller, P. Gallagher, and P. Narang, "Capture power reduction using clock gating aware test generation," in *International Test Conference*, 2009, pp. 1–9.
- [11] L.-J. Lee, "Observation-oriented ATPG and scan chain disabling for capture power reduction," *Journal of Electronic Testing: Theory and Applications*, vol. 29, no. 5, pp. 625–634, 2013.
- [12] A. Touati, A. Bosio, L. Dilillo, P. Girard, A. Virazel, P. Bernardi, and M. Sonza Reorda, "Exploring the impact of functional test programs re-used for power-aware testing," in *Design, Automation and Test in Europe*, 2015, pp. 1277–1280.
- [13] S. Wang and S. K. Gupta, "ATPG for heat dissipation minimization during test application," in *International Test Conference*, 1994, pp. 250–258.
- [14] X. Wen, Y. Nishida, K. Miyase, S. Kajihara, P. Girard, M. Tehranipoor, and L.-T. Wang, "On pinpoint capture power management in at-speed scan test generation," in *International Test Conference*, 2012, pp. 1–8.
- [15] Y.-H. Li, W.-C. Lien, I.-C. Lin, and K.-J. Lee, "Capture-power-safe test pattern determination for at-speed scan-based testing," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 33, no. 1, pp. 127–138, 2014.
- [16] V. R. Devanathan, C. P. Ravikumar, and V. Kamakoti, "Glitch-aware pattern generation and optimization framework for power-safe scan test," in VLSI Test Symposium, 2007, pp. 167–172.
- [17] K. Enokimoto, X. Wen, Y. Yamato, K. Miyase, H. Sone, S. Kajihara, M. Aso, and H. Furukawa, "CAT: a critical-area-targeted test set modification scheme for reducing launch switching activity in at-speed scan testing," in *IEEE Asian Test Symposium*, 2009, pp. 99–104.
- [18] J. Lee and M. Tehranipoor, "Layout-aware transition-delay fault pattern generation with evenly distributed switching activity," *Journal of Low Power Electronics*, vol. 4, no. 3, pp. 1–12, 2008.
- [19] S. Kiamehr, F. Firouzi, and M. B. Tahoori, "A layout-aware X-filling approach for dynamic power supply noise reduction in at-speed scan testing," in *IEEE European Test Symposium*, 2013, pp. 52–57.
- [20] S. Eggersglüß and R. Drechsler, "Efficient data structures and methodologies for SAT-based ATPG providing high fault coverage in industrial application," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, vol. 30, no. 9, pp. 1411–1415, 2011.
- [21] A. Czutro, M. Sauer, I. Polian, and B. Becker, "Multi-conditional SAT-ATPG for power-droop testing," in *IEEE European Test Symposium*, 2012, pp. 1–6.
- [22] M. Sauer, S. Reimer, T. Schubert, I. Polian, and B. Becker, "Efficient SAT-based dynamic compaction and relaxation for longest sensitizable paths," in *Design, Automation and Test in Europe*, 2013, pp. 448–453.
- [23] J.-S. Chang and C.-S. Lin, "Test set compaction for combinational circuits," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 14, no. 11, pp. 1370–1378, 1995.
- [24] S. Eggersglüß, K. Schmitz, R. Krenz-Bååth, and R. Drechsler, "Optimization-based multiple target test generation for highly compacted test sets," in *IEEE European Test Symp.*, 2014, pp. 1–6.
- [25] M. Gebser, B. Kaufmann, A. Neumann, and T. Schaub, "Conflict-driven answer set solving," in *International Joint Conference on Artificial Intelligence*, 2007, pp. 386–392.
- [26] F. Bao, M. Tehranipoor, and H. Chen, "Worst-case critical-path delay analysis considering power-supply noise," in *IEEE Asian Test Sympo*sium, 2013, pp. 37–42.