# Reliability Aware AMS/RF Performance Optimization

Pietro M. Ferreira<sup>1</sup>, Hao Cai<sup>2</sup>, Lirida Naviner<sup>2</sup>

<sup>1</sup> IEMN, UMR CNRS 8520 - Dep. DHS, Université de Lille 1, France; <sup>2</sup> LTCI-CNRS-UMR 5141 – Dep. COMELEC, TELECOM ParisTech, France

# ABSTRACT

Reliability has become an important issue as CMOS technology continues to scale down. Exploring the technology limits with classic performance optimization techniques leads to the best trade-off among die area, power consumption, and speed. Nevertheless, these key characteristics degrade under continuous use in a stressful environment. Thus, circuit reliability emerges as a design criterion for AMS/RF performance optimization. Aiming at design for reliability, this chapter presents an overview of unreliability phenomena in CMOS. Reliability-aware methodologies for circuit design, simulation, and optimization are reviewed. We focus in particular on large and complex systems, providing circuit design insights to achieve a reliability specification from system level to transistor level. We highlight the most sensitive building blocks in a CT $\Sigma\Delta$ modulator and demonstrate how performance is affected by unreliability phenomena. A system-level direct-conversion RF front-end design is described in a top-down approach. Electrical simulations are presented for a 65 nm CMOS technology.

# INTRODUCTION

With the move towards nanoscale integrated circuits (ICs), modern IC technology faces new reliability challenges. The next generation of analog mixed-signal (AMS) and radio frequency (RF) circuits will encounter an increasing failure rate throughout the circuit operation time. This drawback reduces circuit lifetime as the price of increasing IC performance. The resulting design challenge motivates the research field of reliability-aware AMS/RF performance optimization.

Exploiting the technology limits, classic design methodologies often target the basic design criteria: die area, power consumption, and speed. The optimum is the design point that meets the specified performance. As elusive physical phenomena emerged, designers started to add design margins in order to guarantee the specified performance. Such design techniques lead to a non-optimal circuit because of exaggerated redundancy, overdesigned margins, and postulated design recommendations that lack insight into the underlying physical phenomena.

The increasing IC variability has proved large enough that many chip samples show performance far from the specification. This phenomenon is known as yield reduction. The yield can be defined as the ratio of the chip samples that meet the design specifications to all chip samples at the end of a complete production process. However, the yield concept cannot measure the number of chip samples that still meet the design specifications after continuous use under a known environmental condition.

During IC operation time, a number of physical phenomena may affect circuit performance by generating transient faults. Harsh environments present a threat to ICs, because transistors are highly sensitive to alpha and neutron particle strikes, crosstalk, electrostatic discharge (ESD), and temperature variation. Nevertheless, the transient fault concept is only applicable when performance degradation has a very low probability. If such a performance degradation increases over time through a cumulative effect, there is a moment at which the specification is no longer met. We may define this moment as the circuit lifetime, after which the circuit is no longer useful since the specified performance cannot be guaranteed.

IC ageing (also named wearout) is a cause of performance degradation under stressful environmental conditions over a period of time. The specified period of time introduces a time-varying notion into IC performance quality. If this period of time is zero, that is, the moment at which the production process is completed, we measure the circuit yield. If the circuit performance quality drops out of the specification, the time when this occurs is defined as the circuit lifetime. Combining the stressful environmental condition and the circuit lifetime, reliability is defined as the ability of a circuit to conform to its specifications over a specified period of time and under specified conditions.

In order to evaluate the reliability of a circuit, we assume that the circuit is composed of n statistically identical and independent components that were put into operation at the same time (t = 0). The empirical reliability of an IC can be defined according to

$$\widehat{R}(t) = \frac{u(t)}{n}, \tag{1}$$

where u(t) represents the number of parts that have not yet failed at time t. Note that u(t) is a non-increasing step function. A direct application of the law of large numbers $(n \to \infty)$ yields that $\widehat{R}(t)$ converges to the reliability function R(t) (Birolini, 1994).

The empirical rate of failure-in-time (FIT) is given by

$$\widehat{FIT}(t) = \frac{u(t) - u(t+\tau)}{u(t) \cdot \tau} = \frac{\widehat{R}(t) - \widehat{R}(t+\tau)}{\widehat{R}(t) \cdot \tau} \tag{2}$$

which converges to the failure-in-time expressed by

$$FIT(t) = -\frac{1}{R(t)} \cdot \frac{dR(t)}{dt} \tag{3}$$

for $n \to \infty$, $\tau \to 0$ and $n \cdot \tau \to 0$ (Birolini, 1994). Considering that at t = 0 the circuit executes its functions perfectly, we have R(0) = 1. In this case, the reliability function can be defined as

$$R(t) = e^{-\int_0^t FIT(\tau)\,d\tau}. \tag{4}$$

Equation (4) shows that the reliability function depends on the behavior of FIT(t). In fact, FIT(t) has the typical shape represented in Figure 1. This curve is called the *bathtub* curve because of its shape, and it is described by three parts:

- *Decreasing Failure Rate*: the failures that occur when the circuit is first introduced, as a result of momentary weaknesses in materials or in the production process, mostly derived from IC variability. This characteristic has not been much affected in nanoscale technologies.
- *Constant Failure Rate*: the period when FIT(t) can be approximated by a constant. Since nanometer transistors have a smaller sensitive area and the rate of transient faults does not depend on technology shrinking, an increasing FIT(t) has been observed in nanoscale technologies.
- *Increasing Failure Rate*: the end of the circuit's operation, when the circuit lifetime is reached due to ageing (wearout) degradation. In nanoscale technologies, this period begins earlier and depends strongly on circuit topology and stress environment. The transition from the *Constant Failure Rate* period to the *Increasing Failure Rate* period marks the circuit lifetime. This circuit lifetime has been reduced in advanced technologies, since shrunk sizes lead to more severe reliability degradation (see Figure 1).
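Equations (1), (2), and (4) can be illustrated numerically. The following is a minimal Python sketch, not taken from the chapter: the bathtub-shaped `fit` function, its three terms, and every numerical constant are hypothetical values chosen only to reproduce the three regions of the curve, and the function names are assumptions introduced for illustration.

```python
import math

def fit(t):
    """Hypothetical bathtub-shaped failure rate (per hour); illustrative values only."""
    infant = 5e-4 * math.exp(-t / 200.0)                  # decreasing failure rate
    useful = 1e-5                                          # constant failure rate
    wearout = 1e-5 * math.exp((t - 80_000.0) / 8_000.0)   # increasing failure rate
    return infant + useful + wearout

def reliability(t, steps=10_000):
    """R(t) = exp(-integral_0^t FIT(x) dx), equation (4), by the trapezoidal rule."""
    if t <= 0:
        return 1.0
    h = t / steps
    area = 0.5 * (fit(0.0) + fit(t))
    for k in range(1, steps):
        area += fit(k * h)
    return math.exp(-area * h)

def empirical_reliability(failure_times, t):
    """R_hat(t) = u(t) / n, equation (1); u(t) counts parts still working at t."""
    n = len(failure_times)
    u = sum(1 for tf in failure_times if tf > t)
    return u / n

def empirical_fit(failure_times, t, tau):
    """FIT_hat(t) from equation (2)."""
    r_t = empirical_reliability(failure_times, t)
    r_next = empirical_reliability(failure_times, t + tau)
    return (r_t - r_next) / (r_t * tau)
```

For example, with four parts failing at times 1, 2, 3, and 4, `empirical_reliability([1, 2, 3, 4], 2.5)` counts two survivors out of four, and `reliability(t)` decreases monotonically from R(0) = 1 as t grows.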

Given the importance of circuit reliability in nanoscale AMS/RF ICs, we propose a broad study of known design-for-reliability methodologies and a formal implementation of reliability-aware AMS/RF performance optimization. Insights into physical phenomena are detailed in order to explain reliability degradation at transistor, circuit, and system level. Tools for modelling and analysis of AMS/RF circuit reliability are highlighted in the circuit design context. Thus, this chapter proposes AMS/RF performance optimization that includes circuit reliability as a design criterion. Finally, a $\Sigma\Delta$ modulator and an RF front-end implemented in a 65 nm CMOS technology illustrate how we include the reliability constraint in the performance optimization.



Figure 1. Bathtub curve representing the typical shape of the failure rate of a circuit composed of n statistically independent parts (adapted from NASA, 2008).

# BACKGROUND

Since IC technologies entered the nanoscale era, design for reliability has become a topic of interest in the literature. We review previous designs, methodologies, and relevant work on the reliability evaluation of AMS/RF systems. A large amount of work on IC reliability focuses on ageing mechanisms and process variations at transistor and circuit level in a bottom-up approach. Few articles have reported reliability studies of large and complex systems in a top-down approach. Table 1 summarizes the dominant work of recent years, which includes reliability-aware methodologies and reliability analysis.

Table 1. Literature review of reliability studies in $\Sigma\Delta$ modulators (SDM) and other types of ADC. N/A indicates that the CMOS node is not specified.

| Reference | Unreliability phenomena | Applications | IC technology | Methodology |
|---|---|---|---|---|
| Singh (2003) | Transient faults | Flash, SAR and $\Sigma\Delta$ modulator ADCs | N/A | Sensitivity analysis |
| Yu (2006) | Parameter fluctuations | DT $\Sigma\Delta$ modulator | N/A | Statistical LUT |
| Andrejevic (2006) | Fault Diagnosis | Digital part of $\Sigma\Delta$ modulator ADC | N/A | Catastrophic defects |
| Tiwary (2007) | Process Variation | CT $\Sigma\Delta$ modulator | 180 nm | Hierarchical and statistical methods |
| Ruberto (2008) | Ageing effects, ESD, and thermal issues | Wi-Fi PA | 90 nm | Transistor simulation and measurements |
| Yan (2009) | HCI, NBTI, and TDDB | Flash ADC | 90 nm | Failure-in-time (FIT), failure prediction |
| Quémerais (2010) | HCI | 60 GHz PA | 65 nm | Transistor simulation and measurements |
| Ferreira (2011) | Process Variation, HCI, and NBTI | RF front-ends | 65 nm | Bottom-up design |
| More (2011) | HCI, NBTI | SAR-ADCs, DT-$\Sigma\Delta$ modulators | 32 nm | Hierarchical methodology |
| Maricau (2011) | Process Variation, HCI, NBTI, and TDDB | LC-VCO, current mirror, one-stage amplifier | 90 nm, 65 nm | Stochastic analog circuit reliability simulation |
| Cai (2013) | Process Variation, HCI, and NBTI | CT-$\Sigma\Delta$ modulators | 65 nm | Hierarchical and statistical methods |

Singh (2003) first proposed transient fault sensitivity analysis and reliability enhancement of analog-to-digital converters. Sensitivity analysis is used to grade blocks based on their sensitivity to alpha-particle faults. For the $\Sigma\Delta$ ADC, the study showed that the digital decimation filter is the most critical block. Reliability enhancement is implemented by using the transient

Yu (2006) proposed a methodology for DT $\Sigma\Delta$ ADCs. Each building block is modeled by lookup tables (LUTs) to assess performance variations due to parameter fluctuations. To address parametric process variations, response surface modeling (RSM) is used to construct LUTs parameterized in key process variables and to facilitate efficient statistical analysis of the $\Sigma\Delta$ ADC.

Andrejevic (2006) examined the effects of delay and catastrophic defects in a sigma-delta ADC. An artificial neural network (ANN) is used to diagnose defects in the digital part of a nonlinear mixed-mode circuit.

Tiwary (2007) presented a hierarchical method for the statistical analysis of performance variations in a CT $\Sigma\Delta$ modulator. A 180 nm 4th-order modulator was studied with design of experiments (DoE) and RSMs. This method can characterize the circuit-level performance parameters and relate them to design parameters in a high-level behavioral model.

Ruberto (2008) presents reliability assessments of an integrated power amplifier (PA) to validate the ruggedness of the design in 90 nm CMOS. A PA design methodology is presented in order to mitigate ageing effects, ESD, and thermal issues for deployment in a new generation of RFIC radio chips and platforms. Experimental results validate the proposed methodology.

Yan (2009) studied HCI, NBTI, and TDDB degradation in a 90 nm Flash ADC. This work applies the mean time to failure (MTTF) and FIT to the quantitative prediction of ageing mechanisms. It performs circuit failure analysis and critical block selection. The MTTF and FIT of the ADC are estimated with a reliability model at system level. It concludes that the PMOS transistor is the reliability-critical device and that NBTI is the most critical failure mechanism for the ADC under normal operation in submicrometer CMOS technology.

Quémerais (2010) investigates the effects of DC HCI stress on the characteristics of 65 nm CMOS transistors and proposes a transistor-level model. This reliability study is applied to a one-stage PA. The work highlights a drop of 5% in the gain, 7% in the 1 dB output compression point, and 7% in the saturation power. Measurement results are presented at 58 GHz after 50 hours of stress under $V_{DD} = 1.7$ V on a four-stage PA.

Ferreira (2011) studied reliability in RF front-ends. This work proposed a new design flow for AMS/RF circuits with the aim of improving reliability. Failure evaluation, design space exploration, and sensitivity analysis were directed towards ageing effects and process variation in 65 nm CMOS transistors. Top-down and bottom-up reliable design of RF front-ends is also included in the study.

More (2011) completed a Ph.D. thesis on the study of ageing mechanisms in AMS circuits and systems at the 32 nm CMOS node. A fully differential DT switched-capacitor (SC) sigma-delta ($\Sigma\Delta$) modulator is implemented in the Cadence and Simulink environments. The analog noise-shaping circuit is implemented in the Cadence environment using a 32 nm high-k metal gate CMOS technology, except for the multi-bit quantizer and DAC. The study showed that the offset in the first integrator stage outweighs by far the impact of ageing on modulator degradation. Ageing-induced gain errors in the DAC also degrade modulator performance. The impact of ageing degradation in the multi-bit quantizer and the DAC circuit was not significant due to their low resolution.

Maricau (2011) discusses the state of the art in design for reliability of analog circuits. This work describes the most important spatial and temporal transistor unreliability effects in submicrometer CMOS technologies. Accurate transistor compact models for each of these effects are also reviewed for a bottom-up approach. Moreover, this work highlights novel and efficient approaches to analyze reliability at system level in a top-down approach. Finally, some circuit solutions towards guaranteed failure-resilient AMS circuits illustrate design-for-reliability methodologies.

Cai (2013) concentrates on reliability-aware methodology development, simulation-based reliability analysis, and failure prediction of 65 nm CMOS $\Sigma\Delta$ modulators. A hierarchical statistical approach for system-level reliability study is proposed to analyze the performance of $\Sigma\Delta$ modulators under ageing effects and process variations. Statistical methods including correlation analysis, design of experiments, regression analysis, and response surface modeling are combined into the analysis flow presented in that work.

# UNRELIABILITY PHENOMENA

#### **Spatial Phenomena**

The continuous scaling down of CMOS technology induces spatial reliability phenomena, also known as process variations and mismatch. Process variation is defined as the deviation from the designed values of a layout structure or circuit parameter. These variations may differ in their causes and consequences. In general, however, circuit variability is the set of process parameter variations, misalignments of the integration masks, and their consequences for circuit performance among different samples. Although fabrication techniques have been greatly developed, parametric process variations can still shift circuit performance and influence the yield.

We focus on parametric process variation due to the manufacturing process (global, inter-die variation), which arises in the fabrication steps of IC manufacturing. The main physical factors are Random Dopant Fluctuations (RDF) and Line Edge Roughness (LER). IC variability can be divided into two main categories: systematic and random variations. Systematic variations are repeatable variations in electrical characteristics between two identically designed transistors. Random variations can be further categorized as inter-die variations (also defined as global variations: lot-to-lot, wafer-to-wafer, and die-to-die) and intra-die (within-die or local) variations.

Such process variations are essentially permanent, and they have often been analyzed at the global level. Inter-die variation is represented by a deviation in the parameter mean of the circuit. Mutlu (2005) points out that inter-die variation is generally much larger than within-die (WID) variation. The variations most often modeled are BSIM transistor parameters, interconnect width and length, and the resistivity and permittivity of passive devices. In order to quantify such variations, parameter distributions are obtained from silicon measurements on a large number of randomly selected devices across chips from different wafers (and/or lots). Then, the mean and variance are estimated from the approximately normal distribution of these devices. In this approach, named lumped statistics (Forzan, 2009), the details of the physical sources of these variations are not considered. The combined set of underlying deterministic and random contributions is simply lumped into a random statistical description.

In advanced technologies, these physical factors are also responsible for increasing local parameter variations. In fact, ICs are now large enough that the former global sources of variation also appear as local parameter variations, that is, as WID variation. WID variation contributes to the loss of matched behavior between nearby structures. Individual MOS transistors and/or segments of signal wires may deviate from their designed or nominal values (e.g., RDF). They may also differ unintentionally from each other (e.g., LER). Layout dependencies may drive additional variations that are inherent to the component and its neighborhood, favoring WID variation. That is why a careful full-custom layout is mandatory for high-performance RF and analog circuits. In such cases, WID variation can be effectively mitigated in RF and analog circuits by symmetrical design.
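The distinction between inter-die and WID variation can be sketched with a small Monte Carlo experiment. The Python sketch below is only an illustration under stated assumptions: the threshold voltage is modeled as a nominal value plus a Gaussian inter-die shift shared by all devices on a die plus an independent Gaussian WID term per device. The function name `sample_pairs` and every numerical value (nominal 0.4 V, 30 mV and 5 mV standard deviations) are hypothetical and are not data from this chapter.

```python
import random
import statistics

def sample_pairs(n_dies, sigma_inter, sigma_intra, vth_nominal=0.4, seed=0):
    """Return (vth_a, vth_b) for one matched transistor pair per die (volts)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_dies):
        die_shift = rng.gauss(0.0, sigma_inter)   # inter-die: shared by both devices
        vth_a = vth_nominal + die_shift + rng.gauss(0.0, sigma_intra)  # WID term
        vth_b = vth_nominal + die_shift + rng.gauss(0.0, sigma_intra)  # WID term
        pairs.append((vth_a, vth_b))
    return pairs

pairs = sample_pairs(n_dies=5000, sigma_inter=0.030, sigma_intra=0.005)

# Lumped statistics: every device is pooled, whatever the source of its variation.
all_vth = [v for pair in pairs for v in pair]
lumped_mean = statistics.mean(all_vth)
lumped_std = statistics.stdev(all_vth)

# Mismatch of a matched pair: the shared inter-die shift cancels in the difference.
mismatch_std = statistics.stdev([a - b for a, b in pairs])

print(f"lumped mean = {lumped_mean:.4f} V, lumped std = {lumped_std * 1e3:.1f} mV")
print(f"pair mismatch std = {mismatch_std * 1e3:.1f} mV")
```

In this simplified model the lumped standard deviation is dominated by the inter-die term, while the mismatch between the two devices of a pair depends only on the WID term, which is why matched analog structures are sensitive to WID variation even when the global spread is well controlled.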

RF and analog circuits are often sensitive to WID-induced mismatch (e.g., offset voltage in a symmetrical current mirror). Comparatively, WID variation is more critical in such circuits than in digital circuits, since the output of a digital circuit depends on the value of the transistor currents that charge or discharge the output capacitance. Because of this, digital circuits are most sensitive to transient faults.

#### **Transient Faults**

Transient faults, also known as soft errors, are the most important unreliability phenomenon in digital electronics. Random in nature, transient faults are caused by alpha particles, cosmic rays, electromagnetic interference, and other sources. When an ionizing radiation particle strikes a circuit, the silicon substrate interacts with the particle, generating a cylindrical track of electron-hole pairs. These charges reorganize themselves following the electric field present at the p-n junction. Very quickly, electrons form a drift current, resulting in a current pulse at the junction. Eventually, the ion track collapses and a secondary mechanism of charge diffusion takes place. Then, electrons form a diffusion current, resulting in a current plateau absorbed by the junction.

Despite the different characteristics of the drift and diffusion currents, both result in carrier collection at the junction, causing a Single Event Effect (SEE). If the collected charge exceeds a critical charge, the transient current is capable of damaging the p-n junction in a phenomenon known as Single Event Latchup. If no damage is done, both currents are integrated by the node capacitances and converted into a node-voltage variation. If this node-voltage variation exceeds a threshold voltage, the SEE can shift a logic level. Single Event Transients (SETs) are transient pulses of invalid logic levels generated in combinational logic after an SEE. If an SEE occurs in a sequential element, or if a SET reaches a sequential element, a Single Event Upset (SEU) is generated and a bit-flip is stored in a memory element.
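The conversion of a collected charge into a node-voltage variation can be sketched as follows. This is a rough illustration under explicit assumptions that are not part of this chapter: the transient current is approximated by a double-exponential pulse, a shape commonly used for this purpose; the restoring current of the driving transistor is ignored; and all function names and numerical values (charges in fC, a 2 fF node capacitance, a 0.6 V threshold) are hypothetical.

```python
import math

def see_current(t, q_coll, tau_rise, tau_fall):
    """Assumed double-exponential current pulse whose total charge is q_coll."""
    return q_coll / (tau_fall - tau_rise) * (
        math.exp(-t / tau_fall) - math.exp(-t / tau_rise)
    )

def node_voltage_shift(q_coll, c_node, tau_rise, tau_fall, t_end, steps=20_000):
    """Integrate the pulse on the node capacitance (midpoint rule): dV = Q / C."""
    h = t_end / steps
    charge = 0.0
    for k in range(steps):
        t_mid = (k + 0.5) * h
        charge += see_current(t_mid, q_coll, tau_rise, tau_fall) * h
    return charge / c_node

def flips_logic_level(delta_v, v_threshold):
    """The SEE shifts the logic level only if the variation exceeds the threshold."""
    return delta_v > v_threshold

# Hypothetical numbers: 1 fC and 2 fC strikes on a 2 fF node, 0.6 V threshold.
dv_small = node_voltage_shift(1e-15, 2e-15, 5e-12, 50e-12, t_end=1e-9)
dv_large = node_voltage_shift(2e-15, 2e-15, 5e-12, 50e-12, t_end=1e-9)
print(dv_small, flips_logic_level(dv_small, 0.6))
print(dv_large, flips_logic_level(dv_large, 0.6))
```

With these illustrative numbers the smaller strike stays below the threshold while the larger one exceeds it, which is the distinction the text draws between a harmless disturbance and a shifted logic level.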

Fortunately, digital circuits have particular masking properties that reduce the consequences of transient faults. Electrical masking is a characteristic of combinational circuits in which an electrical pulse is attenuated by subsequent gates in the logic path until it drops below the threshold voltage. Thus, SETs do not propagate through a sufficiently long logic path. Temporal masking, also known as latching-window masking, is a characteristic of sequential circuits, in which data are sampled only in a time frame established by a clock. If an SEE happens outside this time frame, an invalid logic level will not be stored in a memory element and will not result in an SEU. Logical masking is a characteristic of combinational circuits that neutralizes SEEs when specific gate inputs force the output level according to the gate truth table (e.g., a valid logic 0 at one NAND gate input forces a logic 1 at the gate output, which cannot be influenced by SEEs at the other inputs).
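The NAND example of logical masking can be checked exhaustively. The following Python sketch is a minimal illustration; the helper names `nand` and `is_logically_masked` are introduced here for the example only.

```python
from itertools import product

def nand(a, b):
    """Two-input NAND on logic levels 0/1."""
    return 1 - (a & b)

def is_logically_masked(a, b, faulty_input):
    """True if flipping one NAND input leaves the gate output unchanged."""
    good = nand(a, b)
    if faulty_input == "a":
        bad = nand(1 - a, b)
    else:
        bad = nand(a, 1 - b)
    return good == bad

# Flip input b for every input combination and report whether the flip is masked.
for a, b in product((0, 1), repeat=2):
    print(a, b, is_logically_masked(a, b, "b"))
```

Whenever input a is at logic 0 the output is forced to 1 and a flip on input b is masked; when a is at logic 1 the flip propagates to the output.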

#### **Environmental Phenomena**

IC performance is highly dependent on the working environment. Environmental phenomena arise during circuit operation and include variations in power supply voltage ($V_{DD}$), temperature, workload dependency, and noise coupling related to electromagnetic compatibility (EMC). They depend on time, schematic topology, and external agents. Environmental effects (also named dynamic or temporal variations) impact performance parameters during IC usage, and thus the circuit reliability.

Typical $V_{DD}$ and temperature fluctuations lead to variations in transistor characteristics, since such fluctuations are able to generate electron-hole pairs. The induced charges result in fluctuations of the transistor drain current and transconductance, with consequences similar to those of spatial phenomena. Designers commonly refer to process-voltage-temperature (PVT) variations, which combine process variations and environmental variations. Short-term and long-term workload-dependent runtime variations also exist during IC operation. Therefore, dynamically adapting a circuit to the behavior of its workloads is necessary.

On the other hand, EMC has become a major cause of IC redesign, mainly due to inadequate design methods and a lack of expertise in parasitic noise reduction and immunity improvement (Ramdani, 2009). In such designs, digital switching noise degrades the ground reference voltage, causing a systematic $V_{DD}$ fluctuation. Moreover, RF transmission and reception signals may create positive feedback, causing circuit oscillation, or may self-mix, leading to non-linearity issues.

Environmental phenomena are a main concern in circuit reliability, because they are able to generate and even amplify other phenomena. $V_{DD}$ and temperature stress favor ageing phenomena. EMC is able to generate current spikes, which weak spots may capture as SEEs. Workload-dependent runtime variations can speed up wear-out failures. Therefore, environmental phenomena should be observed together with transient faults and ageing phenomena.

#### **Ageing Phenomena**

During early operation time, spatial variations are the most important phenomenon, corresponding to the period of decreasing IC failure rate. Before going to market, manufacturers perform stress tests (often referred to as circuit burn-in) to reduce IC infant mortality. During useful life, circuits suffer from a constant failure rate due to transient faults. When IC wear-out and ageing take place, the circuit lifetime is reached and the circuit is often replaced. Although nanometer technologies benefit from faster and smaller transistors, NASA (2008) points out that technology scaling causes ageing to set in much earlier than in older technologies. A circuit lifetime of less than 10 years is expected in advanced technologies such as the 45 nm and 28 nm nodes.

In this scenario, IC designers are obliged to extend the circuit lifetime into the period of increasing failure rate. This means including ageing variations as design margins to control circuit reliability. The main ageing phenomena are Bias Temperature Instability (BTI), Hot Carrier Injection (HCI), Time-Dependent Dielectric Breakdown (TDDB), and Electromigration (EM). The first three phenomena concern the MOS transistor and take place under environmental stress conditions that create defects in the crystalline structure. Essentially, such defects are electron-hole pairs able to conduct or trap charges at the gate-oxide interface. The last one is a phenomenon that takes place in metal interconnections.

BTI refers to the generation of oxide charge and interface traps at the $Si-SiO_2$ interface in CMOS transistors. BTI is favored by increased gate bias stress, mostly at elevated temperature. In p-type transistors BTI is known as negative bias temperature instability (NBTI), and in n-type transistors as positive bias temperature instability (PBTI). Interface traps can appear anywhere, forming a density of Si-SiO<sub>2</sub> interface traps under the transistor gate. Such traps reduce the available charges inside the transistor channel, which is often perceived as a threshold voltage shift. A BTI-aged transistor has a less inverted channel than a fresh transistor during its useful life, until it no longer conducts. Because of their physical nature, such traps are generated under stress, but part of them can be recovered when bias and temperature are reduced. Recent design-for-reliability methods have made efforts to maximize the recovery time in spite of the stress conditions required for proper operation (Maricau, 2011).

HCI, also named hot carrier degradation (HCD), channel hot carrier (CHC), or channel hot-electron (CHE) injection, is a phenomenon in which a charge carrier gains enough energy to overcome a semiconductor potential barrier, becoming a hot carrier, and migrates to a different area of the device. Such high-energy carriers are often collected by the channel electric field and flow to the transistor drain. There, they are able to produce Si-SiO<sub>2</sub> interface traps, which accumulate near the transistor drain and form an unrecoverable high-density area of interface traps. The generation of high-energy carriers is inherent to an inverted transistor channel. The only way to reduce hot carrier generation is to reduce the transistor bias to operation in moderate or weak inversion. Such a solution is not always feasible for gate lengths below 50 nm, for high power, or for high frequency applications. An HCI-aged transistor exhibits a shift of its characteristics over time (e.g., threshold voltage ($V_{th}$) and electron mobility ($\mu$)).

TDDB is a time-dependent stochastic oxide damage caused by defect generation inside the gate dielectric, which initiates the generation of traps at random positions in the Si-SiO<sub>2</sub> interface. The stress-induced leakage current (SILC) forms a percolation path through the gate dielectric, reducing the gate-to-substrate resistance until breakdown (BD) occurs (Stathis, 1999). The cumulative failure probability of a BD is described using the Weibull distribution. Depending on the oxide thickness, BD may occur in different modes:

- *Hard-breakdown* (HBD): a catastrophic failure that destroys the device.
- *Soft-breakdown* (SBD): an ohmic path is generated, and the aged transistor exhibits an increased gate leakage current.
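As a small numerical illustration of the Weibull description of breakdown, the Python sketch below evaluates the standard two-parameter Weibull cumulative distribution, F(t) = 1 - exp(-(t/η)^β). The scale η, the shape β, and the function names are assumptions introduced here; the numerical values are hypothetical and do not describe any particular oxide or technology.

```python
import math

def weibull_cdf(t, eta, beta):
    """Cumulative failure probability F(t) = 1 - exp(-(t / eta) ** beta)."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / eta) ** beta))

def time_to_fraction(p, eta, beta):
    """Invert the CDF: time at which a fraction p of the devices has broken down."""
    return eta * (-math.log(1.0 - p)) ** (1.0 / beta)

# Hypothetical parameters: characteristic life of 10 years, shape factor 1.5.
eta_years, beta = 10.0, 1.5
print(weibull_cdf(eta_years, eta_years, beta))   # about 63.2% failed at t = eta
print(time_to_fraction(0.001, eta_years, beta))  # time to 0.1% cumulative failures
```

By construction, 63.2% of the population has failed at t = η whatever the shape factor, so η is a convenient characteristic lifetime, while β controls how sharply the failures cluster around it.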

Located in metal interconnections, EM is material erosion favored by excessive current density stress. Under such conditions, electrons interact with the metal ions and shift their positions. The diffusion of metal ions may cause open and short circuits. EM must be considered in the layout design by widening wires to reduce current density. In nanometer technologies, EM remains a concern, as technology scaling leads to growing interconnect current densities. Currently, there is no universal model to analyze and predict EM (Bernstein, 2006). EM mitigation is achieved through a maximum current density imposed by the design-for-manufacturing rules.

# **RELIABILITY ANALYSIS TECHNIQUES**

#### **Reliability prediction under transient faults**

Since digital circuits are the most abundant in integrated circuits, transient faults are the most common unreliability phenomena. Digital-circuit reliability prediction tools analyze masking properties to evaluate the consequences of transient faults from gate level to system level. Thus, reliability prediction under transient faults advances with the accuracy with which high-level models capture low-level issues. Reliability analysis of digital circuits can be classified into fault injection and analytical tools.

Fault injection tools are based on a simple and intuitive approach to reliability estimation. While transient faults are injected into logic gates, the circuit outputs are verified against the truth table. If the output is incorrect, a failure is detected for the tested input data and injected fault; otherwise, the fault is considered masked. The circuit reliability prediction comes from comparing masked and unmasked faults. Circuit fault injection can be accomplished using:

- *Simulation-based injection* consists of comparing the circuit outputs of a fault-free version and a fault-prone version, where nodes are picked and their logic level is shifted in order to simulate SETs and SEUs. This mechanism is simple and requires only a traditional simulation engine, but it is usually time consuming. Its accuracy depends on the simulation of a large number of scenarios, including multiple faults and all possible inputs.
- *Emulation-based injection* uses the same principle as simulation-based injection, but makes use of reconfigurable platforms (e.g., FPGAs) to implement the fault-free device under test (DUT) and its fault-prone version. The fault-prone version is obtained from the DUT by adding saboteurs at each node to be controlled. Saboteurs are logic structures that allow the logic value of a node to be modified. This mechanism obtains results faster with the same accuracy. As a drawback, emulation-based injection can be hardware expensive, depending on the number of saboteurs.
- *Physical injection* relies on physical fault generators (e.g., a laser) to inject faults into the fault-prone DUT. Compared to the previous ones, this strategy is less suitable for reliability prediction during the design phase, since it is only applicable to circuits that have already been manufactured.
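The simulation-based mechanism can be sketched in a few lines of Python. The three-gate NAND circuit, its node names (`n1`, `n2`, `y`), and the single-fault, exhaustive-input campaign below are hypothetical choices made only for illustration; a real campaign would target the netlist under study and typically include multiple faults.

```python
from itertools import product

def nand(a, b):
    """Two-input NAND on logic levels 0/1."""
    return 1 - (a & b)

def circuit(a, b, c, flip=None):
    """Hypothetical 3-gate circuit; `flip` names one node whose level is inverted."""
    n1 = nand(a, b)
    if flip == "n1":
        n1 ^= 1
    n2 = nand(b, c)
    if flip == "n2":
        n2 ^= 1
    y = nand(n1, n2)
    if flip == "y":
        y ^= 1
    return y

def fault_injection_campaign(nodes=("n1", "n2", "y")):
    """Compare fault-free and fault-prone outputs for every input and every node."""
    masked = unmasked = 0
    for a, b, c in product((0, 1), repeat=3):
        golden = circuit(a, b, c)                 # fault-free reference output
        for node in nodes:
            if circuit(a, b, c, flip=node) == golden:
                masked += 1                       # fault did not reach the output
            else:
                unmasked += 1                     # fault produced a failure
    return masked, unmasked

masked, unmasked = fault_injection_campaign()
print(masked, unmasked, masked / (masked + unmasked))
```

Even for this tiny circuit the campaign visits every (input, node) pair, which hints at why simulation-based injection becomes time consuming as the number of inputs, nodes, and simultaneous faults grows.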

Analytical tools have been developed in order to increase the efficiency of reliability prediction and to reduce costs. In early design phases, analytical tools have lower costs than fault injection mechanisms. They also achieve better efficiency than simulation-based injection. Analytical tools have their own limitations, mostly a trade-off between reliability accuracy and computation time. As a general rule, analytical tools estimate the reliability of combinational logic mostly by modelling the logical masking present in such digital circuits. The state of the art is briefly described in this section; details can be found in the works of Franco (2008), Junior (2012), Ban (2012), and Pagliarini (2013).

• *PTM* – Probabilistic Transfer Matrices is a simple method of modeling logic gates and circuit topology through matrices. For each gate, input and output patterns are correlated in the truth table, representing the probability of correct operation by q and of an error by (1-q). The reliability of a logic gate (or a circuit) is obtained as the sum of the probabilities of correct outputs in its PTM, appropriately weighted by the probabilities of the inputs. Assume that a circuit is divided into a cascade of N sets of gates. Denote by $L_i$ the set of gates of the i-th element in such a cascade. Moreover, consider that each level $L_i$ contains m logic gates in parallel. The PTM of $L_i$ is given by

$$PTM_{Li} = PTM_{G1} \otimes PTM_{G2} \otimes \cdots \otimes PTM_{Gm}, \tag{5}$$

and the PTM of the whole circuit is given by

$$PTM_{circuit} = \prod_{i=1}^{N} PTM_{Li}. \tag{6}$$

Thus, PTMs are propagated through each circuit level by performing Kronecker (5) (for blocks in parallel) and inner (6) (for blocks in series) products of basic PTMs. In spite of its accurate estimation of circuit reliability, the PTM method has a complexity that grows exponentially with the number of inputs and outputs, which makes it infeasible even for medium-sized circuits.
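As an illustration of how the Kronecker and inner products combine, the sketch below builds the PTM of a hypothetical two-level circuit (two NAND gates in parallel feeding a third NAND gate). The circuit, the function names, the uniform input distribution and the value q = 0.9 are assumptions made for this example only.

```python
def gate_ptm(truth_table, q):
    """PTM of a single-output gate: one row per input pattern, columns = output 0 / 1."""
    return [[q if out == col else 1.0 - q for col in (0, 1)] for out in truth_table]

def kron(a, b):
    """Kronecker product of two matrices (gates in parallel)."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matmul(a, b):
    """Ordinary matrix product (levels in series)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

NAND_TT = [1, 1, 1, 0]  # NAND outputs for inputs 00, 01, 10, 11

def circuit_ptm(q):
    level1 = kron(gate_ptm(NAND_TT, q), gate_ptm(NAND_TT, q))  # parallel gates
    level2 = gate_ptm(NAND_TT, q)
    return matmul(level1, level2)                               # series levels

def circuit_reliability(q):
    """Probability of a correct output, assuming uniformly distributed inputs."""
    ptm = circuit_ptm(q)
    ideal = circuit_ptm(1.0)  # fault-free matrix marks the correct output per row
    correct = sum(p * i for rp, ri in zip(ptm, ideal) for p, i in zip(rp, ri))
    return correct / len(ptm)

print(round(circuit_reliability(0.9), 6))
```

The level-1 matrix already has 16 rows for four inputs; each additional input doubles the row count, which is the exponential growth mentioned above.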

• *SPR* – Signal Probability Reliability is a method based on the representation of a node by a 2-by-2 matrix, where each element is the probability that the node has a correct or an incorrect logic value. The reliability is given by the sum of the probabilities of correct values

$$R = P(0_{correct}) + P(1_{correct}). \tag{7}$$

Based on the PTM method, Signal Probability Reliability is a family of methods where signals are represented by

$$P_{2 \times 2} = \begin{bmatrix} P(0_{correct}) & P(0_{incorrect}) \\ P(1_{incorrect}) & P(1_{correct}) \end{bmatrix} \tag{8}$$

The analysis supposes that the matrices of the primary inputs are known. The output matrix of a given logic function comes from the product between its PTM and the Kronecker product of its inputs. SPR performs better than the PTM method because its algorithmic complexity is linear with the number of gates. Nevertheless, SPR can lead to inaccurate results. This accuracy reduction is a consequence of errors when calculating matrices for nodes that result from reconvergent fanouts, since the signals arriving at such nodes are treated as independent. Several heuristics have been developed to improve the accuracy of SPR methods while limiting complexity. These methods let the designer trade reliability-estimation accuracy against computation time (Franco, 2008; and Pagliarini, 2013).
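A minimal sketch of signal-probability propagation is given below. Instead of explicit matrix products, each signal is stored as a dictionary over the four states of the 2-by-2 matrix, keyed as (ideal value, actual value); this encoding, the function names and the example circuit are assumptions made for illustration. Inputs of a gate are treated as independent, which is exact for the fanout-free circuit used here.

```python
from itertools import product

STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (ideal value, actual value)

def primary_input(p_one=0.5):
    """Fault-free primary input: only the two 'correct' states have probability."""
    return {(0, 0): 1.0 - p_one, (0, 1): 0.0, (1, 0): 0.0, (1, 1): p_one}

def spr_gate(func, q, *inputs):
    """Propagate signal matrices through a gate that works correctly with probability q."""
    out = {s: 0.0 for s in STATES}
    for combo in product(STATES, repeat=len(inputs)):
        p = 1.0
        for sig, state in zip(inputs, combo):
            p *= sig[state]          # independence assumption between inputs
        ideal = func(*(s[0] for s in combo))
        actual = func(*(s[1] for s in combo))
        out[(ideal, actual)] += p * q                 # gate operates correctly
        out[(ideal, 1 - actual)] += p * (1.0 - q)     # gate output is flipped
    return out

def reliability(signal):
    """Sum of the probabilities of the correct states."""
    return signal[(0, 0)] + signal[(1, 1)]

nand = lambda a, b: 1 - (a & b)
a, b, c, d = (primary_input() for _ in range(4))
y = spr_gate(nand, 0.9, spr_gate(nand, 0.9, a, b), spr_gate(nand, 0.9, c, d))
print(round(reliability(y), 6))
```

Each gate is visited once, so the cost grows linearly with the number of gates. If the same signal fed both inputs of a later gate, the independence assumption would no longer hold and the result would deviate from the exact value.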

• *PBR* – Probabilistic Binomial Model for reliability estimation is an approach relying on the probability of multiple faults. While being an analytical method, its implementation is more easily obtained by simulation-based fault injection. PBR avoids the drawback of a large number of scenarios by choosing a few specific ones, which are used to generate the analytic model. Thus, PBR achieves accurate reliability estimation with a reasonable computational effort (Ban, 2012).

Recently, a new method based on the principle of fault sourcing and fault propagation has been developed. The objective of this method, named SNaP, is to model the potential of each gate for generating and propagating a fault. Then, the whole circuit is simulated for a number of input scenarios. Afterwards, the number of faults carried to the outputs is estimated. Finally, the number of faults is converted to circuit reliability using a reference gate. The main advantage of SNaP is that it models sequential circuits, which are not covered by the previous methods. However, SNaP accuracy is limited by the gate modelling accuracy and by the choice of input scenarios. A few heuristics have been proposed to achieve an accuracy similar to SPR methods with a reduced computational effort (Pagliarini, 2013).

#### **Reliability simulation methods**

The underlying purpose of ageing simulation is to help IC designers properly estimate the degradation of performance parameters, so that design margins can be controlled. The earliest reliability simulation method is DC wear-out modeling with SPICE-like simulators, which was presented at the beginning of the 1990s. Hu (1992) developed the Berkeley reliability tools (BERT) and Xuan (2003) implemented an ASIC reliability evaluation tool (ARET). Hsu (1992) proposed a systematic reliability approach named RELY. These tools are easy to use since only two SPICE operating points have to be simulated. However, they cannot be used in the reliability analysis of large and complex circuits.

Ageing equivalent models at transistor level can be found in (Maricau, 2011). For HCI and NBTI effects, parameter degradation is modeled as a degradation of the threshold voltage ( $V_{th}$ ), mobility ( $\mu$ ) and output conductance ( $g_{DS}$ ). The SBD-induced gate-current increase can be modeled with gate-source and gate-drain resistors.

Cadence Virtuoso RelXpert (www.cadence.com) and Mentor Graphics Eldo (www.mentor.com) simulators are widely used with embedded ageing models. From a fresh netlist, they can generate an aged netlist which includes degraded transistor parameters. Ageing simulation is performed with this aged netlist. Finally, both fresh and aged circuit performance can be investigated by designers.

For process variations, the Monte-Carlo (MC) method and corner analysis are the main techniques currently used in IC design and simulation to analyze global and local variations. Given enough simulation time, MC techniques are inherently accurate as they do not involve any approximation of simulation results. In practice, a large number of simulation iterations is required to achieve a reasonably precise estimation of IC fabrication yield. MC simulation operates at a low level and demands excessive amounts of computer time, especially when combined with computationally intensive reliability analysis. Advanced sampling techniques such as stratified sampling, Latin Hypercube Sampling (LHS) and Quasi Monte Carlo (QMC) can be used. They achieve a faster convergence rate than standard MC-based analysis.
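To give a feeling for how many MC iterations a yield estimate needs, the following sketch uses the normal approximation of the binomial confidence interval. The function name and the numerical values (a 99 % yield estimated within ±0.5 % at a 95 % confidence level, z = 1.96) are assumptions chosen for illustration and do not come from a specific design.

```python
import math

def mc_runs_for_yield(yield_est, half_width, z=1.96):
    """Runs needed so that the confidence interval on the estimated yield
    has the requested half-width (normal approximation of the binomial)."""
    return math.ceil(z ** 2 * yield_est * (1.0 - yield_est) / half_width ** 2)

print(mc_runs_for_yield(0.99, 0.005))   # 99 % yield within +/- 0.5 %
print(mc_runs_for_yield(0.99, 0.001))   # tighter interval, many more runs
```

Because the interval width shrinks only with the square root of the number of runs, halving the uncertainty requires roughly four times more simulations, each of which is a full circuit simulation.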

Unlike the MC method, corner analysis is more efficient and is widely used by IC designers to reserve design margins. In SPICE-like simulators, NMOS and PMOS transistors are characterized by the letters F, S and T (F: fast, S: slow and T: typical). For example, NMOS and PMOS transistors with lower oxide thickness and threshold voltage are represented by an FF (fast NMOS, fast PMOS) corner. However, this method sometimes under- or overestimates circuit performance.

#### Statistical analysis

Although MC analysis can achieve intrinsic accuracy with repeated simulations, a long simulation time is unavoidable. On the other hand, corner-based analysis with one standard case (e.g. TT) and four extreme ones (e.g. FF, FS, SF and SS) can evaluate circuit variability at the risk of over-estimation. Thus, in order to achieve an efficiency-accuracy trade-off, statistical methods are applied to variability and yield analysis.

Applied to IC reliability analysis, the purpose of these statistical methods is to help designers characterize the impact of physical parameters on the performance parameters. Commonly used statistical methods are correlation analysis, regression analysis, design of experiments (DoEs) and response surface modeling (RSM). A co-evaluation flow for both ageing effects and process variations is proposed in Figure 2.

BSIM4 physical parameters suffer variations that are statistically modeled by Gaussian, log-normal or uniform distributions. More than 800 process parameters exist in the BSIM4 model (see www.device.eecs.berkeley.edu/bsim/bsim4.htm). In order to reduce the number of process parameters, correlation and regression methods (executed with a Plackett-Burman design) are discussed here.



Figure 2. Statistical design flow aware of process variation

Correlation refers to the linear dependence between two variables (or two sets of data). In statistical BSIM4 parameter modeling, correlation analysis is used to filter out correlated BSIM4 parameters. The correlation coefficient  $\rho_{B_i,B_j}$  between two BSIM4 parameters  $B_i$  and  $B_j$  with expected values  $\mu_{B_i}$ ,  $\mu_{B_j}$  and standard deviations  $\sigma_{B_i}$  and  $\sigma_{B_j}$  is defined as:

$$\rho_{B_i,B_j} = \frac{cov(B_i, B_j)}{\sigma_{B_i} \cdot \sigma_{B_j}} = \frac{E\left[\left(B_i - \mu_{B_i}\right) \cdot \left(B_j - \mu_{B_j}\right)\right]}{\sigma_{B_i} \cdot \sigma_{B_j}} \tag{9}$$

where *E* is the expected value operator, and *cov* means covariance. If  $\rho_{B_i,B_j}$  is +1, the two selected BSIM4 parameters show a positive linear dependence on each other; if it is -1, they present a negative linear dependence. In either case, one of these two BSIM4 parameters can be removed from the model. After correlation analysis, a design of experiments should be selected to verify this new model. To simplify the model further, stepwise regression analysis is applied, and analysis results such as the *p* value from hypothesis testing are evaluated to rank the remaining BSIM4 parameters by significance.
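The correlation filter of equation (9) can be sketched as follows. The parameter names `B1` to `B3`, the sample values and the 0.95 threshold are hypothetical and serve only to show the mechanics; real samples would come from the statistical BSIM4 model.

```python
import math

def pearson(x, y):
    """Correlation coefficient as in equation (9), using population statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def drop_correlated(samples, threshold=0.95):
    """Keep a parameter only if it is not strongly correlated with one already kept."""
    kept = []
    for name in samples:
        if all(abs(pearson(samples[name], samples[k])) < threshold for k in kept):
            kept.append(name)
    return kept

samples = {
    "B1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B2": [2.0, 4.0, 6.0, 8.0, 10.1],   # almost a scaled copy of B1
    "B3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(drop_correlated(samples))
```

In this toy data set `B2` is discarded because it carries nearly the same information as `B1`, while `B3` is retained for the regression step.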

In detail, stepwise regression builds regression models between BSIM4 process parameters (input factors) and performance parameters (objectives), and allows hypothesis testing of the models and of every independent input factor. A two-level Plackett-Burman (PB) design can be selected to execute stepwise regression (NIST/SEMATEC, 2012). A Hadamard matrix can be used to generate an orthogonal matrix for the input parameters (Cai, 2013). Only 12 runs are needed with a PB design for 11 factors, whereas a full factorial design would require  $2^{11} = 2048$  runs.
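A 12-run PB design can be generated by cyclically shifting a generator row and appending a row of low levels. The sketch below uses the commonly tabulated 12-run generator row; treating it as the design used in the cited work is an assumption, and the function names are illustrative.

```python
# Commonly tabulated generator row for the 12-run Plackett-Burman design
PB12_GENERATOR = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]

def plackett_burman_12():
    """Return a 12 x 11 two-level design: 11 cyclic shifts plus an all-low row."""
    rows, row = [], PB12_GENERATOR[:]
    for _ in range(11):
        rows.append(row[:])
        row = [row[-1]] + row[:-1]      # cyclic shift by one position
    rows.append([-1] * 11)
    return rows

def columns_orthogonal(design):
    """Check that every pair of factor columns has a zero dot product."""
    cols = list(zip(*design))
    return all(
        sum(a * b for a, b in zip(cols[i], cols[j])) == 0
        for i in range(len(cols)) for j in range(i + 1, len(cols))
    )

design = plackett_burman_12()
print(len(design), "runs for", len(design[0]), "factors;",
      "orthogonal:", columns_orthogonal(design),
      "; full factorial runs:", 2 ** len(design[0]))
```

Each row gives the low (-1) or high (+1) level of the 11 candidate parameters for one simulation run. Orthogonal columns mean that the main effect of each parameter can be estimated independently of the others.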

A hypothesis test can be carried out on this meta-model for every independent input factor. The p-value, partial F values and the  $R^2$  value can be used to set a threshold to test the hypothesis of the regression model (NIST/SEMATEC, 2012). At each step, the p value of an F statistic is computed to test models with or without a potential BSIM4 parameter. If a parameter is not currently in the model, the null hypothesis is that the parameter would have a zero coefficient if added to the model. If there is sufficient evidence to reject the null hypothesis, this parameter is added to the model. Otherwise, the parameter is left out of the model. Finally, the optimum regression equation is built with the most significant BSIM4 parameters.

Furthermore, DoEs are performed with a circuit simulator. A DoE is an information-gathering procedure which can be applied to parameter screening and result prediction. It can provide several solutions to generate experimental designs for various situations. Commonly used designs are the two-level full factorial (which contains all combinations of the different levels of the factors) and the two-level fractional factorial design (with a fraction such as 1/2 or 1/4 of the full factorial case, or another type, e.g. PB designs or central composite designs (CCD)).

Finally, the simulated performance parameters are used to generate RSMs as functions of the BSIM4 parameters. The accuracy of an RSM can be evaluated by the root-mean-square error (*RMSE*). It measures the difference between the model predicted values ( $Y_{predicted,n}$ ) and the simulated values ( $Y_{simulated,n}$ ), where:

$$RMSE = \sqrt{\frac{\sum_{n=1}^{N} (Y_{predicted,n} - Y_{simulated,n})^2}{N}} \tag{10}$$
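Equation (10) translates directly into a few lines of code. The sample values below are hypothetical gain figures (in dB) used only to exercise the function.

```python
import math

def rmse(predicted, simulated):
    """Root-mean-square error between RSM predictions and simulated values."""
    if len(predicted) != len(simulated) or not predicted:
        raise ValueError("both sequences must be non-empty and of equal length")
    n = len(predicted)
    return math.sqrt(sum((p - s) ** 2 for p, s in zip(predicted, simulated)) / n)

# Hypothetical RSM predictions versus circuit-simulator results
y_predicted = [40.1, 39.8, 41.0, 40.4]
y_simulated = [40.0, 40.0, 40.8, 40.5]
print(round(rmse(y_predicted, y_simulated), 4))
```

The RMSE has the same unit as the performance parameter, which makes it easy to compare against the accuracy required by the specification.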

#### Failure analysis at system level

Singh and Koren (2003) proposed an efficient fault-sensitivity methodology to analyze transient faults. For complex circuits and systems, sensitivity analysis can also help designers evaluate the sensitivity of every architecture characteristic to every building-block characteristic. Ferreira (2011) used this method to analyze an RF front-end. Cai (2013) applied this method to the reliability analysis of a continuous-time sigma-delta modulator.

Failure analysis assumes that there are N different building blocks in a system: block 1, block 2, (...) block N. When unreliability effects occur, degradation in block X may strongly influence the whole system performance. In that case, block X is the most sensitive block and its reliability-aware design is essential. It is therefore necessary to run a test case for every building block in order to find the weak spots. Failure analysis denotes 'No' for the ideal case and 'Yes' for the failure case (Cai, 2013; Ferreira, 2011). The block failure margin is always determined by the specification of each block. Sensitivity analysis and failure estimation can be performed with circuit simulators. The evaluation of weak spots in a circuit or a system is an important step to acquire AMS/RF system reliability information. Designers can then improve reliability through selective overdesign.

#### PERFORMANCE OPTIMIZATION IN DESIGN FOR RELIABILITY

#### **Ageing Resilience**

In circuit performance optimization, designers look for an optimum in the feasible design space. Such a design space is often defined by the supply voltage  $(V_{DD})$  imposed by the chosen IC technology, so that the maximum allowed bias for  $V_{GS}$  and  $V_{DS}$  is equal to  $V_{DD}$ . In order to include reliability constraints in design optimization, one of the most common solutions is to respect the design rules manual (DRM).

Although a resilient device could be designed by conservative choices, high performance devices are often ageing sensitive. Thus, circuit resilience can be achieved only by understanding the mechanisms of the physical phenomena. Each ageing phenomenon has a specific stressful environment to be avoided, so the design space can be redefined in order to include reliability in design optimization. We call Reliable Design Space the new design space where circuit characteristics (e.g.  $I_{DS}$  and gm for transistors) respect a maximum allowed variation (e.g. 2 %) (see Figure 3). Another approach is to keep the total stress time below a maximum time-to-failure (*ttf*). Detailed information about the mechanisms of the physical phenomena and *ttf* modeling was presented by Cai (2013) and Maricau (2011).

The BTI unreliability phenomenon is design specific and depends on the applied bias, the operating temperature, and the ratio between stress time and recovery time. Certain circuit topologies are inherently BTI-resilient (e.g. linear analog circuits). However, specific classes of circuits have an unavoidable BTI stress due to their large-signal characteristics. Designing comparators, output buffers and power amplifiers is a tricky task, as BTI resilience means performance reduction. In digital design, all transistors are subjected to BTI stress once the logic level is established, but this does not pose a concern to typical designs. Sensitive designs (e.g. a clock tree) instead have to be carefully analyzed so that they are not influenced by BTI degradation. BTI-resilient designs pay special attention to the power-down/standby mode in order to stop the circuit in a non-stress condition, thereby increasing the recovery time. This means choosing a transistor bias in the reliable design space to obtain minimal stress when performance is not required (see Figure 3).

In order to avoid an HCI-induced drift in transistor characteristics, the total time that each device is operated under the worst-case bias condition should not exceed the device *ttf*. Such a worst-case bias condition and the device *ttf* are specified in the technology DRM. With respect to the HCI unreliability phenomenon, the circuit lifetime ends when the total stress time exceeds the specified *ttf*. An HCI-resilient transistor is a transistor whose total time under such worst-case bias stress stays below the *ttf* for its whole lifetime (e.g. a transistor biased in a reliable design space as illustrated in Figure 3). Recent works have pointed out the weak- and moderate-inversion regions as good candidates for a negligible HCI (Ferreira, 2011). The simulation results presented in Figure 3 illustrate such a bias condition, where we found a reliable design space for an HCI-stressed NMOS ( $W = 1 \ \mu m$  and  $L = 60 \ nm$ , CMOS 65nm) in terms of  $I_{DS}$  degradation (Figure 3a) and *gm* degradation (Figure 3b).

TDDB-resilient designs follow design guidelines similar to those of HCI-resilient designs in order to control the stressful gate electric field. These design rules advise against drawing a transistor layout with a gate area smaller than the area specified in the DRM, and against exceeding a gate bias able to generate an oxide breakdown. Most circuits are HBD-resilient, because the gate bias able to cause a hard breakdown is higher than the nominal  $V_{DD}$ . Nevertheless, SBD-resilient circuits are difficult to design because the transient gate voltage should not be higher than the DRM-specified SBD stress. Transmission gates in high voltage switching (e.g. in DC/DC converters) are a good example of SBD-sensitive circuits because it is very difficult to avoid voltage spikes while opening or closing a transmission gate. In such a case, the total stress time should not exceed the DRM-specified *ttf*.

EM resilience is achieved by a careful layout. After circuit design, most bias currents are known, and an EM-resilient layout is a layout where current densities do not exceed the DRM specifications. The common payoff is an increased area in sensitive metal connections such as the  $V_{DD}/GND$  power lines, and a larger number of vias drawn among metal layers. EM stress does not pose a concern to typical analog and digital designs since transistor drain currents are not large enough to stress metal connections. However, RF circuits often have EM-sensitive devices such as inductors, power combiners, couplers, and integrated antennas. For such devices, circuit design has to include EM considerations while currents and sizes are chosen, imposing a trade-off between performance and reliability.



Figure 3. HCI-stressed NMOS ( $W = 1 \mu m$  and L = 60 nm, CMOS 65nm) simulation results for: (a)  $I_{DS}$  degradation and (b) gm degradation (Ferreira, 2011).

#### Computer-aided Reliability

Graeb et al. (2001) developed the sizing rules method. They proposed 30 sizing rules for important CMOS building blocks, expressed as geometric or electrical constraints that ensure circuit function or robustness. Pan (2012) added reliability as a new design metric to the sizing rules method. Based on the analysis and optimization of the fresh-to-degraded worst-case distance for each circuit performance, computer-aided reliability optimization has been proposed for AMS/RF circuits. Both process variations and ageing effects (HCI, NBTI) are evaluated. Fresh and degraded sizing rules as well as the maximum area constraints are applied to yield analysis. The trade-off between approximated layout area and circuit reliability can be reported to IC designers.

The optimization of one performance parameter often comes at the cost of others. In contrast to single-objective optimization, Multi-objective Optimization (MOO) yields a set of optimal solutions. Every solution in this set is defined as Pareto optimal or non-dominated (Deb, 2002). Worst-case Pareto fronts represent the worst performance that can be simulated from a given circuit topology across the entire performance space.

For complex AMS/RF circuits and systems, directly performing ageing analysis or MC simulation (for process variation) on the flat transistor level is not suitable for reasons of computational efficiency. Figure 4 illustrates a hierarchical reliability simulation method. Different hierarchical levels are linked one-by-one with relevant reliability simulation tools and methods.

Figure 4 demonstrates this hierarchical reliability analysis flow with modulators. Degradations and fluctuations induced at the physical level by reliability effects (e.g. ageing mechanisms and process variations) can propagate up to the product level (via the transistor level, the circuit level (OTA) and the system level (modulator)). System performance under process variations and ageing phenomena can thus be efficiently evaluated.
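The notion of non-dominated solutions can be made concrete with a small sketch. The candidate designs below are hypothetical (power in mW, area in arbitrary units), and both objectives are assumed to be minimized.

```python
def dominates(a, b):
    """True if design a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated designs, preserving their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical sized circuits: (power consumption, die area)
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.5)]
print(pareto_front(candidates))
```

Designs such as (3.0, 4.0) are discarded because another candidate is better in both objectives; the remaining points are the trade-offs among which the designer has to choose.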



Figure 4. A bottom-up approach for hierarchical reliability analysis.

# **RELIABILITY-AWARE AMS/RF DESIGN EXAMPLES**

# A 400 Hz 1-bit Second-order LP-CT Sigma Delta Modulator

A 400 Hz 1-bit second-order low-pass (LP) continuous-time (CT) Sigma-Delta ( $\Sigma\Delta$ ) modulator is designed at behavioral level and implemented at transistor level (Cai, 2013). The application is the detection of cardiac signals in a medical cardiac pacemaker. Reliability issues are critical in this kind of device.

The proposed 2nd-order LP CT  $\Sigma\Delta$  modulator consists of a 2nd-order loop filter and a one-bit quantizer with a return-to-zero (RZ) feedback DAC, operating at 32 kHz with an OSR of 40. Active RC integrators with operational amplifiers (op-amps) are used in the modulator because of their simplicity and high linearity. The loop filter has a CIFB architecture (Schreier, 2004). A maximum NTF gain equal to 12 dB can be achieved. Modulator performance is predicted with the  $\Sigma\Delta$  Toolbox (Schreier, 2004). For the 2nd-order LP CT modulator, the ideal peak SQNR is 56.7 dB, which appears when the input signal amplitude is around -6 dBFS. Taking into account the nonideality of the resistors in the integrator, the peak SQNR is 55.4 dB, which is 1.3 dB lower than the ideal. According to the noise budget calculation, integrators contribute some noise to the total in-band noise. This is caused by the nonidealities of the op-amps (e.g. the finite DC gain and GBW, see Figure 5).

The building blocks implemented at circuit level include the RC integrator, quantizer, DAC and clock distributor circuit. An active-RC integrator is chosen for the loop filter. A folded-cascode single-stage operational amplifier is used because of its large output swing and easy input common-mode level. The one-bit quantizer is preceded by a dynamic comparator. The DAC is designed using switches formed by transmission gates. Figure 6 presents the output spectrum of the designed modulator.

Block failure tests and transistor-level reliability simulation are applied to the designed modulator. The CT modulator can be divided into a loop filter and a feedback loop. The feedback loop includes a dynamic comparator (comprising quantizer and latch) and a DAC. For a return-to-zero feedback system, a clock phase generator is needed. According to the minimum requirement for each component, the failure condition of the integrators (op-amps) is:

$$A_{DC-loss} = 3\ dB, \tag{11}$$

$$GBW_{loss} = 32\ kHz\ (1 \cdot f_s). \tag{12}$$

These limits are chosen because, during the modeling phase, the op-amps are implemented with the minimum design ( $A_{DC} = 40$  (linear) and  $GBW = 96 \ kHz$ ). Any further drop in  $A_{DC}$  or GBW visibly degrades modulator performance. The failure condition in the feedback loop is defined according to the excess loop delay:

$$\tau d_{quan} = 7.812\ \mu s\ (25\%\ T_{clk}), \tag{13}$$

$$\tau d_{ADC} = 9.5\ \mu s\ (30\%\ T_{clk}). \tag{14}$$

In the RZ DAC,  $\alpha$  is chosen as 0.25, which corresponds to a 7.8125 µs clock signal delay for the comparator. With the block failure cases set up above, both the minimum design ( $A_{DC} = 40$  (linear) and  $GBW = 96 \ kHz$ ) and the overdesign ( $A_{DC} = 40$  (linear) and  $GBW = 128 \ kHz$ ) of the loop filter are simulated. Detailed simulation results can be found in (Cai, 2013).

We conclude that in a low-power, low-order CT  $\Sigma\Delta$  modulator, the feedback loop is less reliable than the analog loop filter. Moreover, the DAC is the most sensitive building block. On the other hand, circuit overdesign can enhance system reliability at the cost of power consumption. For reliability-aware circuit design, improvements can be made in earlier design stages thanks to ageing prediction. Compared with the HCI mechanism, NBTI is the dominant effect in the designed CT modulator.
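The block failure conditions for the op-amps and the quantizer can be turned into a simple pass/fail check, as sketched below. The function names are hypothetical, and the interpretation of the limits (a block is flagged once its gain loss reaches 3 dB, its GBW loss reaches one $f_s$, or its delay reaches 25 % of the clock period) is an assumption made for this illustration.

```python
import math

F_S = 32e3            # sampling frequency of the modulator, Hz
T_CLK = 1.0 / F_S     # clock period: 31.25 us

def opamp_fails(a_dc_fresh, a_dc_aged, gbw_fresh, gbw_aged):
    """'Yes' (True) if the DC-gain loss reaches 3 dB or the GBW loss reaches f_s."""
    gain_loss_db = 20.0 * math.log10(a_dc_fresh / a_dc_aged)
    gbw_loss = gbw_fresh - gbw_aged
    return gain_loss_db >= 3.0 or gbw_loss >= F_S

def quantizer_fails(delay_s):
    """'Yes' (True) if the quantizer delay reaches 25 % of the clock period."""
    return delay_s >= 0.25 * T_CLK

print("25 % of T_clk =", 0.25 * T_CLK * 1e6, "us")
print("op-amp, gain 40 -> 28:", opamp_fails(40.0, 28.0, 96e3, 96e3))
print("op-amp, GBW 128 kHz -> 100 kHz:", opamp_fails(40.0, 40.0, 128e3, 100e3))
print("quantizer delay 5 us:", quantizer_fails(5e-6))
```

Running every block through such a check, one at a time, gives the 'Yes'/'No' table from which the most sensitive block is identified.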

#### A 125 kHz 3-bit Third-order LP-CT Sigma Delta Modulator

The second Sigma-Delta ( $\Sigma\Delta$ ) modulator example is a 3-bit third-order LP-CT  $\Sigma\Delta$  modulator. The signal bandwidth is 125 kHz, while the clock frequency is 4 MHz. This modulator achieves an ideal 82.5 dB SQNR when the input signal amplitude is around -3 dBFS.

CT modulators are less robust against clock jitter and loop delay than their discrete-time (DT) counterparts (Cherry, 1999). The clock distributor is an essential block in a CT  $\Sigma\Delta$  modulator. All clocked components (e.g. switches, quantizer and DAC) are sensitive to clock uncertainty. Based on the designed ideal 3-bit 3rd-order CT modulator, the ageing mechanisms HCI and NBTI are studied with the proposed bottom-up hierarchical reliability analysis approach. To isolate the impact of clock uncertainties as much as possible, all the other blocks of the modulator are ideal (implemented with Verilog-AMS and macro-models).

Cai (2013) reported a non-overlapping clock distributor designed in a 65 nm CMOS technology. Both HCI and NBTI can cause clock uncertainty. Quantitatively, the HCI-induced clock skew is less than 1/100 of the NBTI-induced one. Thus, NBTI is the primary concern. NBTI can induce clock skew and jitter. In detail, clock distributors with standard- and low- $V_{th}$  models show clock uncertainties in the form of a periodic clock skew under NBTI. We also notice that the clock distributor designed with high- $V_{th}$  transistors suffers from a jitter problem.



Figure 5. Output spectrums with  $A_{DC}$  and GBW variance. (a) SQNR is 55.0 dB, 55.0 dB, 54.5 dB, 51.3 dB when  $A_{DC}$  is 100, 80, 40, and 20 (linear). (b) SQNR is 54.9 dB, 54.8 dB when GBW is 128 kHz and 96 kHz.



Figure 6. Output spectrum of modulator, simulation results based on CMOS 65 nm technology.

Behavioral clock signals are set up as non-ideal according to the previously obtained NBTI-induced jitter (high- $V_{th}$  transistors) and skew (both standard- and low- $V_{th}$  transistors). In order to analyze the NBTI-induced degradation of the modulator, NBTI degradation is propagated from the defect level (NBTI mechanism), via the transistor level ( $V_{th}$  degradation), through the circuit level (NBTI-induced clock jitter), and finally gathered at the system level. As shown in Figure 7, the performance of the CT  $\Sigma\Delta$  modulator can be significantly impacted by the NBTI-induced jitter which occurs in its clock distributor. Numerically, from the hierarchical simulation, a 20 ps clock jitter in the 4 MHz clock circuit can reduce the SNR from 82.5 dB to only 32.4 dB.



Figure 7. Output spectrum under (a) skew and (b) jitter effects. The 20 ps jitter refers to the simulated jitter level of the clock distributor, which can cause a 50.1 dB SNR degradation.

# A Direct-Conversion RF front-end for Wideband Applications

Multi-standard high performance wireless applications require nanoscale IC technology as well as a reconfigurable architecture. Despite the many challenges it imposes, one of the most advantageous architectures for such applications is the direct-conversion architecture (Rosa, 2009). Moreover, such an architecture can also be used as a digital low-IF architecture if required. Figure 8 illustrates the chosen RF front-end. This RF architecture has three main blocks: the BLIXER (Ferreira, 2011), aggregating a balun, a wide-band low noise amplifier, and an I-Q mixer; the digitally controlled oscillator (DCO) (Ferreira, 2011); and the programmable gain amplifier (PGA) together with the low-pass filter (Lee, 2010). The RF front-end (illustrated in Figure 8) has the following specifications: operating frequency from 1 GHz to 6 GHz, maximum bandwidth equal to 20 MHz, gain higher than 30 dB, *NF* lower than 3.5 dB @ 1 GHz and 6 dB @ 6 GHz, and *IP*3 higher than 0 dBm. The common functions of the RF front-end were aggregated according to the available transistor-level design for reliability and system-level failure analysis (Ferreira, 2011).

For a top-down design for reliability, we have to concentrate efforts on behavioral modeling to speed up the reliability analysis and synthesis. First, behavioral modeling is carried out to implement the RF front-end architecture (Figure 8). Most RF building-block behavioral models are available for an implementation of the RF front-end architecture (see http://www.designers-guide.org/). Then, an analytical failure model for system reliability degradation is proposed. Such a model shall be simple enough for hand analysis and accurate enough to estimate the reliability trends of the building blocks early in the design.

Each building block has a typical set of characteristics which should be enough to represent the block behavior. Among the frequency, bandwidth, power consumption, noise, linearity, and gain characteristics, we will focus on gain, noise, and linearity failure analysis. Frequency and bandwidth are often controlled by a reliable reference. Moreover, circuit degradation does not imply a worse power consumption, as its reduction is desirable. As Ferreira (2011) has presented, the power consumption decrease can be used as a measure of the degradation at the architectural level. Thus, the equation of the power consumption is also important for the circuit failure analysis and will also be analyzed.



*Figure 8. RF front-end architecture for multi-standard wireless applications: illustration (adapted from Rosa, 2009).* 

• *Power Consumption* – the total power consumption is a sum of all building-block power consumptions, as

$$P = P_{BLIXER} + P_{DCO} + P_{PGA}. \tag{15}$$

Ferreira (2011) has shown that there is a relationship between the total derivative of  $P_i$  ( $\Delta P_i$ ) of the  $i$-th building block and the variations of all other characteristics of that building block. Thus, the power consumption degradation can be used as an estimator of the degradation of all other characteristics, tracing some transistor-level constraints (e.g. stress bias generating ageing) at system level.

• *Gain* – the architecture gain is

$$G = G_{BLIXER} \cdot A_{LO} \cdot G_{PGA}, \tag{16}$$

where  $G_{BLIXER}$  and  $G_{PGA}$  are the gain of the BLIXER and of the PGA, and

$$A_{LO} = \frac{V_{LO}}{V'_{LO}}. \tag{17}$$

For amplifiers (BLIXER and PGA), the gain degradation ( $G_{BLIXER}$  and/or  $G_{PGA}$  reduction) is often accompanied by a  $P_i$  reduction, because degraded transistors have a smaller drain current and transconductance. On the other hand, DCO degradation is observed as a  $V_{LO}$  amplitude that differs from  $V'_{LO}$ , the expected (not degraded) DCO amplitude. Such a degradation of the DCO also impacts the mixer function implemented in the BLIXER circuit.

• *Noise* – the architecture noise is measured by the noise figure  $NF = 10 \cdot \log F$ , and the total noise factor (F) (Qizheng, 2006) is

$$F_{tot}(f) = F_{BLIXER} + \frac{F_{PGA} - 1}{G_{BLIXER}^2 \cdot A_{LO}^2} + \frac{N_{LO}(f)}{N_s}, \tag{18}$$

where  $N_s = 4kTR_s$  is the noise of the input source and

$$N_{LO}(f) = \frac{R_s \cdot 10^{P_s/10 - 3}}{\left(\frac{S_{11_{BLIXER}} + 1}{2}\right)^2} \cdot 10^{L(f)/10} \cdot A_{LO}. \tag{19}$$

Following the same premise, noise characteristics will be worse for the BLIXER and the PGA because the signal level is reduced by the gain degradation. Nevertheless, DCO degradation may not affect the oscillator phase noise (L(f)) (Ferreira, 2011).

• *Linearity* – the architecture linearity is measured by the input-referred interception point between the first and the third harmonic (*IP*3, in milliwatts in the following equations) (Qizheng, 2006), so it is

$$IP3 = \frac{1}{\frac{1}{IP3_{BLIXER}} + \frac{(1 - S_{11_{BLIXER}}) \cdot G_{BLIXER}^2 \cdot A_{LO}^2 \cdot R_s}{4 \cdot IP3_{PGA} \cdot R_{in_{PGA}}}}. \tag{20}$$

For amplifiers (BLIXER and PGA), the linearity variation is often an increase of such a characteristic, because degraded transistors have a smaller drain current and a more linear transconductance (Ferreira, 2011).

A classic design targets the set of building-block characteristics, although such characteristics may not be reliable. Therefore, overdesigned margins have to be taken to assure the architecture reliability in such a design. The RF front-end architecture may then not fail after stress, but it will consume more power and die area as a trade-off to achieve the required lifetime.
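The system-level model for gain, noise and linearity can be evaluated numerically with a short sketch. It follows equations (16), (18) and the cascaded form of equation (20), with the ratio $N_{LO}/N_s$ passed in as a single number. All numerical values are hypothetical and do not correspond to the designed front-end; the function and argument names are chosen for this illustration.

```python
import math

def total_gain(g_blixer, a_lo, g_pga):
    """Architecture gain, equation (16)."""
    return g_blixer * a_lo * g_pga

def total_noise_factor(f_blixer, f_pga, g_blixer, a_lo, n_lo_over_ns=0.0):
    """Total noise factor, equation (18); n_lo_over_ns stands for N_LO(f)/N_s."""
    return f_blixer + (f_pga - 1.0) / (g_blixer ** 2 * a_lo ** 2) + n_lo_over_ns

def total_ip3_mw(ip3_blixer, ip3_pga, g_blixer, a_lo, s11_blixer, r_s, r_in_pga):
    """Input-referred IP3 in mW, cascaded form of equation (20)."""
    second = ((1.0 - s11_blixer) * g_blixer ** 2 * a_lo ** 2 * r_s) / (
        4.0 * ip3_pga * r_in_pga)
    return 1.0 / (1.0 / ip3_blixer + second)

# Hypothetical fresh and aged cases: only the DCO amplitude ratio A_LO degrades
for label, a_lo in (("fresh", 1.0), ("aged", 0.8)):
    f_tot = total_noise_factor(f_blixer=2.0, f_pga=5.0, g_blixer=10.0, a_lo=a_lo)
    print(label,
          "gain =", total_gain(10.0, a_lo, 4.0),
          "NF =", round(10.0 * math.log10(f_tot), 3), "dB")
```

In this toy case a lower $A_{LO}$ reduces the gain and raises the noise figure, because the PGA noise is divided by a smaller gain. The same functions can be re-evaluated with degraded block values to see which specification is violated first.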

The RF front-end architecture failure can be estimated from the degradation of the building-block characteristics (Ferreira, 2011). A reliable-architecture design should optimize the building-block characteristics, taking into consideration the stress environment conditions and the unreliability phenomena. Through a failure evaluation (Ferreira, 2011), we can redefine the optimal characteristic goals to take the degradation into account, as:

$$\Phi_{j_{degraded}} = \Phi_{j_{typical}} \pm \Delta \Phi_{j}, (21)$$

where $\Phi_j$ is the *j*-th characteristic of the RF front-end (e.g. gain, noise or linearity). Finally, the building-block characteristic variation $(\Delta \Phi_j)$ is optimized for all $\Phi_j$. The model equations (15), (16), (18) and (20) provide an early estimate of the architecture characteristics. Using a gradient formulation, equation (21) can be written for each characteristic variation. This formulation is suitable if the involved variables are deterministic. However, in nanometer technologies such characteristics have to be considered as statistical variables. Thus, a better estimator is the characteristics' variance, and we can assume that $\Delta \Phi_j = k \cdot \overline{\sigma}_{\Phi_j}$ for $k \in \{1, 2, 3\}$ according to the required yield.
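The gradient formulation and the $k \cdot \overline{\sigma}$ margin can be sketched numerically. The Python example below is a minimal illustration under stated assumptions: it uses the noise factor of equation (18) without the LO term, approximates the gradient by central finite differences, and assumes that the block characteristics are independent so that their contributions add in quadrature. The helper names and all numeric values are hypothetical.

```python
import math


def noise_factor(f_blixer, f_pga, g_blixer, a_lo=1.0):
    """Equation (18) without the LO noise term (assumed negligible here)."""
    return f_blixer + (f_pga - 1.0) / (g_blixer ** 2 * a_lo ** 2)


def gradient(func, point, rel_step=1e-6):
    """Central finite-difference gradient of func at point (name -> value)."""
    grads = {}
    for name, value in point.items():
        step = abs(value) * rel_step or rel_step
        upper = dict(point, **{name: value + step})
        lower = dict(point, **{name: value - step})
        grads[name] = (func(**upper) - func(**lower)) / (2.0 * step)
    return grads


def k_sigma_margin(grads, sigmas, k):
    """Delta Phi = k * sigma_Phi, with first-order propagation of the
    block-level standard deviations (independence is an assumption)."""
    variance = sum((grads[name] * sigmas[name]) ** 2 for name in sigmas)
    return k * math.sqrt(variance)


# Hypothetical typical point and block-level standard deviations.
typical = {"f_blixer": 2.0, "f_pga": 10.0, "g_blixer": 10.0}
sigmas = {"f_blixer": 0.05, "f_pga": 0.5, "g_blixer": 0.4}

grads = gradient(noise_factor, typical)
delta_f = k_sigma_margin(grads, sigmas, k=3)

f_typical = noise_factor(**typical)
f_degraded = f_typical + delta_f  # noise worsens, so the '+' sign of (21)

print(f"F typical  = {f_typical:.4f}")
print(f"Delta F    = {delta_f:.4f}")
print(f"F degraded = {f_degraded:.4f}")
```

Choosing $k = 3$ here is only an example; a smaller $k$ gives a tighter margin at the cost of a lower expected yield.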

By defining $\Phi_{j_{degraded}}$ at the system lifetime, a reliability criterion is included in the top-down design approach. The design specification for the typical characteristic $(\Phi_{j_{typical}})$ is thus modified by $\Delta \Phi_j$. Each building-block characteristic can then be determined, together with its variation due to degradation. By sharing the acceptable system-level degradation among the building blocks, a design for reliability is obtained and the design margins are optimized for a chosen design trade-off. Here we present two design-for-reliability strategies: the first optimizes noise and the second optimizes power consumption.
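One simple way to picture the sharing of the acceptable system-level degradation is a weighted split of a gain budget expressed in dB. The sketch below is not the authors' optimization procedure; it assumes that block gains in dB add along the cascade, and the weights and the 1 dB budget are hypothetical values used only to contrast two allocations.

```python
def share_degradation(budget_db, weights):
    """Split a system-level degradation budget (dB) among blocks by weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {block: budget_db * w / total for block, w in weights.items()}


SYSTEM_BUDGET_DB = 1.0  # assumed acceptable gain degradation at end of life

# Two hypothetical allocations of the same budget.
allocation_a = share_degradation(SYSTEM_BUDGET_DB, {"BLIXER": 1.0, "PGA": 3.0})
allocation_b = share_degradation(SYSTEM_BUDGET_DB, {"BLIXER": 3.0, "PGA": 1.0})

for label, allocation in (("A", allocation_a), ("B", allocation_b)):
    shares = ", ".join(f"{blk}: {val:.2f} dB" for blk, val in allocation.items())
    print(f"allocation {label}: {shares}")
```

Both allocations consume the same system-level budget; they differ only in which building block is allowed to degrade more, which is the degree of freedom a design trade-off can exploit.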

Finally, a classic design and both reliable designs were designed to meet the RF architecture specifications at the required lifetime. Without the insight provided by equation (21), a classic design has to impose a design margin of 4 dB in gain (solid line in Figure 9) to obtain acceptable noise characteristics (solid line in Figure 10). Such a design therefore implies a higher power consumption over the whole circuit lifetime.

Nevertheless, an optimized margin (about 1 dB) can be considered if the reliability criterion is included in the early design stages (dashed and dashed-dotted lines in Figure 9). The low-noise strategy (strategy 1, dashed line in Figure 10) results in even lower noise than the classic design, but a higher gain than the low-power strategy. The low-power strategy (strategy 2, dashed-dotted line in Figure 10) presents higher noise than the classic design, but not high enough to be out of specification. Moreover, the low-power strategy results in an even lower design margin for gain (see strategy 2, dashed-dotted line in Figure 9), which yields the smallest power consumption over the whole lifetime.



Figure 9. RF architecture gain performance before reliability degradation. Behavioral simulations for: classic design in solid line, low noise design for reliability (strategy 1) in dashed line, and low power design for reliability (strategy 2) in dashed-dotted line.

#### FUTURE RESEARCH DIRECTIONS

Until now, most design-for-reliability work has concentrated on reliability analysis from a physical point of view. Such analyses are very advantageous for bottom-up designs, because they give insight into transistor degradation phenomena. The major drawback is that the complex formulations proposed in the state of the art are not adapted to reliability-aware AMS/RF computer-aided design optimization. Such reliability analysis and modelling tools are very accurate but also very demanding in computation time and memory. Thus, they are not suitable for automated design for reliability and present serious limitations in system-level design.

This chapter describes digital and analog reliability analysis tools separately. Currently, the state of the art offers no integration of such methods. This means that we can provide neither the signal probability reliability of the digital output of a designed analog-to-digital converter nor the analog output signal degradation of a digital-to-analog converter. This lack of integration between reliability tools and methods makes it impracticable to deal with reliability constraints in such system designs.

Regarding the trends illustrated in Figure 1, we might expect nanoscale IC technologies to have transistor lifetimes on the order of a few years. Such a short lifetime can no longer be neglected and would affect general consumers. Reliability requirements will not concern only space, avionics, energy, transport and biomedical applications; reliability will also affect telecommunications, entertainment and low-cost electronic applications. The new generation of ICs should deliver performance according to the specifications even under an increasing failure rate. Thus, the circuit lifetime could be pushed toward IC wear-out. This subject is an interesting topic for future research.

Figure 10. RF architecture noise performance before reliability degradation. Behavioral simulations for: classic design in solid line, low noise design for reliability (strategy 1) in dashed line, and low power design for reliability (strategy 2) in dashed-dotted line.

#### CONCLUSIONS

IC reliability has become an important issue with the continuous scaling down of CMOS technology. Lifetime requirements can no longer be neglected in the nanoscale IC era, since transistor lifetime approaches the order of a few years. In order to include reliability as a design criterion, this chapter proposed a reliability-aware AMS/RF performance optimization.

For this, we described some unreliability phenomena and discussed reliability degradation due to transient faults as well as spatial, environmental and ageing phenomena. Afterwards, we reviewed the state of the art of reliability analysis and modelling tools from a design optimization point of view. Moreover, we highlighted synthesis methodologies to achieve a reliability specification.

A $\Sigma\Delta$ modulator implemented in 65 nm CMOS technology was illustrated in this chapter. Considering a low-power, low-order CT $\Sigma\Delta$ modulator, we demonstrated that the feedback loop is less reliable than the analog loop filter. Also, the DAC is the most sensitive building block in the system architecture. The performance of the CT $\Sigma\Delta$ modulator can be significantly impacted by ageing-induced jitter in its clock distributor. Numerically, the hierarchical simulation showed that a 20 ps clock jitter in a 4 MHz clock circuit reduces the SNR from 82.5 dB to only 32.4 dB. Even if circuit overdesign could enhance system reliability, it comes at the cost of additional power and area.

A system-level direct-conversion RF front-end design was described. Without taking the mentioned reliability phenomena into account, a classic design optimization would impose an overdesigned margin of 4 dB in gain to obtain acceptable noise characteristics. By applying the proposed design-for-reliability strategies, however, the RF front-end architecture failure can be estimated early. Such a reliable-architecture design should optimize the building-block characteristics, taking into consideration the stress environment conditions and the unreliability phenomena.

### REFERENCES

Andrejevic, M., & Litovski, V. (2006). Fault Diagnosis in Digital Part of Sigma-Delta Converter. In *Proc. IEEE Seminar on Neural Network Applications in Electrical Engineering, NEUREL* (pp. 177–180).

Ban, T. (2012). Méthodes et Architectures basées sur la Redondance Modulaire pour Circuits Combinatoires Tolérants aux Fautes. Télécom ParisTech.

Bernstein, J. B., Gurfinkel, M., Li, X., Walters, J., Shapira, Y., & Talmor, M. (2006). Electronic circuit reliability modeling. *Microelectron. Reliab.*, *46*(12), 1957–1979. doi:10.1016/j.microrel.2005.12.004

Birolini, A. (1994). *Quality and Reliability of Technical Systems, Theory - Practice - Management* (1st ed.). Springer-Verlag.

Cai, H. (2013). Fiabilisation de Convertisseurs Analogique-Numérique à Modulation Sigma-Delta. Télécom ParisTech.

Cherry, J. A., & Snelgrove, W. M. (1999). Excess Loop Delay in Continuous-Time Delta-Sigma Modulators. *IEEE Trans. Circuits and Systems II: Express Briefs*, 46(4), 376–389.

Cherry, J. A., & Snelgrove, W. M. (1999). Clock Jitter and Quantizer Metastability in Continuous-Time Delta-Sigma Modulators. *IEEE Trans. Circuits and Systems II: Express Briefs*, 46(6), 661–676.

Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. *IEEE Trans. Evol. Comput.*, 6(2), 182–197.

Forzan, C., & Pandini, D. (2009). Statistical static timing analysis: A survey. *Integration, the VLSI Journal*, 42(3), 409–435. doi:10.1016/j.vlsi.2008.10.002

Franco, D. T. (2008). Fiabilité du Signal des Circuits Logiques Combinatoires sous Fautes Simultanées Multiples. Télécom ParisTech.

Graebl, H. (2001). The Sizing Rules Method for Analog Integrated Circuit Design. In *Proc. IEEE/ACM Int. Conf. Comp. Aided Design* (pp. 343–349).

Gu, Q. (2006). *RF System Design of Transceivers for Wireless Communications*. Secaucus, NJ, USA: Springer-Verlag New York, Inc.

Hsu, W., & Hwang, C. (1992). Advanced Integrated-Circuit Reliability Simulation Including Dynamic Stress Effects. *IEEE J. Solid-State Circuits*, 27(3), 247–257.

Hu, C. (1992). IC Reliability Simulation. *IEEE J. Solid-State Circuits*, 27(3), 241–246.

Junior, G. G. D. S. (2012). Conception Robuste de Circuits Numériques à Technologie Nanométrique. Télécom ParisTech.

Lee, K. (2010). Multi-Mode, Multi-Band Active-RC Filter and Tuning Circuits for SDR Applications. In *Advances in Solid State Circuits Technologies* (pp. 95–106). INTECH.

Maricau, E., & Gielen, G. (2011). Computer-Aided Analog Circuit Design for Reliability in Nanometer CMOS. *IEEE Trans. Emerg Select Topics Circ. Syst.*, 1(1), 50–58.

Maris Ferreira, P. (2011). *Méthodologie de conception AMS/RF pour la fiabilité : conception d'un frontal RF fiabilisé*. Télécom ParisTech.

More, S. (2012). Aging Degradation and Countermeasures in Deep-submicrometer Analog and Mixed Signal Integrated Circuits. Univ. of Munich.

Mutlu, A. A., & Rahman, M. (2005). Statistical Methods for the Estimation of Process Variation Effects on Circuit Operation. *IEEE Trans. Electron. Packag. Manuf.*, 28(4), 364–375.

NIST/SEMATECH. (2012). *e-Handbook of Statistical Methods*. Retrieved from www.itl.nist.gov/div898/handbook/

Pagliarini, S. N. (2013). Méthodes d'Analyse et Techniques d'Amélioration de Fiabilité pour les Circuits Numériques. Télécom ParisTech.

Pan, X., & Graeb, H. (2012). Reliability optimization of analog integrated circuits considering the trade-off between lifetime and area. *Microelectron. Reliab.*, *52*(8), 1559–1564. doi:10.1016/j.microrel.2011.09.026

Quemerais, T., Moquillon, L., Huard, V., Fournier, J.-M., Benech, P., & Corrao, N. (2010). DC hot carrier stress effect on CMOS 65nm 60 GHz power amplifiers. *Proc. of IEEE Radio Frequency Integrated Circuits Symp.*, *31*(9), 351–354. doi:10.1109/RFIC.2010.5477310

Rosa, J. M. D. La, Castro-López, R., Morgado, A., Becerra-Alvarez, E. C., Río, R. Del, Fernández, F. V., & Pérez-Verdú, B. (2009). Adaptive CMOS analog circuits for 4G mobile terminals - Review and state-of-the-art survey. *Microelectron. J.*, 40(1), 156–176. doi:10.1016/j.mejo.2008.07.001

Ruberto, M., Degani, O., Wail, S., Tendler, A., Fridman, A., & Goltman, G. (2008). A reliability-aware RF power amplifier design for CMOS radio chip integration. In *Proc. IEEE Int. Reliability Physics Symposium* (pp. 536–540). Phoenix.

Schreier, R., & Temes, G. C. (2004). *Understanding Delta-Sigma Data Converters* (p. 464). Wiley-IEEE Press.

Singh, M., & Koren, I. (2003). Fault-Sensitivity Analysis and Reliability Enhancement of Analog-to-Digital Converters. *IEEE Trans. Very Large Scale Integration Systems*, *11*(5), 839–852.

Stathis, J. H. (1999). Percolation models for gate oxide breakdown. *Journal of Applied Physics*, 86(10), 5757. doi:10.1063/1.371590

Tang, H. (2007). Hierarchical Statistical Analysis of Performance Variation for Continuous-time Delta-Sigma Modulators. In *IFIP Int. Conf. on Very Large Scale Integration, VLSI-SoC* (pp. 37–41).

White, M., & Chen, Y. (2008). Scaled CMOS Technology Reliability Users Guide. *JPL Publication 08-14 3/08*.

Xuan, X., Chatterjee, A., Singh, A. D., Kim, N. P., & Chisa, M. T. (2003). IC Reliability Simulator ARET and Its Application in Design-for-Reliability. In *Proc. IEEE Asian Test Symposium, AST* (pp. 18–21).

Yan, B., Qin, J., Dai, J., Fan, Q., & Bernstein, J. B. (2009). Reliability Simulation and Circuit-Failure Analysis in Analog and Mixed-Signal Applications. *IEEE Trans. Device Mat. Reliab.*, *9*(3), 339–347.

Yu, G., & Li, P. (2006). Lookup Table Based Simulation and Statistical Modeling of Sigma-Delta ADCs. In *Proc. ACM IEEE Design Automation Conference*.

# ADDITIONAL READING SECTION

Bernstein, J. B., Gurfinkel, M., Li, X., Walters, J., Shapira, Y., & Talmor, M. (2006). Electronic circuit reliability modeling. *Microelectron. Reliab.*, 46(12), 1957–1979. doi:10.1016/j.microrel.2005.12.004

Black, J. R. (1969). Electromigration-A Brief Survey and Some Recent Results. *IEEE Trans. Electron Devices*, *16*(4), 338–347.

Dokouzgiannis, S. P., & Kontoleon, J. M. (1988). of Combinational Logic Circuits. *IEEE Trans Reliab.*, 37(5), 493–500.

Hara, H., Okamoto, Y., & Ohnuma, H. (1970). A New Instability in MOS Transistor Caused by Hot Electron and Hole Injection from Drain Avalanche Plasma into Gate Oxide. *Jpn. J. Appl. Phys.*, *9*(9), 1103–1112.

Jeppson, K. O., & Svensson, C. M. (1977). Negative bias stress of MOS devices at high electric fields and degradation of MNOS devices. *J. Appl. Phys.*, 48(5), 2004–2014.

Tu, R. H., Rosenbaum, E., Chan, W. Y., Li, C. C., Minami, E., Quader, K., ... Hu, C. (1992). *Berkeley Reliability Tools-BERT. Electrical Engineering* (Vol. 1991, pp. 1524–1534).

Weir, B. E., Silverman, P. J., Monroe, D., Krisch, K. S., Alam, M. A., Alers, G. B., ... Hwang, D. (1997). Ultra-Thin Gate Dielectrics: They Break Down, But Do They Fail? In *Proc. IEEE Int. Electron Devices Meeting* (pp. 73–76).

# **KEY TERMS AND DEFINITIONS**

**BTI:** Bias Temperature Instability (NBTI in p-type and PBTI in n-type transistors)

**DoEs:** Design of Experiments

**DRM:** Design Rules Manual

**EM:** Electromigration

**FIT:** Failure-in-Time

**HBD:** Hard Breakdown

**HCI:** Hot Carrier Injection

**IC:** Integrated Circuit

**LER:** Line Edge Roughness

**MC:** Monte Carlo simulation

**PBR:** Probabilistic Binomial Model

**PTM:** Probabilistic Transfer Matrices

**RSMs:** Response Surface Modeling

**$\Sigma\Delta$ Modulator:** Sigma-Delta Modulator in Analog-to-Digital Converter (ADC)

**RDF:** Random Dopant Fluctuations

**SBD:** Soft Breakdown

**SEE:** Single Event Effect

**SETs:** Single Event Transients

**SEUs:** Single Event Upsets

**SPR:** Signal Probability Reliability

**TDDB:** Time Dependent Dielectric Breakdown

**WID:** Within-Die Variations