# A High-Speed Inductive-Coupling Link With Burst Transmission

Noriyuki Miura, *Member, IEEE*, Yoshinori Kohama, Yasufumi Sugimori, Hiroki Ishikuro, *Member, IEEE*, Takayasu Sakurai, *Fellow, IEEE*, and Tadahiro Kuroda, *Fellow, IEEE*

Abstract—A high-speed inductive-coupling link is presented. It communicates at a data rate of 11 Gb/s for a communication distance of 15  $\mu \rm m$  in 180 nm CMOS. The data rate is 11× higher than previous inductive-coupling links. The communication distance is 5× longer than a capacitive-coupling link for the same data rate, bit error rate, and layout area. Burst transmission utilizing the high-speed inductive-coupling link is also presented. Multi-bit data links are multiplexed into a single burst data link. It reduces layout area by a factor of three in 180 nm CMOS and a factor of nine in 90 nm CMOS.

Index Terms—Burst transmission, data link, high speed, inductive coupling, three dimensional.

#### I. INTRODUCTION

CMOS technology is scaling with Moore's Law. The number of transistors integrated in a single chip has been doubling every two years, providing exponential growth in computational performance of 70% per year. However, in recent years, the end of Moore's Law has been widely discussed [1]. One of the biggest problems is the increasing cost of chip fabrication. As technology scales, mask cost and lithography tool cost increase exponentially, resulting in serious yield degradation.

Three-dimensional (3D) system integration is one of the key approaches to realize "More than Moore" that can provide continuous growth without relying on simple device scaling. It enables multiple chips to be stacked vertically in a package. Fabrication cost can be reduced since a highly-integrated and hence high-performance system can be realized even by using less-scaled device technology.

A technical challenge in 3D system integration is how to form interconnections between stacked chips. Wire bonding is one of the solutions. Although it is a reasonable solution to deliver power to the stacked chips, the performance for data communication is limited due to its long interconnection

Manuscript received June 09, 2008; revised November 14, 2008. Current version published February 25, 2009. This work was supported by CREST/JST. The 180 nm CMOS VLSI chip in this work was fabricated by Taiwan Semiconductor Manufacturing Corporation (TSMC). The 90 nm CMOS VLSI chip in this work was fabricated through the chip fabrication program of VDEC, the University of Tokyo, in collaboration with STARC, Fujitsu Limited, Matsushita Electric Industrial Company Limited, NEC Electronics Corporation, Renesas Technology Corporation, and Toshiba Corporation.

N. Miura, Y. Kohama, Y. Sugimori, H. Ishikuro, and T. Kuroda are with the Department of Electrical Engineering, Keio University, Yokohama 223-8522, Japan (e-mail: miura@kuro.elec.keio.ac.jp).

T. Sakurai is with the Department of Electrical Engineering, University of Tokyo, Tokyo 153-8505, Japan.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2008.2012365



Fig. 1. Performance comparison between this work and previous reports.

length. In addition, since pads for the bonding can be placed only in the chip periphery, the number of possible pins is small, which restricts aggregate data bandwidth. Micro bumps (MB) [2], [3] and through-Si vias (TSV) [4], [5] are direct vertical interconnections between stacked chips. The communication performance can be significantly improved since the interconnection length can be reduced and the direct vertical interconnections can be distributed across the entire chip area to enhance parallelism of I/Os. However, both MB and TSV require additional wafer and mechanical processes for fabrication, resulting in a considerable cost increase. Capacitive [6]–[15] and inductive-coupling links [16]–[20] are wireless circuit solutions where a pair of metal electrodes forms a capacitively coupled link and a pair of metal coils forms an inductively coupled link, providing wireless direct vertical interconnections between stacked chips. They have two main advantages over MB and TSV. First, the metal electrodes and the metal coils can be made simply from on-chip IC interconnections. No additional wafer or mechanical process is required, so the cost is low. Second, capacitive and inductive coupling are both non-contact AC coupling. This allows the removal of electrostatic discharge (ESD) protection circuits, which enables high-speed and low-power operation with a small-area I/O cell.

The first literature on capacitive and inductive-coupling links appeared in 1995 [15] and 2004 [20] respectively. Since then, various circuit techniques [6]–[14], [16]–[19] have been presented for performance improvement. Fig. 1 plots data rates



Fig. 2. Channel model of (a) capacitive and (b) inductive-coupling link.

and communication distances of previously presented capacitive and inductive-coupling links. At ISSCC 2007, an 11 Gb/s capacitive-coupling link in 180 nm CMOS was presented [6]; however, the communication distance was only 3 $\mu$m. When the communication distance was extended to 10 $\mu$m, the data rate was degraded to 1.5 Gb/s [11]. The capacitive-coupling link can be used only at distances shorter than 10 $\mu$m for the following reason. Fig. 2(a) depicts an equivalent circuit model of the capacitive-coupling link. The received voltage of the capacitive-coupling link, $V_{R,CAP}$, is given by $V_T C_C/(C_C + C_{SUB})$, where $V_T$ is the signal voltage in the transmitter electrode, $C_C$ is the capacitance between the electrodes, and $C_{SUB}$ is the capacitance between the electrode and the substrate. $V_{R,CAP}$ is reduced in long-distance communication, since $C_C$ is reduced and $V_T$ is limited by the supply voltage. Even if the electrodes are enlarged, $V_{R,CAP}$ increases very little, because both $C_C$ and $C_{SUB}$ increase in a similar way. On the other hand, the received voltage of the inductive-coupling link, $V_{R,IND}$, is given by $M\,dI_T/dt$, where $M$ is the mutual inductance between the coils and $I_T$ is the current through the transmitter coil (Fig. 2(b)). By increasing the coil size, $M$ can be increased and hence $V_{R,IND}$ can be increased even if the communication distance is long. Therefore, the inductive-coupling link can be used for longer-distance communication. However, as shown in Fig. 1, the data rate of the inductive-coupling link so far has been limited to around 1 Gb/s.
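The scaling argument above can be checked numerically. The following minimal Python sketch evaluates the two received-voltage expressions. All component values (`c_c`, `c_sub`, the mutual inductances, and `di_dt`) are illustrative assumptions and are not taken from the test chips, and the electrode enlargement is idealized as scaling both capacitances by the same factor.

```python
def v_rx_capacitive(v_t, c_c, c_sub):
    """Received voltage of the capacitive divider: V_T * C_C / (C_C + C_SUB)."""
    return v_t * c_c / (c_c + c_sub)


def v_rx_inductive(m, di_dt):
    """Received voltage of the inductive link: M * dI_T/dt."""
    return m * di_dt


# Hypothetical values, chosen only to illustrate the trend.
v_t = 1.8                   # transmitter swing, bounded by the supply (V)
c_c, c_sub = 1e-15, 9e-15   # coupling and substrate capacitance (F)

base = v_rx_capacitive(v_t, c_c, c_sub)
# Idealized enlargement: C_C and C_SUB both grow by 4x, so the ratio is unchanged.
enlarged = v_rx_capacitive(v_t, 4 * c_c, 4 * c_sub)

# For the coil, a larger mutual inductance raises the received voltage directly.
di_dt = 5e-3 / 50e-12       # assumed 5 mA current transition in 50 ps (A/s)
small_coil = v_rx_inductive(0.5e-9, di_dt)
large_coil = v_rx_inductive(2.0e-9, di_dt)

print(base, enlarged, small_coil, large_coil)
```

Under these assumptions the capacitive link gains nothing from a larger electrode, while the inductive link's received voltage grows in proportion to $M$.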

This paper presents a high-speed inductive-coupling link whose data rate is higher than 10 Gb/s (Fig. 1). It delivers 11 Gb/s for a distance of 15 $\mu$m in 180 nm CMOS. Compared with the capacitive-coupling link in the same device technology [6], the communication distance is extended by $5\times$ for the same data rate, bit error rate (BER), and layout area. Burst transmission utilizing this high-speed inductive-coupling link is also presented. Multi-bit data links in parallel are multiplexed into a single burst data link for reducing the number of data links and layout area. Test chips are fabricated in 180 nm and 90 nm CMOS. Device scaling improves the data rate of the burst transmission so that it can enhance the layout area reduction. The layout area is reduced by a factor of three in 180 nm CMOS and a factor of nine in 90 nm CMOS. The rest of the paper is organized as follows. Section II explains the circuit details of the high-speed inductive-coupling link, followed by simulated and measured results in 180 nm CMOS. Section III presents the inductive-coupling burst transmission and demonstrates layout area reduction by test-chip measurements in 180 nm and 90 nm CMOS. Section IV summarizes this paper with some conclusions.

#### II. HIGH-SPEED INDUCTIVE-COUPLING LINK

#### A. Circuit Design

Fig. 3 depicts the proposed high-speed, low-latency asynchronous inductive-coupling transceiver. An H-bridge driver in the transmitter generates $I_T$ from *Txdata* and drives the transmitter coil. A small positive or negative pulse-shaped voltage $V_R$ is induced in the receiver coil, which is biased at $V_B$ (around $V_{DD}/2$) by a replica bias generator (explained in detail later). Centered around $V_B$, a positive pulse is generated when *Txdata* transitions from low to high and a negative pulse is generated when *Txdata* transitions from high to low. The receiver is a hysteresis comparator that detects the small pulse and converts it to digital data *Rxdata*. The hysteresis comparator consists of a gain stage (CMOS inverters XL, XR) and a latch circuit (cross-coupled PMOS). The gain stage amplifies the $V_R$ pulse and drives the succeeding latch to switch and recover *Rxdata*. According to *Rxdata*, the latch circuit modulates the threshold voltage of the inverters in the gain stage. The broken line in Fig. 3 denotes the modulated threshold voltage of the inverter XL, namely $V_{TH}$. For example, when *Rxdata* is low, $V_{TH}$ increases to $V_{TH0} + \Delta V$, where $V_{TH0}$ is the nominal threshold voltage of the inverter (typically around $V_{DD}/2$) and $\Delta V$ is the hysteresis width of the comparator. $\Delta V$ is designed within an appropriate range so that the receiver can distinguish the signal from noise. When the inverter's input exceeds $V_{TH0} + \Delta V$ due to the positive pulse $V_R$, *Rxdata* switches to high. The latch circuit then shifts $V_{TH}$ to $V_{TH0} - \Delta V$ and holds *Rxdata* high until the negative pulse voltage $V_R$ is applied to the inverter's input. By repeating this operation, digital data is correctly recovered from the pulse voltages.
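The threshold-shifting behavior described above can be illustrated with a small behavioral model. The Python sketch below is only an idealized, sampled-data abstraction of the hysteresis comparator, not the transistor-level circuit; the function name `recover_rxdata` and all voltage values are hypothetical.

```python
def recover_rxdata(v_in, v_th0, delta_v, rx_init=0):
    """Idealized hysteresis comparator.

    v_in    : sequence of input voltage samples (pulses centered on the bias)
    v_th0   : nominal inverter threshold
    delta_v : hysteresis width
    Returns the recovered digital output, one value per input sample.
    """
    rx = rx_init
    out = []
    for v in v_in:
        # The latch raises the threshold while the output is low
        # and lowers it while the output is high.
        v_th = v_th0 + delta_v if rx == 0 else v_th0 - delta_v
        if rx == 0 and v > v_th:
            rx = 1          # positive pulse detected
        elif rx == 1 and v < v_th:
            rx = 0          # negative pulse detected
        out.append(rx)
    return out


# Hypothetical waveform: bias 0.9 V, +/-0.2 V pulses, 0.05 V of noise.
samples = [0.9, 0.95, 1.1, 0.9, 0.85, 0.9, 0.7, 0.9, 0.95]
rxdata = recover_rxdata(samples, v_th0=0.9, delta_v=0.1)
print(rxdata)
```

In this example the 0.05 V disturbances stay inside the hysteresis window and are ignored, while the 0.2 V pulses toggle the output, which is the trade-off that sets the useful range of $\Delta V$.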



Fig. 3. High-speed inductive-coupling link and its simulated waveforms.

| | Proposed Transceiver | [16] N. Miura (ISSCC'07) |
|------------------|------------------------|------------------------|
| Circuit Topology | (schematic; see Fig. 3) | (schematic; labels: Txdata, Txclk, Pulse Generator, H-Bridge, Rxclk, Timing Controller, Comparator, Rxdata; annotations: Long Latency, Slow) |
| Data Rate | 11 Gb/s | 1 Gb/s |
| Latency | 36 ps | ~600 ps |
| Energy/bit | 1.4 pJ/b | 0.33 pJ/b |

TABLE I
SUMMARY OF SIMULATED PERFORMANCE

In this asynchronous transceiver, the received voltage $V_R$ and the hysteresis width $\Delta V$ are designed with a wide margin against variations in process, voltage, and temperature (PVT), and in communication distance. $V_R$ should be kept large enough by increasing the transmit current. $\Delta V$ should be kept low enough for signal detection while remaining high enough to ignore noise. Transistor sizes are carefully chosen based on full process-corner simulation. In addition, in the receiver, precise control of $V_B$ is critical for reliable operation since the CMOS inverter in the gain stage has a sharp gain characteristic with respect to $V_B$. A replica of the receiver, whose input and output terminals are all shorted, is utilized for the $V_B$ generation. It precisely sets $V_B$ at $V_{TH0}$ and adaptively controls $V_B$ against PVT variations. A bypass capacitor $C_B$ is inserted to reduce noise from the power, ground, and substrate.

Simulated performance of the proposed transceiver is summarized in Table I and is compared with the previous inductive-coupling transceiver [16]. In our proposed transceiver, an asynchronous scheme is employed for the data link. No clock is needed for the data recovery. Since the complicated timing control required in the synchronous scheme, which uses multi-phase clocks and a high-precision phase interpolator [16], is not needed, operation speed is improved. The maximum data rate of the inductive-coupling transceiver is determined by the transition frequency of the transistor $f_T$. It is around 60 GHz in 180 nm CMOS. The self-resonant frequency of the coil can be designed to be higher than 100 GHz in 180 nm CMOS so that it does not limit the data rate. Circuit simulation shows that the data rate can be improved up to 11 Gb/s by the asynchronous transceiver. However, coil size should be increased to improve the signal-to-noise ratio (SNR) in order to compensate for the weak noise immunity of the asynchronous receiver. This area overhead can be eliminated by burst transmission, which will be introduced later. The modulation scheme is also modified such that *Txdata* drives the



Fig. 4. Stacked test chips of high-speed inductive-coupling link.

H-bridge directly to generate $I_T$. The pulse generator in the conventional transmitter is removed. The number of circuit stages in the transmitter is reduced, resulting in a small link latency. The simulated latency from *Txdata* to *Rxdata* is only 36 ps, which is equivalent to 0.5 FO4 inverter delay in 180 nm CMOS. This short latency enables high-speed burst transmission, which will also be discussed later.

In this modified modulation scheme, the transceiver consumes large DC current. Although the overhead can be minimized in high-speed operation during an active mode, the DC current should be switched out in a stand-by mode for low-power applications, such as in [21].

#### B. Test Chip Measurement

Two test chips for a transmitter and a receiver are fabricated in 180 nm CMOS. The transmitter chip is thinned to three different thicknesses: 40, 25, and 10 $\mu$m. It is stacked over the receiver chip, both face-up, with 5 $\mu$m-thick adhesive (Fig. 4). The communication distances are therefore 45, 30, and 15 $\mu$m, respectively. A chip thickness of 40 $\mu$m is feasible in mass production at a leading company. The coil is 120 $\mu$m in diameter. The number of coil turns is five, providing a self-inductance of 6 nH and a self-resonant frequency of 16 GHz. The link communicates through the transmitter chip substrate. In this chip stacking, the transmitter and the receiver coils are aligned by using infrared light with an alignment accuracy of less than 3 $\mu$m [16]–[18]. In mass production, $\pm 10\ \mu$m misalignment should be considered. It will slightly degrade the coupling strength of the coils but can be compensated by a 4% increase in transmit current [22]. The bias voltage of the receiver $V_B$ is applied through the center tap on the receiver coil. The bypass capacitor $C_B$ stabilizes the bias voltage.
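As a quick cross-check of the figures in this paragraph, the sketch below recomputes the communication distances from the chip and adhesive thicknesses, and estimates the parasitic capacitance implied by the quoted self-inductance and self-resonant frequency. The capacitance estimate assumes a simple lumped LC model, $f = 1/(2\pi\sqrt{LC})$, which is an assumption of this sketch and not a model stated in the paper; the function names are hypothetical.

```python
import math


def communication_distance_um(chip_thickness_um, adhesive_um=5.0):
    """Face-up stacking: coil-to-coil distance = top chip thickness + adhesive."""
    return chip_thickness_um + adhesive_um


def implied_capacitance(l_henry, f_res_hz):
    """Capacitance of a lumped LC tank resonating at f_res_hz (assumed model)."""
    return 1.0 / ((2.0 * math.pi * f_res_hz) ** 2 * l_henry)


distances = [communication_distance_um(t) for t in (40.0, 25.0, 10.0)]
c_par = implied_capacitance(6e-9, 16e9)

print(distances)                 # distances in micrometers
print(c_par * 1e15, "fF")        # implied parasitic capacitance
```

With 6 nH and 16 GHz, the lumped model implies a parasitic capacitance of roughly 16.5 fF.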

The stacked test chips are attached to a silicon wafer and then mounted on a probe station (Cascade Microtech SUMMIT 11201B). The power and ground of the chips are supplied by DC probes (Cascade Microtech EyePass probe) while high-frequency signals such as *Txdata* and *Rxdata* are delivered through



Fig. 5. Measured BER dependence on data rate.

TABLE II
PERFORMANCE COMPARISON BETWEEN CAPACITIVE- AND
INDUCTIVE-COUPLING LINK

| | This Work | | | [6] Q. Gu (ISSCC'07) |
|------------------------|----------------------------------------------|-----------|---------------------------------|----------------------------------------|
| Channel | Inductive Coupling | | | Capacitive Coupling |
| Communication Distance | 15 μm | 30 μm | 45 μm | 3 μm |
| Data Rate | 11 Gb/s | 10.5 Gb/s | 8.5 Gb/s | 11 Gb/s |
| Energy Dissipation | 1.4 pJ/b | 1.5 pJ/b | 1.8 pJ/b | 0.39 pJ/b |
| BER | <10<sup>-14</sup> | | | 1.02×10<sup>-14</sup> |
| Layout Area | 0.015 mm<sup>2</sup> | | | 0.016 mm<sup>2</sup> |
| Modulation | Pulse | | | 25 GHz ASK |
| Process | 180 nm Standard CMOS<br>V<sub>DD</sub> = 1.8 V | | | 180 nm 3D CMOS<br>V<sub>DD</sub> = 1.0 V |

AC probes (Cascade Microtech Infinity GSGSG probe) and 2.3 mm coaxial cables. *Txdata* is produced by a pattern generator in a serial bit error rate tester (BERT: Agilent Technologies N4906B). Pseudo-random binary sequence (PRBS) data is used for *Txdata*. The BER in *Rxdata* is tested by an error detector in the BERT.
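For readers unfamiliar with PRBS-based testing, the following Python sketch shows how a pattern generator and an error detector operate in principle. The paper does not state which PRBS length was used; PRBS-7 (polynomial $x^7 + x^6 + 1$) is chosen here purely as an assumption for illustration, and the function names are hypothetical.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS-7 sequence (x^7 + x^6 + 1), period 127."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # taps at stages 7 and 6
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F       # shift in the feedback bit
    return bits


def bit_error_rate(tx_bits, rx_bits):
    """Fraction of positions where the received bit differs from the sent bit."""
    if len(tx_bits) != len(rx_bits):
        raise ValueError("sequences must have equal length")
    errors = sum(1 for a, b in zip(tx_bits, rx_bits) if a != b)
    return errors / len(tx_bits)


tx = prbs7(1270)
rx = list(tx)
rx[10] ^= 1                      # inject a single bit error
print(bit_error_rate(tx, rx))    # one error in 1270 bits
```

A hardware BERT performs the same comparison in real time, which is what allows error ratios as low as $10^{-14}$ to be measured.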

Measured BER dependence on the data rate is presented in Fig. 5. For the communication distance of 15  $\mu$ m, the maximum data rate is 11 Gb/s with BER  $< 10^{-14}$ . For the distances of 30  $\mu$ m and 45  $\mu$ m, the maximum data rates are 10.5 Gb/s and 8.5 Gb/s, respectively.

#### C. Performance Comparison

Table II compares the measured performance of the inductive-coupling link with the state-of-the-art capacitive-coupling link [6]. The communication distance of the inductive-coupling link is five times longer for the same data rate, bit error rate, and layout area. Although the data rate is slightly degraded to 10.5 Gb/s, it can communicate over a 30 $\mu$m distance, which is ten times longer than that of the capacitive-coupling link. The energy dissipation of the inductive-coupling link is relatively



Fig. 6. Concept of burst transmission.



Fig. 7. Block diagram of inductive-coupling burst transceiver.

high because it operates under the nominal supply voltage in 180 nm standard CMOS.

#### III. INDUCTIVE-COUPLING BURST TRANSMISSION

#### A. Circuit Design

Burst transmission is an area reduction technique. The concept itself is well known in high-speed serial link technology, such as in [23]. As illustrated in Fig. 6, since the bandwidth of the data link is improved by the above-mentioned asynchronous transceiver, multi-bit data links can be multiplexed into one burst data link. This reduces the number of data links and hence the layout area. In face-up and back-to-back chip stacks, coils with large diameter are required due to the long communication distance. Therefore, it is area efficient to reduce the number of coils even if the multiplexer (MUX) and demultiplexer (DEMUX) increase the layout area for the circuits. A technical challenge is providing a high-frequency burst clock to the MUX and DEMUX in a simple way. A phase-locked loop (PLL) circuit is one approach to generating the high-frequency clock. However, it consumes a large layout area. A simple digital circuit solution is required for area reduction.

Fig. 7 depicts a block diagram of our proposed transceiver that supports the burst transmission. Multi-bit data *Mtxdata* are multiplexed into burst data *Txdata* and transmitted by the high-speed inductive-coupling link. The high-frequency burst clock *Txclk*, which provides timing to the MUX, is generated by a local ring oscillator (OSC) and a counter. After the rising edge of the system clock *Sclk*, the counter generates the same number of clock waves as the number of data bits. *Txclk* is transmitted by another inductive-coupling link along with the data link and used for demultiplexing the received burst data *Rxdata*. Large jitter in the ring OSC can be cancelled out by this source synchronous transmission. In addition, since both clock and data are transmitted by the same inductive-coupling links whose latency is as small as 36 ps, variation in sampling timing $t_{\rm sample}$ caused by PVT changes can be largely suppressed. A delay is inserted in the clock path to the demultiplexer in order to latch the data in the middle of the data cycle. No other timing control is needed.
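The benefit of sending the clock alongside the data can be illustrated with a small event-based model. The Python sketch below is a behavioral abstraction only: it models the burst clock as one sampling edge per data bit, applies the same link latency to clock and data, and samples the data half a unit interval (UI) after each received clock edge. The function names, the example word, and the alternative 300 ps latency are hypothetical.

```python
def burst_transmit(bits, ui_ps, start_ps=0.0):
    """Serialize bits; bit k starts at start_ps + k*ui_ps, with one clock edge per bit."""
    data = [(start_ps + k * ui_ps, bit) for k, bit in enumerate(bits)]
    clock = [start_ps + k * ui_ps for k in range(len(bits))]
    return data, clock


def through_link(data, clock, latency_ps):
    """Both streams cross identical links, so they see the same latency."""
    return ([(t + latency_ps, b) for t, b in data],
            [t + latency_ps for t in clock])


def value_at(data, t_ps):
    """Value of the data stream at time t_ps (last bit that has started)."""
    value = None
    for start, bit in data:
        if start <= t_ps:
            value = bit
    return value


def burst_receive(data, clock, delay_ps):
    """Sample the data a fixed delay after every received clock edge."""
    return [value_at(data, t + delay_ps) for t in clock]


word = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
ui_ps = 156.25                        # unit interval at 6.4 Gb/s
for latency_ps in (36.0, 300.0):      # recovery does not depend on the latency
    data, clock = through_link(*burst_transmit(word, ui_ps), latency_ps)
    assert burst_receive(data, clock, ui_ps / 2) == word
```

Because the latency is common to both streams, it drops out of the sampling instant relative to the data; only a mismatch between the two paths would shift $t_{\rm sample}$.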

Fig. 8(a) shows the timing chart of the burst transmission. In our test chip, the burst transmission is designed for 400 MHz system clock *Sclk*, assuming application to a processor for mobile phones such as in [24]. *Sclk* is assumed to be delivered to the receiver chip by using wire bonding or the inductive-coupling link for the global synchronization. All the circuits (MUX, DEMUX, oscillator, counter, and delay buffer) are implemented in current mode logic (CML) for high-frequency operation and small PVT variations. MUX and DEMUX are designed for 6.4 Gb/s operation, since upper limit of the operating frequency of the MUX/DEMUX is around 10 Gb/s in 180 nm



Fig. 8. (a) Timing chart of burst transmission and (b) simulated PVT variation in  $t_{\rm sample}$ .



Fig. 9. Stacked test chips of inductive-coupling burst transceiver in 180 nm CMOS.

CMOS [25]. The local ring oscillator is thus designed to produce a 3.2 GHz *Txclk*. The counter should be Johnson-type for such high-frequency operation. It generates 8 clock waves in order to multiplex the 16-bit, 400 Mb/s *Mtxdata* into the 6.4 Gb/s burst *Txdata*. Both *Txdata* and *Txclk* are transmitted by the inductive-coupling links whose link latency is 36 ps. The delay buffer gives a 0.5 UI delay in *Rxclk* so that the sampling timing $t_{\rm sample}$ is set in the middle of the data cycle ($\sim$78 ps). Simulated PVT variation in $t_{\rm sample}$ is less than 20 ps (<13% UI) as shown in Fig. 8(b). This wide design margin is obtained by the source synchronous transmission with the low-latency inductive-coupling link.
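The timing figures quoted for the 180 nm burst design follow from simple arithmetic, reproduced in the Python sketch below. The variable names are hypothetical. The reading that each 3.2 GHz clock wave carries two bits is an inference from the quoted clock frequency and data rate, not an explicit statement in the text.

```python
sclk_hz = 400e6            # system clock
bits_per_word = 16         # width of Mtxdata
txclk_hz = 3.2e9           # ring-oscillator burst clock

burst_rate_bps = sclk_hz * bits_per_word        # burst data rate
ui_ps = 1e12 / burst_rate_bps                   # unit interval
bits_per_clock = burst_rate_bps / txclk_hz      # bits carried per clock wave
clock_waves = bits_per_word / bits_per_clock    # waves generated per Sclk cycle
half_ui_ps = ui_ps / 2                          # 0.5 UI delay-buffer target
variation_ui = 20.0 / ui_ps                     # 20 ps PVT variation as a UI fraction

print(burst_rate_bps, ui_ps, clock_waves, half_ui_ps, variation_ui)
```

The results (6.4 Gb/s, 8 clock waves, about 78 ps for half a UI, and 12.8% UI for a 20 ps variation) agree with the values stated above.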

#### B. Test Chip Measurement in 180 nm CMOS

Fig. 9 shows microphotographs of stacked test chips for the burst transmission. Layout area of the burst transceiver including MUX/DEMUX and oscillator is 0.1 mm<sup>2</sup>. All experimental setups are identical with the previous measurement for the high-speed inductive-coupling link. Again, device technology is 180 nm CMOS. The transmitter chip is thinned down



Fig. 10. Measured BER dependence on supply voltage variation in burst transmission.



Fig. 11. Stacked test chips of inductive-coupling burst transceiver in 90 nm CMOS.

and stacked over the receiver chip both face-up. The communication distances are 45, 30, and 15  $\mu$ m. Coil size is 120  $\mu$ m in diameter and the number of coil turns is five. Two coils for the clock and the data link are placed next to each other. Compared to the capacitive-coupling link, the inductive-coupling link causes relatively larger crosstalk [26]. However, in the burst transceiver, crosstalk is small enough since it uses only two inductive-coupling links and the number of crosstalk channels is limited. Theoretical calculation based on [26] shows that the crosstalk-to-signal ratio is lower than -20 dB for the distances shorter than 90  $\mu$ m. In case of using multiple burst links in parallel, space between coils will be required to eliminate the crosstalk. In the space between them, the transceiver circuits and MUX/DEMUX will be placed so that the area efficiency will not be degraded. The stacked chips are mounted on the probe station and tested with a BERT.

The BER of the burst transmission is measured at the maximum data rate of 6.4 Gb/s. BER is less than  $10^{-14}$  and error-free operation is achieved. Tolerance against supply voltage change in the burst transmission is also measured in order to demonstrate the robustness of this system. Fig. 10 presents measured BER dependence on supply voltage. In 6.4 Gb/s burst transmission, BER  $<10^{-14}$  is achieved for  $\pm10\%$  variations of the supply voltage. It is confirmed that the source synchronous transmission by the low-latency inductive-coupling link provides strong immunity against supply voltage change.

#### C. Test Chip Measurement in 90 nm CMOS

As mentioned in Section II, the inductive-coupling channel has large headroom in operation frequency and it does not limit the data rate. The data rate is restricted by the transition frequency of the transistor $f_T$. In 90 nm CMOS, $f_T$ increases to 150 GHz, which is 2.5× higher than that of 180 nm CMOS, and thus we can expect data rate enhancement up to 16 Gb/s (= 6.4 Gb/s × 2.5). Test chips are designed and fabricated in



Fig. 12. Measured eye pattern of 2:1 DEMUX output in 90 nm CMOS burst transceiver.

90 nm CMOS for demonstrating this performance improvement. Fig. 11 shows the microphotograph of the stacked test chips. A transmitter chip is thinned down to 40  $\mu$ m thickness and stacked over a receiver chip both face-up using 5- $\mu$ m-thick adhesive. The communication distance between the transmitter and the receiver is thus 45  $\mu$ m. Coil diameter is 120  $\mu$ m, the same as in 180 nm CMOS, since the communication distance is identical. Circuits are shrunk due to the device scaling and thus total layout area is reduced to 0.08 mm<sup>2</sup>.

BER is measured by sweeping the operating frequency of the transceiver. The maximum data rate for BER  $< 10^{-14}$  is 15.2 Gb/s which is almost equal to the expected performance improvement. Fig. 12 shows a measured eye pattern of a 2:1 DEMUX output. Wide eye opening is obtained with a timing margin of 112 ps (>85%UI). The source synchronous transmission by the low-latency inductive-coupling link is still effective in providing accurate clock timing for high-operating frequency over 10 Gb/s.
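The expected and measured 90 nm figures can be related with a short calculation, sketched below in Python. The variable names are hypothetical. Interpreting the ">85% UI" margin relative to the bit period at the 2:1 DEMUX output (half the link rate) is an assumption of this sketch, chosen because it reproduces the quoted percentage.

```python
ft_180_ghz = 60.0          # transition frequency in 180 nm CMOS
ft_90_ghz = 150.0          # transition frequency in 90 nm CMOS
rate_180_gbps = 6.4        # burst data rate in 180 nm CMOS
measured_90_gbps = 15.2    # measured burst data rate in 90 nm CMOS

scale = ft_90_ghz / ft_180_ghz                  # device-speed improvement
expected_90_gbps = rate_180_gbps * scale        # projected data rate
achieved = measured_90_gbps / expected_90_gbps  # fraction of the projection reached

# Assumed: the eye is measured after 2:1 demultiplexing, so each output
# bit lasts twice as long as a bit on the link.
demux_out_gbps = measured_90_gbps / 2
ui_out_ps = 1e3 / demux_out_gbps                # output bit period in ps
margin_ui = 112.0 / ui_out_ps                   # 112 ps margin as a UI fraction

print(scale, expected_90_gbps, achieved, ui_out_ps, margin_ui)
```

Under this reading, the measured rate reaches 95% of the $f_T$-scaled projection, and the 112 ps timing margin is about 85% of the 131.6 ps output bit period.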

#### D. Performance Summary

Table III summarizes measured performance of the burst transceiver in 180 nm and 90 nm CMOS. In 180 nm CMOS,

| | | 180 nm CMOS | 90 nm CMOS |
|--------------------------------------|-----------|-------------|------------|
| Aggregated Data Rate | | 6.4 Gb/s | 15.2 Gb/s |
| Number of Links | Parallel* | 16 Links | 38 Links |
| | Burst | 2 Links | 2 Links |
| Area | Parallel* | 0.3 mm² | 0.72 mm² |
| | Burst | 0.1 mm² | 0.08 mm² |
| Area Reduction by Burst Transmission | | 1/3 | 1/9 |

TABLE III
PERFORMANCE SUMMARY OF INDUCTIVE-COUPLING BURST TRANSCEIVER

the burst transceiver communicates at 6.4 Gb/s. In the conventional parallel transceiver [16], 16 data links would have been required for the same aggregate data rate of 6.4 Gb/s. Even though the coil diameter can be reduced to 90 $\mu$m in the synchronous scheme, a layout area of 0.3 mm² is needed. On the other hand, the burst transceiver requires only two links (data and clock). The layout area is reduced to 0.1 mm², which is 1/3 of that of the parallel transceiver. In 90 nm CMOS, the data rate is increased to 15.2 Gb/s. The burst transceiver can multiplex 38 inductive-coupling links so that the area reduction ratio can be improved to 1/9.
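The link counts and area ratios in Table III can be reproduced with the short Python sketch below. It assumes that each parallel link carries one bit per 400 MHz system-clock cycle, which is an inference consistent with the 16-link and 38-link figures rather than an explicit statement in the text. The function names are hypothetical, and the areas are taken directly from Table III.

```python
LINK_RATE_GBPS = 0.4   # assumed per-link rate of the parallel scheme (400 Mb/s)


def parallel_links(aggregate_gbps):
    """Parallel data links needed to match the aggregate burst data rate."""
    return round(aggregate_gbps / LINK_RATE_GBPS)


def area_reduction(parallel_mm2, burst_mm2):
    """Factor by which burst transmission shrinks the layout area."""
    return parallel_mm2 / burst_mm2


links_180 = parallel_links(6.4)
links_90 = parallel_links(15.2)
reduction_180 = area_reduction(0.3, 0.1)
reduction_90 = area_reduction(0.72, 0.08)

print(links_180, links_90, round(reduction_180), round(reduction_90))
```

The burst link count stays at two in both technologies, so the area advantage grows with every increase in the achievable burst data rate.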

#### IV. CONCLUSION

In this paper, a high-speed inductive-coupling link is presented. An asynchronous scheme is employed for the data recovery, which improves operation speed by eliminating complicated timing controls. Modulation scheme is also modified in order to minimize the number of circuit stages, reducing the link latency down to 36 ps. A replica bias generator provides bias voltage through the center tap of the coil. It precisely controls bias voltage against PVT variations and guarantees reliable operation. Test chip measurement in 180 nm CMOS demonstrates a data rate of 11 Gb/s with BER  $< 10^{-14}$  for a communication distance of 15  $\mu$ m. The data rate is 11 times higher than past inductive-coupling links. The communication distance is 5 times longer than the state-of-the-art capacitive-coupling link for the same data rate, BER, and layout area.

Burst transmission utilizing the high-speed inductive-coupling link is also presented. Low-speed data links are multiplexed into the high-speed burst inductive-coupling link for reducing the number of data links and hence layout area. A high-frequency burst clock for multiplexing is generated by a simple digital circuit solution with a local ring oscillator and a counter. Although clock jitter is large, the area overhead can be minimized. The large jitter is canceled out by source synchronous transmission where the burst clock is transmitted by another inductive-coupling link along with the data link and used for demultiplexing. Moreover, since the clock and data are transmitted by the same inductive-coupling link whose link latency is only 36 ps, immunity against PVT variations can be improved. Test chip measurement in 180 nm CMOS shows 6.4 Gb/s burst transmission with BER $< 10^{-14}$ for $\pm 10\%$ variations of the supply voltage. It can multiplex 16 data links and reduce layout area by a factor of three. Device scaling from 180 nm to 90 nm CMOS improves the data rate up to 15.2 Gb/s. Layout area reduction is further enhanced by another factor of three.

#### ACKNOWLEDGMENT

The authors are grateful to M. Tago of NEC Corporation for assistance in stacked-chip assembly.

#### REFERENCES

- [1] G. Moore *et al.*, "No exponential is forever: But "Forever" can be delayed!," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2003, pp. 20–23.
- [2] T. Ezaki *et al.*, "A 160 Gb/s interface design configuration for multichip LSI," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2004, pp. 140–141.
- [3] K. Kumagai et al., "System-in-silicon architecture and its application to H.264/AVC motion estimation for 1080HDTV," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2006, pp. 430–431.
- [4] J. Burns et al., "Three-dimensional integrated circuits for low-power, high-bandwidth systems on a chip," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2001, pp. 268–269.
- [5] V. Suntharalingam et al., "Megapixel CMOS image sensor fabricated in three-dimensional integrated circuit technology," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2005, pp. 356–357.
- [6] Q. Gu *et al.*, "Two 10 Gb/s/pin low-power interconnect methods for 3D ICs," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 448–449.
- [7] L. Luo et al., "A 36 Gb/s ACCI multi-channel bus using a fully differential pulse receiver," in *Proc. CICC*, Sep. 2006, pp. 773–776.
- [8] S. Mick et al., "4 Gbps high-density AC coupled interconnection," in Proc. CICC, May 2002, pp. 133–140.
- [9] L. Luo et al., "3 Gb/s AC-coupled chip-to-chip communication using a low-swing pulse receiver," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2005, pp. 522–523.
- [10] A. Fazzi et al., "3D capacitive interconnections with mono- and bi-directional capabilities," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 356–357.
- [11] D. Hopkins et al., "Circuit techniques to enable 430 Gb/s/mm<sup>2</sup> proximity communication," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 368–369.
- [12] A. Fazzi et al., "A 0.14 mW/Gbps high-density capacitive interface for 3D system integration," in *Proc. CICC*, Sep. 2005, pp. 101–104.
- [13] R. Drost et al., "Electronic alignment for proximity communication," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2004, pp. 144–145.
- [14] K. Kanda et al., "A 1.27 Gb/s/ch 3 mW/pin Wireless Superconnect (WSC) interface scheme," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2003, pp. 186–187.
- [15] S. Kuhn *et al.*, "Vertical signal transmission in three-dimensional integrated circuits by capacitive coupling," in *Proc. ISCAS*, Apr. 1995, pp. 37–40.

- [16] N. Miura et al., "A 0.14 pJ/b inductive-coupling inter-chip data transceiver with digitally-controlled precise pulse shaping," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 358–359.
- [17] N. Miura et al., "A 1 Tb/s 3 W inductive-coupling transceiver for interchip clock and data link," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2006, pp. 424–425.
- [18] N. Miura et al., "A 195 Gb/s 1.2 W 3D-stacked inductive inter-chip wireless superconnect with transmit power control scheme," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2005, pp. 264–265.
- [19] N. Miura et al., "An 11 Gb/s inductive-coupling link with burst transmission," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 298–299.
- [20] D. Mizoguchi et al., "A 1.2 Gb/s/pin wireless superconnect based on Inductive Inter-chip Signaling (IIS)," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2004, pp. 142–143.
- [21] Y. Sugimori et al., "A 2 Gb/s 15 pJ/b/chip inductive-coupling programmable bus for NAND flash memory stacking," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2009, pp. 244–245.
- [22] K. Niitsu et al., "Misalignment tolerance in inductive-coupling interchip link for 3D system integration," in *Proc. SSDM, Extended Abstracts*, Sep. 2008, pp. 86–87.
- [23] K. Kanda et al., "40 Gb/s 4:1 MUX/1:4 DEMUX in 90 nm standard CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2005, pp. 152–153.
- [24] M. Ito et al., "A 390 MHz single-chip application and dual-mode baseband processor in 90 nm triple-Vt CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 274–275.
- [25] A. Tanabe et al., "A 10 Gb/s demultiplexer IC in 0.18 $\mu$m CMOS using current mode logic with tolerance to the threshold voltage fluctuation," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2000, pp. 62–63.
- [26] N. Miura et al., "Crosstalk countermeasures for high-density inductive-coupling channel array," *IEEE J. Solid-State Circuits*, vol. 42, no. 2, pp. 410–421, Feb. 2007.



**Noriyuki Miura** (S'06–M'08) received the B.S., M.S., and Ph.D. degrees in electrical engineering from Keio University, Yokohama, Japan, in 2003, 2005, and 2007, respectively. For two years during his Ph.D. work, he served as a Research Fellow of the Japan Society for the Promotion of Science (JSPS).

He is currently a Research Associate at Keio University, working on short-range wireless transceiver circuit design and interconnect technology for 3-D system integration.

Dr. Miura received the IEEE System LSI Award in 2005 and 2007, the 2006 LSI IP Design Award, the 2006 IP/SoC Best Design Award, the 2006 IEEE SSCS Japan Chapter Young Researcher Award, and the 2007 ASP-DAC Outstanding Design Award.



**Yoshinori Kohama** was born in Aichi, Japan, on January 30, 1985. He received the B.S. degree in electrical engineering in 2007 from Keio University, Yokohama, Japan, where he is currently working toward the M.S. degree. He has been engaged in research on inter-chip communication using inductive coupling since 2006.



**Yasufumi Sugimori** received the B.S. degree in electrical engineering from Keio University, Yokohama, Japan, in 2007, where he is currently working toward the M.S. degree. Since 2006, he has been engaged in research on the 3-D-stacked inductive inter-chip wireless interface for System in a Package.



**Hiroki Ishikuro** received the B.S., M.S., and Ph.D. degrees in electrical engineering from the University of Tokyo, Tokyo, Japan, in 1994, 1996, and 1999, respectively.

In 1999, he joined the System LSI Research and Development Center, Toshiba Corp., Kawasaki, Japan, where he was involved in the development of CMOS RF and mixed-signal LSIs for wireless applications. In 2006, he joined the Department of Electrical Engineering at Keio University as an Assistant Professor, where he became an Associate Professor in 2008. His current research interests lie in high-speed wireless interface, mixed-signal circuit design, and transceiver architecture for software-defined radio.



**Takayasu Sakurai** (S'77–M'78–SM'01–F'03) received the Ph.D. degree in electrical engineering from the University of Tokyo, Tokyo, Japan, in 1981.

In 1981, he joined Toshiba Corporation, where he designed CMOS DRAM, SRAM, RISC processors, DSPs, and SoC Solutions. He has worked extensively on interconnect delay and capacitance modeling known as the Sakurai model and alpha power-law MOS model. From 1988 through 1990, he was a visiting researcher at the University of California Berkeley, where he conducted research in the field of VLSI CAD. Since 1996, he has been a Professor at the University of Tokyo, working on low-power high-speed VLSI, memory design, interconnects, ubiquitous electronics, organic ICs and large-area electronics. He has published more than 400 technical publications including 100 invited presentations and several books, and has filed more than 200 patents.

Dr. Sakurai served as a conference chair for the Symposium on VLSI Circuits and ICICDT, vice chair for ASPDAC, TPC chair for the first A-SSCC, and VLSI Symposium, and a program committee member for ISSCC, CICC, A-SSCC, DAC, ESSCIRC, ICCAD, ISLPED, and other international conferences. He is a recipient of the 2005 IEEE ICICDT award, 2004 IEEE ISSCC Takuo Sugano award, 2005 P&I Patent of the Year award, and four product awards. He has given the keynote speech at more than 50 conferences including ISSCC, ESSCIRC, and ISLPED. He is a consultant to startup and international companies. He was an elected AdCom member for the IEEE Solid-State Circuits Society and an IEEE CAS and SSCS distinguished lecturer. He is a STARC Fellow and an IEEE Fellow.



**Tadahiro Kuroda** (M'88–SM'00–F'06) received the Ph.D. degree in electrical engineering from the University of Tokyo, Tokyo, Japan, in 1999.

In 1982, he joined Toshiba Corporation, where he designed CMOS SRAMs, gate arrays and standard cells. From 1988 to 1990, he was a Visiting Scholar with the University of California, Berkeley, where he conducted research in the field of VLSI CAD. In 1990, he returned to Toshiba and engaged in the research and development of BiCMOS ASICs, ECL gate arrays, high-speed CMOS LSIs for telecommunications, and low-power CMOS LSIs for multimedia and mobile applications. He invented a variable threshold-voltage CMOS (VTCMOS) technology to control VTH through substrate bias, and applied it to a DCT core processor and a gate-array in 1995. He also developed a variable supply-voltage scheme using an embedded DC-DC converter, and applied it to a microprocessor core and an MPEG-4 chip for the first time in the world in 1997. In 2000, he moved to Keio University, Yokohama, Japan, where he has been a Professor since 2002. He has been a Visiting Professor at Hiroshima University, Japan, and the University of California, Berkeley. His research interests include low-power, high-speed CMOS design for wireless and wireline communications, human-computer interaction, and ubiquitous electronics. He has published more than 200 technical publications, including 50 invited papers and 20 books/chapters, and has filed more than 100 patents.

Dr. Kuroda served as the General Chairman for the Symposium on VLSI Circuits, the Vice Chairman for ASP-DAC, subcommittee chair for A-SSCC, ICCAD, and SSDM, and program committee member for the Symposium on VLSI Circuits, CICC, DAC, ASP-DAC, ISLPED, SSDM, ISQED, and other international conferences. He is a recipient of the 2005 IEEE System LSI Award, the 2005 P&I Patent of the Year Award, the 2006 LSI IP Design Award, the 2006 IP/SoC Best Design Paper Award, and the 2007 ASP-DAC Best Design Award. He is an IEEE Fellow, an elected AdCom member for the IEEE Solid-State Circuits Society, and an IEEE SSCS Distinguished Lecturer.