# A Monolithic 3D Integration of RRAM Array with Oxide Semiconductor FET for In-memory Computing in Quantized Neural Network AI Applications

Jixuan Wu<sup>1\*</sup>, Fei Mo<sup>2</sup>, Takuya Saraya<sup>2</sup>, Toshiro Hiramoto<sup>2</sup>, and Masaharu Kobayashi<sup>1,2</sup>

<sup>1</sup>System Design Research Center (*d.lab*), <sup>2</sup>Institute of Industrial Science, The University of Tokyo

\* jixuanwu@nano.iis.u-tokyo.ac.jp

### Abstract

For the first time, we have monolithically integrated an RRAM array with oxide semiconductor channel access transistors in a 3D stack, achieved uniform memory characteristics of the 1T1R cells in each layer, and demonstrated the basic functionality of XNOR operation as in-memory computing for binary neural network AI applications. The impact of the RRAM bit error rate on the neural network is also investigated. A 3D neural network built with this architecture has high potential to enable area-efficient, low-power, and low-latency computing.

### Introduction

In-memory computing has attracted worldwide attention for deep neural network applications because of its high energy efficiency [1]. In particular, RRAM-based neural networks have been extensively studied from the device level to the system level [2-4]. The binary neural network (BNN) has been proposed for its simple implementation in digital hardware [5]. An RRAM-based BNN has advantages in stability, noise margin, and testability. The XNOR operation used for weighted sum calculation in a BNN can be realized simply by RRAM cells [6-7] as in-memory computing. One challenge of the BNN is the network size. Because binary weights and activations have limited expressive power, the network needs to be large (Fig. 1). For massively parallel input/output, a 2D neural net suffers from large energy consumption and delay in long interconnect wires. The 3D neural net is a new direction that enables area-efficient, low-power, and low-latency computing (Fig. 2).
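The role of XNOR in the weighted sum can be illustrated with a minimal Python sketch. It assumes the common BNN encoding in which bit 1 stands for +1 and bit 0 stands for -1 (the text above does not state the encoding), and the function names are hypothetical. Under that assumption, counting XNOR matches gives the same result as the dot product of the ±1 vectors.

```python
def xnor(a: int, b: int) -> int:
    """XNOR of two bits (each 0 or 1): 1 when the bits are equal."""
    return 1 - (a ^ b)


def xnor_popcount_sum(x_bits, w_bits):
    """Weighted sum in the +1/-1 domain, computed from bits.

    Assumed encoding (not from the paper): bit 1 -> +1, bit 0 -> -1.
    Each XNOR match contributes +1 and each mismatch contributes -1,
    so the sum equals 2 * popcount - n.
    """
    n = len(x_bits)
    popcount = sum(xnor(x, w) for x, w in zip(x_bits, w_bits))
    return 2 * popcount - n


def signed_dot(x_bits, w_bits):
    """Reference: explicit dot product after mapping bits to +1/-1."""
    return sum((2 * x - 1) * (2 * w - 1) for x, w in zip(x_bits, w_bits))


x = [1, 0, 1, 1]  # example input bits
w = [1, 1, 0, 1]  # example weight bits
print(xnor_popcount_sum(x, w), signed_dot(x, w))  # both print 0
```

Because only a match count is needed, a larger layer adds more XNOR cells and a wider counter rather than multipliers.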

An RRAM-only network suffers from sneak current and programming disturbance if an appropriate selector is not used. So far, the 1T1R cell is the most robust structure. To stack 1T1R RRAM arrays, we need an access transistor that can be fabricated by a low-temperature process in the BEOL (Fig. 3). Moreover, the access transistor must have sufficiently high mobility to drive the RRAM cell (Fig. 4). An oxide semiconductor such as IGZO is a promising channel material because of its high mobility and low-temperature process [8-10].

In this work, we propose and develop a monolithic integration of RRAM arrays with IGZO access transistors in a 3D stack. We then demonstrate the basic functionality of in-memory computing in the 3D neural net. The recognition accuracy of the BNN is estimated as a function of the RRAM bit error rate.

## **Device structure and Fabrication**

1T1R RRAM arrays with IGZO FETs are integrated in a spiral 3D stacking architecture in which each layer is rotated by 90° from the previous layer. In a neural network, a layer's output is typically connected to the next layer's input, so this architecture avoids interconnect wiring overhead (Fig. 5(a)).

The device fabrication flow is designed to be as simple as possible for a proof of concept in a university lab (Fig. 6). In each layer, the IGZO FET is formed with a bottom-gate structure and an HfO<sub>2</sub> gate insulator. The RRAM is formed in a TiN/Ti/HfO<sub>2</sub>/TiN stack [11]. The 1T1R RRAM array process is repeated 3 times. Fig. 5(b-d) shows top-down images of the FETs after completing the 1<sup>st</sup>, 2<sup>nd</sup>, and 3<sup>rd</sup> layers. The process temperature is limited to 400°C. From the TEM images in Fig. 7(a-e) and Fig. 7(f-k), we confirmed uniform IGZO FETs and RRAM in each layer.

## **Results and Discussion**

A. FET, RRAM, and 1T1R cell characteristics

We characterized the IGZO FETs and the RRAM. Figs. 8 and 9 show the I<sub>d</sub>-V<sub>g</sub> and I<sub>d</sub>-V<sub>d</sub> curves of the IGZO FETs for all layers. Each layer shows almost identical characteristics. Normally-off operation, a nearly ideal subthreshold slope, and >200 µA drive current were obtained. The I-V curves of the 1R cell and the 1T1R cell are compared in Fig. 10. The on-current of the 1T1R cell is smaller than that of the 1R cell because of the series resistance of the IGZO FET. The set and reset voltages of the 1R and 1T1R cells are extracted in Fig. 11. While the 1T1R cell has almost the same set voltage as the 1R cell, it has a higher reset voltage. This is because the series resistance of the IGZO FET is relatively large compared with the resistance of the RRAM when the RRAM is in the low resistance state (LRS) before reset, so a considerable part of the applied voltage drops across the FET. Note that reducing the resistance of the access transistor through higher mobility is crucial for low-voltage operation and a small cell area of the 1T1R cell [12]. The cycle-to-cycle (C2C) variation of the resistance of the 1T1R cell is shown in Fig. 12. The LRS has a uniform distribution, but the high resistance state (HRS) has a large variation. This is typical for HfO<sub>2</sub>-based RRAM because of the large variability in filament dissociation in the HRS. Fig. 13 shows the I-V curves of the 1T1R cells for all layers. The device-to-device (D2D) resistance variation extracted from Fig. 13 is shown in Fig. 14. Nearly the same distribution, with an on/off ratio of >10, was obtained for all layers. Endurance and retention characteristics are shown in Figs. 15 and 16. No reliability degradation caused by 3D integration was found.
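The effect of the access transistor on the set and reset voltages can be illustrated with a simple voltage divider. The Python sketch below treats the FET and the RRAM as linear resistors in series, which is a strong simplification, and every resistance and voltage value in it is a hypothetical placeholder rather than a measured value from this work.

```python
def required_applied_voltage(v_rram_target, r_rram, r_fet):
    """Voltage that must be applied across the 1T1R cell so that
    v_rram_target appears across the RRAM alone.

    Linear series-resistor model (an assumption of this sketch):
    V_rram = V_applied * r_rram / (r_rram + r_fet).
    """
    return v_rram_target * (r_rram + r_fet) / r_rram


# Hypothetical placeholder values, chosen only for illustration.
R_FET = 5e3      # on-resistance of the access transistor [ohm]
R_LRS = 5e3      # RRAM low resistance state [ohm]
R_HRS = 100e3    # RRAM high resistance state [ohm]
V_SWITCH = 0.5   # voltage needed across the RRAM itself [V]

# Reset starts from the LRS, so the FET takes a large share of the voltage.
v_reset_1t1r = required_applied_voltage(V_SWITCH, R_LRS, R_FET)
# Set starts from the HRS, so nearly all of the voltage drops across the RRAM.
v_set_1t1r = required_applied_voltage(V_SWITCH, R_HRS, R_FET)

print(f"reset (from LRS): {v_reset_1t1r:.3f} V")
print(f"set   (from HRS): {v_set_1t1r:.3f} V")
```

With these placeholder numbers the applied voltage for reset doubles while the one for set rises by only a few percent, which is consistent with the trend described above.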

B. In-memory computing of XNOR for binary neural net

We demonstrate XNOR operation with a pair of 1T1R cells, as shown in Fig. 17(a). We choose a voltage sensing scheme [7,13]. The weight bit (W) is complementarily written to the RRAMs (R, R'). The input bit (x) is complementarily applied to the word lines ($V_{WL}$, $V_{WL}$'). The bit line (BL) is precharged and then discharged at a slow or fast rate depending on the input and weight bits. After a certain period, the BL voltage is compared with a reference voltage. The output bit (y) of the XNOR is obtained from the comparator. Fig. 17(b) shows the fabricated 1T1R array. The operation is performed using the external peripheral circuit in Fig. 17(c). Fig. 18 shows the measured waveforms, which confirm XNOR operation. For weighted sum calculation, the XNOR outputs are digitally counted [5] or aggregated by voltage sensing at each BL [7].
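The sensing sequence above can be sketched as a first-order RC discharge model in Python. The precharge and reference voltages follow the Fig. 18 caption (V<sub>PC</sub>=0.3V, V<sub>REF</sub>=0.1V). Everything else is an assumption of this sketch: the resistance values, the bit line capacitance, the sensing time, the mapping of W=1 to (R, R') = (HIGH, LOW), and the neglect of the access transistor resistance.

```python
import math

V_PC = 0.3       # precharge voltage [V], from the Fig. 18 caption
V_REF = 0.1      # comparator reference voltage [V], from the Fig. 18 caption

# Hypothetical placeholder values, not taken from the paper.
R_LRS = 10e3     # low resistance state [ohm]
R_HRS = 100e3    # high resistance state [ohm]
C_BL = 1e-12     # bit line capacitance [F]
T_SENSE = 30e-9  # delay between end of precharge and comparison [s]


def bl_voltage(r_path, t=T_SENSE):
    """Bit line voltage after discharging through r_path for time t."""
    return V_PC * math.exp(-t / (r_path * C_BL))


def xnor_cell(x, w):
    """Output bit of the two-cell XNOR for input bit x and weight bit w.

    Assumed mapping: w = 1 stores (R, R') = (HRS, LRS), and w = 0 stores
    (LRS, HRS). Input x = 1 turns on the word line of R, and x = 0 turns
    on the word line of R'.
    """
    r, r_prime = (R_HRS, R_LRS) if w == 1 else (R_LRS, R_HRS)
    r_path = r if x == 1 else r_prime
    # Slow discharge (HRS path) keeps the BL above V_REF, giving output 1.
    return 1 if bl_voltage(r_path) > V_REF else 0


for x in (0, 1):
    for w in (0, 1):
        print(f"x={x} w={w} y={xnor_cell(x, w)}")
```

In this model the sensing time must lie between the times at which the LRS path and the HRS path cross V<sub>REF</sub>, so a larger on/off ratio widens the usable timing window.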

Based on the RRAM-based XNOR, we estimate the recognition accuracy for the MNIST dataset in the BNN using the framework in Fig. 19. As shown in Fig. 20, although the accuracy degrades as the RRAM bit error rate (BER) increases, it is not very sensitive to the BER up to a certain level (10 ppm in this case), which indicates the error resilience of the BNN.
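One simple way to model the BER is to flip each stored weight bit independently with probability equal to the BER before evaluating the network. The Python sketch below applies this to a single neuron with 2048 inputs, the layer size given in the Fig. 20 caption. It is not the framework of Fig. 19: the random inputs and weights, the zero threshold, and the function names are assumptions made for illustration, and no MNIST accuracy is computed.

```python
import random


def inject_bit_errors(w_bits, ber, rng):
    """Return a copy of w_bits with each bit flipped with probability ber.

    Independent, symmetric bit flips are an assumption of this sketch.
    """
    return [b ^ 1 if rng.random() < ber else b for b in w_bits]


def neuron_output(x_bits, w_bits):
    """Binarized neuron: XNOR, popcount, then a sign threshold at zero.

    The zero threshold is a placeholder; trained BNNs typically use a
    learned threshold per neuron.
    """
    n = len(x_bits)
    popcount = sum(1 - (x ^ w) for x, w in zip(x_bits, w_bits))
    return 1 if 2 * popcount - n >= 0 else 0


rng = random.Random(0)  # fixed seed so the run is repeatable
N = 2048                # layer size, as in the Fig. 20 caption
x = [rng.randint(0, 1) for _ in range(N)]
w = [rng.randint(0, 1) for _ in range(N)]

for ber in (1e-5, 1e-3, 1e-1):  # 1e-5 corresponds to 10 ppm
    w_err = inject_bit_errors(w, ber, rng)
    flipped = sum(a != b for a, b in zip(w, w_err))
    same = neuron_output(x, w) == neuron_output(x, w_err)
    print(f"BER={ber:g}: {flipped} flipped bits, output unchanged: {same}")
```

At 10 ppm the expected number of flipped bits among 2048 weights is about 0.02, so most neurons see no error at all, which gives some intuition for the flat region of the accuracy curve at low BER.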

## Summary

We developed a monolithic 3D integration of RRAM arrays with IGZO access transistors, confirmed that each layer has uniform and almost identical device characteristics without degradation, and demonstrated the functionality of in-memory XNOR computing and an error-resilient BNN for 3D neural nets.

## Acknowledgement

This work was supported by JST CREST (16815651), JSPS KAKENHI Grant Number JP18H01489 and Tokyo Electron Ltd.



Fig. 17 (a) Schematic of 2T2R XNOR cell. (b) Fabricated 1T1R array. (c) External peripheral circuit.

Fig. 18 Measured waveform of an XNOR cell with the peripheral circuit of Fig. 17 (c) for (a) (R, R') = (HIGH, LOW) and (b) (R, R') = (LOW, HIGH). V<sub>PC</sub>=0.3V, V<sub>REF</sub>=0.1V, V<sub>WL</sub>=1.5V. 3.3V for circuit. Horizontal axis: Time [a.u.].

Fig. 19 Schematic of the digitally implementable framework of the BNN [5] incorporating RRAM BER due to RRAM cell variability.

Fig. 20 Estimated recognition accuracy of MNIST dataset in BNN as a function of RRAM cell BER for layer size 2048.
