## A High Speed and Low Power 8 Bit x 8 Bit Multiplier Design using Novel Two Transistor (2T) XOR Gates




Himani Upadhyay<sup>1\*</sup>

Shubhajit Roy Chowdhury<sup>2</sup>

<sup>1</sup>Centre for VLSI and Embedded System technology (CVEST), IIIT Hyderabad, Hyderabad, Andhra Pradesh, 500032, INDIA, email id: himani.upadhyay@research.iiit.ac.in

<sup>2</sup>Centre for VLSI and Embedded System technology (CVEST), IIIT Hyderabad, Hyderabad, Andhra Pradesh, 500032, INDIA, email id: src.vlsi@iiit.ac.in

\*corresponding author: Himani Upadhyay

### Address:

International Institute of Information and Technology, Hyderabad Centre for VLSI and Embedded System Technology (CVEST) Gachibowli, Hyderabad, Andhra Pradesh, 500032, INDIA

Telephone no: +91-7842912282

Fax no: +91-4066531413

Email id: himaniupadhyay@research.iiit.ac.in

Date of Receiving: Date of Acceptance:


#### Abstract

This paper proposes a novel two transistor (2T) XOR gate and applies it to the design of an 8 bit x 8 bit multiplier. The design relies on suitably biasing the MOS transistors and engineering their threshold voltage through appropriate biasing and device geometry. A 6T full adder has been realised using the 2T XOR gates. Detailed simulations compare the proposed 2T XOR gate and 6T full adder against existing XOR gates and full adders in the literature with respect to power delay product (PDP), noise margin and area. The 2T XOR gate shows a significant improvement in PDP, area and noise margin over the existing XOR gates. An 8 bit x 8 bit multiplier has also been implemented using the 6T adder, and its performance has been analysed and compared with similar multipliers built from peer adder designs available in the literature. Simulations have been carried out using UMC 65-nm, 90-nm and 130-nm CMOS process technologies in Cadence Spectre, along with process, voltage and temperature (PVT) variation analysis. The PDP of the proposed multiplier is as low as 1.854 pJ in the UMC 65-nm CMOS process. The 8 bit x 8 bit multiplier design has been extended to an 8 bit multiply-accumulate (MAC) unit, simulated in the 65-nm CMOS process, which achieves a delay of 3.977 ns and a power dissipation of 1.107 mW. The proposed XOR gate is also evaluated using independent gate (IG) mode FINFETs in 32-nm technology as a substitute for CMOS technology.

Keywords: XOR Gate; Full Adder; Bulk Terminal; Biasing; Threshold Voltage; Arithmetic Circuits; Layout; Multiplier; Multiply-Accumulate Unit; FINFETs

## 1 INTRODUCTION

The purpose of integrated electronics is to fit complex electronic circuits into minimum area while reducing power dissipation and delay. As technology advances, reducing transistor count and designing for ultra-low power have become the driving forces for integrating more and more applications without incurring any overhead in silicon area. The growth in the number of transistors in VLSI design is described by Moore's Law [1]. Reducing the transistor count and power of circuits has been a focus of researchers for many years and continues to be so [2]. The XOR gate is an elementary building block of digital circuits, and research to enhance its performance is ongoing. Over the years, XOR gate designs have been proposed starting with eight transistors [3] and six transistors [3]. Further research led to four transistor XOR gates [4, 5, 6, 7, 8, 9, 10, 11, 12] and later to three transistor XOR gates [13]. These designs have been surveyed extensively and used in many industrial applications. The CMOS transmission gate logic XOR gate [3] was replaced by the four transistor XOR gates shown in Figure 1(a) and (b) [5]. Power consumption was further reduced by the XOR gate design without  $V_{DD}$  shown in Figure 1(c) [6], proposed by Wang, Bui and Jiang. Figure 1(d) depicts an XOR gate that occupies the same silicon (Si) area as that of Figure 1(c) but has an improved power delay product (PDP) [7, 8]. A notable advance was the three transistor XOR gate design by Roy Chowdhury et al., which combines CMOS logic with pass transistor logic [13], as shown in Figure 1(e). Reducing transistor count, area and power delay product have remained the three basic goals in refining XOR gate designs over the years [1-13]. With the objective of further reducing the transistor count, a novel two transistor XOR gate is proposed in this paper. The proposed XOR gate occupies less silicon area and offers a substantial improvement in power-delay product.

Adders are indispensable in VLSI circuits, and how efficiently they are implemented affects the performance of the entire system [14]. Over the years, adders adopting diverse logic styles have been reported [3, 4, 7-25]. A traditional low power 28 transistor CMOS full adder uses a pull-up PMOS network and a pull-down NMOS network [16] but requires a large chip area. Subsequently, 20 transistor [17] and 16 transistor [18] adder designs were introduced using CMOS transmission gate (TG) logic with full output voltage swings. Later, a 14 transistor adder was designed with pass transistor logic [19] using XOR and XNOR gates, where  $\overline{A \oplus B}$  is generated by an inverter. With ongoing research to reduce transistor count, several versions of the 10 transistor adder were proposed [7, 8]. The pass transistor logic based static energy recovery full adder suffers from low speed and severe threshold loss [20, 21, 22]. Lin, Hwang, Sheu and Ho proposed a Complementary and Level Restoring Carry Logic (CLRCL) adder [23], which uses 2-to-1 multiplexers, complementary inputs and CMOS inverters. To overcome the problems of the 10 transistor adders, a novel 8 transistor adder, shown in Figure 2, was designed by combining CMOS with pass transistor logic [13]. This design has two drawbacks: first, voltage degradation due to threshold drop and, second, current feedback through transistor M1 when the inputs are A=1 and B=0. This paper describes a novel implementation of an XOR gate with reduced transistor count and area. The 2T XOR gate is based on two PMOS transistors and is used to design a 6T full adder. As a direction for further research, the proposed XOR gate is also studied using FINFETs in 32-nm technology, based on a literature survey.

Adders are important building blocks for multipliers. Multipliers are fundamental components, and multiplication is a basic operation, in many digital signal processors, general purpose processors and digital filters [26, 27, 28]. Power dissipation and speed are therefore of primary concern. An 8 bit x 8 bit array multiplier has been implemented using the proposed 6T adder to demonstrate reduced power dissipation, power delay product (PDP) and silicon area. The multiplier, the 6T full adder and the 2T XOR gate have been found to simulate correctly up to a frequency of 2 GHz. The main computational kernel of DSP architectures is the multiply-accumulate (MAC) unit [29, 30], which computes the product of two numbers and adds that product to an accumulator. The energy consumption at each level affects the overall power of the MAC unit. An 8 bit MAC unit has been designed in 65-nm technology, extending the use of the 2T XOR gates, 6T adders and 8 bit x 8 bit multipliers to DSP architectural operations and to Application-Specific Integrated Circuits (ASICs).

The paper is organized as follows. Section 2 proposes the design of the two transistor XOR gate. Section 3 gives the simulation and performance analysis of XOR gates. Section 4 uses the 2T XOR gate to build a six transistor adder with only two stages of delay and reduced logic depth [31]. Section 5 gives the layout view, and Section 6 presents the simulation and performance analysis of the six transistor adder. Section 7 describes the application of the proposed 6T adder in an 8 bit x 8 bit multiplier and analyses the power, delay, PDP (power delay product) and area of 8 bit x 8 bit multipliers designed with different adders from the literature [13, 16, 17, 19, 21, 23]. The work is extended by the design of an 8 bit MAC unit in 65-nm technology. To the best of our knowledge, the proposed adder is the voltage mode full adder with the lowest transistor count designed so far. To validate and compare the design, simulations are run in three different technologies. Power and delay simulations of the XOR gates and adders have been carried out, along with an area comparison of the adders. All simulations have been carried out using Cadence Spectre.

## 2 DESIGN OF TWO TRANSISTOR XOR GATE

Figure 3 shows the proposed two transistor XOR gate. The design is based on two PMOS transistors and a voltage source. The central idea is to obtain the XOR logic values by changing the threshold voltage  $V_T$  of the transistors through the bias applied to the bulk terminal, i.e.  $V_{SB}$  [32]. When A=1 and B=0, transistor M2 turns ON and the output is logic high; the case A=0 and B=1 is equivalent. However, for A=1 and B=1, both PMOS transistors are OFF and the output goes to an uncertain or high impedance state. The relation between the threshold voltage and the channel length (L), channel width (W) and source to bulk voltage ( $V_{SB}$ ) of the transistor is given in equation (1), as cited in [32].

$$V_T = V_{T0} + \gamma \left( \sqrt{V_{SB} + \phi_0} - \sqrt{\phi_0} \right) - \alpha_v \frac{t_{ox}}{L} V_{DS} + \alpha_w \frac{t_{ox}}{W} (V_{SB} + \phi_0) \tag{1}$$

Where,

 $V_{T0}$ : Zero bias threshold voltage,

γ : Bulk threshold coefficient,

 $V_{SB}$ : Bulk Potential,

 $\phi_0$ : Fermi potential,

 $t_{ox}$ : The thickness of the oxide layer,

 $V_{DS}$ : Drain to source voltage

 $\alpha_{v}$ ,  $\alpha_{w}$ : Process dependent parameters.

From equation (1), it is evident that changing W, L and  $V_{SB}$  changes the logic levels of the XOR gate. When A=1 and B=1, both PMOS transistors are off, and the output logic level must be brought down from high to low by increasing  $V_T$  so that  $V_{DD} - V_T$  decreases. A reverse bias voltage source of 320 mV is used for this purpose in 65-nm technology.
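The dependence of  $V_T$  on  $V_{SB}$ , W and L in equation (1) can be explored numerically. The following is a minimal sketch only: every device parameter in it ( $V_{T0}$ ,  $\gamma$ ,  $\phi_0$ ,  $t_{ox}$ ,  $\alpha_v$ ,  $\alpha_w$ , W, L and  $V_{DS}$ ) is a hypothetical placeholder rather than a UMC process value, and voltages are treated as magnitudes. Only the 320 mV bias is taken from the text.

```python
import math


def threshold_voltage(v_sb, v_ds, w, l, *, v_t0=0.35, gamma=0.3, phi_0=0.8,
                      t_ox=2e-9, alpha_v=0.5, alpha_w=0.5):
    """Evaluate equation (1) for the threshold-voltage magnitude in volts.

    Every default value here is a hypothetical placeholder, not a UMC
    process parameter. Lengths (w, l, t_ox) are in metres.
    """
    body_term = gamma * (math.sqrt(v_sb + phi_0) - math.sqrt(phi_0))
    short_channel_term = alpha_v * (t_ox / l) * v_ds
    narrow_width_term = alpha_w * (t_ox / w) * (v_sb + phi_0)
    return v_t0 + body_term - short_channel_term + narrow_width_term


# Hypothetical geometry and drain bias; only the 0.32 V bias comes from the text.
W, L, V_DS = 120e-9, 60e-9, 0.1
vt_unbiased = threshold_voltage(0.0, V_DS, W, L)
vt_biased = threshold_voltage(0.32, V_DS, W, L)
print(f"V_T at V_SB = 0 V:    {vt_unbiased:.4f} V")
print(f"V_T at V_SB = 0.32 V: {vt_biased:.4f} V")
```

With these placeholder values the biased threshold is higher than the unbiased one, which is the qualitative behaviour the design relies on; the actual shift depends on the process parameters.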

A reverse bias voltage ( $V_{SB} = V_{ref}$ ) of 320 mV is generated by a diode connected NMOS voltage divider circuit, as shown in Figure 4. The circuit has been examined across all process corners and for noise. It shows noise of -131 dB at a frequency of 1 MHz, improving further at higher frequencies, and operates correctly at all process corners, viz. slow-slow, fast-fast, slow-fast and fast-slow. The bias circuit adds 1.114  $\mu$m<sup>2</sup> of area to the proposed XOR gate.

The 2 transistor XOR gate is also analysed in 32-nm technology using P-FINFET devices, as shown in Figure 5. The FINFET acts as a double gate device with a front gate and a back gate [33, 34, 35]. The back gate is given a reverse back bias ( $V_{G2}$ ) of 560 mV to obtain the outputs of the XOR gate. The threshold voltage variation of FINFET devices is governed by equation (2) [36, 37]:

$$V_{G1} = \varphi_{MSG1} + V_{th0} - \gamma [V_{G2} - (\varphi_{MSG2} + V_{th0})] \tag{2}$$

Where,

 $V_{G1}$ : Threshold voltage of the front gate of FINFET

 $\varphi_{MSG1}, \varphi_{MSG2}$ : Work function of front gate and back gate

 $V_{th0}$ : Threshold voltage of FINFET without any secondary effect

 $\gamma$ : Fixed parameter depending on the geometry of device

 $V_{G2}$ : Back gate voltage

FINFETs have an advantage over conventional bulk MOSFETs because of the biasing flexibility of independent gates, and they reduce the short-channel effects that come with technology scaling [36]. According to equation (2),  $V_{G2}$  controls the threshold voltage of the front gate ( $V_{G1}$ ). The circuit gives a logic high level for A=0, B=1 with M1 as the controlling P-FINFET, and vice versa for A=1, B=0. For A=1, B=1, reverse back biasing gives a logic low level by increasing the threshold voltage of the front gate ( $V_{G1}$ ).
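Equation (2) is linear in the back gate voltage, so the front gate threshold shifts by  $\gamma$  times any change in  $V_{G2}$ . A minimal sketch of that relation follows; the work functions,  $V_{th0}$  and  $\gamma$  are hypothetical placeholders, and the sign convention for a reverse bias on a P-FINFET is left to the device model. Only the 560 mV magnitude comes from the text.

```python
def front_gate_threshold(v_g2, *, phi_msg1=0.0, phi_msg2=0.0, v_th0=0.3, gamma=0.4):
    """Evaluate equation (2): front gate threshold as a function of V_G2.

    All default parameter values are hypothetical placeholders.
    """
    return phi_msg1 + v_th0 - gamma * (v_g2 - (phi_msg2 + v_th0))


# Shift in V_G1 produced by a 0.56 V change in the back gate voltage.
shift = front_gate_threshold(0.56) - front_gate_threshold(0.0)
print(f"Threshold shift for a 0.56 V back gate change: {shift:.4f} V")
```

The shift depends only on  $\gamma$  and the change in  $V_{G2}$ , which is why the back gate bias can be used as a single tuning knob for the logic low level.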

## 3 SIMULATION AND PERFORMANCE ANALYSIS OF TWO TRANSISTOR XOR GATE

An extensive simulation study has been carried out to compare the proposed 2T XOR gate with existing XOR gate designs available in the literature. All simulations and netlist extractions have been done in Cadence Spectre at 65-nm, 90-nm and 130-nm technologies. The designs are simulated at a frequency of 50 MHz with rise and fall times of 50 ps. The inputs are driven by inverters.

Table I lists the power, delay and PDP (product of power and delay) of the proposed XOR implementation as well as existing XOR designs from the literature. Power declines gradually from the 6T XOR gate to the 2T XOR gate, with a moderate fall in PDP. The idea of trading power for delay to reach a minimal PDP can be seen in previous designs [23]. As is evident from Table I, there is a large drop in power consumption for the 2T XOR gate. Table II compares the noise margins of the 2T XOR gate with those of XOR gates available in the literature; the noise margins are found to be comparable.  $NM_H$  (high noise margin) and  $NM_L$  (low noise margin) are obtained from a DC analysis of the circuit in Cadence, by finding the voltage transfer characteristic (VTC) and balancing the switching probabilities of the two PMOS transistors at GND (low) and  $V_{DD}$  (high). The impact of process, voltage and temperature (PVT) variation on the XOR gate is analysed in 65-nm, 90-nm and 130-nm technologies. The variation analysis covers a bias voltage change of (+/-) 100 mV (from the nominal value of 320 mV in 65-nm technology), a temperature range from -20°C to 70°C, and the slow-slow, fast-fast, slow-fast and fast-slow process corners.
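The PDP figure of merit used in the tables is simply the product of average power and delay. As a worked illustration of the arithmetic, the sketch below applies it to the MAC unit figures quoted in the abstract (1.107 mW and 3.977 ns); the function name is hypothetical, and the resulting value is derived here rather than reported by the paper.

```python
def power_delay_product(power_w, delay_s):
    """Return the power delay product in joules."""
    return power_w * delay_s


# MAC unit figures from the abstract: 1.107 mW and 3.977 ns.
mac_pdp = power_delay_product(1.107e-3, 3.977e-9)
print(f"{mac_pdp * 1e12:.3f} pJ")  # about 4.403 pJ
```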

Figure 6 shows the simulation results of the 2T XOR gate in 65-nm technology in Cadence with source values of 270 mV and 320 mV, to analyse the effect of changing the threshold voltage. The waveforms for the different input combinations show how the output logic low level is altered by the change in  $V_T$ . As the reverse bias voltage increases,  $V_T$  increases and  $V_{DD}$ - $V_T$  at A=1 and B=1 moves to logic low values. The body biasing method reduces the sub-threshold leakage of the 2T XOR gate and has little effect on dynamic power [38]. The gate leakage power, obtained by summing the leakage power of the individual PMOS transistor gates, is in the femto-watt (fW) range and is lowest for 65-nm due to technology scaling [39].

## 4 DESIGN OF SIX TRANSISTOR FULL ADDER

The proposed 2T XOR gate has been used to design a 6T full adder. The two outputs SUM and CARRY (*Cout*) can be generated based on the Boolean equations (3) and (4) of full adder as follows:

$$Sum = A \oplus B \oplus Cin \tag{3}$$

$$Cout = B.Cin + Cin.A + A.B = Cin. (A \oplus B) + A.B \tag{4}$$
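The two forms of the carry in equation (4) can be checked exhaustively against the arithmetic sum. The sketch below is a purely logical model of equations (3) and (4) with hypothetical function names; it says nothing about the transistor-level behaviour or voltage levels of the 6T circuit.

```python
from itertools import product


def full_adder(a, b, cin):
    """Logical full adder: Sum from equation (3), Cout from the XOR form of (4)."""
    s = a ^ b ^ cin
    cout = (cin & (a ^ b)) | (a & b)
    return s, cout


def carry_majority(a, b, cin):
    """Cout from the sum-of-products form of equation (4)."""
    return (b & cin) | (cin & a) | (a & b)


print(" A B Cin | Sum Cout")
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    # Both carry forms agree, and Sum + 2*Cout equals the arithmetic sum.
    assert cout == carry_majority(a, b, cin)
    assert a + b + cin == s + 2 * cout
    print(f" {a} {b}  {cin}  |  {s}    {cout}")
```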

The logic circuit of a typical full adder is given in Figure 7. The exclusive OR of equation (3), realized using wired logic [40] of the 2T XOR gate, gives the sum output, and the final carry output of equation (4) is produced using the pass transistors M5 and M6. Transistors M5 and M6 have W=300 nm and L=60 nm in 65-nm technology. The W and L of transistors M1 to M4 are the same as defined for the XOR gates. The schematic of the proposed six transistor full adder is shown in Figure 8.

A reverse bias voltage of 320 mV is maintained in order to obtain appropriate logic high and logic low levels at the output of the simulated circuit. For the three-input combinations, there is a two stage delay for both the sum and carry outputs. The carry output delay, which is the critical delay of the circuit used to compute the PDP [31], is less than that of the previously designed eight transistor adder [13]. Because two XOR gates with reverse bias are cascaded, the logic level degrades, mainly at A=1 and B=1. Minimum width and length are used to minimize the power consumption of the circuit [32]. The aspect ratio of M5 and M6 is chosen to minimize the threshold drop in the design.

## 5 LAYOUT DESIGN OF PROPOSED SIX TRANSISTOR ADDER

Figure 9 shows the layout of the full adder in 65-nm technology in the Cadence Virtuoso Layout Editor. The interconnect density is evidently lower than that of the 8 transistor full adder [13], leading to a lower power delay product [41]. The layout is symmetric, with large PMOS transistors placed on two n-wells and the n-wells on a p-type substrate.

## 6 SIMULATION AND PERFORMANCE ANALYSIS OF SIX TRANSISTOR ADDER

The proposed 6T adder is simulated in the Cadence environment at 65-nm, 90-nm and 130-nm technologies. The waveform for the simulated adder schematic of Figure 8 is shown in Figure 10. The output waveforms are given for all three-input combinations, since the circuit responds differently to different input patterns. The post layout simulation of the adder is performed using the extracted XOR gate. The circuits are simulated at 50 MHz with rise and fall times of 50 ps.

There is voltage degradation in the sum output waveform as a result of the cascaded XOR gates. Consequently, a suitable reverse bias value is used for proper operation of the adder. Level restorers can also be used at different points of the circuit to reduce the effect of voltage degradation and noise [42]. Typically, the widths of the transistors used to implement actual circuits are 5X to 7X the minimum width, and with such widths the difference in voltage levels between logic high and logic low is as high as 80 mV.

The performance analysis is a comparative study against previous adder architectures from the literature. Table III compares the adders in terms of power, delay and PDP, covering the 28T, 20T, 16T, 14T, 10T and 8T adders available in the literature as well as the 6T adder. The results indicate that the power delay product of the 6T full adder is much lower than that of the other adders. The considerable difference in average power and delay between the proposed 2T XOR gate and the 6T adder in Table I and Table III is due to switching activity (functional transitions), loading, glitching and circuit topology [23]. Switching activity draws approximately twice the power in the 6T adder as in the 2T XOR gate, which can be explained by equation (5):

$$P = f. C. V_{DD}^{2}.\alpha_{0\to 1} \tag{5}$$

Where,

f: Frequency of operation

C: Capacitance of circuit

 $\alpha_{0\rightarrow 1}$ : Switching activity

 $V_{DD}$ : Supply voltage
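Equation (5) is linear in the switching activity, so doubling  $\alpha_{0\to 1}$  at fixed frequency, capacitance and supply doubles the dynamic power. The sketch below illustrates this with the 50 MHz simulation frequency from the text; the capacitance, supply voltage and activity factors are hypothetical placeholders, not extracted values.

```python
def dynamic_power(f_hz, c_farad, v_dd, alpha):
    """Evaluate equation (5): P = f * C * V_DD^2 * alpha."""
    return f_hz * c_farad * v_dd ** 2 * alpha


F_HZ = 50e6        # simulation frequency quoted in the text
C_FARAD = 1e-15    # hypothetical switched capacitance
V_DD = 1.0         # hypothetical supply voltage

p_low = dynamic_power(F_HZ, C_FARAD, V_DD, alpha=0.25)   # hypothetical activity
p_high = dynamic_power(F_HZ, C_FARAD, V_DD, alpha=0.50)  # twice the activity
print(f"Power ratio for doubled switching activity: {p_high / p_low:.1f}")
```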

The functional transitions also account for glitches, which add almost 15% - 20% of the global power. The delay of a CMOS logic circuit is the input to output propagation delay of the gate measured at the 50% point of the rail to rail swing, as expressed in equations (6) and (7):

$$t_{p} = \int_{V1}^{V2} \frac{C_{L}(V)}{I(V)} dV \tag{6}$$

$$t_{p} = 0.52. C_{L}(R_{p} + R_{n}) / 2 \tag{7}$$

Where,

t<sub>p</sub>: Propagation delay

C<sub>L</sub>: Load capacitance of circuit

I: Current consumption

R<sub>p</sub>, R<sub>n</sub>: Resistance of transistors
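Equations (6) and (7) can be evaluated numerically to see how current and resistance affect delay. The sketch below is illustrative only: it integrates equation (6) with a simple midpoint rule and evaluates equation (7) directly, and the capacitance, current, resistance and voltage limits are hypothetical placeholders rather than extracted circuit values.

```python
def delay_integral(c_load_of_v, current_of_v, v1, v2, steps=1000):
    """Equation (6): integrate C_L(V) / I(V) from v1 to v2 (midpoint rule)."""
    dv = (v2 - v1) / steps
    total = 0.0
    for i in range(steps):
        v = v1 + (i + 0.5) * dv
        total += c_load_of_v(v) / current_of_v(v) * dv
    return total


def delay_rc(c_load, r_p, r_n, k=0.52):
    """Equation (7): t_p = k * C_L * (R_p + R_n) / 2, with k = 0.52 as in the text."""
    return k * c_load * (r_p + r_n) / 2


# Hypothetical constant load (1 fF) and drive current (10 uA) over a 0.5 V swing.
t_base = delay_integral(lambda v: 1e-15, lambda v: 10e-6, 0.0, 0.5)
t_fast = delay_integral(lambda v: 1e-15, lambda v: 20e-6, 0.0, 0.5)
print(f"Delay at 10 uA: {t_base * 1e12:.1f} ps, at 20 uA: {t_fast * 1e12:.1f} ps")
print(f"RC estimate for R_p = R_n = 10 kOhm: {delay_rc(1e-15, 10e3, 10e3) * 1e12:.1f} ps")
```

For a constant load and current, equation (6) reduces to  $C_L \Delta V / I$ , so doubling the current halves the delay.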

Given the greater power consumption of the 6T adder, its current is higher and thus, according to equation (6), its delay is lower. Resistance also plays an important role in reducing delay, as illustrated in equation (7). Thus, the topology of the circuit explains the power and delay differences between the circuits. The gate leakage power is limited to the femto-watt (fW) range for the 6T adder, and sub-threshold leakage is reduced by virtue of reverse biasing.

A major driver of ongoing research is the reduction of chip area [1]. Table IV compares the area of different adders in three distinct technologies. Table IV shows that the proposed 6T adder has the smallest chip area even when the bias circuit area is included. This minimum area adder allows more applications to be implemented per unit area, thus increasing VLSI integration and/or reducing die area and power dissipation.

The 6T adder is found to behave correctly at all five process corners, namely typical, slow-slow, fast-fast, slow-fast and fast-slow, with a bias voltage variation of (+/-) 20 mV and a temperature variation from -10°C to 40°C in 65-nm, 90-nm and 130-nm technologies. Monte-Carlo analysis is performed by randomized simulation of the circuit components: the dominant MOSFET parameters, i.e. threshold voltage and (W/L) ratio, are varied randomly. The circuit behaves correctly with a standard deviation of (+/-) 10 % in the design parameters, within certain tolerance limits. Additionally, PVT analysis gives an overview of circuit performance at a statistically broader level.

## 7 APPLICATION: 8 BIT X 8 BIT MULTIPLIER

An 8 bit x 8 bit multiplier has been implemented using the proposed 6T adder. The product is obtained by multiplying two 8 bit numbers in a traditional array architecture, as shown in Figure 11, to produce the 16 bit output. Each block in Figure 11 represents a full adder with AND gates.
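The array structure can be modelled at the bit level, with each cell formed from an AND gate (the partial product) feeding a full adder. The sketch below is a behavioural illustration only: it ripples the carry within each row, which is an assumption made for simplicity, and it does not reproduce the exact carry routing of Figure 11 or any electrical behaviour of the 6T adder. The function names are hypothetical.

```python
def full_adder(a, b, cin):
    """Logical full adder: Sum = A xor B xor Cin, Cout = Cin.(A xor B) + A.B."""
    s = a ^ b ^ cin
    cout = (cin & (a ^ b)) | (a & b)
    return s, cout


def array_multiply(x, y, n=8):
    """Multiply two n-bit unsigned integers using AND gates and full adders."""
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]
    acc = [0] * (2 * n)              # running sum bits, least significant first
    for i in range(n):               # one row per multiplier bit b[i]
        carry = 0
        for j in range(n):           # each cell: AND gate plus full adder
            pp = a[j] & b[i]
            acc[i + j], carry = full_adder(acc[i + j], pp, carry)
        acc[i + n] = carry           # row carry-out becomes the next higher bit
    return sum(bit << k for k, bit in enumerate(acc))


print(array_multiply(200, 123))  # 24600
```

Each row uses n cells, so the 8 bit x 8 bit case in this model evaluates 64 AND and full adder cells to form the 16 bit product.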

The performance of the proposed 8 bit x 8 bit multiplier has been analysed and compared with 8 bit x 8 bit multipliers built from adder design styles available in the literature [13, 16, 17, 19, 21, 23]. Table V compares power, delay and power delay product (PDP) in 65-nm, 90-nm and 130-nm technologies in Cadence Spectre. The simulations are carried out at a frequency of 50 MHz with rise and fall times of 50 ps. The results indicate that the PDP of the multiplier employing the six transistor adder is the lowest in all three technologies.

Table VI compares the silicon area of 8 bit x 8 bit multipliers built from adders of different transistor counts, and Figure 12 shows the layout of the 8 bit x 8 bit multiplier using the proposed 6T adder in 65-nm technology. Table VI clearly shows that the multiplier implemented with the 6T adder has the least area, leaving room for more applications on a chip. Even with a bias circuit added to each individual XOR gate, the circuit has the minimum area in Table VI. Process corner analysis has also been performed for all process corners with (+/-) 10% variations in bias voltage and temperature.

Adders and multipliers are the fundamental components of the MAC (Multiply and Accumulate) unit shown in Figure 13 [29], which is widely used in microprocessors and digital signal processors for computationally intensive applications. The main aim of DSP processor design is to enhance the speed and throughput of the MAC unit to achieve high performance. Power consumption must also be limited because of the growth in portable electronic products, and low power MACs are therefore gaining importance. The simulation results for the adders in Section 6 and for the multiplier in Section 7 clearly indicate the improvement in overall performance of the proposed designs in terms of power, PDP and area. Hence, the proposed architecture is useful for implementing Multiply-Accumulate (MAC) units for high speed, low power digital signal processors with minimal chip area. An 8 bit Multiply and Accumulate (MAC) unit comprising 8 bit x 8 bit multipliers, 6T adders and registers has been simulated in 65-nm technology. A true single phase clocked register (TSPCR), as depicted in Figure 14, is used as the accumulator/register unit in the MAC design [43]. The design achieves a delay of 3.977 ns and a power dissipation of 1.107 mW.
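The multiply and accumulate operation itself can be summarised by a small behavioural model. The sketch below is an illustration under stated assumptions: the class name is hypothetical, and the 16 bit accumulator with wrap-around on overflow is an assumption, since the accumulator width of the simulated unit is not specified here. It models only the arithmetic, not the TSPCR timing.

```python
class MacUnit:
    """Behavioural multiply-accumulate model: acc <- acc + a * b."""

    def __init__(self, operand_bits=8, acc_bits=16):
        # acc_bits=16 with wrap-around is an assumption for illustration.
        self.operand_mask = (1 << operand_bits) - 1
        self.acc_mask = (1 << acc_bits) - 1
        self.acc = 0

    def step(self, a, b):
        """One clock cycle: multiply the operands and add into the accumulator."""
        product = (a & self.operand_mask) * (b & self.operand_mask)
        self.acc = (self.acc + product) & self.acc_mask
        return self.acc

    def reset(self):
        """Clear the accumulator register."""
        self.acc = 0


mac = MacUnit()
for a, b in [(3, 4), (5, 6), (7, 8)]:
    mac.step(a, b)
print(mac.acc)  # 3*4 + 5*6 + 7*8 = 98
```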

## 8 CONCLUSION

The paper proposed the design of a high performance 8 bit x 8 bit multiplier based on a novel 2T XOR gate, which has the smallest transistor count of any XOR gate designed so far and compares favourably from a power delay product (PDP) perspective. The XOR gate implementation compares positively with its peer designs in terms of power-delay product and noise margins: its power-delay product is the lowest, with a noise margin comparable to other XOR gate designs available in the literature. The six transistor full adder built from these XOR gates has the smallest transistor count and the minimum power delay product, and thus performs better than the adders in the literature. In simulation, the multiplier, the 6T adder and the 2T XOR gate work well up to a frequency of 2 GHz. We also presented a layout of our implementation and simulation results. An application of the proposed work has been demonstrated through the 8 bit x 8 bit multiplier and its further extension to an 8 bit Multiply-Accumulate (MAC) unit for digital signal processors in 65-nm technology. Additional research could be spent on minimising the power and obtaining better voltage swings for the adder. A major driver for further research is FINFET technology, based on double gate devices, as a promising substitute for CMOS. An approach to designing the proposed XOR gate using independent-gate mode FINFETs has been presented on the basis of a literature survey. This will eventually support continued technology scaling below 65-nm by overcoming fundamental material and process technology limits in an efficient way.

## 9 ACKNOWLEDGEMENT

I would like to express my deepest appreciation to my department, CVEST (Centre for VLSI and Embedded Systems Technology), for giving me the opportunity to carry out my research work in my desired area of interest under the supervision of Dr. Shubhajit Roy Chowdhury. He is someone with a gracious attitude and the substance of genius. He continually and convincingly conveyed a spirit of adventure in regard to research. Without his extraordinary support, this research work would not have been possible.

I want to extend my gratitude to the Ministry of Human Resource Development (MHRD) of India for providing tool support to carry out the simulations in the project.

I thank International Institute of Information and Technology, Hyderabad for enrolling me as an MS student and giving me the precious opportunity to be a part of new innovations and research around the globe.

## REFERENCES

- [1] Gordon E. Moore, "Cramming more components onto integrated circuits", *Electronics*, Volume 38, Number 8, April 19, **1965**.
- [2] M. Hosseinzadeh, S.J. Jassbi, and Keivan Navi, "A Novel Multiple Valued Logic OHRNS Modulo rn Adder Circuit", *International Journal of Electronics, Circuits and Systems*, Vol. 1, No. 4, fall 2007, pp. 245-249.
- [3] Y. Leblebici, S.M. Kang, "CMOS Digital Integrated Circuits", Singapore: McGraw Hill, 2nd edition, 1999, Ch. 7.
- [4] D. Radhakrishnan, "Low-voltage low-power CMOS full adder," in *Proc.IEEE Circuits Devices Syst.*, vol. 148, Feb. **2001**, pp. 19-24.
- [5] J. Wang, S. Fang, and W. Feng, "New efficient designs for XOR and XNOR functions on the transistor level," *IEEE J. Solid-State Circuits*, vol. 29, no. 7, Jul. **1994**, pp. 780–786.
- [6] H. T. Bui, A. K. Al-Sheraidah, and Y.Wang, "New 4-transistor XOR and XNOR designs," in *Proc. 2nd IEEE Asia Pacific Conf. ASICs*, **2000**, pp.25–28.
- [7] H.T. Bui, Y. Wang, Y. Jiang, "Design and analysis of 10-transistor full adders using novel XOR– XNOR gates," in *Proc. 5th Int. Conf. Signal Process.*, vol. 1, Aug. 21–25, **2000**, pp. 619–622.
- [8] H. T. Bui, Y. Wang, and Y. Jiang, "Design and analysis of low-power 10-transistor full adders using XOR-XNOR gates," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process*, vol. 49, no.1, Jan. **2002**, pp. 25–30.
- [9] A. M. Shams, T. K. Darwish, and M. A. Bayoumi, "Performance analysis of low-power 1-bit CMOS full adder cells," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 10, no. 1, Feb. **2002**, pp. 20–29.
- [10] K.-H. Cheng and C.-S. Huang, "The novel efficient design of XOR/XNOR function for adder applications," in *Proc. IEEE Int. Conf. Elect, Circuits Syst.*, vol. 1, Sep. 5–8, **1999**, pp. 29–32.

- [11] H. Lee and G. E. Sobelman, "New low-voltage circuits for XOR and XNOR," in *Proc. IEEE Southeastcon*, Apr. 12–14, **1997**, pp. 225–229.
- [12] M. Vesterbacka, "A 14-transistor CMOS full adder with full voltage swing nodes," in *Proc. IEEE Worksh. Signal Process. Syst.*, Oct. 20–22, **1999**, pp. 713–722.
- [13] Shubhajit Roy Chowdhury, Aritra Banerjee, Aniruddha Roy, and Hiranmay Saha, "A High Speed 8 Transistor Full Adder Design using Novel 3 Transistor XOR Gates", *International Journal of Electronics, Circuits and Systems*, WASET Fall, **2008**.
- [14] Manoj Kumar, Sandeep K. Arya, Sujata Pandey, "Single bit full adder design using 8 transistors with novel 3 transistors XNOR gate," International Journal of VLSI Design & Communication Systems, Vol. 2, pp. 47-59, Dec. **2011**.
- [15] Ahmed M. Shams and Magdy A, "A structured approach for designing low power adders," Conference Record of the Thirty-First Asilomar Conference on Signals, Systems & Computers, vol.1, pp.757-761, Nov. **1997**.
- [16] R. Zimmermann, and W. Fichtner, "Low-power logic styles: CMOS versus pass-transistor logic," IEEE J. Solid State Circuits, vol. 32, no. 7, pp. 1079-1090, Jul. 1997.
- [17] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design, a System Perspective, Addison-Wesley, 1993.
- [18] N. Zhuang and H. Wu, "A new design of the CMOS full adder," IEEE J. Solid-State Circuits, vol.27, no. 5, pp. 840–844, May **1992**.
- [19] A. M. Shams and M. Bayoumi, "A novel high-performance CMOS 1-bit full adder cell," IEEE Transaction on Circuits Systems II, Analog Digital Signal Process, vol. 47, no. 5, pp. 478–481, May 2000.
- [20] Yingtao Jiang Al-Sheraidah, A. Yuke Wang Sha, E. and Jin-Gyun Chung, "A novel multiplexer based low-power full adder," IEEE Transactions on Circuits and Systems: Express Briefs, vol. 51, no.7, pp.345-348, Jul. **2004**.

- [21] R. Shalem, E. John, and L. K. John, "A novel low-power energy recovery full adder cell," in Proc. Great Lakes Symposium on VLSI, pp. 380–383, Feb. **1999**.
- [22] A. Fayed and M. Bayoumi, "A low-power 10-transistor full adder cell for embedded architectures," in *Proc. IEEE Symp. Circuits Syst.*, Sydney, Australia, May **2001**, pp. 226–229.
- [23] J.F. Lin, Y.T.Hwang, M.H. Sheu, C.C. Ho, "A novel high speed and energy efficient 10 transistor full adder design", *IEEE Trans. Circuits Syst. I, Regular papers*, Vol. 54, No.5, May **2007**, pp. 1050-1059.
- [24] S. Goel, A. Kumar, M. A. Bayoumi, "Design of robust, energy-efficient full adders for deep sub-micrometre design using hybrid-CMOS logic style," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.14, no.12, pp.1309-1321, Dec. **2006**.
- [25] Zhang, M., J. Gu and C.H. Chang, "A novel hybrid pass logic with static CMOS output drive full adder cell," IEEE Int. Symposium on Circuits Systems, vol. 5, pp. 317-320, May **2003**.
- [26] Ma GK, Taylor FJ (**1990**). Multiplier policies for digital signal processing. IEEE ASSP., 7(1): 6-20.
- [27] S.-N. Tang, J.-W. Tsai, and T.-Y. Chang, "A 2.4-GS/s FFT processor for OFDM-based WPAN applications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 6, pp. 451–455, Jun. **2010**.
- [28] V. Gowrishankar, D. Manoranjitham and P. Jagadeesh, "Efficient FIR filter design using modified carry select adder & Wallace tree multiplier", International Journal of Science, Engineering and Technology Research, Vol. 2, pp. 703-711, March **2013**.
- [29] Avisek Sen, Partha Mitra, Debarshi Datta, "Low Power MAC Unit for DSP Processor", International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, Volume-1, Issue-6, January **2013**.
- [30] A. Abdelgawad, "Low power multiply accumulate unit (MAC) for future Wireless Sensor Networks," in IEEE Sensors Applications Symposium (SAS), pp. 129-132, 19-21 Feb. **2013**.

- [31] P. Balasubramanian, R.T. Naayagi, "Critical Path Delay and Net Delay Reduced Tree Structure for Combinational Logic Circuits", *International Journal of Electronics, Circuits and Systems*, Vol. 1, No.1, **2007**, pp. 19-29.
- [32] Y. Tsividis, *Mixed Analog-Digital VLSI Devices and Technology*, Singapore: McGraw Hill, 1st edition, **1996**.
- [33] Aminul Islam, A. Imran, and Mohd. Hasan, "Variability analysis and FinFET-based design of XOR and XNOR circuit," in International Conference on Computer and Communication Technology, **2011**, pp. 239–245.
- [34] M. Agostinelli, M. Alioto, D. Esseni, and L. Selmi, "Leakage-delay tradeoff in FinFET logic circuits: A comparative analysis with bulk technology," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., Vol. 18, No. 2, February **2010**.
- [35] Neha Yadav, Ashish Kumar Singhal, "Improvement of Design Issues in FINFET based Design Techniques for XOR and XNOR Circuits in 45-nm Technology" Journal of Electronic Design Technology, ISSN: 2229-6980, May **2014**.
- [36] S. O'uchi, K. Sakamoto, K. Endo, M. Masahara, T. Matsukawa, Y.X. Liu, M. Hioki, T. Nakagawa, T. Sekigawa, H. Koike and E. Suzuki, "Variable-Threshold-Voltage FinFETs with a Control-Voltage Range within the Logic-Level Swing Using Asymmetric Work-Function Double Gates," in VLSI Technology, Systems and Applications, **2008**.
- [37] M. C. Wang, "Independent-Gate FinFET Circuit Design Methodology", International Journal of Computer Science, 37:1, Feb. **2010**.
- [38] Amrita Oza, Poonam Kadam, "Techniques for Sub-threshold Leakage Reduction in Low Power CMOS Circuit Designs", International Journal of Computer Applications (0975 8887), Volume 97– No.15, July **2014**.
- [39] A. Muttreja, N. Agarwal, and N.K. Jha, "CMOS logic design with independent gate FinFETs," in Proc. Int. Conf. Computer Design, Oct. **2007**, pp. 560–567.
- [40] M. Morris Mano, *Digital Design*, Prentice Hall of India, 2nd Edition, **2000**.

- [41] A. Yurdakul, "Multiplierless implementation of 2D FIR filters", *Integration: The VLSI Journal*, Vol. 38, No. 4, **2005**, pp. 597-613.
- [42] S. Goel, M.A. Elgamel, M.A. Bayoumi, Y. Hanafy, "Design Methodologies for high performance noise tolerant XOR-XNOR circuits", *IEEE Transactions on Circuits and Systems I: Regular Papers*, Vol. 53, No. 4, **2006**, pp. 867-878.
- [43] Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic, *Digital Integrated Circuits: A Design Perspective*, Second Edition, Prentice Hall Electronics and VLSI Series, **2012**.

## FIGURES AND TABLES

Figure 1. Prior designs of XOR gates: a) Inverter-based 4T XOR gate [5]. b) Improved version of the inverter-based XOR gate [5]. c) Powerless 4T XOR gate [6]. d) Improved powerless 4T XOR gate with improved power-delay product (PDP) [7, 8]. e) 3T XOR gate [13].



Figure 2. Design of 8 transistor full adder using 3 transistor XOR gates



Figure 3. Proposed 2 transistor XOR gate



Figure 4. Diode connected NMOS



Figure 5. Two transistor XOR gate using P-FINFET

Figure 6. Input and output waveforms of: a) XOR gate at reverse bias of 320 mV. b) XOR gate at reverse bias of 270 mV



Figure 7. Logic circuit of full adder



Figure 8. Proposed 6 transistor full adder



Figure 9. Layout view of 6 transistor full adder



Figure 10. Post layout simulation of 6 transistor adder in 65-nm technology



Figure 11. An 8 bit x 8 bit array multiplier



Figure 12. Layout design of an 8 bit x 8 bit multiplier





Figure 14. True single-phase clocked (TSPC) register

TABLE I
SIMULATION RESULTS OF XOR GATES

| Type of XOR gate     | Technology (nm) | Avg. power (µW) | Avg. delay (ps) | PDP ($10^{-18}$ J) |
|----------------------|-----------------|-----------------|-----------------|--------------------|
| 6T [4]               | 65              | 2.1880          | 15.600          | 34.132             |
| 4T (Fig. 1(a)) [5]   | 65              | 0.8660          | 8.125           | 7.036              |
| 4T (Fig. 1(b)) [5]   | 65              | 0.0650          | 13.250          | 0.861              |
| 4T (Fig. 1(c)) [6]   | 65              | 0.0340          | 14.625          | 0.497              |
| 4T (Fig. 1(d)) [7,8] | 65              | 0.0470          | 9.125           | 0.428              |
| 3T (Fig. 1(e)) [13]  | 65              | 0.0320          | 4.253           | 0.136              |
| 2T                   | 65              | 0.0035          | 9.375           | 0.033              |
| 6T [4]               | 90              | 6.6300          | 18.625          | 123.483            |
| 4T (Fig. 1(a)) [5]   | 90              | 0.1790          | 11.375          | 2.036              |
| 4T (Fig. 1(b)) [5]   | 90              | 0.1430          | 11.500          | 1.644              |
| 4T (Fig. 1(c)) [6]   | 90              | 0.0960          | 12.125          | 1.164              |
| 4T (Fig. 1(d)) [7,8] | 90              | 0.1020          | 11.125          | 1.134              |
| 3T (Fig. 1(e)) [13]  | 90              | 0.0410          | 9.750           | 0.399              |
| 2T                   | 90              | 0.0039          | 19.375          | 0.075              |
| 6T [4]               | 130             | 19.884          | 35.750          | 710.853            |
| 4T (Fig. 1(a)) [5]   | 130             | 4.4520          | 14.250          | 63.441             |
| 4T (Fig. 1(b)) [5]   | 130             | 0.4896          | 26.750          | 13.096             |
| 4T (Fig. 1(c)) [6]   | 130             | 0.2788          | 27.194          | 7.582              |
| 4T (Fig. 1(d)) [7,8] | 130             | 0.3147          | 23.740          | 7.471              |
| 3T (Fig. 1(e)) [13]  | 130             | 0.1179          | 18.750          | 2.210              |
| 2T                   | 130             | 0.0090          | 35.370          | 0.318              |
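
The PDP column of Table I is the product of the average power and average delay columns. Because the power is listed in µW and the delay in ps, the product comes out directly in units of $10^{-18}$ J. A minimal Python sketch of that arithmetic, using the 65 nm rows of the table, is given below; the function and variable names are illustrative assumptions and not part of the original work.

```python
# Rows of Table I at 65 nm:
# (label, avg. power in uW, avg. delay in ps, PDP reported in the table in 1e-18 J)
TABLE_I_65NM = [
    ("6T [4]", 2.1880, 15.600, 34.132),
    ("4T Fig. 1(a) [5]", 0.8660, 8.125, 7.036),
    ("4T Fig. 1(b) [5]", 0.0650, 13.250, 0.861),
    ("4T Fig. 1(c) [6]", 0.0340, 14.625, 0.497),
    ("4T Fig. 1(d) [7,8]", 0.0470, 9.125, 0.428),
    ("3T Fig. 1(e) [13]", 0.0320, 4.253, 0.136),
    ("2T", 0.0035, 9.375, 0.033),
]


def pdp_attojoules(power_uw, delay_ps):
    """Power-delay product: 1 uW * 1 ps = 1e-6 W * 1e-12 s = 1e-18 J."""
    return power_uw * delay_ps


def recompute(rows, tol=2e-3):
    """Return (label, recomputed PDP, agrees with the reported PDP within tol)."""
    result = []
    for label, power_uw, delay_ps, reported in rows:
        pdp = pdp_attojoules(power_uw, delay_ps)
        result.append((label, pdp, abs(pdp - reported) <= tol))
    return result


if __name__ == "__main__":
    for label, pdp, ok in recompute(TABLE_I_65NM):
        print(f"{label:20s} PDP = {pdp:8.3f} x 1e-18 J  consistent: {ok}")
```

The tolerance `tol` only absorbs the three-decimal rounding and truncation of the tabulated values; it is an assumption of this sketch, not a figure from the simulations.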

TABLE II
NOISE MARGIN OF XOR GATES

| Type of XOR gate     | Technology (nm) | $V_{OH}$ (V) | $V_{OL}$ (V) | $V_{IH}$ (V) | $V_{IL}$ (V) | $NM_H$ (V) | $NM_L$ (V) |
|----------------------|-----------------|--------------|--------------|--------------|--------------|------------|------------|
| 6T [4]               | 65              | 1.000        | 0.000        | 0.690        | 0.318        | 0.310      | 0.318      |
| 4T (Fig. 1(a)) [5]   | 65              | 1.000        | 0.000        | 0.667        | 0.357        | 0.333      | 0.357      |
| 4T (Fig. 1(b)) [5]   | 65              | 1.000        | 0.000        | 0.690        | 0.460        | 0.310      | 0.460      |
| 4T (Fig. 1(c)) [6]   | 65              | 1.000        | 0.000        | 0.600        | 0.420        | 0.400      | 0.420      |
| 4T (Fig. 1(d)) [7,8] | 65              | 1.000        | 0.000        | 0.630        | 0.450        | 0.370      | 0.450      |
| 3T (Fig. 1(e)) [13]  | 65              | 1.000        | 0.000        | 0.520        | 0.240        | 0.480      | 0.240      |
| 2T                   | 65              | 1.000        | 0.000        | 0.650        | 0.280        | 0.350      | 0.280      |
| 6T [4]               | 90              | 1.000        | 0.000        | 0.680        | 0.480        | 0.320      | 0.480      |
| 4T (Fig. 1(a)) [5]   | 90              | 1.000        | 0.000        | 0.699        | 0.372        | 0.301      | 0.372      |
| 4T (Fig. 1(b)) [5]   | 90              | 1.000        | 0.000        | 0.640        | 0.480        | 0.360      | 0.480      |
| 4T (Fig. 1(c)) [6]   | 90              | 1.000        | 0.000        | 0.600        | 0.360        | 0.400      | 0.360      |
| 4T (Fig. 1(d)) [7,8] | 90              | 1.000        | 0.000        | 0.600        | 0.400        | 0.400      | 0.400      |
| 3T (Fig. 1(e)) [13]  | 90              | 1.000        | 0.000        | 0.680        | 0.440        | 0.320      | 0.440      |
| 2T                   | 90              | 1.000        | 0.000        | 0.640        | 0.280        | 0.360      | 0.280      |
| 6T [4]               | 130             | 1.200        | 0.000        | 0.960        | 0.320        | 0.320      | 0.240      |
| 4T (Fig. 1(a)) [5]   | 130             | 1.200        | 0.000        | 0.920        | 0.360        | 0.301      | 0.280      |
| 4T (Fig. 1(b)) [5]   | 130             | 1.200        | 0.000        | 0.840        | 0.360        | 0.360      | 0.360      |
| 4T (Fig. 1(c)) [6]   | 130             | 1.200        | 0.000        | 0.960        | 0.264        | 0.400      | 0.240      |
| 4T (Fig. 1(d)) [7,8] | 130             | 1.200        | 0.000        | 0.984        | 0.312        | 0.400      | 0.216      |
| 3T (Fig. 1(e)) [13]  | 130             | 1.200        | 0.000        | 0.910        | 0.320        | 0.320      | 0.290      |
| 2T                   | 130             | 1.200        | 0.000        | 0.970        | 0.280        | 0.360      | 0.230      |
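
The noise margins in Table II are related to the voltage levels through the standard definitions $NM_H = V_{OH} - V_{IH}$ and $NM_L = V_{IL} - V_{OL}$. The short sketch below applies these definitions to the voltage levels of the 65 nm rows; it assumes the table uses these textbook definitions, and all names in it are illustrative.

```python
# 65 nm rows of Table II: (label, V_OH, V_OL, V_IH, V_IL), all in volts
TABLE_II_65NM = [
    ("6T [4]", 1.000, 0.000, 0.690, 0.318),
    ("4T Fig. 1(a) [5]", 1.000, 0.000, 0.667, 0.357),
    ("4T Fig. 1(b) [5]", 1.000, 0.000, 0.690, 0.460),
    ("4T Fig. 1(c) [6]", 1.000, 0.000, 0.600, 0.420),
    ("4T Fig. 1(d) [7,8]", 1.000, 0.000, 0.630, 0.450),
    ("3T Fig. 1(e) [13]", 1.000, 0.000, 0.520, 0.240),
    ("2T", 1.000, 0.000, 0.650, 0.280),
]


def noise_margins(v_oh, v_ol, v_ih, v_il):
    """Return (NM_H, NM_L) in volts, rounded to the table's three decimals."""
    nm_h = v_oh - v_ih  # high-level noise margin
    nm_l = v_il - v_ol  # low-level noise margin
    return round(nm_h, 3), round(nm_l, 3)


if __name__ == "__main__":
    for label, v_oh, v_ol, v_ih, v_il in TABLE_II_65NM:
        nm_h, nm_l = noise_margins(v_oh, v_ol, v_ih, v_il)
        print(f"{label:20s} NM_H = {nm_h:.3f} V  NM_L = {nm_l:.3f} V")
```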

TABLE III
PERFORMANCE ANALYSIS OF ADDERS

| Type of adder | Technology (nm) | Avg. power (µW) | Avg. delay (ps) | PDP ($10^{-18}$ J) |
|---------------|-----------------|-----------------|-----------------|--------------------|
| 28T [16] | 65         | 0.481     | 11.875 | 5.711         |
| 20T [17] | 65         | 0.317     | 7.812  | 2.476         |
| 16T [21] | 65         | 0.393     | 4.625  | 1.817         |
| 14T [19] | 65         | 0.511     | 3.187  | 1.628         |
| 10T [23] | 65         | 0.129     | 11.625 | 1.499         |
| 8T [13]  | 65         | 0.127     | 8.625  | 1.095         |
| 6T       | 65         | 0.439     | 1.935  | 0.849         |
| 28T [16] | 90         | 0.806     | 21.750 | 17.530        |
| 20T [17] | 90         | 0.281     | 9.812  | 2.757         |
| 16T [21] | 90         | 0.318     | 8.500  | 2.703         |
| 14T [19] | 90         | 0.610     | 4.320  | 2.635         |
| 10T [23] | 90         | 0.665     | 3.750  | 2.493         |
| 8T [13]  | 90         | 0.232     | 9.500  | 2.204         |
| 6T       | 90         | 0.685     | 2.625  | 1.798         |
| 28T [16] | 130        | 7.107     | 20.680 | 146.972       |
| 20T [17] | 130        | 3.768     | 14.750 | 55.578        |
| 16T [21] | 130        | 5.547     | 8.500  | 47.149        |
| 14T [19] | 130        | 6.572     | 3.375  | 22.180        |
| 10T [23] | 130        | 1.510     | 14.218 | 21.469        |
| 8T [13]  | 130        | 1.590     | 10.437 | 16.594        |
| 6T       | 130        | 3.962     | 3.875  | 15.352        |

TABLE IV
COMPARATIVE STUDY OF AREA OF DIFFERENT ADDERS

| Type of adder | Technology (nm) | Area ($\mu m^2$) |
|---------------|-----------------|------------------|
| 28T [16] | 65         | 114.519     |
| 20T [17] | 65         | 83.723      |
| 16T [21] | 65         | 78.723      |
| 14T [19] | 65         | 60.083      |
| 10T [23] | 65         | 44.208      |
| 8T [13]  | 65         | 39.214      |
| 6T       | 65         | 14.517      |
| 28T [16] | 90         | 259.364     |
| 20T [17] | 90         | 155.703     |
| 16T [21] | 90         | 146.577     |
| 14T [19] | 90         | 116.741     |
| 10T [23] | 90         | 81.624      |
| 8T [13]  | 90         | 75.247      |
| 6T       | 90         | 37.953      |
| 28T [16] | 130        | 290.565     |
| 20T [17] | 130        | 195.048     |
| 16T [21] | 130        | 179.626     |
| 14T [19] | 130        | 127.110     |
| 10T [23] | 130        | 93.427      |
| 8T [13]  | 130        | 85.140      |
| 6T       | 130        | 45.283      |

TABLE V

| Type of adder | Technology (nm) | Avg. power (µW) | Avg. delay (ps) | PDP ($10^{-18}$ J) |
|---------------|-----------------|-----------------|-----------------|--------------------|
| 28T [16]       | 65         | 42.600        | 198.557       | 8458.528      |
| 20T [17]       | 65         | 36.500        | 180.468       | 6587.082      |
| 16T [21]       | 65         | 27.460        | 140.083       | 3846.679      |
| 14T [19]       | 65         | 24.570        | 136.774       | 3360.537      |
| 10T [23]       | 65         | 23.630        | 122.866       | 2903.323      |
| 8T [13]        | 65         | 20.140        | 118.018       | 2376.882      |
| 6T             | 65         | 15.220        | 121.816       | 1854.039      |
| 28T [16]       | 90         | 82.120        | 339.278       | 27861.509     |
| 20T [17]       | 90         | 80.470        | 330.829       | 26621.809     |
| 16T [21]       | 90         | 50.290        | 315.788       | 15880.978     |
| 14T [19]       | 90         | 48.460        | 285.399       | 13830.435     |
| 10T [23]       | 90         | 45.900        | 220.926       | 10140.503     |
| 8T [13]        | 90         | 32.168        | 207.338       | 6669.648      |
| 6T             | 90         | 29.060        | 216.472       | 6290.676      |
| 28T [16]       | 130        | 228.132       | 392.806       | 89611.618     |
| 20T [17]       | 130        | 223.800       | 364.718       | 81623.888     |
| 16T [21]       | 130        | 201.840       | 306.412       | 61846.198     |
| 14T [19]       | 130        | 148.440       | 303.844       | 45102.603     |
| 10T [23]       | 130        | 144.960       | 301.091       | 43646.151     |
| 8T [13]        | 130        | 105.336       | 258.312       | 27209.552     |
| 6T             | 130        | 96.420        | 264.900       | 25541.658     |
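
One way to read the table above is as the relative PDP saving of the 6T-based design over each alternative at the same technology node. A short sketch of that comparison for the 65 nm rows follows. The helper names `reduction_percent` and `savings_vs` are hypothetical, and the PDP values are taken from the table in its stated unit of $10^{-18}$ J.

```python
# 65 nm rows of the table above: adder type -> PDP in units of 1e-18 J
PDP_65NM = {
    "28T [16]": 8458.528,
    "20T [17]": 6587.082,
    "16T [21]": 3846.679,
    "14T [19]": 3360.537,
    "10T [23]": 2903.323,
    "8T [13]": 2376.882,
    "6T": 1854.039,
}


def reduction_percent(baseline, proposed):
    """Relative saving of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


def savings_vs(proposed_key, table):
    """PDP saving of one design against every other design in the table."""
    proposed = table[proposed_key]
    return {
        key: reduction_percent(value, proposed)
        for key, value in table.items()
        if key != proposed_key
    }


if __name__ == "__main__":
    for adder, saving in savings_vs("6T", PDP_65NM).items():
        print(f"6T vs {adder:9s}: {saving:5.1f} % lower PDP")
```

With the tabulated values, the saving of the 6T-based design is roughly 22 % relative to the 8T-based design and roughly 78 % relative to the 28T-based design at 65 nm.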

TABLE VI
COMPARATIVE STUDY OF AREA OF 8 BIT x 8 BIT MULTIPLIER USING
DIFFERENT ADDERS

| Type of adder  | Technology (nm) | Area ($\mu m^2$) |
|----------------|-----------------|------------------|
| 28T [16]       | 65         | 11740.087   |
| 20T [17]       | 65         | 8411.571    |
| 16T [21]       | 65         | 8264.371    |
| 14T [19]       | 65         | 6466.196    |
| 10T [23]       | 65         | 4276.463    |
| 8T [13]        | 65         | 3735.890    |
| 6T             | 65         | 1581.069    |
| 28T [16]       | 90         | 26668.557   |
| 20T [17]       | 90         | 18246.264   |
| 16T [21]       | 90         | 18107.889   |
| 14T [19]       | 90         | 13507.574   |
| 10T [23]       | 90         | 12504.680   |
| 8T [13]        | 90         | 11535.308   |
| 6T             | 90         | 6253.650    |
| 28T [16]       | 130        | 28488.613   |
| 20T [17]       | 130        | 21300.440   |
| 16T [21]       | 130        | 20619.317   |
| 14T [19]       | 130        | 18231.370   |
| 10T [23]       | 130        | 15272.972   |
| 8T [13]        | 130        | 12391.646   |
| 6T             | 130        | 7719.373    |

## **BIOGRAPHIES**

**HIMANI UPADHYAY** holds a B.Tech. in Electronics and Communications and is currently pursuing an MS at the Centre for VLSI and Embedded System Technology (CVEST), IIIT Hyderabad.

**Dr. SHUBHAJIT ROY CHOWDHURY** (Ph.D., FSAB, MIEEE) is currently an Assistant Professor at the Centre for VLSI and Embedded System Technology, IIIT Hyderabad. He completed his Ph.D. at Jadavpur University in 2010. He has received university gold medals for his B.E. and M.E., the Altera Award, four best paper awards and the IEI Young Engineers Award. He has published over sixty papers in international journals and conferences. His research interests centre on the development of biomedical embedded systems.