## **EECS 312 Final Project Report**

Team Name: Transistor Titans | Justin Park, Dhruv Dighrasker, Yan Cheng Poon

#### Overview

We were tasked with creating two versions of an 8-bit dual-mode ripple-carry adder (RCA): one optimized for high speed and the other for low power consumption. To meet both objectives, we created two circuit schematics, each using a different logic family: dual-rail domino logic for high speed, and a combination of pass-transistor logic (PTL) and transmission-gate logic (TGL) for low power consumption.

## Optimal Speed Design Description & Techniques

For our high-speed design, we chose the dynamic logic family. In dynamic logic, the pull-up network (PUN) is replaced by a single clocked pull-up (precharge) transistor at the top, a clocked pull-down (evaluation) transistor is added at the bottom of the pull-down network, and the load capacitance at the output stores the logic value. Both transistors are controlled by the "CK" signal, which oscillates between high and low. During the precharge phase (CK low), the output node is connected to Vdd and precharged. During the evaluation phase (CK high), the pull-down network is evaluated, discharging the capacitor if the input conditions are met. This approach reduces the number of slower PMOS devices and enables faster discharging through the pull-down network, giving a design with higher speed and lower delay than static CMOS.

Domino logic by itself can implement only non-inverting functions, but gates such as XOR require complemented signals. We therefore used the dual-rail design style from the domino logic family. Dual-rail, as the name suggests, generates two output wires simultaneously, output\_high and output\_low, which lets us implement both inverting and non-inverting functions. The choice of dual-rail logic in an RCA is also supported by prior work: Table 4 of [1] reports that an RCA dual-rail XOR has a propagation delay of 0.041 ns, significantly faster than other implementations such as *Static XOR - 79.5 ns* and *CPL XOR - 10.01 ns*.
These results reinforced our choice of dual-rail logic for the speed-optimized implementation.

To address charge leakage in our design, we introduced keeper transistors. During the evaluation phase, leakage currents through the pull-down network, or charge lost to parasitic capacitances, can cause unintended voltage drops on a precharged node that is supposed to remain at logic high. To counter this, we added a weak PMOS transistor (keeper) between the output and dynamic nodes to hold the precharged capacitor in the high state. By sizing and configuring the keeper transistor using formulas from [2], we finalized our dual-rail full adder design with keeper transistors to ensure voltage stability during the evaluation phase.

In theory, our finalized design should be extremely delay-efficient for two reasons: it significantly reduces the number of slow PMOS devices, and it generates each bit and its complement in parallel, so inverted data bits are available without spending time on inversion.

## Optimal Power Consumption Design Description & Techniques

Since fewer transistors generally mean less switched capacitance and therefore lower power, we eliminated all logic families with a transistor count ranging from 24 to 54. This left Pass Transistor Logic (PTL) and Transmission Gate Logic (TGL), both of which have a relatively low transistor count and circuit area, making them good candidates for a power-optimized design.

To fully optimize power, we have to consider the different types of power consumption in CMOS circuits, namely:

1. *Switching Power*: power consumed in charging and discharging circuit capacitances during transitions.
2. *Short-Circuit Power*: power consumed by the short-circuit current from VDD to GND during transitions.
3. *Static Power*: power consumed by charge leakage in a stable state.
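As a rough reference, the three components listed above are often approximated with the first-order textbook expressions below. The symbols ($\alpha$ for switching activity, $C_L$ for switched load capacitance, $f$ for clock frequency, $t_{sc}$ for the short-circuit conduction time per transition, $I_{peak}$ for the peak short-circuit current, and $I_{leak}$ for the total leakage current) are notation assumed here for illustration; they are not quantities reported in this document.

$$
\begin{aligned}
P_{total} &= P_{switching} + P_{sc} + P_{static} \\
P_{switching} &= \alpha \, C_L \, V_{DD}^{2} \, f \\
P_{sc} &\approx t_{sc} \, V_{DD} \, I_{peak} \, f \\
P_{static} &= I_{leak} \, V_{DD}
\end{aligned}
$$

Under this simplified model, removing gates lowers $C_L$, and removing direct VDD-to-GND paths lowers $P_{sc}$.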
While static power is negligible in a well-designed CMOS circuit, we can reduce switching power and short-circuit power by reducing the number of inverters used and the number of potential VDD-to-GND short-circuit paths. According to [5] (Table III), PTL outperforms TGL by a factor of ~1.5 in an XOR implementation, but the same paper shows that TGL performs better for certain other gates, including the OR gate. Hence, we used PTL for the XOR/XNOR of inputs A and B, and TGL for the logic that combines the carry-in bit with that XOR/XNOR result, combining PTL and TGL for the most power-efficient design.

However, this full adder (FA) design has a major flaw: it has weak driving capability, because PTL and TGL both use inputs to drive outputs. We therefore added two static CMOS inverters between the stage 4 and stage 5 adders, acting as buffers to increase driving capability. Our finalized design has very few VDD connections and inverters, which in theory should result in an extremely power-efficient design.
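The logic structure described above (an XOR/XNOR of A and B, followed by selection logic driven by the carry-in) can be sketched behaviorally. The following is a minimal Python model, assuming a common mux-based full adder formulation in which the propagate signal selects between `cin` and its complement for the sum, and between `cin` and `a` for the carry. The function names `mux2`, `full_adder`, and `ripple_carry_add` are hypothetical, and this is not necessarily the exact transistor-level circuit we built; it only checks the logic function and shows how the worst-case input makes the carry ripple through every stage.

```python
from typing import List, Tuple


def mux2(sel: int, d0: int, d1: int) -> int:
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One-bit full adder built from XOR/XNOR plus two mux selections.

    Assumed formulation (illustrative, not the verified schematic):
      P    = A XOR B
      Sum  = P ? NOT Cin : Cin
      Cout = P ? Cin : A
    """
    p = a ^ b          # XOR of A and B (propagate)
    p_bar = p ^ 1      # XNOR of A and B
    cin_bar = cin ^ 1
    s = mux2(p, cin, cin_bar)       # sum = P XOR Cin
    cout = mux2(p_bar, cin, a)      # P=1: pass Cin, P=0: A (== B) decides
    return s, cout


def ripple_carry_add(a: int, b: int, cin: int = 0,
                     width: int = 8) -> Tuple[int, int, List[int]]:
    """Chain `width` full adders; returns (sum, carry_out, per-stage carries)."""
    total = 0
    carry = cin
    carries: List[int] = []
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
        carries.append(carry)
    return total, carry, carries


if __name__ == "__main__":
    # Worst-case input used in this report: the carry generated at bit 0
    # must propagate through all remaining stages.
    total, cout, carries = ripple_carry_add(0xFF, 0x01)
    print(hex(total), cout, carries)
```

For the input pair {0xFF, 0x01}, every stage produces a carry-out of 1, which is why this pair exercises the full carry chain.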
## Analysis & Calculations

#### **Static CMOS Mux**

Input Configuration for Worst Case Delay: $\{A, B, S\} = \{1, 0, 0\}$ to $\{1, 0, 1\}$

Worst Case Delay of Mux: 48 ps [See Appendix for Cadence Waveform Measurements (Fig 6)]

#### **Master-Slave Register**

Setup Time: 100 ps [See Appendix for Cadence Waveform Measurements (Fig 7)]

CLK to Q Delay: 56 ps [See Appendix for Cadence Waveform Measurements (Fig 8)]

#### **TSPC-D Flip-Flop**

Setup Time: <u>0 ps</u> [See Appendix for Cadence Waveform Measurements (Fig 9)]

#### **Dual Rail Logic 8-bit Adder + Mux Delay**

Input Configuration for Worst Case Delay: $\{A, B\} = \{0xFF, 0x01\}$

Delay: 321 ps [See Appendix for Cadence Waveform Measurements (Fig 10)]

#### **Low Power Logic 8-bit Adder**

Input Configuration for Worst Case Delay: $\{A, B\} = \{0xFF, 0x01\}$

8-bit Adder Delay: 497 ps [See Appendix for Cadence Waveform Measurements (Fig 11)]

<u>Critical Path</u> [See Appendix for Cadence Waveform Simulations of Critical Path (Fig 3,4)]

Input Configuration for Worst Case Delay: $\{A, B, CTRL, RESET\} = \{0xFF, 0x01, 0, 0\}$

### [Dual Rail] Overall Schematic Design Max Frequency Calculation

#### **Theoretical Clock Period Needed**

- = Evaluation Phase Critical Path Delay \* 2
- = (8-bit Adder Delay + Mux Delay + Setup Time) \* 2
- = (321 + 0) \* 2
- = 642 ps

Here 321 ps is the measured combined adder + mux delay and 0 ps is the TSPC setup time. The factor of 2 reflects that evaluation occupies only half of each clock period, with the other half used for precharge.

#### **Theoretical Max Clock Frequency**

- $= 1 / (642 \times 10^{-12} \, \text{s})$
- $= 1.56 \, \text{GHz}$

### [Low Power] Overall Schematic Design Max Frequency Calculation

#### **Theoretical Clock Period Needed**

- = CLK-to-Q Delay (A, B, CTRL in parallel) + Mux Delay (Adder/Accumulator) + 8-bit Adder Delay + Mux Delay (Reset Mux) + Setup Time (S Register)
- = 56 + 48 + 497 + 48 + 100
- = 749 ps

#### **Theoretical Max Frequency**

- $= 1 / (749 \times 10^{-12} \, \text{s})$
- $= 1.34 \, \text{GHz}$

#### **Low Power Total Transistor Width**

- 1-Bit Logic: 7200 nm
- 8-Bit Logic: 8 \* (1-Bit Logic) = 57600 nm
- Registers: 4 \* (720) = 2880 nm
- Muxes: 3 \* (5400) = 16200 nm

**Total Transistor Width** $= 16200 + 2880 + 57600 = 76680 \text{ nm} = 76.680 \, \mu\text{m}$

#### **Dual Rail Total Transistor Width**

- 1-Bit Logic: 12480 nm
- 8-Bit Logic: 8 \* (1-Bit Logic) = 99840 nm
- Registers: 4 \* (720) = 2880 nm
- Muxes: 3 \* (5400) = 16200 nm

**Total Transistor Width** $= 16200 + 2880 + 99840 = 118920 \text{ nm} = 118.920 \, \mu\text{m}$

## Test Results & Discussion

#### **Actual Clock Frequency**

- Dual Rail Integration: <u>1.51 GHz</u>
- Low Power Integration: <u>1 GHz</u>

[See Appendix for Cadence Waveform Simulations of Critical Path at Clock Freq. (Fig 3,4)]

#### **Power**

- Dual Rail Integration: 3.594e 12 W
- Low Power Integration: 1.899e12 W

The dual-rail adder is much faster than the low-power adder, running at a ~0.5 GHz higher clock frequency, at the cost of nearly double the power consumption. We verified our designs at 1.51 GHz for High Speed and 1 GHz for Low Power with our more comprehensive, extended testbench (Appendix Fig 5), but were unable to make them work at any higher frequency [See Appendix for Cadence Test Results (Fig 1,2) and Extended Test Bench (Fig 5)]. Although the verified frequencies fall short of the theoretical maxima (1.56 GHz and 1.34 GHz), they come close to the calculations. We believe several factors contribute to the gap, including imprecise values from our analysis and continual changes to our design (muxes and registers) and transistor sizing throughout the project. For future improvements, we would optimize these factors to reach our theoretical max clock frequency or higher.

## Appendix

Fig 1. High Speed Full Adder Integration Results (Fully Verified at 1.51 GHz)

Fig 2. Low-Power Full Adder Integration Results (Fully Verified at 1 GHz)

Fig 3. High Speed Design Critical Path Waveform Simulation (Input Configuration for Worst Case Delay $\{A, B, CTRL, RESET\} = \{0xFF, 0x01, 0, 0\}$)

Fig 4. Low Power Design Critical Path Waveform Simulation (Input Configuration for Worst Case Delay $\{A, B, CTRL, RESET\} = \{0xFF, 0x01, 0, 0\}$)

Fig 5.
Extended Testbench (Left: 1 GHz for Low Power, Right: 1.51 GHz for High Speed)

Fig 6. Static CMOS Mux Propagation Delay

Fig 7. Set-up Time Violation for MSR (Master-Slave) Register

Fig 8. MSR (Master-Slave) Register Clock to Q Delay

Fig 9. Set-up Time Violation for TSPC Register

Fig 10. Dual Rail 8-Bit Adder Critical Path + Mux Delay

Fig 11. Low Power 8-Bit Adder Critical Path Delay

## References

[1] G. Sasi, G. Athisha, and S. Surya Prakash, "Performance Comparison for Ripple Carry Adder Using Various Logic Design," IJITEE, ISSN: 2278-3075, vol. 8, no. 4S2, 2019. https://www.ijitee.org/wp-content/uploads/papers/v8i4s2/D1S0081028419.pdf

[2] G. Giustolisi and G. Palumbo, "Analysis and Comparison in the Energy-Delay Space of Nanometer CMOS One-Bit Full-Adders," IEEE Access, vol. 10, pp. 1-1, 2022, doi: 10.1109/ACCESS.2022.3192016.

[3] N. Siddaiah, J. S. B. B. Bhaskar, B. Vidya, and V. N. Reddy, "Implementation and Performance Analysis of 2:1 Multiplexer using Different Logic Families at 130nm Technology," 2021 3rd International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), Greater Noida, India, 2021, pp. 1122-1126, doi: 10.1109/ICAC3N53548.2021.9725706.

[4] Saranya M. N. and Rathnamala Rao, "Design and Verification of an Asynchronous NoC Router Architecture for GALS Systems," Journal of Electronic Testing, vol. 40, pp. 1-14, 2024, doi: 10.1007/s10836-024-06104-y.

[5] S. Fairooz, P. Thanapal, P. Ganesan, M. S. Prakash Balaji, and V. Elamaran, "Revisiting the Utility of Transmission Gate and Passtransistor Logic Styles in CMOS VLSI Design," 2021 3rd International Conference on Signal Processing and Communication (ICSPC), Coimbatore, India, 2021, pp. 276-280, doi: 10.1109/ICSPC51351.2021.9451645.

[6] Jyh-Ming Wang, Sung-Chuan Fang, and Wu-Shiung Feng, "New efficient designs for XOR and XNOR functions on the transistor level," IEEE Journal of Solid-State Circuits, vol. 29, no. 7, pp. 780-786, July 1994, doi: 10.1109/4.303715.

[7] A. M. Shams and M. A. Bayoumi, "A novel high-performance CMOS 1-bit full-adder cell," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 5, pp. 478-481, May 2000, doi: 10.1109/82.842117.

## **Team Contribution Report**

#### **Yan Cheng Poon**

- Researched and implemented Low Power Schematic Design
- Designed Static CMOS 2:1 Mux
- Set up Cadence schematic for power measurements

#### **Dhruv Dighrasker**

- Created schematics for both versions of the 8-bit adder, muxes, and registers
- Organized transistor sizing information for all devices and calculated total transistor widths

#### **Justin Park**

- Designed and tested Cadence schematic for dual rail adder
- Created optimizations for muxes and registers to increase performance to 1.5 GHz

All team members contributed equally to the testing and debugging of both full adder schematics, as well as to writing the final report.