

# Level 4 Autonomous Driving SoC, leveraging chiplets, advanced packaging and UCIe

Vinayak Agrawal, Francois Piednoel, Igor Elkanovich, Dwaipayan Sil, Mirza Jahan



# **ADAS**

Advanced Driving Assistance Systems



### Advanced Driving Assistance Systems From Level 2 to Level 4

The Society of Automotive Engineers (SAE) defines several levels of driving automation

Many cars in showrooms today have Level 2 ADAS systems

By 2030, some estimates put about a tenth of cars at Level 4 ADAS; the lowest estimates are about 2%

Compute requirements for Level 4 ADAS systems are expected to exceed 2000 TOPS



LEVELS OF DRIVING AUTOMATION



Mercedes Level 4 test vehicle: several dozen sensors


Intel Confidential Intel Foundry services



### HPC: Auto vs Datacenter – what is different?

- Regulated Industry:
  - Safety regulations: NHTSA, NTSB and other agencies worldwide
  - Consumer protection: "Lemon Law" in US, state laws
  - Tied at the hip with the insurance industry
  - Reporting requirements
- Very high-tech consumer goods, with long use life
  - More software than a PC
  - Brands matter a lot
- "Ecosystem" R&D model:
  - Lots of re-use between rivals
- Relatively new, growing fast



## Reliability



Power



Cost



# The New Moore's Law





# Gordon Moore Predicted "Day of Reckoning"

It may prove to be more economical to build large systems out of smaller functions, which are separately packaged and interconnected

<u>Cramming More Components onto Integrated Circuits</u>

April 19, 1965 issue of *Electronics* magazine (Vol. 38, pages 114-117)



## Die Scaling slowing

- Scaling has significantly slowed over the last few years
  - SRAM bitcell sizes are no longer shrinking
  - Analog circuit sizes are increasing
  - Overall node-to-node gains are on the order of 1.3x to 1.5x (they were more than 3x 15 years ago)

While system requirements increase, Moore's law no longer helps keep chip sizes in check



TSMC's 5nm 0.021um2 SRAM Cell Using EUV and High Mobility Channel with Write Assist at ISSCC2020 - SemiWiki



## Large monolithic SoC

1. In photolithography, only a part of the wafer, called a reticle, is exposed at a time. The standard maximum reticle size is 25 mm × 30 mm
2. Wafers always have a few defects
   - Yields of larger chips are worse for the same number of defects
3. Different nodes are optimal for different purposes
   - IO and analog are better off with older and more mature nodes
   - High-performance compute typically requires a state-of-the-art logic node

| Die Size [mm²] | 100 | 200  | 400  |
|----------------|-----|------|------|
| Mature Yield   | REF | -13% | -32% |
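The yield penalty for larger dies can be approximated with a simple Poisson defect model, Y = exp(-A · D0). The sketch below is only an illustration and is not the model behind the table: the defect density `ASSUMED_D0_PER_CM2` is an assumed value, picked because it roughly reproduces the table when its entries are read as changes relative to the 100 mm² reference.

```python
import math

# Assumed defect density in defects per cm^2; illustrative only, not from the slides.
ASSUMED_D0_PER_CM2 = 0.13

def poisson_yield(area_mm2: float, d0_per_cm2: float = ASSUMED_D0_PER_CM2) -> float:
    """Fraction of defect-free dies under a Poisson defect model."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * d0_per_cm2)

def relative_yield_change(area_mm2: float, ref_area_mm2: float = 100.0) -> float:
    """Yield change relative to the reference die size, as a fraction."""
    return poisson_yield(area_mm2) / poisson_yield(ref_area_mm2) - 1.0

for area in (100, 200, 400):
    print(f"{area} mm2: {relative_yield_change(area):+.1%}")
```

With this assumed defect density the model gives roughly -12% at 200 mm² and -32% at 400 mm², close to the table. Real foundry yield models are more elaborate, so treat the numbers as indicative.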



# Multi-Chip ADAS Systems



## Multi-Chip PCB

- Make specialized chips
  - CPU, GPU, Accelerators, IOs
- Connect them on a PCB



#### Chips need to be connected to each other

#### Power:

- State-of-the-art SerDes IP takes 5.5 pJ/bit at 100% bus utilization
- A SerDes draws roughly the same power whether or not it carries traffic, so a bus with 512 GByte/s average bandwidth at 20% utilization must be provisioned for five times that rate and will consume about 110 W
  - A typical EV consumes about 5 kW total at 55 mph
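The 110 W figure can be checked with a few lines of arithmetic. This sketch assumes that the link burns its full-rate energy regardless of traffic (so the peak rate is the average divided by the utilization) and that 1 GByte is 10^9 bytes; the function name is illustrative.

```python
def link_power_watts(avg_bandwidth_gbyte_s: float,
                     utilization: float,
                     energy_pj_per_bit: float) -> float:
    """Power of a link that spends full-rate energy regardless of traffic."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    # Peak rate the link must be provisioned for, in Gbit/s.
    peak_gbit_s = avg_bandwidth_gbyte_s * 8.0 / utilization
    # Gbit/s * pJ/bit = 1e9 * 1e-12 W = 1e-3 W
    return peak_gbit_s * energy_pj_per_bit * 1e-3

power = link_power_watts(avg_bandwidth_gbyte_s=512, utilization=0.20,
                         energy_pj_per_bit=5.5)
print(f"{power:.1f} W")
```

Under these assumptions the result is about 113 W, consistent with the rounded 110 W above.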

#### Practical Limits:

- A PCIe 5 SerDes will need about 700 balls (about 400 sq mm of package underbody) to support 512 GByte/s
- Better very-short-range chip-to-chip protocols have been developed with one third the power, but at the expense of even more BGA balls

Too many chips interconnected – not scalable



# Multi-Chip-Package; Chiplets



## System In Package

- Multi-chip-module packages have been around
  - Camera, Smartphone systems
- They are now standard in High Performance Computing



- Standard package with Organic Substrate
- Embedded Silicon Bridge based
- Full Interposer based (Silicon or organic)
- Organic Interposer with embedded silicon bridge based



Intel Sapphire Rapids



(b. Packaging Options: 2D and 2.5D)



### Die Size and Yield

- Disaggregation to improve yield and reduce die cost
- Disaggregation also has area overhead
  - D2D, Scribe/ER etc.
    - $A_{c1} + A_{c2} > A_{die}$
    - ~ 10% area increase
- Example: ~20% yield improvement by disaggregating a 400 mm² die into two 220 mm² dies

| Die Size [mm²] | 100 | 400  | 220  |
|----------------|-----|------|------|
| Mature Yield   | REF | -32% | -15% |
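One way to read the table is in terms of wafer area spent per good system. The sketch below takes the 100 mm² reference yield as 1.0, which is an assumption because the table only gives changes relative to REF, and it ignores packaging cost, assembly yield and test cost.

```python
# Relative mature yields from the table, with the 100 mm2 reference taken as 1.0
# (assumption: the table lists only changes relative to REF).
RELATIVE_YIELD = {400: 1.0 - 0.32, 220: 1.0 - 0.15}

def silicon_per_good_set(die_area_mm2: int, dies_per_system: int) -> float:
    """Wafer area consumed per set of good dies, in mm2, relative to REF yield."""
    return dies_per_system * die_area_mm2 / RELATIVE_YIELD[die_area_mm2]

monolithic = silicon_per_good_set(400, 1)     # one 400 mm2 die
disaggregated = silicon_per_good_set(220, 2)  # two 220 mm2 chiplets
saving = 1.0 - disaggregated / monolithic
print(f"{monolithic:.0f} vs {disaggregated:.0f} mm2, saving {saving:.0%}")
```

Even with the roughly 10% area overhead, the two-chiplet option uses about 12% less wafer area per good system under these assumptions.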





# GLink: Connecting Chiplets

GUC proprietary die-to-die IO

- Single ended, source-synchronous DDR signaling
- No equalizers needed; very simple static-phase CDR shared across lanes
- Very low power (0.3 pJ/bit); excellent eye margins in 5nm testchip
- CRC based error handling

Special Feature:

- Run-time eye measurement for early detection of failure





#### Medium Resistance Sensitivity Measured in GUC's Testchip



Fault Detection/Prevention





Measured eye margin in 5nm silicon. AVDD is the Tx/Rx power supply; AVDD12 is the PLL/clocking supply



## **Automotive Chiplet Interconnect**

#### Safety

- Goal:
  - Avoid Injury at all costs
  - Avoid Damage
- How?
  - Preventive Monitoring
    - Every few months
    - All the time

#### Serviceability

- Goal:
  - Brand Image
  - Reduced Financial Burden
- How?
  - Automatic Field Repairs

#### **Ecosystem and Costs**

- Goal:
  - Share and Re-use chiplets
  - Reduce certification costs
  - Reduce time to market
- How?
  - Standardize on an open standard



# Universal Chiplet Interconnect Express

http://www.uciexpress.org

### **UCIe Overview**



Standard Specifications: Protocols, Form Factors, Management, CXL™/PCIe®

#### **INITIAL FOCUS**

**Physical Layers** 

Initial D2D I/O

- Physical Layer: Die-to-Die I/O with industry leading KPIs
- Protocol: CXL/PCIe for near-term volume attach
- Well-defined specification: ensure interoperability & evolution

#### **CURRENT PLANS**

| Process Node | IP List  | Supplier       | F/E View <sup>1</sup> | B/E View <sup>2</sup> | Silicon<br>Report |
|--------------|----------|----------------|-----------------------|-----------------------|-------------------|
| Intel 3      | UCIe PHY | Intel          | '23.Q2                | '24.Q1                | '24.Q3            |
| Intel 18A    | UCIe PHY | Ecosystem      | '23.Q2                | '24.Q1                | '24.Q3            |

Auto grade qualification schedule TBD

#### **FUTURE GOALS**

- Additional protocols
- Advanced chiplet form-factors
- Chiplet management
- And much more!







Different flavors of packaging options supported to build an open ecosystem



## Jumpstarting UCIe

Intel initiates open standard specification and development

Focus of UCIe 1.0 Specification


- Physical Layer: Die-to-Die I/O with industry-leading KPIs
- **Protocol:** CXL<sup>TM</sup>/PCIe® for near-term volume attach
  - SoC construction issues are addressed since CXL/PCIe is a board-to-board interface
  - CXL/PCIe addresses common use cases
    - I/O attach with PCIe/CXL.io
    - Memory use cases: CXL.mem
    - Accelerator use cases: CXL.cache
- Well defined specification: ensure interoperability and future evolution









(Multiple Advanced Package Choices)




### **UCIe Auto**

- Preventive Monitoring
  - Eye Margin and other measurements supported as part of training
  - Proprietary measurements also supported
  - Register map defined for reporting of per lane (and other) measured data
- Very low guaranteed FIT rates
  - "guaranteed" is key
  - Standard will not allow parity based error counters in test mode for BER test
  - Test can be run weekly or monthly in a parked car
- Continuous BER monitoring
  - Software based per-lane test (in addition to CRC) in mission mode
  - Accurate computation of BER per lane

- Streaming Mode with CRC
  - To allow development of protocols other than PCIe/CXL on UCIe
- Field Repair
  - Repairability present in 1.0 already
  - Monitoring to turn it into "automatic field repair"
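To see why a guaranteed-BER test suits a parked car, it helps to estimate how much error-free traffic is needed to support a BER claim. The sketch below uses the standard zero-error bound N ≥ -ln(1 - C) / BER. The target BER and lane rate are assumed values for illustration, not figures from this deck, and the function names are hypothetical.

```python
import math

def bits_for_zero_error_test(target_ber: float, confidence: float = 0.95) -> float:
    """Bits that must pass with zero errors to claim BER < target_ber."""
    return -math.log(1.0 - confidence) / target_ber

def hours_to_confirm(target_ber: float, lane_rate_gbit_s: float,
                     confidence: float = 0.95) -> float:
    """Test time per lane, in hours, at the given lane rate."""
    bits = bits_for_zero_error_test(target_ber, confidence)
    return bits / (lane_rate_gbit_s * 1e9) / 3600.0

def ber_estimate(error_count: int, bits_observed: float) -> float:
    """Point estimate of BER for continuous per-lane monitoring."""
    return error_count / bits_observed

ASSUMED_TARGET_BER = 1e-15        # assumption, for illustration only
ASSUMED_LANE_RATE_GBIT_S = 16.0   # assumption, for illustration only

hours = hours_to_confirm(ASSUMED_TARGET_BER, ASSUMED_LANE_RATE_GBIT_S)
print(f"{hours:.0f} hours of error-free traffic per lane")
```

With these assumed values the test needs about 52 hours per lane at 95% confidence, which is the kind of duration that fits a periodic parked-car test better than a boot-time check.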



# Additions to UCIe in our System




### UCIe Auto – In System

- Link Monitor in mission mode
  - Analog circuit measurements without breaking link between two chiplets
- Cloud Based Diagnostics
  - Measured parameter data will be uploaded to cloud
  - Data analysis across a fleet of cars; across temperature, age, driving conditions, batch







## UCIe-Bumpmap

- The Advanced version of UCIe defines a bumpmap that allows straight routes of about equal length between chiplets
- Eases routing design and eliminates skew
- Rotationally symmetric (e.g. a die can connect with itself)
  - Two chiplets with compliant bumpmaps will be routable if IP locations can be aligned
  - However, for multi-module usage, this may not always be possible









### Standard Form Factor Defined

- For this program and future programs we plan to use a 6 module + 1 module form factor as shown
- Total peak bandwidth this can support is over 1 TByte/s (512 GByte/s in each direction) on the main link
  - Additional module supported with its own power supply and higher ECC for Functional Safety subsystems
  - Can connect to 4+1 or 2+1 configurations also
- This does not have rotational symmetry, so additional system requirements are needed
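The stated main-link bandwidth implies a minimum rate per module and per lane. The sketch below works that out, assuming the 512 GByte/s per direction is shared equally by the six main modules and that each module has 64 data lanes per direction. The lane count is an assumption, not a figure from the slide, and since the slide says "over" 1 TByte/s these are lower bounds.

```python
MAIN_MODULES = 6                  # from the 6 + 1 form factor
PER_DIRECTION_GBYTE_S = 512.0     # main-link bandwidth in each direction
ASSUMED_LANES_PER_MODULE = 64     # assumption, not stated on the slide

def per_module_gbyte_s(total_gbyte_s: float, modules: int) -> float:
    """Bandwidth each module must carry in one direction."""
    return total_gbyte_s / modules

def per_lane_gbit_s(module_gbyte_s: float, lanes: int) -> float:
    """Data rate each lane must sustain, in Gbit/s."""
    return module_gbyte_s * 8.0 / lanes

module_bw = per_module_gbyte_s(PER_DIRECTION_GBYTE_S, MAIN_MODULES)
lane_rate = per_lane_gbit_s(module_bw, ASSUMED_LANES_PER_MODULE)
print(f"{module_bw:.1f} GByte/s per module, {lane_rate:.2f} Gbit/s per lane")
```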






# Putting it all Together



## Standard Chiplet Based Architecture

- A central die with mirrored (not rotated) 6+1 configuration and "satellite chiplets" with unrotated 6+1 configuration will be used as shown
  - An EMIB-based package will be used to connect these in a single package
- While exact dimensions are not shown here, total silicon area without HBM memory will be significantly larger than 1000 sq mm
  - Central die will contain mostly RAM but also Network on Chip and some processing
  - CPU will be a multi-core CPU
  - GPU is a GPU/Tensor chiplet
  - Custom Chiplet is an independent SoC that will support lower power modes (e.g. parked car, or Level 2 driving in conditions where higher is not possible)
  - Sensor/IO complex has interfaces for cameras, radars, lidars, Ethernet and PCI Express
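The choice of a mirrored rather than rotated central die can be illustrated with a toy one-dimensional model of the module order along a die edge. This is a simplified sketch under stated assumptions: the "+1" module is placed at one end of the row, only module order is modelled (not the bumps inside a module), and all names are hypothetical.

```python
from typing import List

def facing_order(edge: List[str], transform: str) -> List[str]:
    """Module order seen along the shared boundary for a die placed opposite."""
    if transform == "rotated":
        # A 180-degree rotation reverses the order along the edge.
        return list(reversed(edge))
    if transform == "mirrored":
        # Mirroring across the shared boundary keeps the order.
        return list(edge)
    raise ValueError(f"unknown transform: {transform}")

def modules_align(edge_a: List[str], edge_b_facing: List[str]) -> bool:
    """True if every module faces a module of the same kind."""
    return edge_a == edge_b_facing

# Assumed layout: six main modules followed by the functional-safety module.
SATELLITE_EDGE = ["main"] * 6 + ["safety"]
SYMMETRIC_EDGE = ["main"] * 6

rotated_ok = modules_align(SATELLITE_EDGE, facing_order(SATELLITE_EDGE, "rotated"))
mirrored_ok = modules_align(SATELLITE_EDGE, facing_order(SATELLITE_EDGE, "mirrored"))
symmetric_ok = modules_align(SYMMETRIC_EDGE, facing_order(SYMMETRIC_EDGE, "rotated"))
print(rotated_ok, mirrored_ok, symmetric_ok)
```

In this model a rotated 6+1 edge puts the safety module opposite a main module, while a mirrored edge lines up module for module, which is consistent with using a mirrored central die against unrotated satellites.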





### Level 4+ ADAS

- We anticipate adding more features as time goes by
- Multiple Central chiplets can be connected within the same package using UCIe-S version of the PHY (designed to work with standard organic substrate)
- The choice of chiplet geometries will allow some mix-and-match capabilities post-manufacture
  - For example, an additional GPU can be used instead of one sensor die





# Thank You
