

# **Custom Memory Solutions** for AI Applications

**JAN 2024 US Engineering Center** Seongju Lee







## **Specialties of Our Solutions 1: Competitive Price, Fast Time-to-Market**





- The cell area of existing DRAM is used without design modifications
- SKH IP is integrated into the XPU to operate the cell area
- The DRAM and XPU are integrated in one package using advanced packaging technologies
- Because the DRAM is not modified, the solution offers a competitive price and fast time-to-market despite its custom features

## **Specialties of Our Solutions 2: Power & Latency Reduction**





- The XPU and memory are tightly coupled, with no complex high-speed PHY on either side
- The PHY-less architecture reduces power by over 30%<sup>\*\*</sup> and latency by 10~20ns

## **Specialties of Our Solutions 3: XPU Size Reduction**





- High-speed complex PHY is located across the bump area

- In PHY-less architectures, the area between the bumps is almost empty except for simple CMOS TX/RX
- The memory controller w/ DFI2GIO can be located in the empty area, so the overall XPU size can be reduced
- Two options are available:
  - ✓ Option 1: SKH provides the DFI2GIO IP
  - ✓ Option 2: SKH provides the memory controller IP including DFI2GIO

## **Custom Memory Solution 1. LPHBM for Edge Devices – XR, On-Device AI**





- SK developed several ULPs (low-power, high-bandwidth memories), which required large resources and the development of a dedicated custom ULP PHY
- In our solution, LPHBM, an **HBM cell die w/o base die** is used as the media, and an **Interface IP (DFI2GIO)** is placed on the XPU to operate it
- Power and latency can be significantly improved by eliminating the PHYs from both the XPU and the DRAM
- Fast time-to-market is achieved by developing only the Interface IP rather than new DRAM media

## **Custom Memory Solution 1. LPHBM for Edge Devices – XR, On-Device AI**

**Standardized SoC Interface**





LPHBM can be implemented in a variety of ways depending on the needs

Capacity & Bandwidth Expansion





## **Custom Memory Solution 2. 3D LPHBM for Edge Devices – XR, On-Device AI**





- A 3D structure can further increase bandwidth and decrease power
- The Interface IP that operates the HBM cell die is integrated onto the XPU
- TSVs are not required on the XPU
- The XPU communicates with the outside through the TSVs on the DRAM
- Capacity and bandwidth can be expanded by stacking the DRAMs

|           | 1 Stack, 4 Channels | 2 Stacks, 4 Channels | 2 Stacks, 8 Channels | 3 Stacks, 4 Channels | 3 Stacks, 12 Channels |
|-----------|---------------------|----------------------|----------------------|----------------------|-----------------------|
| Capacity  | 3GB                 | 6GB                  | 6GB                  | 9GB                  | 9GB                   |
| Bandwidth | 256GB/s             | 256GB/s              | 512GB/s              | 256GB/s              | 768GB/s               |
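The scaling in the table can be reproduced with a short sketch. It assumes 3GB per stacked DRAM and 64GB/s per channel; both values are inferred from the 1-stack column (256GB/s divided by 4 channels) rather than stated on the slide, and the function name is illustrative only.

```python
# Assumed constants, inferred from the 1-stack, 4-channel column of the table
# (they are not stated explicitly in the deck).
CAPACITY_PER_STACK_GB = 3
BANDWIDTH_PER_CHANNEL_GBS = 64  # 256 GB/s / 4 channels


def lphbm_3d_config(stacks: int, channels: int) -> tuple[int, int]:
    """Return (capacity in GB, bandwidth in GB/s) for a 3D LPHBM configuration.

    Capacity scales with the number of stacked DRAMs; bandwidth scales with
    the number of channels that are actually wired out.
    """
    if stacks < 1 or channels < 1:
        raise ValueError("stacks and channels must be positive")
    capacity_gb = stacks * CAPACITY_PER_STACK_GB
    bandwidth_gbs = channels * BANDWIDTH_PER_CHANNEL_GBS
    return capacity_gb, bandwidth_gbs


if __name__ == "__main__":
    # The five configurations listed in the table.
    for stacks, channels in [(1, 4), (2, 4), (2, 8), (3, 4), (3, 12)]:
        capacity, bandwidth = lphbm_3d_config(stacks, channels)
        print(f"{stacks} stack(s), {channels} channels: {capacity} GB, {bandwidth} GB/s")
```

Under these assumptions, stacking alone adds capacity, while bandwidth grows only when the added DRAMs contribute their own channels.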





## **Appendix: PHY-less High Capacity Cell Die Stack**







- SK is developing cost-effective 3DS media using wire bonding; the media has both a cell area and a PHY area
- The media can be used in our solutions; although the media has a PHY, PHY-less operation is possible through the cell die stack
- The cell die stack is no longer DDR; it is a new concept of media w/ DDR capacity & 1/3 lower power than LPDDR



## **Custom Memory Solution 3. PHY-less Post LPDDR**







- For small form factors
- Bandwidth is lowered due to shared IO between the stack
- XPU heat & TSV cost must be considered



32GB Density, 25.6~38.4GB/s Bandwidth, 400~600Mbps
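The bandwidth range follows from the per-pin data rate if the IO width is known. The sketch below assumes a 512-bit GIO width, which is taken from the Tier 2 cell die stack label later in the deck and is not stated on this slide; the function name is hypothetical.

```python
def peak_bandwidth_gbs(io_width_bits: int, pin_rate_mbps: float) -> float:
    """Peak bandwidth in GB/s for a parallel interface.

    io_width_bits : number of data pins (assumed 512 GIO here)
    pin_rate_mbps : data rate per pin in Mbps
    """
    total_mbps = io_width_bits * pin_rate_mbps  # aggregate megabits per second
    return total_mbps / 8 / 1000                # bits -> bytes, mega -> giga


if __name__ == "__main__":
    ASSUMED_GIO_WIDTH = 512  # assumption, see the lead-in
    for rate in (400, 600):
        bw = peak_bandwidth_gbs(ASSUMED_GIO_WIDTH, rate)
        print(f"{rate} Mbps/pin x {ASSUMED_GIO_WIDTH} pins = {bw:.1f} GB/s")
```

With that width, 400Mbps and 600Mbps per pin give 25.6GB/s and 38.4GB/s, which matches the range quoted above.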



## **Custom Memory Solution 4. PHY-less Post DDR**





## **Custom Memory Solution 5. PHY-less CXL Device**





- The PHY-less structure using the cell die stack can also be applied to a CXL device
- The long latency of a conventional CXL device can be reduced by 20ns
- No high-speed module-level validation is required

## **Custom Memory Solution 6. LPHBM for Gaming GPU**





230% Higher Capacity

- LPHBM can be applied to a GPU system instead of GDDR6
- Applying LPHBM to a chiplet-based GPU is relatively easy: only the MCD is modified, and the GCD is reused
- A 2-tier memory system that replaces some of the media with cell die stacks can greatly expand capacity

Tier 2 Memory: Cell Die Stack x2 (512Gb, 512 GIO, 25.6GB/s, 1.1V)

Tier 1 Memory: **HBM3E** Core Die (24Gb, 1K GIO, 256GB/s, 1.1V)
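As a rough cross-check of the two tier labels, the sketch below derives capacity in bytes and the implied per-pin data rate for each tier. It assumes that "1K GIO" means 1024 pins and that each quoted bandwidth applies to the full listed GIO width; the class and field names are illustrative, not part of any SKH specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryTier:
    name: str
    capacity_gbit: int    # density in gigabits, as printed on the slide
    gio_width: int        # number of GIO data pins
    bandwidth_gbs: float  # peak bandwidth in GB/s

    @property
    def capacity_gbyte(self) -> float:
        """Density converted from gigabits to gigabytes."""
        return self.capacity_gbit / 8

    @property
    def pin_rate_gbps(self) -> float:
        """Per-pin data rate implied by bandwidth and GIO width."""
        return self.bandwidth_gbs * 8 / self.gio_width


# Values from the slide labels; 1K GIO is assumed to mean 1024 pins.
TIER1 = MemoryTier("HBM3E core die", capacity_gbit=24, gio_width=1024, bandwidth_gbs=256.0)
TIER2 = MemoryTier("Cell die stack x2", capacity_gbit=512, gio_width=512, bandwidth_gbs=25.6)

if __name__ == "__main__":
    for tier in (TIER1, TIER2):
        print(f"{tier.name}: {tier.capacity_gbyte:g} GB, "
              f"{tier.pin_rate_gbps:g} Gbps per pin")
```

Under these assumptions, Tier 1 runs at about 2Gbps per pin with 3GB per die, while Tier 2 trades per-pin speed (about 0.4Gbps) for much higher density (64GB).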

## **Custom Memory Solution 7. 3D LPHBM for AI Server GPU**





- 3D LPHBM can be applied to an AI server GPU system instead of HBM
- The PHY-less 3D structure, without a base die or silicon interposer, can significantly reduce power, latency, and cost
- Capacity can be expanded further by deploying additional LPDDR packages, taking advantage of the smaller GPU package size