TOPIC · 3 entries · 3 thinkers

Hardware Architecture

Thinkers posting on this topic

Evan Williams (1) · Dwarkesh Patel (1) · AI Engineer (1)

All entries on this topic (3)

paper · Dwarkesh Patel · 30d ago

DigitsOnTurbo Harnesses SIMD via Data-Parallel Restructuring for Large-Number Arithmetic Acceleration

DigitsOnTurbo (DoT) overcomes SIMD adoption barriers in large-number arithmetic by restructuring computations into independent data-parallel operations, bypassing dependencies in conventional algorithms. It delivers 1.85x speedups for addition/subtraction and 2.3x for multiplication versus prior SIM…

simd-optimization large-number-arithmetic parallel-computing cpu-acceleration cryptography-performance scientific-computing
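
The summary above does not spell out how DoT restructures the computation, so the following is only a generic sketch of the underlying idea, not the paper's algorithm. It assumes a big integer split into 32-bit limbs (the names `to_limbs`, `add_limbs`, and the limb width are illustrative choices). The per-limb sums in phase 1 have no dependency on each other, which is the kind of work that maps onto SIMD lanes; the carry chain is handled afterwards in a separate pass.

```python
# Illustrative sketch only: limb-wise addition with deferred carries.
# This is NOT the DoT algorithm; names and limb width are assumptions.

BASE_BITS = 32
MASK = (1 << BASE_BITS) - 1


def to_limbs(n, count):
    """Split a non-negative int into `count` little-endian 32-bit limbs."""
    return [(n >> (BASE_BITS * i)) & MASK for i in range(count)]


def from_limbs(limbs):
    """Reassemble little-endian limbs into a Python int."""
    return sum(limb << (BASE_BITS * i) for i, limb in enumerate(limbs))


def add_limbs(a, b):
    """Add two equal-length limb lists; returns len(a) + 1 limbs."""
    # Phase 1: independent per-limb sums. No limb needs its neighbour's
    # result, so this loop is the data-parallel part.
    raw = [x + y for x, y in zip(a, b)]

    # Phase 2: resolve carries in one sequential pass over the raw sums.
    out, carry = [], 0
    for s in raw:
        s += carry
        out.append(s & MASK)
        carry = s >> BASE_BITS
    out.append(carry)
    return out
```

For example, `from_limbs(add_limbs(to_limbs(x, 7), to_limbs(y, 7)))` equals `x + y` for any `x` and `y` below `2**224`. A real SIMD implementation would also need a faster strategy for phase 2, which this sketch leaves sequential.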

paper · AI Engineer · 30d ago

TRACE Boosts CXL Bandwidth for LLM Inference via Channel-Major Bit-Plane Layout and KV Transforms

TRACE addresses CXL bandwidth bottlenecks in LLM inference by reorganizing tensors into channel-major, disaggregated bit-plane layouts and applying KV-specific transforms before lossless compression with commodity codecs. This enables 25.2% BF16 weight and 46.9% BF16 KV footprint reduction, with per…

cxl-memory lossless-compression llm-inference kv-cache memory-bandwidth precision-scaling
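
To make the bit-plane idea concrete, here is a minimal sketch, assuming 16-bit BF16 patterns and `zlib` as a stand-in for the "commodity codec". It covers only the bit-plane regrouping; the channel-major ordering and the KV-specific transforms mentioned above are not modelled, and all function names are hypothetical. Plane `k` collects bit `k` of every value, so bits that rarely change (for example, high exponent bits) end up stored next to each other before compression.

```python
# Illustrative sketch only: bit-plane regrouping of 16-bit values before
# lossless compression. Not TRACE's layout; zlib is an assumed stand-in codec.
import struct
import zlib


def bf16_bits(x):
    """BF16 bit pattern of a float, by truncating its float32 encoding."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16


def to_bit_planes(values, width=16):
    """Return `width` byte strings; plane k packs bit k of every value."""
    planes = []
    for k in range(width):
        plane = bytearray((len(values) + 7) // 8)
        for i, v in enumerate(values):
            if (v >> k) & 1:
                plane[i // 8] |= 1 << (i % 8)
        planes.append(bytes(plane))
    return planes


def from_bit_planes(planes, n):
    """Inverse of to_bit_planes for `n` values."""
    values = [0] * n
    for k, plane in enumerate(planes):
        for i in range(n):
            if (plane[i // 8] >> (i % 8)) & 1:
                values[i] |= 1 << k
    return values


def compressed_sizes(values):
    """Compressed byte counts for (value-major layout, bit-plane layout)."""
    raw = struct.pack("<%dH" % len(values), *values)
    planar = b"".join(to_bit_planes(values))
    return len(zlib.compress(raw)), len(zlib.compress(planar))
```

Because `from_bit_planes` exactly inverts `to_bit_planes`, the regrouping is lossless. Calling `compressed_sizes` on a tensor's BF16 patterns gives a quick way to check whether the planar layout helps for that data; the sketch makes no claim about how large the difference will be.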

paper · Evan Williams · 43d ago

Open-Source PyTorch-to-SystemVerilog Compiler Rivals Vitis HLS for FPGA ML Acceleration

This toolchain compiles PyTorch ML models to synthesizable SystemVerilog via Allo, Calyx IR, and CIRCT under LLVM. It includes compiler passes for memory partitioning to enable parallelism in memory-intensive workloads. Experiments show it generates optimized FPGA hardware competitive with Vitis HLS…

pytorch-compiler calyx-ir ml-accelerators hardware-compilation fpga-design open-source-toolchain
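
The summary does not say which partitioning scheme the compiler passes use. As an assumption-laden illustration, the sketch below models cyclic partitioning, a scheme commonly offered by HLS tools: element `i` goes to bank `i % factor`, so a run of `factor` consecutive indices touches every bank once and can be read in the same cycle. The function names are hypothetical and this is a software model, not generated hardware.

```python
# Illustrative sketch only: a software model of cyclic memory partitioning.
# Not the toolchain's actual pass; names are assumptions.


def cyclic_partition(data, factor):
    """Split `data` into `factor` banks; element i lands in bank i % factor."""
    return [data[b::factor] for b in range(factor)]


def read(banks, i):
    """Read logical element i from the partitioned banks."""
    factor = len(banks)
    return banks[i % factor][i // factor]


def banks_touched(start, width, factor):
    """Banks hit by `width` consecutive logical indices from `start`."""
    return [(start + j) % factor for j in range(width)]
```

With `factor = 4`, `banks_touched(5, 4, 4)` returns `[1, 2, 3, 0]`: four consecutive elements fall in four distinct banks, so no single memory port is a bottleneck for that access pattern.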