Show HN: Llama 3.1 70B Runs on Single RTX 3090 via NVMe

A developer has demonstrated running the Llama 3.1 70-billion-parameter language model on a single consumer-grade RTX 3090 GPU. A model of this size would typically require multiple high-end GPUs or cloud instances; here it was made possible by a technique that streams model weights directly from fast NVMe solid-state storage, bypassing system RAM entirely. The method works around the GPU's limited VRAM (24 GB on the RTX 3090) by loading part
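To see why the weights cannot simply be loaded into VRAM, a back-of-the-envelope calculation helps. The sketch below is only illustrative: the helper names `weight_bytes` and `fits_in_vram` are hypothetical, the parameter count is the nominal 70 billion, and the precisions shown are assumptions (the summary does not say which precision the developer used). It counts weights only and ignores activations and the KV cache.

```python
# Rough weight-footprint arithmetic; an illustrative sketch, not the developer's code.
GIB = 1024 ** 3

N_PARAMS = 70_000_000_000   # nominal "70B"; the exact count differs slightly
VRAM_BYTES = 24 * GIB       # RTX 3090 VRAM, as stated in the article


def weight_bytes(n_params: int, bits_per_param: int) -> int:
    """Bytes needed to store the weights alone at the given precision."""
    return n_params * bits_per_param // 8


def fits_in_vram(n_params: int, bits_per_param: int, vram_bytes: int) -> bool:
    """True if the weights alone would fit in the given amount of VRAM."""
    return weight_bytes(n_params, bits_per_param) <= vram_bytes


# Assumed precisions: 16-bit, 8-bit, and 4-bit weights.
for bits in (16, 8, 4):
    size_gib = weight_bytes(N_PARAMS, bits) / GIB
    verdict = "fits" if fits_in_vram(N_PARAMS, bits, VRAM_BYTES) else "does not fit"
    print(f"{bits:>2}-bit weights: {size_gib:6.1f} GiB -> {verdict} in 24 GiB")
```

Under these assumptions, the weights exceed 24 GiB at every precision listed, even 4-bit, which is why only part of the model can be resident on the GPU at any one time.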
