<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Human Harness</title><description>The most technically credible, reproducible content for engineers running AI in production — from silicon to workflow orchestration.</description><link>https://humanharness.ai/</link><item><title>You don&apos;t need an H100: matching GPU workload to hardware</title><link>https://humanharness.ai/blog/gpu-workload-matching-case-study/</link><guid isPermaLink="true">https://humanharness.ai/blog/gpu-workload-matching-case-study/</guid><description>A real diffusion-TTS pipeline case study. Why memory bandwidth — not parameter count — decides your GPU, and how to burst to cloud GPUs for $0.40 a render.</description><pubDate>Wed, 22 Apr 2026 00:00:00 GMT</pubDate><category>silicon</category><category>gpu</category><category>diffusion</category><category>runpod</category><category>cost</category><category>case-study</category></item><item><title>vLLM in 2026: the complete production setup guide</title><link>https://humanharness.ai/blog/vllm-production-setup-2026/</link><guid isPermaLink="true">https://humanharness.ai/blog/vllm-production-setup-2026/</guid><description>Install, serve, benchmark and tune vLLM for production inference — with a fully reproducible config and real TTFT/throughput numbers on an RTX 4090.</description><pubDate>Wed, 15 Apr 2026 00:00:00 GMT</pubDate><category>silicon</category><category>vllm</category><category>serving</category><category>rtx-4090</category><category>benchmark</category><category>quantization</category></item></channel></rss>