Machine Prices for AI Training: A Complete GPU Server Buyer’s & Rental Guide

Building or renting a GPU cluster for deep learning is one of the most impactful financial decisions an AI team makes. This guide breaks down the real-world costs of multi-GPU servers — from dirt-cheap legacy hardware to cutting-edge HGX systems — and helps you decide whether to buy or rent.


1. Budget Tier: 8x NVIDIA Tesla P100 (16GB) — $2,500 – $3,500

A machine equipped with 8 NVIDIA Tesla P100 GPUs (16GB) typically costs between $2,500 and $3,500 for a pre-owned or refurbished deep learning server (e.g., the Supermicro 4028GR series).

What You Get for the Price

Component Typical Spec
Chassis Supermicro SYS-4028GR-TXRT or equivalent 4U rack server (Inspur, Dell)
GPUs 8x Tesla P100 16GB (SXM2 interface with NVLink interconnect)
CPUs 2x Intel Xeon E5-2600 v4 series
RAM 128 GB – 256 GB DDR4
PSUs 4x ~2000W redundant power supplies

Cost Breakdown

  • Individual P100 GPUs sell for around $75 – $150 each on the secondary market.
  • A pre-assembled 4U 8-GPU server averages about $3,450 from secondary market vendors — significantly cheaper than buying individual cards plus a chassis.
  • Barebones / CTO builds (chassis + motherboard + PSU + 8 GPUs, no CPU/RAM) can be found for as low as $1,899 from liquidators like Unix Surplus.

Where to Buy

Why So Cheap?

Enterprise AI hardware depreciates far faster than consumer GPUs. The P100 has three fatal flaws that drove its price to rock bottom:

  1. No Tensor Cores: The P100 uses the 2016 Pascal architecture. Starting with the V100, NVIDIA introduced Tensor Cores that deliver 10x+ hardware acceleration for deep learning (especially FP16 half-precision compute).
  2. Cannot efficiently run modern LLMs: Modern fine-tuning and inference frameworks (PyTorch, DeepSpeed) rely heavily on Tensor Cores. Running modern billion-parameter models (Llama 3, DeepSeek) on 8x P100 would be unacceptably slow, which is why enterprises dumped these machines in bulk.
  3. Terrible power efficiency: The SXM2 P100 draws up to 300W per card. With 8 GPUs, dual CPUs, and high-speed fans, total system power approaches 3000W. The electricity cost of keeping them running exceeds the cost of buying newer cards.

Bottom line: These are data center castoffs that have been running at full load for years. They are excellent for foundational deep learning education (CNNs, image classification) and HPC workloads, but glacially slow for modern LLM training.


2. Premium Tier: 8x NVIDIA H100 (80GB) — $270,000 – $400,000

At the opposite end, a top-tier AI server with 8 NVIDIA H100 GPUs costs $270,000 – $400,000 (approximately ¥1.95M – ¥2.9M). With the Blackwell (B200) architecture now shipping, H100 prices have softened slightly from their 2024 peak, but remain extremely expensive.

Two Main Configurations

HGX H100 8-GPU Server (SXM5 — Flagship Performance)

Spec Detail
Price $290,000 – $400,000
GPUs 8x H100 SXM5 (700W TDP each)
Interconnect 4th-gen NVLink on HGX baseboard, 900 GB/s GPU-to-GPU bandwidth
Assembled by Supermicro, Dell, or NVIDIA (DGX H100)
Use case Training 10B–100B+ parameter LLMs

8x H100 PCIe Server (PCIe 5.0 — Economy / Inference)

Spec Detail
Price $270,000 – $320,000
GPUs 8x H100 PCIe (350W TDP each)
Interconnect Standard PCIe — no full-speed NVLink
Use case High-concurrency inference, single-card-independent training tasks

Why 100x More Expensive Than P100?

Spec 8x P100 Server (Legacy) 8x H100 SXM5 Server (Modern) Gap
Total VRAM 128 GB HBM2 640 GB HBM3 5x capacity, ~3x bandwidth
Architecture Pascal (2016) Hopper (2023+) 4 generations apart
Tensor Cores None 4th-gen with Transformer Engine Orders-of-magnitude acceleration
FP8 Throughput Not supported ~31.6 PFLOPS Enables hundreds-of-times faster LLM training
System Power ~2.5 – 3 kW 10.2 kW+ Requires liquid cooling or heavy air infrastructure

3. Mid-Range Options: The Sweet Spot

Between the bargain-bin P100 and sky-high H100, several compelling “middle ground” options exist.

Option A: 8x NVIDIA A100 (80GB / 40GB) — Industry Standard Compromise

  • 8-GPU server price: $80,000 – $130,000 (depending on 40GB vs. 80GB variant)
  • Total VRAM: 320 GB – 640 GB with NVLink interconnect
  • Architecture: Ampere — full Tensor Core support, perfect ecosystem compatibility
  • Capability: 8x A100 (80GB) can full fine-tune 70B parameter models. Performance is roughly half of H100, but the price is about one-third.
  • Status: Still the absolute workhorse at most AI companies. Prices have dropped ~30% as H100 supply stabilizes.

Option B: 8x RTX 4090 D / RTX 3090 — Best Bang for Buck

For budgets in the $8,000 – $14,000 range, consumer/workstation GPUs offer the best value.

Config Price Total VRAM Notes
4x RTX 4090 D (24GB) $8,000 – $12,000 96 GB Ada Lovelace architecture, strong single-card perf
8x RTX 3090 (24GB) used $9,000 – $14,000 192 GB Massive VRAM pool at rock-bottom pricing
  • Ideal for: Running Llama 3 8B/70B or DeepSeek-R1 distilled models via vLLM, DeepSpeed, or data parallelism.
  • Caveat: No native high-bandwidth NVLink — multi-card communication over PCIe is slower for distributed training. Consumer cards are also prohibited in large-scale data centers.

Option C: 8x NVIDIA L40S / A6000 Ada — Enterprise Lightweight Compromise

  • 8-GPU server price: $45,000 – $65,000
  • Total VRAM: 384 GB (48GB per card)
  • Architecture: Ada Lovelace — latest generation, inference performance rivals A100 in some benchmarks
  • Advantage: Standard PCIe card — no special HGX baseboard needed. Lower power and cooling requirements than H100/A100 SXM. Can run in a standard server room.

Summary Comparison

Option 8-GPU Server Price Total VRAM Best For Key Weakness
8x A100 (40G/80G) $80K – $130K 320 – 640 GB Full fine-tuning of 10B–100B+ models Mostly used/refurb; high power draw
8x L40S / A6000 (48G) $45K – $65K 384 GB Multi-user inference, LoRA fine-tuning No NVLink; limited for very large distributed training
4–8x RTX 4090D / 3090 $8K – $14K 96 – 192 GB Small team dev/test, small-to-mid model fine-tuning Consumer-grade; self-maintained; no data center use

4. Cloud Rental Pricing (International)

Most AI teams rent rather than buy. Here are typical hourly rates for 8-GPU servers on platforms like RunPod, Vast.ai, and Lambda Labs:

Configuration Hourly Rate (8-GPU) Monthly Rate (8-GPU)
8x RTX 4090 / 3090 $3.50 – $6.50 $2,200 – $3,800
8x L40S (48GB) $7.50 – $11.00 $4,500 – $6,500
8x A100 80GB SXM $12.00 – $18.00 $7,500 – $11,000
8x H100 SXM5 $16.00 – $24.00
Hourly cost visualization (8-GPU server):

[ 8x RTX 4090 ]  ██                     ~$5.00/hr
[ 8x L40S    ]   ████                   ~$9.00/hr
[ 8x A100    ]   ███████                ~$15.00/hr
[ 8x H100    ]   ████████████████████   ~$22.00/hr

5. Cloud Rental Pricing (China Domestic Market)

China’s domestic GPU cloud market is extremely competitive — prices are typically 30% to 50%+ cheaper than international providers due to massive oversupply of idle GPUs and the absence of premium cloud service markups.

Key Pricing (RMB / 8-GPU Server)

Configuration Hourly Rate (8-GPU) Monthly Rate (8-GPU)
8x RTX 4090 / 4090D ¥12 – ¥20 (~$1.7 – $2.8) ¥6,500 – ¥9,000
8x L40S (48GB) ¥35 – ¥50 (~$5 – $7) ¥18,000 – ¥26,000
8x A100 80GB SXM ¥65 – ¥110 (~$9 – $15) ¥35,000 – ¥50,000
8x Huawei Ascend 910B ¥28,000 – ¥38,000

Why Is China So Much Cheaper?

  1. Massive idle GPU inventory: During the 2023–2024 AI boom, Chinese companies, universities, and gaming studios stockpiled RTX 3090/4090 cards. As hype cycles shifted, these GPUs flooded into managed hosting as consumer-facing cloud instances — severe oversupply crushed prices.
  2. No cloud premium: Platforms like AWS and Google Cloud bundle brand premium, enterprise-grade networking, and compliance overhead into their pricing. Chinese platforms like AutoDL offer “bare compute” with no frills, pushing prices to the bare minimum.
  • AutoDL: The go-to “national training furnace” for Chinese students and researchers. Rock-bottom RTX 3090/4090 pricing, hourly billing, PyTorch/TensorFlow-friendly UI.
  • Volcano Engine (ByteDance) / Tencent Cloud: Enterprise-grade options. Watch for startup compute subsidy coupons — can bring prices down to 20–30% of list price.

Note on Huawei Ascend 910B: Due to U.S. export controls restricting A100/H100 supply in China, the domestic Ascend 910B has become an important alternative. After PyTorch ecosystem adaptation, its training performance approaches the A100, and pricing is competitive.


6. Buy vs. Rent: Decision Framework

Scenario Recommendation
Project duration < 3 months Rent — no question
Project duration 3–12 months Rent — unless utilization is 24/7
Stable 24/7 workload > 18 months Buy — CapEx breaks even vs. rental
Uncertain workload / experimentation Rent — flexibility is worth the premium

Break-Even Example

A $50,000 8x L40S server needs to run continuously at full load for approximately 10 months before its cost equals the cumulative rental expense — and that calculation excludes electricity, cooling, rack space, and maintenance costs. Only teams with 12+ months of guaranteed 24/7 workload should consider purchasing.


References

  1. eBay — 8-GPU P100 Server Listing
  2. Reddit r/MachineLearning — P100 Cost Discussion
  3. eBay — NVIDIA Tesla P100 Search
  4. Newegg — NVIDIA P100
  5. Kaggle — P100 Discussion
  6. GMI Cloud — H100 Buy vs. Rent Analysis
  7. CloudZero — H100 GPU Cost
  8. Dihuni — HGX H100 Server
  9. Network Outlet — HGX H100 Board
  10. TRG Data Centers — H100 Price
  11. Jarvis Labs — H100 Price Overview
  12. Northflank — H100 GPU Cost
  13. Unix Surplus — 8x P100 Server