Implementing Direct Water Cooling (DWC) in Factories
By Riley Quinn on May 7, 2026
The math is brutal: an NVIDIA GB200 NVL72 rack generates 120 kilowatts of continuous heat, while the ceiling for air cooling is around 25 kilowatts per rack. That's a 5× gap against the very best air cooling can do, and closer to 15× against the 8 kW typical of legacy air-cooled racks — and no amount of bigger fans, hot-aisle containment, or chilled-aisle gymnastics closes it. Liquid carries roughly 3,500× more heat than air per unit volume. Direct Water Cooling (DWC) is the architecture every factory deploying Blackwell-class AI hardware is converging on, because it's the only physics that works. The retrofit isn't a luxury — it's a precondition. NVIDIA officially mandates liquid cooling for GB200 NVL72; deviation from inlet-temperature, flow-rate, or pressure-drop specifications triggers automatic performance throttling of up to 60% at the silicon level. This guide walks through how to plan a DWC retrofit for an existing factory: the thermal loop architecture, the four-phase rollout sequence, the maintenance-monitoring stack, and the per-plant economics. Sign up free to see DWC monitoring and predictive maintenance running on your cooling infrastructure.
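That 3,500× figure is volumetric heat capacity (density times specific heat per unit volume), and it checks out from standard fluid properties. A quick sanity check in Python, using textbook values at roughly 25 °C rather than anything from this article:

```python
# Volumetric heat capacity = density [kg/m^3] * specific heat [J/(kg*K)].
water_j_m3_k = 997.0 * 4186.0    # ~4.17e6 J/(m^3*K)
air_j_m3_k   = 1.184 * 1005.0    # ~1.19e3 J/(m^3*K)

print(f"water carries ~{water_j_m3_k / air_j_m3_k:,.0f}x more heat per volume")
# -> ~3,508x, the "roughly 3,500x" cited above
```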
MAY 12, 2026 · 5:30 PM EST · Orlando
Upcoming OxMaint AI Live Webinar — Implementing Direct Water Cooling (DWC) in Factories
Live session for facility managers, plant engineering teams, data center architects, and reliability leaders planning DWC retrofits to support next-generation AI server hardware. We'll walk through the full thermal-loop architecture from cold plate to facility water, demonstrate the four-phase retrofit sequence, show real-time CDU and manifold monitoring with AI-driven anomaly detection, and preview the OxMaint AI deployment that ships pre-trained and ready to run in 6–12 weeks.
This isn't a marginal upgrade. It's a structural shift forced by physics. Air cooling has a hard ceiling that hasn't moved in 20 years; the GPU thermal envelope kept moving, and around the GB200 generation it sailed past the air-cooling line and never looked back. The bar chart below shows the per-rack capacity each cooling architecture supports versus what current and next-gen AI hardware actually demands.
- Air cooling (traditional CRAC/CRAH): 8–25 kW per rack. The ceiling; limited by die-surface heat flux.
- Rear-door heat exchanger (hybrid retrofit): 35–50 kW. Bridges to liquid without rack plumbing.
- Direct Water Cooling (cold-plate DWC): 120–200+ kW. Required for GB200 / GB300 NVL72.
The 5–15× gap matters: GB200 NVL72 die-surface heat flux exceeds 500 W/cm² — beyond what any air-cooled heatsink can transfer to ambient. Liquid is the only physics that works above the line.
The Thermal Loop — Cold Plate to Facility Water
A DWC system isn't one device; it's a chain of seven components, each doing one job in sequence. Heat starts at the GPU die. It transfers across a thermal interface material to a copper cold plate. Coolant flows through micro-channels in the cold plate and carries the heat to a rack manifold. The manifold collects flow from every server in the rack and routes it to a Coolant Distribution Unit (CDU). Inside the CDU, a liquid-to-liquid heat exchanger transfers heat from this server-side loop (the Technology Cooling System, or TCS) to the building's facility water loop (FWS). The facility loop sends the heat outside via cooling tower, chiller, or dry cooler. Get any link wrong and the chain breaks; the heat-balance sketch after the component list shows how the numbers tie together. Book a demo to walk through the thermal-loop monitoring stack on your facility.
1. Cold Plate (Cu micro-channel, ≤0.03 °C/W): Vacuum-brazed copper plate sits directly on the GPU die. Internal micro-channels maximize heat transfer to coolant.
2. Server Loop (~2.5 L/min per GPU): Coolant routes through the server tray, picking up heat from all GPU and CPU cold plates in sequence.
3. Rack Manifold (blind-mate quick-connect): Vertical manifold collects supply and return from every server in the rack, with quick-disconnect ports for hot-swap servicing.
4. CDU (800 kW–2 MW capacity): The "heart" of the system: pumps, 50 µm filtration, leak detection, condensation control, and the liquid-to-liquid heat exchanger.
5. Liquid-to-Liquid HX (plate-and-frame, ≤3 °C approach): Plate heat exchanger inside the CDU. Isolates the engineered TCS coolant from facility water — no contamination crossover.
6. Facility Loop (FWS, building-wide): The building's chilled-water plant carries heat from CDU heat exchangers to outdoor heat-rejection equipment.
7. Heat Rejection (cooling tower / dry cooler): Final stage. Heat dumps to ambient air outside. Free-cooling hours grow dramatically with warmer-water DWC operation.
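Those per-step numbers tie together through a steady-state heat balance, Q = m_dot * cp * delta_T. A minimal sketch under assumed glycol-mix properties (the density and specific heat below are generic values for a ~25% propylene glycol blend, not vendor figures):

```python
# Steady-state heat balance across one rack: Q = m_dot * cp * delta_T.
# Rack heat and flow come from this article's specs; fluid properties
# are assumed generic values for a ~25% propylene glycol blend.
RACK_HEAT_W  = 120_000    # GB200 NVL72 continuous heat load
FLOW_L_MIN   = 80.0       # per-rack flow from the spec table
DENSITY_KG_L = 1.03       # assumed coolant density
CP_J_KG_K    = 3800.0     # assumed coolant specific heat

m_dot = FLOW_L_MIN / 60.0 * DENSITY_KG_L        # mass flow in kg/s
delta_t = RACK_HEAT_W / (m_dot * CP_J_KG_K)     # supply-to-return rise

print(f"mass flow {m_dot:.2f} kg/s, coolant rise {delta_t:.1f} C")
# ~23 C rise under these assumptions: a 25 C inlet returns near 48 C,
# which is why return-side temperature gets monitored, not just inlet.
```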
The Four-Phase Retrofit — From Audit to Commissioning
Retrofitting an existing factory or industrial site for DWC isn't a weekend project, but it's not a complete rebuild either. The smart approach is phased: assess what's already there, drop in the CDUs and pipe up to the rack rows, plumb individual racks during scheduled outages, then commission and tune. A typical mid-density facility (4–6 GB200-class racks) takes 12–16 weeks end to end. Sign up free to see retrofit phase tracking and CMMS integration on your project.
Phase 1 · Weeks 1–3 · Assessment & Design: Audit existing facility water capacity, electrical service, and structural floor loading. CFD-model the planned rack layout. Verify FWS supply temperature ≤25°C and capacity ≥1.2× projected IT load (the capacity-check sketch after this list works these numbers). Output: thermal design specification.

Phase 2 · Weeks 4–7 · CDU & Plant Install: Install Coolant Distribution Units in-row or end-of-row. Run primary FWS piping from the existing chilled-water plant to CDU inlets. Pressure-test, fill, treat, and commission the secondary loop with engineered coolant. Output: live CDU + facility plumbing.

Phase 3 · Weeks 8–12 · Rack Plumbing & Server Migration: Mount in-rack manifolds. Connect blind-mate quick-disconnects to each server tray. Migrate workloads from legacy air-cooled racks. Run leak-detection burn-in. Validate inlet temperature 20–25°C, flow 80 L/min, ΔP ≤1.5 bar. Output: production-ready DWC racks.

Phase 4 · Weeks 13–16 · Commissioning & AI Monitoring: Tune control loops. Establish healthy baselines for AI anomaly detection. Connect telemetry (1M+ metrics/sec from a single 72-GPU rack) to the OxMaint AI predictive-maintenance pipeline. Train staff on CMMS work-order flow. Output: optimized + AI-monitored.
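Phase 1's go/no-go criteria reduce to a couple of arithmetic checks. A minimal sketch of that audit logic; the function name and example numbers are illustrative, not OxMaint tooling:

```python
# Phase 1 go/no-go check on the existing facility water plant, per the
# audit criteria above. Function and example numbers are illustrative.
def fws_audit(supply_temp_c, plant_capacity_kw, existing_load_kw, it_load_kw,
              margin=1.2):
    """Return Phase 1 findings for a planned DWC addition."""
    findings = []
    if supply_temp_c > 25.0:
        findings.append(f"FWS supply {supply_temp_c} C exceeds the 25 C limit")
    required_kw = it_load_kw * margin                  # >=1.2x projected IT load
    headroom_kw = plant_capacity_kw - existing_load_kw
    if headroom_kw < required_kw:
        findings.append(f"headroom {headroom_kw:.0f} kW < "
                        f"required {required_kw:.0f} kW: supplement the plant")
    return findings or ["PASS: plant supports the retrofit as designed"]

# Four GB200 racks at 120 kW each = 480 kW of new IT load.
print(fws_audit(supply_temp_c=12, plant_capacity_kw=1400,
                existing_load_kw=700, it_load_kw=4 * 120))
```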
Specifications That Cannot Slip — The DWC Operating Envelope
NVIDIA published these as hard requirements, not preferences. Operating outside the envelope on inlet temperature, flow rate, or pressure drop triggers automatic GPU throttling at the silicon level — up to 60% performance reduction. The reference table below is the spec sheet your CDU controller monitors continuously and the maintenance team escalates against. Sign up free to load your CDU spec data into the DWC compliance dashboard.
| Parameter | Spec (GB200) | Tolerance | Failure Action |
|---|---|---|---|
| Coolant Inlet Temperature | 20–25 °C | ±1 °C | GPU throttle, alarm |
| Flow Rate (per rack) | 80 L/min | ≥75 L/min | Throttle, then shutdown |
| Pressure Drop (rack) | ≤1.5 bar | +5% | Pump alarm |
| GPU Junction Temperature | <75 °C | Hard limit | Automatic throttle |
| Cold Plate Thermal Resistance | ≤0.03 °C/W | By design | Replace plate |
| Coolant Filtration | 50 µm | 25 µm optional | CDU alarm |
| Per-Rack Heat Generation | 120 kW (NVL72) | 140 kW (GB300) | Capacity planning |
| Power Architecture | 4× 30 kW shelves | 480 V three-phase | 97% conversion eff. |
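In code, the envelope reduces to a handful of comparisons the CDU controller evaluates on every telemetry sample. A minimal sketch with thresholds lifted from the table; the function shape and alarm strings are illustrative, not NVIDIA's or OxMaint's actual interfaces:

```python
def check_envelope(inlet_c, flow_l_min, dp_bar, junction_c):
    """Compare one telemetry sample against the GB200 operating envelope."""
    alarms = []
    if not 19.0 <= inlet_c <= 26.0:            # 20-25 C spec, +/-1 C tolerance
        alarms.append(("inlet_temp", "GPU throttle, alarm"))
    if flow_l_min < 75.0:                      # 80 L/min spec, 75 L/min floor
        alarms.append(("flow_rate", "throttle, then shutdown"))
    if dp_bar > 1.5 * 1.05:                    # <=1.5 bar, +5% tolerance
        alarms.append(("pressure_drop", "pump alarm"))
    if junction_c >= 75.0:                     # <75 C hard silicon limit
        alarms.append(("junction_temp", "automatic throttle"))
    return alarms

print(check_envelope(inlet_c=26.4, flow_l_min=78.0, dp_bar=1.42, junction_c=71.0))
# -> [('inlet_temp', 'GPU throttle, alarm')]
```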
Owned, Not Rented — The OxMaint AI Cooling Stack
The OxMaint AI Cooling deployment isn't a SaaS subscription you pay every month forever. It's a pre-configured AI server bundled with edge sensors for CDU monitoring, manifold flow telemetry, leak detection, and the predictive-maintenance pipeline that ingests up to 1 million metrics per second from a fully loaded rack. Get a quote and order it like the hardware it is — pre-configured, pre-tested, ready to start ingesting cooling-system telemetry within days, and owned outright the day delivery completes.
Perpetual License: No monthly fees, no per-rack metering, no per-CDU billing. Future costs are entirely optional and at your discretion.
Data Sovereignty: Cooling telemetry, baselines, and anomaly histories all live on your server, behind your firewall. Never uploaded.
Source Access: Source code and modification rights included. Adjust thermal models, add custom CDU geometries, retrain freely within your org.
AI-Native Core: CDU anomaly detection, leak prediction, flow-rate analytics, NLP work orders — built in, not bolted on.
Pre-Configured · Telemetry-Ready · Ships in 6–12 Weeks
Order an OxMaint DWC Monitoring Stack — Pre-Loaded
A complete on-prem AI monitoring deployment for factory liquid-cooling infrastructure. Edge sensors at every CDU, manifold, and pump. AGX Orin appliances running per-loop autoencoders. RTX PRO 6000 Blackwell central server running thermal anomaly detection, leak prediction, and the OxMaint AI dashboard. Automatic CMMS work-order generation when flow, pressure, or temperature deviates from spec. Pre-trained on industrial liquid-cooling datasets, ready to fine-tune within days.
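To make "per-loop autoencoders" concrete: the edge appliance trains a small network to reconstruct windows of healthy loop telemetry (for example flow, pressure, supply and return temperature), then scores live windows by reconstruction error. A minimal PyTorch sketch; the architecture, window size, feature count, and thresholding are illustrative assumptions, not the shipped OxMaint models:

```python
import torch
import torch.nn as nn

class LoopAutoencoder(nn.Module):
    """Reconstructs flattened telemetry windows; trained on healthy data only."""
    def __init__(self, n_features=4, window=32):
        super().__init__()
        d = n_features * window              # flattened telemetry window
        self.encoder = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 8))
        self.decoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, d))

    def forward(self, x):                    # x: (batch, n_features * window)
        return self.decoder(self.encoder(x))

def anomaly_scores(model, windows):
    """Per-window reconstruction error; high error = unlike healthy baseline."""
    with torch.no_grad():
        recon = model(windows)
    return ((windows - recon) ** 2).mean(dim=1)

# Usage (untrained here; in practice, fit on healthy-only windows first,
# then alarm above a baseline threshold such as mean + 4 sigma):
model = LoopAutoencoder()
scores = anomaly_scores(model, torch.randn(16, 4 * 32))
```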
The OxMaint DWC Monitoring Stack uses the standard per-plant architecture: central RTX PRO 6000 Blackwell server plus two AGX Orin edge appliances, with flow/pressure/temperature sensors mounted at every CDU and rack manifold. Anomaly detection, leak prediction, retrofit-phase tracking, and CMMS connectors all included in the OxMaint AI Software + Integration line. Book a demo to walk through per-plant pricing for your DWC footprint.
| Component | Unit Cost | Per Plant | Notes |
|---|---|---|---|
| RTX PRO 6000 Blackwell 96GB Server | $19,000 | $19,000 | Anomaly detection + dashboard |
| NVIDIA AGX Orin #1 (CDU Edge) | $4,000 | $4,000 | Per-CDU autoencoder + leak prediction |
| NVIDIA AGX Orin #2 (Manifold Edge) | $4,000 | $4,000 | Manifold flow + pressure analytics |
| Industrial Ethernet Switch + Cabling | ~$2,500 | ~$2,500 | Plant-floor switch, Cat6A, SFP modules |
| Local Electrical / Instrumentation | $8,000–$12,000 | ~$10,000 | Sensor mounting, Modbus gateways |
| OxMaint AI Software + Integration | $35,000–$55,000 | $45,000 avg | Cooling models, CDU library, CMMS connectors |
| Per-Plant Total | $72,500–$94,500 | ~$84,500 avg | 4-month delivery per plant |
| 4-Plant Full Rollout (with Enterprise AI) | ~$420,000–$520,000 | Total programme | Parallel delivery + DGX Station GB300 Ultra |
$84.5K avg per plant · 4 mo delivery · $0 recurring fees · Perpetual license
Perpetual · Owned · Source Access · Data Sovereignty
Stop Reacting to Cooling Failures — Predict Them, Owned
Real-time CDU monitoring, leak prediction, flow-rate anomaly detection, retrofit-phase tracking, automatic CMMS work-order generation, and the full OxMaint software stack. Your team owns the platform, the AI models, the cooling library, and the source code outright. The architecture every modern factory adopting Blackwell-class AI hardware needs alongside the cooling itself.
Do I really need DWC, or can I get away with rear-door heat exchangers?
It depends on what you're cooling. Rear-door heat exchangers (RDHx) are a great hybrid retrofit option that pushes per-rack capacity from the 8–25 kW air-cooling ceiling up to about 35–50 kW without rack-level plumbing — they're plumbed at the rack rear and the servers themselves stay air-cooled internally. That works for H100-generation hardware and most legacy AI workloads. But RDHx hits its own ceiling well below GB200 NVL72's 120 kW per rack and GB300's projected 140 kW. If you're deploying Blackwell-class hardware, NVIDIA officially mandates direct-to-chip liquid cooling. There's no RDHx workaround. The gap isn't just thermal capacity — it's the GB200 die's 500+ W/cm² heat flux, which exceeds what any heatsink-and-air interface can transfer regardless of how much air you push past it. The pragmatic path for many factories is RDHx for legacy AI in racks 1–3 and DWC for new Blackwell builds in racks 4–6, on the same facility water plant.
How does the engineered coolant differ from regular water?
The secondary loop (Technology Cooling System) uses a treated 75/25 to 80/20 water/glycol mix with corrosion inhibitors, biocides, and pH stabilizers — typically meeting <5 ppm chloride, <10 ppm sulfate, and <3 ppm total suspended solids. This is far cleaner than facility water, which typically carries 50–200 ppm dissolved solids and a varying microbial load depending on the source. The engineered fluid matters because micro-channel cold plates have channels 0.4–2 mm wide; mineral scale or biofilm buildup chokes flow within months and triggers GPU throttling. The CDU's liquid-to-liquid heat exchanger isolates this clean secondary loop from the dirtier facility water, which is the whole point of having a CDU. Coolant chemistry is monitored continuously: pH drift, conductivity changes, and glycol depletion all become predictive-maintenance signals long before performance degrades.
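Those limits translate directly into monitorable thresholds. A minimal sketch of a coolant-sample check using the ppm limits quoted above; the field names, warning fraction, and sampling cadence are assumptions:

```python
# Chemistry limits from the answer above; field names are assumptions.
# Drift toward a limit is itself a maintenance signal, not just a breach.
LIMITS = {
    "chloride_ppm": 5.0,      # < 5 ppm chloride
    "sulfate_ppm": 10.0,      # < 10 ppm sulfate
    "tss_ppm": 3.0,           # < 3 ppm total suspended solids
}

def check_coolant(sample, warn_fraction=0.8):
    """Flag breaches, plus near-limit readings worth a work order."""
    issues = []
    for key, limit in LIMITS.items():
        value = sample[key]
        if value >= limit:
            issues.append(f"{key}: {value} ppm breaches {limit} ppm limit")
        elif value >= warn_fraction * limit:
            issues.append(f"{key}: {value} ppm trending toward limit")
    return issues

print(check_coolant({"chloride_ppm": 4.2, "sulfate_ppm": 3.1, "tss_ppm": 0.8}))
# -> ['chloride_ppm: 4.2 ppm trending toward limit']
```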
What happens if a leak develops?
Modern CDUs ship with multi-layer leak detection: capacitive sensors at every quick-disconnect, conductive ribbon under the rack drip pan, and pressure-decay monitoring on the secondary loop. Detection latency is typically under 5 seconds. The standard response is fully automatic: the CDU's secondary pumps cut power, isolation valves close, the affected rack receives a controlled-shutdown signal, and the maintenance team gets an SMS alert. The engineered glycol-water mix used in TCS loops has low electrical conductivity when fresh, which buys some additional protection if a few drops contact electronics before isolation completes. The OxMaint AI predictive-maintenance pipeline goes further: it watches for a leak's precursors — slow pressure drop, micro-flow imbalance between supply and return manifolds, gradual reservoir level changes — and flags developing leaks days or weeks before they become acute. Most leaks in production deployments are now caught at the maintenance-scheduling stage rather than the fail-and-respond stage.
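The precursor idea is easy to sketch: fit a slope to a rolling window of secondary-loop pressure and flag sustained slow decay well before a capacitive sensor would ever get wet. A minimal numpy example; the sampling rate, window length, and slope threshold are illustrative, not production values:

```python
import numpy as np

def pressure_decay_slope(pressures_bar, sample_period_s=60.0):
    """Least-squares slope of loop pressure in bar/hour over the window."""
    t_hours = np.arange(len(pressures_bar)) * sample_period_s / 3600.0
    slope, _ = np.polyfit(t_hours, pressures_bar, 1)
    return slope

# 24 h of minute samples: a tiny ~0.002 bar/h decay buried in sensor noise.
rng = np.random.default_rng(0)
window = 2.10 - 0.002 * np.arange(1440) / 60 + rng.normal(0, 0.005, 1440)

slope = pressure_decay_slope(window)
if slope < -0.001:  # sustained decay beyond normal thermal variation
    print(f"possible developing leak: {slope:.4f} bar/h -> schedule inspection")
```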
Can I use my existing chilled-water plant, or do I need a new one?
Most factories with an existing chilled-water plant for HVAC or process cooling can extend the same plant to support DWC, with two important checks. First, supply-water temperature: the facility loop needs to deliver water at ≤25°C to feed the CDU's primary side; many process-cooling plants run at 7–12°C, which is fine, while some HVAC chilled-water plants run at 4–6°C, which is over-cold and wastes energy. Second, capacity headroom: a single GB200 NVL72 rack at 120 kW of heat is roughly equivalent to 34 tons of cooling, so a four-rack deployment needs ~135 tons of dedicated capacity beyond existing loads. The smart approach is a heat audit during Phase 1 of the retrofit: measure the existing chilled-water load profile, calculate the IT load addition, and identify whether the existing plant has the capacity or needs supplementing. In many cases an existing plant has 20–40% reserve capacity that's never been called on, which covers the AI hardware addition without requiring a new chiller. DWC also runs warmer than HVAC, often unlocking free-cooling hours that further reduce plant load.
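The tonnage figures are a direct unit conversion (1 refrigeration ton ≈ 3.517 kW of heat rejection):

```python
KW_PER_TON = 3.517           # 1 refrigeration ton = 3.517 kW

print(120 / KW_PER_TON)      # ~34.1 tons: one GB200 NVL72 rack
print(480 / KW_PER_TON)      # ~136.5 tons: four racks (the ~135 above, rounded)
```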
How long until my facility team is productive operating DWC infrastructure?
Most facility teams with existing chilled-water and HVAC experience reach basic productivity within 4-6 weeks of system commissioning, and full operational fluency within 4-6 months. The OxMaint DWC Monitoring deployment includes structured training: weeks 1-2 cover CDU operation, leak detection, pressure/flow basics; weeks 3-4 cover the AI dashboard and anomaly interpretation; weeks 5-12 cover advanced topics including coolant chemistry management, CDU pump replacement procedures, and integration with structural mechanical and electrical engineering review workflows. Teams already operating chilled-water plants ramp faster — they recognize the plumbing vocabulary immediately and just need to learn the rack-level interfaces and IT-coordination workflow. By month 4, the facility team is independently operating the cooling infrastructure with thresholds tuned to plant conditions and CMMS work orders auto-generating from every above-threshold detection.