Implementing Direct Water Cooling (DWC) in Factories
By Riley Quinn on May 7, 2026
The math is brutal: an NVIDIA GB200 NVL72 rack generates 120 kilowatts of continuous heat, while the ceiling for air cooling is around 25 kilowatts per rack. That's a 5× gap against the very best air cooling can do, and closer to 15× against the 8 kW typical of legacy air-cooled racks — and no amount of bigger fans, hot-aisle containment, or chilled-aisle gymnastics closes it. Liquid carries roughly 3,500× more heat than air per unit volume. Direct Water Cooling (DWC) is the architecture every factory deploying Blackwell-class AI hardware is converging on, because it's the only physics that works. The retrofit isn't a luxury — it's a precondition. NVIDIA officially mandates liquid cooling for GB200 NVL72; deviation from inlet-temperature, flow-rate, or pressure-drop specifications triggers automatic performance throttling of up to 60% at the silicon level. This guide walks through how to plan a DWC retrofit for an existing factory: the thermal loop architecture, the four-phase rollout sequence, the maintenance-monitoring stack, and the per-plant economics. Sign up free to see DWC monitoring and predictive maintenance running on your cooling infrastructure.
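That 3,500× figure is volumetric heat capacity (density times specific heat per unit volume), and it checks out from standard fluid properties. A quick sanity check in Python, using textbook values at roughly 25 °C rather than anything from this article:

```python
# Volumetric heat capacity = density [kg/m^3] * specific heat [J/(kg*K)].
water_j_m3_k = 997.0 * 4186.0    # ~4.17e6 J/(m^3*K)
air_j_m3_k   = 1.184 * 1005.0    # ~1.19e3 J/(m^3*K)

print(f"water carries ~{water_j_m3_k / air_j_m3_k:,.0f}x more heat per volume")
# -> ~3,508x, the "roughly 3,500x" cited above
```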
MAY 12, 2026 · 5:30 PM EST · Orlando
Upcoming OxMaint AI Live Webinar — Implementing Direct Water Cooling (DWC) in Factories
Live session for facility managers, plant engineering teams, data center architects, and reliability leaders planning DWC retrofits to support next-generation AI server hardware. We'll walk through the full thermal-loop architecture from cold plate to facility water, demonstrate the four-phase retrofit sequence, show real-time CDU and manifold monitoring with AI-driven anomaly detection, and preview the OxMaint AI deployment that ships pre-trained and ready to run in 6–12 weeks.
This isn't a marginal upgrade. It's a structural shift forced by physics. Air cooling has a hard ceiling that hasn't moved in 20 years; the GPU thermal envelope kept moving, and around the GB200 generation it sailed past the air-cooling line and never looked back. The bar chart below shows the per-rack capacity each cooling architecture supports versus what current and next-gen AI hardware actually demands.
- Air cooling (traditional CRAC/CRAH): 8–25 kW per rack. The ceiling; limited by die-surface heat flux.
- Rear-door heat exchanger (hybrid retrofit): 35–50 kW. Bridges to liquid without rack plumbing.
- Direct Water Cooling (cold-plate DWC): 120–200+ kW. Required for GB200 / GB300 NVL72.
The 5–15× gap matters: GB200 NVL72 die-surface heat flux exceeds 500 W/cm² — beyond what any air-cooled heatsink can transfer to ambient. Liquid is the only physics that works above the line.
The Thermal Loop — Cold Plate to Facility Water
A DWC system isn't one device; it's a chain of seven components, each doing one job in sequence. Heat starts at the GPU die. It transfers across a thermal interface material to a copper cold plate. Coolant flows through micro-channels in the cold plate and carries the heat to a rack manifold. The manifold collects flow from every server in the rack and routes it to a Coolant Distribution Unit (CDU). Inside the CDU, a liquid-to-liquid heat exchanger transfers heat from this server-side loop (the Technology Cooling System, or TCS) to the building's facility water loop (FWS). The facility loop sends the heat outside via cooling tower, chiller, or dry cooler. Get any link wrong and the chain breaks; the heat-balance sketch after the component list shows how the numbers tie together. Book a demo to walk through the thermal-loop monitoring stack on your facility.
1. Cold Plate (Cu micro-channel, ≤0.03 °C/W): Vacuum-brazed copper plate sits directly on the GPU die. Internal micro-channels maximize heat transfer to coolant.
2. Server Loop (~2.5 L/min per GPU): Coolant routes through the server tray, picking up heat from all GPU and CPU cold plates in sequence.
3. Rack Manifold (blind-mate quick-connect): Vertical manifold collects supply and return from every server in the rack, with quick-disconnect ports for hot-swap servicing.
4. CDU (800 kW–2 MW capacity): The "heart" of the system: pumps, 50 µm filtration, leak detection, condensation control, and the liquid-to-liquid heat exchanger.
5. Liquid-to-Liquid HX (plate-and-frame, ≤3 °C approach): Plate heat exchanger inside the CDU. Isolates the engineered TCS coolant from facility water — no contamination crossover.
6. Facility Loop (FWS, building-wide): The building's chilled-water plant carries heat from CDU heat exchangers to outdoor heat-rejection equipment.
7. Heat Rejection (cooling tower / dry cooler): Final stage. Heat dumps to ambient air outside. Free-cooling hours grow dramatically with warmer-water DWC operation.
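Those per-step numbers tie together through a steady-state heat balance, Q = m_dot * cp * delta_T. A minimal sketch under assumed glycol-mix properties (the density and specific heat below are generic values for a ~25% propylene glycol blend, not vendor figures):

```python
# Steady-state heat balance across one rack: Q = m_dot * cp * delta_T.
# Rack heat and flow come from this article's specs; fluid properties
# are assumed generic values for a ~25% propylene glycol blend.
RACK_HEAT_W  = 120_000    # GB200 NVL72 continuous heat load
FLOW_L_MIN   = 80.0       # per-rack flow from the spec table
DENSITY_KG_L = 1.03       # assumed coolant density
CP_J_KG_K    = 3800.0     # assumed coolant specific heat

m_dot = FLOW_L_MIN / 60.0 * DENSITY_KG_L        # mass flow in kg/s
delta_t = RACK_HEAT_W / (m_dot * CP_J_KG_K)     # supply-to-return rise

print(f"mass flow {m_dot:.2f} kg/s, coolant rise {delta_t:.1f} C")
# ~23 C rise under these assumptions: a 25 C inlet returns near 48 C,
# which is why return-side temperature gets monitored, not just inlet.
```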
The Four-Phase Retrofit — From Audit to Commissioning
Retrofitting an existing factory or industrial site for DWC isn't a weekend project, but it's not a complete rebuild either. The smart approach is phased: assess what's already there, drop in the CDUs and pipe up to the rack rows, plumb individual racks during scheduled outages, then commission and tune. A typical mid-density facility (4–6 GB200-class racks) takes 12–16 weeks end to end. Sign up free to see retrofit phase tracking and CMMS integration on your project.
Phase 1 · Weeks 1–3 · Assessment & Design: Audit existing facility water capacity, electrical service, and structural floor loading. CFD-model the planned rack layout. Verify FWS supply temperature ≤25°C and capacity ≥1.2× projected IT load (the capacity-check sketch after this list works these numbers). Output: thermal design specification.

Phase 2 · Weeks 4–7 · CDU & Plant Install: Install Coolant Distribution Units in-row or end-of-row. Run primary FWS piping from the existing chilled-water plant to CDU inlets. Pressure-test, fill, treat, and commission the secondary loop with engineered coolant. Output: live CDU + facility plumbing.

Phase 3 · Weeks 8–12 · Rack Plumbing & Server Migration: Mount in-rack manifolds. Connect blind-mate quick-disconnects to each server tray. Migrate workloads from legacy air-cooled racks. Run leak-detection burn-in. Validate inlet temperature 20–25°C, flow 80 L/min, ΔP ≤1.5 bar. Output: production-ready DWC racks.

Phase 4 · Weeks 13–16 · Commissioning & AI Monitoring: Tune control loops. Establish healthy baselines for AI anomaly detection. Connect telemetry (1M+ metrics/sec from a single 72-GPU rack) to the OxMaint AI predictive-maintenance pipeline. Train staff on CMMS work-order flow. Output: optimized + AI-monitored.
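Phase 1's go/no-go criteria reduce to a couple of arithmetic checks. A minimal sketch of that audit logic; the function name and example numbers are illustrative, not OxMaint tooling:

```python
# Phase 1 go/no-go check on the existing facility water plant, per the
# audit criteria above. Function and example numbers are illustrative.
def fws_audit(supply_temp_c, plant_capacity_kw, existing_load_kw, it_load_kw,
              margin=1.2):
    """Return Phase 1 findings for a planned DWC addition."""
    findings = []
    if supply_temp_c > 25.0:
        findings.append(f"FWS supply {supply_temp_c} C exceeds the 25 C limit")
    required_kw = it_load_kw * margin                  # >=1.2x projected IT load
    headroom_kw = plant_capacity_kw - existing_load_kw
    if headroom_kw < required_kw:
        findings.append(f"headroom {headroom_kw:.0f} kW < "
                        f"required {required_kw:.0f} kW: supplement the plant")
    return findings or ["PASS: plant supports the retrofit as designed"]

# Four GB200 racks at 120 kW each = 480 kW of new IT load.
print(fws_audit(supply_temp_c=12, plant_capacity_kw=1400,
                existing_load_kw=700, it_load_kw=4 * 120))
```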
Specifications That Cannot Slip — The DWC Operating Envelope
NVIDIA published these as hard requirements, not preferences. Operating outside the envelope on inlet temperature, flow rate, or pressure drop triggers automatic GPU throttling at the silicon level — up to 60% performance reduction. The reference table below is the spec sheet your CDU controller monitors continuously and the maintenance team escalates against. Sign up free to load your CDU spec data into the DWC compliance dashboard.
| Parameter | Spec (GB200) | Tolerance | Failure Action |
|---|---|---|---|
| Coolant Inlet Temperature | 20–25 °C | ±1 °C | GPU throttle, alarm |
| Flow Rate (per rack) | 80 L/min | ≥75 L/min | Throttle, then shutdown |
| Pressure Drop (rack) | ≤1.5 bar | +5% | Pump alarm |
| GPU Junction Temperature | <75 °C | Hard limit | Automatic throttle |
| Cold Plate Thermal Resistance | ≤0.03 °C/W | By design | Replace plate |
| Coolant Filtration | 50 µm | 25 µm optional | CDU alarm |
| Per-Rack Heat Generation | 120 kW (NVL72) | 140 kW (GB300) | Capacity planning |
| Power Architecture | 4× 30 kW shelves | 480 V three-phase | 97% conversion eff. |
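In code, the envelope reduces to a handful of comparisons the CDU controller evaluates on every telemetry sample. A minimal sketch with thresholds lifted from the table; the function shape and alarm strings are illustrative, not NVIDIA's or OxMaint's actual interfaces:

```python
def check_envelope(inlet_c, flow_l_min, dp_bar, junction_c):
    """Compare one telemetry sample against the GB200 operating envelope."""
    alarms = []
    if not 19.0 <= inlet_c <= 26.0:            # 20-25 C spec, +/-1 C tolerance
        alarms.append(("inlet_temp", "GPU throttle, alarm"))
    if flow_l_min < 75.0:                      # 80 L/min spec, 75 L/min floor
        alarms.append(("flow_rate", "throttle, then shutdown"))
    if dp_bar > 1.5 * 1.05:                    # <=1.5 bar, +5% tolerance
        alarms.append(("pressure_drop", "pump alarm"))
    if junction_c >= 75.0:                     # <75 C hard silicon limit
        alarms.append(("junction_temp", "automatic throttle"))
    return alarms

print(check_envelope(inlet_c=26.4, flow_l_min=78.0, dp_bar=1.42, junction_c=71.0))
# -> [('inlet_temp', 'GPU throttle, alarm')]
```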
Owned, Not Rented — The OxMaint AI Cooling Stack
The OxMaint AI Cooling deployment isn't a SaaS subscription you pay every month forever. It's a pre-configured AI server bundled with edge sensors for CDU monitoring, manifold flow telemetry, leak detection, and the predictive-maintenance pipeline that ingests up to 1 million metrics per second from a fully loaded rack. Get a quote and order it like the hardware it is — pre-configured, pre-tested, ready to start ingesting cooling-system telemetry within days, and owned outright the day delivery completes.
Perpetual License: No monthly fees, no per-rack metering, no per-CDU billing. Future costs are entirely optional and at your discretion.
Data Sovereignty: Cooling telemetry, baselines, and anomaly histories all live on your server, behind your firewall. Never uploaded.
Source Access: Source code and modification rights included. Adjust thermal models, add custom CDU geometries, retrain freely within your org.
AI-Native Core: CDU anomaly detection, leak prediction, flow-rate analytics, NLP work orders — built in, not bolted on.
Pre-Configured · Telemetry-Ready · Ships in 6–12 Weeks
Order an OxMaint DWC Monitoring Stack — Pre-Loaded
A complete on-prem AI monitoring deployment for factory liquid-cooling infrastructure. Edge sensors at every CDU, manifold, and pump. AGX Orin appliances running per-loop autoencoders. RTX PRO 6000 Blackwell central server running thermal anomaly detection, leak prediction, and the OxMaint AI dashboard. Automatic CMMS work-order generation when flow, pressure, or temperature deviates from spec. Pre-trained on industrial liquid-cooling datasets, ready to fine-tune within days.
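To make "per-loop autoencoders" concrete: the edge appliance trains a small network to reconstruct windows of healthy loop telemetry (for example flow, pressure, supply and return temperature), then scores live windows by reconstruction error. A minimal PyTorch sketch; the architecture, window size, feature count, and thresholding are illustrative assumptions, not the shipped OxMaint models:

```python
import torch
import torch.nn as nn

class LoopAutoencoder(nn.Module):
    """Reconstructs flattened telemetry windows; trained on healthy data only."""
    def __init__(self, n_features=4, window=32):
        super().__init__()
        d = n_features * window              # flattened telemetry window
        self.encoder = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 8))
        self.decoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, d))

    def forward(self, x):                    # x: (batch, n_features * window)
        return self.decoder(self.encoder(x))

def anomaly_scores(model, windows):
    """Per-window reconstruction error; high error = unlike healthy baseline."""
    with torch.no_grad():
        recon = model(windows)
    return ((windows - recon) ** 2).mean(dim=1)

# Usage (untrained here; in practice, fit on healthy-only windows first,
# then alarm above a baseline threshold such as mean + 4 sigma):
model = LoopAutoencoder()
scores = anomaly_scores(model, torch.randn(16, 4 * 32))
```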
The OxMaint DWC Monitoring Stack uses the standard per-plant architecture: central RTX PRO 6000 Blackwell server plus two AGX Orin edge appliances, with flow/pressure/temperature sensors mounted at every CDU and rack manifold. Anomaly detection, leak prediction, retrofit-phase tracking, and CMMS connectors all included in the OxMaint AI Software + Integration line. Book a demo to walk through per-plant pricing for your DWC footprint.
| Component | Unit Cost | Per Plant | Notes |
|---|---|---|---|
| RTX PRO 6000 Blackwell 96GB Server | $19,000 | $19,000 | Anomaly detection + dashboard |
| NVIDIA AGX Orin #1 (CDU Edge) | $4,000 | $4,000 | Per-CDU autoencoder + leak prediction |
| NVIDIA AGX Orin #2 (Manifold Edge) | $4,000 | $4,000 | Manifold flow + pressure analytics |
| Industrial Ethernet Switch + Cabling | ~$2,500 | ~$2,500 | Plant-floor switch, Cat6A, SFP modules |
| Local Electrical / Instrumentation | $8,000–$12,000 | ~$10,000 | Sensor mounting, Modbus gateways |
| OxMaint AI Software + Integration | $35,000–$55,000 | $45,000 avg | Cooling models, CDU library, CMMS connectors |
| Per-Plant Total | $72,500–$94,500 | ~$84,500 avg | 4-month delivery per plant |
| 4-Plant Full Rollout (with Enterprise AI) | ~$420,000–$520,000 | Total programme | Parallel delivery + DGX Station GB300 Ultra |
$84.5K avg per plant · 4 mo delivery · $0 recurring fees · Perpetual license
Perpetual · Owned · Source Access · Data Sovereignty
Stop Reacting to Cooling Failures — Predict Them, Owned
Real-time CDU monitoring, leak prediction, flow-rate anomaly detection, retrofit-phase tracking, automatic CMMS work-order generation, and the full OxMaint software stack. Your team owns the platform, the AI models, the cooling library, and the source code outright. The architecture every modern factory adopting Blackwell-class AI hardware needs alongside the cooling itself.
Do I really need DWC, or can I get away with rear-door heat exchangers?
It depends on what you're cooling. Rear-door heat exchangers (RDHx) are a great hybrid retrofit option that pushes per-rack capacity from the 8–25 kW air-cooling ceiling up to about 35–50 kW without rack-level plumbing — they're plumbed at the rack rear and the servers themselves stay air-cooled internally. That works for H100-generation hardware and most legacy AI workloads. But RDHx hits its own ceiling well below GB200 NVL72's 120 kW per rack and GB300's projected 140 kW. If you're deploying Blackwell-class hardware, NVIDIA officially mandates direct-to-chip liquid cooling. There's no RDHx workaround. The gap isn't just thermal capacity — it's the GB200 die's 500+ W/cm² heat flux, which exceeds what any heatsink-and-air interface can transfer regardless of how much air you push past it. The pragmatic path for many factories is RDHx for legacy AI in racks 1–3 and DWC for new Blackwell builds in racks 4–6, on the same facility water plant.
How does the engineered coolant differ from regular water?
The secondary loop (Technology Cooling System) uses a treated 75/25 to 80/20 water/glycol mix with corrosion inhibitors, biocides, and pH stabilizers — typically meeting <5 ppm chloride, <10 ppm sulfate, and <3 ppm total suspended solids. This is far cleaner than facility water, which typically carries 50–200 ppm dissolved solids and a varying microbial load depending on the source. The engineered fluid matters because micro-channel cold plates have channels 0.4–2 mm wide; mineral scale or biofilm buildup chokes flow within months and triggers GPU throttling. The CDU's liquid-to-liquid heat exchanger isolates this clean secondary loop from the dirtier facility water, which is the whole point of having a CDU. Coolant chemistry is monitored continuously: pH drift, conductivity changes, and glycol depletion all become predictive-maintenance signals long before performance degrades.
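Those limits translate directly into monitorable thresholds. A minimal sketch of a coolant-sample check using the ppm limits quoted above; the field names, warning fraction, and sampling cadence are assumptions:

```python
# Chemistry limits from the answer above; field names are assumptions.
# Drift toward a limit is itself a maintenance signal, not just a breach.
LIMITS = {
    "chloride_ppm": 5.0,      # < 5 ppm chloride
    "sulfate_ppm": 10.0,      # < 10 ppm sulfate
    "tss_ppm": 3.0,           # < 3 ppm total suspended solids
}

def check_coolant(sample, warn_fraction=0.8):
    """Flag breaches, plus near-limit readings worth a work order."""
    issues = []
    for key, limit in LIMITS.items():
        value = sample[key]
        if value >= limit:
            issues.append(f"{key}: {value} ppm breaches {limit} ppm limit")
        elif value >= warn_fraction * limit:
            issues.append(f"{key}: {value} ppm trending toward limit")
    return issues

print(check_coolant({"chloride_ppm": 4.2, "sulfate_ppm": 3.1, "tss_ppm": 0.8}))
# -> ['chloride_ppm: 4.2 ppm trending toward limit']
```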
What happens if a leak develops?
Modern CDUs ship with multi-layer leak detection: capacitive sensors at every quick-disconnect, conductive ribbon under the rack drip pan, and pressure-decay monitoring on the secondary loop. Detection latency is typically under 5 seconds. The standard response is fully automatic: the CDU's secondary pumps cut power, isolation valves close, the affected rack receives a controlled-shutdown signal, and the maintenance team gets an SMS alert. The engineered glycol-water mix used in TCS loops has low electrical conductivity when fresh, which buys some additional protection if a few drops contact electronics before isolation completes. The OxMaint AI predictive-maintenance pipeline goes further: it watches for a leak's precursors — slow pressure drop, micro-flow imbalance between supply and return manifolds, gradual reservoir level changes — and flags developing leaks days or weeks before they become acute. Most leaks in production deployments are now caught at the maintenance-scheduling stage rather than the fail-and-respond stage.
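The precursor idea is easy to sketch: fit a slope to a rolling window of secondary-loop pressure and flag sustained slow decay well before a capacitive sensor would ever get wet. A minimal numpy example; the sampling rate, window length, and slope threshold are illustrative, not production values:

```python
import numpy as np

def pressure_decay_slope(pressures_bar, sample_period_s=60.0):
    """Least-squares slope of loop pressure in bar/hour over the window."""
    t_hours = np.arange(len(pressures_bar)) * sample_period_s / 3600.0
    slope, _ = np.polyfit(t_hours, pressures_bar, 1)
    return slope

# 24 h of minute samples: a tiny ~0.002 bar/h decay buried in sensor noise.
rng = np.random.default_rng(0)
window = 2.10 - 0.002 * np.arange(1440) / 60 + rng.normal(0, 0.005, 1440)

slope = pressure_decay_slope(window)
if slope < -0.001:  # sustained decay beyond normal thermal variation
    print(f"possible developing leak: {slope:.4f} bar/h -> schedule inspection")
```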
Can I use my existing chilled-water plant, or do I need a new one?
Most factories with an existing chilled-water plant for HVAC or process cooling can extend the same plant to support DWC, with two important checks. First, supply-water temperature: the facility loop needs to deliver water at ≤25°C to feed the CDU's primary side; many process-cooling plants run at 7–12°C, which is fine, while some HVAC chilled-water plants run at 4–6°C, which is over-cold and wastes energy. Second, capacity headroom: a single GB200 NVL72 rack at 120 kW of heat is roughly equivalent to 34 tons of cooling, so a four-rack deployment needs ~135 tons of dedicated capacity beyond existing loads. The smart approach is a heat audit during Phase 1 of the retrofit: measure the existing chilled-water load profile, calculate the IT load addition, and identify whether the existing plant has the capacity or needs supplementing. In many cases an existing plant has 20–40% reserve capacity that's never been called on, which covers the AI hardware addition without requiring a new chiller. DWC also runs warmer than HVAC, often unlocking free-cooling hours that further reduce plant load.
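The tonnage figures are a direct unit conversion (1 refrigeration ton ≈ 3.517 kW of heat rejection):

```python
KW_PER_TON = 3.517           # 1 refrigeration ton = 3.517 kW

print(120 / KW_PER_TON)      # ~34.1 tons: one GB200 NVL72 rack
print(480 / KW_PER_TON)      # ~136.5 tons: four racks (the ~135 above, rounded)
```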
How long until my facility team is productive operating DWC infrastructure?
Most facility teams with existing chilled-water and HVAC experience reach basic productivity within 4-6 weeks of system commissioning, and full operational fluency within 4-6 months. The OxMaint DWC Monitoring deployment includes structured training: weeks 1-2 cover CDU operation, leak detection, pressure/flow basics; weeks 3-4 cover the AI dashboard and anomaly interpretation; weeks 5-12 cover advanced topics including coolant chemistry management, CDU pump replacement procedures, and integration with structural mechanical and electrical engineering review workflows. Teams already operating chilled-water plants ramp faster — they recognize the plumbing vocabulary immediately and just need to learn the rack-level interfaces and IT-coordination workflow. By month 4, the facility team is independently operating the cooling infrastructure with thresholds tuned to plant conditions and CMMS work orders auto-generating from every above-threshold detection.