What are common machine downtime reason codes?

Common reason codes are grouped into planned (e.g., Scheduled Maintenance, Changeover, Breaks) and unplanned (e.g., Tool Failure, Material Shortage, Operator Unavailable, Unplanned Maintenance). Standardizing these codes is the first step to accurate analysis.

How does a Pareto analysis help reduce manufacturing downtime?

A Pareto analysis graphically separates the "vital few" problems from the "trivial many." By ranking your downtime reason codes by lost time in a Pareto chart, you can quickly see which two or three codes account for the bulk of your lost production time (often around 80%, the classic 80/20 pattern). That tells you where to dig for root causes first, so your improvement efforts yield the highest possible ROI.
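A minimal sketch of the Pareto step, assuming downtime events have already been reduced to (reason code, minutes) pairs. The function name `pareto`, the 80% cutoff, and the sample minutes are illustrative assumptions, not data from a real shop; the reason codes are the ones listed in the previous answer.

```python
from collections import defaultdict

def pareto(events, cutoff=0.80):
    """Rank reason codes by total downtime and flag the 'vital few'.

    events: iterable of (reason_code, minutes) pairs (assumed input shape).
    Returns (reason, minutes, cumulative_share, is_vital) rows, largest first.
    A reason is 'vital' if the cumulative share had not yet reached `cutoff`
    before it was added, so the reason that crosses the cutoff is included.
    """
    totals = defaultdict(float)
    for reason, minutes in events:
        totals[reason] += minutes
    grand_total = sum(totals.values())
    rows, running = [], 0.0
    for reason, minutes in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        share_before = running / grand_total if grand_total else 0.0
        running += minutes
        cumulative = running / grand_total if grand_total else 0.0
        rows.append((reason, minutes, cumulative, share_before < cutoff))
    return rows

# Hypothetical week of downtime events (minutes), for illustration only.
events = [("Tool Failure", 120), ("Material Shortage", 90), ("Changeover", 45),
          ("Operator Unavailable", 25), ("Tool Failure", 60),
          ("Unplanned Maintenance", 10)]
for reason, minutes, cum, vital in pareto(events):
    print(f"{reason:<22} {minutes:>6.0f} min  {cum:6.1%}  {'vital few' if vital else ''}")
```

With these made-up numbers, Tool Failure, Material Shortage, and Changeover are flagged: together they cross the 80% line, and the remaining codes are the "trivial many."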

What specific machine status data fields should I track for downtime?

At a minimum, you must track: Machine ID, Event Timestamp, Machine Status (e.g., Running, Idle, Fault), Duration of Status, and the assigned Downtime Reason Code. Advanced systems can also track part counts, cycle times, and overrides to provide deeper context.

Essential Utilization Data Fields for Job Shops

Matt Ulepic
May 6
9 min read

Updated: May 11

Essential utilization data fields for job shops: define run/idle/down, capture shift-proof timestamps, and calculate true utilization without an IT-heavy MES

Essential Utilization Data Fields for Job Shops (Minimum Viable Set)

If your ERP says the shop was “busy” but you still miss due dates, you don’t have a scheduling problem—you have a measurement problem. Most CNC job shops aren’t short on opinions about what ran; they’re short on a small, consistent set of utilization data fields that make “running” provable across shifts, machines, and handoffs.

The practical goal isn’t to “collect everything.” It’s to capture only what changes decisions this week: where time disappears between what was planned and what actually happened on the machine—without launching an IT-heavy MES project.

TL;DR — Essential Utilization Data Fields for Job Shops

Treat utilization as time accounting: running time vs available time, with consistent definitions.
Use only three machine states (run/idle/down) to get shift-proof comparisons.
The core fields are: machine ID, timestamped state changes, state value, planned production window, and operator/shift ID.
Separate idle from down to assign ownership correctly (production vs maintenance).
Add supporting fields sparingly: coarse down reasons, job ID, optional part-count signal, exception notes, and source/confidence.
Daily review should focus on long idle/down blocks and “unknowns,” not perfect codes.
Mixed controls are workable: automate where you can; use minimal manual logging where you can’t.

Key takeaway
True utilization doesn’t require dozens of codes or perfect ERP reconciliation—it requires shift-proof definitions of run/idle/down plus timestamped state changes and a clear “available time” window. Once those fields are consistent, the hidden loss shows up fast: long idle blocks at handoffs, disputed “it was running” claims, and downtime that’s really waiting. Fixing that leakage is usually the first capacity recovery step before buying another machine.

Essential Data: Manufacturing Downtime Tracking Reason Codes, Pareto Analysis & Machine Status Data Fields

For a plant manager, unidentified downtime is pure lost profit. Without accurate data, you're left guessing why a CNC machine sat idle for 45 minutes or why a press went offline. Establishing clear machine status data fields is the first step to creating standardized reason codes, which then feed a Pareto analysis to reveal your biggest production bottlenecks—like "tool changeovers" or "material shortages"—and provide the ROI justification for process improvements.

What are the key components of a downtime analysis?

The only utilization question that matters (and why most shops can’t answer it)

Utilization is a simple fraction: running time ÷ available time. Everything that makes utilization “hard” in a job shop comes down to two problems: (1) the numerator is guessed (because “running” isn’t defined and logged consistently), and (2) the denominator is argued (because “available” changes depending on who’s defending the schedule).
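Written out, with the denominator taken as the declared planned production window for a machine and shift, the fraction looks like this. The symbols are introduced here for illustration; the last identity holds only when every minute of the window has been classified as run, idle, or down.

```latex
U = \frac{T_{\text{run}}}{T_{\text{avail}}},
\qquad
T_{\text{run}} = \sum_{i \in \text{run blocks}} \bigl(t_i^{\text{end}} - t_i^{\text{start}}\bigr),
\qquad
T_{\text{run}} + T_{\text{idle}} + T_{\text{down}} = T_{\text{avail}}
```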

This is where utilization leakage hides: the gray zones between the schedule and the machine—setup handoffs, first-article approval, waiting on material, waiting on inspection, a programmer tweak, an operator covering two machines, or a shift that “ran all night” but still didn’t produce what the plan assumed. Without state-based time, those gaps get compressed into end-of-shift notes or never recorded at all.

ERP timestamps are not utilization data. They capture transactions (move tickets, labor entries, operation complete), often after the fact and with incentives to “keep it clean.” They do not capture minute-by-minute machine behavior—especially across multiple shifts where the best and worst habits are amplified.

“Essential data fields” in this context means the minimum required to calculate utilization and act on it: intervene during a shift, adjust dispatching, or fix a recurring handoff problem. If you’re trying to decide whether to invest in better visibility, start by understanding the operational purpose of machine utilization tracking software—not as a dashboard, but as a way to turn ambiguous time into accountable time.

Define the 3 machine states so they’re shift-proof: Run vs Idle vs Down

Multi-shift utilization falls apart when each shift uses different meanings for “down” and “running.” The fix is not more granularity—it’s enforceable, repeatable definitions for three states that can be applied to a VMC, a lathe, or a Swiss, regardless of control vintage.

Run

Run means the machine is executing a cycle/program in a way that you’d accept as production-relevant motion (e.g., cycle active, spindle/cutting cycle, or controlled execution depending on what your equipment can signal). Edge cases will happen—single-block, dry-run, warm-up, probing—but your rule should be consistent. If the machine is moving but the job is on QA hold and cannot proceed, that’s typically idle (capable but waiting), not run.

Idle

Idle means the machine could run but is not running. This is where job shops lose capacity without noticing: waiting on an operator, waiting on material, waiting on first-article approval, waiting on inspection, waiting on a tool preset, waiting on a program change, or extended setup that has no “hard fault” attached. Idle is uncomfortable because it often points to dispatching, staffing, or handoffs rather than a mechanical failure.

A typical example: second shift says “machine ran all night,” but the morning still sees missed due dates. In a three-state record, you often find long idle stretches: waiting on first-article signoff, waiting on material that didn’t get staged, or the next operation not being released. The argument stops being personal; it becomes “what kept it from being runnable?”

Down

Down means the machine is not capable of running without intervention: alarm/fault conditions, E-stop, a broken tool that requires maintenance-level help (not just swapping an insert), electrical issues, planned maintenance, or a repair in progress. Down is appropriate when production cannot resolve it through normal pacing and setup work.

Separating idle from down changes ownership. That matters in a Swiss/lathe cell, where “low utilization” is easy to misdiagnose. If three-state data shows most non-run time is idle (waiting on an operator or inspection) rather than down (fault or maintenance), the fix belongs with staffing, cell support, or inspection flow, not a maintenance lecture.

One rule for ambiguous time: pick a default classification and an escalation process. For example, if the machine is stopped and nobody can state a fault condition within a short window, default to idle and require a note if it’s later reclassified as down. The goal is auditability, not perfection on the first pass.
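The three definitions and the default rule can be expressed as a small classifier. This is a sketch under stated assumptions: the signal names (`cycle_active`, `alarm_active`, `estop`, `maintenance_in_progress`, `qa_hold`) are hypothetical stand-ins for whatever a given control or operator can actually report, and real equipment will need its own mapping.

```python
from dataclasses import dataclass
from enum import Enum

class State(Enum):
    RUN = "run"
    IDLE = "idle"
    DOWN = "down"

@dataclass
class Signals:
    # Hypothetical snapshot of what a control (or an operator) can report.
    cycle_active: bool = False
    alarm_active: bool = False
    estop: bool = False
    maintenance_in_progress: bool = False
    qa_hold: bool = False

def classify(s: Signals) -> State:
    # Down: the machine cannot run without intervention.
    if s.alarm_active or s.estop or s.maintenance_in_progress:
        return State.DOWN
    # Run: production-relevant execution. A cycle on a job that is on QA hold
    # and cannot proceed is treated as idle, per the Run definition above.
    if s.cycle_active and not s.qa_hold:
        return State.RUN
    # Default for everything else, including unexplained stops: idle.
    return State.IDLE

def reclassify_as_down(note: str) -> State:
    # Escalation rule: moving a block from idle to down requires a note.
    if not note.strip():
        raise ValueError("a note is required to reclassify idle time as down")
    return State.DOWN
```

Defaulting to idle means an unexplained stop lands in the bucket that production reviews, instead of silently inflating down time.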

Essential utilization data fields (minimum viable set)

Once the three states are defined, the “minimum viable” data model is small. The win is consistency: the same fields, captured the same way, across a mixed fleet and across shifts.

Machine identifier: a unique, stable machine ID that doesn’t change by nickname (“Mazak 2,” “Big Lathe”). It must match whatever the schedule uses so you can compare planned vs actual without translation.
Timestamped state changes: start/end times for each state block, or event time + duration. Without timestamps, you don’t have time accounting—only a story.
State value: exactly one of run/idle/down from a controlled vocabulary. No “misc,” no “other,” no shift-specific synonyms.
Planned production window (available time): the time each machine is scheduled to be available, by machine and shift. Decide once whether planned breaks count as available time, then apply that rule everywhere. A declared window prevents denominator games and keeps discussions focused.
Operator/shift identifier: identify the shift (and optionally the lead/operator) so patterns show up as operational realities—handoffs, coverage, and support constraints—rather than blame.
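One way to hold these fields is a pair of small record types: raw state-change events, and start/end blocks derived from them. The names `StateChange`, `StateBlock`, and `to_blocks` are hypothetical, and the sketch assumes one machine's events per call, with the last open state closed at the end of the planned window.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

STATES = {"run", "idle", "down"}  # controlled vocabulary: no "misc", no "other"

@dataclass(frozen=True)
class StateChange:
    machine_id: str       # stable ID that matches the schedule, not a nickname
    timestamp: datetime   # when the new state began
    state: str            # exactly one of STATES
    shift_id: str         # shift (optionally lead/operator)

    def __post_init__(self):
        if self.state not in STATES:
            raise ValueError(f"state must be one of {sorted(STATES)}, got {self.state!r}")

@dataclass(frozen=True)
class StateBlock:
    machine_id: str
    start: datetime
    end: datetime
    state: str
    shift_id: str

    @property
    def minutes(self) -> float:
        return (self.end - self.start).total_seconds() / 60

def to_blocks(changes: List[StateChange], window_end: datetime) -> List[StateBlock]:
    """Convert one machine's state-change events into start/end blocks.
    Each block ends when the next change begins; the last one is closed
    at the end of the planned production window."""
    ordered = sorted(changes, key=lambda c: c.timestamp)
    blocks = []
    for i, cur in enumerate(ordered):
        nxt: Optional[StateChange] = ordered[i + 1] if i + 1 < len(ordered) else None
        end = nxt.timestamp if nxt is not None else window_end
        blocks.append(StateBlock(cur.machine_id, cur.timestamp, end, cur.state, cur.shift_id))
    return blocks
```

Deriving durations from timestamped changes, rather than asking operators to enter durations, keeps "event time + duration" consistent by construction.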

These fields are enough to quantify utilization and to separate “we were scheduled” from “we were actually running.” They also create a clean foundation for machine downtime tracking without forcing you into a heavy taxonomy on day one.

The ‘supporting’ fields that prevent bad utilization math (use sparingly)

After you can trust run/idle/down and timestamps, a small set of supporting fields can reduce rework and stop misleading conclusions. The key is restraint: add only what prevents repeated arguments or enables a near-term decision.

Coarse “reason” (Down only): start with 5–8 reasons max (e.g., fault/alarm, tool break requiring intervention, maintenance planned, crash, power/air issue, waiting on service). Don’t force reasons for idle; you’ll end up coding narratives.
Work order / job ID: use it to tie leakage to dispatching and release decisions (“this job sat in idle pending first-article approval”), not to reconcile every ERP field.
Part-count signal (optional): useful as a sanity check for run time (“we had 6 hours of run with zero parts—why?”). Don’t make part counting a prerequisite for utilization.
Notes/annotation for exceptions: capture rare but real holds like first-article approval, QA hold, material shortage, or program prove-out. Notes prevent fake “reason codes” that pollute your categories.
Confidence/source flag: automatic vs manual entry (and optionally “edited”). This lets you audit data quality without accusing anyone, which is especially important in mixed-control environments.
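These supporting fields can double as automatic data-quality checks. A sketch, with assumptions: the reason list paraphrases the examples above, the `Block` shape and `check` function are hypothetical, and the 60-minute floor for the zero-parts check is an arbitrary illustration, not a recommended value.

```python
from dataclasses import dataclass
from typing import List, Optional

# Coarse down reasons paraphrased from the examples above; keep the list to 5-8.
DOWN_REASONS = {"fault/alarm", "tool break", "planned maintenance",
                "crash", "power/air issue", "waiting on service"}
SOURCES = {"auto", "manual", "edited"}  # confidence/source flag values

@dataclass
class Block:
    machine_id: str
    state: str                        # run / idle / down
    minutes: float
    source: str = "auto"
    reason: Optional[str] = None      # only expected on down blocks
    job_id: Optional[str] = None
    part_count: Optional[int] = None  # optional signal; None means "not captured"
    note: str = ""                    # exceptions such as first-article or QA hold

def check(block: Block, zero_part_floor: float = 60) -> List[str]:
    """Return data-quality issues for one block (empty list = no issues)."""
    issues = []
    if block.source not in SOURCES:
        issues.append("unknown source flag")
    if block.state == "down" and block.reason not in DOWN_REASONS:
        issues.append("down block without a coarse reason")
    if block.state != "down" and block.reason is not None:
        issues.append("reason on a non-down block; use a note instead")
    if (block.state == "run" and block.part_count == 0
            and block.minutes >= zero_part_floor):
        issues.append("run time with zero parts")
    return issues
```

A missing part count (`None`) never triggers the sanity check, so part counting stays optional rather than a prerequisite.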

When you do need deeper categorization, treat it as a second phase after state stability—not a prerequisite. For broader context on what monitoring can (and can’t) capture across equipment types, see machine monitoring systems.

You don't need a dedicated IT guy to get real-time visibility, you just need a system that captures the right signals automatically. To see the actual architecture of how this works, read our technical breakdown on manufacturing downtime tracking data fields.

How to calculate true utilization with these fields (worked examples)

You don’t need a long lecture on formulas to use these fields. The calculation is straightforward once the denominator (available time) is declared and state time is trustworthy. Also: utilization is not OEE—don’t mix performance and quality into your first pass or you’ll lose the operational signal.

Example 1: 10-hour shift, three-state timeline → utilization and leakage

Assumptions: a machine is scheduled for a 10-hour window, 6:00–16:00 (available time = 10 hours). State log for the shift:

Time	State	Duration (h:mm)
6:00–7:10	Idle (setup + waiting on first-article approval)	1:10
7:10–10:40	Run	3:30
10:40–11:20	Down (alarm/fault)	0:40
11:20–14:30	Run	3:10
14:30–16:00	Idle (waiting on material + operator covering another machine)	1:30

Totals: Run = 6:40 (6.67 hours), Idle = 2:40 (2.67 hours), Down = 0:40 (0.67 hours). Utilization = 6.67 ÷ 10.00 = 66.7%.
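The same arithmetic as a short script, using the Example 1 log exactly as tabulated. Times are converted to minutes since midnight so durations are simple subtractions; the helper names (`to_min`, `totals`) are illustrative.

```python
def to_min(hhmm: str) -> int:
    """Convert an 'H:MM' clock time to minutes since midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

# Example 1 state log: (start, end, state), copied from the table above.
LOG = [
    ("6:00", "7:10", "idle"),    # setup + waiting on first-article approval
    ("7:10", "10:40", "run"),
    ("10:40", "11:20", "down"),  # alarm/fault
    ("11:20", "14:30", "run"),
    ("14:30", "16:00", "idle"),  # waiting on material + operator covering another machine
]
AVAILABLE_MIN = to_min("16:00") - to_min("6:00")  # planned window: 600 minutes

def totals(log):
    """Sum minutes per state."""
    out = {"run": 0, "idle": 0, "down": 0}
    for start, end, state in log:
        out[state] += to_min(end) - to_min(start)
    return out

t = totals(LOG)
assert sum(t.values()) == AVAILABLE_MIN  # every minute of the window is classified
utilization = t["run"] / AVAILABLE_MIN
biggest_leak = max(("idle", "down"), key=lambda s: t[s])
print(t, f"utilization = {utilization:.1%}", f"biggest leakage bucket = {biggest_leak}")
```

It reports 400 run, 160 idle, and 40 down minutes, 66.7% utilization, and idle as the largest leakage bucket, matching the totals above.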

The actionable insight is not “66.7.” It’s that idle (2:40) is the biggest leakage bucket—driven by first-article approval and material staging—so the immediate management action is a handoff rule: first-article approvals and material kitting must be ready by a set point in the shift, or the job doesn’t get dispatched.

Example 2: Same timeline, idle misclassified as down → the wrong owner

Now imagine the two idle blocks (1:10 and 1:30) get logged as “down” because the machine wasn’t cutting. Your totals become Run = 6:40, Down = 3:20, Idle = 0:00. Utilization is still 66.7%—but the story changes completely.

With “down” dominating, the shop is likely to send the problem to maintenance (or to the machine builder) and hold the wrong meeting. In the Swiss/lathe cell scenario, that’s how you end up “fixing” a machine that was actually waiting on inspection coverage or an operator bounce. This is why three-state definitions matter more than adding codes.
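A sketch of why the misclassification matters: utilization is identical, but the non-run minutes land on a different owner. The owner mapping (idle to production, down to maintenance) follows the ownership split described earlier; `OWNER` and `summarize` are hypothetical names.

```python
def to_min(hhmm: str) -> int:
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

# Ownership split described earlier: idle -> production support, down -> maintenance.
OWNER = {"idle": "production", "down": "maintenance"}

CORRECT = [("6:00", "7:10", "idle"), ("7:10", "10:40", "run"),
           ("10:40", "11:20", "down"), ("11:20", "14:30", "run"),
           ("14:30", "16:00", "idle")]
# Example 2: both idle blocks logged as "down" because the machine wasn't cutting.
MISLOGGED = [(s, e, "down" if st == "idle" else st) for s, e, st in CORRECT]

def summarize(log, available_min=600):
    """Return (utilization rounded to 3 places, non-run minutes by owner)."""
    run = 0
    by_owner = {"production": 0, "maintenance": 0}
    for start, end, state in log:
        minutes = to_min(end) - to_min(start)
        if state == "run":
            run += minutes
        else:
            by_owner[OWNER[state]] += minutes
    return round(run / available_min, 3), by_owner

print("correct:  ", summarize(CORRECT))
print("mislogged:", summarize(MISLOGGED))
```

Utilization comes out at 0.667 both times; only the owner of the 200 non-run minutes changes, from mostly production to entirely maintenance.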

What to review daily: run %, idle %, and down % by machine and by shift. Look for long blocks (e.g., 30–90+ minutes) and repeated patterns. If you need help interpreting patterns at scale, that’s where tooling like an AI Production Assistant can support faster triage—but the value still depends on clean state definitions and timestamps.
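A minimal sketch of that daily review, assuming blocks arrive as flat (machine, shift, state, minutes) tuples. Percentages are computed against the logged minutes for each machine and shift, which equals available time only if the whole window was classified; the 30-minute threshold is taken from the low end of the range above, and the machine name "M12" is hypothetical.

```python
from collections import defaultdict

# Each block: (machine_id, shift_id, state, minutes). Assumed flat input shape.
def daily_review(blocks, long_min=30):
    """Return run/idle/down % per (machine, shift) and non-run blocks >= long_min."""
    minutes = defaultdict(lambda: {"run": 0.0, "idle": 0.0, "down": 0.0})
    long_blocks = []
    for machine, shift, state, mins in blocks:
        minutes[(machine, shift)][state] += mins
        if state != "run" and mins >= long_min:
            long_blocks.append((machine, shift, state, mins))
    shares = {}
    for key, per_state in minutes.items():
        logged = sum(per_state.values())
        shares[key] = {s: round(100 * m / logged, 1) if logged else 0.0
                       for s, m in per_state.items()}
    long_blocks.sort(key=lambda b: b[3], reverse=True)  # longest first
    return shares, long_blocks

# Example 1's shift, keyed to a hypothetical machine "M12" on first shift.
blocks = [("M12", "1st", "idle", 70), ("M12", "1st", "run", 210),
          ("M12", "1st", "down", 40), ("M12", "1st", "run", 190),
          ("M12", "1st", "idle", 90)]
shares, long_blocks = daily_review(blocks)
print(shares)
print(long_blocks)
```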

Implementation without an IT team: capture rules that actually stick

The fastest way to fail is to design a perfect system nobody can maintain. For 10–50 machine shops, the practical rollout is a hybrid: automatic capture where it’s easy, and minimal manual inputs where it’s not—while still producing consistent run/idle/down time.

Start automatic where possible; keep manual inputs simple where not

Consider a mixed-control shop where 6 machines are easy to connect and 10 are not. Don’t wait for full integration. Connect the machines that can provide run signals reliably, and for the rest, require only timestamped state changes (run/idle/down) and a shift ID. The point is comparable utilization across the fleet, not identical data sources.

Manual methods (whiteboards, end-of-shift spreadsheets, “I’ll enter it later”) can work on day one, but they break under multi-shift reality: entries get batch-filled, “down” becomes a catch-all, and the timeline turns into a narrative. Automation is the scalable evolution because it captures state changes as they happen and reduces the incentive to rewrite history.

Set a standard for state-change latency

Define when a change must be captured. A practical rule is: if a stop lasts longer than a short threshold (often 5–15 minutes depending on your work), the operator records idle or down with a note if needed. The goal is to prevent hours of “unknown” time that gets guessed at shift end.
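The latency rule can be checked after the fact by looking for unrecorded gaps inside the planned window. In this sketch, `unknown_gaps` is a hypothetical helper, blocks are (start, end, state) tuples in minutes since midnight, and the 10-minute threshold is an assumed value within the 5–15 minute range mentioned above.

```python
def unknown_gaps(blocks, window_start, window_end, threshold=10):
    """Return (start, end) spans inside the planned window where no state was
    recorded for longer than `threshold` minutes.

    blocks: (start_min, end_min, state) tuples, minutes since midnight,
    assumed to fall inside the window.
    """
    gaps = []
    cursor = window_start
    for start, end, _state in sorted(blocks):
        if start - cursor > threshold:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # tolerate overlapping blocks
    if window_end - cursor > threshold:
        gaps.append((cursor, window_end))
    return gaps

# Hypothetical shift 6:00-16:00 (360-960) where 10:40-11:40 and 14:30-16:00 went unrecorded.
log = [(360, 430, "idle"), (430, 640, "run"), (700, 870, "run")]
print(unknown_gaps(log, 360, 960))
```

Each returned span is time that would otherwise be guessed at shift end; short stops under the threshold are deliberately ignored.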

Daily audit routine (10 minutes)

Once you have even a week of three-state logs, pick your top five machines by scheduled hours and ask three questions in a 10-minute daily review: (1) What were the longest idle blocks yesterday? (2) What were the longest down blocks? (3) Which blocks have no credible note or reason? This is where you recover hidden time before you talk about capital purchases.
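The three audit questions map onto simple queries over the state log. A sketch, assuming blocks are (machine, state, minutes, note) tuples and scheduled hours are known per machine; `daily_audit` and the sample data are hypothetical. It can only detect a missing note, so judging whether a note is credible stays with the people in the review.

```python
from typing import Dict, List, Tuple

# (machine_id, state, minutes, note); an assumed flat shape for one day's blocks.
Block = Tuple[str, str, float, str]

def daily_audit(scheduled_hours: Dict[str, float], blocks: List[Block],
                top_n: int = 5, k: int = 3) -> dict:
    # Focus on the machines with the most scheduled hours.
    focus = sorted(scheduled_hours, key=lambda m: scheduled_hours[m], reverse=True)[:top_n]
    mine = [b for b in blocks if b[0] in focus]

    def longest(state: str) -> List[Block]:
        return sorted((b for b in mine if b[1] == state),
                      key=lambda b: b[2], reverse=True)[:k]

    return {
        "focus_machines": focus,
        "longest_idle": longest("idle"),   # question 1
        "longest_down": longest("down"),   # question 2
        # question 3, reduced to what code can check: non-run blocks with no note
        "no_note": [b for b in mine if b[1] != "run" and not b[3].strip()],
    }

sched = {"M1": 20, "M2": 16, "M3": 10, "M4": 18, "M5": 8, "M6": 12}
day_blocks = [("M1", "idle", 95, ""), ("M4", "down", 40, "spindle alarm"),
              ("M5", "idle", 200, ""), ("M2", "idle", 60, "waiting on first-article"),
              ("M1", "run", 300, "")]
print(daily_audit(sched, day_blocks))
```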

Avoid the downtime-code trap. Expand reason codes only after patterns stabilize; otherwise you’ll spend weeks debating labels instead of fixing the constraint. And make it multi-shift: shift handoff should reference the state record (“Machine 12 is idle waiting on first-article approval”) rather than “we were running fine.”

Implementation considerations and cost framing (without the noise)

When you evaluate implementation, focus on friction: mixed controls, limited IT bandwidth, and whether the approach supports a “minimum viable” model first. Cost should be framed around rollout effort, maintainability, and whether the data will be trusted enough to drive dispatching and staffing decisions. If you need a straightforward way to think about packaging and rollout tradeoffs without hunting for numbers in a meeting, review the pricing page for context on what typically drives cost (machine count, connectivity, and support), then come back to your definitions and capture rules.

If your team wants to pressure-test whether your current fields and definitions are enough to expose leakage—especially across shifts and across “easy” vs “hard” machines—set up a short working session. You can schedule a demo and walk through your current state definitions, what you log today, and what would change decisions next week.
