Machine Down Time Graphs: How to Read Them in CNC Shops
- Matt Ulepic
- May 4

If 1st shift looks “steady” and 2nd shift looks “choppy,” you don’t have a motivation problem—you have a visibility problem. In most CNC job shops, both shifts are working hard. The difference is that lost time clusters differently: a few longer interruptions on one shift versus many short stops on another, often tied to handoffs, staging, tool readiness, or changeover discipline.
Machine down time graphs are how you stop arguing about anecdotes and start isolating where capacity is leaking—by shift, by machine, and by time window—so you can decide what to do today and what to standardize this week.
TL;DR — machine down time graphs
Use graphs to answer where, when, and how often downtime repeats—not to “report totals.”
Always segment by shift and machine before you draw conclusions.
Pick the time window to fit the question: last shift (response), 7 days (repeatability), 30 days (systemic leakage).
Use a Pareto to focus investigation, then validate on a timeline or heatmap.
Watch for micro-stops and shift-boundary spikes—they compound more than one-off outages.
Graphs are only as credible as timestamps and reason capture; noise hides the real constraint.
Convert any “hotspot” into a same-shift action: verify, assign an owner, and re-check next shift.
Key takeaway: ERP notes and end-of-shift summaries rarely match actual machine behavior minute-to-minute. Downtime graphs make the gap visible, especially across shifts, so you can spot recurring idle patterns, validate them on the floor, and recover capacity by fixing the repeatable causes before you consider adding equipment.
What downtime graphs are actually for (and what they’re not)
Downtime graphs are operational diagnostics. They’re meant to help you answer three practical questions: where time is being lost (which machine/cell), when it’s being lost (hour, shift boundary, day-of-week), and how often it repeats (random vs patterned). When you can answer those, you can make a decision that changes what happens on the floor—dispatching, staffing coverage, staging, tool crib response, or setup readiness.
What downtime graphs are not: “pretty reports” for monthly meetings or a single total number that proves someone is underperforming. A total downtime number is often misleading because it blends unlike conditions: different jobs, different people, different changeovers, and different constraints. Without segmentation, you can’t tell whether you’re looking at a one-off incident (power issue, broken tool, crashed probe) or a repeatable leak (material staging gaps, program prove-out patterns, tool preset bottlenecks).
The real payoff is speed. Faster interpretation reduces response latency: you stop waiting until the end of the week to discover that the “pacer” machine spent multiple short windows idle each shift. If you need broader context on building a cadence around downtime visibility (without turning it into bureaucracy), start with machine downtime tracking and then come back here for the graph-reading mechanics.
The 5 downtime graph types that reveal different problems
Shops tend to overuse one chart (usually a total-by-day bar chart) and then wonder why nothing changes. Different visuals are better at exposing different loss modes—especially in multi-shift CNC environments where changeovers, first-article, probing routines, and operator coverage vary.
1) Time series (trend line): “Is this getting better or worse?”
A time series shows downtime moving across days or shifts. It’s best for seeing whether a countermeasure is sticking. Expect to see step-changes (something improved and stayed improved) or oscillation (good day/bad day). Operationally, it helps you ask: “Did last week’s staging change actually reduce the stops—or did it just shift them to a different hour?”
2) Pareto by reason: “Which few causes dominate losses?”
A Pareto ranks downtime reasons so you don’t chase everything. In a job shop, this often surfaces actionable buckets like “waiting on material,” “setup/first-article,” “tooling,” “program issue,” or “operator not present.” The key is to use it as a starting point, then confirm the pattern with a second graph before you act (timeline or heatmap) so you don’t “fix” the wrong thing based on messy reason capture.
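As a rough illustration of the ranking behind a Pareto, the sketch below totals downtime minutes per reason and adds a cumulative share. The event list, reason names, and the `pareto` helper are all hypothetical; a real shop would feed in its own logged stops.

```python
from collections import defaultdict

# Hypothetical downtime events: (reason, minutes). Values are illustrative only.
events = [
    ("waiting on material", 42),
    ("setup/first-article", 35),
    ("tooling", 12),
    ("waiting on material", 28),
    ("program issue", 8),
    ("tooling", 15),
]

def pareto(events):
    """Return (reason, minutes, cumulative_share) rows, largest total first."""
    totals = defaultdict(float)
    for reason, minutes in events:
        totals[reason] += minutes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    rows, running = [], 0.0
    for reason, minutes in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        running += minutes
        rows.append((reason, minutes, running / grand_total))
    return rows

pareto_rows = pareto(events)
for reason, minutes, cumulative in pareto_rows:
    print(f"{reason:22s} {minutes:5.0f} min  {cumulative:6.1%} cumulative")
```

The cumulative column is what makes the "few causes" visible: in this made-up data, the top two reasons account for three quarters of the logged minutes.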
3) Stacked bars by shift: “Which shift is leaking time and why?”
This is the fastest way to compare 1st vs 2nd vs 3rd shift without turning it into a blame discussion. You’re looking for differences in the mix: one shift might have more “setup-related” stops while another has more “waiting” or “minor stops.” When paired with like-for-like job families, shift bars give you a clean direction for what to investigate on the floor.
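The data behind a stacked-by-shift bar is just downtime minutes grouped by shift and then by reason. Here is a minimal sketch, assuming three 8-hour shifts starting at 06:00, 14:00, and 22:00 (an assumption, not something the article specifies) and attributing each stop to the shift in which it started.

```python
from collections import defaultdict
from datetime import datetime

def shift_of(ts):
    """Map a timestamp to a shift label. Boundaries are assumed; adjust to the real schedule."""
    if 6 <= ts.hour < 14:
        return "1st"
    if 14 <= ts.hour < 22:
        return "2nd"
    return "3rd"

# Hypothetical stops: (start time, reason, minutes).
stops = [
    (datetime(2024, 5, 6, 7, 15), "setup/first-article", 40),
    (datetime(2024, 5, 6, 10, 30), "tooling", 12),
    (datetime(2024, 5, 6, 14, 10), "waiting on material", 9),
    (datetime(2024, 5, 6, 14, 45), "waiting on material", 7),
    (datetime(2024, 5, 6, 19, 20), "tooling", 5),
]

def mix_by_shift(stops):
    """Return {shift: {reason: minutes}}; each inner dict is one stacked bar."""
    mix = defaultdict(lambda: defaultdict(float))
    for start, reason, minutes in stops:
        mix[shift_of(start)][reason] += minutes
    return {shift: dict(reasons) for shift, reasons in mix.items()}

shift_mix = mix_by_shift(stops)
for shift, reasons in sorted(shift_mix.items()):
    print(shift, reasons)
```

Assigning a stop to its starting shift is a simplification; a stop that spans a handoff would need to be split at the boundary for an exact comparison.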
4) Heatmap (hour of day vs day of week): “When do losses cluster?”
Heatmaps are pattern detectors. They answer: “Is downtime random, or does it show up at predictable times?” In CNC shops, clusters often show up around shift start, lunch windows, shift handoffs, end-of-day reporting, Monday job release, or after changeovers. A heatmap is also a quick way to see if your biggest issue is a particular window rather than a particular machine.
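One way to build the grid behind such a heatmap is to split every stop at clock-hour boundaries and add the minutes to a 7 x 24 table. The sketch below assumes stops are available as (start, end) pairs with naive local timestamps; the sample data is made up.

```python
from datetime import datetime, timedelta

def heatmap(stops):
    """Return a 7x24 grid of downtime minutes; rows are weekdays (Monday=0), columns are hours."""
    grid = [[0.0] * 24 for _ in range(7)]
    for start, end in stops:
        cursor = start
        while cursor < end:
            # Slice ends at the next clock hour or at the end of the stop, whichever comes first.
            next_hour = cursor.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            slice_end = min(next_hour, end)
            grid[cursor.weekday()][cursor.hour] += (slice_end - cursor).total_seconds() / 60
            cursor = slice_end
    return grid

# Hypothetical stops on Monday 2024-05-06, near an assumed 14:00 shift start.
stops = [
    (datetime(2024, 5, 6, 14, 5), datetime(2024, 5, 6, 14, 12)),
    (datetime(2024, 5, 6, 14, 50), datetime(2024, 5, 6, 15, 10)),
]
grid = heatmap(stops)
print("Mon 14:00 bucket:", grid[0][14], "min")
print("Mon 15:00 bucket:", grid[0][15], "min")
```

Splitting at hour boundaries keeps a long stop from being credited entirely to the hour in which it began, which would otherwise exaggerate a single cell.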
5) Run/Down timeline (Gantt-style): “Are stops random, clustered, or changeover-driven?”
A timeline shows sequences: long runs interrupted by short stops, sawtooth patterns around setups, or repeated brief interruptions that suggest response-time or process friction. It’s the closest thing to “watching the machine,” but at scale across days and shifts. If you’re evaluating broader collection approaches, keep it grounded in operational use (not platform shopping) with machine monitoring systems.
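A timeline view is typically drawn from run/down segments. As a sketch, assuming the machine data arrives as an ordered list of state changes (a hypothetical format), the segments can be derived like this:

```python
from datetime import datetime

# Hypothetical state-change log for one machine: (timestamp, new state), in time order.
changes = [
    (datetime(2024, 5, 6, 14, 0), "down"),
    (datetime(2024, 5, 6, 14, 12), "run"),
    (datetime(2024, 5, 6, 14, 40), "down"),
    (datetime(2024, 5, 6, 14, 43), "run"),
]

def to_segments(changes, window_end):
    """Turn state changes into (state, start, end) segments; the last one ends at window_end."""
    ends = [ts for ts, _ in changes[1:]] + [window_end]
    return [(state, start, end) for (start, state), end in zip(changes, ends)]

segments = to_segments(changes, datetime(2024, 5, 6, 15, 0))
down_minutes = [
    (end - start).total_seconds() / 60
    for state, start, end in segments
    if state == "down"
]
print(down_minutes)
```

Each segment maps to one bar on the timeline, and the list of down durations is what later tells you whether the machine is seeing a few long stops or many short ones.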
How to segment downtime graphs so trends show up fast
Segmentation is where downtime graphs become a decision tool instead of background noise. The goal is not to create more views—it’s to isolate signals quickly enough that a supervisor can validate them during the same shift.
Shift segmentation: compare like-for-like
Start by splitting by shift. If possible, compare periods with similar job mix or at least similar routing complexity (heavy setups vs repeat work). If 2nd shift has more first-article and prove-out, you may expect a different downtime profile—but the graph should still show whether the losses are concentrated in predictable windows (handoff, staging, tool readiness) versus scattered.
Machine segmentation: isolate chronic offenders vs systemic issues
Next, split by machine (or cell). If one mill repeatedly shows “waiting on material” while adjacent machines don’t, that’s a routing/kitting/dispatch problem or a machine-specific workflow dependency—not a shop-wide shortage. Conversely, if many machines show short stops at similar times, you’re likely looking at shared resources (tool crib, forklift availability, inspection bottleneck, programming support).
Time-window selection: last shift vs 7 days vs 30 days
Pick the time window based on the decision you’re making:
Last shift: best for immediate response—what should the supervisor address right now?
Last 7 days: best for repeatability—does the pattern come back across different days and jobs?
Rolling 30 days: best for systemic leakage—small recurring stops that compound and quietly consume capacity.
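The three windows above amount to filtering the same stop log with different cutoffs. A minimal sketch, assuming 8-hour shifts and a hypothetical list of (start, minutes) stops:

```python
from datetime import datetime, timedelta

WINDOWS = {
    "last_shift": timedelta(hours=8),  # assumes 8-hour shifts
    "7_days": timedelta(days=7),
    "30_days": timedelta(days=30),
}

def stops_in_window(stops, now, window):
    """Keep stops whose start time falls inside the chosen look-back window."""
    cutoff = now - WINDOWS[window]
    return [stop for stop in stops if cutoff <= stop[0] <= now]

# Hypothetical stops: (start time, minutes).
now = datetime(2024, 5, 31, 22, 0)
stops = [
    (datetime(2024, 5, 31, 15, 10), 6),
    (datetime(2024, 5, 28, 9, 0), 45),
    (datetime(2024, 5, 6, 14, 20), 8),
]
counts = {name: len(stops_in_window(stops, now, name)) for name in WINDOWS}
print(counts)
```

Keeping the window as a named parameter makes it harder to mix up a response question with a systemic one when the same chart is reused.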
Filter strategy: planned vs unplanned depends on the question
If you’re trying to recover throughput this week, you may exclude planned downtime (breaks, scheduled maintenance windows) so the graph focuses on controllable interruptions. If you’re trying to understand why a schedule keeps slipping, you may include planned elements to see how much calendar time is truly available versus what the ERP assumes.
Reason-code rollups: keep actionability without a 40-item list
If your Pareto has dozens of tiny categories, the graph becomes an argument about labels. If you only have two categories (“down” and “running”), you can’t act. Roll up reasons into a manageable set that maps to owners: material/kitting, setup/first-article, tooling, program/engineering, maintenance, and operator coverage. The point is credibility: consistent timestamps and consistent capture so the picture reflects real behavior, not end-of-shift memory.
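A rollup can be as simple as a lookup table from raw reason codes to the owner-aligned buckets named above. The raw codes in this sketch are invented for illustration; only the six bucket names come from the text.

```python
# Hypothetical raw reason codes mapped to the six owner-aligned buckets.
ROLLUP = {
    "no material at machine": "material/kitting",
    "kit incomplete": "material/kitting",
    "first article inspection": "setup/first-article",
    "fixture change": "setup/first-article",
    "tool broke": "tooling",
    "waiting on preset": "tooling",
    "program edit": "program/engineering",
    "spindle alarm": "maintenance",
    "operator at other machine": "operator coverage",
}

def roll_up(raw_reason):
    """Normalize a raw reason and return its bucket; unknown codes are flagged, not guessed."""
    return ROLLUP.get(raw_reason.strip().lower(), "unclassified")

print(roll_up(" Tool Broke "))
print(roll_up("coffee"))
```

Sending unknown codes to an explicit "unclassified" bucket keeps gaps in the mapping visible on the Pareto instead of silently inflating another category.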
When segmentation reveals recoverable time loss, that’s capacity you can often reclaim before considering new equipment. For a deeper utilization-oriented view (still grounded in shop-floor behavior), see machine utilization tracking software.
Patterns to look for (and what they usually indicate)
Once you have the right graph and segmentation, most shops see the same pattern families. The value is not the chart—it’s having a short list of first hypotheses to test on the floor.
Repeated short stops
If you see frequent brief interruptions, think friction: delayed response time, tool offset adjustments, probing cycles that create waiting, chip management interruptions, or an operator covering multiple machines. First check: are stops clustered around specific operations (probe, tool change, washdown) or tied to operator coverage gaps?
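To tell "many short stops" apart from "a few long ones," it can help to split stop durations at a threshold and compare counts and minutes. The 5-minute threshold and the durations below are assumptions for illustration; the article does not define a micro-stop cutoff.

```python
def split_stops(durations_min, threshold_min=5):
    """Summarize stops below and at-or-above a threshold (threshold is an assumed cutoff)."""
    short = [d for d in durations_min if d < threshold_min]
    long_ = [d for d in durations_min if d >= threshold_min]
    return {
        "short_count": len(short),
        "short_minutes": sum(short),
        "long_count": len(long_),
        "long_minutes": sum(long_),
    }

# Hypothetical stop durations (minutes) for one machine on one shift.
durations = [2, 3, 1.5, 45, 4, 2.5, 3, 60]
summary = split_stops(durations)
print(summary)
```

A high short-stop count with modest total minutes points toward friction and response time, which is a different investigation from two long outages.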
Spikes at shift start/end
Shift boundaries often show a repeatable downtime signature: warm-up routines, incomplete handoff notes, material not staged at the machine, tool carts not reset, or reporting friction (everyone trying to close out at once). If the graph shows a consistent downtime spike at the start of shift, don’t assume “people are late”; verify whether the machine is ready to run at the bell.
Sawtooth run/down around changeovers
A sawtooth timeline (run, stop, run, stop) around job transitions often points to setup readiness issues: missing fixtures, tool presetting delays, program prove-out iterations, or inspection sign-off pauses. The operational question is: “What must be true before the previous job ends so the next job starts clean?”
One machine outlier
When one machine looks different from its neighbors, look for machine-specific conditions: training gaps on that control, fixture condition, program stability, or how material is presented at that station. A reason Pareto can hint at the category, but a timeline shows whether it’s a few long interruptions or many short ones.
Weekend/Monday effects
If losses cluster on Mondays or right after weekends, think upstream release and readiness: material receiving and put-away timing, job packets not fully released, kitting not complete, or planned maintenance windows that compress available time. You’re not forecasting failures—you’re spotting calendar-driven constraints that repeat and can be managed.
Two shift-focused walkthroughs: reading the graphs to find the real constraint
The fastest way to get value from machine down time graphs is to walk from “pattern” to “question” to “verification,” then use a second view to confirm before you change process. The two walkthroughs below mirror what shows up in many multi-shift job shops.
Walkthrough 1: 2nd shift micro-stops concentrated in the first 90 minutes
What you see: A stacked-by-shift view shows 1st shift with longer continuous runs, while 2nd shift has frequent short downtime bursts. When you switch to an hour-of-day heatmap (segmented to 2nd shift), the hotspot lands in the first 60–90 minutes of that shift.
First hypotheses to test: staging isn’t complete at handoff; tool crib response is slower early 2nd shift; setups are being “handed over” midstream; the first job is waiting on inspection sign-off; or the operator is doing startup tasks that could be standardized.
What to ask supervisors/operators: “Is the first job fully kitted at the machine before you clock in?” “Are the next tools preset and labeled, or are you hunting for lengths/inserts?” “Are you waiting on a program change or first-article approval?” “What’s the most common reason you stop in that first hour?”
Triangulate with a second graph: Pull a run/down timeline for the same machines and zoom into that first 90-minute window across several days. If you see repeated short interruptions right after shift start, you likely have a repeatable handoff/staging issue. If it’s one longer stop most days, it may be a single gating task (inspection, program prove-out, or material delivery rhythm).
Walkthrough 2: Monday “waiting on material” spikes on one mill after changeovers
What you see: One mill shows repeated “waiting on material” downtime while adjacent machines don’t. In a 7-day view, it looks annoying but not decisive. In a 30-day rolling view (segmented by machine), the pattern is clearer: it spikes on Mondays and tends to appear after job changeovers.
First hypotheses to test: that machine’s routing depends on a specific saw/cut length or outside process return; kitting rules aren’t aligned to its dispatch priority; Monday release is happening before material is received/put away; or changeovers are being started before the next job’s material is physically staged.
What to ask: “Is the job packet released before material is checked in?” “Where is the ‘missing’ material usually found—receiving, saw, WIP rack, inspection?” “Are we starting setups without confirming the next kit is complete?” “Does dispatch prioritize this machine in a way that conflicts with kitting sequence?”
Triangulate with a second graph: Use a heatmap (day-of-week vs hour) to confirm the Monday clustering, then use a timeline around changeover periods to confirm whether “waiting” begins immediately after the last job ends (dispatch/kitting) or mid-setup (missing tool/fixture/material variant).
Also watch for the time-window trap: a 24-hour graph might show a big outage that dominates the view, but a 30-day rolling graph can reveal the bigger loss is daily micro-stops tied to probing cycles, tool changes, or small response-time gaps. One-off events matter—but recurring small stops are usually where capacity quietly disappears.
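The time-window trap is easy to see with back-of-the-envelope arithmetic. Every figure below is hypothetical, chosen only to show how small recurring stops can outweigh a single visible outage over a month.

```python
# All figures are hypothetical, chosen only to illustrate the comparison.
outage_minutes = 180     # one 3-hour outage during the month
stops_per_shift = 6      # short stops per shift
minutes_per_stop = 3
shifts_per_day = 2
working_days = 22        # working days in the 30-day window

micro_minutes = stops_per_shift * minutes_per_stop * shifts_per_day * working_days

print(f"one-off outage:           {outage_minutes} min")
print(f"micro-stops over 30 days: {micro_minutes} min")
print(f"ratio: {micro_minutes / outage_minutes:.1f}x")
```

Under these assumed numbers the micro-stops add up to 792 minutes, more than four times the outage that would dominate a 24-hour chart.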
If you’re spending time arguing about what a pattern “means,” interpretation support can help teams move faster from chart to investigation. An example is an AI Production Assistant that helps translate machine-state patterns and reason mixes into a short list of plausible checks—without turning the exercise into a report-writing task.
Common graph mistakes that hide downtime leakage
Most “we already knew that” charts fail because they average away the signal or use a misleading window. These are the frequent traps that keep downtime leakage invisible.
Averaging across machines and shifts until the signal disappears: A shop-level average can look stable while one pacer machine is repeatedly idle during specific windows.
Letting one major event dominate the view: A single outage can drown out daily micro-stops. Always re-check with a longer rolling window.
Too many downtime reasons (noise) or too few (non-actionable): If people can’t pick a reason in 10–30 seconds, the data won’t be consistent; if reasons are too broad, you can’t assign an owner.
Missing timestamps or inconsistent auto-logged down states: End-of-shift recall can shift stops into the wrong hour; over-sensitive auto rules can overstate down time. Credibility comes from consistent timestamps and clear “what counts” definitions.
Treating graphs as monthly reporting: If charts are only reviewed monthly, you’ll spot issues long after the floor adapted around them.
Turning a downtime graph into action within one shift
A downtime graph is useful only if it changes the next shift. The simplest cadence is lightweight and repeatable—no “dashboard ritual,” just disciplined follow-through.
A simple workflow
Identify hotspot: pick one machine/shift/window that shows repeatable loss (heatmap or shift bars), not “the whole shop.”
Validate on the floor: confirm whether the stop matches reality (material missing, tool not preset, waiting on QC, program question).
Assign an owner: supervisor, tooling, engineering, scheduling, or material/kitting—who can change the condition before the next occurrence?
Re-check next shift: did the pattern shrink, move, or stay the same? If it moved, you may have fixed a symptom, not the cause.
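The re-check step can be sketched as a comparison of per-hour downtime before and after the countermeasure. The `recheck` helper and its 20% tolerance are assumptions, not an established rule; the point is only to make "shrunk, moved, or same" a repeatable call.

```python
def recheck(before, after, hotspot_hour, tolerance=0.2):
    """Classify a hotspot as 'shrunk', 'moved', or 'same'.

    before/after map hour of day -> downtime minutes for the same machine and shift.
    """
    baseline = before.get(hotspot_hour, 0)
    if baseline <= 0:
        raise ValueError("hotspot_hour had no downtime in the baseline")
    floor = baseline * (1 - tolerance)  # at or above this counts as not improved
    if after.get(hotspot_hour, 0) >= floor:
        return "same"
    peak_hour = max(after, key=after.get, default=hotspot_hour)
    if peak_hour != hotspot_hour and after[peak_hour] >= floor:
        return "moved"
    return "shrunk"

# Hypothetical per-hour minutes for one machine on 2nd shift.
before = {14: 30, 15: 5}
print(recheck(before, {14: 8, 15: 28}, hotspot_hour=14))
print(recheck(before, {14: 8, 15: 6}, hotspot_hour=14))
print(recheck(before, {14: 29, 15: 5}, hotspot_hour=14))
```

A "moved" result is the cue from the workflow above: the loss reappeared in another hour at a similar size, so the countermeasure likely addressed a symptom.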
What “good” looks like
You’re not chasing a perfect day—you’re reducing repeated patterns, tightening variance between shifts, and shortening recovery time when an interruption happens. Over time, the visual should show fewer recurring clusters (especially around shift start, changeovers, and recurring “waiting” states).
Escalation thresholds: supervisor vs engineering vs scheduling
If the pattern is tied to staging, coverage, or handoff readiness, it’s often a supervisor-led fix (standard work, pre-stage checklist, tool cart reset). If it’s tied to program prove-out iterations, probing strategy, or unstable processes, it needs engineering attention. If it’s tied to Monday spikes, kitting order, or starting setups without confirmed material, it’s usually scheduling/material flow. The graph helps you route the problem to the right owner faster.
Document the countermeasure so the graph improves for the right reason
Write down what changed (staging rule, tool preset process, dispatch rule, handoff checklist) and when it started. Otherwise, a graph can look “better” because people changed how they code reasons, not because the floor got more stable. This is also where cost framing matters: the real expense is usually not the software—it’s the internal inconsistency that creates untrusted data. If you need implementation expectations and what a rollout typically entails (without hunting for a number), see pricing for practical packaging context.
A useful diagnostic check before you buy anything: can your current process produce consistent timestamps and a small set of actionable reasons across shifts? If not, your graphs will stay debatable—and you’ll be tempted to solve a visibility problem with capex. Fix the measurement and response loop first; then decide whether you need automation to scale it.
If you want to see what these graph views look like on a mixed fleet (including legacy equipment) and how teams use them at shift level to isolate recurring downtime, you can schedule a demo.









