Machine Downtime Chart: 6 Charts That Reveal Repeat Issues
- Matt Ulepic
- 4 days ago
- 9 min read

If your ERP says you “hit schedule” but the floor feels like it’s constantly catching up, the gap is usually hidden time loss: small stoppages, slow changeovers, and recurring delays that don’t show up clearly in end-of-shift totals. A machine downtime chart is only useful if it turns that fog into a pattern you can assign, verify, and fix—by machine, shift, time-of-day, job, and reason code.
The goal isn’t “reporting.” It’s faster diagnosis: seeing the same loss repeat in the same window so supervisors and maintenance can act within days—not after month-end cleanup.
TL;DR — Machine Downtime Chart
- Pick charts based on the decision you need to make (process fix, staffing, scheduling, maintenance response), not what looks clean.
- Always separate planned vs. unplanned time, or you'll "find problems" in good scheduling.
- Use event-level timestamps when you're hunting repeat windows (e.g., the same 20–30 minute slot every night).
- Pareto by reason tells you what consumes the most time; duration distributions tell you whether it's chronic micro-stops or rare big hits.
- Heatmaps expose time-of-day and shift patterns that averages hide.
- Any review rhythm must end with an owner, a deadline, and a verification chart next week.
Key takeaway: Downtime charts become decision tools only when they reflect actual machine behavior at the event level (timestamps, planned vs. unplanned separation, and reason codes), so you can spot repeat loss by shift and time window and recover capacity before you assume you need more machines.
What a machine downtime chart should answer (before you pick a chart)
Before you choose a chart, decide what action it must trigger. In a 10–50 machine CNC shop, the “best” view is the one that lets someone on the team make a call without a separate investigation: dispatch maintenance, remove a process bottleneck, adjust scheduling, or retrain on a repeat setup/workholding issue.
A downtime chart should answer four minimum questions:
- Where is it happening? (machine, cell, or "pacer" machine)
- When is it happening? (time-of-day and shift, not just the date)
- Why is it happening? (reason code, even if simplified)
- How big is it? (duration and frequency; both matter)
Next: separate planned vs. unplanned downtime. Breaks, scheduled maintenance, meetings, and planned changeovers shouldn’t “pollute” your problem chart. If you mix planned time into the same view as alarms, waiting on material, or probe checks, the chart will punish good scheduling and send the team chasing the wrong fixes.
Finally, decide whether you need event-level data (start/end timestamps for each stop) or whether a rolled-up total is acceptable. Totals can work for a monthly “where do we spend the most unplanned time?” discussion. But if you’re trying to see repeat windows by shift or recurring micro-stops during a job family, totals are too blunt.
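To make that distinction concrete, here is a minimal sketch in pandas, assuming event records with start/end timestamps and a planned flag (the column names, machine names, and sample events are illustrative, not a prescribed schema):

```python
import pandas as pd

# Illustrative event-level records; column names are an assumption, not a standard.
events = pd.DataFrame({
    "machine": ["VMC-1", "VMC-1", "Lathe-2"],
    "start": pd.to_datetime(["2024-05-06 21:35", "2024-05-06 23:10", "2024-05-07 02:05"]),
    "end": pd.to_datetime(["2024-05-06 21:58", "2024-05-06 23:18", "2024-05-07 02:09"]),
    "reason": ["material staging", "alarm", "probe check"],
    "planned": [False, False, False],
})
events["minutes"] = (events["end"] - events["start"]).dt.total_seconds() / 60

# Problem views should see unplanned time only; planned time belongs in
# capacity and scheduling discussions, not in the problem chart.
unplanned = events[~events["planned"]]

# A rolled-up total answers "how much?"; the event rows above answer "when, exactly?"
print(unplanned.groupby("machine")["minutes"].sum())
```

The key design choice is that the planned flag is set at capture time, so every downstream chart can apply the same filter instead of re-deciding what counts as a problem.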
Assign an owner to each question so the chart has a destination:
- Ops manager: patterns across machines/shifts; prioritization and escalation
- Shift supervisor: daily repeat windows; execution of quick fixes
- Maintenance: long-stop events; response time and parts readiness
- Programmer/process engineer: job-family friction; probing/offset verification loops; setup workflow
The 6 downtime chart types that surface different problems
Most shops don’t need more charts—they need the right few views that expose different failure modes. The six below cover the majority of “why are we losing time?” questions in multi-shift CNC operations.
1) Pareto chart by downtime reason
Use this when you need to focus. A Pareto view answers: “Which few unplanned reasons account for most lost time?” It’s the fastest way to avoid spreading attention across 20 minor categories.
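A Pareto is just sorted totals plus a cumulative percentage. A minimal sketch, assuming weekly unplanned minutes already rolled up by reason (the reason codes and numbers are invented):

```python
import pandas as pd

# Hypothetical weekly totals of unplanned minutes by reason code.
minutes = pd.Series(
    {"material staging": 410, "alarm": 260, "probe check": 180,
     "tool change": 90, "inspection wait": 60, "other": 40}
)

pareto = minutes.sort_values(ascending=False)
cum_pct = pareto.cumsum() / pareto.sum() * 100

# The "vital few": reasons that together account for roughly 80% of lost time.
print(pareto[cum_pct <= 80])
```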
2) Stacked bar by machine (with reasons)
This answers: “Where does downtime accumulate, and is it the same mix of causes everywhere?” Two machines can have similar total downtime but completely different drivers—one mostly waiting on material, another mostly alarms. That difference dictates who should act first.
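One way to build that view, assuming minutes are already attributed per event (machine names and values below are made up):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical unplanned minutes per machine, broken out by reason.
events = pd.DataFrame({
    "machine": ["VMC-1", "VMC-1", "VMC-2", "VMC-2", "Lathe-2"],
    "reason":  ["alarm", "material", "material", "alarm", "probe check"],
    "minutes": [120, 40, 210, 30, 95],
})

# Pivot to machine x reason, then stack the reasons within each bar.
mix = events.pivot_table(index="machine", columns="reason",
                         values="minutes", aggfunc="sum", fill_value=0)
mix.plot(kind="bar", stacked=True, ylabel="Unplanned minutes")
plt.tight_layout()
plt.show()
```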
3) Downtime timeline (Gantt-style) per machine
A timeline shows clustering: stops around job changeovers, repeated short interruptions, and the handful of long events that can dominate a day. It’s also the easiest way to see whether “one bad event” is skewing your perception.
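Matplotlib's broken_barh handles this kind of timeline directly; a minimal sketch with invented stop times for one machine:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stop events for one machine over a single day.
stops = pd.DataFrame({
    "start": pd.to_datetime(["2024-05-06 07:10", "2024-05-06 07:25", "2024-05-06 13:40"]),
    "end":   pd.to_datetime(["2024-05-06 07:18", "2024-05-06 07:31", "2024-05-06 14:55"]),
})

fig, ax = plt.subplots()
# broken_barh takes (start, width) pairs; convert timestamps to matplotlib date numbers.
spans = [(mdates.date2num(s), mdates.date2num(e) - mdates.date2num(s))
         for s, e in zip(stops["start"], stops["end"])]
ax.broken_barh(spans, (0, 1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax.set_yticks([])
plt.show()
```

Two short stops clustered at the 7 AM changeover and one long afternoon event jump out immediately, which is exactly the skew a daily total hides.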
4) Heatmap by time-of-day vs machine/shift
Heatmaps reveal repeat windows: the same 20–30 minute loss at the same time, often tied to staging, inspection queues, tool crib availability, or handoff gaps—not the machine itself. If your operation runs multiple shifts, this is one of the highest-signal views you can use.
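A rough heatmap sketch: bucket each event's start by hour, pivot to a machine-by-hour grid, and plot it (the sample data is invented):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical unplanned events with start timestamps.
events = pd.DataFrame({
    "machine": ["VMC-1", "VMC-2", "VMC-1", "VMC-2", "VMC-1"],
    "start": pd.to_datetime(["2024-05-06 21:40", "2024-05-06 21:50",
                             "2024-05-07 21:35", "2024-05-07 09:10",
                             "2024-05-08 21:45"]),
    "minutes": [22, 18, 25, 10, 20],
})
events["hour"] = events["start"].dt.hour

# Rows = machine, columns = hour of day, cells = total minutes lost.
grid = events.pivot_table(index="machine", columns="hour",
                          values="minutes", aggfunc="sum", fill_value=0)
plt.imshow(grid.values, aspect="auto", cmap="Reds")
plt.xticks(range(len(grid.columns)), grid.columns)
plt.yticks(range(len(grid.index)), grid.index)
plt.xlabel("Hour of day")
plt.colorbar(label="Unplanned minutes")
plt.show()
```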
5) Scatter plot of downtime events (duration vs time) or bubble chart (duration + reason)
This is your outlier detector. Plot each event so you can separate “chronic small stuff” from “rare but devastating” events. Adding reason codes as color (or bubble size by duration) helps you see whether a particular cause produces lots of small hits or a few large ones.
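A minimal scatter sketch, one point per event, colored by reason (the events are illustrative):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical events: when each stop started and how long it lasted.
events = pd.DataFrame({
    "start": pd.to_datetime(["2024-05-06 08:05", "2024-05-06 21:40",
                             "2024-05-07 10:15", "2024-05-07 21:35"]),
    "minutes": [3, 24, 95, 26],
    "reason": ["probe check", "material", "alarm", "material"],
})

# One point per event: chronic micro-stops cluster low, rare big hits sit high.
for reason, grp in events.groupby("reason"):
    plt.scatter(grp["start"], grp["minutes"], label=reason)
plt.ylabel("Duration (min)")
plt.legend()
plt.show()
```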
6) Histogram or box plot of downtime durations by reason
This answers: “Is this reason code mostly micro-stops or mostly long events?” If “probe check” creates dozens of 2–4 minute interruptions, you’ll never see the real pain in a single average. A distribution view makes that pattern obvious.
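A box-plot sketch that makes the micro-stop pattern visible (durations are invented to mimic the probe-check case):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical unplanned durations: "probe check" is many short stops,
# "alarm" is a few long ones; a single average would blur them together.
events = pd.DataFrame({
    "reason": ["probe check"] * 6 + ["alarm"] * 2,
    "minutes": [2, 3, 2, 4, 3, 2, 85, 120],
})

# One box per reason code: the spread, not the mean, tells the story.
events.boxplot(column="minutes", by="reason")
plt.ylabel("Duration (min)")
plt.show()
```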
If you’re building the broader capture discipline (not just the charts), see machine downtime tracking for context on event capture and why timestamps matter.
How to structure downtime data so the charts don’t lie
Charts don’t fail because the visualization is wrong; they fail because the underlying downtime data collapses important context. If you want recurrence detection (same machine + same time window + same reason + same shift), you need event records that preserve that detail.
At minimum, each downtime event should include (a minimal record sketch follows this list):
- Start time and end time (or start + duration)
- Machine (and cell/line if relevant)
- Shift (don't infer it later; store it)
- Job/part or job family (so you can see "this only happens on these parts")
- Reason code
- Operator (optional; useful when the goal is coaching or training, not blame)
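Here is one way to express that record; the field names are illustrative, not a required schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class DowntimeEvent:
    machine: str                    # e.g., "VMC-1"; add cell/line if relevant
    start: datetime
    end: datetime
    shift: str                      # stored at capture time, never inferred later
    job: str                        # job/part or job family
    reason: str                     # from a disciplined, short code list
    planned: bool                   # planned vs. unplanned, decided up front
    operator: Optional[str] = None  # optional: for coaching, not blame

    @property
    def minutes(self) -> float:
        return (self.end - self.start).total_seconds() / 60
```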
Normalize planned categories so they’re consistent: breaks, scheduled maintenance, meetings, and planned changeover should be labeled the same way every time. Then filter planned downtime out of “problem” views and keep it in capacity and scheduling discussions.
Keep reason codes disciplined. If “Other” becomes a landfill, the Pareto becomes meaningless. A simple rule that works: if someone selects “Other,” require a quick note and a “next-best” reason (material, tool, program, inspection, maintenance, setup, quality, etc.) so the event can be reclassified during the daily review.
Handle overlapping states with a consistent rule. Example: the machine alarms, and while the operator waits, material also isn’t staged. Decide how you’ll code it (e.g., first-cause wins, or “dominant time” wins) and stick with it. Inconsistent coding creates false shifts in the Pareto.
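As a sketch of a "first-cause wins" rule, assuming each concurrent state carries its own start time (the states below are invented):

```python
from datetime import datetime

# Hypothetical overlapping states during one stoppage: the alarm began first,
# then material was also found unstaged. Under "first-cause wins", the whole
# event is coded to whichever state started earliest.
states = [
    {"reason": "alarm",            "start": datetime(2024, 5, 6, 21, 40)},
    {"reason": "material staging", "start": datetime(2024, 5, 6, 21, 43)},
]

def first_cause(states):
    """Pick one reason code for an overlapping span: earliest start wins."""
    return min(states, key=lambda s: s["start"])["reason"]

print(first_cause(states))  # -> "alarm"
```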
Finally, aim for minute-level timestamps instead of end-of-shift summaries. Summaries can tell you “second shift had more downtime,” but they can’t show you that it spikes in a repeat 9:30–10:00 PM window. For mixed fleets and legacy machines, that’s also where manual logging breaks down—operators remember the big stops, not the repeated small ones. Moving from manual logs to near-real-time event capture is often the scalable evolution that makes the charts trustworthy.
For additional context on monitoring approaches without turning this into a “platform tour,” see machine monitoring systems.
Reading patterns: what each chart is telling you to investigate
A good downtime chart doesn’t “prove” root cause; it narrows the search to a small set of hypotheses you can validate quickly on the floor. Use these heuristics to move from visual pattern to next action.
- Pareto pattern: if the top 1–3 unplanned reasons dominate, schedule spot checks and supervisor notes within 24–48 hours. Confirm that the reason coding matches reality and that the "big bars" aren't just better-coded machines.
- Heatmap pattern: recurring hot windows usually point upstream. Look for material staging gaps, inspection queueing, tool crib constraints, program release timing, or shift handoff timing. If the window aligns to breaks, meetings, or planned tasks, verify whether it's incorrectly classified as unplanned.
- Timeline clustering: if stops cluster around changeovers, don't jump straight to "operators are slow." Check setup readiness: tools kitted, fixtures ready, first-article process, gage availability, and whether the next job is queued before the current one finishes.
- High frequency / low duration: repeated 1–5 minute stops are often process friction (offset checks, chip clearing, part handling, door opens, probing/verification loops), not a single mechanical failure. Your leverage is standard work, better kitting, program edits, or job-family-specific setup improvements.
- Low frequency / high duration: when a few long events dominate, focus on escalation and response. Who is notified, how quickly, what's the triage path, and are parts/tools available? The fix is often less about the machine and more about reducing "time-to-help."
If you’re using an assistant to interpret patterns across many machines and shifts, the goal should be faster triage and clearer questions for the team—not predictions. That’s where an AI Production Assistant can help summarize repeat windows and reason clusters without turning the work into spreadsheet archaeology.
Two shop-floor scenarios: the same downtime data, better charts, faster fixes
Below are two realistic scenarios using simplified assumptions (a week of downtime events across a small fleet, with planned vs. unplanned separated and a handful of reason codes). The point is to show how the same underlying events can lead to very different conclusions depending on chart choice.
Scenario 1: Second shift looks “worse,” but only on two machines
You start with a single summary view: downtime by shift. It shows second shift has higher downtime. The wrong conclusion is “second shift needs more oversight” or “those operators are the issue.”
Now split it properly:
- Filter to unplanned downtime only (planned breaks and planned changeovers removed).
- Use a stacked bar by machine (reasons stacked) and slice by shift.
- Add a heatmap (time-of-day vs machine, or time-of-day vs shift).
What it reveals: second shift’s extra downtime is concentrated on two machines, with a repeat stoppage around 9:30–10:00 PM. The dominant reason code isn’t “alarm” or “maintenance”—it’s material staging/changeover delay. That changes the fix from “repair the machine” to “stage the next job and tools before the changeover window.”
Action: The shift supervisor owns a staging checklist (material pulled, program released, tool cart ready) triggered 30–60 minutes before the changeover, and the ops manager verifies the handoff timing with receiving/tool crib if needed.
What to monitor next week: a heatmap focused on the 9:00–10:30 PM window for those two machines, plus a timeline view on nights when the stop still occurs to see whether it’s truly staging or a different hidden constraint.
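One way to run that verification check, assuming the same kind of illustrative event table (machine IDs, reasons, and timestamps are invented): pandas' between_time isolates the suspect window so you can count hits per night.

```python
import pandas as pd

# Hypothetical week of unplanned events on the two suspect machines.
events = pd.DataFrame({
    "machine": ["VMC-3", "VMC-3", "VMC-7", "VMC-3"],
    "start": pd.to_datetime(["2024-05-06 21:34", "2024-05-07 21:41",
                             "2024-05-07 21:52", "2024-05-08 14:10"]),
    "reason": ["material staging", "material staging",
               "material staging", "alarm"],
})

# Isolate events that start inside the suspect 9:30-10:00 PM window,
# then count hits per night per machine to see whether the fix is holding.
window = events.set_index("start").between_time("21:30", "22:00")
print(window.groupby([window.index.date, "machine"]).size())
```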
Scenario 2: A lathe shows “low downtime %,” but bleeds time in micro-stops
A CNC lathe looks fine on an aggregate chart: “low downtime %” for the week. The wrong conclusion is that it’s not worth attention, so you chase louder problems elsewhere.
But operators complain it “never runs clean” on one job family. To see that, you need two chart types that averages hide:
- A histogram/box plot of downtime durations for that machine, filtered to unplanned.
- A Pareto by reason code filtered to the specific job family.
What it reveals: lots of 2–4 minute interruptions (micro-stops) that spike during that job family. The Pareto points to recurring probe/offset verification stops. On a summary chart, those look like noise; in a distribution view, they’re clearly the dominant pattern.
Action: The programmer/process engineer owns a short review: is the probing cycle overly conservative, are offsets being reset unnecessarily, is there a gage strategy problem, or does the setup sheet lack a standard verification step that prevents repeated “double-check” pauses?
What to monitor next week: the same duration distribution for that reason code, plus a stacked bar by job family to confirm the micro-stop cluster is shrinking rather than just being recoded as “Other.”
Both scenarios share a theme: recover capacity by eliminating repeat loss before assuming you need capital equipment. If you can’t see repeat windows and micro-stops, you’ll misdiagnose the constraint.
Operational cadence: how often to review downtime charts (and with whom)
Charts only create visibility if they show up in a repeatable rhythm. In multi-shift CNC shops, the simplest cadence is the one that keeps chart review short, consistent, and tied to ownership.
- Daily (10–15 minutes) with the shift lead: review top unplanned downtime reasons and the biggest downtime events (timeline view on the pacer machines). The purpose is quick triage and reclassification of "Other," not a deep meeting.
- Weekly (30–60 minutes) with ops + leads + maintenance: review Pareto trends by machine and shift, plus at least one heatmap slice to catch time-of-day windows. Confirm whether last week's fixes reduced recurrence, or merely moved it to a different reason code.
- Monthly (60 minutes) with ops + process engineering: identify chronic categories that deserve engineering work (fixture improvements, setup standardization, inspection flow changes) instead of repeated supervisor firefighting.
Use clear escalation rules: if a supervisor fix doesn’t reduce recurrence within a week (or if the same time window keeps lighting up), escalate to a cross-functional fix with maintenance, tooling, programming, or scheduling. The anti-pattern to avoid is reporting-only: every review should end with an owner, a deadline, and the exact chart that will verify the change.
Implementation note: if you’re still relying on manual end-of-shift entries, expect missing micro-stops and fuzzy timestamps. Moving to automated capture doesn’t have to be a heavy IT project, but you should plan for (1) reason-code discipline, (2) planned vs unplanned definitions, and (3) how you’ll review and correct data weekly. For capacity-focused context, see machine utilization tracking software.
If you’re evaluating whether it’s worth instrumenting this now, keep the cost framing practical: the expense is less about “software features” and more about whether you can consistently capture timestamped events across a mixed fleet and turn them into a weekly operating rhythm. You can review implementation expectations on the pricing page without needing a long evaluation cycle.
If you want to sanity-check your current downtime views against the six charts above—especially planned vs unplanned separation and recurrence by shift/time window—use a short working session to map your current data to the simplest set of charts that would change decisions next week. When you’re ready, schedule a demo to walk through your downtime events and see what patterns are being hidden by rollups.
