Machine Monitoring System Data Fields Checklist
- Matt Ulepic
- 4 hours ago

Machine Monitoring System Data Fields: What CNC Shops Must Capture
The most common myth in machine monitoring is that “the dashboard is wrong.” In practice, dashboards usually do exactly what the underlying data allows. If your timestamps are fuzzy, your state changes are inconsistent, or your part counters mean different things on different controls, you don’t have a visibility problem—you have an event integrity problem.
For CNC job shops running mixed controllers across multiple shifts, the difference between trustworthy utilization and misleading totals comes down to a small set of machine-state data fields and a few reconciliation rules. This article defines the minimum viable data model—field by field—so you can expose utilization leakage (scheduled time vs true cutting time) before you spend money on more machines or bigger reporting layers.
TL;DR — Machine Monitoring System Data Fields
Model monitoring as time-stamped state transitions plus context—not daily totals.
Standardize a strict state enum (RUN/IDLE/STOPPED/SETUP/ALARM) and define each state operationally.
Capture both event timestamps and ingestion timestamps to detect drift, gaps, and backfill.
Record data provenance (signal source + confidence) to normalize mixed controllers.
Use part counters with explicit semantics (good/total/scrap/rework) to prevent “utilization vs shipments” contradictions.
Require downtime reasons only when stops exceed a threshold; keep reason taxonomy small and auditable.
Validate monotonic time, no overlapping states, and that durations reconcile to scheduled time by shift.
Key takeaway: If you want shift-level, actionable visibility, you need clean state-change events with explicit definitions, consistent timestamps, and counters that mean the same thing across machines. When those fields are missing or ambiguous, the gap between ERP expectations and actual machine behavior widens, hiding idle patterns, micro-stops, and setup time that can often be recovered before adding equipment.
What “good visibility” requires: event integrity before metrics
A machine monitoring system is best understood as a stream of time-stamped state transitions—RUN to IDLE, IDLE to STOPPED, STOPPED to SETUP—decorated with enough context to interpret what happened. When those transitions are accurate, utilization and downtime rollups become reliable outputs. When they’re not, daily averages and shift totals will look “reasonable” but still mislead supervisors.
This is why shops often feel a disconnect between ERP and reality: the ERP might say the work order is on pace, while the floor has repeated short stops, long waits, or untracked setups. If the data model doesn’t capture correct boundaries (when a stop started, when it ended, what state it truly was), you can’t see where scheduled time is leaking away.
The minimum viable data model is simple to list, but hard to execute consistently across a mixed fleet: identifiers, timestamps, states, reasons, and counts. The point is not to start with utilization or OEE formulas; the point is to produce trustworthy signals fast—what stopped, when, how long, and why—so a supervisor can act during the shift, not after the week is over. If you’re also building out a broader monitoring approach, keep the scope anchored on data capture and use the higher-level context from machine monitoring systems as background rather than a checklist of UI features.
Core machine-state event fields (the non-negotiables)
Think of your “state-change record” as the atomic unit of truth. Every rollup—shift utilization, downtime minutes, idle pockets—depends on these fields being present and consistent.
Required identifiers
machine_id: internal, immutable key (don’t use a name that changes).
asset_tag: what maintenance/supervision recognizes on the floor.
site_id and/or cell_id: so reporting can match how you manage (cells, departments).
controller_id (optional): useful when one physical machine has swapped controls or gateways.
Time fields
event_start_ts and event_end_ts (or duration_ms): required to compute time in state without guesswork.
ingestion_ts: when the event hit your system (critical for spotting delayed/backfilled records).
timezone or offset: multi-shift reporting breaks when clocks or offsets are assumed.
State enum + data provenance
state: one of RUN, IDLE, STOPPED, SETUP, ALARM (mutually exclusive).
signal_source: controller, operator, derived (so you know what created the record).
confidence_flag or quality_flag: “trusted,” “estimated,” “gap-filled,” etc.
Uniqueness and ordering
event_id: unique key for each state segment.
sequence_no: ordering within a machine stream (helps resolve out-of-order inserts).
out_of_order_flag (optional): mark corrected ordering instead of silently rewriting history.
One practical example: two identical horizontals can show different “run time” simply because one controller reports cycle start at feed hold release while another reports at program start. Without signal_source and a confidence/normalization approach, you’ll end up coaching the wrong shift or chasing a machine that isn’t actually the problem.
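The fields above can be pictured as a single record type. The following is a minimal sketch rather than a reference schema: the class names `MachineState` and `StateEvent` and the choice to reject timezone-naive timestamps are assumptions made for illustration, while the field names follow the lists above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class MachineState(Enum):
    RUN = "RUN"
    IDLE = "IDLE"
    STOPPED = "STOPPED"
    SETUP = "SETUP"
    ALARM = "ALARM"


@dataclass(frozen=True)
class StateEvent:
    """One state segment for one machine (hypothetical record layout)."""
    event_id: str
    machine_id: str
    sequence_no: int
    state: MachineState
    event_start_ts: datetime
    event_end_ts: datetime
    ingestion_ts: datetime
    signal_source: str              # "controller", "operator", or "derived"
    quality_flag: str = "trusted"   # e.g. "estimated", "gap-filled"
    asset_tag: Optional[str] = None
    cell_id: Optional[str] = None

    def __post_init__(self):
        # Assumption: every timestamp must carry an explicit offset.
        for ts in (self.event_start_ts, self.event_end_ts, self.ingestion_ts):
            if ts.tzinfo is None:
                raise ValueError("timestamps must carry a timezone or offset")
        if self.event_end_ts < self.event_start_ts:
            raise ValueError("event_end_ts precedes event_start_ts")

    @property
    def duration_ms(self) -> int:
        # Integer division by a 1 ms timedelta avoids float rounding.
        return (self.event_end_ts - self.event_start_ts) // timedelta(milliseconds=1)


example = StateEvent(
    event_id="e102", machine_id="HMC-04", sequence_no=2,
    state=MachineState.RUN,
    event_start_ts=datetime(2026, 5, 15, 6, 18, tzinfo=timezone.utc),
    event_end_ts=datetime(2026, 5, 15, 7, 2, tzinfo=timezone.utc),
    ingestion_ts=datetime(2026, 5, 15, 7, 2, 3, tzinfo=timezone.utc),
    signal_source="controller",
)
print(example.duration_ms)
```

Deriving `duration_ms` from the two timestamps, instead of storing it separately, keeps the record from disagreeing with itself.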
Status values and why your state dictionary must be explicit
A “state dictionary” only works if every state has a single operational meaning. If RUN sometimes means “spindle turning” and other times means “program loaded,” your utilization will be incomparable across machines and shifts.
Define each state in physical terms that match CNC reality:
RUN: executing cycle activity you consider productive for capacity (often tied to cycle start/end or feed active).
IDLE: powered and available but not cycling (e.g., waiting on operator, waiting on material, queue gaps).
STOPPED: not available due to an unplanned stop or intentional pause beyond normal handling (e.g., program stop, waiting on maintenance, no operator).
SETUP: planned non-cycle work required to run the next job (tooling, offsets, probing/first-article workflow, fixture swaps).
ALARM: controller alarm present; optionally split “alarm active” vs “stopped without alarm.”
This matters most on complex jobs where the machine alternates between short stops and idle during tool changes, probing, and operator inspection. If everything that isn’t RUN gets lumped into “down,” you’ll miss the real leakage: repeated micro-stops, extended first-article checks, or queue gaps that look small individually but add up across a shift. That’s where structured machine downtime tracking begins—by separating “not cutting” into categories you can actually act on.
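One way to keep the state dictionary explicit is to route every raw controller status through a single mapping and refuse anything unmapped. The sketch below assumes hypothetical raw labels (`ACTIVE`, `CUTTING`, `READY`, and so on); the real labels depend on each control and gateway, and the mapping choices shown are illustrative, not recommendations.

```python
from typing import Tuple

STATE_DICTIONARY = {"RUN", "IDLE", "STOPPED", "SETUP", "ALARM"}

# Hypothetical vendor labels mapped into the shop's dictionary.
RAW_TO_STATE = {
    "ACTIVE": "RUN",
    "CUTTING": "RUN",
    "BUSY": "RUN",
    "READY": "IDLE",
    "FEED_HOLD": "STOPPED",
    "FAULT": "ALARM",
}


def normalize_state(raw_status: str) -> Tuple[str, str]:
    """Return (state, quality_flag) for a raw status string."""
    key = raw_status.strip().upper().replace(" ", "_")
    if key in STATE_DICTIONARY:
        return key, "trusted"
    if key in RAW_TO_STATE:
        return RAW_TO_STATE[key], "mapped"
    # Surface surprise states instead of silently guessing.
    raise ValueError(f"unmapped status: {raw_status!r}")


print(normalize_state("Cutting"))
print(normalize_state("feed hold"))
```

Raising on unknown labels is a deliberate choice in this sketch: an unmapped status becomes a visible data-quality issue rather than quietly landing in RUN or IDLE.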
You also need explicit rules for micro-stoppages. Many shops use a threshold (for example, stops under a short window like 10–60 seconds) to classify “blips” differently than longer interruptions. The exact threshold is less important than documenting it and applying it consistently; otherwise, one machine will “look worse” simply because its control reports brief feed holds as STOPPED.
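A documented micro-stop rule can be as small as one threshold applied the same way to every machine. The sketch below assumes a 30-second threshold, picked only because it falls inside the 10–60 second range mentioned above; the function name `summarize_stops` is hypothetical.

```python
MICRO_STOP_THRESHOLD_S = 30.0  # assumed value; document whatever your shop picks


def summarize_stops(durations_s, threshold_s=MICRO_STOP_THRESHOLD_S):
    """Split stop durations (seconds) into micro-stops and longer stops."""
    micro = [d for d in durations_s if d < threshold_s]
    longer = [d for d in durations_s if d >= threshold_s]
    return {
        "micro_count": len(micro),
        "micro_total_s": sum(micro),
        "long_count": len(longer),
        "long_total_s": sum(longer),
    }


summary = summarize_stops([5, 12, 45, 240, 8])
print(summary)
```

Keeping the count and the total separate shows whether micro-stops are frequent but cheap or rare but costly.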
Finally, add shift and schedule context so the same state can be interpreted correctly:
shift_id: day vs night isn’t just a label—it’s different staffing, response time, and expectations.
scheduled_flag: was the machine expected to run at that moment (within planned production time)?
Timestamps, boundaries, and reconciliation rules (where most systems fail)
Multi-shift reporting falls apart when systems rely on periodic polling snapshots (“it was RUN at 9:00, 9:01, 9:02…”) instead of true state-change timestamps. Snapshots can be fine for a quick display, but they are fragile for duration accounting: a missed sample carries the last-seen state forward (often inflating RUN), delayed samples smear STOPPED across boundaries, and interruptions shorter than the polling interval disappear.
Define boundary rules explicitly so you don’t double-count or “lose” time:
Shift change: split an event that spans shifts into two segments, each tagged with the correct shift_id.
Breaks and lunches: if breaks are scheduled, they should be represented via scheduled_flag (or a schedule table) so STOPPED during lunch doesn’t get interpreted as a performance issue.
Weekends/planned maintenance windows: keep “not scheduled” distinct from “scheduled but not running.”
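The shift-change rule can be sketched as a function that cuts one event into per-shift segments. The shift start times below (06:00 and 14:00) and the name `split_by_shift` are assumptions for illustration; a real schedule table would supply the boundaries.

```python
from datetime import datetime

# Assumed schedule: (shift_start_ts, shift_id), sorted ascending.
SHIFT_STARTS = [
    (datetime(2026, 5, 15, 6, 0), "DAY"),
    (datetime(2026, 5, 15, 14, 0), "NIGHT"),
]


def split_by_shift(start, end, shift_starts):
    """Return [(segment_start, segment_end, shift_id), ...] for one event."""
    segments = []
    for i, (shift_start, shift_id) in enumerate(shift_starts):
        # A shift ends where the next one starts; the last one is open-ended.
        shift_end = shift_starts[i + 1][0] if i + 1 < len(shift_starts) else end
        seg_start = max(start, shift_start)
        seg_end = min(end, shift_end)
        if seg_start < seg_end:
            segments.append((seg_start, seg_end, shift_id))
    return segments


segments = split_by_shift(
    datetime(2026, 5, 15, 13, 40), datetime(2026, 5, 15, 14, 25), SHIFT_STARTS
)
for seg in segments:
    print(seg)
```

A STOPPED event from 13:40 to 14:25 becomes a 20-minute DAY segment and a 25-minute NIGHT segment, so neither shift absorbs the other's time.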
Clock integrity is another common trap. Controllers drift. Gateways reconnect. Servers timestamp differently. You need fields (and rules) that let you detect and normalize:
controller_ts (optional) vs server_ts: keep the raw source when available.
last_heartbeat_ts: latest “I’m alive” signal from a machine/gateway.
gap_detected_flag and backfill_flag: mark periods where events were reconstructed after reconnect.
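As a rough illustration, both flags can be derived by comparing timestamps against tolerances. The tolerances below (30 seconds between heartbeats, 2 minutes of ingestion lag) and the function name `integrity_flags` are assumptions, not recommended values.

```python
from datetime import datetime, timedelta

MAX_HEARTBEAT_GAP = timedelta(seconds=30)  # assumed tolerance
MAX_INGEST_LAG = timedelta(minutes=2)      # assumed tolerance


def integrity_flags(event_end_ts, ingestion_ts, prev_heartbeat_ts, heartbeat_ts):
    """Derive gap/backfill flags for one event from its timestamps."""
    return {
        # The machine or gateway went quiet for longer than expected.
        "gap_detected_flag": heartbeat_ts - prev_heartbeat_ts > MAX_HEARTBEAT_GAP,
        # The event arrived long after it ended, e.g. replayed after a reconnect.
        "backfill_flag": ingestion_ts - event_end_ts > MAX_INGEST_LAG,
    }


late = integrity_flags(
    event_end_ts=datetime(2026, 5, 15, 8, 9, 0),
    ingestion_ts=datetime(2026, 5, 15, 8, 19, 0),
    prev_heartbeat_ts=datetime(2026, 5, 15, 8, 7, 30),
    heartbeat_ts=datetime(2026, 5, 15, 8, 9, 0),
)
print(late)
```

Flagged events can still feed rollups; the point is that a supervisor can see which minutes were reconstructed rather than observed.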
When signals overlap, don’t improvise—use precedence rules. Example: if a machine is tagged SETUP but a cycle start appears, you must decide whether cycle activity overrides SETUP or whether SETUP persists until an operator-confirmed “setup complete.” Whatever you choose, document it and apply it consistently so supervisors can trust the story the data tells.
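A precedence rule can be written down as an ordered list so the decision is explicit and reviewable. The ordering below is one possible policy, assumed for illustration: it takes the "SETUP persists until confirmed complete" option, so a cycle start during SETUP does not flip the state to RUN.

```python
# Assumed policy: earlier entries win when several signals are active at once.
PRECEDENCE = ("ALARM", "SETUP", "RUN", "STOPPED", "IDLE")


def resolve_state(active_signals):
    """Pick one state from a set of simultaneously active signals."""
    for state in PRECEDENCE:
        if state in active_signals:
            return state
    raise ValueError("no recognized signal is active")


print(resolve_state({"SETUP", "RUN"}))
print(resolve_state({"RUN", "ALARM"}))
```

Swapping the positions of SETUP and RUN in the tuple yields the other policy described above, where cycle activity overrides SETUP.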
Counters and throughput fields: part counts that don’t lie
If your part counts are ambiguous, you’ll see contradictions that look like “people problems” but are really counter semantics problems. A classic scenario: the night shift shows higher utilization on the dashboard, but day shift supervisors report more parts shipped. That can happen when the night shift racks up RUN time on warm-up programs, long cycles with low yield, or when the part counter increments on cycle start instead of completed good parts. Without explicit counter fields, the discussion turns into opinions instead of diagnosis.
Primary counters (with required relationships)
total_part_count: all completed parts (good + scrap), if you can define “completed.”
good_part_count: accepted parts (what you would ship).
scrap_count: parts rejected.
rework_count: parts requiring additional operations beyond standard flow.
Your system should enforce reconciliation rules (even if only as validation): for a given job/operation window, good + scrap should not exceed total unless you explicitly model multi-up or partial completions.
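A validation-only version of that rule might look like the sketch below. The function name `check_counts` and the optional `cycles` / `parts_per_cycle` parameters (a simple way to model multi-up fixtures) are assumptions for illustration.

```python
def check_counts(total, good, scrap, rework=0, cycles=None, parts_per_cycle=1):
    """Return a list of reconciliation issues for one job/operation window."""
    issues = []
    if min(total, good, scrap, rework) < 0:
        issues.append("negative count")
    if good + scrap > total:
        issues.append("good + scrap exceeds total")
    # Optional cross-check against cycle events; multi-up fixtures
    # are modeled by setting parts_per_cycle above 1.
    if cycles is not None and total > cycles * parts_per_cycle:
        issues.append("total exceeds cycles x parts_per_cycle")
    return issues


print(check_counts(total=13, good=12, scrap=1))
print(check_counts(total=13, good=12, scrap=2))
```

Returning issues instead of raising lets the same check run as a nightly report without blocking data capture.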
Counter provenance and reset rules
counter_source: controller macro/M-code, PLC, operator input, derived.
counter_reset_policy: never, per job, per shift, manual reset (and where it occurs).
This is also where mixed-controller differences show up. Two “identical” machines may behave differently because one control increments a part counter at program end, while another increments at pallet change. Without recording counter_source and reset_policy, it’s easy to misread throughput and falsely conclude one machine or shift is underperforming.
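Counter resets are easier to reason about when cumulative readings are converted into deltas with the reset made visible. The sketch below assumes that a reading lower than the previous one means the counter was reset to zero and then counted up to the new value; controls that reset to a different base would need a different rule.

```python
def counter_deltas(readings):
    """Convert cumulative counter readings into (delta, reset_detected) pairs."""
    deltas = []
    prev = None
    for value in readings:
        if prev is None:
            deltas.append((0, False))           # first reading: no baseline yet
        elif value >= prev:
            deltas.append((value - prev, False))
        else:
            # Assumption: counter reset to 0, so the new value is the delta.
            deltas.append((value, True))
        prev = value
    return deltas


deltas = counter_deltas([100, 104, 109, 2, 6])
print(deltas)
```

Here the drop from 109 to 2 is recorded as two parts with a reset flag, so throughput stays continuous across the reset and the reset itself can be audited.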
Cycle indicators and job context
cycle_start_ts, cycle_end_ts, cycle_time_ms (optional): powerful for locating leakage between cycles.
work_order_id, operation_id, part_number: ties machine reality to what you promised.
program_name and program_rev (optional): helps separate warm-up/verification from production.
pallet_id / fixture_id (optional): important for multi-pallet horizontals or repeatable setups.
CNC edge cases to handle explicitly: multi-up fixtures (one cycle produces multiple parts), probing cycles that look like “run” but aren’t output, warm-up programs that should be excluded from throughput, and partial parts during interruptions (scrap vs rework decisions made later). These are exactly why capacity conversations should be driven by trustworthy utilization data rather than assumptions—see how shops apply machine utilization tracking software as a capacity recovery tool once counters and state events are consistent.
Downtime reason fields: capturing ‘why’ without slowing operators down
Reason codes are where monitoring becomes actionable—but they can also become unrealistic if you require operators to classify every pause. The practical compromise is to trigger reason capture only when STOPPED exceeds a defined threshold and the time is inside scheduled production.
reason_required_flag: set when STOPPED duration passes your threshold.
downtime_reason_code: the selected leaf reason.
reason_category: planned vs unplanned (at minimum).
reason_entered_by and reason_entered_ts: who entered it and when.
operator_id, shift_id, supervisor_id (optional): adds accountability and coaching context.
last_edited_ts or reason_edit_history: prevents “cleaning up” data after the fact without visibility.
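The trigger for `reason_required_flag` reduces to a single predicate. The 5-minute threshold below is an assumed placeholder, and the function name `reason_required` is hypothetical.

```python
REASON_THRESHOLD_S = 300  # assumed: prompt for a reason after 5 minutes


def reason_required(state, duration_s, scheduled_flag,
                    threshold_s=REASON_THRESHOLD_S):
    """True when a stop is long enough, and scheduled, to need a reason."""
    return state == "STOPPED" and scheduled_flag and duration_s > threshold_s


print(reason_required("STOPPED", 660, scheduled_flag=True))   # 11-minute stop
print(reason_required("STOPPED", 660, scheduled_flag=False))  # outside schedule
```

Because the predicate also checks `scheduled_flag`, a stop during a scheduled lunch never prompts the operator.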
Taxonomy design matters more than the number of codes. Keep leaf reasons limited, separate symptom from cause, and allow “unknown” so production can move—then follow up. Here’s an example snippet that reflects CNC realities without turning into a maintenance system:
Category | Leaf Reason Code (Example) |
Planned | Setup / Fixture change |
Planned | First-article inspection |
Planned | Tooling: preset/offset adjustment |
Unplanned | Waiting on material |
Unplanned | Program stop / verification |
Unplanned | Tool break / tool issue |
Unplanned | Quality hold / measurement required |
Unplanned | Alarm active (code captured separately) |
Unplanned | Unknown (follow-up required) |
A mid-shift diagnostic question to pressure-test your model: if a machine repeatedly flips between IDLE and STOPPED during a complex job (tool changes, probing, inspection), can you separate “planned handling” from “unplanned waiting” without forcing constant operator input? If not, your improvement effort will get misrouted, because all “not RUN” time will look the same.
When it’s time to interpret patterns at scale (across shifts, part families, or recurring “unknown” buckets), use structured review rather than more codes. Tools like an AI Production Assistant can help supervisors and leads summarize recurring stop causes and convert raw events into coaching or scheduling actions—without turning your reason list into a 60-item spreadsheet.
Field-level checklist + sample event log (what to validate before you trust the numbers)
Before you trust utilization, validate the fields and the math at the event level. This is especially important in mixed-controller environments where run signals and counters don’t behave identically.
Checklist (minimum validation)
Required fields present: ids, timestamps, state, provenance, and (when applicable) reason and counter context.
Enums consistent: no surprise states like “Active,” “Cutting,” “Busy” unless mapped into the dictionary.
Timestamps monotonic per machine: event_start/end move forward; out-of-order is flagged.
Durations reconcile: sum of event durations within a scheduled window equals scheduled time (after applying boundary rules).
Sanity checks: no RUN during ALARM unless explicitly modeled; RUN time cannot exceed scheduled time (utilization stays at or below 100%).
Counter sanity: part counts align with cycle events; resets are visible and explained.
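Several of these checks can run in one pass over a machine's event stream. The sketch below is a minimal illustration under stated assumptions: events are plain dicts already ordered by `sequence_no`, the key names `start` and `end` are shorthand for `event_start_ts` and `event_end_ts`, and the window is fully scheduled.

```python
from datetime import datetime, timedelta


def validate_stream(events, window_start, window_end):
    """Return (event_id, issue) pairs for one machine's ordered events."""
    issues = []
    covered = timedelta()
    prev_end = None
    for ev in events:
        start, end = ev["start"], ev["end"]
        if end is None:
            issues.append((ev["event_id"], "missing end timestamp"))
            continue
        if end < start:
            issues.append((ev["event_id"], "end precedes start"))
            continue
        if prev_end is not None and start < prev_end:
            issues.append((ev["event_id"], "overlaps previous event"))
        elif prev_end is not None and start > prev_end:
            issues.append((ev["event_id"], "gap before event"))
        covered += end - start
        prev_end = end
    if covered != window_end - window_start:
        issues.append(("window", "durations do not reconcile to scheduled time"))
    return issues


def at(hour, minute):
    return datetime(2026, 5, 15, hour, minute)


clean = [
    {"event_id": "e101", "start": at(6, 0), "end": at(6, 18)},
    {"event_id": "e102", "start": at(6, 18), "end": at(7, 2)},
    {"event_id": "e103", "start": at(7, 2), "end": at(7, 6)},
]
print(validate_stream(clean, at(6, 0), at(7, 6)))
```

An empty list means the stream is contiguous and its durations add up to the window; anything else is a list of specific events to inspect.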
Sample event log (illustrative)
The table below is a simplified example (not a benchmark) showing how a single machine’s shift can be represented as state segments. Note how short interruptions, setup, and reasons stay separate so “not cutting” time doesn’t get mislabeled.
Event ID | Machine ID | Shift ID | Event Start TS | Event End TS | State | Downtime Reason Code | Good Part Count (Delta) | Signal Source |
e101 | HMC-04 | DAY | 2026-05-15 06:00:00 | 2026-05-15 06:18:00 | SETUP | Setup / Fixture change | 0 | operator |
e102 | HMC-04 | DAY | 2026-05-15 06:18:00 | 2026-05-15 07:02:00 | RUN | | +4 | controller |
e103 | HMC-04 | DAY | 2026-05-15 07:02:00 | 2026-05-15 07:06:00 | IDLE | | 0 | derived |
e104 | HMC-04 | DAY | 2026-05-15 07:06:00 | 2026-05-15 07:17:00 | STOPPED | Waiting on material | 0 | operator |
e105 | HMC-04 | DAY | 2026-05-15 07:17:00 | 2026-05-15 08:05:00 | RUN | | +5 | controller |
e106 | HMC-04 | DAY | 2026-05-15 08:05:00 | 2026-05-15 08:09:00 | ALARM | | 0 | controller |
e107 | HMC-04 | DAY | 2026-05-15 08:09:00 | 2026-05-15 08:22:00 | STOPPED | Tool break / tool issue | 0 | operator |
e108 | HMC-04 | DAY | 2026-05-15 08:22:00 | 2026-05-15 09:00:00 | RUN | | +4 | controller |
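To show how state segments roll up, the sketch below totals minutes per state and good parts for the eight sample rows. The tuple layout and the function name `rollup` are assumptions for illustration; the timestamps, states, and part deltas are copied from the table.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (event_id, event_start_ts, event_end_ts, state, good_part_delta)
SAMPLE_LOG = [
    ("e101", "2026-05-15 06:00:00", "2026-05-15 06:18:00", "SETUP", 0),
    ("e102", "2026-05-15 06:18:00", "2026-05-15 07:02:00", "RUN", 4),
    ("e103", "2026-05-15 07:02:00", "2026-05-15 07:06:00", "IDLE", 0),
    ("e104", "2026-05-15 07:06:00", "2026-05-15 07:17:00", "STOPPED", 0),
    ("e105", "2026-05-15 07:17:00", "2026-05-15 08:05:00", "RUN", 5),
    ("e106", "2026-05-15 08:05:00", "2026-05-15 08:09:00", "ALARM", 0),
    ("e107", "2026-05-15 08:09:00", "2026-05-15 08:22:00", "STOPPED", 0),
    ("e108", "2026-05-15 08:22:00", "2026-05-15 09:00:00", "RUN", 4),
]


def rollup(log):
    """Return (minutes per state, total good parts) for a list of segments."""
    minutes = defaultdict(float)
    good_parts = 0
    for _event_id, start, end, state, good_delta in log:
        span = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
        minutes[state] += span.total_seconds() / 60
        good_parts += good_delta
    return dict(minutes), good_parts


minutes_by_state, good_parts = rollup(SAMPLE_LOG)
total_minutes = sum(minutes_by_state.values())
run_share = minutes_by_state["RUN"] / total_minutes
print(minutes_by_state, good_parts, round(run_share, 3))
```

For these rows the rollup gives 130 RUN minutes out of 180 logged (about 72%), 24 minutes STOPPED, 18 in SETUP, 4 each in IDLE and ALARM, and 13 good parts. The 180 minutes cover only 06:00 to 09:00, so this is a slice of the shift, not a shift-level utilization figure.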
Common failure patterns to watch for:
“Always RUN” machines: often caused by a single stuck signal or an overly broad RUN definition.
Missing end timestamps: events never close, inflating whichever state they’re stuck in.
Counter resets at shift change: throughput appears to “drop” because you lost continuity.
Different cycle-start semantics across similar machines: run time becomes a controller artifact unless normalized (capture source + confidence).
Implementation note: start with the non-negotiables (event ids, state, start/end timestamps, shift context, provenance) and add optional fields only when they support a real decision. This keeps the rollout lightweight for a shop without heavy IT overhead and reduces the temptation to “buy” visibility by adding layers on top of weak signals.
If you’re evaluating vendors, ask to see how they represent these fields and rules in your environment—not just what the dashboards look like. Cost-wise, the meaningful questions are about data integrity coverage (mixed controllers, multi-shift boundaries, reason workflows) and what it takes to keep the model consistent as you add machines. For practical packaging context, you can reference pricing without turning your evaluation into a features checklist.
Want a fast diagnostic on whether your current data can produce trustworthy utilization and downtime by shift? Schedule a demo and bring a small export of your current machine events (or what your system calls “statuses”). The goal isn’t more charts; it’s confirming that your state dictionary, timestamp boundaries, and counter semantics are strong enough to expose hidden capacity before you consider capital spend.









