Fatigue Agent: Pilot-to-Rollout Execution Plan
Source: Architechture & Research/Fatigue Agent/Pilot & Rollout/Pilot to Rollout Execution Plan.docx
Status: Reference (operational execution plan for first pilots)
Note: This document covers a condition monitoring version of the Fatigue Agent (sensor-based, deployed on installed machines), which is a broader scope than the FEA post-processing pitch in Technical_Pitch_Slides.md. The two approaches are complementary: post-processing works on existing simulation results; condition monitoring works on live sensor data from running machines.
Executive Summary
This plan turns a first pilot into a repeatable, productised offering that a sales engineer and systems integrator (SI) can execute end-to-end: from technical scoping, to installation and model validation, to commercial packaging and rollout conversion.
Technical thesis: Fatigue damage and early failure risk can be quantified from measured load proxies (strain, vibration, torque/current, cycle counts) using standards-aligned cycle counting and fatigue assessment workflows (rainflow counting + damage accumulation), while anomaly detection provides safety net coverage for failure modes not captured cleanly by fatigue physics.
Commercial thesis: Published benchmarks for condition-based monitoring cite a 30–50% reduction in machine downtime and a 20–40% increase in machine life. These figures are suitable for first-pass ROI models only (replace with customer numbers once available).
First Pilot Targets
| Company | Why |
|---|---|
| SN Maschinenbau | High-availability positioning (>98% machine efficiency); packaging HFFS machines with known high-cycle jaw assemblies — ideal hotspot |
| B&B Verpackungstechnik | >250 employees across DE and US; scalable installed base and service organisation to monetise monitoring |
| NERAK | Already sells explicit service agreements and remote diagnosis; a monitoring agent is a natural add-on to their existing service contracts |
Recommended first pilot subsystem (SN): Sealing/cutting jaw assembly and its drive/linkage. It sees high cyclic loads, impacts during jams, and alignment sensitivity, and its failure causes immediate downtime.
Pilot Scope: Technical Implementation
Sensor Package
Place sensors at hotspots identified via drawings, service history, or lightweight FEA. For welded joints, follow IIW recommendations for hot-spot families (weld toe, attachments, cutouts). For mechanical components, align with FKM guideline influencing factors.
Minimum sensor set:
- Strain gauges at 2–4 critical locations (24-bit ADC, bridge completion, simultaneous sampling)
- Tri-axial vibration accelerometers (IEPE input, anti-aliasing, ≥1 kHz sample rate)
- Drive/PLC tags: speed, torque/current, cycle counters, fault codes, recipe ID
Edge Compute Specification
Do not stream raw high-rate data continuously. Compute features and fatigue counters on the edge; store raw waveform only for triggered events; send summarised data upstream.
Minimum edge hardware:
- 4-core IPC, 8–16 GB RAM, 256–512 GB SSD, dual NICs, TPM recommended
- Time sync: NTP/PTP alignment with plant clock
- OPC UA or MQTT interface for upstream data transmission
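The edge policy in this section (raw waveform stored only around triggered events) could be sketched as a rolling pre-trigger buffer. This is a minimal illustration, not the product implementation: the class name `TriggeredCapture`, the window sizes, and the in-memory `frozen` list (standing in for disk storage) are all assumptions.

```python
from collections import deque


class TriggeredCapture:
    """Keep a rolling pre-trigger buffer; freeze a window only when an event fires."""

    def __init__(self, pre_samples: int, post_samples: int):
        self._pre = deque(maxlen=pre_samples)  # rolling history, oldest samples drop off
        self._post_samples = post_samples
        self._active = None    # event window currently being recorded, if any
        self._remaining = 0    # post-trigger samples still to collect
        self.frozen = []       # completed event windows (stand-in for disk storage)

    def _close_if_done(self) -> None:
        if self._active is not None and self._remaining == 0:
            self.frozen.append(self._active)
            self._active = None

    def push(self, sample: float, trigger: bool = False) -> None:
        if self._active is not None:
            # A window is open: keep appending; new triggers are ignored meanwhile.
            self._active.append(sample)
            self._remaining -= 1
        elif trigger:
            # Start a window from the pre-trigger history plus the triggering sample.
            self._active = list(self._pre) + [sample]
            self._remaining = self._post_samples
        self._close_if_done()
        self._pre.append(sample)


cap = TriggeredCapture(pre_samples=3, post_samples=2)
for i in range(10):
    cap.push(float(i), trigger=(i == 5))   # hypothetical jam event at sample 5
```

With these settings the single frozen window holds samples 2.0 to 7.0: three pre-trigger samples, the trigger sample, and two post-trigger samples. Everything else is discarded once it leaves the rolling buffer.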
Data Retention Policy
| Data type | Retention |
|---|---|
| Fatigue ledger outputs (damage D(t), cycle histograms) | 12–24 months |
| Derived condition features (vibration, torque features) | 6–12 months |
| Raw waveforms | 7–30 days rolling + frozen event captures |
| Audit logs and model versions | ≥24 months |
Processing Pipeline
Feature Extraction
Strain-derived:
- Peak/valley sequence per channel
- Stress/strain range histogram + mean stress bins
- Damage-equivalent load (DEL) proxy
- Fatigue damage D(t) accumulator
- Duty cycle: cycles/hour by recipe/speed
Vibration-derived:
- Time domain: RMS, peak, crest factor, kurtosis, skewness
- Frequency domain: band power, spectral peaks, envelope spectrum for bearing defects
- Baseline drift detection in key frequency bands
Drive/PLC-derived:
- Torque/current: mean, variance, peaks, spikes, energy per cycle
- Speed profile and start/stop count
- Fault/jam codes and durations
- Recipe/format identifier (critical for comparability across runs)
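As an illustration of the time-domain vibration features listed above, the following sketch computes RMS, peak, crest factor, skewness, and kurtosis for one window using only the standard library. The function name and the dictionary keys are assumptions, and the window is assumed to be non-constant (a constant signal would divide by zero).

```python
import math


def time_domain_features(samples: list[float]) -> dict[str, float]:
    """RMS, peak, crest factor, skewness and kurtosis of one vibration window."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [x - mean for x in samples]
    m2 = sum(c ** 2 for c in centred) / n   # population variance
    m3 = sum(c ** 3 for c in centred) / n
    m4 = sum(c ** 4 for c in centred) / n
    rms = math.sqrt(sum(x ** 2 for x in samples) / n)
    peak = max(abs(x) for x in samples)
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,   # non-excess form: 3.0 for a Gaussian signal
    }


# One full period of a unit sine wave as a sanity check.
sine = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
features = time_domain_features(sine)
```

For a pure sine the crest factor is √2 and the kurtosis is 1.5, so impulsive content such as impacts during jams would be expected to show up as values well above these.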
Cycle Counting and Damage Model
Cycle counting: Rainflow counting aligned with ASTM E1049.
Damage model: Palmgren-Miner linear damage rule, where n_i is the counted number of cycles in bin i and N_i is the allowable number of cycles to failure at that bin's stress range:
For bins i: ΔD_i = n_i / N_i
Total damage D(t) = Σ_i ΔD_i
Alert threshold: D approaching 1.0 (with application-specific safety factors)
Where N_i comes from:
- General components: FKM Guideline (analytical fatigue strength, influencing factors, load-characteristic dependence)
- Welded joints: IIW Recommendations (S-N detail categories, hot spots, partial safety concepts)
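A minimal sketch of the rainflow-plus-Miner chain described in this section, using the three-point counting procedure. Several things here are assumptions made for illustration: the function names, the use of exact ranges as histogram keys (no binning), the omission of mean stress correction, and above all the power-law S-N parameters `n_ref`, `s_ref`, and `m`, which are placeholders and not FKM or IIW values.

```python
from collections import defaultdict, deque


def reversals(series):
    """Reduce a sampled signal to its peak/valley sequence (end points kept)."""
    pts = []
    for x in series:
        if pts and x == pts[-1]:
            continue                      # skip repeated samples
        if len(pts) >= 2 and (pts[-1] - pts[-2]) * (x - pts[-1]) > 0:
            pts[-1] = x                   # same direction: extend the current ramp
        else:
            pts.append(x)                 # direction changed: new reversal
    return pts


def rainflow(series):
    """Yield (range, count) pairs: 1.0 for full cycles, 0.5 for half cycles."""
    stack = deque()
    for point in reversals(series):
        stack.append(point)
        while len(stack) >= 3:
            x_range = abs(stack[-1] - stack[-2])   # most recent range X
            y_range = abs(stack[-2] - stack[-3])   # previous range Y
            if x_range < y_range:
                break
            if len(stack) == 3:
                # Y contains the starting point: half cycle, drop the start
                yield y_range, 0.5
                stack.popleft()
            else:
                # Y is enclosed by X: full cycle, drop the two points of Y
                yield y_range, 1.0
                last = stack.pop()
                stack.pop()
                stack.pop()
                stack.append(last)
    while len(stack) > 1:                 # leftover ranges count as half cycles
        yield abs(stack[1] - stack[0]), 0.5
        stack.popleft()


def cycle_histogram(series):
    hist = defaultdict(float)
    for rng, count in rainflow(series):
        hist[rng] += count
    return dict(hist)


def miner_damage(hist, n_ref=2e6, s_ref=100.0, m=3.0):
    """Sum n_i / N_i with N_i = n_ref * (s_ref / S_i)^m (placeholder S-N curve)."""
    return sum(n_i / (n_ref * (s_ref / s_i) ** m) for s_i, n_i in hist.items())


hist = cycle_histogram([-2, 1, -3, 5, -1, 3, -4, 4, -2])
damage = miner_damage(hist)
```

For this short reversal sequence, which is widely used as a rainflow test case, the histogram contains 0.5 cycles at range 3, 1.5 at range 4, 0.5 at range 6, 1.0 at range 8, and 0.5 at range 9. In a pilot, `N_i` would come from the FKM or IIW assessment for the specific detail rather than from the placeholder curve.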
Remaining Useful Life (RUL) Estimation
- Maintain a fatigue ledger per component (inputs: cycle histogram, mean stress bins, temperature factors, duty cycle context; output: D(t) with confidence bounds)
- Estimate damage rate dD/dt under current duty cycle cluster (by recipe/speed)
- RUL ≈ (D_fail − D_now) / E[dD/dt], with uncertainty intervals
- Blend condition indicators: if vibration anomaly score rises sharply, increase hazard (fatigue ledger is not the only gate)
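The RUL relation above can be sketched as follows. This is an assumption-laden illustration: the function name is hypothetical, the damage-rate samples are invented, and the interval simply uses the 10th and 90th percentile rates as a stand-in for a proper uncertainty model. Blending with anomaly scores is not shown.

```python
import statistics


def estimate_rul(d_now, rates_per_hour, d_fail=1.0):
    """Return (low, expected, high) remaining life in hours.

    rates_per_hour: recent damage-rate samples dD/dt for the current
    duty-cycle cluster; all samples must be positive.
    """
    if min(rates_per_hour) <= 0:
        raise ValueError("damage rates must be positive")
    remaining = max(d_fail - d_now, 0.0)
    mean_rate = statistics.fmean(rates_per_hour)
    # Nine cut points; "inclusive" interpolates within the observed range only.
    deciles = statistics.quantiles(rates_per_hour, n=10, method="inclusive")
    slow, fast = deciles[0], deciles[-1]
    return remaining / fast, remaining / mean_rate, remaining / slow


# Invented example: D_now = 0.4, damage rates around 0.001 per operating hour.
low, expected, high = estimate_rul(0.4, [0.0008, 0.0009, 0.0010, 0.0011, 0.0012])
```

With these invented inputs the expected RUL is about 600 operating hours, bracketed by roughly 517 and 714 hours. Recomputing per recipe/speed cluster keeps the rate estimate tied to the current duty cycle.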
Anomaly Detection
Start with interpretable, unsupervised methods (low labelling burden):
- Robust z-score / EWMA change detection on key features (fast, explainable)
- Isolation Forest or One-Class SVM on an engineered feature vector
- Autoencoder only if feature scaling and drift monitoring can be guaranteed
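The first option in the list above, robust z-score and EWMA, might look like this in its simplest form. The function names, the smoothing factor, and the 3.5 cut-off (a common rule of thumb) are assumptions, and the baseline values are invented. The baseline is assumed to have non-zero spread.

```python
import statistics


def robust_z(value, baseline):
    """Robust z-score of `value` against a baseline window (median and MAD)."""
    med = statistics.median(baseline)
    mad = statistics.median(abs(x - med) for x in baseline)
    # 0.6745 rescales the MAD so the score is comparable to a standard
    # z-score for normally distributed data.
    return 0.6745 * (value - med) / mad


def ewma(values, alpha=0.2):
    """Exponentially weighted moving average; returns the smoothed series."""
    smoothed = []
    level = values[0]
    for v in values:
        level = alpha * v + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


THRESHOLD = 3.5  # assumed cut-off, to be tuned per feature during the pilot

baseline_rms = [10.0, 10.1, 9.9, 10.2, 9.8]   # invented golden-run values
is_anomaly = abs(robust_z(15.0, baseline_rms)) > THRESHOLD
```

Median and MAD are used instead of mean and standard deviation so that a few jam events in the baseline window do not inflate the spread and mask later deviations.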
IT/OT Integration Checklist
Minimum PLC Tag Set
- Machine state: running, stopped, cleaning, maintenance mode
- Speed setpoint and actual
- Servo/drive torque or current, and alarms
- Cycle counters (jaw cycles, product cycles)
- Jam/fault codes + timestamps
- Recipe/format identifier
- Existing temperature sensors in cabinet or bearings
Protocol Standards
- OPC UA: Secure client/server, documented information model, certificate-based auth, application and communication layer security (OPC UA 1.04+)
- MQTT: OASIS MQTT 5.0 for lightweight publish/subscribe telemetry
Topic schema:
/telemetry/features/... (1 Hz or per cycle)
/telemetry/events/... (faults, jams, interventions)
/telemetry/rawsnap/... (event windows only)
/alerts/... (severity, confidence, recommended action)
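To make the alert topic concrete, here is a hedged sketch of building one alert message. Only the three content fields (severity, confidence, recommended action) come from the schema above. The JSON encoding, the field names, the `ts` timestamp, and the function name are assumptions, and the part of the topic path that the schema leaves open is passed in by the caller rather than guessed. No MQTT client is used; the function only prepares the topic string and payload.

```python
import json
from datetime import datetime, timezone


def build_alert(topic_suffix, severity, confidence, recommended_action):
    """Return (topic, payload) for an alert message; field names are hypothetical."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    payload = {
        "ts": datetime.now(timezone.utc).isoformat(),  # UTC timestamp (assumed field)
        "severity": severity,
        "confidence": confidence,
        "recommended_action": recommended_action,
    }
    return "/alerts/" + topic_suffix, json.dumps(payload)


# "demo" is a placeholder suffix, not part of the documented schema.
topic, body = build_alert("demo", "warning", 0.8, "inspect jaw linkage")
```

Keeping the payload small and self-describing fits the data-minimisation intent of sending features and alerts upstream by default.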
Cybersecurity Controls
Framework: ISA/IEC 62443 (zones/conduits, shared responsibilities, lifecycle coverage)
Minimum controls:
- Network segmentation with allow-listing of outbound connections
- Certificate-based auth for OPC UA; certificate rotation plan; no default credentials
- Signed updates for edge box; patch policy compatible with plant operations
- Remote access via customer-approved VPN/jump host; session logging and least privilege
- Data minimisation: features transmitted by default; raw waveforms only on-demand
Regulatory note: EU Machinery Regulation (EU) 2023/1230 applies from 20 January 2027 and includes explicit cybersecurity requirements for machinery placed on the market.
Validation Plan
Validation criteria must be defined before results are reviewed.
Phase 1 — Benchmarks: Validate core engine against published analytical solutions, IIW/FKM benchmark problems, and ASTM E1049 reference signals.
Phase 2 — Pilot data: Compare agent outputs against existing manual assessments from pilot partners (same FE results / same physical load scenarios, same load cases).
Phase 3 — Field correlation: Correlate predictions with actual field failure data and maintenance records. Track prediction accuracy. Build confidence intervals.
Pilot artefacts to produce:
- Labelled event log (all stops, jams, maintenance actions, part replacements)
- Ground-truth inspections at planned intervals (visual + NDT at hotspot locations)
- Backtesting: model run on first 2–4 weeks as frozen baseline, then prospective evaluation
KPIs:
- Lead time distribution (days) for actionable alerts before failure
- Precision/recall of alerts (define "true positive" tied to maintenance findings)
- RUL calibration: predicted risk bands vs observed degradation and inspections
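The alert KPIs above depend on how a "true positive" is defined. One possible definition, shown here purely as an assumption, is that an alert counts if a confirmed maintenance finding follows within a fixed horizon. The function name, the 14-day horizon, and the day numbers are illustrative.

```python
def alert_kpis(alert_days, finding_days, horizon_days=14):
    """Precision, recall and lead times for alerts matched to confirmed findings."""

    def matched(alert, finding):
        # The finding must occur on or after the alert, within the horizon.
        return 0 <= finding - alert <= horizon_days

    true_alerts = [a for a in alert_days if any(matched(a, f) for f in finding_days)]
    lead_times = []
    for f in finding_days:
        prior = [a for a in alert_days if matched(a, f)]
        if prior:
            lead_times.append(f - min(prior))   # earliest warning for this finding
    precision = len(true_alerts) / len(alert_days) if alert_days else 0.0
    recall = len(lead_times) / len(finding_days) if finding_days else 0.0
    return precision, recall, lead_times


# Invented pilot log: alerts on days 3, 20, 50; confirmed findings on days 10 and 80.
precision, recall, lead_times = alert_kpis([3, 20, 50], [10, 80])
```

In this invented log only the day-3 alert is followed by a finding within the horizon, giving a precision of 1/3, a recall of 1/2, and a single lead time of 7 days. Fixing the matching rule in the SOW before data collection keeps the KPI from being tuned after the fact.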
ROI Model Template
Use this to build the customer business case. Fill in customer numbers; use benchmarks as placeholders only.
C_downtime: cost per hour of downtime (€/hr)
H_event: average hours of downtime per target failure event
N_events_year: events per year (baseline, from maintenance records)
R_reduction: expected reduction fraction (benchmark: 0.30–0.50)
C_pilot: pilot cost (€)
C_rollout: annual subscription + service (€/yr)
C_parts: annual spare part savings (€/yr)
Annual downtime savings = C_downtime × H_event × N_events_year × R_reduction
Simple payback (months) = 12 × (C_pilot / annual downtime savings)
Rollout ROI (year 1) = (annual downtime savings + C_parts − C_rollout) / C_rollout
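The template above translates directly into a small calculator. The class and function names are assumptions, and the example figures are invented placeholders to show the arithmetic, not customer or benchmark data.

```python
from dataclasses import dataclass


@dataclass
class RoiInputs:
    c_downtime: float      # cost per hour of downtime, EUR/hr
    h_event: float         # hours of downtime per target failure event
    n_events_year: float   # baseline events per year
    r_reduction: float     # expected reduction fraction (benchmark 0.30-0.50)
    c_pilot: float         # pilot cost, EUR
    c_rollout: float       # annual subscription + service, EUR/yr
    c_parts: float         # annual spare part savings, EUR/yr


def roi_summary(x: RoiInputs) -> dict[str, float]:
    savings = x.c_downtime * x.h_event * x.n_events_year * x.r_reduction
    return {
        "annual_downtime_savings": savings,
        "payback_months": 12 * x.c_pilot / savings,
        "rollout_roi_year1": (savings + x.c_parts - x.c_rollout) / x.c_rollout,
    }


# Invented placeholder figures for illustration only.
example = roi_summary(RoiInputs(
    c_downtime=2000, h_event=4, n_events_year=6, r_reduction=0.30,
    c_pilot=36000, c_rollout=12000, c_parts=5000,
))
```

With these placeholder inputs the annual downtime savings come to €14,400, the simple payback is 30 months, and the year-1 rollout ROI is about 0.62. The payback is sensitive to `n_events_year` and `c_downtime`, so those two inputs are the ones to confirm first with the customer's maintenance records.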
Commercial Packaging
Pricing Models
Model A: Hardware + SaaS
- One-time: hardware + installation + SI labour
- Recurring: monthly/annual subscription per machine (analytics + dashboards + alerts + model updates)
Model B: Availability Service Add-On
- Bundle monitoring into OEM service contract with defined response times and inspection cadence (aligns with NERAK-style service positioning)
Model C: Retrofit Programme
- Fixed-price retrofit package for installed base; optional financing via "downtime avoided" narrative
Subscription Tiers
| Tier | Contents |
|---|---|
| Basic | Condition features, thresholds, dashboards, event capture |
| Pro | Fatigue ledger + component RUL + inspection workflow + recommended actions |
| Enterprise | Fleet analytics, recipe clustering, automated warranty/service insights |
Three-Month Sprint Plan
| Period | Milestones |
|---|---|
| Week 1–2 | Contract/SOW + KPI definitions; failure-mode workshop; OT security and network plan |
| Week 3–4 | Sensor kit finalised; install drawings; PLC/drive tag mapping; edge box imaging |
| Week 5–6 | Install + commissioning; golden run baseline; event capture verification |
| Week 7–8 | Fatigue ledger v1 (rainflow + Miner); anomaly baseline; alert routing to service workflow |
| Week 9–10 | Tuning + validation (inspections/NDT tie-in); KPI measurement starts; ROI worksheet populated |
| Week 11–12 | Pilot results report; rollout kit v1; commercial proposal for 5–10 machine rollout |
Key Assumptions and Risks
Assumptions (state explicitly in pitch):
- Exact subsystem geometry and known field failure history must be finalised in a workshop before installation; hotspot selection and sensor count cannot be finalised from this document alone
- PLC/drive accessibility and protocol support depend on machine generation and customer IT/OT policies; the plan assumes read-only access via OPC UA or gateway mapping is feasible
- The monitoring agent is decision support, not a safety function; if it is later positioned as a safety function, requirements change substantially
- ROI ranges use published benchmarks as framing placeholders only; customer-specific downtime costs and event rates must be substituted for a defensible business case
Primary risks:
- Integration friction (PLC access, OT security policies) can stall momentum; address with an IT/OT security whitepaper and a pre-agreed interface contract upfront
- Sensor installation requires machine downtime, so the installation window must align with the planned maintenance schedule
- Insufficient labelled failure data from pilot machine history reduces model validation quality in Phase 2
Standards Referenced
| Standard | Application |
|---|---|
| ASTM E1049 | Rainflow cycle counting procedure |
| FKM Guideline | Analytical fatigue strength assessment for mechanical components |
| IIW Recommendations | S-N detail categories and hot-spot fatigue for welded joints |
| Eurocode 3 | Fatigue assessment for steel structures |
| ISO 17359 | Procedures for setting up a condition monitoring programme |
| ISO 13374-1 | Software specifications for CM data processing and communication |
| ISO 13379 | Data interpretation and diagnostics concepts |
| ISO 16063-21 | Accelerometer calibration methods |
| ISA/IEC 62443 | OT cybersecurity — zones, conduits, lifecycle responsibilities |
| OPC UA 1.04 | Secure industrial communication protocol |
| OASIS MQTT 5.0 | Lightweight publish/subscribe telemetry protocol |
| EU Machinery Regulation (EU) 2023/1230 | Applies from 20 Jan 2027; includes cybersecurity requirements |
Related Documents
- Technical Pitch Slides — FEA post-processing approach (complementary to sensor-based monitoring)
- Problem Brief — Industrial Equipment
- Target Companies