Fatigue Agent — Pilot-to-Rollout Execution Plan

Source: Architechture & Research/Fatigue Agent/Pilot & Rollout/Pilot to Rollout Execution Plan.docx

Status: Reference (operational execution plan for first pilots)

Note: This document covers a condition-monitoring version of the Fatigue Agent (sensor-based, deployed on installed machines), which has a broader scope than the FEA post-processing pitch in Technical_Pitch_Slides.md. The two approaches are complementary: post-processing works on existing simulation results, while condition monitoring works on live sensor data from running machines.

Executive Summary

This plan turns a first pilot into a repeatable, productised offering that a sales engineer and systems integrator (SI) can execute end-to-end: from technical scoping, to installation and model validation, to commercial packaging and rollout conversion.

Technical thesis: Fatigue damage and early failure risk can be quantified from measured load proxies (strain, vibration, torque/current, cycle counts) using standards-aligned cycle counting and fatigue assessment workflows (rainflow counting plus damage accumulation), while anomaly detection provides safety-net coverage for failure modes that fatigue physics does not capture cleanly.

Commercial thesis: Published benchmarks for condition-based monitoring cite a 30–50% reduction in machine downtime and a 20–40% increase in machine life. These figures are suitable only as placeholders in first-pass ROI models; replace them with customer numbers once available.

First Pilot Targets

Company	Why
SN Maschinenbau	High-availability positioning (>98% machine efficiency); packaging HFFS machines with known high-cycle jaw assemblies — ideal hotspot
B&B Verpackungstechnik	>250 employees across DE and US; scalable installed base and service organisation to monetise monitoring
NERAK	Already sells explicit service agreements and remote diagnosis; a monitoring agent is a natural add-on to their existing service contracts

Recommended first pilot subsystem (SN): Sealing/cutting jaw assembly and its drive/linkage — high cyclic loads, impacts during jams, alignment sensitivity, and failure has immediate downtime impact.

Pilot Scope: Technical Implementation

Sensor Package

Place sensors at hotspots identified from drawings, service history, or lightweight FEA. For welded joints, follow the IIW recommendations for hot-spot families (weld toes, attachments, cutouts). For mechanical components, account for the influencing factors defined in the FKM Guideline.

Minimum sensor set:
- Strain gauges at 2–4 critical locations (24-bit ADC, bridge completion, simultaneous sampling)
- Tri-axial vibration accelerometers (IEPE input, anti-aliasing, ≥1 kHz sample rate)
- Drive/PLC tags: speed, torque/current, cycle counters, fault codes, recipe ID

Edge Compute Specification

Do not stream raw high-rate data continuously. Compute features and fatigue counters on the edge; store raw waveform only for triggered events; send summarised data upstream.
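Triggered raw capture can be kept cheap with a short pre-trigger ring buffer. The sketch below is a minimal illustration, assuming a single channel and a simple absolute-amplitude trigger; `EventCapture`, its parameters and the trigger rule are hypothetical choices, not part of the plan (in practice a PLC jam code could also fire the trigger).

```python
from collections import deque


class EventCapture:
    """Hypothetical sketch: rolling pre-trigger buffer, frozen windows on trigger.

    Only the frozen windows in `events` are kept as raw data; everything
    else is overwritten as the ring buffer rolls.
    """

    def __init__(self, pre_samples: int, post_samples: int, threshold: float):
        self.pre = deque(maxlen=pre_samples)  # continuously overwritten history
        self.post_samples = post_samples
        self.threshold = threshold  # assumed amplitude trigger level
        self._active = None  # window being filled after a trigger
        self._remaining = 0
        self.events = []  # frozen event captures

    def push(self, sample: float) -> None:
        if self._active is not None:
            # Still filling the post-trigger part of a window.
            self._active.append(sample)
            self._remaining -= 1
            if self._remaining == 0:
                self.events.append(self._active)
                self._active = None
        elif abs(sample) >= self.threshold:
            # Freeze the pre-trigger history plus the triggering sample.
            self._active = list(self.pre) + [sample]
            self._remaining = self.post_samples
            if self._remaining == 0:
                self.events.append(self._active)
                self._active = None
        self.pre.append(sample)
```

With `EventCapture(pre_samples=3, post_samples=2, threshold=5.0)`, feeding `[0, 1, 2, 3, 9, 4, 4, 1, 0]` keeps a single frozen window `[1, 2, 3, 9, 4, 4]` and discards the rest.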

Minimum edge hardware:
- 4-core IPC, 8–16 GB RAM, 256–512 GB SSD, dual NICs, TPM recommended
- Time sync: NTP/PTP alignment with the plant clock
- OPC UA or MQTT interface for upstream data transmission

Data Retention Policy

Data type	Retention
Fatigue ledger outputs (damage D(t), cycle histograms)	12–24 months
Derived condition features (vibration, torque features)	6–12 months
Raw waveforms	7–30 days rolling + frozen event captures
Audit logs and model versions	≥24 months
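The retention table can be enforced on the edge box as plain configuration. A minimal sketch, assuming each range is enforced at its upper bound (24 months approximated as 730 days) and that frozen event captures are exempt from the rolling window (the plan does not state their own retention); `RETENTION` and `should_purge` are hypothetical names.

```python
from datetime import timedelta

# Hypothetical encoding of the retention table, using the upper bound of each range.
RETENTION = {
    "fatigue_ledger": timedelta(days=730),      # 12–24 months
    "condition_features": timedelta(days=365),  # 6–12 months
    "raw_waveform": timedelta(days=30),         # 7–30 days rolling
    "audit_log": None,                          # ≥24 months: never auto-purged here
}


def should_purge(data_type: str, age: timedelta, frozen_event: bool = False) -> bool:
    """Return True if a record of this type and age may be deleted."""
    if frozen_event:
        return False  # frozen event captures sit outside the rolling window
    limit = RETENTION[data_type]
    return limit is not None and age > limit
```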

Processing Pipeline

Feature Extraction

Strain-derived:
- Peak/valley sequence per channel
- Stress/strain range histogram + mean stress bins
- Damage-equivalent load (DEL) proxy
- Fatigue damage D(t) accumulator
- Duty cycle: cycles/hour by recipe/speed
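The DEL proxy condenses a measured range spectrum into one constant-amplitude range that, applied `n_eq` times, gives the same Miner damage as the measured cycles. A minimal sketch, assuming a single-slope S-N curve with exponent `m` (for example m = 3 is common for welded steel details) and a reference cycle count `n_eq` chosen per application; `damage_equivalent_load` is a hypothetical helper.

```python
def damage_equivalent_load(cycles, m: float, n_eq: float) -> float:
    """Damage-equivalent range for a list of (range, count) cycles.

    DEL = (sum(n_i * S_i**m) / n_eq) ** (1/m), assuming a single-slope
    S-N curve with exponent m. Units follow the input ranges.
    """
    total = sum(count * rng ** m for rng, count in cycles)
    return (total / n_eq) ** (1.0 / m)
```

For example, 8 cycles of range 10 with m = 3 and `n_eq = 1` give a DEL of about 20, since 8 × 10³ = 20³.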

Vibration-derived:
- Time domain: RMS, peak, crest factor, kurtosis, skewness
- Frequency domain: band power, spectral peaks, envelope spectrum for bearing defects
- Baseline drift detection in key frequency bands
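The time-domain features can be computed per window without any dependencies. A minimal sketch in plain Python; `time_domain_features` is a hypothetical helper, kurtosis is the non-excess form (about 3 for Gaussian noise), and the frequency-domain features (band power, envelope spectrum) would normally use an FFT library and are not shown.

```python
import math


def time_domain_features(x):
    """RMS, peak, crest factor, skewness and kurtosis of one vibration window."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    centred = [v - mean for v in x]
    m2 = sum(c ** 2 for c in centred) / n  # population variance
    m3 = sum(c ** 3 for c in centred) / n
    m4 = sum(c ** 4 for c in centred) / n
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms if rms > 0 else float("nan"),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        # Non-excess kurtosis: about 3 for Gaussian noise, rises with impacts.
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else float("nan"),
    }
```

As a sanity check, a sampled pure sine over whole periods gives a crest factor of about √2 and a kurtosis of about 1.5, whereas impacting bearing defects push both upward.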

Drive/PLC-derived:
- Torque/current: mean, variance, peaks, spikes, energy per cycle
- Speed profile and start/stop count
- Fault/jam codes and durations
- Recipe/format identifier (critical for comparability across runs)
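Energy per cycle can be derived by integrating drive power P = torque × angular speed and grouping samples by the PLC cycle counter. A minimal sketch, assuming torque in N·m, speed already converted to rad/s (rpm × 2π / 60), a fixed sample period, and a counter that increments once per machine cycle; `energy_per_cycle` is a hypothetical helper. Where only motor current is exposed, the same integration yields a proxy rather than energy in joules.

```python
from collections import defaultdict


def energy_per_cycle(torque_nm, speed_rad_s, cycle_counter, dt_s):
    """Net mechanical energy (J) per machine cycle, keyed by cycle-counter value.

    Uses a rectangle-rule integral of P = torque * angular speed.
    """
    energy = defaultdict(float)
    for cycle, torque, omega in zip(cycle_counter, torque_nm, speed_rad_s):
        energy[cycle] += torque * omega * dt_s
    return dict(energy)
```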

Cycle Counting and Damage Model

Cycle counting: Rainflow counting aligned with ASTM E1049
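The three-point rainflow procedure can be sketched compactly: reduce the signal to peaks and valleys, then repeatedly compare the latest range X with the previous range Y. The sketch below is illustrative only; `reversals`, `rainflow` and `count_cycles` are hypothetical names, ranges are not binned, and any production version should be checked against the ASTM E1049 reference signals in Phase 1 of the validation plan.

```python
from collections import defaultdict


def reversals(series):
    """Reduce a signal to its sequence of peaks and valleys (start/end kept)."""
    out = []
    for v in series:
        if out and v == out[-1]:
            continue  # drop flat segments
        if len(out) >= 2 and (out[-1] - out[-2]) * (v - out[-1]) > 0:
            out[-1] = v  # same direction: extend the current excursion
        else:
            out.append(v)
    return out


def rainflow(series):
    """Yield (range, mean, count) with count 1.0 (full) or 0.5 (half cycle)."""
    stack = []
    for point in reversals(series):
        stack.append(point)
        while len(stack) >= 3:
            x = abs(stack[-1] - stack[-2])  # most recent range X
            y = abs(stack[-2] - stack[-3])  # previous range Y
            if x < y:
                break
            if len(stack) == 3:
                # Y contains the starting point: count a half cycle.
                yield y, (stack[0] + stack[1]) / 2, 0.5
                stack.pop(0)
            else:
                # Y is enclosed by X: count a full cycle, drop its two points.
                yield y, (stack[-2] + stack[-3]) / 2, 1.0
                last = stack.pop()
                del stack[-2:]
                stack.append(last)
    # The residue is counted as half cycles.
    for a, b in zip(stack, stack[1:]):
        yield abs(b - a), (a + b) / 2, 0.5


def count_cycles(series):
    """Total cycle count per range, sorted by range."""
    totals = defaultdict(float)
    for rng, _mean, count in rainflow(series):
        totals[rng] += count
    return dict(sorted(totals.items()))


print(count_cycles([-2, 1, -3, 5, -1, 3, -4, 4, -2]))
# → {3: 0.5, 4: 1.5, 6: 0.5, 8: 1.0, 9: 0.5}
```

The mean value yielded with each cycle feeds the mean-stress bins of the fatigue ledger.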

Damage model: Palmgren-Miner linear damage rule:

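For each stress-range bin i: ΔD_i = n_i / N_i, where n_i is the number of counted cycles in bin i and N_i is the number of cycles to failure at that range
Total damage: D(t) = Σ_i ΔD_i
Alert threshold: D approaching 1.0, reduced by application-specific safety factors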

N_i is taken from:
- General components: FKM Guideline (analytical fatigue strength, influencing factors, load-characteristic dependence)
- Welded joints: IIW Recommendations (S-N detail categories, hot spots, partial safety concepts)
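The Miner sum can be sketched with an IIW-style detail category, where FAT is the characteristic stress range at 2 × 10⁶ cycles and the slope is m = 3 for normal stress. This is a deliberately simplified, single-slope assumption: real IIW/FKM curves add knee points, slope changes, mean-stress and thickness corrections, and partial safety factors. `cycles_to_failure` and `miner_damage` are hypothetical helpers, and FAT 71 below is only an illustrative choice.

```python
def cycles_to_failure(stress_range_mpa: float, fat_mpa: float, m: float = 3.0) -> float:
    """Single-slope S-N curve: N = 2e6 * (FAT / Δσ)^m (illustrative only)."""
    return 2e6 * (fat_mpa / stress_range_mpa) ** m


def miner_damage(histogram, fat_mpa: float, m: float = 3.0) -> float:
    """Palmgren-Miner sum D = Σ n_i / N_i over (stress_range_mpa, count) bins."""
    return sum(
        n_i / cycles_to_failure(s_i, fat_mpa, m)
        for s_i, n_i in histogram
        if s_i > 0  # zero-range bins cause no damage
    )


# Hypothetical spectrum on a FAT 71 detail: 1e5 cycles at 71 MPa, 1e4 at 142 MPa.
print(miner_damage([(71.0, 1e5), (142.0, 1e4)], fat_mpa=71.0))  # ≈ 0.09
```

Doubling the stress range cuts the life by a factor of 2³ = 8, which is why the few large ranges in the histogram often dominate D(t).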

Remaining Useful Life (RUL) Estimation

- Maintain a fatigue ledger per component (inputs: cycle histogram, mean stress bins, temperature factors, duty-cycle context; output: D(t) with confidence bounds)
- Estimate the damage rate dD/dt under the current duty-cycle cluster (by recipe/speed)
- RUL ≈ (D_fail − D_now) / E[dD/dt], with uncertainty intervals
- Blend in condition indicators: if the vibration anomaly score rises sharply, increase the hazard estimate (the fatigue ledger is not the only gate)
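The RUL formula can be evaluated directly from recent per-period damage increments of the current duty-cycle cluster. A minimal sketch, assuming at least two increments are available and using the 10th/90th percentile rates as a rough empirical band rather than a formal confidence interval; `estimate_rul` is a hypothetical helper and the result is in the same time unit as the increments (e.g. days).

```python
import math
import statistics


def estimate_rul(d_now, damage_increments, d_fail=1.0):
    """Return (typical, pessimistic, optimistic) remaining life.

    damage_increments: recent ΔD per period (e.g. per day) for the current
    duty-cycle cluster; needs at least two values.
    """
    remaining = max(d_fail - d_now, 0.0)
    mean_rate = statistics.fmean(damage_increments)  # E[dD/dt]
    deciles = statistics.quantiles(damage_increments, n=10, method="inclusive")
    slow, fast = deciles[0], deciles[-1]

    def life(rate):
        return remaining / rate if rate > 0 else math.inf

    # A faster damage rate gives the pessimistic (shorter) life.
    return life(mean_rate), life(fast), life(slow)
```

With `d_now = 0.4` and a steady 0.001 per day, all three values come out at about 600 days; a widening gap between the pessimistic and optimistic values signals an unstable duty cycle.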

Anomaly Detection

Start with interpretable, unsupervised methods (low labelling burden):
- Robust z-score / EWMA change detection on key features (fast, explainable)
- Isolation Forest or One-Class SVM on an engineered feature vector
- Autoencoder only if feature scaling and drift monitoring can be guaranteed
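The first tier of methods needs only a baseline window per feature. A minimal sketch, assuming a golden-run baseline is available; `robust_z` uses the median and MAD (0.6745 rescales MAD to σ for Gaussian data), and `ewma_alarms` uses the standard asymptotic EWMA control limits target ± n_sigma·σ·√(λ/(2−λ)). Both function names and the default λ = 0.2, n_sigma = 3 are hypothetical choices.

```python
import math
import statistics


def robust_z(x, baseline):
    """Robust z-score of x against a baseline window (median/MAD)."""
    med = statistics.median(baseline)
    mad = statistics.median(abs(b - med) for b in baseline)
    return 0.6745 * (x - med) / mad if mad > 0 else 0.0


def ewma_alarms(values, target, sigma, lam=0.2, n_sigma=3.0):
    """Flag samples where the EWMA leaves the asymptotic control limits."""
    limit = n_sigma * sigma * math.sqrt(lam / (2 - lam))
    z = target
    flags = []
    for v in values:
        z = lam * v + (1 - lam) * z  # exponentially weighted moving average
        flags.append(abs(z - target) > limit)
    return flags
```

The robust z-score catches isolated spikes without being distorted by them, while the EWMA reacts to small sustained shifts such as slow baseline drift.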

IT/OT Integration Checklist

Minimum PLC Tag Set

- Machine state: running, stopped, cleaning, maintenance mode
- Speed setpoint and actual
- Servo/drive torque or current, and alarms
- Cycle counters (jaw cycles, product cycles)
- Jam/fault codes + timestamps
- Recipe/format identifier
- Existing temperature sensors (cabinet or bearings)
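The tag set maps naturally onto one typed record per read. A minimal sketch, assuming one timestamped snapshot per poll; `PlcSnapshot`, `MachineState`, `comparable` and all field names are hypothetical, and the real tag names depend on the machine generation and the mapping agreed with the customer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class MachineState(Enum):
    RUNNING = "running"
    STOPPED = "stopped"
    CLEANING = "cleaning"
    MAINTENANCE = "maintenance"


@dataclass(frozen=True)
class PlcSnapshot:
    """One timestamped read of the minimum tag set (field names are hypothetical)."""
    timestamp_s: float
    state: MachineState
    speed_setpoint: float
    speed_actual: float
    drive_torque: float           # or motor current if torque is not exposed
    jaw_cycles: int
    product_cycles: int
    recipe_id: str
    fault_code: Optional[int] = None     # None when no jam/fault is active
    temperatures_c: Tuple[float, ...] = ()


def comparable(a: PlcSnapshot, b: PlcSnapshot) -> bool:
    """Compare features only across running periods with the same recipe."""
    return (
        a.recipe_id == b.recipe_id
        and a.state == MachineState.RUNNING
        and b.state == MachineState.RUNNING
    )
```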

Protocol Standards

- OPC UA: Secure client/server, documented information model, certificate-based auth, application and communication layer security (OPC UA 1.04+)
- MQTT: OASIS MQTT 5.0 for lightweight publish/subscribe telemetry

Topic schema:

/telemetry/features/...  (1 Hz or per cycle)
/telemetry/events/...    (faults, jams, interventions)
/telemetry/rawsnap/...   (event windows only)
/alerts/...              (severity, confidence, recommended action)
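An alert message under this schema only needs the three listed fields plus a timestamp. A minimal sketch that builds the topic and a JSON payload, leaving the publish call to whichever MQTT 5.0 client is chosen; `alert_message`, the `machine_id` topic level, the `SEVERITIES` scale and the JSON field names are hypothetical. The sketch keeps the plan's leading slash, although MQTT practice often avoids one because it creates an empty first topic level.

```python
import json
import time

SEVERITIES = ("info", "warning", "critical")  # hypothetical severity scale


def alert_message(machine_id, severity, confidence, recommended_action, ts=None):
    """Return (topic, payload) for the /alerts/... branch (levels are hypothetical)."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    payload = {
        "ts": time.time() if ts is None else ts,
        "severity": severity,
        "confidence": confidence,
        "recommended_action": recommended_action,
    }
    return f"/alerts/{machine_id}", json.dumps(payload)
```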

Cybersecurity Controls

Framework: ISA/IEC 62443 (zones/conduits, shared responsibilities, lifecycle coverage)

Minimum controls:
- Network segmentation with allow-listing of outbound connections
- Certificate-based auth for OPC UA; certificate rotation plan; no default credentials
- Signed updates for the edge box; patch policy compatible with plant operations
- Remote access via customer-approved VPN/jump host; session logging and least privilege
- Data minimisation: features transmitted by default; raw waveforms only on demand

Regulatory note: EU Machinery Regulation (EU) 2023/1230 applies from 20 January 2027 and includes explicit cybersecurity requirements for machinery placed on the market.

Validation Plan

Validation criteria must be defined before any results are reviewed.

Phase 1 — Benchmarks: Validate core engine against published analytical solutions, IIW/FKM benchmark problems, and ASTM E1049 reference signals.

Phase 2 — Pilot data: Compare agent outputs against existing manual assessments from pilot partners, using the same inputs (the same FE results or physical load scenarios) and the same load cases.

Phase 3 — Field correlation: Correlate predictions with actual field failure data and maintenance records, track prediction accuracy, and build confidence intervals.

Pilot artefacts to produce:
- Labelled event log (all stops, jams, maintenance actions, part replacements)
- Ground-truth inspections at planned intervals (visual + NDT at hotspot locations)
- Backtesting: freeze the model after the first 2–4 weeks of data as a baseline, then evaluate it prospectively

KPIs:
- Lead time distribution (days) for actionable alerts before failure
- Precision/recall of alerts (define "true positive" tied to maintenance findings)
- RUL calibration: predicted risk bands vs observed degradation and inspections
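The first two KPIs depend on how alerts are matched to maintenance findings, so the matching rule should be written down before evaluation. A minimal sketch, assuming a hypothetical rule: an alert counts as a true positive if a finding on the same component follows it within `horizon_days`, and each detected finding contributes the lead time of its earliest matching alert. `Alert`, `Finding` and `evaluate_alerts` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    t_days: float
    component: str


@dataclass
class Finding:
    t_days: float  # confirmed failure or maintenance finding
    component: str


def evaluate_alerts(alerts, findings, horizon_days=30.0):
    """Return (precision, recall, lead_times) under the matching rule above."""
    matched_alerts = set()
    lead_times = []
    detected = 0
    for f in findings:
        hits = [
            i for i, a in enumerate(alerts)
            if a.component == f.component and 0 <= f.t_days - a.t_days <= horizon_days
        ]
        if hits:
            detected += 1
            matched_alerts.update(hits)
            # Lead time of the earliest matching alert.
            lead_times.append(max(f.t_days - alerts[i].t_days for i in hits))
    precision = len(matched_alerts) / len(alerts) if alerts else float("nan")
    recall = detected / len(findings) if findings else float("nan")
    return precision, recall, lead_times
```

For example, jaw alerts on days 10 and 20, a drive alert on day 50, a jaw finding on day 25 and an unannounced bearing finding on day 90 give precision 2/3, recall 1/2 and a single lead time of 15 days.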

ROI Model Template

Use this to build the customer business case. Fill in customer numbers; use benchmarks as placeholders only.

C_downtime:     cost per hour of downtime (€/hr)
H_event:        average hours of downtime per target failure event
N_events_year:  events per year (baseline, from maintenance records)
R_reduction:    expected reduction fraction (benchmark: 0.30–0.50)
C_pilot:        pilot cost (€)
C_rollout:      annual subscription + service (€/yr)
C_parts:        annual spare part savings (€/yr)

Annual downtime savings = C_downtime × H_event × N_events_year × R_reduction
Simple payback (months) = 12 × (C_pilot / annual downtime savings)
Rollout ROI (year 1)  = (annual downtime savings + C_parts − C_rollout) / C_rollout
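The template translates directly into a small function for the customer worksheet. A minimal sketch; `roi_model` is a hypothetical name, and the example inputs are invented placeholders, not benchmarks or customer data.

```python
def roi_model(c_downtime, h_event, n_events_year, r_reduction, c_pilot, c_rollout, c_parts):
    """Evaluate the ROI template; all inputs are customer-supplied."""
    savings = c_downtime * h_event * n_events_year * r_reduction  # €/yr
    payback_months = 12 * c_pilot / savings if savings > 0 else float("inf")
    roi_year1 = (savings + c_parts - c_rollout) / c_rollout
    return savings, payback_months, roi_year1


# Placeholder inputs: 2,000 €/hr, 4 hr/event, 10 events/yr, 30% reduction,
# 30,000 € pilot, 12,000 €/yr subscription, 3,000 €/yr parts savings.
print(roi_model(2000, 4, 10, 0.3, 30000, 12000, 3000))
# ≈ 24,000 €/yr savings, 15-month payback, year-1 ROI of 1.25
```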

Commercial Packaging

Pricing Models

Model A — Hardware + SaaS:
- One-time: hardware + installation + SI labour
- Recurring: monthly/annual subscription per machine (analytics + dashboards + alerts + model updates)

Model B — Availability Service Add-On:
- Bundle monitoring into the OEM service contract with defined response times and inspection cadence (aligns with NERAK-style service positioning)

Model C — Retrofit Programme:
- Fixed-price retrofit package for the installed base; optional financing via a "downtime avoided" narrative

Subscription Tiers

Tier	Contents
Basic	Condition features, thresholds, dashboards, event capture
Pro	Fatigue ledger + component RUL + inspection workflow + recommended actions
Enterprise	Fleet analytics, recipe clustering, automated warranty/service insights

Three-Month Sprint Plan

Period	Milestones
Week 1–2	Contract/SOW + KPI definitions; failure-mode workshop; OT security and network plan
Week 3–4	Sensor kit finalised; install drawings; PLC/drive tag mapping; edge box imaging
Week 5–6	Install + commissioning; golden run baseline; event capture verification
Week 7–8	Fatigue ledger v1 (rainflow + Miner); anomaly baseline; alert routing to service workflow
Week 9–10	Tuning + validation (inspections/NDT tie-in); KPI measurement starts; ROI worksheet populated
Week 11–12	Pilot results report; rollout kit v1; commercial proposal for 5–10 machine rollout

Key Assumptions and Risks

Assumptions (state explicitly in the pitch):
- Exact subsystem geometry and known field failure history must be confirmed in a workshop before installation; hotspot selection and sensor count cannot be finalised from this document alone
- PLC/drive accessibility and protocol support depend on machine generation and customer IT/OT policies; the plan assumes read-only access via OPC UA or gateway mapping is feasible
- The monitoring agent is decision support, not a safety function; if it is later positioned as a safety function, the requirements change substantially
- ROI ranges use published benchmarks as framing placeholders only; customer-specific downtime costs and event rates must be substituted for a defensible business case

Primary risks:
- Integration friction (PLC access, OT security policies) can stall momentum; address it upfront with an IT/OT security whitepaper and a pre-agreed interface contract
- Sensor installation requires a machine downtime window, which must align with the planned maintenance schedule
- Insufficient labelled failure data in the pilot machine's history reduces model validation quality in Phase 2

Standards Referenced

Standard	Application
ASTM E1049	Rainflow cycle counting procedure
FKM Guideline	Analytical fatigue strength assessment for mechanical components
IIW Recommendations	S-N detail categories and hot-spot fatigue for welded joints
Eurocode 3	Fatigue assessment for steel structures
ISO 17359	Procedures for setting up a condition monitoring programme
ISO 13374-1	Software specifications for CM data processing and communication
ISO 13379	Data interpretation and diagnostics concepts
ISO 16063-21	Accelerometer calibration methods
ISA/IEC 62443	OT cybersecurity — zones, conduits, lifecycle responsibilities
OPC UA 1.04	Secure industrial communication protocol
OASIS MQTT 5.0	Lightweight publish/subscribe telemetry protocol
EU Machinery Regulation (EU) 2023/1230	Applies from 20 Jan 2027; includes cybersecurity requirements

Technical Pitch Slides — FEA post-processing approach (complementary to sensor-based monitoring)
Problem Brief — Industrial Equipment
Target Companies