Exercise 1000.1 — FFNN regression for engine prediction
A simple neural network surrogate (PyTorch)
Files
🧪 Script
FFNN.py
🧪 Dataset
b777_engine_inputs.dat
b777_engine_outputs.dat
🧪 Supporting script (model/data context)
b777_engine.py
How to run
From the script folder (chapters/1000_ML/scripts):
python FFNN.py
Expected outputs typically include:
- printed training progress (loss vs epoch)
- saved plots (e.g., loss history, parity plots) in the chapter outputs/ folder
Learning objectives
By completing this exercise, you will learn to:
- Train a feed-forward neural network (FFNN) to approximate an engineering mapping
- Split data into train/validation/test sets and evaluate generalization
- Use appropriate regression diagnostics beyond a single loss value
- Understand why scaling and data leakage matter
- Interpret ML results in an engineering way: where does the surrogate work, and where does it fail?
What you are modeling
You are learning a surrogate of the form:
\[\mathbf{y} \approx \hat{f}(\mathbf{x}),\]
where \(\hat{f}\) is the trained FFNN, \(\mathbf{x}\) comes from b777_engine_inputs.dat, and \(\mathbf{y}\) comes from b777_engine_outputs.dat.
You should treat this like a real engineering surrogate task:
- inputs may have different units and scales,
- outputs may have different magnitudes and sensitivities,
- the model may behave poorly near the edges of the data domain.
The role of b777_engine.py is to give context on what the variables represent (and/or how the dataset was generated).
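The file layout is not described here, so the sketch below assumes that both .dat files are whitespace-delimited numeric tables with one sample per row, the same row order in both files, and optional comment lines starting with #. The helper names load_engine_data and describe_columns are hypothetical; FFNN.py may load the data differently.

```python
import numpy as np


def load_engine_data(inputs_path, outputs_path):
    """Load X with shape (n_samples, n_inputs) and Y with shape (n_samples, n_outputs).

    Assumes whitespace-delimited numeric text with one sample per row and
    the same row order in both files; lines starting with '#' are skipped.
    """
    X = np.loadtxt(inputs_path, ndmin=2)
    Y = np.loadtxt(outputs_path, ndmin=2)
    if X.shape[0] != Y.shape[0]:
        raise ValueError(f"row mismatch: {X.shape[0]} input rows vs {Y.shape[0]} output rows")
    return X, Y


def describe_columns(A):
    """Per-column min, max and standard deviation, to compare numerical scales."""
    return A.min(axis=0), A.max(axis=0), A.std(axis=0)


# Example (file names as listed in the Files section):
# X, Y = load_engine_data("b777_engine_inputs.dat", "b777_engine_outputs.dat")
# print(describe_columns(X))
```

The per-column minimum and maximum of X also record the input domain that the surrogate can reasonably be trusted on.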
Guided questions
1) Data and scaling
- Are the input variables on comparable numerical scales?
- What happens to training if no normalization is applied?
- If normalization is used, how can data leakage (i.e., using test-set statistics during training) be avoided?
Hint: scaling parameters should be computed from the training data only and then applied unchanged to the validation and test data.
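The hint can be made concrete with a small scaler that is fitted once on the training split and reused everywhere else. This is a hand-rolled z-score scaler for illustration (the class name TrainOnlyScaler is hypothetical). scikit-learn's StandardScaler follows the same fit/transform pattern. Outputs can be scaled the same way with a separate instance. Predictions can then be mapped back with inverse_transform before errors are computed in physical units.

```python
import numpy as np


class TrainOnlyScaler:
    """Per-column z-score scaling; statistics come from the training split only."""

    def fit(self, A_train):
        A_train = np.asarray(A_train, dtype=float)
        self.mean_ = A_train.mean(axis=0)
        std = A_train.std(axis=0)
        # Constant columns would give a zero divisor; leave them unscaled instead.
        self.std_ = np.where(std > 0, std, 1.0)
        return self

    def transform(self, A):
        return (np.asarray(A, dtype=float) - self.mean_) / self.std_

    def inverse_transform(self, Z):
        return np.asarray(Z, dtype=float) * self.std_ + self.mean_


# Typical use (X_train, X_val, X_test come from the split):
# x_scaler = TrainOnlyScaler().fit(X_train)   # fit on training rows only
# X_train_s = x_scaler.transform(X_train)
# X_val_s, X_test_s = x_scaler.transform(X_val), x_scaler.transform(X_test)
```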
2) Train / validation / test split
- How sensitive are the results to the random split seed?
- Do all outputs generalize equally well, or do some exhibit larger errors?
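One way to study seed sensitivity is to draw the split from an explicit random generator and repeat the whole pipeline (split, scale, train, evaluate) for several seeds. The helper below is a sketch. The 70/15/15 proportions are an assumption, not necessarily what FFNN.py uses.

```python
import numpy as np


def split_indices(n_samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle row indices once and cut them into train / validation / test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(round(test_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return train_idx, val_idx, test_idx


# Seed study (train_and_evaluate is a hypothetical wrapper around FFNN.py's steps):
# scores = [train_and_evaluate(*split_indices(len(X), seed=s)) for s in range(5)]
# print(np.mean(scores), np.std(scores))
```

Reporting the mean and spread of a test metric over several seeds is more informative than a single run. A large spread often points to a small dataset or to sparsely sampled regions of the input space.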
3) Underfitting vs overfitting
- If the network capacity is strongly reduced (few layers / neurons), what failure mode is observed?
- If the network capacity is strongly increased, what failure mode appears?
- How do training and validation loss curves help diagnose these behaviors?
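The curve-reading rules above can be written as a small summary function. It assumes that per-epoch lists of training and validation loss are available, for example collected in the training loop of FFNN.py. The patience threshold is an arbitrary heuristic. Underfitting cannot be flagged from a single run. It shows up as both curves flattening at a similar, high value, and that only becomes evident when compared with a larger network.

```python
import numpy as np


def diagnose_curves(train_loss, val_loss, patience=10):
    """Summarize train/validation loss histories (one value per epoch)."""
    train_loss = np.asarray(train_loss, dtype=float)
    val_loss = np.asarray(val_loss, dtype=float)
    best_epoch = int(np.argmin(val_loss))
    # Many epochs past the validation minimum while the training loss keeps
    # falling is the classic overfitting signature.
    epochs_since_best = len(val_loss) - 1 - best_epoch
    overfit = epochs_since_best >= patience and train_loss[-1] < train_loss[best_epoch]
    return {
        "best_epoch": best_epoch,
        "best_val_loss": float(val_loss[best_epoch]),
        "final_gap": float(val_loss[-1] - train_loss[-1]),
        "overfitting_suspected": bool(overfit),
    }
```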
4) What is a good validation metric?
- Is mean squared error (MSE) sufficient for this problem?
- Would you prefer:
  - the coefficient of determination \(R^2\),
  - relative error,
  - mean absolute error (MAE),
  - error normalized by the output scale?
Explain your choice from an engineering interpretation perspective.
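The sketch below computes the candidate metrics per output column, so a good average cannot hide one poorly predicted output. It assumes predictions have already been mapped back to physical units. The relative error is guarded against division by zero but remains misleading for outputs that approach zero. The range-normalized RMSE is only one of several possible choices of "output scale".

```python
import numpy as np


def regression_metrics(y_true, y_pred, eps=1e-12):
    """Per-output metrics for arrays of shape (n_samples, n_outputs)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = np.mean(err**2, axis=0)
    mae = np.mean(np.abs(err), axis=0)
    ss_res = np.sum(err**2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    r2 = 1.0 - ss_res / np.maximum(ss_tot, eps)
    # Worst-case relative error; only meaningful when |y_true| stays away from zero.
    max_rel = np.max(np.abs(err) / np.maximum(np.abs(y_true), eps), axis=0)
    # RMSE divided by the observed output range, so outputs in different units compare.
    y_range = y_true.max(axis=0) - y_true.min(axis=0)
    nrmse = np.sqrt(mse) / np.maximum(y_range, eps)
    return {"MSE": mse, "MAE": mae, "R2": r2, "max_rel_err": max_rel, "NRMSE_range": nrmse}
```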
5) Where does the surrogate fail?
Using parity plots or residual plots:
- Are errors larger near the boundaries of the input domain?
- Are errors predominantly:
  - systematic (bias), or
  - random (variance)?
- Does the model violate any expected physical trends (e.g., monotonicity with respect to certain inputs)?
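These questions can be checked numerically as well as visually. The first helper below separates the mean residual (bias) from its standard deviation (scatter). It also compares errors for test points near the edge of the training domain against errors for interior points. The second helper counts how often nudging one input moves an output in the physically wrong direction. Several choices here are assumptions for illustration: the edge definition (within 5% of an input's training range), the `predict` callable, and which input/output pairs should be monotonic.

```python
import numpy as np


def residual_breakdown(X, y_true, y_pred, X_train, edge_frac=0.05):
    """Bias/scatter of residuals, plus MAE near the domain edge vs. in the interior.

    Assumption: a sample is 'near the edge' if any input lies within edge_frac
    of the training min or max, relative to that input's training range.
    """
    X = np.asarray(X, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    res = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    u = (X - lo) / span  # 0 at the training minimum, 1 at the training maximum
    near_edge = np.any((u < edge_frac) | (u > 1.0 - edge_frac), axis=1)
    out = {"bias": res.mean(axis=0), "scatter": res.std(axis=0), "n_edge": int(near_edge.sum())}
    abs_res = np.abs(res)
    if near_edge.any():
        out["mae_edge"] = abs_res[near_edge].mean(axis=0)
    if (~near_edge).any():
        out["mae_interior"] = abs_res[~near_edge].mean(axis=0)
    return out


def monotonicity_violations(predict, X, input_idx, output_idx, delta, increasing=True):
    """Fraction of samples where increasing one input moves an output the wrong way.

    `predict` is a hypothetical callable mapping physical inputs to physical outputs
    (e.g. a wrapper around the trained network and both scalers).
    """
    X = np.asarray(X, dtype=float)
    X_up = np.array(X, dtype=float, copy=True)
    X_up[:, input_idx] += delta
    dy = predict(X_up)[:, output_idx] - predict(X)[:, output_idx]
    wrong = dy < 0 if increasing else dy > 0
    return float(np.mean(wrong))
```

A bias that is large relative to the scatter suggests a systematic modelling error (for example, too little capacity or a poorly covered region), whereas scatter-dominated residuals point more towards noise or variance.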
Student tasks
Task 1 — Baseline training and evaluation (core)
Run the script as provided and report:
- final training loss,
- final validation loss,
- test-set performance using at least one metric of your choice.
Produce:
- one parity plot (predicted vs. true) for at least one output variable.
Write 6–10 lines discussing whether the surrogate is acceptable for engineering use, and justify your assessment.
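Below is a minimal parity-plot sketch using matplotlib. It assumes `y_true` and `y_pred` hold one output column in physical units. The function name `parity_plot` and the output path are hypothetical, and FFNN.py may already produce an equivalent figure. Points on the dashed y = x line are perfect predictions. A consistent offset or curvature away from that line indicates bias.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np


def parity_plot(y_true, y_pred, name, path):
    """Scatter predicted vs. true values for one output and save the figure."""
    y_true = np.ravel(np.asarray(y_true, dtype=float))
    y_pred = np.ravel(np.asarray(y_pred, dtype=float))
    lo = float(min(y_true.min(), y_pred.min()))
    hi = float(max(y_true.max(), y_pred.max()))
    fig, ax = plt.subplots(figsize=(4.5, 4.5))
    ax.scatter(y_true, y_pred, s=10, alpha=0.6)
    ax.plot([lo, hi], [lo, hi], "k--", lw=1, label="y = x")
    ax.set_xlabel(f"true {name}")
    ax.set_ylabel(f"predicted {name}")
    ax.set_aspect("equal")
    ax.legend()
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return path
```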
Task 2 — Architecture study (capacity vs generalization)
Train at least three different networks by varying one architectural choice:
- number of hidden layers, or
- number of neurons per layer, or
- activation function (if supported by the script).
Deliver:
- training and validation loss curves for each case,
- a short interpretation explaining:
  - where underfitting occurs,
  - where overfitting begins,
  - where performance saturates.
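For the architecture study, building the network from a single parameterized function makes it easy to change exactly one choice between runs. The sketch below assumes a plain fully connected stack in PyTorch with a linear output layer. The names `build_ffnn` and `n_params`, the activation set, and the widths are all illustrative; they do not describe the structure of FFNN.py. Printing the parameter count next to each loss curve makes "capacity" concrete.

```python
import torch
from torch import nn

ACTIVATIONS = {"relu": nn.ReLU, "tanh": nn.Tanh, "gelu": nn.GELU}


def build_ffnn(n_in, n_out, hidden=(64, 64), activation="tanh"):
    """Fully connected regressor: [Linear, activation] per hidden layer, then a linear output."""
    layers = []
    width_in = n_in
    for width in hidden:
        layers += [nn.Linear(width_in, width), ACTIVATIONS[activation]()]
        width_in = width
    layers.append(nn.Linear(width_in, n_out))  # no activation: unbounded regression output
    return nn.Sequential(*layers)


def n_params(model):
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Vary only the depth; width and activation stay fixed.
# n_in=4 and n_out=3 are placeholders: use X.shape[1] and Y.shape[1] instead.
DEPTH_STUDY = {"1 layer": (64,), "2 layers": (64, 64), "4 layers": (64, 64, 64, 64)}
for label, hidden in DEPTH_STUDY.items():
    model = build_ffnn(n_in=4, n_out=3, hidden=hidden)
    print(label, n_params(model))
    with torch.no_grad():
        _ = model(torch.zeros(2, 4))  # shape check with a dummy batch
```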
Task 3 — Data efficiency experiment
Repeat the training using reduced fractions of the available training data (e.g., 20%, 50%, and 100%), keeping the validation and test sets fixed so that the errors remain comparable across runs.
Deliver:
- a plot of test error versus number of training samples,
- a short paragraph answering:
  - how much data is required to reach acceptable accuracy,
  - which outputs are most data-hungry.
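For the data-efficiency experiment, the sketch below draws nested subsets of the training indices, so the 20% subset is contained in the 50% subset, and so on. Compared with independent draws, nesting reduces run-to-run noise. The validation and test indices are not touched. The helper name and the fractions are illustrative.

```python
import numpy as np


def nested_training_subsets(train_idx, fractions=(0.2, 0.5, 1.0), seed=0):
    """Map each fraction to a subset of train_idx; smaller subsets are nested in larger ones."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(np.asarray(train_idx))  # one shuffle shared by all fractions
    subsets = {}
    for f in sorted(fractions):
        n = max(1, int(round(f * len(order))))
        subsets[f] = order[:n]
    return subsets
```

One design choice is to refit the scalers on each subset. The smaller runs then see only their own statistics, which mirrors how a genuinely smaller dataset would behave.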
Task 4 — Engineering interpretation (short essay)
In 10–12 lines, answer the following:
- What makes this neural-network surrogate trustworthy (or not)?
- What additional validation steps would you require before using it inside a mission analysis or design loop?
- What risks arise when optimizing designs using a surrogate that is inaccurate near the boundaries of the input domain?
Limitations (important)
This exercise demonstrates supervised regression with a feed-forward neural network, but it does not address:
- uncertainty quantification or confidence intervals,
- extrapolation outside the data domain,
- physical constraints (e.g. conservation laws, monotonicity),
- bias or sparsity in the training dataset,
- coupling with optimization or decision-making loops.
The FFNN should be treated as a conditional surrogate: its predictions are meaningful only within the region of the input space covered by the training data.
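The "conditional surrogate" caveat can be enforced at prediction time with a simple domain check. The sketch below stores the per-input bounding box of the training inputs and flags queries that fall outside it. The class name and the optional margin are hypothetical. A bounding box is a loose test, because a point inside it can still fall in a gap between training samples. Passing the check is therefore necessary but not sufficient for a trustworthy prediction.

```python
import numpy as np


class DomainGuard:
    """Flags query points outside the per-input bounding box of the training data."""

    def __init__(self, X_train, margin=0.0):
        X_train = np.asarray(X_train, dtype=float)
        lo, hi = X_train.min(axis=0), X_train.max(axis=0)
        span = hi - lo
        # margin > 0 widens the box by that fraction of each input's range.
        self.lo = lo - margin * span
        self.hi = hi + margin * span

    def inside(self, X):
        """Boolean array, one entry per query row."""
        X = np.atleast_2d(np.asarray(X, dtype=float))
        return np.all((X >= self.lo) & (X <= self.hi), axis=1)
```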