
Exercise 1000.1 — FFNN regression for engine prediction

A simple neural network surrogate (PyTorch)

Files

🧪 Script
FFNN.py

🧪 Dataset
b777_engine_inputs.dat b777_engine_outputs.dat

🧪 Supporting script (model/data context)
b777_engine.py

How to run

From the script folder (chapters/1000_ML/scripts):

python FFNN.py

Expected outputs typically include:

  • printed training progress (loss vs epoch)
  • saved plots (e.g., loss history, parity plots) in the chapter outputs/ folder

Learning objectives

By completing this exercise, you will learn to:

  • Train a feed-forward neural network (FFNN) to approximate an engineering mapping
  • Split data into train/validation/test sets and evaluate generalization
  • Use appropriate regression diagnostics beyond a single loss value
  • Understand why scaling and data leakage matter
  • Interpret ML results in an engineering way: where does the surrogate work, and where does it fail?

What you are modeling

You are learning a surrogate of the form:

\[ \mathbf{x} \rightarrow \mathbf{y} \]

where \(\mathbf{x}\) comes from b777_engine_inputs.dat and \(\mathbf{y}\) from b777_engine_outputs.dat.

You should treat this like a real engineering surrogate task:

  • inputs may have different units and scales,
  • outputs may have different magnitudes and sensitivities,
  • the model may behave poorly near the edges of the data domain.

The role of b777_engine.py is to give context on what the variables represent (and/or how the dataset was generated).
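The mapping above can be sketched as a small PyTorch model. The layer widths, depth, and activation below are illustrative assumptions for orientation only, not the architecture actually used in FFNN.py:

```python
import torch
import torch.nn as nn

class Surrogate(nn.Module):
    """Minimal feed-forward surrogate x -> y (illustrative sizes)."""

    def __init__(self, n_in: int, n_out: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, n_out),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Forward pass on a batch of 8 (already normalized) input vectors
model = Surrogate(n_in=3, n_out=2)
y_hat = model(torch.randn(8, 3))
print(y_hat.shape)  # one prediction row per input row
```

The essential point is the shape contract: one input row in, one output row out, with the input and output dimensions fixed by the dataset files.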


Guided questions

1) Data and scaling

  • Are the input variables on comparable numerical scales?
  • What happens to training if no normalization is applied?
  • If normalization is used, how can data leakage be avoided
    (i.e. how do you keep test-set statistics out of training)?

Hint: scaling parameters should be computed using training data only.
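A minimal sketch of leakage-free scaling, using made-up data placed on deliberately mismatched scales (the split sizes and scale factors are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy inputs on very different numerical scales (e.g. Mach, altitude, thrust)
X = rng.normal(size=(100, 3)) * np.array([1.0, 50.0, 1e4])

# Split FIRST, then compute scaling statistics on the training portion only,
# so no information from the test set leaks into preprocessing.
n_train = 80
X_train, X_test = X[:n_train], X[n_train:]

mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

X_train_s = (X_train - mu) / sigma
X_test_s = (X_test - mu) / sigma   # test set reuses *training* statistics

print(X_train_s.mean(axis=0))  # ~0 by construction
print(X_test_s.mean(axis=0))   # generally NOT exactly 0 -- and that is correct
```

The same (mu, sigma) must also be stored and applied to any new inputs at prediction time.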


2) Train / validation / test split

  • How sensitive are the results to the random split seed?
  • Do all outputs generalize equally well, or do some exhibit larger errors?
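One way to probe seed sensitivity is to regenerate the split for several seeds and retrain each time; the 70/15/15 fractions below are illustrative, not prescribed by the script:

```python
import numpy as np

n = 100  # illustrative dataset size
for seed in (0, 1, 2):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    # 70 / 15 / 15 split on shuffled indices
    train_idx, val_idx, test_idx = idx[:70], idx[70:85], idx[85:]
    # ... retrain here and record the test metric for this seed;
    # the spread across seeds measures split sensitivity.
    print(seed, train_idx[:5])
```

If the test metric varies strongly across seeds, the dataset is likely too small (or too unevenly sampled) for a single split to be trusted.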

3) Underfitting vs overfitting

  • If the network capacity is strongly reduced (few layers / neurons), what failure mode is observed?
  • If the network capacity is strongly increased, what failure mode appears?
  • How do training and validation loss curves help diagnose these behaviors?

4) What is a good validation metric?

  • Is mean squared error (MSE) sufficient for this problem?
  • Would you prefer:

    • coefficient of determination \(R^2\),
    • relative error,
    • mean absolute error (MAE),
    • error normalized by the output scale?

Explain your choice from an engineering interpretation perspective.
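The candidate metrics can be computed side by side. This sketch uses synthetic values and plain NumPy rather than the script's own evaluation code:

```python
import numpy as np

def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """A few complementary diagnostics; a single MSE value hides scale and bias."""
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    # R^2: fraction of output variance explained by the model
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    # RMSE normalized by the output range: interpretable across outputs
    nrmse = np.sqrt(mse) / (y_true.max() - y_true.min())
    return {"mse": mse, "mae": mae, "r2": r2, "nrmse": nrmse}

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
m = regression_metrics(y_true, y_pred)
print(m)
```

Normalized metrics (R², NRMSE) are usually the most defensible for engineering reporting, because outputs with different magnitudes can be compared on one footing.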


5) Where does the surrogate fail?

Using parity plots or residual plots:

  • Are errors larger near the boundaries of the input domain?
  • Are errors predominantly:

    • systematic (bias), or
    • random (variance)?

  • Does the model violate any expected physical trends
    (e.g. monotonicity with respect to certain inputs)?
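A residual analysis along these lines can separate bias from scatter and check the domain boundaries. The data below are synthetic, with the error deliberately inflated near the edges so the boundary comparison has something to find:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 500)
y_true = np.sin(2 * np.pi * x)

# Toy "surrogate" output: a constant offset (bias) plus noise (variance),
# with noise amplified near the edges of the input domain.
edge = (x < 0.1) | (x > 0.9)
noise = rng.normal(0.0, 0.02, 500) * np.where(edge, 6.0, 1.0)
y_pred = y_true + 0.05 + noise

res = y_pred - y_true
print("mean residual (bias):", res.mean())   # far from 0 -> systematic error
print("residual std (scatter):", res.std())  # random component

# Are errors larger near the boundaries of the input domain?
print("edge RMSE:    ", np.sqrt(np.mean(res[edge] ** 2)))
print("interior RMSE:", np.sqrt(np.mean(res[~edge] ** 2)))
```

A nonzero mean residual points to a systematic (bias) problem; a large standard deviation with near-zero mean points to variance. The edge-vs-interior comparison is the numerical counterpart of what a parity plot shows visually.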

Student tasks

Task 1 — Baseline training and evaluation (core)

Run the script as provided and report:

  • final training loss,
  • final validation loss,
  • test-set performance using at least one metric of your choice.

Produce:

  • one parity plot (predicted vs true) for at least one output variable.

Write 6–10 lines discussing whether the surrogate is acceptable for engineering use, and justify your assessment.
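A parity plot takes only a few lines of matplotlib. The predictions below are synthetic placeholders; in Task 1 you would substitute the model's actual test-set output, and the filename `parity.png` is an arbitrary choice:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt
from pathlib import Path

# Placeholder data standing in for (y_test, model(X_test))
rng = np.random.default_rng(0)
y_true = rng.uniform(0.0, 1.0, 200)
y_pred = y_true + rng.normal(0.0, 0.03, 200)

fig, ax = plt.subplots()
ax.scatter(y_true, y_pred, s=10, alpha=0.6)
lims = [min(y_true.min(), y_pred.min()), max(y_true.max(), y_pred.max())]
ax.plot(lims, lims, "k--", label="perfect prediction")  # 45-degree reference line
ax.set_xlabel("true value")
ax.set_ylabel("predicted value")
ax.legend()
fig.savefig(Path("parity.png"), dpi=150)
```

Points hugging the dashed line indicate a good surrogate; a tilt or offset relative to it indicates systematic error.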


Task 2 — Architecture study (capacity vs generalization)

Train at least three different networks by varying one architectural choice:

  • number of hidden layers, or
  • number of neurons per layer, or
  • activation function (if supported by the script).

Deliver:

  • training and validation loss curves for each case,
  • a short interpretation explaining:

    • where underfitting occurs,
    • where overfitting begins,
    • where performance saturates.

Task 3 — Data efficiency experiment

Repeat the training using reduced fractions of the available dataset (e.g. 20%, 50%, and 100% of the training data).

Deliver:

  • a plot of test error versus number of training samples,
  • a short paragraph answering:

    • how much data is required to reach acceptable accuracy,
    • which outputs are most data-hungry.
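The subsampling loop might look like the sketch below. The key design choice is that only the training set shrinks while the test set stays fixed, so test errors remain comparable across fractions (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_train_total = 200  # illustrative size of the full training set

for frac in (0.2, 0.5, 1.0):
    n_sub = int(frac * n_train_total)
    # Draw a random subset of training indices without replacement
    sub_idx = rng.choice(n_train_total, size=n_sub, replace=False)
    # ... retrain on X_train[sub_idx], y_train[sub_idx] here,
    # then evaluate on the SAME fixed test set and record the error
    print(frac, n_sub)
```

Plotting the recorded test error against `n_sub` gives the learning curve asked for above.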

Task 4 — Engineering interpretation (short essay)

In 10–12 lines, answer the following:

  • What makes this neural-network surrogate trustworthy (or not)?
  • What additional validation steps would you require before using it inside a mission analysis or design loop?
  • What risks arise when optimizing designs using a surrogate that is inaccurate near the boundaries of the input domain?

Limitations (important)

This exercise demonstrates supervised regression with a feed-forward neural network, but it does not address:

  • uncertainty quantification or confidence intervals,
  • extrapolation outside the data domain,
  • physical constraints (e.g. conservation laws, monotonicity),
  • bias or sparsity in the training dataset,
  • coupling with optimization or decision-making loops.

The FFNN should be treated as a conditional surrogate: its predictions are meaningful only within the region of the input space covered by the training data.