Confidence in Numbers, Fragile Decisions
Composite geotechnical metrics can clarify decisions — and quietly distort them.
In mine geotechnical engineering, I am less worried about the number itself than about the confidence we start to place in it.
That is especially true for composite geotechnical metrics — confidence scores, reliability indices, design acceptance ratings, and similar dashboard-style measures built from multiple weighted inputs. These metrics are useful. They make uncertainty easier to communicate. They make comparisons easier. They help bring discipline to discussions that would otherwise remain vague.
That is exactly why they matter.
It is also why they need to be handled carefully.
This is not a criticism of the work of Steffen, Terbrugge, Read, Stacey, Dunn, and others. Their contribution was important. They helped move mine geotechnics away from loose intuition and toward more explicit thinking about consequence, risk appetite, study stage, and design confidence. That was a real step forward.
The weakness begins later, and it is not personal. It is structural.
Once many partial judgements are aggregated into a single confidence-style metric, the resulting number can start to look more robust than the underlying geotechnical understanding actually is.
That is the issue.
I am not mainly concerned here with direct engineering outputs such as factor of safety or probability of failure. Those are also conditional, but at least they usually point back to a specific mechanism, a specific model, and a defined set of assumptions. My concern is narrower: synthetic assurance metrics that compress geology, structure, hydrogeology, model quality, implementation readiness, monitoring maturity, and related factors into one summary expression of confidence.
Those metrics do not measure the ground directly.
They summarize a view of the ground.
More precisely, they summarize how much confidence we believe should be placed in the current interpretation of the ground. That can be useful. But it can also create an impression of coherence that the underlying evidence does not fully support.
A composite metric is built through interpretation. Categories are defined. Components are selected. Weights are assigned. Evidence is scored. Partial results are aggregated into a final index or rating. By the time that number reaches a dashboard or governance pack, most of that chain has disappeared. What remains is a number that looks settled, comparable, and decision-ready.
But the ground is rarely that settled.
A confidence score may neatly combine geology, hydrogeology, structure, model quality, controls, and monitoring. Yet each of those components may still be incomplete, conditional, disputed, or uneven in quality. The final number can therefore look more stable than the understanding behind it really is.
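To make the compression concrete, here is a minimal sketch in Python of a weighted-average composite. The component names follow the list above, but the scores, the weights, and the 0-to-1 scale are invented for illustration and are not taken from any published scheme.

```python
# Hypothetical component scores on a 0-1 scale (illustrative values only).
component_scores = {
    "geology": 0.85,
    "structure": 0.40,      # the weak component
    "hydrogeology": 0.80,
    "model_quality": 0.90,
    "controls": 0.85,
    "monitoring": 0.90,
}

# Hypothetical weights; they sum to 1.0.
weights = {
    "geology": 0.20,
    "structure": 0.15,
    "hydrogeology": 0.15,
    "model_quality": 0.20,
    "controls": 0.15,
    "monitoring": 0.15,
}


def composite_score(scores, weights):
    """Weighted average of component scores, normalised by total weight."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in scores) / total_weight


composite = composite_score(component_scores, weights)
weakest = min(component_scores, key=component_scores.get)

print(f"Composite score: {composite:.2f}")
print(f"Weakest component: {weakest} ({component_scores[weakest]:.2f})")
```

With these invented inputs the composite comes out near 0.79 even though structure sits at 0.40. A weighted average is only one possible aggregation rule, and others would behave differently, but most rules that return a single number lose some of that spread.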
This is where confidence in numbers begins to outrun confidence in reality.
A mine can carry a reassuring composite score while still facing the questions that actually matter: Is the domaining too coarse? Has the controlling structure really been identified? Is depressurization working as assumed? Is implementation quality slipping? Is the selected model genuinely the best interpretation, or simply the most convenient one?
Those are not side issues.
In my experience, they are often the issues that decide whether a design is robust or fragile.
They are also exactly the kinds of issues that composite metrics struggle to express.
The attraction of these metrics is obvious. They make technical complexity legible across organizational levels. They make sites easier to compare. They make trends easier to track. They give management a disciplined summary of a difficult technical case.
All of that is legitimate.
But there is an inherent weakness in the method: the clarity of the summary can exceed the clarity of the underlying technical case.
That does not happen because people are careless. It happens because the approach itself encourages compression. Many partial judgements are assembled into one synthetic number, and the coherence of that summary can begin to stand in for coherence in the geology, the model, or the decision basis.
That is why the weakness is methodological.
There is a second issue, and it is more subtle. Once a model, design, or rating becomes accepted, contradictory evidence becomes harder to absorb. New data that does not fit the prevailing picture can be treated as noise, anomaly, or local complication rather than as a signal that the interpretation itself may need to change.
This is the dissonance gap: the gap between what the current framework says is true and what the ground may already be starting to say.
A composite metric can widen that gap without meaning to. Because it provides a structured and apparently settled rating, it becomes easier to defend the summary than to reopen the underlying case.
That is not a moral failure. It is simply one of the risks of summarizing complex and uneven evidence too well.
And this matters because some of the most valuable geotechnical work is not captured well by a composite score. It is recognizing that the interpretation is too neat. It is questioning the accepted mechanism. It is admitting that the data density is not really adequate for the decision being asked of it. It is understanding that monitoring may provide visibility, but not necessarily decision security. It is preserving professional unease when the structured process would prefer closure.
That is not a failure of discipline.
That is discipline.
None of this means composite metrics are useless. It means they need to stay in their proper place.
Used well, they can structure discussion, show where confidence is stronger or weaker, help prioritize additional work, and improve consistency across sites. They can make uncertainty more discussable across technical and managerial levels.
That is valuable.
But only if the metric remains subordinate to the thinking.
The real question is not whether a composite metric is “accurate” in some abstract sense. The better question is whether it keeps the engineering conversation open. Does it help us ask better questions about the ground, the model, the assumptions, the controls, and the decision? Or does it create premature closure by turning unresolved technical judgement into an apparently settled rating?
A simple stress test gets to the heart of it:
Can this system produce a better confidence or reliability score while actual mine understanding stays the same — or gets worse?
If the answer is yes, the weakness is real.
And in many cases, the honest answer is yes. A score can improve because the framework has been standardized, because scorers have become more practiced, because reporting quality has improved, or because the weighting logic has changed — without any meaningful improvement in the mine’s understanding of the ground.
The number becomes smoother.
The dashboard becomes cleaner.
But the technical reality may not have changed at all.
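The weighting case is the easiest to illustrate. The sketch below, again with invented scores and weights, rates the same evidence under two hypothetical weighting schemes. Nothing about the ground or the data changes between the two calls.

```python
# Hypothetical component scores (0-1 scale); the evidence is identical in both runs.
scores = {
    "geology": 0.8,
    "structure": 0.4,
    "hydrogeology": 0.6,
    "monitoring": 0.9,
}

# Two hypothetical weighting schemes. The second shifts weight away from
# the weaker components and toward monitoring.
weights_before = {"geology": 0.30, "structure": 0.30, "hydrogeology": 0.20, "monitoring": 0.20}
weights_after = {"geology": 0.30, "structure": 0.15, "hydrogeology": 0.15, "monitoring": 0.40}


def composite_score(scores, weights):
    """Weighted average of component scores, normalised by total weight."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in scores) / total_weight


before = composite_score(scores, weights_before)
after = composite_score(scores, weights_after)

print(f"Before reweighting: {before:.2f}")
print(f"After reweighting:  {after:.2f}")
print(f"Change in evidence: none; change in score: {after - before:+.2f}")
```

In this toy case the rating moves from about 0.66 to about 0.75 while the weakest component, structure, is exactly as poorly understood as before.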
That is why these metrics need unusual discipline around their use. They should not be treated as truth statements, stand-alone proxies for technical quality, or reasons to close technical debate simply because they create managerial clarity.
They should be treated as structured prompts for further judgement.
They should travel with their assumptions, blind spots, and unresolved dependencies. They should sit inside a wider operating model of geotechnical governance: model updating, observational reconciliation, implementation review, independent challenge, and explicit discussion of what remains unknown.
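One way to picture a metric that travels with its caveats is a record that cannot be reported as a bare number. The sketch below is an assumption about how that might look in Python. The class name, the field names, and the example entries are all hypothetical and do not describe any existing tool or site.

```python
from dataclasses import dataclass, field


@dataclass
class ConfidenceRating:
    """Hypothetical record that keeps a score together with its caveats."""
    score: float
    assumptions: list[str] = field(default_factory=list)
    blind_spots: list[str] = field(default_factory=list)
    unresolved_dependencies: list[str] = field(default_factory=list)

    def summary(self) -> str:
        """Render the score with every caveat list, never the score alone."""
        lines = [f"Score: {self.score:.2f}"]
        sections = (
            ("Assumptions", self.assumptions),
            ("Blind spots", self.blind_spots),
            ("Unresolved dependencies", self.unresolved_dependencies),
        )
        for label, items in sections:
            lines.append(f"{label}:")
            # An empty list is stated explicitly rather than silently omitted.
            lines.extend(f"  - {item}" for item in (items or ["none recorded"]))
        return "\n".join(lines)


# Illustrative entries only.
rating = ConfidenceRating(
    score=0.79,
    assumptions=["Depressurization performs as modelled"],
    blind_spots=["Sparse structural data in the lower benches"],
    unresolved_dependencies=["Controlling structure not yet confirmed"],
)
print(rating.summary())
```

The design choice is simply that the only reporting path prints the caveats alongside the number, so the summary cannot reach a dashboard without them.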
Most importantly, they should preserve the ability of experienced engineers to say, in effect:
the score is acceptable, but the case is not yet comfortable.
That is the space where VSKY.GEO works, focusing on decision-critical geotechnical questions where models, metrics, and monitoring need to be interpreted with discipline rather than accepted at face value. The aim is not to reject composite metrics, but to keep them in their proper role: as aids to judgement, not substitutes for understanding.
That, to me, is the practical conclusion.
Composite confidence and reliability metrics are useful when they sharpen thinking, expose weak spots, and support better decisions under uncertainty. They become dangerous when confidence in the summary exceeds confidence in the geology, structure, and behaviour it is meant to represent.
The aim is not to force the ground into neat summary numbers, but to make complexity decision-ready without disguising uncertainty. A composite score is valuable only while it remains subordinate to the reality it summarizes.
VSKY.GEO helps decision-makers distinguish between confidence in a metric and confidence in the ground itself.
Independent Geotechnical Advisory for Strategic Mining Decisions.
Strengthen your technical judgement with independent review and senior expertise.
vsky.geo@outlook.com