Did your AI model just make that up?

I used to be able to spot an LLM hallucination a mile away. The writing had a whiff about it — too confident, slightly off-topic, a citation that didn’t quite exist. You’d read it, squint, and go: no, that’s not right.

That job is getting harder. Not because the models have stopped making things up. They haven’t. The problem is that they’ve got much better at making things up plausibly.

Why this is a problem we can’t engineer away

Every large language model — Claude, ChatGPT, Gemini, Copilot, the lot — is at its core a probability engine. It predicts the “next token” based on everything that came before. That’s a remarkable trick when the training data is rich and the question is well-posed. It’s less useful when the model has to reach for something outside its training or retrieval context, because the same machinery that produces a correct answer also produces a confident-sounding wrong one. The models can represent uncertainty internally — token probabilities give you something to work with — but there’s no reliable mechanism that surfaces “I don’t know” to the user. It’s a calibration and training-incentive problem, not a capability one.
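To make the token-probability point concrete, here is a minimal sketch. The candidate tokens and their scores are invented for illustration and are not taken from any real model. It shows that the probability distribution can be sharply peaked in one case and nearly flat in another, while a top token is emitted just as readily in both.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over candidate tokens."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

def entropy_bits(probs):
    """Shannon entropy in bits: near 0 means one token dominates, higher means more spread."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

# Hypothetical next-token scores, invented for illustration only.
well_known = {"Paris": 9.0, "Lyon": 2.0, "Nice": 1.0}
obscure = {"1987": 2.1, "1989": 2.0, "1991": 1.9}

for name, logits in [("well_known", well_known), ("obscure", obscure)]:
    probs = softmax(logits)
    best = max(probs, key=probs.get)
    # The top token is printed either way; only the numbers reveal the doubt.
    print(name, best, round(probs[best], 3), round(entropy_bits(probs), 3))
```

The second case still produces an answer, and nothing in the generated text itself signals that the choice was close to a three-way coin toss.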

This isn’t just my view. In September 2025, OpenAI’s own researchers published a paper arguing that hallucinations can be understood, in part, as errors in binary classification — a predictable consequence of how models are trained and how we evaluate them. Accuracy-only leaderboards, they point out, reward confident guessing over saying “I don’t know.” Other researchers have gone further: one group showed, under formal assumptions about computability and training data, that hallucination is unavoidable in any computable LLM, while another used Gödel-style reasoning to argue that hallucination risk persists across every stage of the LLM pipeline. Even an earlier STOC 2024 paper by Kalai and Vempala suggested that rare facts in training data place a lower bound on the hallucination rate of any well-calibrated model. These are theoretical results with real assumptions behind them, but the direction is consistent: hallucination isn’t going to be patched out.

Reasoning models — the frontier stuff from the last eighteen months — help, but they don’t solve it. What they do is take more turns: they draft, critique, redraft, search, reconsider. Given enough compute and enough tool access (web search, code execution, document retrieval), a reasoning model can catch its own errors. Given a single quick turn on a question outside its training data, it often won’t.

And there’s a twist. OpenAI’s own system card for o3 and o4-mini, published in April 2025, reported that o3 hallucinates around 33% of the time on the PersonQA benchmark — a difficult factual-recall task — and o4-mini around 48%, roughly double the rate of their o1 and o3-mini predecessors. That’s one benchmark, not a universal regression, but the New York Times covered the trend in May 2025: on several benchmarks, hallucination rates in newer reasoning models have gone up, not down. Models are tackling harder tasks and attempting more claims; the absolute number of confident-but-wrong statements rises along with the confident-and-right ones. So the ability to sound right is increasing faster than the ability to be right. That’s the problem.

What a hallucination actually looks like now

Two years ago an obvious hallucination might have been a wrong date, a fabricated quote, a made-up case law reference. You could fact-check it in thirty seconds.

Today’s failures are subtler. A Python script that uses a function with the right name but wrong arguments. A network configuration that’s valid syntax but wrong for your vendor’s firmware version. A summary of a document that captures the shape of the argument but inverts one crucial claim. A legal clause that reads professionally but cites the wrong Act. The shell is convincing. The filling isn’t.

I’ve seen this in my own work, more frequently than I did even a year ago, and colleagues have noticed the same thing. An LLM recently drafted a paragraph for a colleague about a piece of telecoms hardware, confidently asserting that it was “the same ATA our CloudPBX team already ships into office deployments.” We don’t ship it. The model had never been told we did. It reads as the kind of thing a helpful internal colleague would write — specific, plausible, in the right voice — which is exactly what makes it dangerous. In another session, pushed on where a particular claim had come from, the model simply admitted it had made the line up: it had extrapolated from a third-party review and the general reputation of a similar product, then written it in first-person company voice because it sounded convincing. That’s an unusually honest self-report and worth taking seriously. It’s the machinery describing itself: I wrote what sounded right.

This is where the structural point from the last section stops being abstract. It’s also something people often miss: even when you give an LLM good source material to work from, it doesn’t just quote you back. It summarises, re-references, and reformulates internally — and things get distorted in that process. The research community calls this faithfulness hallucination, to distinguish it from the making-stuff-up variety. The canonical survey by Huang et al. describes it as “context inconsistency” — cases where the model “ignores or alters important facts within the original text.” A paper from January 2026 found a tendency for faithfulness to degrade toward the end of longer responses — the model drifts from the source the further it gets into its own output. So that long, fluent, well-cited summary of your policy document? The bit at the bottom is, on the evidence, the most likely place for something to be subtly wrong.

Ethan Mollick has made a related point: because LLM errors are, by construction, plausible, users “fall asleep at the wheel.” And it’s not a new concern — a 2023 Science Advances study found that in controlled conditions, participants were more likely to believe AI-generated disinformation than the human-written equivalent. The prose was simply better. That was GPT-3. The gap has only widened.

This is not a failure mode anyone can detect by vibe-checking the output.

What actually works

There’s a temptation to treat this as a novel problem requiring novel tools. It isn’t. Academia has been dealing with confident-sounding-but-wrong writing for about four hundred years, and the mechanism we landed on is peer review. Someone else reads your work and tries to break it.

That’s exactly what works with AI output too. A few patterns I rely on:

Ask the same model to critique its own answer, in a fresh session. Not “check your work” — that rarely does much. Instead: “Here’s a proposed solution to X. What’s wrong with it? What assumptions is it making? What would cause this to fail?” The fresh session matters, because the model isn’t anchored to defending what it just wrote. This is the idea behind Self-Refine, a NeurIPS 2023 paper that reported improvements of around 20% on several tasks from iterative self-critique, without any additional training. There’s a catch, though: a 2024 paper from Stechly, Valmeekam and Kambhampati found that pure self-critique can actually make reasoning performance worse on some tasks — the model talks itself into the wrong answer. Which leads to the next pattern.
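As a rough sketch of that critique prompt, the helper below assembles the three questions around a draft so it can be pasted into a fresh session. The function name and the sample task are hypothetical; it calls no model API and only builds the text.

```python
CRITIQUE_QUESTIONS = [
    "What's wrong with it?",
    "What assumptions is it making?",
    "What would cause this to fail?",
]

def build_critique_prompt(task, proposed_solution):
    """Frame a draft as someone else's proposal to be attacked, not defended."""
    lines = [
        f"Here's a proposed solution to: {task}",
        "",
        proposed_solution.strip(),
        "",
    ]
    lines.extend(CRITIQUE_QUESTIONS)
    return "\n".join(lines)

# Hypothetical task and draft, for illustration only.
prompt = build_critique_prompt(
    "rotating log files on a small Linux server",
    "Run a nightly cron job that deletes every file in /var/log.",
)
print(prompt)
```

Keeping the questions in one list makes it easy to reuse the same wording each time, which helps when comparing critiques across sessions.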

Cross-examine with a different model. If I’ve had Claude draft a tricky piece of network config, I’ll often paste it into ChatGPT or Gemini and ask what it would do differently and why. The disagreements are where the interesting errors live. The models have different training data, different reasoning styles, different blind spots. Where they all agree, my confidence goes up — but it’s not proof. Frontier models are trained on heavily overlapping corpora and can confidently agree on the same wrong answer, especially on topics where the public internet is itself wrong. So agreement is a signal, not a guarantee; disagreement is what tells me I need to go and read the documentation myself. This pattern is exactly the approach taken in Du et al.’s “Multiagent Debate” paper at ICML 2024, which found that having multiple LLM instances debate an answer over several rounds measurably reduces hallucination. Google DeepMind’s FACTS Grounding benchmark operationalises the same idea at institutional scale, using an ensemble of Gemini, GPT-4o and Claude to judge factuality.
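The agree-or-disagree logic can be sketched in a few lines. This assumes you have already collected each model's answer by hand or through whatever client you use; the model labels and answers below are hypothetical, and exact matching after light normalisation is a deliberately crude stand-in for actually reading the responses.

```python
from collections import defaultdict

def normalise(answer):
    """Lower-case and collapse whitespace so trivial formatting differences don't count."""
    return " ".join(answer.lower().split())

def compare_answers(answers):
    """Group models by the (normalised) answer they gave."""
    groups = defaultdict(list)
    for model, answer in answers.items():
        groups[normalise(answer)].append(model)
    unanimous = len(groups) == 1
    next_step = (
        "agreement: a signal, not a guarantee"
        if unanimous
        else "disagreement: go and read the documentation"
    )
    return {"unanimous": unanimous, "groups": dict(groups), "next_step": next_step}

# Hypothetical answers to the same short configuration question.
answers = {
    "model_a": "1400",
    "model_b": " 1400 ",
    "model_c": "1476",
}
result = compare_answers(answers)
print(result["next_step"])
for answer, models in result["groups"].items():
    print(answer, models)
```

This only works for short, comparable answers such as a value or a yes/no. For anything longer, the grouping step is a human job.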

Make the model do the research. For anything involving facts that change — pricing, versions, API behaviour, legislation — don’t trust the weights. Make the model search, cite the source, and then check the source actually says what it claims. Retrieval doesn’t eliminate hallucination — an EMNLP 2023 benchmark called ALCE found that even the best systems miss full citation support about 50% of the time, and as noted above, even with a good source document in hand, models can drift from what it actually says — but it massively reduces fabrication when combined with a human checking the citations. Vendors are starting to build this in: Anthropic’s Citations API, launched in January 2025, returns direct quotes from supplied source documents, and at least one customer reported reducing “source confabulations from 10% to zero.”
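One small part of that citation check can be automated: confirming that a passage the model presents as a direct quote really appears in the source. The sketch below assumes you have the source as plain text and the quoted passages as a list; the sample text is invented. It cannot judge paraphrases, which still need a human reader.

```python
import re

def normalise(text):
    """Collapse whitespace and lower-case so line breaks don't cause false alarms."""
    return re.sub(r"\s+", " ", text).strip().lower()

def unsupported_quotes(quotes, source_text):
    """Return the quotes that do not appear verbatim in the source."""
    haystack = normalise(source_text)
    return [quote for quote in quotes if normalise(quote) not in haystack]

# Hypothetical source document and model-supplied quotes.
source = (
    "Support for version 2 ends on 30 June.   Version 3 is required\n"
    "for new deployments."
)
quotes = [
    "Version 3 is required for new deployments.",
    "Version 2 is supported indefinitely.",
]
for quote in unsupported_quotes(quotes, source):
    print("Not found in source:", quote)
```

An empty result means only that the words are present. Whether they support the claim built on them is still a judgment call.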

Build the friction in before you need it. The cheapest time to catch a hallucination is before you’ve acted on it. That means reviewing AI output before you paste it into the production config, send it to the client, or quote it in the board paper. Simon Willison puts it well: hallucinated code is actually the least dangerous failure mode, because the compiler tells you immediately. It’s the hallucinations that read clean and pass initial review that hurt. Treat the first draft as a draft.

What this means for how we use AI at work

At RWTS we do a lot of work where the cost of a confident wrong answer is high — network changes, security configurations, compliance documents. The teams who get good results with AI aren’t the ones who’ve bought the most expensive model. They’re the ones who’ve built review into the workflow. A senior engineer reviews the AI-drafted config. A peer reads the AI-drafted proposal before it goes out. The AI is a junior colleague with a photographic memory and occasional blind spots, not an oracle.

That framing isn’t mine — Ethan Mollick has been arguing for years that LLMs are best thought of as “weird, somewhat alien interns that work infinitely fast and sometimes lie to make you happy.” Simon Willison goes further and calls them “a growing army of weird digital interns who will absolutely cheat if you give them a chance.” The operational answer, in both cases, is the same: tests, specs, code review, and a human in the loop who actually knows the domain.

That’s the framing I’d offer to anyone trying to work out how much to trust these tools. You wouldn’t let a talented graduate push code to production unreviewed on day one. You wouldn’t let them send a client a statement of work without someone else reading it. The same rules apply.

Hallucinations aren’t going away. They’re a property of how these systems work, not a bug that’ll be patched out in the next release. The models will keep getting better at sounding right. Our job is to make sure the review process keeps up.


A note on process: I drafted this post with Claude’s help, then had a different model review it for overstated claims and weak citations. Several paragraphs got tightened as a result. Which is, roughly, the point.


I’m the Director and CTO over at Real World Technology Solutions. RWTS helps organisations get real value from AI without betting the business on it. Call us on 1300 798 718.

Reading the WHS Eldership Survey: Context and Questions

Work in progress. This resource is still being refined, and there will likely be updates over the next week or so as people read it and I have the opportunity to help refine it. Part of the reason I’m putting it out now is to help that conversation start. If you spot something that needs fixing, sharpening, or rethinking, please let me know.
View the survey here: surveymonkey.com/r/WHSEldershipSurvey2026 — Closes 25 May 2026

The PCNSW is consulting on a proposed change to restrict future eldership to men.

The consultation is being run as a Work Health and Safety process, with a survey that closes on 25 May. If you’ve been sent the link and weren’t sure how to engage with it, this post is for you.

I respond to legislative and regulatory consultations regularly, and I sit on committees that do the same. When I read a consultation paper, I need to understand what I’m being asked and why. What does the evidence say? What are the risks? What are the alternatives?

The PCNSW survey asks important questions, but it doesn’t provide any explanatory context to help you answer them well. So I went looking for that context. This is what I found.

A few things worth saying upfront.

This isn’t an attempt to argue the theological question of whether Scripture requires male-only eldership. That conversation is being had elsewhere, by people more qualified than me to have it. What this does ask is whether the process, the safeguards, and the evidence are adequate for a change of this magnitude. Those are governance questions, not theological ones, and they deserve honest engagement with the evidence we have. It’s also not a committee document. I’m a member and lay leader at my own church. I don’t have standing in the Assembly, and this hasn’t been produced in consultation with the WHS Consultative Committee or any denominational body.

What I do bring is experience in governance across business, not-for-profit, and community organisations, and time spent working on child safety policy. That work has shaped how I think about authority, accountability, and what happens when institutions get the structures wrong.

One thing readers should know: the evidence on the questions this survey asks runs predominantly in one direction — toward caution about the proposed change. That’s not because I’ve been selective. It’s because the evidence we’d need to feel confident about this isn’t there.

  • No public systematic evaluation has been identified showing what happened after any denomination restricted or removed women from governance.
  • No baseline assessment, no transition measurement, no outcome evaluation has been published for any Australian state church that moved to male-only eldership.
  • No binding structural safeguards specifically designed for all-male governance have been identified in any complementarian denomination, anywhere.
  • The alternative participation mechanisms that exist — advisory committees, women’s ministry groups, “co-worker” roles — don’t appear to have been independently evaluated.

That evidence gap is itself worth weighing. Institutions that are confident a change has worked well have usually measured it.

Where things get more complicated is in our own context. Most PCNSW congregations don’t currently have women serving as elders. In many churches, sessions have been all-male for years, and in some they always have been. The Code permits women to serve as ruling elders, and has done since the 1967 GAA, but most churches have not exercised that permission.

For some congregations, the proposed change would formalise what is already the case. For the small number of churches that do have women elders, current women elders would continue to serve in that role, but no new women could be elected. The door closes permanently for everyone.

There’s a real difference, though, between a church that has chosen its current arrangement and a church that is prohibited from doing otherwise. The current Code preserves the freedom of each congregation to discern, under the guidance of Scripture and the Spirit. The proposed change removes that freedom denomination-wide. Even for churches that haven’t exercised that freedom, that’s a different thing.

What follows is structured around the survey itself. Each section gives you some evidence — research, the Royal Commission, the experience of comparable denominations, our own data — and a set of reflective questions designed to help you think through your own answer.

A note before you start: this covers a lot of ground. You don’t need to read it all at once, and you don’t need to do the survey straight away. Read a section. Sit with it. Talk about it with someone — a friend, your home group, your spouse, your minister. The questions are sharper when you’ve heard how someone else answers them. If your session is willing to host a conversation, this could serve as a starting point.

With thanks to Valerie Ling (Centre for Effective Serving), a registered clinical psychologist and supervisor, for sharing her clinical and doctoral research into psychosocial safety, governance, and gender in church contexts, which helped inform the framing of the WHS evidence here.


A note on how to read this

We’re a Reformed church. We believe Scripture is the supreme authority for faith and practice. We also believe, as the Westminster Confession teaches, that God has given us reason, conscience, and the experience of the church across the ages as gifts for understanding how to apply Scripture wisely. The Reformers built Presbyterian governance on the conviction that all human authority is fallible and must be accountable. They insisted on plurality of elders, parity between ministers and ruling elders, and the right of appeal to higher courts — precisely because they took total depravity seriously as a reality that shapes how institutions must be designed.

The questions in this survey are governance questions. They deserve both spiritual discernment and honest engagement with the evidence. The two are not in competition. A church that ignores evidence is not being more faithful — it is being less careful.

What follows is written for churches in every situation: those that have women elders and would lose the option of electing more, those that have never had women elders and may see no immediate change, and those in between. Wherever your church sits, the question the survey is asking is the same: what are the impacts if the door is permanently closed?


What you should know before you start

The survey exists because changes to governance structures can create psychosocial risks — risks to the psychological health and safety of the people in our churches. Under NSW law, PCBUs must identify and manage reasonably foreseeable risks to health and safety, including psychosocial risks, in connection with proposed changes and on an ongoing basis, not only after changes have been implemented.

The PCNSW’s own WHS Manual acknowledges the denomination’s legal obligations as a Person Conducting a Business or Undertaking (PCBU) under the Work Health and Safety Act 2011 (NSW). Volunteers, including elders, are legally defined as “workers” under this Act. The Manual opens by affirming both a theological duty of care (love your neighbour) and a legal duty of care (comply with WHS) as a single commitment. The denomination itself has linked these two frames. They cannot later be separated when it is convenient to do so.

Under section 48 of the Act, consultation about changes affecting health and safety requires that four elements are present: sharing relevant work health and safety information with workers; giving workers a reasonable opportunity to express their views and raise health or safety issues; giving workers a reasonable opportunity to contribute to the decision-making process relating to the health and safety matter; and taking workers’ views into account and advising them of the outcome in a timely manner. The survey includes free-text boxes and open questions, and the Consultative Committee has indicated it will summarise responses from the survey and other consultations. On the face of the survey and the public material, information-sharing is clear, but the wider consultation process is not yet transparent enough to assess how the remaining section 48 elements will be met. It is possible that further consultation steps are being undertaken by the Committee that are not yet publicly visible — this resource can only assess what is in the public record. SafeWork NSW’s guidance is clear that there is no set way consultation must occur — but it must be genuine, and workers must have a reasonable opportunity to express views and contribute to decisions.

In September 2025, SafeWork NSW issued a prohibition notice requiring the University of Technology Sydney to halt a redundancy process, not because redundancies are unlawful in themselves, but because the way the process was being managed created a serious and imminent risk of psychological harm. This appears to be the first publicly reported instance of SafeWork NSW using a prohibition notice to stop a redundancy or restructuring process on psychosocial safety grounds.

Why does this matter for your church? Professor Tuckey’s research with a major Australian retailer showed that when organisations change the structures that govern people’s working lives — reporting lines, authority, who has a voice and who doesn’t — the way they manage that change determines whether people are protected or harmed. When the retailer redesigned its conditions through structured risk surveys, co-design with affected workers, and measured follow-up, bullying dropped. When institutions skip those steps, the process itself can become the source of harm. The UTS notice shows that regulators will now treat poorly managed change processes as psychosocial hazards in their own right. The proposed change to the PCNSW Code would alter who governs, who is heard, and who has formal standing in every congregation. That is exactly the kind of structural change that requires careful management. If it proceeds without adequate risk assessment, without genuine consultation, and without safeguards in place, the people most likely to bear the cost are the women who serve faithfully, the volunteers who raise concerns, and the children whose safety depends on governance that listens.

Who carries the legal duty? The PCNSW’s own WHS Guidelines state that each congregation is a PCBU, and that “much of this responsibility falls on the Committee of Management.” Officers of a PCBU must exercise “due diligence” to ensure compliance. The Guidelines are explicit: “It is not a defence to claim, ‘We can’t do anything about it’ or ‘We’ve been doing it this way for years.’” Under the Act, an officer is anyone who makes, or participates in making, decisions that affect the whole or a substantial part of the undertaking, which in the context of a congregation includes elders, ministers, and members of the committee of management. The decision on whether to change the Code rests with the members of the Assembly and, under the Barrier Act, with the members of presbyteries who must ratify it. But every officer at the local level has a duty of care for the safety of the people in their congregation during the consultation process that is happening now.

Questions worth sitting with

  • Does your session know about these legal obligations? Have they discussed them with your congregation?
  • Has anyone in your church been given the opportunity to raise issues about this proposed change that go beyond the preset questions in this survey?
  • If you have concerns that aren’t captured by the survey questions, how would you raise them?
  • Have you considered what governance, due process, and complaint pathways will need to be in place at the local level if this change proceeds, and what psychosocial risks those processes may create for ministers, elders, staff, and volunteers?
  • Have you thought about what psychosocial hazards might arise in your own church from the way this debate and survey are conducted — for example, conflict, exclusion, distress, loss of trust, or fear of speaking up?
  • Have you ensured that people in your church can participate in the survey with enough information to understand potential impacts, and with genuine freedom to answer honestly?
  • Before this change proceeds, have you, in your role as an elder or leader, been meaningfully consulted about the WHS implications you will be responsible for managing — and have you had the chance to raise your own concerns?

Section 2: Church Environment and Culture

Survey Q1: “If women are unable to serve as elders, do you feel that quality of leadership will be affected?”

Evidence: Both sides of the theological debate in the PCNSW acknowledge that governance bodies comprising men and women working together show stronger governance inputs — better monitoring, greater accountability, and more rigorous challenge. The Elders and Deacons Committee’s own 2021 report cited this evidence. Research on corporate and nonprofit boards finds that gender-diverse boards show stronger monitoring, better attendance, and greater accountability (Adams & Ferreira, 2009; Buse, Bernstein & Bilimoria, 2016). Irving Janis’s foundational work on groupthink identified homogeneity of membership as a structural fault that makes groups stop questioning themselves.

An important nuance: Adams and Ferreira found that gender diversity’s effect depends on existing governance quality. In organisations that were already well-governed — with strong accountability, active oversight, and robust challenge — adding diversity provided diminishing returns and could even reduce performance through over-monitoring. But in poorly-governed organisations, where scrutiny was weak and challenge was absent, gender diversity added significant value. An ANU review confirmed that the average effect across all organisations is close to zero — but the governance-specific benefits (improved scrutiny, stronger challenge, greater sensitivity to ethical risk) are robust. The question for the PCNSW is: are all our sessions in the “well-governed” category? The denomination has never measured this.

Questions worth sitting with

  • If your church has women elders: has there been a time when a woman elder brought a perspective or noticed something that might have been missed by an all-male group? What would be lost if that voice were no longer at the table?
  • If your session is already all-male: how are women’s perspectives currently represented in governance decisions? Is that arrangement formal or informal, and does it carry real weight? Could your congregation be strengthened if a gifted woman were able to serve as an elder in the future?
  • The research suggests that sessions which already consult women effectively, handle complaints well, and take safeguarding seriously may not need the structural safeguard of women elders to achieve good governance outcomes. But sessions where consultation is thin, complaints are handled informally, and the minister’s authority goes largely unchecked are exactly where diverse governance adds the most value. The proposed change applies to both. Which category is your session in — and how would you know?
  • The current Code preserves the freedom of each congregation to discern whether women should serve as elders. The proposed change removes that freedom for every church in the denomination. Even if your church has not exercised that freedom, is there value in having it?
  • Presbyterian governance was built on plurality and accountability — the conviction that unchecked authority corrupts. The Westminster divines established the ruling elder as a representative of the whole congregation. If ruling elders represent the congregation, and the congregation includes women, what does it mean for that representation when women cannot serve?

Survey Q2: “Do you believe this proposed change could impact the church’s culture?”

Evidence: The Australian Royal Commission into Institutional Responses to Child Sexual Abuse (2017) found that the absence or insufficient involvement of women in leadership and governance in religious institutions negatively affected decision-making and accountability, and may have contributed to inadequate responses to child sexual abuse. The Commission identified clericalism — the elevation of ordained or ministerial status above laity — as a significant contributing factor. Recommendation 16.37 called for child-safety advisory mechanisms that include lay men and women with relevant expertise.

Organisational climate research: Professor Maureen Dollard’s Psychosocial Safety Climate (PSC) framework, developed at the University of South Australia and examined in 38 studies internationally (Amoadu et al., 2024), demonstrates that the policies, practices, and procedures set by senior leadership determine the psychological safety of everyone underneath them. PSC is what researchers call the “cause of the causes”: when it is low, bullying, harassment, and burnout follow predictably. Governance changes that concentrate authority and reduce the diversity of voices at the top directly affect an organisation’s psychosocial safety climate. This is not a corporate abstraction. It describes the conditions that shape whether people in your church feel safe to speak, to raise concerns, and to trust that leadership will respond.

Experience of comparable churches: The Save the PCA “Functional Female Officer Report” (2025) surveyed 1,964 churches in the Presbyterian Church in America (not to be confused with the Presbyterian Church of Australia) — a denomination that has never ordained women as ruling elders — and found 9.5% had some form of functional female officer: 4.0% with women performing elder-like functions and 6.0% with women performing deacon-like functions. Practices included leading worship elements, sitting in session meetings as “elder advisors,” going through officer training, and being commissioned with public vows — all without formal office. Male-only ordination, in practice, can generate pressure toward informal workarounds that sit uneasily with the denomination’s own polity.

The Australian experience: Five of Australia’s six state Presbyterian churches moved to male-only eldership between 1984 and the early 2000s. Queensland was first in 1984; Victoria followed in 1997–98; Tasmania, Western Australia, and South Australia followed shortly after. NSW remains the sole holdout. None of these state churches conducted a baseline assessment before the change, measured impact during the transition, or evaluated outcomes afterward. Alternative participation mechanisms were created — women’s ministries committees, advisory groups, “fixed orders of the day” allowing women to comment but not vote at meetings — but none has been independently evaluated for effectiveness.

Our own denomination’s data: The Women’s Ministry Committee of the Presbyterian Church of Australia conducted a survey across the PCA in 2020 (484 respondents: 324 women, 162 men) and presented its findings to the 2023 GAA. The results came from our own churches. More than 50% of respondents could not affirm that women trust the elders with issues specifically related to women such as domestic violence and sexism. Fewer than 50% of women believed they were consulted by elders regarding church direction. More than 30% of women felt limited in what they could do in church. On every major question, there were statistically significant differences between how men and women perceived the same reality — a communication gap the WMPCA report itself identified as needing urgent attention.

Questions worth sitting with

  • The PCNSW has permitted women ruling elders since the 1967 GAA, though most congregations have not elected them. For churches that have: what would change in your culture if no new women could be ordained? For churches that haven’t: does the fact that the permission exists shape your culture in ways you might not notice until it is gone?
  • If the change proceeds, how would your church ensure that women’s voices continue to shape governance decisions? Would that mechanism be formal or informal? Would it carry voting power, or only advisory status? Is there a woman in your congregation who, under different circumstances, might be called to serve as an elder? What would it mean to her — and to your church — if that call could never be tested?
  • The Royal Commission examined what happens in religious institutions when governance becomes exclusively male. You may not think your church is at risk of those failures. But Reformed theology teaches us to build structures for the reality of sin, not the aspiration of godliness. What structures would your church need if this change proceeds?
  • Have you seen examples — in your own experience or in other churches — where the presence or absence of women in leadership affected the culture or decision-making of the community?

Survey Q3: “Do you believe the proposed change could have an impact on interpersonal relationships or team dynamics in your local church?”

Evidence: Research on volunteer organisations finds that when people perceive they have been excluded from decision-making or that their role has been diminished, their engagement drops and relationships within the team become strained (Allen & Mueller, 2013). This is not about theology — it is about how human beings respond when the terms of their participation change. In a local church, the proposed change would alter the relationship between women who serve and the governance structures they serve under. Women who have been elders, or who aspired to be, would need to find a new understanding of their place. Men who valued their female colleagues on session would lose those working relationships. The dynamics affect everyone, not just those directly excluded.

Experience of comparable churches: When the Christian Reformed Church opened all offices to women in 1995, 36 churches with approximately 7,500 members left the denomination. Governance changes on this question fracture communities in both directions. In churches where relationships are strong and consultation is genuine, the change may be absorbed without visible damage. But in churches where relationships are already strained, where women feel undervalued, or where the debate has been conducted without listening to the people most affected, the formal act of closing the door can crystallise tensions that were previously manageable. The proposed change applies uniformly to every congregation regardless of its relational health.

Questions worth sitting with

  • Think about the women in your congregation who currently serve in any form of leadership — as elders, as ministry workers, as volunteers who carry real responsibility. How would they experience this change? Have you asked them?
  • If your church has a woman elder: what is it like for her to serve on a session alongside colleagues who may have supported the overture to end her office? What does that dynamic do to the working relationship, the trust, and the ability to govern together during the transition?
  • Are there people in your church who might leave if this change is made? Are there people who might leave if it isn’t? What would either departure mean for your community?
  • Even in churches without women elders, the permission for women to serve sends a signal about how the church values women’s gifts. Removing that permission also sends a signal. How would the women and girls in your church receive it?
  • Paul writes in 1 Corinthians 12 that the body cannot say to any member, “I have no need of you.” How does your church currently say to its women, “We need you”? Would that change?

Section 3: Wellbeing and Safety

Survey Q1: “Do you believe the proposed change will impact on your emotional or psychological wellbeing?”

Evidence: Research on volunteers consistently finds that burnout and disengagement are driven by role ambiguity, lack of voice, and exclusion from decision-making (Allen & Mueller, 2013). Psychosocial safety research shows that harm from organisational change often emerges over time rather than immediately. This is relevant here: for some women, the impact of this change may not be felt at the point of announcement but later — as the change is experienced in practice, in session meetings, in pastoral care decisions, in the gradual realisation that a door has been permanently closed. People respond differently to institutional change; the survey is asking about your response, and it is worth being honest about what that might be over time, not just right now.

Questions worth sitting with

  • This question asks about your wellbeing. Take a moment to sit honestly with that. Not what you think the right answer is. Not what you think your church expects you to feel. What do you actually feel when you imagine this change taking effect?
  • If you are a woman: does this change affect your sense of belonging in your church? Your sense that your contribution is valued? Your willingness to serve?
  • If you are a woman currently serving as an elder: how does it feel to know that your denomination is considering a change that says no other woman should hold the role you hold? How will it affect your experience of serving in a role that your denomination has decided should not exist for anyone who comes after you? Has anyone asked you?
  • If you are a man: how do you think the women in your church would answer those questions? Have you asked them directly?
  • Genesis 3:16 describes a distortion of the relationship between men and women as a consequence of the Fall. Complementarian theologians including John Piper and Al Mohler acknowledge that male authority, post-Fall, tends toward sinful domination. If that’s true — and it’s in our own literature — what governance structures guard against that tendency? Does this change strengthen or weaken them?

Survey Q2: “Do you currently feel safe and supported in your local church environment?”

Evidence: The PCNSW’s Breaking the Silence (BTS) program has been in place since 1997 and is mandatory for elders, ministers, and those working with children or vulnerable people. BTS is more developed than many people realise. Foundations training runs on a three-year cycle with annual Read and Review refreshers for those working with children. The Conduct Protocol Unit (CPU) conducts compliance audits. The online training requires an 80% quiz score for satisfactory completion. This is real work done by serious people.

The harder question is not whether BTS exists, but what public evidence is available that it changes reporting behaviour and outcomes over time. The quiz is a knowledge check: it tests whether you know what to do, not whether you can do it. It does not assess the ability to actually handle a disclosure, manage a conflict of interest, or respond to a vulnerable person in distress. The system may work well; the training, audit, and complaint-handling machinery is real. But in 28 years, no public independent evaluation of BTS effectiveness or published outcome data has been identified, and that gap matters if we are about to remove another layer of structural accountability.

A 2025 peer-reviewed study in Child Abuse & Neglect (Hunt, Higgins & Willis) interviewed 20 Christian leaders across denominations about safeguarding training and found a compliance-focused mentality rather than genuine culture change, with cultural resistance from some leaders who see safeguarding as unnecessary external regulation.

What our own denomination’s data says: The Presbyterian Church of Australia’s 2020 Women’s Ministry Committee survey asked questions directly relevant to whether people feel safe and supported. More than 50% of respondents could not affirm that women who experience domestic violence know who to talk to about it within the church. More than 50% could not affirm that women trust the elders with issues specifically related to women. There were statistically significant differences between how female congregation members and male ministers answered the question of whether women trust elders with these issues. If that gap exists now, in a denomination where women can still serve as elders, what happens when that possibility is removed?

Questions worth sitting with

  • This is a baseline question. Answer it for how things are now, not how they might change. The honesty of this answer matters because it tells the Assembly what the starting point is.
  • If a woman in your church experienced harm from a male leader — bullying, harassment, spiritual abuse — where would she go? Who would she trust? Is that pathway clear, accessible, and safe?
  • Has your session ever discussed what would happen if someone made a complaint? Not in theory. Practically. Who handles it? What process is followed?
  • Do you know whether your elders and ministry workers have completed Breaking the Silence training? Could you find out?

Survey Q3: “Do you believe the proposed change could influence the amount of conflict, bullying, or harassment?”

Evidence: Bullying and harassment in organisations are not primarily caused by bad individuals. Professor Michelle Tuckey’s research at the University of Adelaide has shown that most of it traces to how an organisation is structured: its governance, its reporting lines, who holds authority, and who can challenge it. In plain terms, when your church changes who sits on session, who gets a vote, and who has formal standing, it is changing the very structures that research shows either prevent or produce harm. Tuckey’s BRIDGES at Work framework identifies formal organising arrangements as one of four subsystems that determine whether an organisation is safe. The proposed change directly alters those arrangements in every congregation.

What this looks like in churches: Research consistently shows that women in church settings already experience higher rates of burnout and lower confidence that leadership will respond to their concerns. Emerging evidence suggests women may be disproportionately affected when the organisational climate around psychological safety is poor (Amoadu et al., 2024). If the conditions are already difficult for women in churches, a governance change that reduces their formal voice does not make those conditions better.

The experience of comparable churches: These are not distant examples. The Royal Commission found that in some religious institutions — institutions led by people who believed they were serving God faithfully — the absence of women in governance negatively affected accountability in child-safety contexts. The Southern Baptist Convention’s Guidepost report (2022) documented over 700 accused abusers under exclusively male governance. The Presbyterian Church in America’s 2023 General Assembly — an all-male body — voted down all four abuse-prevention proposals put before it. The question is not whether our people would do the same. It is what structures we have in place to make sure they don’t have to.

Questions worth sitting with

  • Has your church experienced conflict, bullying, or harassment? If so, how was it handled? Would the proposed change make that handling better or worse?
  • The Reformed tradition teaches total depravity — not that every person is as bad as they could be, but that sin touches every part of human life, including the exercise of authority. We build structures — plurality of elders, higher courts, the right of appeal — because we know that good intentions are not a sufficient safeguard. Does this change add accountability or remove it?
  • The Royal Commission heard from more than 4,000 survivors of abuse in religious institutions. Its findings about male-only governance are not theoretical. They are based on evidence from institutions that believed they were serving God faithfully. How should those findings inform your answer?
  • Think about specific scenarios: a woman experiencing domestic violence, a child safety concern raised by a female volunteer, a complaint about a male leader’s conduct. Under the proposed arrangement, who hears these? Who decides what happens next?

Survey Q4: “Do you believe that this could impact the willingness of staff or volunteers to report issues such as conflict, bullying, or harassment?”

Evidence: The Presbyterian Church in America’s Ad Interim Study Committee on Domestic Abuse and Sexual Assault (2022) documented significant concerns about women’s ability to trust male-only elder bodies with issues like domestic violence and sexism. Advisory member Ann Maree Goudzwaard noted that substantial work remains before women in the American PCA can have assurance their case will be shepherded well. On consulting women, Murray Capill, Dean of Ministry Development at the Reformed Theological College in Melbourne, acknowledged that “from my experience and observation, lots of us don’t do that well”, and this in a denomination that already has male-only eldership.

Governance research: There is a reasonable concern, supported by governance and institutional abuse research, that when the people receiving complaints share the demographic profile of the people being complained about, reporting confidence is affected. When there is no one on the governance body who shares the complainant’s experience, trust may diminish. The PCA (Australia)’s own 2020 survey data — where more than half of women could not affirm trust in elders with women-specific issues — is consistent with this concern.

Questions worth sitting with

  • If you needed to report a concern about a male leader’s conduct, would you feel comfortable bringing it to an all-male session? Would the women in your church?
  • Is there currently a woman in your church’s governance who would be a first point of contact for women with concerns? If the proposed change proceeds, who fills that role? What standing would they have?
  • Craig Tucker’s paper notes that dealing with complaints by women against male leaders is “particularly problematic when the case is heard by a group in which only men get a vote.” Do you agree? Does your church have any mechanism to address this problem?
  • The survey itself directs people who are feeling impacted to “reach out to your Minister, Elders or leadership.” For some respondents, these are the very people they find it hardest to approach. Does your church have an alternative pathway?

Section 4: Support and Recommendations

Survey Q1: “What additional support or resources do you believe are needed to cultivate a safe and healthy environment if the proposed change is made?”

This is an open question, and it is your opportunity to name specific, concrete measures rather than general sentiments. The evidence suggests several areas that would need to be addressed if the change proceeds. You may find these helpful as starting points for your own answer:

From advisory models to demonstrated safeguards. Alternative models for women’s participation have been proposed and in some cases Assembly-endorsed. The GANSW 2022 resolution encouraged sessions to establish women’s advisory groups, appoint women to “co-worker” positions with associate-style privileges, and include women in complaints and disciplinary processes; it also appointed a Women’s Engagement Working Group. The national WMPCA deliverances (2023 GAA) went further with similar recommendations at presbytery and Assembly level. This is real work. But the mechanisms are advisory (“encourage each Session to consider”), not binding. Uptake has been slow: a growing number of churches have established women’s advisory groups, but implementation is uneven. No binding, denomination-wide, independently evaluated mechanism has been demonstrated as an effective safeguard alongside this proposed change. The model on offer is partial, and its effectiveness has not been tested. The experience of other denominations suggests that advisory and associate-style mechanisms, without formal standing, often do not deliver the voice they promise.

Independent evaluation of safeguarding training. BTS requires an 80% score on its knowledge quiz, and the CPU conducts compliance audits. But after 28 years, the denomination should also be able to demonstrate publicly that its training changes behaviour and outcomes. Competency-based assessment (testing the ability to handle a disclosure, not just know about one), independent external evaluation, and published outcome data would allow the denomination to know, in a measurable sense, whether its safeguarding system works.

An adequately independent complaints pathway. The PCNSW does have a Conduct Protocol Unit and a contact-person pathway intended to provide independence from the local church. The live question is whether that pathway is sufficiently independent from the denomination itself, sufficiently visible to women and volunteers, and sufficiently trusted — particularly for complaints about male leaders heard by all-male bodies.

Measurable implementation of Healthy Complementarianism. The GANSW 2022 resolution encouraged sessions to establish women’s advisory groups, appoint women to “co-worker” positions, and include women in complaints and disciplinary processes, and appointed a Women’s Engagement Working Group to bring further recommendations. At the national level, the Presbyterian Church of Australia’s Women’s Ministry Committee presented a detailed strategy to the 2023 GAA with 15 deliverances along similar lines. That is relevant context, though national deliverances do not bind the PCNSW. Dr Murray Smith, one of the architects of the overture, has acknowledged that if the church says ‘no’ to women in eldership without also saying ‘yes’ to all the ways men and women complement each other, it has only done half the job. Whether the GANSW’s own 2022 encouragements have been implemented at session level is the question the NSW Assembly should be able to answer before voting on the overture.

Learning from complementarian denominations that have tried. The Presbyterian Church in America’s DASA report (2022) is the most comprehensive safeguarding framework produced by any complementarian denomination for all-male governance. Its 220 pages cover abuse response, whistleblower protections, and mandatory background checks. But the report is non-binding, and efforts to implement even its background check recommendations have met resistance. Nowhere in the Reformed world has a denomination produced binding structural safeguards specifically designed for all-male governance. The gap between complementarian theology — which teaches that authority should protect the vulnerable — and complementarian institutional design remains vast.

Questions worth sitting with

  • Which of these measures does your church currently have in place? Which are absent?
  • Should any of these be required before the change takes effect, or are you comfortable with them being developed after?
  • Even if your church does not currently have women elders, the proposed change closes the door for every congregation, including those where women’s gifts in governance might be most needed — small country parishes, churches in vacancy, congregations where the work of eldership falls on very few shoulders. What support would those churches need?
  • The Reformers insisted on building structures before granting authority. Knox and Calvin did not consolidate power and then design accountability later. They built the accountability first. What does that principle suggest about the sequencing of this change?

Survey Q2: “Do you have any recommendations for your local church and denominational leaders regarding health and safety as it relates to diversity and inclusion?”

Questions worth sitting with

  • What would you need to see from your session and presbytery to have confidence that this change — if it proceeds — will not diminish the safety and wellbeing of women in your church?
  • Are there practical steps your church could take now, regardless of the outcome of this vote, to strengthen the voice and safety of women in your community?
  • What would “doing this well” look like, even if you support the change? What would “doing this badly” look like?
  • If your church has never had women elders, have you considered why? It may be because your session holds a theological conviction that eldership should be male-only. It may be because the congregation discerned it was not right for your context at this time. Or it may be because the question was never asked. Each of these is a different starting point, and the proposed change affects each differently — the first formalises an existing conviction, the second removes a future option, and the third closes a door that was never opened.

Section 5: Final Comments

The survey gives you an open field. If the structured questions did not capture what you most want to say, say it here. Some things worth considering:

Questions worth sitting with

  • Is there something about this issue that keeps you awake at night? Something you haven’t been able to say to anyone in your church? This is the place for it.
  • If you are uncertain about the theological question, you are allowed to say so. The survey is not asking you to resolve the exegesis. It is asking what happens to real people in real churches if this change is made.
  • If you support the change but have concerns about how it is being implemented, this is the place to name those concerns. Supporting a theological position and questioning the process are not contradictory.
  • If your church has never had women elders, the proposed change might feel like it changes nothing for you. But consider: is there a difference between a church that has chosen its current arrangement and a church that has had the choice removed? What does that difference mean for the kind of denomination we want to be?
  • If you have experienced harm in a church setting — from male authority exercised without accountability, from complaints that went nowhere, from being told your concerns didn’t matter — your experience is evidence. It matters. You are not required to share it, but if you choose to, it will inform the Assembly’s understanding of what is at stake.

Section 6: Contact and Demographic Information

The survey’s contact details (name and email) are optional. However, the demographic and identifying questions earlier in the survey — your church name, your role, your gender, whether your church has female elders — are not marked as optional.

The committee has stated that when presenting results to the Assembly, it will aggregate the data so that no individual responses can be identified by anyone outside the survey process, including by the Assembly itself.

It is still worth being aware, however, that in smaller congregations a combination of church name, role, and gender may be enough to make a respondent identifiable within the survey data itself — even without their name attached. If the survey is asking whether you trust the current governance structures to keep you safe, and your answers could be connected to you by those with access to the raw data, that may affect how freely you respond. This is not a reason not to complete the survey — your voice matters. But it is worth knowing.

The survey notes that “certain State or Federal legislation may require the denomination to share these details.” This is a standard data-handling disclosure. If you have concerns about confidentiality, you may choose not to provide your name or email while still completing the substantive sections.

If you are feeling particularly impacted by this issue, Jericho Road provides access to counselling services. You do not need to go through your minister or elders to access this.


A note on the consultation process

The Assembly has charged sessions with responsibility for engaging in this consultation properly. That charge should be taken seriously. If your session has not discussed this survey with your congregation, has not made time for conversation about its questions, or has not explained how your responses will be used, you may wish to raise that directly.

Consultation that meets the legal and moral standard our denomination has set for itself requires more than distributing a link. It requires making space for people to be heard, especially those who may find it hardest to speak. In some churches, that will mean actively reaching out to women, to newer members, to those who serve faithfully but have never been asked what they think about how the church is governed.

The survey closes on 25 May 2026: surveymonkey.com/r/WHSEldershipSurvey2026

References

Adams, R.B. & Ferreira, D., “Women in the Boardroom and Their Impact on Governance and Performance,” Journal of Financial Economics 94, no. 2 (2009): 291–309. doi.org
Allen, J.A. & Mueller, S.L., “The revolving door: A closer look at major factors in volunteers’ intention to quit,” Journal of Community Psychology 41, no. 2 (2013): 139–155. doi.org
Amoadu, M., Ansah, E.W. & Sarfo, J.O., “Preventing workplace mistreatment and improving workers’ mental health: A scoping review of the impact of psychosocial safety climate,” BMC Psychology 12, 195 (2024). pmc.ncbi.nlm.nih.gov
Australian Royal Commission into Institutional Responses to Child Sexual Abuse, Final Report, Volume 16: Religious Institutions (2017). childabuseroyalcommission.gov.au
Breaking the Silence, Foundations Training Workbook, 2024 Edition. breakingthesilence.org.au
Buse, K., Bernstein, R.S. & Bilimoria, D., “The Influence of Board Diversity on Nonprofit Governance Practices,” Journal of Business Ethics 133, no. 4 (2016): 179–191.
Catalyst, The Bottom Line: Connecting Corporate Performance and Gender Diversity. catalyst.org
Dollard, M.F. & Bakker, A.B., “Psychosocial safety climate as a precursor to conducive work environments,” Journal of Occupational and Organizational Psychology 83 (2010): 579–599.
Guidepost Solutions, Report of the Independent Investigation: The SBC Executive Committee’s Response to Sexual Abuse Allegations (2022). guidepostsolutions.com
Hunt, G.R., Higgins, D.J. & Willis, M.L., “‘Just tick the box and move on’: Australian Christian religious leaders reflect on safeguarding practices,” Child Abuse & Neglect 167 (2025). doi.org
Janis, I.L., Victims of Groupthink (1972). Houghton Mifflin.
Jensen, J., “The Functional Female Officer Report,” Save the PCA (2025). savethepca.com
Ling, V., 2023 Clergy Wellbeing Research Report. effectiveserving.com.au
Ling, V., “Evidence Base Research Map: Psychosocial Safety, Governance & Gender in the Presbyterian Church of NSW” (2026).
Nebbs, A., Psychosocial hazard management in regional volunteer involving organisations (2022). volunteeringstrategy.org.au
PCA (America) Ad Interim Study Committee on Domestic Abuse and Sexual Assault, DASA Report (2022). pcaga.org
PCNSW Women’s Ministry Committee, Assembly Resolution: Co-Heirs and Co-Workers (GANSW, July 2022). pcnswwomen.org.au
Smith, M. & Wright, F., Overture (xii): From the Special Committee on Elders and Deacons to amend The Code Part II 4.02(c) concerning male only elders (2026).
Tucker, C., Why Should We Stick With The Status Quo And Retain Female Elders? (Feb 2026). Scots Church Sydney.
Tuckey, M.R. et al., “Workplace bullying as an organizational problem,” Journal of Occupational Health Psychology 27, no. 6 (2022): 544–565. doi.org
Tuckey, M.R. et al., BRIDGES at Work (2025). bridgesatwork.au
Work Health and Safety Act 2011 (NSW). legislation.nsw.gov.au
Women’s Ministry Committee of the Presbyterian Church of Australia, On Men and Women in Ministry and Leadership in the PCA: A Report for the 2023 GAA (2023). wmpca.org.au
Yager, A., “Before We Change the Lock, We Should Build the Door,” andrewyager.com (12 April 2026). andrewyager.com


This document is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). You are free to share and adapt this material for any purpose, provided you give appropriate credit.

Before We Change the Lock, We Should Build the Door

The Presbyterian Church of NSW is consulting with its people about whether to remove women from the eldership. If you’re a member of a PCNSW church, you may have been pointed to a WHS survey asking you to assess the impact of the proposed change on wellbeing, culture, and safety. The survey is clear about what it’s doing: “The purpose of the survey is not to assess whether individuals agree or not with the proposed change but to assess the impact if all future elders are male.”

That’s the right question. And it’s a question the Assembly has asked through a Consultative Committee specifically established for this purpose. But outside the survey itself, I haven’t seen much public analysis of the evidence that bears on exactly these questions — the governance research, the Royal Commission findings, the experience of sister denominations, the state of our own safeguarding training. The survey asks whether the change could influence conflict, bullying, or harassment. It asks whether it could affect willingness to report. It asks what additional support would be needed. These are serious questions. They deserve serious answers grounded in evidence, not just feelings.

This post is my attempt to fill that gap by laying out the evidence that bears on the questions the survey is asking. I don’t claim the answer is settled. I argue that the risks are serious enough that the Assembly should not proceed on trust alone. I do this not because the theological arguments don’t matter (they do), but because a church that changes its governance structure without counting the cost isn’t being faithful. It’s being reckless.

I should say plainly: I sit on the status quo side of this debate. I think the current arrangement — male ministers, male and female ruling elders — has served the PCNSW well, and I’m not persuaded that the case for change has been made. But this post isn’t primarily about which side is right on the exegesis. It’s about something I think both sides should care about: the consultation is asking people to assess impacts on safety, wellbeing, and culture, but it provides no evidence, no framework, and no guidance for thinking about those questions. People are being asked to answer a governance and safeguarding question armed with nothing but instinct. That’s not good enough for a decision this consequential.

I’m not an elder. I don’t have standing in the Assembly. I’m a member and lay leader in the PCNSW — someone who sits in the pew, serves where I can, and cares about what happens to the people around me. I’ve spent time working on child safety policy in community organisations, and that work has shaped how I think about governance, authority, and what happens when institutions get the structures wrong.

This post is long. The issue deserves it. I’ll try to lay out both sides of the argument fairly before explaining why I think the proposed change — even if you accept its theological premises — is being pursued in the wrong order and without the safeguards our people need.

Not all the evidence below carries the same weight. Royal Commission findings and Assembly documents are primary sources. Peer-reviewed research is stronger than commentary. I use overseas denominational examples and advocacy material to illustrate implementation risks, not to predict NSW outcomes.

What’s actually being proposed

The overture from the Special Committee on Elders and Deacons would insert a single word — “man” — into the PCNSW Code §4.02(c), so that the qualification for eldership reads: “be a mature Christian man, who demonstrates exemplary Christ-like character as defined by the Scriptures (1 Tim. 3:1–7; Tit. 1:5–9).”

Women currently serving as elders would be grandfathered. No new women could be ordained. Every church court in the denomination (session, presbytery, and Assembly) would become exclusively male. NSW is the last Australian state that permits women ruling elders. The other states moved to male-only eldership between 1984 and the early 2000s, so the question isn’t hypothetical. We can look at what happened elsewhere.

This is not a debate about biblical authority

Both sides need to hear this, because both sides sometimes talk past it.

Craig Tucker’s paper, endorsed for wide distribution and discussion by more than thirty ministers and elders, puts it plainly: “The current debate in NSW is not a debate between those who believe in the authority of the Bible and those who do not.” Both sides affirm biblical authority. Both affirm complementarianism. Both agree that pastoral ministry should be restricted to men. Both agree that women may speak, argue from Scripture, and persuade at every level of church government.

The entire practical disagreement comes down to whether a woman may cast a vote in a session meeting.

That’s it. One vote in a committee. And yet this question has consumed years of denominational energy, generated thousands of pages of argument, and is now heading for a vote that will permanently reshape how the church governs itself.

The question everyone should be asking

The exegetical arguments on both sides are well-developed, and I’ll summarise them. But the question that determines everything is one of Presbyterian polity, not Greek vocabulary: which biblical texts define the ruling elder’s office?

The Smith/EDC position

Dr Murray Smith’s case for male-only eldership argues that PCNSW elders are biblical elders, defined by the overseer passages in 1 Timothy 3 and Titus 1. The PCNSW Code already cites those passages. The Code already describes elders using the specific language of those passages — “shepherds” under “the Chief Shepherd,” exercising “oversight and government,” “competent to teach.” If those texts define the office, and those texts require the officeholder to be male, then the conclusion follows.

This is the overture’s most effective argument, because it turns the Code against the status quo. Code §4.02(c) says elders must demonstrate character “as defined by the Scriptures (1 Tim. 3:1–7; Tit. 1:5–9).” If you’ve already legislated that those texts define your elders, you’ve already accepted the premises. The overture just draws the conclusion.

Smith also argues that 1 Timothy 2:12 — Paul’s prohibition on women “teaching or exercising authority over men” — anticipates the elder qualifications in chapter 3, tying the two passages together. The prohibition is grounded in creation order (1 Tim. 2:13–14), not cultural circumstances, and therefore applies permanently. The complementarian scholarly consensus behind this reading is strong — Andreas Köstenberger’s syntactical analysis of the Greek, Thomas Schreiner’s work on the creation-order grounding, Wayne Grudem’s systematic case, Douglas Moo’s definitive statement that the restrictions are “permanent, authoritative for the church in all times and places.” These aren’t fringe voices. They represent the mainstream of evangelical complementarian scholarship.

The “husband of one wife” requirement (1 Tim. 3:2; Tit. 1:6) adds weight. Paul uses the ordinary Greek words for “man” and “woman,” and applies the female equivalent — “wife of one husband” — to enrolled widows in 1 Timothy 5:9, confirming that the male phrasing is gender-specific rather than generic.

The Tucker position

Tucker’s response isn’t egalitarian. It’s a polity argument. He contends that the Westminster Assembly in 1645, when it wrote the Form of Presbyterial Church Government, deliberately grounded its “Other Church-Governors” (ruling elders) in different texts from its pastors — specifically Romans 12:8 and 1 Corinthians 12:28, which describe a gift of governance, not the overseer passages of 1 Timothy 3 and Titus 1.

George Gillespie, whose work shaped the Westminster Assembly’s thinking on this point, cited Titus 1:6–8 for the ruling elder’s character but deliberately left out verse 9 — the verse about teaching. He wasn’t being careless. He was making a distinction. The vote at the Assembly passed with only two dissenters, neither of whom wanted the overseer texts applied to governors.

Tucker documents that when the 1967 GAA opened eldership to women, the resolution explicitly cited this Westminster distinction: it “holds the doctrine of the Eldership as set forth in the Westminster Form of Presbyterian Church Government, under the heading ‘Other Church Governors’” and on that basis proposed that women could serve. The 1997 GAA confirmed this was a matter of government, not doctrine.

So when the overture is framed as “returning to our roots,” Tucker’s counter is that the roots being invoked aren’t Westminster’s 1645 settlement but Robert Dabney’s 1860 reinterpretation. Iain Murray, the respected Banner of Truth editor, acknowledged this: “My personal opinion is that the [Dabney Model] has often found acceptance among us because we assumed it was the position biblically established by the Westminster Assembly. The truth is that the assumption is wrong.”

There’s also a separate biblical case — not Tucker’s primary argument, but present in the “no change” paper circulating among members. It points to women exercising authority throughout Scripture: Deborah governing as prophet and judge (Judges 4–5), Huldah speaking God’s word to kings (2 Kings 22–23), Phoebe called diakonos — the same word Paul uses for himself (Romans 16:1), Junia called “outstanding among the apostles” (Romans 16:7), women as the first witnesses and proclaimers of the resurrection. The “husband of one wife” language, on this reading, is an idiom about marital fidelity, not gender exclusivity — just as “you shall not covet your neighbour’s wife” (Exodus 20:17) doesn’t condone women coveting their neighbour’s husband. I find some of these arguments more persuasive than others, but they’re made by people who take Scripture seriously, and dismissing them as liberal is lazy.

The Code catches both sides

Here’s what makes this genuinely hard. The Smith side has a strong textual argument from the Code as currently written. The Tucker side has a strong historical argument about legislative intent — what the Assembly meant when it wrote those words. The eight-year process (2015–2023) that produced the current Code language considered this question and chose not to insert “man.” That’s not an accident.

Both arguments have integrity. The question is what you weigh more: what the text says, or what the people who wrote it intended.

Even if Smith is right, the order is wrong

This is where I part company with the overture. Not on the exegesis — reasonable people disagree on that, and I respect the scholarship on both sides. But on the sequencing.

There’s a distinction worth naming first: most PCNSW congregations don’t currently have women elders. For many churches, the practical effect of the overture would be small. But there is a difference between a church that has chosen not to elect women elders and a church that is prohibited from doing so. The current Code preserves the freedom of each session to discern, under Scripture and the Spirit, whether women should serve. The proposed change removes that freedom permanently — not just for churches that have women elders now, but for every future congregation that might discern that call differently.

The overture proposes to make all church courts exclusively male without first establishing any concrete mechanism for women’s participation in governance. No alternative has been designed, consulted on, voted on, or tested. The Assembly is being asked to remove an existing structure and trust that something will replace it.

The experience of every other state that’s done this, and every comparable denomination overseas, says the replacement doesn’t come. Or it comes as what Lauren Riske called “shadow elders” in AP Magazine — women invited to attend but not vote, burdened with responsibility but stripped of standing. A two-tier system that satisfies nobody.

In 2020, the Women’s Ministry Committee of the Presbyterian Church of Australia — our own denomination — surveyed 484 people (324 women, 162 men) across the PCA and presented its findings to the 2023 General Assembly of Australia. More than 50% of all respondents — men and women — could not affirm that women trust the elders with issues specifically related to women such as domestic violence and sexism. Less than 50% of women believed they were consulted by elders regarding church direction. More than 30% of women felt limited in what they could do. On multiple questions, the gap between how female congregation members and male ministers perceived the same reality was statistically significant. This is not overseas data. It comes from our own churches, and it paints a picture of a communication gap that ministers and elders need to take seriously — the report’s own words.

Murray Capill, Dean of Ministry Development at the Reformed Theological College in Melbourne, acknowledged that “from my experience and observation, lots of us don’t do that well” at consulting women.

Overseas examples are suggestive, not dispositive. But the patterns are consistent enough to take seriously. The Save the PCA “Functional Female Officer Report” (2025) — published by a conservative advocacy group within the American Presbyterian Church in America, but documenting observable practices — found that numerous PCA (America) churches have created quasi-official workarounds: women leading worship elements, sitting in session meetings as “elder advisors,” going through officer training, being commissioned with public vows — all without formal office. Male-only eldership, in practice, generates constant pressure toward informal arrangements that satisfy nobody. The rules say one thing; the practice says another.

The Christian Reformed Church’s experience tells a different story. The CRC opened all offices to women on a local-option basis in 1995, and 36 complementarian churches with approximately 7,500 members left to form the United Reformed Churches in North America. The denomination never recovered those numbers. The lesson cuts both ways: this question is deeply divisive in either direction. But the PCNSW already has 35 years of settled practice. The disruption flows from changing that practice, not from maintaining it.

The Royal Commission said this, specifically

The Australian Royal Commission into Institutional Responses to Child Sexual Abuse (2013–2017) examined over 4,000 survivors’ accounts from religious institutions. Volume 16, devoted entirely to religious organisations, found — as a matter of evidence — that the absence or insufficient involvement of women in leadership and governance negatively affected decision-making, accountability, and may have contributed to inadequate institutional responses to child sexual abuse.

That’s not a recommendation. It’s a finding. Based on evidence. From the most significant institutional abuse inquiry in Australian history.

The Commission identified clericalism, the elevation of ordained or ministerial status above laity, as a significant contributing factor in some religious institutions. Recommendation 16.37 called for child-safety advisory mechanisms that include lay men and women with relevant expertise. The Commission received allegations relating to 106 cases in Presbyterian Church institutions across Australia. We are not exempt.

A church that votes to make all its governance bodies exclusively male in 2026 — nine years after that Final Report — cannot claim it didn’t know.

The governance research raises a serious caution

Tucker’s paper notes that “it is acknowledged on all sides of the debate that governance bodies comprising men and women working together make better decisions and achieve better outcomes.” The Elders and Deacons Committee’s own 2021 report cited the same evidence. The academic literature is more cautious than either side acknowledges — an ANU review found that most studies do not convincingly isolate a causal effect of board diversity on overall organisational performance. But the evidence for narrower effects — improved monitoring, better attendance, stronger scrutiny, and greater sensitivity to ethical risk — is robust. The most that can responsibly be said is that removing women from governance may carry governance costs, especially around challenge and accountability. The research raises a serious caution. It does not settle the polity question by itself.

Adams and Ferreira’s 2009 study in the Journal of Financial Economics found that female directors have better attendance records, that male directors attend more reliably when boards are gender-diverse, that women are more likely to join monitoring committees, and that CEO accountability increases measurably. An important nuance: they also found that the effect depends on existing governance quality. In well-governed organisations with strong accountability already in place, adding diversity provides diminishing returns. But in poorly-governed organisations — where scrutiny is weak and challenge is absent — gender diversity adds significant value. The question for the PCNSW is whether all our sessions are in the well-governed category. The denomination has never measured this.

Irving Janis’s foundational work on groupthink identified homogeneity of membership as a key antecedent condition — the very thing that makes groups stop questioning themselves. Session meetings are governance meetings. They make decisions about people’s lives — pastoral care, discipline, complaints, child safety.

The proponents of male-only eldership would argue, fairly, that if Scripture requires it, then pragmatic research can’t override a clear biblical command. I agree with the principle. But it loops back to the polity question: if the Westminster model is correct and ruling elders are grounded in different texts, then there is no clear biblical command to override the research. The governance evidence becomes a relevant consideration in an area where Scripture gives the church freedom to order its government wisely.

Compliance is not comprehension

Someone will point out that we have Breaking the Silence. They’re right. BTS has been in place since 1997 — one of the earliest church safeguarding programs in Australia. It’s mandatory for elders, ministers, and anyone working with children or vulnerable people. It covers child protection, domestic violence, abuse of authority, mandatory reporting. It’s been updated regularly, expanded to cover DFV in 2020, and mapped to the Royal Commission’s Child Safe Standards. The Conduct Protocol Unit is professionally staffed. This is real work done by serious people.

BTS is more developed than many people realise. Foundations training runs on a three-year cycle with annual Read and Review refreshers for those working directly with children and young people. The CPU conducts compliance audits across churches. The online training requires an 80% quiz score for satisfactory completion, and reattempts are permitted. An independent contact person option exists for reporting abuse via the CPU, providing a pathway that does not run through the local session. This is a real and serious safeguarding architecture.

The harder question is not whether BTS exists, but what public evidence is available that it changes reporting behaviour and outcomes over time. The quiz tests knowledge — whether you know the right answer — but not competency: whether you can handle an actual disclosure, manage a conflict of interest, or respond appropriately to a vulnerable person in distress. After 28 years of operation, I have not found a public independent evaluation of BTS effectiveness or published outcome data that would let anyone assess whether the training changes behaviour. The system may well work. But we can’t currently demonstrate that it does, and that gap matters more if we’re about to remove another layer of structural accountability.

A 2025 peer-reviewed study in Child Abuse & Neglect (Hunt, Higgins & Willis, Australian Catholic University) interviewed 20 Christian leaders across denominations about safeguarding training and titled their findings “Just tick the box and move on.” It’s a small qualitative study — not a definitive evaluation of any one program, and not specific to BTS — but the patterns it identified are worth taking seriously: compliance-focused mentality rather than genuine culture change, cultural resistance from leaders who see safeguarding as unnecessary external regulation, and outright hostility from some volunteers. Those patterns describe a risk that any church safeguarding program needs to guard against.

The Anglican Diocese of Sydney enforces a hard rule: if your safeguarding training lapses and isn’t renewed within 30 days, you step down from ministry entirely. The Catholic Church runs formal external audits against national safeguarding standards with registered auditors. We have neither enforcement mechanism.

The WHS survey asks directly: “Do you believe that this could impact the willingness of staff or volunteers to report issues such as conflict, bullying, or harassment?” There is a reasonable concern, supported by governance and institutional abuse research, that when the people receiving complaints share the demographic profile of the people being complained about, reporting confidence is affected. Our own denomination’s 2020 WMPCA survey is consistent with this concern — even men in the PCA couldn’t affirm that women trust elders with issues like domestic violence. That wasn’t just a women’s perception. It was a denomination-wide finding. We don’t need to guess what the risks might be. We have data from our own churches.

Genesis 3:16 — the text complementarians should take most seriously

This argument doesn’t require an egalitarian theology. It requires taking Genesis 3 seriously within the complementarian framework.

“Your desire shall be contrary to your husband, and he shall rule over you.” Both sides agree this describes a distortion — what sin does to the relationship between men and women, not what God designed.

John Piper, co-founder of the Council on Biblical Manhood and Womanhood, calls this a description of the curse. He identifies what he calls “the essence of corrupted maleness” as the self-aggrandising effort to subdue and control and exploit women. Calvin read it as introducing a harsher subjection than existed before the Fall. Susan Foh, whose 1975 interpretation in the Westminster Theological Journal most complementarians follow, concluded that the rule of love founded in paradise is replaced by struggle, tyranny, and domination.

Even CBMW’s Jonathan Leeman conceded that views of male headship can contribute to partner violence. Al Mohler said complementarian theology “can be, and it sometimes is” used to abuse women.

So here’s the question we need to sit with: if you believe Genesis 3:16 predicts that male authority, post-Fall, will tend toward sinful domination — and you do believe that, it’s in your own literature — then what governance structures guard against that predictable distortion? The answer can’t be “godly men will lead like Christ.” That’s the aspiration. The question is what happens when they don’t. Because sometimes they don’t. The SBC’s 700-name database of accused abusers says they don’t. The Presbyterian Church in America’s 2023 General Assembly, where an all-male body voted down all four abuse-prevention proposals, says they don’t.

Removing women from governance concentrates authority in exactly the group most susceptible to the corruption Genesis 3 describes, while eliminating the structural accountability that diverse governance provides. That should bother every complementarian in the room.

What would need to be true before this change could responsibly be made

I’m not saying “never.” I’m saying “not yet.” And I’m saying the burden of proof is on those proposing the change to show that these conditions are met:

An alternative mechanism for women’s participation in governance has been designed, consulted on, and voted on by the Assembly — and published before the vote on the overture, not promised afterward. Not promised. Not recommended. Voted on, with specifics. What is it? How does it work? What standing do women have? Can they merely advise, or do they have constitutive power? These aren’t details to work out later. They’re the whole question.

Breaking the Silence has been independently evaluated for effectiveness. BTS has an 80% knowledge quiz, compliance audits, and a contact-person pathway — but after 28 years, the denomination should also be able to demonstrate publicly that its training changes behaviour and outcomes. Competency-based assessment, independent external evaluation, and published outcome data would allow the denomination to know — in a measurable sense — whether its safeguarding system works. If it does, prove it. If it doesn’t, fix it before you remove the structural safeguard that partly compensates for the gap.

An independent complaints mechanism, accessible to women and not solely administered by male office-bearers, has been established and resourced. The CPU and contact-person pathway already provide some independence from the local session. The live question is whether that pathway is sufficiently independent from the denomination itself, sufficiently visible to women and volunteers, and sufficiently trusted — particularly for complaints about male leaders heard by all-male bodies. Dealing with complaints by women against male leaders is, as Tucker’s paper notes, “particularly problematic when the case is heard by a group in which only men get a vote.”

The church has mapped which pathway handles which class of concern. The WHS survey asks about conflict, bullying, harassment, and willingness to report. BTS is designed for abuse, abuse of authority, and child safety — and it does that work seriously. But not every concern the survey names is a BTS matter. If the Assembly is asking members to assess broader risks, it should be able to show where a complaint about session conflict goes, where a bullying concern about a ministry leader is heard, and what independent options exist in each case. These pathways need to exist, be published, and be known — not assumed.

The Healthy Complementarianism resolutions adopted by the GAA have been implemented at session level across PCNSW presbyteries, with measurable outcomes reported annually to the Assembly. Dr Smith’s own framing: if we say ‘no’ to women in eldership without also saying ‘yes’ to all the ways men and women complement each other, we’ve only done half the job. The Other Cheek observed that the Healthy Complementarianism paper “is clear on how to change the church code to bring about male elders, but there are no worked-out solutions for the ideas to support women.” The ‘yes’ doesn’t exist yet. Build it before you legislate the ‘no.’

The deeper irony

Presbyterian governance was designed, from its inception, to prevent the concentration of power. Calvin insisted on lay participation alongside clergy. Knox built Scottish polity around elder parity and appeals to higher courts. The whole system exists because the Reformers understood — on the basis of total depravity — that unchecked authority corrupts.

The ruling elder exists as a representative of the congregation. Not an extension of the minister’s authority. A representative. If ruling elders represent the congregation, and the congregation includes women, then a governance body that systematically excludes women has a representational deficit built into its foundation.

Paul’s argument in 1 Corinthians 12 insists that the body cannot say to any member, “I have no need of you.” The ear cannot be told it has no place because it is not an eye.

The proposed change tells more than half the body of Christ that their voice has no formal standing in the governance of their own church. Not because they lack character, competence, or calling. Because of their sex. That should trouble us — not because complementarianism is wrong, but because this particular expression of it contradicts the very logic of the governance system we claim to value.

The evidence in this post has limits. Some of it is institutional and direct; some is comparative and indirect; some comes from advocacy sources I’ve used only as illustration. I’m not claiming the future can be read off a single dataset. I’m arguing that the risks are credible enough, and the safeguards underdeveloped enough, that the Assembly should not make this change on trust.

Where I land

I don’t think the exegetical question is as settled as either side claims. The meaning of authentein in 1 Timothy 2:12 is genuinely contested among serious scholars. The relationship between the overseer passages and the ruling elder’s office has been debated within Presbyterianism for centuries. Whether 1 Timothy 5:17’s malista identifies a subgroup of teaching elders or refers to all elders is unresolved. These are hard questions with honest disagreement on all sides.

But even if I’m wrong about the exegesis, I’m not wrong about the sequencing. You don’t remove a structural safeguard before you’ve built the replacement. You don’t tell women they’ll be looked after by the same system that, in denomination after denomination, has demonstrably failed to look after them. You don’t vote to concentrate authority in the hands of one sex while knowing what the Royal Commission found about what that does.

If we must change the lock, let’s first build the door. The people who will bear the cost of getting this wrong are the ones with the least power to do anything about it.

I’ve talked to women in our churches who accept complementarianism and still feel afraid of this change. Not because they think it’s theologically wrong — they’re genuinely uncertain about that — but because they’ve seen what happens when institutions consolidate male authority without structural accountability. Some of them have experienced it personally. When they hear “godly men will lead like Christ,” they hear a promise that has been broken before. Not by bad theology, but by bad men operating within good theology. And they wonder who will notice, and who will act, when the governance table has no one who shares their experience sitting at it.

That’s not an argument against complementarianism. It’s an argument for taking it seriously enough to do it carefully.

The new telco DFV standard protects customers. Who protects the workers?

Australia’s Telecommunications DFSV Standard 2025 is a landmark for victim-survivors. But it has a blind spot — and it’s the people answering the phones.

On 1 July 2025, Australia’s first enforceable telecommunications standard for domestic, family and sexual violence (DFSV) came into force. The Telecommunications (Domestic, Family and Sexual Violence Consumer Protections) Industry Standard 2025, made by the ACMA under Part 6 of the Telecommunications Act 1997, replaced a decade of voluntary guidelines with binding obligations. Full compliance was required by 1 January 2026 for large providers, and by 1 April 2026 for small providers.

It is, without question, a significant step forward for consumers experiencing violence. Telcos must now urgently reverse disconnections where there’s a safety risk, provide specialist DFV-trained staff, and offer new accounts not linked to a perpetrator. Critically, they must not require evidence of abuse or force a victim-survivor to engage with their abuser for account changes or debt resolution.

This is good regulation. I support it.

I’m not writing this from the sidelines. RWTS was an initial signatory to the telecommunications industry DFV pledge, and we’ve been working on DFV-aware practices internally for three years — training our team, building processes, and thinking seriously about what it means for a technology services provider to handle these situations well.

This issue came into sharp focus for me yesterday during a discussion with the Internet Association of Australia’s Public Policy Advisory Panel, where we were examining ACMA enforcement priorities for the new standard. As the conversation progressed, I kept coming back to a question that nobody in the room had a good answer for.

We’ve spent years building protections for the people on one side of the phone call. But there is a gap in this standard that nobody appears to be talking about — and I believe it represents a real and quantifiable risk to the people on the other side.

The numbers we can’t ignore

In April 2023, the Australian Child Maltreatment Study (ACMS), Australia’s first nationally representative prevalence study, was published in the Medical Journal of Australia. It surveyed 8,503 Australians aged 16+ and found that 62.2% of Australians experienced at least one form of maltreatment during childhood.

Not a small minority. Not a marginal statistic. The majority.

The breakdown is confronting:

  • 39.6% were exposed to domestic violence as children
  • 32.0% experienced physical abuse
  • 30.9% experienced emotional abuse
  • 28.5% experienced sexual abuse
  • 39.4% experienced multiple types of maltreatment

Separately, the ABS 2021–22 Personal Safety Survey confirmed that 1 in 7 Australians (14.1%) experienced childhood physical and/or sexual abuse — the narrower but equally authoritative measure covering abuse by adults before age 15.

Among those with maltreatment histories, 48% met criteria for at least one mental disorder, compared with 21.6% of those without. A 2024 University of Sydney analysis attributed up to 40% of Australia’s mental health burden to childhood maltreatment.

The workforce the standard forgot

The DFSV Standard mandates two tiers of training: general DFV awareness for all personnel (Section 21), and specialised training for customer-facing staff (Section 22) covering the nature of DFV, intersectionality, and how to identify and engage with affected customers.

There is exactly one reference to worker safety in the entire standard — a requirement that training cover “recognising and prioritising the safety of personnel engaging with perpetrators.”

What’s missing is everything else:

  • No mandatory employee counselling or debriefing tied to DFV disclosure handling
  • No clinical supervision or peer support structures
  • No workload management to limit DFV case exposure per worker
  • No re-traumatisation risk framework for staff
  • No requirement for Employee Assistance Program access specifically linked to this work

The standard requires “trauma-informed” policies and training — but this framing is exclusively consumer-directed. The same principle is not extended to the workers delivering the support.

When the person answering the call is also a survivor

Here’s the maths that should concern every telco executive and every regulator.

If 62% of Australians experienced childhood maltreatment, and 39.6% were specifically exposed to domestic violence as children, then in a team of 10 frontline telco staff that mirrors the general population you would expect, on average, about 6 to carry some form of childhood trauma and about 4 to have been exposed to the very type of violence they are now required to respond to professionally.

This isn’t speculation. Research on DFV advocates (Slattery & Goodman, 2009) found that 55% were themselves survivors of past abuse, and that survivor status was the only individual factor that significantly predicted secondary traumatic stress. Without organisational support structures, every DFV disclosure a telco worker receives becomes a potential re-traumatisation event.
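
For anyone who wants to check the team-of-10 arithmetic, here is a rough sketch in Python. It is an illustration, not a prediction: it assumes each team member is an independent draw from the general population at the ACMS rates quoted earlier, which a real telco team may not be. The function names and the team size of 10 are illustrative choices of mine, not anything taken from the standard or the studies.

```python
# Back-of-envelope sketch only.
# ASSUMPTION: each team member is an independent draw from the general
# population, at the ACMS prevalence rates quoted in the text. Real teams
# are not random samples, so treat the results as rough guides.

ACMS_ANY_MALTREATMENT = 0.622  # any childhood maltreatment (ACMS)
ACMS_EXPOSED_TO_DV = 0.396     # exposed to domestic violence as a child (ACMS)


def expected_count(team_size: int, prevalence: float) -> float:
    """Expected number of affected team members: n * p."""
    return team_size * prevalence


def prob_at_least_one(team_size: int, prevalence: float) -> float:
    """Chance that at least one team member is affected: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - prevalence) ** team_size


if __name__ == "__main__":
    team = 10  # hypothetical frontline team size, as used in the text
    rates = [
        ("any childhood maltreatment", ACMS_ANY_MALTREATMENT),
        ("exposed to DV as a child", ACMS_EXPOSED_TO_DV),
    ]
    for label, p in rates:
        print(
            f"{label}: expected {expected_count(team, p):.1f} of {team}, "
            f"P(at least one) = {prob_at_least_one(team, p):.3f}"
        )
```

Under those assumptions the expected counts round to about 6 and 4 in a team of 10, and the chance that nobody on the team is affected is very small. That second number is the one that matters for planning: a provider should assume that someone on every frontline team is affected, not hope for a team where nobody is.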

The neurodivergent dimension

There is a compounding factor that makes this even more acute for the technology and telecommunications sector specifically.

Two WHO-commissioned Lancet meta-analyses have established that children with disabilities experience 2–4 times the rate of abuse compared to non-disabled peers. Condition-specific research deepens this: a 2023 meta-analysis found 44% of autistic individuals reported victimisation, while research on ADHD found 45.6% had experienced maltreatment (OR 2.39). Adults with dyslexia reported childhood physical abuse at five times the rate of those without.

The tech and telco workforce has disproportionate neurodivergent representation. Australia’s TechDiversity Foundation “Tech Reflects” 2024–25 study found 12.4% of participants self-identified as neurodivergent — likely an undercount, given that international surveys show up to 50% self-identification when asked directly, with 57% having never disclosed at work. Simon Baron-Cohen’s Cambridge research on 450,000+ participants found STEM workers score significantly higher on measures of autistic traits.

The intersection is clear: a workforce with elevated neurodivergent representation, where neurodivergent individuals face 2–4× the baseline abuse risk, is now required to handle DFV disclosures — without any mandated psychological safety framework.

The research that doesn’t exist

Here is perhaps the most concerning finding from my research into this topic:

No peer-reviewed research exists — in Australia or internationally — specifically studying vicarious or secondary trauma among telecommunications workers handling DFV cases.

The same gap applies to energy, water, and banking workers, despite all these sectors now having DFV-specific regulatory obligations. The vicarious trauma literature overwhelmingly focuses on clinical, social work, and emergency service settings.

What does exist points clearly to the risk. The Law Society of NSW explicitly identifies call centre staff as a population vulnerable to vicarious trauma. Australian call centre workers take 10–15 days of sick leave annually compared to a national average of 6–7 days. Safe Work Australia reports that psychological injury claims have increased 80% over five years.

A 2023 study published in PMC warned that even trauma-informed practice training itself can re-traumatise attendees when it includes detailed survivor accounts.

The WHS obligation already exists — it just hasn’t been connected

Safe Work Australia’s Model Code of Practice: Managing Psychosocial Hazards at Work (2022) creates obligations under WHS legislation that arguably already cover the psychosocial risks for telco workers handling DFV disclosures. Exposure to traumatic content, emotionally demanding work, and inadequate support are all recognised psychosocial hazards.

But no regulator — not the ACMA, not Safe Work Australia, not any state WHS authority — has explicitly connected these obligations to the new reality of mandatory DFV disclosure handling in Australia’s telco workforce.

What I’m calling for

To be clear: I am not calling for the standard to be weakened. I am calling for it to be completed.

But I want to be equally clear about where the responsibility for completing it should sit. This is not a problem for small telcos to solve on their own.

The government created this obligation. The ACMA enforces it. The standard applies to every carrier and carriage service provider in the country — from Telstra down to a regional ISP with a handful of staff. The compliance burden is already significant, particularly for smaller providers who don’t have dedicated DFV teams, in-house psychologists, or the scale to absorb the cost of specialist training and support infrastructure.

You cannot mandate that an entire industry’s workforce absorb trauma disclosures as a regulatory obligation and then leave it to individual businesses — many of them small — to figure out the psychological safety implications on their own. That is a transfer of public health risk onto private employers without the resources to manage it. And the data tells us this isn’t a niche risk — it’s a majority-of-your-workforce risk.

The Australian Government committed $22.4 million to fund ACMS Wave 2, which will for the first time include disability and neurodivergence breakdowns. That investment signals the government understands this is a public health issue. The workforce side of the equation deserves the same recognition.

Specifically, I’m calling on the Australian Government to:

  1. Fund the development of a workforce psychological safety framework for the DFSV Standard — not as an unfunded add-on for providers to build themselves, but as a government-resourced initiative developed with DFV experts, WHS specialists, and the telco industry. This should include model policies for counselling access, debriefing protocols, workload management, and re-traumatisation risk assessment that providers of any size can adopt.
  2. Commission and fund research through the NHMRC or ARC into vicarious and secondary trauma among non-clinical workers in telco, energy, banking and other sectors now handling DFV disclosures as a regulatory obligation. This is a gap in the evidence base that the government created by extending DFV obligations to these workforces — the government should fund closing it.
  3. Direct Safe Work Australia to issue explicit guidance connecting existing psychosocial hazard obligations under WHS legislation to DFV disclosure handling in regulated industries. The legal framework already exists; it just hasn’t been connected to this new class of occupational exposure.
  4. Establish a funded support program — potentially through the Telco Together Foundation or a similar vehicle — that provides subsidised access to trauma-informed supervision, EAP services, and peer support networks for smaller providers who cannot build these capabilities in-house. The energy sector’s approach to hardship programs offers a model: industry-wide infrastructure funded collectively, not left to individual small businesses.
  5. Require the ACMS Wave 2 study to include occupational exposure analysis — specifically examining whether workers in DFV-disclosure-handling roles experience different psychological outcomes, and whether neurodivergent workers in these roles face compounded risk.

This is a public health issue that the government has, rightly, chosen to address through regulation. But regulation without resourcing is just risk transfer. If we’re serious about protecting victim-survivors — and we should be — we need to be equally serious about not creating a new class of psychological casualties in the workforce tasked with delivering that protection.

An invitation

I would welcome engagement from researchers, regulators, unions, DFV advocacy organisations, and telco operators on this issue. If research exists that I’ve missed, I want to know about it. If organisations are already addressing this gap internally, I want to hear how. And if you’re in government and this has crossed your desk — I’d genuinely welcome the conversation about how to get this right.

The standard asks telco workers to hold space for some of the most traumatic disclosures a person can make. The least the government can do is resource the industry to hold space for them in return.



Why Your AI Coding Agent Forgets What It’s Building (And What We’re Doing About It)

If you’ve spent any serious time building software with an AI coding agent — Claude Code, Cursor, Copilot Workspace, Kiro, or any of the others — you’ve probably noticed something uncomfortable. The agent starts brilliantly. It reads your specification, creates a thoughtful work plan, and begins implementing with real understanding. Then, somewhere around the 30-minute mark, things quietly fall apart.

Requirements get simplified. Edge cases vanish. Features appear that nobody asked for. And the agent, if you ask it, will cheerfully tell you everything is on track.

At Real World, we’ve been researching this for around 18 months — and what we’ve found goes well beyond “just give it a bigger context window.”

The problem has a name now

The research community has converged on a term: specification drift. It refers to the progressive loss of connection between an AI agent’s output and the requirements it was given. It’s a specific manifestation of a broader phenomenon called context rot, in which LLM performance degrades as input volume grows, even when every token in the context is relevant.

This isn’t a niche concern. Laban et al. measured an average 39% performance drop in multi-turn versus single-turn LLM interactions across 200,000 simulated conversations. Du et al. found performance degrades up to 85% as input length increases, even when the model can perfectly retrieve all relevant information. Liu et al.’s foundational “Lost in the Middle” study showed that LLMs attend reliably to the beginning and end of their context but degrade significantly for information in the middle, which is exactly where your specification ends up once generated code starts accumulating.

Here’s how it plays out in practice:

The agent reads your spec, builds a plan, and starts implementing. As the conversation grows, the context window fills up and compaction kicks in — the system automatically summarises older content to make room. During compaction, specific requirements, section references, and constraints are lost. The agent continues implementing from a degraded recollection of what was asked for, not the actual specification. Gaps are discovered late. Expensive rework follows. And the cycle repeats.

We observed this consistently across production projects. Gap analyses at the end of implementation sessions repeatedly revealed requirements the agent had marked as complete but that were actually missing key behaviours. The specification was still “in context” but effectively invisible.

What everyone else is doing (and why it’s not enough)

2025 saw spec-driven development emerge as a recognised practice. Thoughtworks called it one of the year’s key new engineering practices. GitHub released Spec Kit. Amazon launched Kiro with a built-in spec-to-code pipeline. JetBrains Junie adopted spec-driven workflows. The basic idea — write a specification before you write code — is sound, and it’s a meaningful step forward from “vibe coding.”

But every one of these tools shares a fundamental limitation: they rely on soft guardrails. They tell the agent “do not write code yet” through prompt-level instructions, and hope the agent listens.

This isn’t a prompting failure. It’s a structural one. The AgentIF benchmark showed that even the best-performing models follow fewer than 30% of agentic instructions perfectly. Research from Anthropic (Denison et al., 2024) demonstrated that LLMs generalise from simple specification gaming to sophisticated reward tampering. Telling an AI agent “please follow the specification” is roughly as effective as telling a developer “please write tests.” The intent is right. The enforcement mechanism is missing.

What we’ve been building

We’ve been approaching this from both ends of the software development lifecycle. On the specification side, we’ve been developing structured approaches to writing specs that survive AI context management. These documents are designed from the ground up to be consumed by agents, not just humans. On the implementation side, we’ve built /implement, a publicly available Claude Code skill that enforces specification discipline through structural mechanisms rather than polite suggestions.

The specification work isn’t something we’re releasing as a product — every team’s specification process is different, and we’d encourage you to develop your own. But the principles we’ve discovered apply regardless of how you write your specs. The /implement skill embodies the implementation side and is freely available.

Here’s what we learned building both.

Hard guardrails that survive context loss

The first thing we tried was embedding hard rules in the skill’s instructions — not suggestions, but absolute prohibitions. “Do not produce code during the specification phase.” “Do not mark a requirement as complete without running tests.”

This worked great until context compaction occurred. Then the rules got compressed away along with everything else. The agent lost awareness of its own constraints.

The fix required pairing every hard guardrail with a persistent recovery mechanism — a tracker file on disk that serves as the authoritative source of workflow state. After compaction, the agent reads the tracker, re-establishes where it is in the process, and the guardrails reload. The tracker isn’t just a progress log. It’s a recovery mechanism that embeds its own instructions: “If you’re reading this, here’s what this file means, here’s where we’re up to, and here’s what to do next.”

Neither component is sufficient alone. Hard rules without recovery fail at compaction boundaries. Recovery without hard rules fails under context pressure. The pairing is what works.
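
As a rough illustration of the pairing described above, here is a minimal sketch of a tracker file that carries its own recovery instructions. The file name, JSON layout, and field names are assumptions made for this example; the post does not describe the actual format /implement uses.

```python
import json
from pathlib import Path

# Hypothetical file name and fields; the real tracker format is not specified in the post.
TRACKER_PATH = Path("implementation-tracker.json")

RECOVERY_NOTE = (
    "If you are reading this after context compaction: this file is the "
    "authoritative record of workflow state. Re-read the hard rules below, "
    "then resume from 'next_step'."
)

# The two example rules quoted earlier in this section.
HARD_RULES = [
    "Do not produce code during the specification phase.",
    "Do not mark a requirement as complete without running tests.",
]


def save_tracker(path, phase, completed, next_step):
    """Write the workflow state, plus its own recovery instructions, to disk."""
    state = {
        "recovery_note": RECOVERY_NOTE,
        "hard_rules": HARD_RULES,
        "phase": phase,
        "completed": completed,
        "next_step": next_step,
    }
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def recover(path):
    """Rebuild the text an agent would re-read after losing its context."""
    state = json.loads(path.read_text(encoding="utf-8"))
    lines = [state["recovery_note"], "Hard rules:"]
    lines += [f"- {rule}" for rule in state["hard_rules"]]
    lines.append(f"Current phase: {state['phase']}")
    lines.append(f"Completed: {', '.join(state['completed']) or 'nothing yet'}")
    lines.append(f"Next step: {state['next_step']}")
    return "\n".join(lines)
```

Calling save_tracker(TRACKER_PATH, ...) after every completed task and recover(TRACKER_PATH) after compaction would give the agent both its position and its rules back from disk, which is the point of storing them together.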

Structural indexing: let the orchestrator navigate, not comprehend

When you ask an agent to work with a large specification, the naive approach is to load the whole thing into the conversation. A 130,000-token spec consumes most of the available context window, leaving almost nothing for reasoning.

We developed a pattern we call structural indexing: the main conversation loads only a lightweight index of section identifiers and file sizes. Sub-agents, dispatched to work on specific sections, read the full content directly from disk. The main conversation’s job is navigation and dispatch, not comprehension.

The result was dramatic — context consumption dropped by over 98% with no measurable quality degradation. The insight was architectural: the orchestrating conversation doesn’t need to understand the specification. It needs to know where things are and which agent should read what. Comprehension happens at the edges, in fresh sub-agent contexts with full attention on their assigned work.

This principle turned out to be universal. Every time the main conversation tried to hold large volumes of content — specifications coming in, agent outputs coming back, planning artefacts, even the skill’s own instructions — the same failure mode appeared. The solution was always the same: keep the orchestrator lightweight and let it coordinate, not consume.
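
To make the idea concrete, here is a small sketch of what a structural index could look like. It assumes, purely for illustration, that the specification has been split into one Markdown file per section in a single directory; the function and field names are hypothetical and not taken from /implement.

```python
from pathlib import Path


def build_index(spec_dir):
    """Return one lightweight entry per section file: identifier, path, size."""
    index = []
    for path in sorted(Path(spec_dir).glob("*.md")):
        index.append({
            "section_id": path.stem,          # e.g. "01-auth" for 01-auth.md
            "path": str(path),
            "size_bytes": path.stat().st_size,
        })
    return index


def render_index(index):
    """Render the index as the short text the orchestrator keeps in context."""
    return "\n".join(
        f"{entry['section_id']}\t{entry['size_bytes']} bytes" for entry in index
    )


def dispatch_prompt(entry):
    """Build a sub-agent instruction that points at the file instead of inlining it."""
    return (
        f"Read the specification section at {entry['path']} directly from disk "
        f"and implement only section {entry['section_id']}."
    )
```

The orchestrator only ever sees the output of render_index, a few dozen characters per section, while each sub-agent receives a dispatch_prompt and reads the full section in its own fresh context.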

Anti-rationalisation: the finding nobody expected

This was the genuinely surprising discovery. We found that LLMs don’t just forget workflow steps — they actively construct locally coherent justifications for skipping them.

An agent assigned to update a tracker file after completing a task would reason: “I know the state from the current conversation, so updating the tracker is redundant right now.” An agent told to write a plan to disk before executing would argue: “The next plan is obvious from context, so I’ll save time by executing directly.” Each justification is individually reasonable-looking. Each one silently breaks the recovery mechanism.

We call these anti-rationalisation failures, and they’re distinct from the model ignoring instructions or adversarial jailbreaking. The model convinces itself, through plausible reasoning, that a required step doesn’t apply right now.

The countermeasure is surprisingly specific: you have to name the exact excuses you want to prohibit. A general rule (“always update the tracker”) can be rationalised around. A rule that says “you must not skip this step, and specifically, these justifications are not valid: ‘I know the state from the current conversation,’ ‘this is the same session,’ ‘the next plan is obvious’” — that holds. The named excuses don’t recur.

This has broader implications for anyone designing AI workflows. General rules invite creative interpretation. Specific prohibitions close the rationalisation loop.
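
One way to keep such rules consistent is to generate them from a list of named excuses. The helper below is a hypothetical sketch, not part of /implement; the step wording is invented for the example, and the three excuses are the ones quoted above.

```python
def named_prohibition(step, invalid_excuses):
    """Render a rule that names the specific justifications it rules out."""
    lines = [
        f"You must not skip this step: {step}.",
        "Specifically, these justifications are not valid:",
    ]
    lines += [f'- "{excuse}"' for excuse in invalid_excuses]
    return "\n".join(lines)


# Example rule for the tracker update step, using the excuses quoted in this section.
TRACKER_RULE = named_prohibition(
    "update the tracker file after completing a task",
    [
        "I know the state from the current conversation",
        "this is the same session",
        "the next plan is obvious",
    ],
)
```

Keeping the excuses in a list makes it cheap to append a new one each time a fresh rationalisation shows up in a session transcript.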

Context isolation as a verification advantage

Here’s a reframing we’re particularly proud of. The standard view of LLM context boundaries is that they’re a limitation — agents can’t see each other’s work, so coordination is hard. We found that for verification, isolation is a feature.

In our TDD workflow, the test-writing agent reads only the specification. It never sees the implementation. The implementation agent works from a different context entirely. When their independent interpretations of the specification disagree, that disagreement surfaces ambiguities and catches drift early — before it compounds.

This is conceptually similar to N-version programming (Avizienis, 1985), where multiple independent teams develop from the same specification. The SAGA study validated the principle for LLMs specifically, finding that LLM-generated test suites have systematic blind spots mirroring the generating model’s error patterns. If your test agent sees the implementation, it inherits the implementation’s blind spots.

The Agile Manifesto 25th Anniversary Workshop concluded that test-driven development produces dramatically better results from AI coding agents by preventing them from writing tests that verify broken behaviour. We arrived at the same conclusion independently.
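
The isolation described in this section can be expressed as an explicit check on what each agent is allowed to read. This is a minimal sketch under assumed names: AgentContext, the role labels, and the file paths in the usage note are all hypothetical, and a real orchestrator would enforce the boundary when it dispatches sub-agents rather than in a helper like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentContext:
    role: str
    visible_files: tuple  # paths this agent is allowed to read


def build_contexts(spec_files, implementation_files):
    """Give the test writer the specification only; the implementer gets both."""
    test_writer = AgentContext("test-writer", tuple(spec_files))
    implementer = AgentContext(
        "implementer", tuple(spec_files) + tuple(implementation_files)
    )
    return test_writer, implementer


def leaks_implementation(context, implementation_files):
    """True if a context can see any implementation file."""
    return any(path in context.visible_files for path in implementation_files)
```

For example, build_contexts(["spec/auth.md"], ["src/auth.py"]) yields a test-writer context for which leaks_implementation returns False, which is the property that keeps the tests from inheriting the implementation’s blind spots.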

Multi-skill interference: the next frontier

As the skill ecosystem around AI coding agents grows, a new problem is emerging. When multiple skill frameworks coexist in a single session — each with its own workflow assumptions and hard gates — they interfere in ways that neither would exhibit in isolation.

We have identified three distinct failure patterns. The first is workflow capture, where the most recently invoked skill overrides earlier guardrails. The second is sub-agent context isolation, where dispatched agents don’t inherit any skill context and default to generic behaviour. The third is planning framework deadlock, where two skills both try to manage plan execution simultaneously. Recent research has confirmed this isn’t just our experience: Li (2025) found a phase transition in skill selection accuracy as library size grows, with performance dropping sharply once semantic confusability between skills reaches a threshold.

We’ve developed pattern-specific countermeasures. However, a general solution to multi-skill interference remains an open question. It is an active area of our research.

What this means if you’re building with AI agents

The METR randomised controlled trial (Becker et al., July 2025) found that experienced open-source developers completed tasks 19% slower with AI assistance, despite having forecast a 24% speed-up beforehand and still believing afterwards that AI had made them about 20% faster. This perception gap is the real danger. Teams won’t self-correct toward better specification practices because they genuinely believe things are going well.

Structural enforcement of specification discipline is necessary precisely because the humans in the loop can’t accurately assess when AI assistance is helping versus hindering.

If you take one thing from this post, let it be this: the problem isn’t the AI. The problem is the absence of structural discipline around the AI. Every principle we’ve discovered boils down to the same insight — don’t rely on the agent’s good intentions. Build the structure that makes doing the right thing the only available path.

Treat specifications as persistent artefacts on disk, not conversation context. Use hard enforcement, not soft suggestions. Keep your orchestrator lightweight and delegate comprehension to sub-agents. Name the specific excuses you want to prevent, because general rules invite creative interpretation. And exploit context isolation for verification — don’t fight the boundaries between agents, use them.

Try it yourself

The /implement skill is publicly available and works with Claude Code. We’d encourage you to try it on a real project and see how it changes the way your agent handles specifications.

The principles matter more than any specific tool. /implement is one implementation of them — freely available, and a good starting point. Your specification workflow should reflect how your team actually works; ours reflects how we work, and that’s the point.

If you want proof the methodology holds up in practice, Props is an open source inventory management system we built specification-first using exactly this process.

We’re continuing this research. Multi-skill interference, sub-agent schema compliance, and partial completion detection are all open problems we’re actively working on. If you’re tackling similar challenges, we’d love to hear from you.

Remember

You don’t expect the tendrils of grief
To reach out and strangle your heart

You don’t expect the moments of life
To call back the regrets, the pain, or the loss

You don’t expect the places, spaces or times
To remind you of what you lost

And yet – without expectation
Without planning
They rise up
And exhale
Unfurling

It was as we drove in and he said
“I remember this place.”
“It was the last time.”
It was. The last time we saw her.

But the pain isn’t even that.

It’s years of loss.
Years of struggle
Years of heartache and hard work
Years of stress and stressors

And these moments call it all back.

In ways I didn’t expect.

So today I sit.

And I cried

That’s not new news. But it’s been a while.

To be honest, it’s an emotion I’d lost.

Dread. Anxiousness. Nervousness. Joy. Anticipation.

These had become my friends.

But tears? I’d forgotten how.

The uncertainty is not it.

The fear of having missed the window.

The concern over whether something different could be done.

These aren’t the problem. So what is?

If the world would just stop.

Stop, so I can catch my breath.

Stop, so I can breathe.

Stop, so I can adjust to the thoughts of the new.

But it doesn’t. It can’t. We must keep moving.

Forever forward. Forever onward. Forever… something.

And is it this? I don’t know.

Change is hard.

When you know that things will not be the same.

When the faces move. When the normal is disrupted.

These things matter too.

But in all of it, what’s hardest is feeling alone.

Not knowing who or what or where can the load be shared.

Who will listen and love, and who will silently judge.

Not knowing to whom or where to turn.

But I cried.

I’d forgotten how to do that.

It’s a year

Since we got the news.

Since we heard those words.

Since I made those phone calls.

Since the start of the grief.

Since the shock and disbelief.

Since…

Life is short. And yours was too short.

We miss you.

We expect to see your face, but it’s gone.

We are always asked “where are you?” by C.

You always lit up the room with your smile.

We loved your joy.

We loved how much you loved us.

We know we’ll see you soon.

But until we do, we grieve your absence.

It hurts.

God is Dead

It’s Good Friday.

The day where I celebrate a death. The death of a man. Moreover, I celebrate the death of God.

On a cross, 2000 years ago, the world was changed. When Jesus spoke those last words – “It is finished” – it changed history forever. Because it was finished. God himself died. Why?

Sacrifice.

His death was to pay the price for all of the evil in the world.

Only the death of the majestic unlimited ruler of the universe could be enough to sacrifice for the whole world.

For it is in this beautiful demonstration of servant-hearted leadership and sacrifice that I am given hope.

It’s not hope of nothing. It’s not hope of an empty future. It’s a hope of forgiveness. It’s a hope of redemption. It’s a hope given because God himself stepped in to death for me. Willingly.

The story doesn’t stop here. His death was not the end.

But that’s a story for another day.

RUOK – but not really?

At a meeting I recently attended, my friend Rachel astutely observed that we are often more likely to be open about what our thoughts, struggles and challenges are with those who have short term impacts on our lives. There is something to be said about the safety of being able to close off ourselves after a period of time.

But as I reflect on this, I’m confronted with the sadness that we feel compelled to shield our lives from those around us. We’re concerned that our vulnerability might cause people to think less of us. We’re concerned that others might judge us. We’ve learnt that being vulnerable gets us burnt, and so we choose to lock our feelings, thoughts and emotions away.

Days like today (it is RUOK day if you missed it) try to challenge this, by encouraging us to take the first step to really enquire as to the wellbeing of another. To genuinely care.

But as many of my friends have observed, the challenge with this is that those who are genuinely struggling don’t just need those “moments” of connection; they need long term, ongoing support, encouragement and care. They need friendship from people who will not just suspend judgement of their feelings, but who will genuinely decide that the person who is experiencing the downs of life is more important than the feelings and emotions that surround it. They need people who will support them regardless of the situation and concern.

If you are going to ask RUOK today (and I would encourage you to do so), also take the time to assess your own reactions, thoughts and emotional responses to those you are trying to support. Try to recognise where your responses are judgemental rather than caring and compassionate. Challenge yourself to “check” those responses, and consider how you can build relationships where the person you are relating to is more important than the weight of experience they are going through. Consider how today can just be one step in a longer journey of care and support.

While it might be easier to share how you are feeling with someone who is only there for a moment, it would be so much better to share with someone who is going to be beside you for the long haul.

Supporting someone who is dealing with the dark side of life can be hard – but remember, your actions in supporting them are probably nowhere near what the person is going through.

Walk this road together. There is light in the darkness, even if we sometimes can’t see it.

(Photo by Morteza Yousefi on Unsplash)

Also posted to Facebook and LinkedIn