This is a copy of the original publication on LinkedIn.
AI agents are already filing tax returns, drafting legal documents, and submitting applications on behalf of citizens. In the United States, courts have seen multiple notable instances of AI-generated filings, some containing fabricated case citations, prompting several jurisdictions to introduce AI disclosure rules. In France, developers are independently building MCP servers and AI agent skills so that Claude can calculate French taxes using publicly available DGFiP source code, without official government endorsement. In Estonia, in March 2026, developer Stefano Amorelli demonstrated Claude Code executing tax payments through the country's Tax and Customs Board (MTA) and LHV bank with no human touching a user interface. This was a proof of concept, not a standard government process, and the technical achievement itself makes the broader point.
These are not edge cases. They are early signals of a structural shift in how citizens interact with the state. The question is whether public sector organisations (PSOs) are catching up with the governance architecture this requires.
This piece reviews a recent academic paper by Chris Schmitz, Jonathan Hvithamar Rystrøm, and Jan Batzner on agentic AI oversight in PSOs and puts it in conversation with a follow-up published this month by Luukas Ilves and Ott Velsberg on Estonia's progress against the 12-layer "Agentic State" framework.
While the hook above spans the US, France and Estonia, the case study analysis here focuses on Estonia, where the response to this structural shift is most visible and best documented to date. The paper identifies five governance challenges that PSOs need to address. Estonia is a case study that maps onto each one, illustrating how the governance architecture described in the research takes shape in practice and where practice differs from it. The timing is especially relevant: this week, Estonia officially approved its new Digital State Agenda for 2026–2035, placing AI adoption in public and private sectors at the centre of national strategy for the next decade.
1. The Paper: Five Challenges for PSOs
Schmitz et al. interviewed civil servants from six German federal, state, and municipal agencies, asking how existing AI governance practices map onto the requirements of agentic AI deployment. Their core finding: agentic AI does not introduce entirely new governance problems. It intensifies the ones PSOs already have.
The five governance areas they identify as essential for responsible agent deployment are:
Cross-departmental implementation: coordinating responsibilities across organisational units, since agent systems span multiple components and data sources.
Comprehensive evaluation: pre-deployment testing that is system-level, task-specific, and designed in close collaboration with the operational workers who will use the system.
Enhanced security protocols: addressing novel risks from agent autonomy (jailbreaking, data exfiltration, DDoS exposure) through cross-departmental threat mapping and red-teaming.
Operational visibility: visibility at the system level (agent indexes, model cards) and at the operational level (real-time monitoring) to enable both organisational awareness and intervention in misbehaving systems.
Systematic auditing: building external compliance validation on top of the previous capacities.
The interview results are thought-provoking, though they align with earlier findings. Most PSOs handle AI through governance structures designed for traditional digital projects: episodic approval triggers, dedicated compliance units sitting outside operational teams, and what interviewees described as "adversarial" relationships between project teams and compliance teams. Notably, several interviewees reported "self-censorship": teams reducing project scope before involving compliance in order to avoid lengthy approval cycles.
The authors are honest about scope. Their validation sample is small (six interviews) and German-specific. But their hypothesis is corroborated consistently: existing PSO governance structures are only partially compatible with what agentic AI requires.
2. Estonia Mapped onto the Five Challenges
Estonia is building an action plan, not a generic Agentic AI policy. The Agentic State framework it is measured against has twelve interlocking layers of state operation, of which nine are detailed in the Ilves and Velsberg follow-up. Below I map the layers onto the paper's five challenge groups, noting partial fits and observed differences.
Cross-departmental implementation. This is the densest area. Four layers contribute directly. Bürokratt, Estonia's public-sector virtual assistant, is now live across 18 government agencies and is designed as a framework for multi-agent collaboration. A function repository was launched in March 2026 to address a chronic problem: agencies building the same solutions and workflows separately. Estonia is also updating its law-making guidelines to require that new regulatory initiatives consider, at the drafting stage, whether AI could address the problem more effectively. Procurement plans across agencies are being consolidated into one searchable, machine-readable system covering 9,000 procurements per year and €6 billion in annual volume.
Comprehensive evaluation. This is where Estonia's published infrastructure diverges most from the paper's specification. The Trustworthy AI toolbox (Layer 7), which entered early beta in March 2026, addresses the observation that public-sector AI typically fails not at the model layer but earlier, at problem framing, data quality, procurement, testing, and accountability. TARK provides centralised trust assessments for cloud-based and AI tools, currently covering ChatGPT, Gemini, Claude, and Copilot. Neither of these is yet a full pre-deployment evaluation regime in the sense Schmitz et al. describe. The full evaluation architecture is still taking shape.
Enhanced security. Andmejälgija (the Data Tracker) gives citizens visibility into government access to their data and becomes mandatory for all agencies by 2028. A four-stage PII pipeline (detection, tokenisation, anonymised cloud query, token rehydration) allows AI training on sensitive data without exposing it to external providers. Estonia is also moving toward local HPC capacity and shared European infrastructure for sensitive use cases.
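The four stages named above (detection, tokenisation, anonymised cloud query, token rehydration) can be illustrated with a minimal Python sketch. This is not Estonia's implementation: the 11-digit pattern, the token format, and every function name here are assumptions made for illustration, and a real detector would need to cover far more than one kind of identifier.

```python
import re
from typing import Callable

# Hypothetical detector: treats any standalone 11-digit number as a personal code.
ID_PATTERN = re.compile(r"\b\d{11}\b")


def detect(text: str) -> list[re.Match]:
    """Stage 1: locate candidate PII spans in the text."""
    return list(ID_PATTERN.finditer(text))


def tokenise(text: str, matches: list[re.Match]) -> tuple[str, dict[str, str]]:
    """Stage 2: replace each detected span with an opaque placeholder token."""
    vault: dict[str, str] = {}  # token -> original value, kept locally
    seen: dict[str, str] = {}   # original value -> token, so repeats share a token
    parts: list[str] = []
    cursor = 0
    for match in matches:
        value = match.group(0)
        token = seen.get(value)
        if token is None:
            token = f"<PII_{len(seen)}>"
            seen[value] = token
            vault[token] = value
        parts.append(text[cursor:match.start()])
        parts.append(token)
        cursor = match.end()
    parts.append(text[cursor:])
    return "".join(parts), vault


def rehydrate(text: str, vault: dict[str, str]) -> str:
    """Stage 4: put the original values back into the returned answer."""
    for token, value in vault.items():
        text = text.replace(token, value)
    return text


def run_pipeline(text: str, cloud_query: Callable[[str], str]) -> str:
    """Run all four stages; only tokenised text is passed to cloud_query."""
    safe_text, vault = tokenise(text, detect(text))
    answer = cloud_query(safe_text)  # Stage 3: anonymised cloud query
    return rehydrate(answer, vault)
```

The design point the sketch tries to capture is that the token vault never leaves the local environment, so the external provider only ever sees placeholders.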
Operational visibility. This is where Estonia's design choices are most distinctive. The function repository includes an algorithmic registry providing an overview of AI systems used across the public sector, with an ambition toward mandatory disclosure requirements. RIA is piloting an Agent Registry to manage AI agents across government, with cryptographic machine identities (extending the national eID to machines), capability manifests defining what each agent can and cannot do, authorisation scopes, and kill switches. Andmejälgija is not just internal oversight — it is operationalised visibility as a citizen right. This goes beyond the paper's framing, where visibility is treated primarily as managerial and operational.
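To make the registry concepts concrete, here is a minimal Python sketch of how a capability manifest, authorisation scopes, and a kill switch might combine into a single permission check. The class names, fields, and example values are hypothetical and do not reflect RIA's pilot; the cryptographic machine identity is reduced to a plain string identifier.

```python
from dataclasses import dataclass


@dataclass
class AgentRecord:
    agent_id: str                    # placeholder for a cryptographic machine identity
    allowed_actions: frozenset[str]  # capability manifest: what the agent may do
    scopes: frozenset[str]           # authorisation scopes: which resources it may touch
    killed: bool = False             # kill switch state


class AgentRegistry:
    """Hypothetical in-memory registry; a real one would be persistent and audited."""

    def __init__(self) -> None:
        self._agents: dict[str, AgentRecord] = {}

    def register(self, record: AgentRecord) -> None:
        self._agents[record.agent_id] = record

    def kill(self, agent_id: str) -> None:
        """Flip the kill switch so every later check for this agent fails."""
        self._agents[agent_id].killed = True

    def is_allowed(self, agent_id: str, action: str, scope: str) -> bool:
        """Deny by default: unknown or killed agents are always refused."""
        record = self._agents.get(agent_id)
        if record is None or record.killed:
            return False
        return action in record.allowed_actions and scope in record.scopes
```

A usage example under the same assumptions: an agent registered with the action "read" and the scope "procurement" passes `is_allowed("agent-1", "read", "procurement")`, fails for any other action or scope, and fails for everything once `kill("agent-1")` has been called. Attaching the check to the individual agent, not to a project approval, is what lets oversight operate continuously.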
Systematic auditing. Estonia has multiple live audit-style pilots: TTJA monitors advertisement compliance, the Gender Equality and Equal Treatment Commissioner scans job advertisements, and the Data Protection Inspectorate automates checks on personal data processing conditions. EquiTech, a dedicated initiative for detecting bias, discrimination, and opaque AI decision-making, builds practical methods and training materials for fairness audits. Still under development, by the reviewed paper's standard, are independent third-party agent auditing entities and standards.
3. The Paper's Discussion in Conversation with Estonia
Schmitz et al. close with three aggregated shortcomings of existing PSO governance structures, and three design principles for technical work on agent oversight.
Shortcoming 1: Continuous oversight. The paper argues that segmented, event-triggered governance cannot scale to the frequency of events agentic systems generate. Their conclusion: governance responsibilities must be diffused towards operational departments whose work is augmented by agents. Estonia addresses exactly this point. Andmejälgija makes data access continuously visible by default. The Agent Registry attaches kill switches and capability manifests to individual agents, not to project approvals. The function repository and algorithmic registry distribute visibility horizontally rather than concentrating it in compliance units.
Shortcoming 2: New governance capabilities throughout the lifecycle. The paper notes that visibility and evaluation require deeper integration of subject knowledge and technical understanding, necessitating upskilling and a redefined role for operational workers. Estonia's centralised licensing of approved AI tools through eesti.ai, paired with Digiriigiakadeemia and the planned 80% daily-use target, is a direct attempt to build operational governance capability rather than externalise it. Survey data shows 37% of public-sector workers and 45% of leaders already use AI daily, but only 2% believe full job substitution is possible.
Shortcoming 3: Interdepartmental coordination. The function repository is the cleanest answer to this in any country I have looked at. By making cross-cutting workflows shareable as reusable components, it both reduces duplication and creates a structural mechanism for cross-departmental visibility into how agents are being deployed.
Design principle 1: Tooling for distinct user groups. The paper recommends designing observability tooling for either technical external teams or subject matter experts, not interfaces that combine both. TARK comes close to this: it provides clear, centrally vetted risk profiles to non-technical adopters in agencies, while keeping technical assessment work elsewhere.
Design principle 2: Non-technical interfaces with public servants. The Trustworthy AI toolbox addresses this most directly, by intervening at problem framing and data quality — the points where non-technical workers actually engage with AI systems — rather than at model evaluation alone.
Design principle 3: Anticipating legacy systems. The Administrative Procedure Act (HMS) reform, approved by the Estonian government on 3 April 2026, is the standout move here. Instead of requiring legislative changes for each new automated process, it establishes a horizontal legal basis for automated administrative decisions. Safeguards include automatic notification of decisions, the right to contact a human official, transparency about decision logic, and human review on request.
And the Inbound Agents Problem
The Estonia follow-up flags what may become the most consequential governance challenge of all. AI agents are increasingly initiating interactions with the state on behalf of citizens: tax filings, funding applications, queries. The paper does not address this directly, focusing instead on agents deployed by the state, which is a deliberate scoping choice. The Agentic State follow-up takes it head-on through the "Agent Residency" proposal: to give every AI agent a kind of digital ID and allow rights and responsibilities to be delegated from individuals or companies to their agents.
This is not yet policy, but it reframes the problem productively. Denying or prohibiting citizen-side agent access is unlikely to work if the procedure is standardised and the data is already in state systems. Better to design the interaction.
Closing Thoughts
Two things stand out from putting these texts in conversation.
First, the paper's diagnosis is accurate, and its scope is deliberately bounded. The five challenge areas are necessary but not sufficient to cover the full governance terrain. People and culture, citizen-facing visibility, and inbound agent interactions are all areas where agentic AI is changing the landscape, and where the Agentic State framework is, by design, more complete than the paper's research lens.
Second, Estonia is not a generalisable model. Estonia is a small, digitally mature, high-trust state with a digital government legacy going back over two decades and a specific focus on AI-driven governance since 2018. Most PSOs do not have those conditions. But Estonia is useful as a living test of which parts of the governance architecture are actually possible to build in practice. Where Estonia has built something concrete (Bürokratt, Andmejälgija, the function repository, TARK), other states can study the design choices. Where Estonia is still working on it (formal third-party agent auditing, full pre-deployment evaluation regimes), this is a signal that even leading states are not solving these problems alone.
For PSOs starting later, the value of these documents read together is that they provide a shared vocabulary for what to build and for what is genuinely hard.
References
[1] Schmitz, C., Rystrøm, J. and Batzner, J. (2025). Oversight Structures for Agentic AI in Public-Sector Organizations. Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025), pp. 298–308. ACL. aclanthology.org
[2] Ilves, L. and Velsberg, O. (2026). Building the Agentic State in Estonia: What is already taking shape. Substack, 24 April 2026. luukasilves.substack.com
[3] Ilves, L., Kilian, M., Peixoto, T. and Velsberg, O. (2025). The Agentic State. Vision Paper, Tallinn Digital Summit 2025. agenticstate.org