
Tools as the App Store of Intelligence:
Why Plugins Are the Path to AGI


The Wrong Question

The dominant debate in AI research is fixated on the wrong variable. Scale alone will not get us to Artificial General Intelligence. Parameters alone will not get us there. Training data alone will not get us there. The question is not how big the model is. The question is what the model can reach.

Sam Altman put it plainly in January 2025: "We are now confident we know how to build AGI as we have traditionally understood it." What he was describing was not a bigger model. He was describing a system of agents equipped with tools, capable of taking autonomous, multi-step action in the world. "We believe that, in 2025, we may see the first AI agents join the workforce and materially change the output of companies." The emphasis was not on reasoning capacity in isolation. It was on the combination of reasoning and doing. Thinking and acting. Model and tool.

This thesis argues that tools and plugins are not accessories to AI systems. They are the primary lever by which a capable but bounded model becomes something general. And the mechanism by which those tools get built, distributed, and monetized is not novel. We have seen this model before. It is the App Store.


The Historical Pattern: Intelligence Multiplied by Tools

The relationship between cognition and tools is not new to AI. It is the oldest story in human civilization.

Fire did not make the brain larger. It extended what the brain could accomplish. It compressed the time required to digest food, freeing metabolic energy for cognition. It provided protection and warmth that allowed settlement. It became the center of social gathering, accelerating language and culture. Fire was, in the most precise sense, a cognitive multiplier: the brain was the same, but its effective output expanded dramatically.

Writing did not improve individual memory. It externalized it. For the first time in human history, knowledge could survive a single person's death and could be retrieved by someone who never met the author. The printing press scaled that retrieval from hundreds of handwritten copies to millions of printed ones, and the result was the Renaissance, the Reformation, and the Scientific Revolution inside two centuries.

The pattern is consistent. The wheel did not improve human muscular capacity. The telescope did not improve human eyesight. The computer did not improve human arithmetic ability. Each tool extended the surface area over which native intelligence could operate, and each one produced a civilizational discontinuity that looked impossible immediately before it happened.

The base cognitive hardware changed slowly. The tools changed fast. And the tools are what mattered.


The LLM Without Tools: A Locked Room

An LLM in isolation is extraordinary at what it does and structurally incapable of what it needs to do to be general.

The constraints are not subtle. A raw model has no access to live information. It cannot remember what happened in the conversation before this one. It cannot send a message, update a record, execute code against real data, or take any action that persists in the world beyond generating text. It can reason about booking a flight. It cannot book one.

Research from 2024 and 2025 confirms this gap is structural, not cosmetic. A landmark paper co-authored by over 30 researchers including Yoshua Bengio identified persistent, autonomous long-term memory as the single most critical architectural gap between current systems and true AGI. Models "score near 0%" on tasks requiring persistent memory across long horizons, not because the reasoning engine is flawed, but because it has no persistent state. The CPU has no RAM. The mind has no notebook.

This is the locked-room problem. Brilliant reasoning, no exit.

Tools are the exit.
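The exit can be made concrete. Below is a minimal, hypothetical sketch (every name is invented for illustration) of the loop that turns a text-only model into an actor: the model emits a structured tool call, a runtime executes it against the world, and the result is fed back into the next turn. The flight-booking stub stands in for any action that persists outside the conversation.

```python
import json

# Hypothetical tool: the one thing the raw model cannot do on its own.
def book_flight(origin: str, destination: str) -> dict:
    # A real implementation would call an airline API; this stub just
    # represents an action with effects beyond generated text.
    return {"status": "booked", "route": f"{origin}->{destination}"}

TOOLS = {"book_flight": book_flight}

def run_agent_step(model_output: str) -> str:
    """If the model emitted a JSON tool call, execute it; otherwise pass text through."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain prose: reasoning only, no action taken
    result = TOOLS[call["tool"]](**call["args"])
    return json.dumps(result)  # fed back into the model's next turn

# The model can now *do* what it previously could only *describe*:
print(run_agent_step('{"tool": "book_flight", "args": {"origin": "SFO", "destination": "JFK"}}'))
```

The model never leaves the room; the runtime carries its intent out the door and brings the result back in.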


The App Store Moment: Third-Party Intelligence at Scale

In 2007, Steve Jobs launched the iPhone with no support for native third-party apps. His position was that the device was already complete. Board member Art Levinson and senior executives such as Phil Schiller argued back. Their case, as Walter Isaacson documented, was that the true intelligence of the platform would be unlocked by the creativity of a massive developer ecosystem. Jobs eventually relented, and in 2008 the App Store launched.

What followed is the most instructive case study in platform economics in modern history. Apple did not need to know that someone would build a navigation app, or a food delivery app, or a banking app, or a camera replacement, or a medical monitoring tool. Apple built the platform. The developers built the capability. The store curated and distributed it. And collectively, the combination of iOS plus the App Store became something qualitatively different from what either Apple or any single developer could have built alone.

The smartphone was already capable. The ecosystem made it general.

This is the exact structure now playing out in AI.

The Model Context Protocol (MCP), introduced by Anthropic in late 2024 and rapidly adopted across the industry, is the USB-C of the AI world. It is a standardized layer that allows any agent to connect to any tool without custom integration work per combination. Microsoft Copilot Studio, Google Gemini Workspace Actions, and OpenAI's GPT Store are all building variations of the same architecture: a central intelligent runtime that dynamically loads specialized plugins contributed by third-party developers. The parallel is not metaphorical. It is structural.
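What that standardization buys can be illustrated with a simplified sketch of the pattern MCP formalizes (this is not the actual MCP SDK; the tool and field names are illustrative). Each tool is self-describing: a name, a natural-language description, and a JSON schema for its inputs, so any agent runtime that speaks the schema can discover and call it without bespoke glue code per combination.

```python
# Illustrative, MCP-style self-describing tool (simplified; not the real protocol).
weather_tool = {
    "name": "get_weather",
    "description": "Return current weather for a city.",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def validate_call(tool: dict, args: dict) -> bool:
    """Minimal check an agent runtime might perform before dispatching a call."""
    schema = tool["inputSchema"]
    return all(k in args for k in schema["required"]) and all(
        k in schema["properties"] for k in args
    )

# Any runtime that understands the schema can use the tool without custom work:
print(validate_call(weather_tool, {"city": "Berlin"}))    # True: valid call
print(validate_call(weather_tool, {"zipcode": "10115"}))  # False: rejected
```

The schema plays the role USB-C plays for hardware: one connector shape, arbitrarily many devices on either side.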

The ToolACE research (2024) built an automatic pipeline that synthesized an API pool of over 26,000 diverse tools, achieving GPT-4 level performance on tool-use tasks using models as small as 8 billion parameters. The implication is significant: the intelligence multiplier from a broad, well-curated tool library can outperform raw model scale. A smaller model with better tools beats a larger model with none. Breadth of the App Store matters as much as the power of the phone.


Plugins as Apps: The Developer Economy for Cognition

The App Store analogy holds not just architecturally but economically.

When Apple opened the App Store, it created a new class of developer. Not systems engineers building OS-level infrastructure, but product developers building specific, useful capabilities on top of a standardized platform. The barrier to entry dropped. The surface area of what the phone could do exploded. And monetization happened at the capability layer, not just the platform layer.

The same transition is underway for AI. Third-party developers are now building plugins that extend what an agent can do: domain-specific data access, specialized calculation engines, integration with vertical software like ERP systems, real-time data feeds, and tools that handle tasks no foundation model was trained to handle natively. The agentic AI market, estimated at $7-9 billion in 2025, is projected to exceed $90 billion by 2032 as enterprise adoption moves from chatting to doing (futureforce.ai, 2025). Approximately 67% of Fortune 500 companies had deployed production-grade agentic workflows as of 2025.

The monetization models are evolving beyond the flat App Store purchase. They include usage-based billing per API call, outcome-based billing where a developer earns only when the agent successfully completes a task, subscription models for high-value domain agents, and revenue-share arrangements modeled loosely on the App Store's 70/30 split. The developer who builds a best-in-class legal research tool for an AI agent runtime does not need to build a model. They build the plugin, list it in the registry, and earn every time an agent calls it.
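The usage-based model in particular is simple to reason about. Here is a hedged sketch of per-call metering with a platform revenue share loosely modeled on the 70/30 split; the class, prices, and rates are all hypothetical, not any real registry's billing API.

```python
from dataclasses import dataclass, field

@dataclass
class PluginMeter:
    """Hypothetical usage-based billing for a plugin listed in an agent registry."""
    price_per_call: float          # dollars per billable invocation
    platform_share: float = 0.30   # registry's cut, App Store style
    calls: int = field(default=0)

    def record_call(self, succeeded: bool) -> None:
        # Outcome-based variant: only successful completions are billable.
        if succeeded:
            self.calls += 1

    def developer_payout(self) -> float:
        gross = self.calls * self.price_per_call
        return gross * (1.0 - self.platform_share)

meter = PluginMeter(price_per_call=0.05)
for ok in [True, True, False, True]:  # three billable calls out of four attempts
    meter.record_call(ok)
print(meter.developer_payout())  # 3 * 0.05 * 0.70 = 0.105
```

Swapping `record_call`'s success condition for a flat counter turns outcome-based billing back into plain per-call billing; the rest of the structure is unchanged, which is why registries can offer both from one metering layer.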

This is the economics of a platform, not a product. And platform economics compound.


PIE as the Operating System for This Transition

This is where theory meets practice.

The PIE architecture is a direct implementation of the App Store model for intelligence. The .pie file registry is the App Store. Plugins are the apps. The Harness is iOS. The LLM is the chip.

The mapping is precise:

AI Component            | PIE Implementation                                 | App Store Analog
------------------------|----------------------------------------------------|---------------------------
LLM                     | Core reasoning engine                              | The A-series chip
Plugin / Tool           | Dynamically loaded capability                      | App
.pie File Registry      | Portable agent bundles with tools + memory + soul  | App Store
Harness                 | Runtime orchestration layer                        | iOS
MCP / API Integrations  | Standardized connection to external services       | App Store APIs and SDKs
E2B Sandboxed Sessions  | Isolated plugin execution                          | App sandboxing
Token Budget            | Consumable execution resource                      | Battery / compute
Memory Layer            | Persistent compounding context                     | iCloud / persistent state

When a third-party developer builds a plugin on PIE, they are doing exactly what an iOS developer does: writing to a standardized interface, packaging their capability into a portable unit, and making it available to any agent that needs it. The PIE registry curates and distributes. The Harness loads and executes. The developer earns from usage.
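What "writing to a standardized interface" might look like in practice, as a purely hypothetical sketch: PIE's actual plugin API is not specified in this document, so every class, method, and field name below is invented for illustration of the pattern.

```python
from abc import ABC, abstractmethod

class PiePlugin(ABC):
    """Hypothetical base class a PIE plugin developer might implement."""
    name: str
    description: str  # what the agent reads when deciding to load this plugin

    @abstractmethod
    def invoke(self, **kwargs) -> dict:
        """Execute the capability; would run inside a sandboxed session."""

class LegalSearch(PiePlugin):
    name = "legal_search"
    description = "Search case law by keyword and jurisdiction."

    def invoke(self, query: str, jurisdiction: str = "US") -> dict:
        # A real plugin would query a case-law database here.
        return {"query": query, "jurisdiction": jurisdiction, "results": []}

# The registry/Harness analog: capabilities are looked up by name and called
# when an agent decides it needs them, not when a human taps an icon.
REGISTRY = {p.name: p for p in [LegalSearch()]}
print(REGISTRY["legal_search"].invoke(query="fair use"))
```

The developer's surface area is the `invoke` contract and the descriptive metadata; distribution, loading, and execution belong to the platform, exactly as with an iOS app bundle.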

The critical distinction from the original App Store model is agency. An iOS app waits for a human to tap it. A PIE plugin is called by an agent that has decided it needs that capability to complete a task. The developer is not building for human attention. They are building for machine intent. And machine intent, unlike human attention, scales without fatigue.

Anthropic's research on tool discovery confirms the direction: the frontier research goal for 2025 and beyond is "zero-shot tool use," where an agent reads documentation for a new tool and begins using it without additional training. This is the equivalent of an iPhone that could discover, install, and use any app it encountered without a user ever tapping the screen. The agent becomes its own power user of the App Store it lives inside.


From Narrow to General: The Compounding Effect

The ARC-AGI benchmark, which tests novel problem-solving rather than memorization, became the gold standard for tracking this transition in 2024-2025. OpenAI's o3 model reached approximately 87% accuracy, approaching human-level reasoning for the first time. What changed was not just scale. It was the integration of search-based reasoning, structured outputs, and tool orchestration into the inference loop. The model did not get smarter in isolation. It got smarter because it was better equipped.

This is the compounding dynamic. Each new plugin added to an agent's repertoire expands the set of problems it can solve. Each memory write makes future reasoning more accurate. Each API integration adds a new domain of real-world effect. The Berkeley Function Calling Leaderboard (BFCL v3), the industry standard for measuring multi-turn tool interaction quality, shows consistent improvement as these integrations deepen and as models are specifically trained to use tools rather than treating tool use as an afterthought.

The current research frontier, "Autonomous Tool Discovery," where agents can read documentation and adopt new tools without retraining, is the last significant barrier before tool-augmented agents can generalize across genuinely novel domains. Once an agent can acquire new capabilities on demand, the distinction between a capable agent and a generally intelligent system becomes academic.


Conclusion: The Ecosystem Is the Intelligence

Apple did not become the most valuable company in history because it made good chips. It became the most valuable company in history because it built a platform that let millions of developers extend what those chips could do, packaged it in a way that users could access and trust, and created an economic structure that rewarded everyone for participating.

The same logic applies to AGI. No single model will be general because no single team has the domain knowledge, the data access, or the engineering capacity to build every capability that general intelligence requires. But a platform that lets thousands of developers build specialized plugins, distributes them through a curated registry, executes them safely in sandboxed environments, and lets agents call them dynamically at runtime can be general. The platform becomes the intelligence because the ecosystem fills in every gap that the core model cannot.

Fire was not the end. It was the infrastructure. The App Store was not the end. It was the infrastructure. The tool layer for AI agents is not the end of this story either. It is the infrastructure on which the rest of what follows gets built.

PIE is building that infrastructure. The .pie registry is the store. The plugins are the apps. The Harness is the OS. And when those pieces compound over time, with memory deepening context and tools expanding reach and third-party developers racing to fill every vertical niche worth filling, the question of whether the resulting system is generally intelligent will answer itself.


Sources