Some Assembly Required
Building an Application Graph
Early Release: The following content is an early preview and should be considered a work in progress. Information may be missing or incomplete until the final release.
Picture this: you’re asked to explain exactly what goes into your organization’s flagship application. Sounds simple, right? But as you start digging, you realize it’s like trying to map an iceberg. There’s the obvious stuff on the surface—the repositories, the build pipelines, maybe some documentation scattered across various wikis. But underneath, there’s a sprawling web of dependencies, microservices, AI models, runtime environments, third-party integrations, and countless other moving pieces that somehow work together to deliver value to your users.
Welcome to the complexity of modern software development, where applications aren’t just monolithic codebases anymore, but distributed ecosystems of interconnected components. Architecture mapping has traditionally been a strategic planning process that outlines a software system’s structure, features, and functionalities, serving as a blueprint for development and providing a clear roadmap for how systems will be built and operated. But what if we could take this concept further and create something that captures not just the theoretical architecture, but the entire living ecosystem of assets that make up a real, running application?
That’s where the concept of an Application Graph—or AppGraph for short—comes in.
What Is an Application Graph?
Let’s get one thing straight from the start: an AppGraph isn’t a product you can buy off the shelf, a vendor solution you can implement next quarter, or another tool to add to your already overflowing toolchain. Think of it as a conceptual framework—a way of thinking about and systematically organizing all the pieces that come together to create, deploy, operate, and maintain your applications throughout their entire lifecycle.
The basic idea is refreshingly straightforward, even if the implementation can be complex. Instead of having scattered information about your application’s components living in different tools, documentation systems, monitoring platforms, and people’s institutional knowledge, an AppGraph brings everything together into a unified, coherent view. It’s like having a detailed architectural blueprint that shows not just the final building, but every beam, wire, pipe, fixture, and foundation element that makes it work—along with the relationships between all these components.
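To make the idea concrete, here is a minimal sketch of what such a unified view might look like as a data structure: typed assets connected by typed relationships. The asset kinds, IDs, and relation names are purely illustrative, not a standard schema.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Asset:
    id: str    # e.g. "repo:payments-api" -- illustrative naming convention
    kind: str  # e.g. "repository", "pipeline", "datastore", "ai-model"

@dataclass
class AppGraph:
    assets: dict = field(default_factory=dict)  # id -> Asset
    edges: set = field(default_factory=set)     # (src_id, relation, dst_id) triples

    def add_asset(self, asset: Asset) -> None:
        self.assets[asset.id] = asset

    def relate(self, src: str, relation: str, dst: str) -> None:
        self.edges.add((src, relation, dst))

    def neighbors(self, src: str, relation=None) -> list:
        """Everything the given asset points at, optionally by relation type."""
        return sorted(d for (s, r, d) in self.edges
                      if s == src and (relation is None or r == relation))

# Build a tiny example graph from three fictional assets
g = AppGraph()
g.add_asset(Asset("repo:payments-api", "repository"))
g.add_asset(Asset("pipeline:payments-deploy", "pipeline"))
g.add_asset(Asset("db:customers", "datastore"))
g.relate("pipeline:payments-deploy", "builds", "repo:payments-api")
g.relate("repo:payments-api", "reads-from", "db:customers")

print(g.neighbors("repo:payments-api"))  # ['db:customers']
```

Everything that follows in this chapter is, in one form or another, about populating and enriching a structure like this from the systems where the information already lives.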
And here’s the crucial part that sets this apart from traditional security-focused asset discovery: this isn’t primarily about security, at least not initially. The focus is on creating a unified way to understand what exactly constitutes an application and how all its parts fit together across all teams, environments, and stages of the development lifecycle. Once we have that solid foundation of understanding, we can start layering on security controls, compliance policies, and risk assessments based on the actual composition and behavior of our applications rather than guessing or relying on outdated documentation that may not reflect reality.
The Building Blocks of an AppGraph
So what actually goes into an AppGraph? Think of it as systematically collecting and correlating data about every asset that touches your application throughout its entire lifecycle—from initial development through production operation and eventual retirement. Here are the key data sources and categories we need to tap into:
Source Code Intelligence
The journey starts with your Source Code Management (SCM) systems, which serve as the foundational source of truth for what your application actually is. CI/CD pipelines are automated processes that allow developers to integrate, test, and deploy code efficiently and reliably, but before we can understand those pipelines and their outputs, we need to understand what’s actually contained within our repositories.
This means connecting to systems like GitHub, GitLab, Bitbucket, or your enterprise SCM solution and going far beyond just identifying repositories. We need to dig deep into what’s actually inside them, analyzing the code itself to generate comprehensive inventories of:
- Programming languages and their specific versions, including both primary languages and embedded scripting languages
- File types and their architectural implications—what they tell us about the application’s structure, patterns, and potential complexity
- Asset types within repositories, including configuration files, documentation, deployment scripts, database migrations, and infrastructure definitions
- Dependencies declared in package files, requirements.txt, package.json, Gemfiles, or similar manifests, along with their version constraints and update patterns
- Security-relevant files like secrets configurations, certificate stores, and access control definitions
- Documentation artifacts that provide context about the application’s purpose, architecture decisions, and operational requirements
This isn’t just academic cataloging for the sake of completeness. Understanding the detailed composition of your codebase helps identify potential security vulnerabilities, licensing conflicts, technical debt accumulation, and architectural inconsistencies before they become operational problems that impact users or business operations.
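A first pass at this kind of inventory doesn't require specialized tooling. The sketch below walks a checked-out repository, tallies file types by language, and reads two of the conventional dependency manifests mentioned above. Real scanners go far deeper (lockfile resolution, embedded languages, license detection); the extension-to-language mapping here is deliberately small and illustrative.

```python
import json
from collections import Counter
from pathlib import Path

# Illustrative subset -- a real inventory would cover far more types
EXT_TO_LANG = {".py": "Python", ".js": "JavaScript", ".ts": "TypeScript",
               ".go": "Go", ".tf": "Terraform", ".yml": "YAML", ".yaml": "YAML"}

def inventory_repo(root: str) -> dict:
    """Return a rough language and dependency inventory for one repo checkout."""
    root_path = Path(root)
    languages = Counter()
    dependencies = {}

    for path in root_path.rglob("*"):
        if path.is_file():
            lang = EXT_TO_LANG.get(path.suffix)
            if lang:
                languages[lang] += 1

    # Python dependencies declared in requirements.txt (names only, pins dropped)
    req = root_path / "requirements.txt"
    if req.exists():
        dependencies["pip"] = [
            line.split("==")[0].strip()
            for line in req.read_text().splitlines()
            if line.strip() and not line.lstrip().startswith("#")
        ]

    # JavaScript dependencies declared in package.json
    pkg = root_path / "package.json"
    if pkg.exists():
        manifest = json.loads(pkg.read_text())
        dependencies["npm"] = sorted(manifest.get("dependencies", {}))

    return {"languages": dict(languages), "dependencies": dependencies}
```

The output of a function like this becomes a set of nodes and edges attached to the repository's entry in the AppGraph, rather than a report that goes stale in a wiki.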
CI/CD Pipeline Mapping
By automating the processes that support continuous integration and deployment, development and operations teams can minimize human error and maintain consistent, repeatable processes for how software is released. But to include these pipelines effectively in our AppGraph, we need to analyze build files, deployment configurations, and automation systems to map out exactly how code flows from initial commit through to production deployment.
This involves connecting to your CI/CD platforms—whether that’s Jenkins, GitHub Actions, Azure DevOps, GitLab CI, or whatever combination of tools your organization has evolved to use—and developing a comprehensive understanding of:
- The stages and steps in each pipeline, including their dependencies, failure conditions, and success criteria
- What triggers builds and deployments, from code commits to scheduled releases to manual interventions
- Which environments code flows through and what transformations or configurations happen at each stage
- Dependencies between different pipelines, including shared libraries, artifacts, and deployment sequences
- Who has access to modify or execute these workflows and what approval processes govern changes
- Integration points with external systems like testing frameworks, security scanners, artifact repositories, and notification systems
The goal isn’t to replace your existing CI/CD monitoring and analytics tools, but to understand how these automated processes fit into the broader application ecosystem and how they contribute to the overall risk and operational profile of your applications.
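As a sketch of what "understanding a pipeline" can mean in practice, the function below extracts triggers, jobs, job-dependency edges, and target environments from a CI/CD workflow that has already been parsed into a dict (for example, from a GitHub Actions YAML file; the `on`, `jobs`, `needs`, and `environment` keys follow that format). The workflow itself is invented.

```python
# A made-up workflow in GitHub Actions' structure, already parsed from YAML
workflow = {
    "on": {"push": {"branches": ["main"]}, "workflow_dispatch": {}},
    "jobs": {
        "test":   {"runs-on": "ubuntu-latest"},
        "build":  {"needs": ["test"]},
        "deploy": {"needs": ["build"], "environment": "production"},
    },
}

def map_pipeline(wf: dict) -> dict:
    """Flatten one workflow into AppGraph-ready facts."""
    triggers = list(wf.get("on", {}))
    jobs = wf.get("jobs", {})
    # (dep, job) means dep must finish before job runs
    edges = [(dep, job) for job, spec in jobs.items()
             for dep in spec.get("needs", [])]
    environments = {job: spec["environment"]
                    for job, spec in jobs.items() if "environment" in spec}
    return {"triggers": triggers, "jobs": list(jobs),
            "dependencies": edges, "environments": environments}

print(map_pipeline(workflow))
```

Run across every workflow in the organization, facts like these let you ask which pipelines can reach production, what gates sit in front of them, and how a change to a shared job ripples outward.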
Internal Developer Portal Integration
Here’s where things get really interesting and where we can leverage existing organizational investments. An Internal Developer Platform is built by platform teams to create golden paths and enable developer self-service capabilities, consisting of many different technologies and tools integrated in ways that lower cognitive load on developers while maintaining operational standards.
If your organization has invested in an Internal Developer Portal or Platform, it’s probably already collecting, curating, and maintaining a wealth of valuable metadata about your applications. Internal developer portals serve as the primary interface through which developers discover and access internal platform capabilities, making them natural and authoritative sources for AppGraph data.
We can pull in rich information including:
- Established tech stacks and their approved components, including preferred libraries, frameworks, and architectural patterns
- Service catalogs that define application boundaries and ownership, providing clear context about what constitutes a distinct application versus a shared component
- Metadata about connected runtimes, domains, and auxiliary services, including their relationships and dependencies
- Team ownership and responsibility mappings, including escalation paths and subject matter experts
- Documentation and architectural decision records that provide historical context for why systems are built the way they are
- Compliance and governance metadata that tracks adherence to organizational standards and policies
The beauty of tapping into IDPs is that they often already have established processes, workflows, and incentives in place for keeping this information current and accurate. Instead of building parallel systems that compete for developer attention, we can leverage the work teams are already doing as part of their normal development processes.
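For example, if your portal is built on Backstage, each service already publishes a catalog descriptor with ownership and dependency information. The sketch below folds one such entry (already parsed from its `catalog-info.yaml`) into flat facts; the field names follow the Backstage catalog format, but the service itself is fictional.

```python
# A fictional service's catalog entry, in Backstage's descriptor structure
entry = {
    "apiVersion": "backstage.io/v1alpha1",
    "kind": "Component",
    "metadata": {"name": "payments-api"},
    "spec": {
        "type": "service",
        "owner": "team-payments",
        "dependsOn": ["resource:customers-db", "component:auth-service"],
    },
}

def extract_catalog_facts(entry: dict) -> dict:
    """Pull ownership and dependency facts from one catalog descriptor."""
    spec = entry.get("spec", {})
    return {
        "service": entry["metadata"]["name"],
        "owner": spec.get("owner"),
        "depends_on": spec.get("dependsOn", []),
    }

print(extract_catalog_facts(entry))
```

Because teams already maintain these descriptors as part of their normal workflow, ingesting them gives the AppGraph ownership data that stays current without any additional process.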
The AI and Automation Layer
Here’s something that most traditional application mapping approaches miss entirely, but which is becoming increasingly critical: the growing role of AI models, automated agents, and intelligent tooling in modern software development. The Model Context Protocol is an open standard that enables developers to build secure, two-way connections between their data sources and AI-powered tools, creating new categories of dependencies and integration points.
As AI becomes more deeply integrated into our development workflows, production applications, and operational processes, we need to systematically track:
- AI models used in development, testing, or production, including their versions, capabilities, limitations, and update schedules
- MCP servers for popular enterprise systems like Google Drive, Slack, GitHub, Git, Postgres, and Puppeteer that enable AI tools to interact with our systems and data
- Automated agents and their permissions and capabilities, including what systems they can access and what actions they can perform
- AI-assisted code generation tools and their outputs, understanding which parts of your codebase were AI-generated and may require different review or maintenance approaches
- Training data dependencies and lineage, particularly for custom models that may have specific compliance or privacy requirements
- AI service providers and their SLAs, understanding the external dependencies that your AI-powered features rely on
This isn’t just about keeping track of the latest technological trends. AI components can introduce entirely new categories of security considerations, compliance requirements, operational dependencies, and failure modes that need to be understood and managed with the same rigor as any other critical system component.
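One simple, concrete payoff of tracking these assets: once AI models, MCP servers, and agents are recorded with their access scopes, a policy check can flag the risky combinations automatically. The records and scope-naming convention below are invented for illustration.

```python
# Invented inventory records for AI-related assets and their access scopes
ai_assets = [
    {"id": "model:gpt-4o",       "kind": "ai-model",   "used_in": ["production"]},
    {"id": "mcp:github",         "kind": "mcp-server", "scopes": ["repo:read", "repo:write"]},
    {"id": "agent:release-bot",  "kind": "agent",      "scopes": ["deploy:execute"]},
]

def risky_ai_assets(assets, sensitive=("write", "execute")):
    """Flag assets whose scopes grant any write- or execute-style permission."""
    return [a["id"] for a in assets
            if any(scope.split(":")[-1] in sensitive
                   for scope in a.get("scopes", []))]

print(risky_ai_assets(ai_assets))  # ['mcp:github', 'agent:release-bot']
```

The point isn't this particular heuristic; it's that questions like "which AI components can modify our systems?" become a query instead of an investigation.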
Runtime Environment Mapping
Finally, we need to capture comprehensive information about where applications actually run and how they interact with their operational environment. This includes:
- Available deployment environments (development, staging, production, etc.) and their configurations, differences, and data flows
- Infrastructure components and their configurations, including compute resources, storage systems, networking components, and their relationships
- Service meshes and networking configurations that define how components communicate and what security policies govern those interactions
- Monitoring and observability tooling that provides visibility into application behavior and performance
- Data stores and their relationships, including databases, caches, message queues, and file systems, along with their access patterns and dependencies
- External integrations and APIs that your application depends on or provides services to
- Security controls and access policies that govern how the application operates and who can interact with it
Much of this information might already exist in various forms within your infrastructure-as-code repositories, Kubernetes configurations, cloud management platforms, or monitoring systems. The key challenge and opportunity is pulling it all together so you can see how runtime environments relate to the applications that use them and understand the full operational context of your software systems.
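Once runtime records are pulled together, even trivially, two questions become answerable at a glance: where does a given service run, and what shares a given piece of infrastructure with it? The deployment records below are invented; in practice they might come from IaC state, a cluster API, or a CMDB.

```python
# Invented deployment records correlating services, environments, and clusters
deployments = [
    {"service": "payments-api", "env": "staging",    "cluster": "eks-stg-1"},
    {"service": "payments-api", "env": "production", "cluster": "eks-prod-1"},
    {"service": "reporting",    "env": "production", "cluster": "eks-prod-1"},
]

def envs_for(service: str, records) -> list:
    """All environments a service is deployed to."""
    return sorted({r["env"] for r in records if r["service"] == service})

def services_on(cluster: str, records) -> list:
    """All services sharing a given cluster -- the co-tenancy view."""
    return sorted({r["service"] for r in records if r["cluster"] == cluster})

print(envs_for("payments-api", deployments))   # ['production', 'staging']
print(services_on("eks-prod-1", deployments))  # ['payments-api', 'reporting']
```

The co-tenancy view in particular is hard to get from any single tool, because it only emerges when runtime data is joined with the application inventory.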
The Challenge: No One-Size-Fits-All Solution
Here’s the reality check that’s important to acknowledge upfront: because software architecture decisions always come down to trade-offs, there is never one universally right way to solve all these challenges. There is no single tool, platform, or solution that comprehensively addresses all of these requirements today. And honestly, there probably shouldn’t be, given the diversity of organizational contexts, technological choices, and operational requirements across different companies and teams.
Every organization has evolved a different combination of tools, processes, cultural practices, and technical requirements. What works perfectly for a 50-person startup using a straightforward GitHub Actions workflow and deploying to a single cloud provider is going to be completely different from what works for a multinational corporation with hundreds of teams using dozens of different technologies across multiple cloud providers and on-premises infrastructure.
The idea behind AppGraphs isn’t to replace your existing tools, force you into a specific vendor’s ecosystem, or impose a one-size-fits-all solution. Instead, it’s to provide a flexible framework for thinking about how to collect, organize, and correlate all this information in a way that makes sense for your specific organizational context, technical environment, and business requirements.
Some organizations might find success building custom solutions that integrate tightly with their existing toolchains and leverage their specific technology choices. Others might prefer a combination of open-source tools, commercial platforms, and custom scripts that can be adapted as their needs evolve. Still others might start with manual processes and simple spreadsheets before gradually automating the most valuable and frequently changing information.
The important thing is to start collecting and systematically organizing this information, even if your first version is pretty basic and doesn’t cover every possible data source or use case.
From Data Collection to Action
The real value of an AppGraph isn’t in the data collection itself—it’s in what you can do with that data once you have it organized and accessible. Think of the current effort as building a solid foundation for more advanced capabilities that will pay dividends over time.
Once you understand what’s actually contained within your applications and how all the pieces fit together across their entire lifecycle, you can start applying security scans, compliance controls, and operational policies based on the actual risk profile and business context of each application, rather than treating all applications as identical or making decisions based on incomplete information.
For example, an application that processes payment data, uses third-party AI services, has write access to customer databases, and integrates with external partner systems presents a fundamentally different risk profile than a simple internal dashboard that reads from a single database and displays metrics to internal users. With a comprehensive AppGraph, you can automatically identify these different risk profiles and apply appropriate controls, monitoring, and operational procedures without manual analysis or guesswork.
You can also start answering complex questions that are nearly impossible to address accurately today:
- Impact analysis: Which applications would be affected if we deprecated a particular library, service, or infrastructure component?
- Security assessment: What’s the potential blast radius if a specific CI/CD pipeline is compromised or a particular service account is breached?
- Change management: Which teams need to be involved if we want to change how a particular technology is used or if we need to implement new compliance requirements?
- Compliance tracking: How many applications are using AI models that might have specific regulatory requirements or privacy implications?
- Cost optimization: Which applications are using expensive infrastructure components or external services that might be candidates for optimization?
- Technical debt management: Where are we accumulating the most technical debt and what’s the business impact of addressing it?
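The impact-analysis and blast-radius questions above reduce to the same graph operation: walk reverse-dependency edges transitively from the component in question. Here is a minimal sketch over invented components; each edge reads (dependent, depends_on).

```python
from collections import deque

# Invented dependency edges: (dependent, depends_on)
edges = [
    ("payments-api",   "postgres-lib"),
    ("billing-worker", "postgres-lib"),
    ("checkout-ui",    "payments-api"),
    ("reporting",      "billing-worker"),
]

def impact_of(component: str, edges) -> set:
    """Everything that transitively depends on `component` (breadth-first)."""
    dependents = {}
    for src, dst in edges:
        dependents.setdefault(dst, []).append(src)

    affected, queue = set(), deque([component])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in affected:
                affected.add(dep)
                queue.append(dep)
    return affected

print(sorted(impact_of("postgres-lib", edges)))
# ['billing-worker', 'checkout-ui', 'payments-api', 'reporting']
```

Deprecating a library, breaching a service account, or changing a shared pipeline all start from exactly this traversal; what differs is only which edge types you follow and what you do with the result.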
Building Your Application Language
At the end of the day, each AppGraph should provide what we might call a standardized “language” framework that describes applications and all of their components in a consistent, comprehensive way. Traditional mapping in software architecture involves creating visual representations or diagrams that illustrate the relationships between various components, layers, and modules within a system, but AppGraphs extend this concept to include operational, organizational, and business context.
Think of it like creating a standardized vocabulary and grammar for describing applications that everyone in your organization can understand and use effectively. Whether you’re a developer trying to understand dependencies before making changes, a security team member assessing risk and implementing controls, a site reliability engineer troubleshooting a production issue, or an executive trying to understand what you’re actually building and running, the AppGraph provides a common language and shared understanding.
This language should be rich enough to capture the full complexity of modern applications—including their technical architecture, operational dependencies, business context, and organizational ownership—but simple enough that teams can actually maintain it without it becoming an overwhelming burden. It should grow and evolve organically with your applications, not become a static snapshot that’s out of date the moment it’s created.
The language should also be actionable, meaning it provides the information necessary to make informed decisions about changes, investments, and risk management rather than just serving as documentation for its own sake.
Getting Started: Think Small, Start Now
If this all sounds overwhelming, here’s the good news: you don’t have to build the complete AppGraph vision on day one. In fact, trying to do everything at once is usually a recipe for failure. Instead, start with one application, one team, or one data source that’s causing you pain right now.
Pick something specific that represents a real operational challenge—maybe it’s an application where you’re constantly surprised by what breaks when you make changes, or a team that spends too much time tracking down dependencies and understanding how their changes might impact other systems, or a compliance requirement that’s difficult to verify because you don’t have clear visibility into your application components.
Begin by manually collecting some of this information for that one focused use case. Document what repositories are involved, what the CI/CD pipeline actually looks like, what services and external dependencies it relies on, and where it runs. Start building that common language for describing the application and its ecosystem.
As you do this work, you’ll start to see patterns emerge naturally. You’ll identify which data sources are most valuable for your specific context, which information changes frequently and would benefit from automation, and what questions you’re trying to answer that you can’t effectively address today with your current tools and processes.
Then, gradually start automating the collection of the most valuable and frequently changing information. Connect to one or two key systems. Build some simple scripts, dashboards, or integration points. The goal isn’t to solve every possible problem, but to prove the value of the approach, build organizational momentum, and establish patterns that can be extended over time.
The Future of Application Understanding
We’re living through a period of unprecedented change in how software is built, deployed, and operated. The tools, patterns, and practices that worked effectively five years ago are being challenged and transformed by new approaches like AI-assisted development, infrastructure as code, cloud-native architectures, and increasingly sophisticated automation.
Large language models have become widely adopted across the software development lifecycle, and AI-related innovation is now focusing on specialized models, agentic AI systems, and intelligent automation that can understand and manipulate complex software systems. These trends are fundamentally changing what it means to “understand” an application and how we need to think about visibility, control, and governance.
In this rapidly evolving environment, having a clear, comprehensive, and current understanding of what you’re actually building and running isn’t just nice to have—it’s becoming essential for making informed decisions about security, compliance, performance, reliability, and business strategy.
The AppGraph concept provides a practical framework for organizing this understanding in a way that’s both comprehensive and actionable. It acknowledges the inherent complexity of modern software development while providing a clear path toward better visibility and control that can evolve with your organization’s needs.
Most importantly, it recognizes that the goal isn’t to create perfect documentation or comprehensive inventories for their own sake. The goal is to enable better decision-making by providing the context, relationships, and information that teams need to understand and effectively manage the systems they’re responsible for.
So start small, think big, and begin building your own Application Graph. Your future self, your security team, your compliance auditors, and your development teams will thank you for the investment in understanding what you’re actually building and running.