Apr 5, 2026

The Data Foundation That Makes AI Work

Institutional allocators run lean teams. And a high share of their capacity gets spent on data gathering, document reconciliation, formatting, and manual upkeep of systems.

AI’s real promise in this context is not doing their job. It is removing the operational drag that prevents teams from doing the job at the level they are capable of. But there is a condition underneath that promise. The domains where AI is freeing up meaningful capacity share a common trait: the data was already structured.

Consider credit underwriting. A loan application arrives with a FICO score, standardized income documentation, and a structured repayment history. The taxonomy is defined by regulation. An AI system operating in that environment can assess risk, flag anomalies, and generate a recommendation without anyone preparing the data first. The infrastructure was built over decades, and AI inherits it.

That is the part most people gloss over. The conversation tends to focus on models, on capabilities, on what AI can do. The harder question is what AI needs before it can do anything useful. And the answer, almost always, is a data foundation.

An Institutional Investing Practice

A mid-sized endowment might hold commitments to 60 to 80 managers across private equity, venture, real assets, hedge funds, and credit. Each manager reports on their own schedule, in their own format, using their own terminology. A capital call notice from one GP looks nothing like one from another. Quarterly letters range from designed PDFs with embedded charts to plain-text emails with Excel attachments. There is no shared schema, no standard taxonomy, no common data model across the industry.

The raw material for AI-driven outcomes exists in abundance, but almost none of it is structured. Fund names appear in dozens of variations. Cash flows live in PDFs. The institutional knowledge that matters most, the relationships between managers and funds, funds and commitments, commitments and cash flows, exists in people’s heads and in ad hoc systems that AI tools do not query.

An LLM can summarize a document you give it. It cannot reason across your portfolio if the portfolio is not represented as data. The intelligence layer needs a structured data layer underneath it.

Governance Infrastructure

Before AI can deliver outcomes in any domain, someone has to do the work of cleaning, normalizing, and connecting the data. In allocator workflows, that means ingesting documents across formats, resolving entities to a canonical schema, maintaining relationships as they evolve, and keeping the whole system current as new information arrives. It is unglamorous work, and it is the single biggest bottleneck in effective AI adoption.
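The entity-resolution step can be sketched roughly as follows. This is a minimal illustration, not Finpilot's implementation: the registry shape, the alias lists, and the matching threshold are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical canonical registry: each fund entity keeps a canonical
# name plus the alias variants seen in documents so far.
CANONICAL_FUNDS = {
    "greenfield-iv": {
        "name": "Greenfield Capital Partners Fund IV",
        "aliases": {"Greenfield IV", "GCP Fund IV, L.P.", "Greenfield Cap. Ptnrs IV"},
    },
}

def normalize(raw: str) -> str:
    """Lowercase, strip punctuation and common legal suffixes for comparison."""
    s = re.sub(r"[^\w\s]", " ", raw.lower())
    s = re.sub(r"\b(lp|l p|llc|fund|the)\b", " ", s)
    return re.sub(r"\s+", " ", s).strip()

def resolve_entity(raw_name: str, threshold: float = 0.75):
    """Map a raw fund name from a document to a canonical entity id.

    Returns (entity_id, score), or (None, best_score) when no candidate
    clears the threshold -- those would go to a human review queue.
    """
    best_id, best_score = None, 0.0
    for entity_id, record in CANONICAL_FUNDS.items():
        for candidate in {record["name"], *record["aliases"]}:
            score = SequenceMatcher(None, normalize(raw_name), normalize(candidate)).ratio()
            if score > best_score:
                best_id, best_score = entity_id, score
    return (best_id, best_score) if best_score >= threshold else (None, best_score)
```

A production pipeline would use a stronger matching model and route low-confidence matches to human review; the point is simply that every downstream query depends on dozens of name variations resolving to one canonical entity.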

What makes this particularly hard is that the data is not static. New documents arrive weekly. Terms get amended. Personnel change. A fund that was in its investment period last quarter may now be in harvest mode. The knowledge layer cannot be built once and left alone. It has to be maintained as a living system, continuously updated as the portfolio evolves. 

Meetings as the Missing Input

There is one source of data that is particularly rich but underutilized: meetings. An allocator team might take 300 to 400 meetings a year with GPs, consultants, co-investors, and internal stakeholders. In those conversations, information surfaces that does not appear in any document. A GP mentions a key hire joining next quarter. A fundraising timeline shifts. A co-investment window opens for two weeks. A portfolio company is underperforming in ways that the quarterly letter has not yet reflected.

This is high-signal information, and most of it never gets captured in a usable form. It lives in handwritten notes, gets partially typed up after the fact, and only occasionally makes it into the CRM. As a result, a GP’s comment about deployment pace stays in a notebook. It does not update the fund record. It does not inform the next IC memo. It does not trigger a follow-up.

There is a second, less obvious gap. A significant share of an allocator’s meetings are with prospective managers: new funds being evaluated, emerging managers taking first meetings, GPs re-entering the market with a new strategy. These conversations are often the most information-dense. Yet while most teams closely track developments for existing managers in a CRM or RMS, few do the same for prospective ones. A first meeting with a GP might cover fund strategy, team composition, target returns, deployment timeline, terms, and competitive dynamics, all in 45 minutes. That information often lives in an email to the team or a set of notes on someone’s desktop.

Over time, this creates a blind spot. When a fund comes back to market 18 months later and the team tries to reconstruct what they learned in the first meeting, the information is fragmented or gone. The institutional memory that should inform the next decision was never captured in a way the organization can use.

When meeting output feeds directly into a structured data foundation, both of these gaps close. Every conversation, whether with an existing GP or a prospective one, becomes an input to the knowledge graph. A first meeting with a new manager creates a structured record: entity resolved, strategy tagged, key contacts linked, notes connected to the right fund and vintage. A follow-up call six months later builds on that record instead of starting from scratch. Qualitative signals accumulate alongside quantitative data. The system’s understanding of each manager relationship deepens not just when a new document arrives, but every time someone on the team has a conversation.

Over the course of a year, hundreds of meetings worth of context transform from scattered notes into connected, queryable institutional memory. The pipeline becomes as well-documented as the portfolio. And the AI operating on that foundation can draw on the full breadth of what the organization knows, not just what someone remembered to enter into a system.

The Flywheel

This is where the compounding happens. Structured data makes AI more effective. More effective AI generates more structured data. Each meeting, each document, each interaction adds to the foundation. The queries get sharper. The outputs get more reliable. The institutional memory gets deeper.

The allocators who recognize this will compete on the quality of their data infrastructure. A three-person investment team with a well-maintained knowledge layer and AI tools that operate on it will evaluate more managers, prepare for IC meetings faster, and act on time-sensitive opportunities sooner than a ten-person team still reconciling spreadsheets. It is not about AI replacing people. It is about AI multiplying the impact of the people who have given it the right foundation to work from.

This is what we are building at Finpilot. The platform starts with the data layer: ingesting and normalizing the full document universe of an allocator’s portfolio into a structured knowledge graph. Every product we build, from research tools to reporting to meeting intelligence, operates on that same foundation. The infrastructure comes first. The outcomes follow.

If you are keen to improve how your investment office operates, we would love to chat. Book a 15-minute demo with Finpilot today.

Ready to supercharge your team with AI?

See how Finpilot works with your data and turns a lean team into an analytical powerhouse

Subscribe to our newsletter

© 2025 Finpilot. All rights reserved.
