Alternative Data for Portfolio Managers: A Buyer's Guide

How to evaluate, select, and deploy alt data in 2026 — including the signal decay problem everyone knows about and few actually solve for.

The alternative data market reached $11.65 billion in 2025. Estimates project $135 billion by 2030. The growth reflects genuine demand: institutional investors have made alt data a standard part of the research toolkit, and data vendors have proliferated to serve them.

The result is a buyer's market for data. There are hundreds of vendors, dozens of data types, and enough overlapping claims that evaluating any single dataset has become its own analytical exercise. This guide covers what to evaluate, how to evaluate it, and the traps that even sophisticated buyers fall into.

What Alt Data Actually Is (and What It Isn't)

Alternative data is any information that does not come from company financial disclosures, sell-side research, or traditional financial data providers (Bloomberg, Refinitiv, FactSet). The defining characteristic is that it derives from economic activity in the real world rather than from management-curated communications.

The major categories:

| Data Type | What It Measures | Primary Use Case | Typical Cost / Year |
| --- | --- | --- | --- |
| Credit & debit card transactions | Consumer spending by merchant | Retail, restaurants, travel | $150K–$500K |
| Web-scraped pricing & inventory | Real-time price changes, stock levels | E-commerce, consumer goods | $50K–$250K |
| App downloads & engagement | App installs, DAUs, session metrics | Tech, fintech, consumer apps | $75K–$300K |
| Satellite & geospatial imagery | Foot traffic, inventory, construction | Retail, energy, industrials | $100K–$400K |
| Job postings | Hiring by function, location, seniority | Leading indicator for revenue/capex | $50K–$150K |
| Web traffic | Site visits, search share, engagement | Tech, media, e-commerce | $30K–$120K |
| Shipping & logistics data | Container volumes, port throughput | Industrials, consumer, macro | $80K–$200K |
| Social media sentiment | Brand mentions, sentiment shifts | Consumer, pharma, crypto | $40K–$150K |
| Patent filings | R&D direction, competitive intent | Tech, biotech, industrials | $20K–$80K |
| Clinical trial data | Trial enrollment, outcomes, pipeline | Biotech, pharma | $50K–$200K |

Alt data is not a single thing. Credit card data and satellite imagery have almost nothing in common except that both are non-traditional. Buying "alt data" without specifying the investment question is like buying "research" without specifying the company. The data type must match the analytical question.

The Signal Decay Problem

Every alt data buyer eventually faces this: the dataset they're evaluating has a strong backtest, impressive predictive accuracy, and a compelling pitch. And then they find out 200 other funds already subscribe to it.

Signal decay is the erosion of alpha as a dataset spreads through the market. The economics are straightforward: when a dataset is exclusive, the buyer prices the information before the market does. As ownership spreads, the market prices the information faster — eventually before the PM can trade on it. The alpha window compresses from weeks to days to hours.

The datasets with the highest claimed backtested alpha are typically the ones with the worst decay profile. Vendors sell based on backtests. The backtest period often predates the dataset's wide distribution. A backtest showing 400bps of annual alpha from 2018–2022 tells you almost nothing about alpha in 2026 if the dataset went from 50 buyers to 500 buyers in that period.

How to assess decay before buying:

  1. Ask the vendor directly: how many funds subscribe? Reputable vendors will tell you; if a vendor won't, assume the count is high. 50 subscribers is very different from 500.
  2. Segment the backtest by year. If alpha was 600bps in 2018–2020 and 80bps in 2022–2024, the decay is real and the current alpha is 80bps, not the 400bps average (see the sketch after this list).
  3. Assess how long the data takes to process. A weekly file that requires a data science team to clean has less competition than a real-time API that plugs into any quant system in 30 minutes. Infrastructure friction is a moat against decay.
  4. Ask who else is using it. If your prime broker tells you 12 of your top 20 competitors subscribe to the same feed, the signal is crowded.
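
A minimal sketch of step 2, assuming you can extract the signal's daily benchmark-adjusted returns from the vendor's backtest (the function and the 252-day annualization are illustrative assumptions, not a vendor API):

```python
import pandas as pd

def alpha_by_year(daily_excess_returns: pd.Series) -> pd.Series:
    """Annualized alpha in bps, computed separately per calendar year.

    Assumes a DatetimeIndex of trading dates and daily benchmark-adjusted
    returns taken from the vendor's backtest.
    """
    def annualize(r: pd.Series) -> float:
        # Geometric annualization over ~252 trading days per year
        return (1 + r).prod() ** (252 / len(r)) - 1

    return daily_excess_returns.groupby(
        daily_excess_returns.index.year
    ).apply(annualize) * 1e4

# If this prints ~600 for 2018-2020 and ~80 for 2022-2024, underwrite 80bps.
```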

The Vendor Evaluation Framework

Beyond signal decay, there are seven dimensions to evaluate before committing to any alternative data contract:

1. Data Quality and Coverage

How complete is the dataset? What is the coverage rate — percentage of transactions, locations, or entities captured relative to the true universe? A credit card dataset that captures 8% of US transactions behaves very differently from one that captures 40%. A web-scraping dataset that covers 300 SKUs across 5 retailers is not comparable to one covering 50,000 SKUs across 200 retailers.

Ask for documented coverage statistics. If the vendor can't provide them, the dataset is not production-ready for institutional use.
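
One rough sanity check on claimed coverage, sketched under the assumption that you hold a quarterly panel-spend series and the company's reported revenue for the same periods (both inputs hypothetical):

```python
import pandas as pd

def panel_coverage(panel_spend: pd.Series, reported_revenue: pd.Series) -> pd.Series:
    """Quarterly ratio of panel-observed dollars to reported revenue.

    The level approximates coverage; the drift reveals panel churn. A ratio
    sliding from 12% to 7% over two years will corrupt every year-over-year
    comparison built on the raw panel until it is rescaled.
    """
    aligned = pd.concat(
        [panel_spend.rename("panel"), reported_revenue.rename("actual")],
        axis=1,
    ).dropna()
    return aligned["panel"] / aligned["actual"]
```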

2. Latency and Frequency

How current is the data? A weekly file with a 5-day lag has a fundamentally different use case than a daily feed with same-day availability. For event-driven strategies around earnings, latency is critical. For longer-duration fundamental strategies, weekly data may be sufficient.

Understand the data's natural reporting cadence as well. App download data updates daily. Satellite crop imagery is constrained by cloud cover and orbital frequency. Match the data's natural latency to your holding period.

3. Historical Depth

How far back does the dataset go? Backtesting requires at least 3–5 years of history across multiple market regimes. A dataset with only 18 months of history cannot be meaningfully backtested. Vendors sometimes launch with thin history and backfill as they grow — verify whether the historical data was collected contemporaneously or reconstructed.
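
A minimal backfill check, assuming you can obtain two delivery vintages of the same series (say, the January and July files) aligned on date:

```python
import pandas as pd

def restated_share(older_vintage: pd.Series, newer_vintage: pd.Series,
                   tol: float = 1e-9) -> float:
    """Fraction of overlapping historical observations whose values differ
    between two deliveries of the same dataset.

    A material share means history is restated or backfilled, and a backtest
    run on the latest vintage uses data that did not exist on the dates it
    claims to represent.
    """
    common = older_vintage.index.intersection(newer_vintage.index)
    diffs = (older_vintage.loc[common] - newer_vintage.loc[common]).abs()
    return float((diffs > tol).mean())
```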

4. Legal and Compliance Risk

This is the underappreciated dimension. Data derived from web scraping, third-party tracking pixels, or app telemetry carries legal risk that has grown significantly since GDPR and CCPA. SEC enforcement actions against alternative data providers and their buyers are no longer theoretical.

Questions to ask every vendor: How is the underlying data collected? Do data subjects consent? Has the vendor obtained a legal opinion? Does the contract include representations and warranties on data legality? Does it include indemnification?

If the vendor's legal documentation is thin, the compliance risk is yours — not theirs.

5. Infrastructure Requirements

What does it take to actually use the data? Raw feeds require data engineering to ingest, clean, and normalize. If your team doesn't have those capabilities, a raw data feed is not the right format. Many vendors offer processed outputs — aggregated metrics, pre-built signals — that reduce infrastructure requirements at the cost of flexibility.

Infrastructure friction is also a competitive moat. A dataset that requires three months to integrate has fewer active buyers than one that ships as a clean Excel file. More friction means less competition and slower decay.

6. Vendor Stability and Support

The alternative data vendor landscape is not stable. Startups enter with novel datasets, get acquired, pivot their product, or shut down. Before signing an annual contract, evaluate: How old is the company? Who are the backers? What is the renewal rate among institutional clients? Are there documented SLAs and data quality guarantees?

A dataset that disappears six months into a research process is worse than not having bought it.

7. Exclusivity and Access Tiers

Some vendors offer tiered access — a base tier available to all subscribers, and an exclusive tier available to one or a small number of funds. Exclusivity is expensive but directly addresses the signal decay problem. If the premium for exclusive access is $200K and the signal generates $1M in additional alpha, the math works. If the premium is $500K and the incremental alpha is unclear, it doesn't.
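
The break-even arithmetic as a sketch, treating the $1M figure from the example as 20bps of incremental alpha on a hypothetical $500M book:

```python
def exclusivity_value(premium: float, incremental_alpha_bps: float,
                      book_size: float) -> float:
    """Dollar value of exclusive access net of the exclusivity premium.
    Positive means the exclusive tier pays for itself."""
    return book_size * incremental_alpha_bps / 1e4 - premium

# $200K premium against $1M of incremental alpha: the math works
print(exclusivity_value(200_000, 20, 500_000_000))   # 800000.0
# $500K premium against unclear alpha, say 5bps on the same book: it doesn't
print(exclusivity_value(500_000, 5, 500_000_000))    # -250000.0
```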

Exclusive access arrangements also require careful contract drafting — define what exclusivity means (sector, geography, usage), what happens if the exclusivity is violated, and what remedies are available.

Build vs. Buy

Some funds build proprietary data collection rather than buying from vendors. This is expensive, slow, and — when it works — extremely valuable. A proprietary dataset by definition has no competitors. Signal decay is minimized until someone reverse-engineers or replicates the collection method.

The build vs. buy decision depends on three factors:

  1. Data uniqueness. If a vendor already sells a clean version of the data you could build, buying is almost always better. You pay for speed and quality. Build only when the data genuinely doesn't exist commercially.
  2. Technical capability. Building a production-grade data pipeline requires engineers, data scientists, legal review of data collection methods, and ongoing maintenance. Factor the full cost of the team, not just the infrastructure.
  3. Strategic value. A proprietary dataset that generates consistent alpha becomes a durable competitive advantage and contributes to AUM growth. If the alpha is large enough and durable enough, the NPV of building can exceed the cost (a back-of-envelope sketch follows this list).
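
A back-of-envelope NPV sketch for item 3; every input (alpha, run cost, build cost, horizon, discount rate) is an assumption the fund must supply, and the horizon should reflect how long the data plausibly stays proprietary:

```python
def build_npv(annual_alpha: float, annual_run_cost: float,
              build_cost: float, years: int,
              discount_rate: float = 0.10) -> float:
    """NPV of building a proprietary dataset: pay the build cost today,
    collect (alpha minus run cost) each year while the edge lasts."""
    pv = sum(
        (annual_alpha - annual_run_cost) / (1 + discount_rate) ** t
        for t in range(1, years + 1)
    )
    return pv - build_cost

# Illustrative only: $2M/yr alpha, $600K/yr run cost, $1.5M build, 4-year edge
print(build_npv(2_000_000, 600_000, 1_500_000, years=4))  # ~2.94M
```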

A practical heuristic: buy for datasets that are commoditizing (credit card data, web traffic, app downloads) and consider building for datasets where no commercial supplier has adequate coverage of your specific investment universe.

How Alt Data Integrates with Fundamental Research

Alt data is most valuable when it is integrated with — not substituted for — fundamental research. A credit card transaction signal that shows consumer spending at a retailer declining 8% year-over-year is interesting. A credit card signal showing that decline, combined with a fundamental model that identifies margin fragility at that revenue level, and a channel check confirming management is aware but has no ready response — that is an actionable thesis.

The failure mode is treating alt data as a standalone signal generator. Alt data is a source of inputs, not outputs. The PM still has to form a thesis, construct a scenario framework, and determine whether the alt data signal changes the probability distribution on the scenarios.
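
One way to make "changes the probability distribution on the scenarios" concrete is a plain Bayes update; the scenario labels and every number below are hypothetical:

```python
def update_scenarios(priors: dict[str, float],
                     likelihoods: dict[str, float]) -> dict[str, float]:
    """Posterior scenario probabilities after observing an alt data signal.

    `priors` encodes the PM's fundamental view; `likelihoods` is
    P(signal | scenario), which has to be estimated from the dataset's
    own history.
    """
    unnormalized = {s: priors[s] * likelihoods[s] for s in priors}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}

# An -8% card-spend print is far likelier under a "miss" scenario:
posterior = update_scenarios(
    {"beat": 0.30, "inline": 0.50, "miss": 0.20},
    {"beat": 0.05, "inline": 0.30, "miss": 0.70},
)
# posterior["miss"] ~ 0.46: the signal moved "miss" from 20% to ~46%
```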

Funds that integrate alt data best tend to have a clear protocol:

  1. The investment thesis is formulated first (fundamental view)
  2. The alt data question is defined based on the thesis (what signal would confirm or refute this view?)
  3. The relevant dataset is identified and monitored
  4. The signal is assessed against the fundamental model, not as a standalone trade

Alt data that arrives in search of a thesis is noise. Alt data that is sought to answer a specific question is signal.

Cost-Adjusted Alpha: The Only Number That Matters

The end-to-end evaluation of any alternative data investment comes down to one calculation: does the expected alpha generated by this dataset — net of its cost, infrastructure expenses, and time investment — exceed the alpha achievable from deploying those same resources elsewhere?

A $300K dataset generating 200bps of annual alpha on a $500M book is worth $10M per year. The ROI is obvious. A $300K dataset generating 40bps of alpha on a $200M book is worth $800K per year. The ROI is marginal, especially when infrastructure, staff time, and legal costs are included.

Run this calculation explicitly. The alt data market is full of signals that are statistically significant, academically interesting, and economically not worth buying at current pricing.
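
A minimal version of that calculation, using the two examples above (the infrastructure and staff figures are placeholders):

```python
def net_dataset_value(book_size: float, alpha_bps: float, data_cost: float,
                      infra_cost: float = 0.0, staff_cost: float = 0.0) -> float:
    """Annual dollar value of a dataset, net of all-in costs."""
    gross_alpha = book_size * alpha_bps / 1e4
    return gross_alpha - data_cost - infra_cost - staff_cost

print(net_dataset_value(500e6, 200, 300_000))   # 9700000.0: obvious yes
print(net_dataset_value(200e6, 40, 300_000,
                        infra_cost=150_000,
                        staff_cost=200_000))    # 150000.0: marginal at best
```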

Alternative data signals in every report. Semper Signum incorporates web traffic, job posting trends, app engagement, and supply chain signals into each covered name — as part of a 22-section analytical framework. Request a sample report →
