Build vs Buy · 2-minute read

The AI research system serious firms are trying to build.
Running today.

Across finance, serious research shops are quietly assembling the same four things. AI embedded into existing research workflows across listed equity, private firms, policy topics, and macro events. An evaluation framework with drift monitoring. A full audit trail with model and prompt versioning. A short pilot that produces measurable ROI before commitment. Semper Signum is that system, operational on day one.

See a real production report →
Request the 30-day pilot scope

What firms are trying to build. What we already ship.

Ten requirements we see on every serious institutional roadmap. Left column is the requirement in the buyer's own language. Right column is how Semper Signum delivers it today, on real tickers, with logged evaluations.

The requirement
"Embed AI into existing research workflows, not build standalone tools."
What Semper Signum does
JSON in, HTML + PDF out. Reports integrate into your existing PM workflow without new UI to adopt.
The requirement
"Build an evaluation framework with drift monitoring."
What Semper Signum does
Per-stage evaluators score every output before it flows downstream. Drift catalogued per ticker, per stage, per model.
See how the evaluator works →
The requirement
"Full audit trail for every agent decision. Model and prompt versioning. Rollback capability."
What Semper Signum does
Every model call is logged with prompt hash, parameters, timestamp. Every stage rollback is traceable. Compliance-ready.
See the audit schema →
The requirement
"Hallucination control and reliability at institutional scale."
What Semper Signum does
Step-level verification against source filings. Contradictions trigger rollback and re-run with different model or prompt before the user sees anything.
The requirement
"Governance and Responsible AI posture that compliance and risk can sign off on."
What Semper Signum does
Data lineage diagram, model-provider list, retention policy, deletion SLA, PII rules. Reads like a security whitepaper.
Full governance posture →
The requirement
"Human-in-the-loop gating for high-risk decisions."
What Semper Signum does
Configurable thresholds on any stage. When the evaluator score drops below threshold, the stage escalates to a reviewer before publication.
The requirement
"4-8 week pilot that produces measurable ROI before commitment."
What Semper Signum does
30-day flat-fee pilot. Five production reports, evaluation framework tuned to your workflow, audit-trail handoff. Clean exit at day 30.
Pilot scope →
The requirement
"Adoption metrics, not deployment alone. Are analysts using it in their workflow?"
What Semper Signum does
Every report generation, every view, every export is logged. Pilot handoff includes an adoption dashboard tied to your named users.
The requirement
"Cost and latency tradeoffs explicit in the architecture."
What Semper Signum does
Failover model router: Bailian Qwen for cost, Claude Opus for quality, Claude Sonnet for latency. Customer-configurable per stage.
The requirement
"Integration with existing systems: Aladdin, Bloomberg, FactSet, S3, internal data lake."
What Semper Signum does
Standard REST / S3 / CSV inputs. Outputs as static HTML, PDF, or JSON. Your data stays in your stack; we add the analytical layer.
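
Several of the items above (failover routing, per-call audit logging, and threshold gating) can be pictured together in one small sketch. This is a minimal illustration under stated assumptions, not Semper Signum's implementation: the model labels, the 0.8 threshold, the `run_stage` and `log_call` names, and the in-memory `audit_log` are all hypothetical, and `call_model` and `evaluate` stand in for whatever model client and evaluator a real deployment would use.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical failover order; labels are illustrative, not real API identifiers.
FAILOVER_QUEUE = ["qwen", "claude-opus", "claude-sonnet"]
REVIEW_THRESHOLD = 0.8  # assumed gating threshold, configurable per stage

audit_log = []  # in-memory stand-in for a durable audit store


def log_call(stage, model, prompt, params, score):
    """Append one audit record: prompt hash, parameters, timestamp, score."""
    record = {
        "stage": stage,
        "model": model,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "params": params,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "score": score,
    }
    audit_log.append(record)
    return record


def run_stage(stage, prompt, call_model, evaluate, params=None):
    """Try each model in order; accept the first output that clears the threshold.

    call_model(model, prompt, params) -> str and evaluate(output) -> float
    are caller-supplied placeholders for the real client and evaluator.
    """
    params = params or {"temperature": 0.0}
    for model in FAILOVER_QUEUE:
        output = call_model(model, prompt, params)
        score = evaluate(output)
        log_call(stage, model, prompt, params, score)
        if score >= REVIEW_THRESHOLD:
            return {"status": "published", "model": model, "output": output}
    # Every model scored below threshold: escalate to a human reviewer.
    return {"status": "needs_review", "model": None, "output": None}
```

The point of the design is that every attempt is logged whether or not it passes, so a rollback to the next model leaves a trace, and nothing is published without either a passing score or a reviewer.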

The math on build.

Rough year-one numbers to ship an equivalent internal system. US market comp, current cloud and tooling costs. Your numbers will differ; the shape will not.

Build it yourself

Hire the team

Technical lead (fully loaded): $280-340k
Two senior engineers: $500k
Data / ML engineer: $200k
Cloud, model APIs, tooling: $80k
Time to first production artifact: 6-9 months
Risk of deprecation / team churn: High
Year 1 total: ~$1.1M
Buy Semper Signum

Run the pilot

30-day pilot (flat fee): $50k
Production reports delivered: 5+
Evaluation + audit layer: Included
Time to first report: 7 days
Internal team required: None
Exit after day 30: Clean
Pilot total: $50k
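
The year-one build total can be checked by summing the four cost line items in the build column. The short sketch below does only that arithmetic; the variable names are ours, and the figures are the ones listed above.

```python
# Cost line items from the build column, in thousands of USD (low, high).
build_items_k = {
    "Technical lead (fully loaded)": (280, 340),
    "Two senior engineers": (500, 500),
    "Data / ML engineer": (200, 200),
    "Cloud, model APIs, tooling": (80, 80),
}

low_k = sum(low for low, _ in build_items_k.values())
high_k = sum(high for _, high in build_items_k.values())

print(f"Year 1 build: ${low_k:,}k to ${high_k:,}k")
```

The range comes to $1,060k to $1,120k, which is where the rounded ~$1.1M figure comes from.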

You can still build internally. A 30-day pilot gives your team a running start with a production system, an evaluator framework, and an audit trail instead of a blank repo.

What production looks like.

Not a demo environment. Not a slide. Actual output shipped on real names, with the audit log compliance expects from a human analyst's workpapers.

Executive Summary
Variant Perception & Thesis
Valuation: DCF + Comps + Scenarios
Risk Framework & Kill Criteria
Competitive Positioning
Adversarial Challenge

The report

Twenty-two structured sections on any subject: public company, private firm, policy topic, or macro event. Same depth, applied consistently. Thesis, valuation through three independent methods, competitive position, risk framework, management assessment. The same analysis a senior analyst would produce in two to four weeks, delivered in hours.

JPM · MC fair_value = -$47 · silent error
eval: ocf_classification · score 0.22 · FAIL
cause: bank OCF structurally negative → FCF margin -81%
action: rollback stage · apply NI-margin proxy
eval: ocf_classification · score 0.94 · PASS
JPM · MC fair_value = $212 · P5 $133 · P95 $352

A real catch, on JPM

Banks carry structurally negative operating cash flow, which silently drove the Monte Carlo valuation to a nonsensical fair value of -$47. The per-stage evaluator flagged the error, rolled the stage back, applied a net-income-margin proxy in place of the FCF margin, and published $212 (P5 $133, P95 $352). Every model call, score, and rollback is logged. See the full JPM report →
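
As a rough sketch of the kind of rule behind this catch, the function below swaps a negative FCF margin for a net-income margin when the subject is a bank. It is an illustration only: the `select_margin` name and the bank flag are assumptions, the -0.81 FCF margin is taken from the log above, and the 0.30 net-income margin in the usage example is a placeholder, not a JPM figure.

```python
def select_margin(is_bank: bool, fcf_margin: float, ni_margin: float):
    """Return (margin, basis) to feed the valuation stage.

    For banks, operating cash flow is structurally negative, so a negative
    FCF margin is treated as a classification problem, not a signal, and the
    net-income margin is used as a proxy instead.
    """
    if is_bank and fcf_margin < 0:
        return ni_margin, "net_income_margin"
    return fcf_margin, "fcf_margin"


# Usage: -0.81 is the FCF margin from the log; 0.30 is a placeholder value.
margin, basis = select_margin(is_bank=True, fcf_margin=-0.81, ni_margin=0.30)
print(margin, basis)
```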

NVDA: 47 reports · drift 0.03 · stable
AAPL: 52 reports · drift 0.04 · stable
JPM: 38 reports · drift 0.12 · monitored
BABA: 19 reports · drift 0.28 · review
MSFT: 44 reports · drift 0.02 · stable
total: 90 tickers · avg drift 0.06

Drift monitoring

Across a production book, every ticker carries a drift score. When drift crosses threshold, the ticker enters review and the evaluator is retuned against fresh filings. Nothing silently degrades. Adoption dashboard and evaluator coverage tie directly to your named users.
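
The status labels in the sample above can be read as simple threshold bands on the drift score. The sketch below assumes cutoffs of 0.10 and 0.25, chosen only because they are consistent with the five sample rows; the actual thresholds are not stated here, and in practice they would be configurable.

```python
# Assumed cutoffs, consistent with the sample rows; real thresholds may differ.
MONITOR_AT = 0.10
REVIEW_AT = 0.25


def drift_status(drift: float) -> str:
    """Map a drift score to a status band."""
    if drift >= REVIEW_AT:
        return "review"
    if drift >= MONITOR_AT:
        return "monitored"
    return "stable"


# Drift scores from the sample log above.
sample = {"NVDA": 0.03, "AAPL": 0.04, "JPM": 0.12, "BABA": 0.28, "MSFT": 0.02}
statuses = {ticker: drift_status(d) for ticker, d in sample.items()}
needs_review = [t for t, s in statuses.items() if s == "review"]
print(statuses)
print(needs_review)
```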

[Screenshot: Semper Signum operator view, a grid of running and completed pipeline cards, each showing ticker, stage, agent status, model routing, and evaluator scores.]

The operator view: every pipeline, every agent, every model call, with evaluator scores and rollback events surfaced in real time. This is the live surface the 30-day pilot hands you on day one.

The 30-day pilot.

One flat fee. Five reports. A tuned evaluation framework. A clean exit. No procurement drama.

What you get

  • Five production-quality Deep Dive reports on tickers you choose
  • Evaluation framework tuned to your workflow and risk tolerance
  • Audit-trail schema deployed in your VPC or ours
  • Drift-monitoring dashboard with your named users
  • Handoff runbook for your internal team

The ask

$50k flat, 30 days
  • Week 1: discovery + integration
  • Week 2: first production reports
  • Week 3: evaluation calibration
  • Week 4: handoff + training
Full week-by-week scope →

What IT, compliance, and legal will ask.

Eight questions we get in every institutional procurement cycle. If your reviewers have more, the governance page has the long form.

Do you work on-prem?
Hybrid deployment. Orchestration can run in your VPC or in ours. Models are called via your own API keys (Bailian, Anthropic, OpenAI, any combination). Output artifacts land wherever you route them: S3, SharePoint, internal wiki, your PM desk.
Which models do you use, and can we swap them?
Default failover queue: Bailian Qwen for cost, Claude Opus for quality, Claude Sonnet for latency. The router is config-driven. You can pin a single model, swap in your preferred provider, or route different stages to different models based on your cost and latency tradeoffs.
What is the data retention policy?
Configurable. Default: 90 days on logs, 0 days on source data after report generation. On cancellation, full export within 7 business days, full deletion of logs and artifacts within 30 days. Exit is clean and documented.
How do you handle PII?
We do not accept it. Semper Signum researches companies, public and private, using publicly available and licensed data only. We never ingest client-identifying information, personal holdings, or portfolio data. If your workflow requires processing PII, we are not the right vendor. This is a hard constraint, not a policy.
Can compliance audit your evaluation framework?
Yes. Every scoring rule is documented. Thresholds are configurable per stage, per ticker, per customer. You can add your own evaluators and gating thresholds. The evaluator schema is handed over during pilot handoff and re-auditable in production.
What happens to our data if we cancel?
Full export in a standard format within 7 business days. Full deletion of logs, artifacts, and cached model output within 30 days. Written confirmation of deletion provided. No retention beyond that.
Is this reseller-friendly?
Yes for hedge funds, PE firms, asset managers, RIAs, family offices, boutique investment banks, and wealth platforms whose clients are the end readers. Not for direct competitors in the research-platform space. License terms in the pilot contract are explicit on this.
We already have an internal AI roadmap. Why Semper Signum?
The pilot is how your internal effort accelerates. Instead of spending 6-9 months on infrastructure before the first production artifact, your team starts from a working system with an evaluator framework and an audit trail already in place. They spend year one extending, not building from zero.
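
For reviewers who want the cancellation timeline from the retention answers above as concrete dates, the sketch below computes both deadlines from a cancellation date. It assumes "7 business days" means weekdays only (public holidays are ignored) and that "30 days" means calendar days; the function names are illustrative.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Advance by the given number of weekdays (holidays are not considered)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def cancellation_deadlines(cancelled_on: date) -> dict:
    """Export due within 7 business days; deletion within 30 calendar days."""
    return {
        "export_due": add_business_days(cancelled_on, 7),
        "deletion_due": cancelled_on + timedelta(days=30),
    }


deadlines = cancellation_deadlines(date(2024, 1, 1))
print(deadlines)
```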

Send me the pilot scope PDF.

A one-page PDF summarizing the 30-day pilot, who it is for, and what changes after day 30. Goes out the same day.

Or skip straight to a 20-minute call: calendly.com/sempersignum/intro