AI Labs · Systems Architecture

AI Labs Knowledge Base (RAG)

02 — Data flow

Ingest → Index → Answer

Every source normalizes content into an S3 object plus a .metadata.json sidecar. One Bedrock Knowledge Base indexes them; three surfaces retrieve with the same filter vocabulary.

Sources

Gmail: 3-day cron + webhook
Drive: file mirror + extract
Redmine: projects · issues
Slack / Lark: tracked channels
Fathom · Fireflies: meeting transcripts
HubSpot: 10-min cron

Store

S3 Bucket: object + .metadata.json
PostgreSQL: mirror + dedup rows

Index

Bedrock KB: S3 Vectors · embeddings + equals-filter metadata
Ingestion coalescer: 1 job at a time · debounce + 409 retry

Answer

Web Chat: stream + citations
Slack / Lark: classifier → agent
MCP Server: 4 tools · external clients
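The Index stage's ingestion coalescer ("1 job at a time · debounce + 409 retry") can be sketched as below. This is a minimal sketch, not the project's code: `IngestionCoalescer`, `startJob`, `isConflict` and the timing defaults are hypothetical. It assumes `startJob` wraps Bedrock's StartIngestionJob and that a "job already running" rejection surfaces as an error named `ConflictException` (HTTP 409).

```typescript
// Hypothetical coalescer: bursts of trigger() calls collapse into one job,
// only one job is in flight at a time, and 409 conflicts are retried.

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumption: the SDK error for "a job is already running" is named ConflictException.
const isConflict = (err: unknown): boolean =>
  err instanceof Error && err.name === "ConflictException";

class IngestionCoalescer {
  private timer: ReturnType<typeof setTimeout> | null = null;
  private running = false;
  private dirty = false; // a trigger arrived while a job was in flight
  private readonly startJob: () => Promise<void>;
  private readonly debounceMs: number;
  private readonly retryMs: number;
  private readonly maxRetries: number;

  constructor(startJob: () => Promise<void>, debounceMs = 30_000, retryMs = 60_000, maxRetries = 5) {
    this.startJob = startJob;
    this.debounceMs = debounceMs;
    this.retryMs = retryMs;
    this.maxRetries = maxRetries;
  }

  /** Called by every source after it writes new objects to S3. */
  trigger(): void {
    if (this.running) {
      this.dirty = true; // remember to run exactly once more afterwards
      return;
    }
    if (this.timer) clearTimeout(this.timer); // debounce: restart the quiet window
    this.timer = setTimeout(() => {
      this.timer = null;
      this.run().catch((err) => console.error("KB ingestion failed", err));
    }, this.debounceMs);
  }

  private async run(): Promise<void> {
    this.running = true;
    this.dirty = false;
    try {
      for (let attempt = 0; ; attempt++) {
        try {
          await this.startJob();
          return;
        } catch (err) {
          // Only a conflict is retried; any other error propagates.
          if (!isConflict(err) || attempt >= this.maxRetries) throw err;
          await sleep(this.retryMs);
        }
      }
    } finally {
      this.running = false;
      if (this.dirty) this.trigger(); // coalesced follow-up run
    }
  }
}
```

Five sources finishing within the same debounce window therefore cost one ingestion job, and a trigger that lands mid-job costs at most one more.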

03 — Core idea

Two data planes, kept in parallel

Every source writes both representations. The structured plane makes the next cron tick cheap and powers deterministic queries; the semantic plane powers free-text retrieval.

// semantic

S3 object + metadata sidecar

Indexed by the Bedrock Knowledge Base for free-text retrieval across chat, Slack/Lark and MCP. The sidecar carries the filterable metadata (type, source, month, year…).

// structured

Postgres mirror tables

Dedup ("have I seen this id?"), sync bookkeeping, and deterministic queries. A row's existence is what lets the next tick skip already-ingested ids — and what the Sales system reads.

04 — Anatomy

Every source module looks the same

controller.ts

HTTP routes — status, manual "sync now", account CRUD, webhook (signature-verified).

service.ts

Fetch + normalize logic for that source's external API.

ingest.ts

The run loop — fetch → filter → write S3 (+meta) → upsert Postgres → trigger KB.

cron.ts

Interval scheduler wired into main.ts; each run is wrapped by runSyncTask.
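The ingest.ts run loop (fetch → filter → write S3 (+meta) → upsert Postgres → trigger KB) can be sketched with the external systems hidden behind a small interface. Everything here (`SourceItem`, `Deps`, `runIngest`, the `<source>/<id>.md` key layout) is hypothetical; a real module would plug in its API client, an S3 client, Prisma and the ingestion coalescer.

```typescript
// Hypothetical run loop for one source module.

interface SourceItem {
  id: string; // the source's stable id (message id, issue id, ...)
  body: string; // normalized markdown that gets embedded
  metadata: Record<string, string | number | boolean>;
}

interface Deps {
  fetchSince(cursor: Date): Promise<SourceItem[]>; // service.ts
  seenIds(ids: string[]): Promise<Set<string>>; // lookup in the Postgres mirror
  putObject(key: string, body: string): Promise<void>; // S3 write
  upsertRow(item: SourceItem, s3Key: string): Promise<void>; // Postgres mirror row
  triggerKb(): void; // e.g. the ingestion coalescer
}

async function runIngest(deps: Deps, source: string, cursor: Date): Promise<number> {
  const items = await deps.fetchSince(cursor);
  // Structured plane answers "have I seen this id?"
  const seen = await deps.seenIds(items.map((i) => i.id));
  const fresh = items.filter((i) => !seen.has(i.id));

  for (const item of fresh) {
    const key = `${source}/${item.id}.md`;
    await deps.putObject(key, item.body);
    // Sidecar next to the object, in the metadataAttributes format the KB reads.
    await deps.putObject(`${key}.metadata.json`, JSON.stringify({ metadataAttributes: item.metadata }));
    // Written last: if S3 failed, no row exists and the next tick retries the id.
    await deps.upsertRow(item, key);
  }
  if (fresh.length > 0) deps.triggerKb();
  return fresh.length;
}
```

In this sketch cron.ts would call `runIngest` inside runSyncTask so each tick gets its bookkeeping row; writing the Postgres row after the S3 object is what makes a failed tick safe to repeat.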

05 — The contract

Metadata schema & S3-Vectors limits

The store supports only equals / notEquals — no ranges. That single constraint shapes the whole metadata design.

Source	Key filterable metadata	Note
shared	`month` `year` `updated_at`	Pre-computed at ingest — the equals-only workaround for date ranges.
meeting	`type` `title` `recording_id` `date` `is_external`	Fathom = `type:meeting`, Fireflies = `type:fireflies`.
slack	`source` `channel` `client_id` `is_private`	`client_id` enables strict per-customer scoping.
redmine	`source` `project_name` `ticket_id` `status`	Omits `month`/`year` to stay under the ~10-attribute cap.
email	`source` `account` `from_addr` `subject` `direction`	Attachments inherit the parent's metadata.

Hard cap: ~10 filterable attributes per chunk. A field is promoted to metadata only if it's filtered on; everything else stays inside the embedded body text. That's why duration_minutes lives in the markdown, not the sidecar.
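Both halves of the equals-only workaround fit in a few lines: pre-compute `month`/`year` into the sidecar at ingest, then expand a date range into one equals clause per month at query time. A minimal sketch under assumptions: `month` is stored as 1 to 12 alongside a four-digit `year`, `buildSidecar` and `monthRangeFilter` are hypothetical, and the local `RetrievalFilter` type only mirrors the `equals` / `andAll` / `orAll` shape of Bedrock's filter. Redmine rows, per the table, would skip the month/year fields.

```typescript
type AttrValue = string | number | boolean;

// Local mirror of the subset of Bedrock's RetrievalFilter used here.
type RetrievalFilter =
  | { equals: { key: string; value: AttrValue } }
  | { andAll: RetrievalFilter[] }
  | { orAll: RetrievalFilter[] };

const MAX_FILTERABLE = 10; // the ~10-attribute cap described above

/** Body of `<objectKey>.metadata.json` with date fields pre-computed. */
function buildSidecar(attrs: Record<string, AttrValue>, updatedAt: Date) {
  const metadataAttributes: Record<string, AttrValue> = {
    ...attrs,
    month: updatedAt.getUTCMonth() + 1, // assumed encoding: 1..12
    year: updatedAt.getUTCFullYear(),
    updated_at: updatedAt.toISOString(),
  };
  if (Object.keys(metadataAttributes).length > MAX_FILTERABLE) {
    throw new Error("too many filterable attributes; keep the rest in the body text");
  }
  return { metadataAttributes };
}

/** Expand an inclusive month range into OR-ed (year AND month) equals checks. */
function monthRangeFilter(from: Date, to: Date): RetrievalFilter {
  const clauses: RetrievalFilter[] = [];
  let y = from.getUTCFullYear();
  let m = from.getUTCMonth() + 1;
  const endY = to.getUTCFullYear();
  const endM = to.getUTCMonth() + 1;
  while (y < endY || (y === endY && m <= endM)) {
    clauses.push({
      andAll: [
        { equals: { key: "year", value: y } },
        { equals: { key: "month", value: m } },
      ],
    });
    m += 1;
    if (m > 12) {
      m = 1;
      y += 1;
    }
  }
  if (clauses.length === 0) throw new Error("empty date range");
  // orAll expects at least two members, so a single month is returned unwrapped.
  return clauses.length === 1 ? clauses[0] : { orAll: clauses };
}
```

For example, 15 Nov 2025 to 2 Jan 2026 becomes three clauses (Nov, Dec, Jan); a range spanning years grows the filter by one clause per month, which is the price of having no range operators.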

06 — Answer surfaces

Three ways in, one filter vocabulary

Web Chat

Two backends: bedrock (InvokeAgent: full orchestration + memory) and stream (Retrieve + ConverseStream, real token deltas). Citations are deduplicated into inline [1]-style references.

Slack / Lark

A cheap Converse call routes the question to one of the enabled agents; InvokeAgent then answers, scoped to the current channel. Each reply ends with an "answered by <byline>" suffix.

MCP Server

search_knowledge, get_customer_brief, get_meeting_summary, list_recent_docs — for external MCP clients (Claude Desktop, etc.).
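For the stream backend, collapsing retrieved chunks into inline [n] references is a first-appearance numbering keyed by source. A minimal sketch; the `Citation` shape (an S3 URI plus snippet) simplifies what Retrieve returns, and `numberCitations` / `renderSources` are hypothetical helpers.

```typescript
interface Citation {
  uri: string; // e.g. the S3 URI of the retrieved chunk's source object
  snippet: string;
}

/** One [n] per distinct source, numbered in order of first appearance. */
function numberCitations(chunks: Citation[]): { refs: string[]; indexOf: Map<string, number> } {
  const indexOf = new Map<string, number>();
  const refs: string[] = [];
  for (const c of chunks) {
    if (!indexOf.has(c.uri)) {
      refs.push(c.uri);
      indexOf.set(c.uri, refs.length); // 1-based, matching "[1]"
    }
  }
  return { refs, indexOf };
}

/** Footer listing, one line per distinct source. */
function renderSources(refs: string[]): string {
  return refs.map((uri, i) => `[${i + 1}] ${uri}`).join("\n");
}
```

Several chunks from the same document then share one marker instead of producing [1], [2], [3] for the same file.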

KB filter presets (stored on slack_agents.kbFilter) → RetrievalFilter: slackScope · larkScope · meeting · fireflies · redmine · hubspot · email · drive
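A preset list stored on `slack_agents.kbFilter` could expand into one retrieval filter roughly as follows. The preset names come from the line above; the attribute values they map to are assumptions: `type` for Fathom and Fireflies and `source` for the rest, following the metadata table, with `source: "lark"`, `"hubspot"` and `"drive"` and a Lark `channel` key guessed by analogy. `presetFilter` is hypothetical, and the local type only mirrors Bedrock's filter shape.

```typescript
type AttrValue = string | number | boolean;
type RetrievalFilter =
  | { equals: { key: string; value: AttrValue } }
  | { andAll: RetrievalFilter[] }
  | { orAll: RetrievalFilter[] };

type Preset =
  | "slackScope" | "larkScope" | "meeting" | "fireflies"
  | "redmine" | "hubspot" | "email" | "drive";

const eq = (key: string, value: AttrValue): RetrievalFilter => ({ equals: { key, value } });

/** Expand an agent's enabled presets into a single filter (hypothetical mapping). */
function presetFilter(presets: Preset[], channelId?: string): RetrievalFilter | undefined {
  const clauses: RetrievalFilter[] = [];
  for (const p of presets) {
    switch (p) {
      case "slackScope":
      case "larkScope": {
        const source = eq("source", p === "slackScope" ? "slack" : "lark");
        // Chat sources are narrowed to the channel the question came from.
        clauses.push(channelId ? { andAll: [source, eq("channel", channelId)] } : source);
        break;
      }
      case "meeting":
        clauses.push(eq("type", "meeting")); // Fathom
        break;
      case "fireflies":
        clauses.push(eq("type", "fireflies"));
        break;
      default:
        clauses.push(eq("source", p)); // redmine, hubspot, email, drive
    }
  }
  if (clauses.length === 0) return undefined; // no presets: unfiltered retrieval
  return clauses.length === 1 ? clauses[0] : { orAll: clauses };
}
```

Keeping the vocabulary in one function is what lets web chat, Slack/Lark and MCP share the same filters.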

Sales Intelligence Dashboard

02 — Data flow

Mirror → Aggregate → Reason → Deliver

Deterministic aggregation reads the mirror tables first; LLM agents only narrate, rank and classify on top — they never invent figures.

Mirror tables

hubspot_deals + hubspot_deal_snapshots
redmine_projects: links · parentId
meetings: attendees · domains
reps · settings: roster · quota targets

Deterministic

pipeline + analytics: read-time aggregation · no LLM
buildSalesFacts(): daily snapshot grounding

LLM agents

dealRisk: 5 risk types
upsell: launch-approaching
execAction: chief-of-staff rerank
narrative: best-effort summary

Deliver

/sales Report: 6 sections
Sales Chat: `surface: sales`
@SilkSales: mid-day push + DM

03 — Design stance

Deterministic first, LLM second

Numbers come from read-time SQL over mirror tables. LLMs narrate, rank and classify on top — pinned to candidate indices so they can never fabricate a deal or a figure.

// deterministic

pipeline + salesanalytics

open/won/lost, by-stage/owner/quarter, Deal Pulse (stale/overdue), rep health, weighted forecast vs target, win-loss, velocity, movement, quota — all read-time aggregation, owner-scoped.

// generative

salesreport/agents

dealRisk · upsell · execAction · narrative. Each run writes one sales_reports row, with deltas against the previous run. Best-effort: if the narrative fails, the run still ships a SUCCESS report.
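"Pinned to candidate indices" can be enforced mechanically: the deterministic layer builds the candidate list, the model may only answer with indices into it, and anything else is discarded. A minimal sketch of that validation for the execAction re-rank; the `Candidate` shape, the `{"order": [...]}` reply format and `applyRerank` are assumptions.

```typescript
interface Candidate {
  dealId: string;
  title: string;
  amount: number; // from SQL, never from the model
}

/**
 * Apply an LLM re-rank that can only reference candidates by index.
 * Out-of-range, duplicate or malformed indices are ignored, and the list is
 * topped up from the deterministic order, so a bad reply cannot add a deal.
 */
function applyRerank(candidates: Candidate[], llmReply: string, topN = 5): Candidate[] {
  let order: unknown;
  try {
    order = (JSON.parse(llmReply) as { order?: unknown }).order;
  } catch {
    order = undefined; // unparseable reply: fall back to deterministic order
  }
  const picked: Candidate[] = [];
  const used = new Set<number>();
  if (Array.isArray(order)) {
    for (const i of order) {
      if (Number.isInteger(i) && i >= 0 && i < candidates.length && !used.has(i)) {
        used.add(i);
        picked.push(candidates[i]);
      }
    }
  }
  for (let i = 0; i < candidates.length && picked.length < topN; i++) {
    if (!used.has(i)) {
      used.add(i);
      picked.push(candidates[i]);
    }
  }
  return picked.slice(0, topN);
}
```

Because every returned object comes from `candidates`, figures such as `amount` can only originate from the SQL layer.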

04 — The join

Customer linking (CRM ↔ delivery)

The match between a HubSpot company and a Redmine project is what makes "existing customer" and upsell possible. Matching runs automatically after every sync, with a human review gate for fuzzy matches.

autoMatch

Exact + suffix-stripped name match, then a hierarchy walk so a subproject inherits its parent's company. MatchType = exact · normalized · inherited · manual.

Durability convention

manual+company = pinned · manual+null = durable "not a customer" exclude · null = auto. Auto-match never overrides a manual decision.

Review gate

Fuzzy (normalized) matches surface in a "Needs review" filter; POST /customers/confirm pins a page of them to manual.
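The autoMatch rules and the durability convention fit in one small function. A minimal sketch: the MatchType values and the manual/null semantics come from the text above, while "exact" as trimmed case-insensitive equality, the legal-suffix list, the depth guard and all names (`autoMatch`, `normalize`, `Link`) are assumptions.

```typescript
type MatchType = "exact" | "normalized" | "inherited" | "manual";

interface Company { id: string; name: string }
interface Project { id: string; name: string; parentId: string | null }
// manual + companyId = pinned; manual + null = durable "not a customer";
// matchType null = auto-managed.
interface Link { companyId: string | null; matchType: MatchType | null }

// Assumed legal suffixes stripped for the "normalized" comparison.
const SUFFIX_RE = /\s+(inc|llc|ltd|limited|corp|corporation|co|gmbh|plc)$/i;
const clean = (s: string): string => s.trim().toLowerCase();

function normalize(name: string): string {
  let s = clean(name).replace(/[.,]/g, " ").replace(/\s+/g, " ").trim();
  for (;;) {
    const next = s.replace(SUFFIX_RE, "").trim(); // "acme co ltd" → "acme co" → "acme"
    if (next === s) return s;
    s = next;
  }
}

function autoMatch(projects: Project[], companies: Company[], existing: Map<string, Link>): Map<string, Link> {
  const byExact = new Map(companies.map((c): [string, string] => [clean(c.name), c.id]));
  const byNorm = new Map(companies.map((c): [string, string] => [normalize(c.name), c.id]));
  const byId = new Map(projects.map((p): [string, Project] => [p.id, p]));

  const direct = (p: Project): Link | null => {
    const prev = existing.get(p.id);
    if (prev?.matchType === "manual") return prev; // never override a human decision
    const exact = byExact.get(clean(p.name));
    if (exact) return { companyId: exact, matchType: "exact" };
    const norm = byNorm.get(normalize(p.name));
    if (norm) return { companyId: norm, matchType: "normalized" }; // goes to "Needs review"
    return null;
  };

  const resolve = (p: Project, depth: number): Link => {
    const own = direct(p);
    if (own) return own;
    // Hierarchy walk: a subproject inherits its parent's company.
    const parent = p.parentId ? byId.get(p.parentId) : undefined;
    if (parent && depth < 20) {
      const up = resolve(parent, depth + 1);
      if (up.companyId) return { companyId: up.companyId, matchType: "inherited" };
    }
    return { companyId: null, matchType: null };
  };

  return new Map(projects.map((p): [string, Link] => [p.id, resolve(p, 0)]));
}
```

Confirming a page of "Needs review" rows would then just rewrite their `matchType` to `"manual"`, after which this function leaves them alone.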

05 — Daily Report

Six sections, read in five minutes

Executive Summary

3 headline numbers + AI daily briefing + weighted-Q / of-target stats.

Rep Activity

Per-rep health (Active / Quiet / Dark), tier, calls-this-week, last activity.

Top-5 Action List

Deterministic candidates → LLM chief-of-staff re-rank, never fabricated.

Deal Risk Radar

push_risk · competitive · stalled · quarter_slip · se_gap.

Deal Pulse

Filterable table: stale ≥30d, overdue, days-since-update, risk badge.

Pipeline vs Target & Upsell

Won/open vs quarter & year target; upsell incl. launch-approaching.
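Deal Pulse's columns are plain date arithmetic over the mirror rows. A minimal sketch, assuming Pulse covers open deals only and that "overdue" means an open deal whose close date has already passed; the `Deal` shape and `dealPulse` are hypothetical, and only the 30-day stale threshold comes from the text.

```typescript
interface Deal {
  id: string;
  stage: "open" | "won" | "lost";
  updatedAt: Date;
  closeDate: Date | null;
}

interface PulseRow {
  id: string;
  daysSinceUpdate: number;
  stale: boolean;
  overdue: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const STALE_DAYS = 30; // "stale ≥30d"

function dealPulse(deals: Deal[], now: Date): PulseRow[] {
  return deals
    .filter((d) => d.stage === "open")
    .map((d) => {
      const daysSinceUpdate = Math.floor((now.getTime() - d.updatedAt.getTime()) / DAY_MS);
      return {
        id: d.id,
        daysSinceUpdate,
        stale: daysSinceUpdate >= STALE_DAYS,
        overdue: d.closeDate !== null && d.closeDate.getTime() < now.getTime(),
      };
    })
    .sort((a, b) => b.daysSinceUpdate - a.daysSinceUpdate); // most neglected first
}
```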

06 — Proactive

@SilkSales — a separate bot

Its own Slack app (token + signing secret). In "sales mode" it answers everything via the sales agent + daily-snapshot grounding. A mid-day cron pushes critical signals.

quarter_slip: close date pushed
blocker: ≥ $100K stuck
critical_health: ≥ $45K call-flagged
exec_flag: newly in exec actions
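The mid-day push can be read as a per-deal filter over today's state versus yesterday's snapshot. A sketch under loudly stated assumptions: only the four signal names and the $100K / $45K thresholds come from the list above. "Pushed" is read as the close date moving into a later quarter. `stuck` and `callFlagged` are taken as precomputed booleans whose exact definitions the text does not give. The `DealState` fields and `midDaySignals` are hypothetical.

```typescript
type Signal = "quarter_slip" | "blocker" | "critical_health" | "exec_flag";

interface DealState {
  id: string;
  amount: number; // USD
  prevCloseDate: Date; // from yesterday's hubspot_deal_snapshots row
  closeDate: Date; // today
  stuck: boolean; // assumed precomputed upstream
  callFlagged: boolean; // assumed precomputed from meeting/call analysis
  inExecActionsToday: boolean;
  inExecActionsYesterday: boolean;
}

// Monotonic quarter number, so "later quarter" is a simple comparison.
const quarterOf = (d: Date): number => d.getUTCFullYear() * 4 + Math.floor(d.getUTCMonth() / 3);

function midDaySignals(d: DealState): Signal[] {
  const out: Signal[] = [];
  if (quarterOf(d.closeDate) > quarterOf(d.prevCloseDate)) out.push("quarter_slip");
  if (d.stuck && d.amount >= 100_000) out.push("blocker");
  if (d.callFlagged && d.amount >= 45_000) out.push("critical_health");
  if (d.inExecActionsToday && !d.inExecActionsYesterday) out.push("exec_flag");
  return out;
}
```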

HubSpot credential gotcha: needs a Private App access token (pat-…, scopes crm.objects.{deals,companies,contacts}.read). A bare Developer API Key / legacy hapikey does not work — until set, the whole intelligence layer runs on empty data.
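Because the wrong credential fails quietly (empty data rather than an error), a boot-time guard that rejects anything other than a Private App token is cheap insurance. A minimal sketch; the `pat-` prefix check follows the text above, while `assertHubSpotToken` and the choice of where the token is read from are assumptions.

```typescript
/**
 * Fail fast if the HubSpot credential is not a Private App access token.
 * A Developer API key or legacy hapikey would otherwise leave the
 * intelligence layer running on empty data.
 */
function assertHubSpotToken(token: string | undefined): string {
  const t = token?.trim() ?? "";
  if (t === "") {
    throw new Error("HubSpot token missing: set a Private App access token (pat-…)");
  }
  if (!t.startsWith("pat-")) {
    throw new Error(
      "HubSpot credential is not a Private App access token; expected a pat-… value " +
        "with crm.objects.{deals,companies,contacts}.read scopes",
    );
  }
  return t;
}
```

Called once at startup with whatever environment variable holds the token (its name is deployment-specific), this turns "everything is empty" into an immediate, explicit boot failure.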

Data model at a glance

RAG / ingest	Holds
email_messages	Gmail dedup + S3 key + direction
drive_files	Mirrored Drive files
redmine_*	projects · issues · members · users
slack_messages	Tracked-channel messages
lark_messages	Tracked-chat messages
fathom_meetings	Transcripts + attendees
fireflies_meetings	Transcripts + attendees
slack_agents	Personas + kbFilter presets
sync_tasks · items	Per-run bookkeeping
eval_*	Agent regression tests

Sales-specific	Holds
hubspot_companies	CRM companies
hubspot_contacts	CRM contacts + emails
hubspot_deals	Deals + ownerEmail
hubspot_deal_snapshots	One row per deal per day
sales_reports	Daily report payload + deltas
risk_flags	Deal-risk agent output
exec_actions	Ranked action list
upsell_cards	Upsell opportunities
reps	Active-AE roster
sales_users · settings	SSO portal + quota targets

Two systems, one knowledge · one pipeline

How the two systems relate

AI Labs Knowledge Base (RAG)

Sales Intelligence Dashboard

Shared tech stack

Elysia · Bun
Prisma · Postgres
Amazon Bedrock
React 19 · AntD