A retrieval-augmented Knowledge Base that ingests everything into Bedrock — and a Sales Intelligence Dashboard built on top of it. Architecture & data flows, visualized.
The RAG knowledge base is the shared foundation. The Sales Dashboard reuses its ingest pipeline and mirror tables, then layers deterministic analytics + LLM agents on top.
Normalizes 8 sources into S3 + sidecar metadata, indexes them in a Bedrock Knowledge Base, and answers questions through web chat, Slack/Lark bots, and an MCP server.
Fuses HubSpot / Fathom / Redmine / Gmail into a daily pipeline report, a grounded sales chat, and the proactive @SilkSales Slack bot. Deterministic first, LLM second.
Every source normalizes content into an S3 object plus a .metadata.json sidecar. One Bedrock Knowledge Base indexes them; three surfaces retrieve with the same filter vocabulary.
Every source writes both representations. The structured plane makes the next cron tick cheap and powers deterministic queries; the semantic plane powers free-text retrieval.
Indexed by the Bedrock Knowledge Base for free-text retrieval across chat, Slack/Lark and MCP. The sidecar carries the filterable metadata (type, source, month, year…).
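As a rough illustration of the object-plus-sidecar pairing, the sketch below builds the `.metadata.json` companion for one normalized document. The `metadataAttributes` wrapper is the sidecar shape Bedrock Knowledge Bases reads for S3 sources; the `NormalizedDoc` type, its field names, and the numeric `month`/`year` encoding are assumptions made for the example, not the project's actual code.

```typescript
type MetadataValue = string | number | boolean;

// Sidecar shape read by Bedrock Knowledge Bases for an S3 data source.
interface Sidecar {
  metadataAttributes: Record<string, MetadataValue>;
}

// Hypothetical normalized document; the field names are illustrative.
interface NormalizedDoc {
  key: string; // S3 key of the content object
  type: string;
  source: string;
  updatedAt: Date;
}

function buildSidecar(doc: NormalizedDoc): { key: string; json: string } {
  const sidecar: Sidecar = {
    metadataAttributes: {
      type: doc.type,
      source: doc.source,
      // Pre-computed at ingest so date scoping works with equality filters alone.
      month: doc.updatedAt.getUTCMonth() + 1,
      year: doc.updatedAt.getUTCFullYear(),
      updated_at: doc.updatedAt.toISOString(),
    },
  };
  // The sidecar sits next to the object it describes, with a fixed suffix.
  return { key: `${doc.key}.metadata.json`, json: JSON.stringify(sidecar) };
}
```

Anything not listed under `metadataAttributes` stays in the embedded body text, where it is searchable but not filterable.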
Dedup ("have I seen this id?"), sync bookkeeping, and deterministic queries. A row's existence is what lets the next tick skip already-ingested ids — and what the Sales system reads.
HTTP routes — status, manual "sync now", account CRUD, webhook (signature-verified).
Fetch + normalize logic for that source's external API.
The run loop — fetch → filter → write S3 (+meta) → upsert Postgres → trigger KB.
Interval scheduler wired into main.ts; runs wrapped by runSyncTask.
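The run loop can be sketched as one function over injected dependencies. Everything named here (`SyncDeps`, `runSync`, the key layout) is hypothetical and only mirrors the fetch → filter → write S3 (+meta) → upsert Postgres → trigger KB order described above.

```typescript
interface SourceItem {
  id: string;
  body: string;
}

// Hypothetical ports standing in for the source API, S3, Postgres and the KB.
interface SyncDeps {
  fetchItems(): Promise<SourceItem[]>;
  hasRow(id: string): Promise<boolean>; // dedup: "have I seen this id?"
  writeObject(key: string, body: string): Promise<void>;
  upsertRow(id: string, s3Key: string): Promise<void>;
  triggerKbIngestion(): Promise<void>;
}

async function runSync(source: string, deps: SyncDeps): Promise<number> {
  const fetched = await deps.fetchItems();
  let written = 0;
  for (const item of fetched) {
    // Filter: an existing mirror row means this id was already ingested.
    if (await deps.hasRow(item.id)) continue;
    const key = `${source}/${item.id}.md`;
    await deps.writeObject(key, item.body);
    await deps.writeObject(
      `${key}.metadata.json`,
      JSON.stringify({ metadataAttributes: { source } }),
    );
    // The row is written last, so a crash mid-item leads to a retry, not a skip.
    await deps.upsertRow(item.id, key);
    written += 1;
  }
  // Only ask the Knowledge Base to re-index when something actually changed.
  if (written > 0) await deps.triggerKbIngestion();
  return written;
}
```

Writing the Postgres row after the S3 objects is a design choice of this sketch: it keeps the "row exists" check trustworthy as a signal that the object is already in the bucket.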
The store supports only equals / notEquals — no ranges. That single constraint shapes the whole metadata design.
| Source | Key filterable metadata | Note |
|---|---|---|
| shared | month year updated_at | Pre-computed at ingest — the equals-only workaround for date ranges. |
| meeting | type title recording_id date is_external | Fathom = type:meeting, Fireflies = type:fireflies. |
| slack | source channel client_id is_private | client_id enables strict per-customer scoping. |
| redmine | source project_name ticket_id status | Omits month/year to stay under the ~10-attribute cap. |
| email | source account from_addr subject direction | Attachments inherit the parent's metadata. |
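To make the equals-only workaround concrete, the sketch below expands a month range into a disjunction of `year`/`month` equality clauses. The `equals`, `andAll` and `orAll` operators follow the Bedrock retrieval filter vocabulary; the `YearMonth` type and the numeric encoding of `month` and `year` are assumptions for illustration.

```typescript
type Equals = { equals: { key: string; value: string | number } };
type Filter = Equals | { andAll: Filter[] } | { orAll: Filter[] };

interface YearMonth {
  year: number;
  month: number; // 1-12
}

// Enumerate every calendar month from `from` to `to`, inclusive.
function monthsBetween(from: YearMonth, to: YearMonth): YearMonth[] {
  const out: YearMonth[] = [];
  let y = from.year;
  let m = from.month;
  while (y < to.year || (y === to.year && m <= to.month)) {
    out.push({ year: y, month: m });
    m += 1;
    if (m > 12) {
      m = 1;
      y += 1;
    }
  }
  return out;
}

function monthRangeFilter(from: YearMonth, to: YearMonth): Filter {
  const clauses: Filter[] = monthsBetween(from, to).map(({ year, month }) => ({
    andAll: [
      { equals: { key: "year", value: year } },
      { equals: { key: "month", value: month } },
    ],
  }));
  if (clauses.length === 0) throw new Error("empty month range");
  // Bedrock documents a two-entry minimum for orAll, so one month is returned bare.
  return clauses.length === 1 ? clauses[0] : { orAll: clauses };
}
```

The trade-off is granularity: with this encoding a range can only be expressed in whole months, so finer windows would need post-filtering on `updated_at` after retrieval.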
Hard cap: ~10 filterable attributes per chunk. A field is promoted to metadata only if it's filtered on; everything else stays inside the embedded body text. That's why duration_minutes lives in the markdown, not the sidecar.
Two backends: bedrock (InvokeAgent — full orchestration + memory) and stream (Retrieve + ConverseStream, real token deltas). Citations dedupe to inline [1] refs.
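Citation dedupe can be pictured as assigning one number per unique source, in first-seen order, and reusing that number wherever the source is cited again. The `Citation` shape and the function name below are hypothetical; the real response objects carry more fields.

```typescript
interface Citation {
  uri: string;
  title: string;
}

interface NumberedCitations {
  refs: Citation[]; // unique sources, in first-seen order
  markers: string[]; // one inline marker per input citation, e.g. "[1]"
}

function numberCitations(citations: Citation[]): NumberedCitations {
  const indexByUri = new Map<string, number>();
  const refs: Citation[] = [];
  const markers = citations.map((c) => {
    let n = indexByUri.get(c.uri);
    if (n === undefined) {
      // First time this source appears: give it the next reference number.
      refs.push(c);
      n = refs.length;
      indexByUri.set(c.uri, n);
    }
    return `[${n}]`;
  });
  return { refs, markers };
}
```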
A cheap Converse call routes the question to one of the enabled agents, then InvokeAgent answers scoped to the current channel. The reply ends with the suffix "— answered by <byline>".
search_knowledge, get_customer_brief, get_meeting_summary, list_recent_docs — for external MCP clients (Claude Desktop, etc.).
Deterministic aggregation reads the mirror tables first; LLM agents only narrate, rank and classify on top — they never invent figures.
Numbers come from read-time SQL over mirror tables. LLMs narrate, rank and classify on top — pinned to candidate indices so they can never fabricate a deal or a figure.
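One way to read "pinned to candidate indices" is that the model may only return positions in a list it was handed, and the server maps those positions back to real rows. The sketch below assumes that reading; `Candidate`, `ModelPick` and `resolvePicks` are illustrative names.

```typescript
interface Candidate {
  dealId: string;
  amount: number;
}

// The only structure the model is allowed to return: an index plus free text.
interface ModelPick {
  index: number;
  reason: string;
}

function resolvePicks(
  candidates: Candidate[],
  picks: ModelPick[],
): Array<Candidate & { reason: string }> {
  const seen = new Set<number>();
  const out: Array<Candidate & { reason: string }> = [];
  for (const p of picks) {
    const valid =
      Number.isInteger(p.index) && p.index >= 0 && p.index < candidates.length;
    // Out-of-range or repeated indices are dropped instead of trusted.
    if (!valid || seen.has(p.index)) continue;
    seen.add(p.index);
    // Figures come from the candidate row, never from the model output.
    out.push({ ...candidates[p.index], reason: p.reason });
  }
  return out;
}
```

Because amounts and ids are copied from the deterministic candidate list, the model can influence ordering and wording but not the figures themselves.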
open/won/lost, by-stage/owner/quarter, Deal Pulse (stale/overdue), rep health, weighted forecast vs target, win-loss, velocity, movement, quota — all read-time aggregation, owner-scoped.
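As an example of one such read-time aggregate, the sketch below computes a weighted forecast against a target, optionally scoped to one owner. It assumes each open deal carries an amount and a stage probability between 0 and 1; the source does not spell out the weighting, so treat the formula and field names as illustrative.

```typescript
interface OpenDeal {
  ownerEmail: string;
  amount: number;
  stageProbability: number; // assumed to be in [0, 1]
}

interface Forecast {
  weighted: number;
  ofTarget: number | null; // null when no target is configured
}

function weightedForecast(
  deals: OpenDeal[],
  target: number,
  ownerEmail?: string,
): Forecast {
  // Owner scoping: restrict to one rep's deals when an owner is given.
  const scoped = ownerEmail
    ? deals.filter((d) => d.ownerEmail === ownerEmail)
    : deals;
  const weighted = scoped.reduce(
    (sum, d) => sum + d.amount * d.stageProbability,
    0,
  );
  return { weighted, ofTarget: target > 0 ? weighted / target : null };
}
```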
dealRisk · upsell · execAction · narrative — one sales_reports row per run with deltas vs the last. Best-effort: a failed narrative still ships a SUCCESS report.
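The best-effort behaviour can be sketched as a report builder that computes the deterministic part first and treats the narrative as optional. The types, the `narrate` callback, and the choice to diff against zero when there is no previous run are all assumptions of this example.

```typescript
type Figures = Record<string, number>;

interface SalesReport {
  status: "SUCCESS";
  figures: Figures;
  deltas: Figures; // current minus previous, per figure
  narrative: string | null; // null when the narrative agent failed
}

function computeDeltas(current: Figures, previous: Figures | null): Figures {
  const deltas: Figures = {};
  for (const [key, value] of Object.entries(current)) {
    // Assumption: a figure missing from the previous run is diffed against 0.
    deltas[key] = value - (previous?.[key] ?? 0);
  }
  return deltas;
}

async function buildReport(
  current: Figures,
  previous: Figures | null,
  narrate: (figures: Figures) => Promise<string>,
): Promise<SalesReport> {
  let narrative: string | null = null;
  try {
    narrative = await narrate(current);
  } catch {
    // Best-effort: the deterministic report ships even if the LLM call fails.
    narrative = null;
  }
  return {
    status: "SUCCESS",
    figures: current,
    deltas: computeDeltas(current, previous),
    narrative,
  };
}
```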
The match between a HubSpot company and a Redmine project is what makes "existing customer" and upsell possible. Matching runs automatically after every sync, with a human review gate for fuzzy matches.
Exact + suffix-stripped name match, then a hierarchy walk so a subproject inherits its parent's company. MatchType = exact · normalized · inherited · manual.
manual+company = pinned · manual+null = durable "not a customer" exclude · null = auto. Auto-match never overrides a manual decision.
Fuzzy (normalized) matches surface in a "Needs review" filter; POST /customers/confirm pins a page of them to manual.
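The matching rules above can be sketched as a single resolver: a manual decision wins, then an exact name match, then a suffix-stripped match, then inheritance from the parent project. The suffix list, the type names and the handling of an excluded parent are assumptions for illustration, not the project's actual rules.

```typescript
type MatchType = "exact" | "normalized" | "inherited" | "manual";

interface Company { id: string; name: string }
interface Project { id: number; name: string; parentId: number | null }
interface Match { companyId: string | null; matchType: MatchType }

// Hypothetical normalization: lowercase, drop punctuation and a legal suffix.
function normalize(name: string): string {
  return name
    .toLowerCase()
    .replace(/[.,]/g, "")
    .trim()
    .replace(/\s+(inc|llc|ltd|corp|co|gmbh)$/, "");
}

function matchProject(
  project: Project,
  projects: Map<number, Project>,
  companies: Company[],
  existing: Map<number, Match>,
  visited: Set<number> = new Set(),
): Match | null {
  // A manual row is either a pin or a durable exclude; auto-match never overrides it.
  const prior = existing.get(project.id);
  if (prior?.matchType === "manual") return prior;

  const exact = companies.find((c) => c.name === project.name);
  if (exact) return { companyId: exact.id, matchType: "exact" };

  const key = normalize(project.name);
  const fuzzy = companies.find((c) => normalize(c.name) === key);
  if (fuzzy) return { companyId: fuzzy.id, matchType: "normalized" };

  // Hierarchy walk: a subproject inherits whatever its parent resolves to.
  visited.add(project.id);
  const parent =
    project.parentId === null ? undefined : projects.get(project.parentId);
  if (!parent || visited.has(parent.id)) return null; // root reached or cycle
  const inherited = matchProject(parent, projects, companies, existing, visited);
  return inherited && inherited.companyId !== null
    ? { companyId: inherited.companyId, matchType: "inherited" }
    : null;
}
```

In this sketch the "Needs review" filter would simply select rows whose `matchType` is `normalized`, and confirming one rewrites it as `manual` with the same `companyId`.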
3 headline numbers + AI daily briefing + weighted-Q / of-target stats.
Per-rep health (Active / Quiet / Dark), tier, calls-this-week, last activity.
Deterministic candidates → LLM chief-of-staff re-rank, never fabricated.
push_risk · competitive · stalled · quarter_slip · se_gap.
Filterable table: stale ≥30d, overdue, days-since-update, risk badge.
Won/open vs quarter & year target; upsell incl. launch-approaching.
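The Deal Pulse flags lend themselves to a small pure function. The 30-day stale threshold comes from the panel description above; reading "overdue" as "close date already passed" and the field names are assumptions of this sketch.

```typescript
interface DealRow {
  dealId: string;
  updatedAt: Date;
  closeDate: Date | null;
}

interface DealPulse {
  dealId: string;
  daysSinceUpdate: number;
  stale: boolean;
  overdue: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const STALE_DAYS = 30;

// `now` is passed in so the result is reproducible for a given snapshot.
function dealPulse(deal: DealRow, now: Date): DealPulse {
  const daysSinceUpdate = Math.floor(
    (now.getTime() - deal.updatedAt.getTime()) / DAY_MS,
  );
  return {
    dealId: deal.dealId,
    daysSinceUpdate,
    stale: daysSinceUpdate >= STALE_DAYS,
    // Assumption: a deal is overdue once its close date is in the past.
    overdue: deal.closeDate !== null && deal.closeDate.getTime() < now.getTime(),
  };
}
```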
Its own Slack app (token + signing secret). In "sales mode" it answers everything via the sales agent + daily-snapshot grounding. A mid-day cron pushes critical signals.
HubSpot credential gotcha: needs a Private App access token (pat-…, scopes crm.objects.{deals,companies,contacts}.read). A bare Developer API Key / legacy hapikey does not work — until set, the whole intelligence layer runs on empty data.
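A startup guard is one cheap way to surface this gotcha instead of silently running on empty data. The sketch only checks the `pat-` prefix mentioned above; it cannot verify scopes, and the function name and error messages are hypothetical.

```typescript
// Fail fast when the configured credential cannot be a Private App token.
function assertPrivateAppToken(token: string | undefined): string {
  if (!token) {
    throw new Error("HubSpot token is not set; sales data would be empty.");
  }
  if (!token.startsWith("pat-")) {
    throw new Error(
      "HubSpot token does not look like a Private App token (expected a pat- prefix).",
    );
  }
  return token;
}
```

Scopes still have to be confirmed against the HubSpot account itself, since nothing in the token string reveals them.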
One Bun process. Elysia API with end-to-end type safety via Eden Treaty; React 19 + Ant Design front-end; Amazon Bedrock for all inference.
Single-process API, /api prefix, public + guarded route groups.
Mirror tables, sync bookkeeping, deterministic queries.
Retrieve · InvokeAgent · Converse. Defaults to Nova Pro, switching to Claude when unlocked.
Eden Treaty end-to-end types; Zustand state; violet _sales/ui kit.
38 Prisma models. The RAG mirror + the Sales-specific analytics tables.
| RAG / ingest | Holds |
|---|---|
| email_messages | Gmail dedup + S3 key + direction |
| drive_files | Mirrored Drive files |
| redmine_* | projects · issues · members · users |
| slack_messages | Tracked-channel messages |
| lark_messages | Tracked-chat messages |
| fathom_meetings | Transcripts + attendees |
| fireflies_meetings | Transcripts + attendees |
| slack_agents | Personas + kbFilter presets |
| sync_tasks · items | Per-run bookkeeping |
| eval_* | Agent regression tests |
| Sales-specific | Holds |
|---|---|
| hubspot_companies | CRM companies |
| hubspot_contacts | CRM contacts + emails |
| hubspot_deals | Deals + ownerEmail |
| hubspot_deal_snapshots | One row per deal per day |
| sales_reports | Daily report payload + deltas |
| risk_flags | Deal-risk agent output |
| exec_actions | Ranked action list |
| upsell_cards | Upsell opportunities |
| reps | Active-AE roster |
| sales_users · settings | SSO portal + quota targets |