As-built · Bun · Elysia · Prisma · Bedrock

Two systems,
one knowledge · one pipeline

A retrieval-augmented Knowledge Base that ingests everything into Bedrock — and a Sales Intelligence Dashboard built on top of it. Architecture & data flows, visualized.

8 ingest sources
3 answer surfaces
6 report sections
4 LLM agents
38 data tables
01 — The big picture

How the two systems relate

The RAG knowledge base is the shared foundation. The Sales Dashboard reuses its ingest pipeline and mirror tables, then layers deterministic analytics + LLM agents on top.

AI Labs Knowledge Base (RAG)

Normalizes 8 sources into S3 + sidecar metadata, indexes them in a Bedrock Knowledge Base, and answers questions through web chat, Slack/Lark bots, and an MCP server.

Sales Intelligence Dashboard

Fuses HubSpot / Fathom / Redmine / Gmail into a daily pipeline report, a grounded sales chat, and the proactive @SilkSales Slack bot. Deterministic first, LLM second.

02 — Data flow

Ingest → Index → Answer

Every source normalizes content into an S3 object plus a .metadata.json sidecar. One Bedrock Knowledge Base indexes them; three surfaces retrieve with the same filter vocabulary.

Sources
Gmail: 3-day cron + webhook
Drive: file mirror + extract
Redmine: projects · issues
Slack / Lark: tracked channels
Fathom · Fireflies: meeting transcripts
HubSpot: 10-min cron

Store
S3 Bucket: object + .metadata.json
PostgreSQL: mirror + dedup rows

Index
Bedrock KB: S3 Vectors · embeddings + equals-filter metadata
Ingestion coalescer: 1 job at a time · debounce + 409 retry

Answer
Web Chat: stream + citations
Slack / Lark: classifier → agent
MCP Server: 4 tools · external clients
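
The "1 job at a time · debounce + 409 retry" behavior of the ingestion coalescer could be sketched roughly as below. This is an illustration under assumptions, not the project's code: `IngestionCoalescer`, `startJob`, and the option names are hypothetical, and `startJob` is assumed to wrap the Bedrock StartIngestionJob call and to reject with a 409 / `ConflictException` while another job is still in progress.

```typescript
// Hypothetical sketch: class name, option names, and delays are assumptions.
type StartJob = () => Promise<void>;

interface CoalescerOptions {
  debounceMs: number; // quiet period before a job is started
  retryMs: number;    // wait before retrying after a 409
  maxRetries: number; // give up after this many conflict retries
}

// Treat an error with HTTP status 409 (or the ConflictException name) as "job already running".
function isConflict(err: unknown): boolean {
  const e = err as { $metadata?: { httpStatusCode?: number }; name?: string } | undefined;
  return e?.$metadata?.httpStatusCode === 409 || e?.name === "ConflictException";
}

class IngestionCoalescer {
  private readonly startJob: StartJob;
  private readonly opts: CoalescerOptions;
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private running = false;
  private dirty = false; // a request arrived while a start attempt was in flight

  constructor(startJob: StartJob, opts: CoalescerOptions) {
    this.startJob = startJob;
    this.opts = opts;
  }

  // Called by any source after it writes to S3; bursts of calls collapse into one job.
  request(): void {
    if (this.running) {
      this.dirty = true;
      return;
    }
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = setTimeout(() => void this.run(), this.opts.debounceMs);
  }

  private async run(): Promise<void> {
    this.timer = undefined;
    this.running = true;
    try {
      for (let attempt = 0; ; attempt++) {
        try {
          await this.startJob();
          break;
        } catch (err) {
          // Only conflicts are retried; anything else is surfaced.
          if (!isConflict(err) || attempt >= this.opts.maxRetries) throw err;
          await new Promise((resolve) => setTimeout(resolve, this.opts.retryMs));
        }
      }
    } catch (err) {
      console.error("ingestion job failed", err);
    } finally {
      this.running = false;
      if (this.dirty) {
        this.dirty = false;
        this.request(); // schedule one follow-up job for everything that arrived meanwhile
      }
    }
  }
}
```

With this shape, every source module can call `request()` freely after a sync; the coalescer decides when a job actually starts.
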
03 — Core idea

Two data planes, kept in parallel

Every source writes both representations. The structured plane makes the next cron tick cheap and powers deterministic queries; the semantic plane powers free-text retrieval.

// semantic

S3 object + metadata sidecar

Indexed by the Bedrock Knowledge Base for free-text retrieval across chat, Slack/Lark and MCP. The sidecar carries the filterable metadata (type, source, month, year…).
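
As a rough illustration of the semantic plane, the sidecar for an S3 object can be derived from the object key alone. Bedrock Knowledge Bases look for a file named `<object key>.metadata.json` containing a `metadataAttributes` map; the helper name `buildSidecar` and the example attribute values below are assumptions.

```typescript
// Hypothetical helper: only the ".metadata.json" suffix and the
// "metadataAttributes" wrapper come from the Bedrock KB sidecar format.
type MetaValue = string | number | boolean;

interface SidecarFile {
  key: string;  // S3 key of the sidecar, next to the content object
  body: string; // JSON payload to upload
}

function buildSidecar(objectKey: string, attrs: Record<string, MetaValue>): SidecarFile {
  return {
    key: `${objectKey}.metadata.json`,
    body: JSON.stringify({ metadataAttributes: attrs }),
  };
}

// Example values are illustrative only.
const sidecar = buildSidecar("meetings/rec-123.md", {
  type: "meeting",
  month: 3,
  year: 2025,
});
console.log(sidecar.key); // meetings/rec-123.md.metadata.json
```
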

// structured

Postgres mirror tables

Dedup ("have I seen this id?"), sync bookkeeping, and deterministic queries. A row's existence is what lets the next tick skip already-ingested ids — and what the Sales system reads.

04 — Anatomy

Every source module looks the same

controller.ts

HTTP routes — status, manual "sync now", account CRUD, webhook (signature-verified).

service.ts

Fetch + normalize logic for that source's external API.

ingest.ts

The run loop — fetch → filter → write S3 (+meta) → upsert Postgres → trigger KB.

cron.ts

Interval scheduler wired into main.ts; each run is wrapped in runSyncTask.
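
The run loop in ingest.ts (fetch → filter → write S3 (+meta) → upsert Postgres → trigger KB) might be organized along these lines. Everything named here (`IngestDeps`, `runIngest`, the key layout `source/id.md`) is a hypothetical stand-in; the real modules' signatures are not shown in this document.

```typescript
// Hypothetical interfaces: a sketch of the loop's shape, not the actual module API.
interface SourceItem {
  id: string;
  body: string; // normalized markdown/text
  meta: Record<string, string | number | boolean>;
}

interface IngestDeps {
  fetchItems(): Promise<SourceItem[]>;                      // service.ts: fetch + normalize
  hasSeen(id: string): Promise<boolean>;                    // Postgres mirror lookup
  writeObject(key: string, body: string): Promise<void>;    // S3 put
  upsertRow(item: SourceItem, s3Key: string): Promise<void>; // Postgres mirror upsert
  triggerKb(): void;                                         // request a KB ingestion job
}

async function runIngest(source: string, deps: IngestDeps): Promise<number> {
  const items = await deps.fetchItems();
  let written = 0;
  for (const item of items) {
    // Filter: an existing mirror row means this id was already ingested.
    if (await deps.hasSeen(item.id)) continue;

    const key = `${source}/${item.id}.md`; // assumed key layout
    await deps.writeObject(key, item.body);
    await deps.writeObject(
      `${key}.metadata.json`,
      JSON.stringify({ metadataAttributes: item.meta }),
    );
    await deps.upsertRow(item, key);
    written++;
  }
  // Only ask for re-indexing when something new actually landed in S3.
  if (written > 0) deps.triggerKb();
  return written;
}
```

Writing the mirror row last means a crash between the S3 write and the upsert leads to a retry on the next tick rather than a silently skipped item.
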

05 — The contract

Metadata schema & S3-Vectors limits

The store supports only equals / notEquals — no ranges. That single constraint shapes the whole metadata design.

Source | Key filterable metadata | Note
shared | month, year, updated_at | Pre-computed at ingest; the equals-only workaround for date ranges.
meeting | type, title, recording_id, date, is_external | Fathom = type:meeting, Fireflies = type:fireflies.
slack | source, channel, client_id, is_private | client_id enables strict per-customer scoping.
redmine | source, project_name, ticket_id, status | Omits month/year to stay under the ~10-attribute cap.
email | source, account, from_addr, subject, direction | Attachments inherit the parent's metadata.

Hard cap: ~10 filterable attributes per chunk. A field is promoted to metadata only if it's filtered on; everything else stays inside the embedded body text. That's why duration_minutes lives in the markdown, not the sidecar.
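
To make the equals-only workaround concrete: a date range can be expressed as an OR over exact (year, month) pairs. The sketch below uses the `equals` / `andAll` / `orAll` shape of the Bedrock retrieval filter; the function names are hypothetical, and storing `month` and `year` as numbers is an assumption.

```typescript
// Minimal local mirror of the filter shape; only the operators used here.
type Filter =
  | { equals: { key: string; value: string | number | boolean } }
  | { andAll: Filter[] }
  | { orAll: Filter[] };

interface YearMonth {
  year: number;
  month: number; // 1-12
}

// Enumerate every (year, month) between two inclusive endpoints.
function monthsBetween(from: YearMonth, to: YearMonth): YearMonth[] {
  const out: YearMonth[] = [];
  let y = from.year;
  let m = from.month;
  while (y < to.year || (y === to.year && m <= to.month)) {
    out.push({ year: y, month: m });
    if (m === 12) {
      y++;
      m = 1;
    } else {
      m++;
    }
  }
  return out;
}

// Build "(year = Y1 AND month = M1) OR (year = Y2 AND month = M2) ...".
function monthRangeFilter(from: YearMonth, to: YearMonth): Filter {
  const clauses: Filter[] = monthsBetween(from, to).map(({ year, month }) => ({
    andAll: [
      { equals: { key: "year", value: year } },
      { equals: { key: "month", value: month } },
    ],
  }));
  if (clauses.length === 0) throw new Error("empty date range");
  // A group operator needs at least two members, so a single month is returned bare.
  return clauses.length === 1 ? clauses[0]! : { orAll: clauses };
}
```

The cost is that wide ranges produce long filters, which is one more reason to keep the date granularity at month level.
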

06 — Answer surfaces

Three ways in, one filter vocabulary

Web Chat

Two backends: bedrock (InvokeAgent — full orchestration + memory) and stream (Retrieve + ConverseStream, real token deltas). Citations dedupe to inline [1] refs.
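
Deduping citations into inline [1]-style refs can be done in one pass. The sketch below assumes sources are deduplicated by their S3 URI, which this document does not state; `Citation` and `dedupeCitations` are hypothetical names.

```typescript
// Hypothetical shapes: the real citation payload is not shown in this document.
interface Citation {
  uri: string; // assumed dedup key, e.g. the S3 URI of the source object
  title?: string;
}

interface NumberedCitations {
  refs: number[];      // 1-based ref number for each input citation, in input order
  sources: Citation[]; // unique sources; sources[n - 1] backs "[n]"
}

function dedupeCitations(citations: Citation[]): NumberedCitations {
  const numberByUri = new Map<string, number>();
  const sources: Citation[] = [];
  const refs = citations.map((citation) => {
    let n = numberByUri.get(citation.uri);
    if (n === undefined) {
      // First time this source appears: give it the next number.
      sources.push(citation);
      n = sources.length;
      numberByUri.set(citation.uri, n);
    }
    return n;
  });
  return { refs, sources };
}
```

Numbering by first appearance keeps refs stable while tokens are still streaming, since later chunks can only append new numbers.
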

Slack / Lark

A cheap Converse call routes the question to one of the enabled agents, then InvokeAgent answers, scoped to the current channel. Each reply ends with the suffix "— answered by <byline>".

MCP Server

search_knowledge, get_customer_brief, get_meeting_summary, list_recent_docs — for external MCP clients (Claude Desktop, etc.).

KB filter preset → RetrievalFilter: slackScope · larkScope · meeting · fireflies · redmine · hubspot · email · drive (stored on slack_agents.kbFilter)
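
One way a stored preset could expand into a retrieval filter is sketched below. The preset names come from the list above and the `type:meeting` / `type:fireflies` values from the metadata table; the `source` values, the use of `channel` for scoping, and the function name `presetToFilter` are assumptions made for illustration.

```typescript
// Minimal local mirror of the filter shape; only the operators used here.
type Filter =
  | { equals: { key: string; value: string | number | boolean } }
  | { andAll: Filter[] };

type Preset =
  | "slackScope" | "larkScope"
  | "meeting" | "fireflies" | "redmine"
  | "hubspot" | "email" | "drive";

interface ChannelContext {
  channel: string; // id of the channel the question was asked in
}

const eq = (key: string, value: string): Filter => ({ equals: { key, value } });

function presetToFilter(preset: Preset, ctx: ChannelContext): Filter {
  switch (preset) {
    case "meeting":
    case "fireflies":
      // Meeting sources are distinguished by the "type" attribute.
      return eq("type", preset);
    case "slackScope":
      // Assumed: scope to messages from the current Slack channel only.
      return { andAll: [eq("source", "slack"), eq("channel", ctx.channel)] };
    case "larkScope":
      // Assumed: same idea for Lark chats.
      return { andAll: [eq("source", "lark"), eq("channel", ctx.channel)] };
    default:
      // Assumed: redmine, hubspot, email, drive are matched on "source".
      return eq("source", preset);
  }
}
```
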
07 — Foundation

Shared tech stack

One Bun process. Elysia API with end-to-end type safety via Eden Treaty; React 19 + Ant Design front-end; Amazon Bedrock for all inference.

Elysia · Bun

Single-process API, /api prefix, public + guarded route groups.

Prisma · Postgres

Mirror tables, sync bookkeeping, deterministic queries.

Amazon Bedrock

Retrieve · InvokeAgent · Converse. Default Nova Pro → Claude when unlocked.

React 19 · AntD

Eden Treaty end-to-end types; Zustand state; violet _sales/ui kit.

08 — Persistence

Data model at a glance

38 Prisma models: the RAG mirror tables plus the Sales-specific analytics tables.

RAG / ingest | Holds
email_messages | Gmail dedup + S3 key + direction
drive_files | Mirrored Drive files
redmine_* | projects · issues · members · users
slack_messages | Tracked-channel messages
lark_messages | Tracked-chat messages
fathom_meetings | Transcripts + attendees
fireflies_meetings | Transcripts + attendees
slack_agents | Personas + kbFilter presets
sync_tasks · items | Per-run bookkeeping
eval_* | Agent regression tests

Sales-specific | Holds
hubspot_companies | CRM companies
hubspot_contacts | CRM contacts + emails
hubspot_deals | Deals + ownerEmail
hubspot_deal_snapshots | One row per deal per day
sales_reports | Daily report payload + deltas
risk_flags | Deal-risk agent output
exec_actions | Ranked action list
upsell_cards | Upsell opportunities
reps | Active-AE roster
sales_users · settings | SSO portal + quota targets
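
Because hubspot_deal_snapshots holds one row per deal per day, day-over-day deltas can be computed deterministically before any LLM is involved. The sketch below shows one possible diff; the snapshot columns (`stage`, `amount`) and the names `DealSnapshot` and `diffSnapshots` are assumptions, since the actual schema is not shown here.

```typescript
// Hypothetical columns: the real table layout is not given in this document.
interface DealSnapshot {
  dealId: string;
  date: string;   // ISO day, e.g. "2025-03-14"
  stage: string;
  amount: number;
}

interface DealDelta {
  dealId: string;
  stageChanged: boolean;
  amountDelta: number;
}

// Compare yesterday's and today's snapshots and keep only deals that changed.
function diffSnapshots(prev: DealSnapshot[], curr: DealSnapshot[]): DealDelta[] {
  const prevById = new Map<string, DealSnapshot>();
  for (const snapshot of prev) prevById.set(snapshot.dealId, snapshot);

  const deltas: DealDelta[] = [];
  for (const snapshot of curr) {
    const before = prevById.get(snapshot.dealId);
    if (before === undefined) continue; // no earlier row to compare against
    const stageChanged = before.stage !== snapshot.stage;
    const amountDelta = snapshot.amount - before.amount;
    if (stageChanged || amountDelta !== 0) {
      deltas.push({ dealId: snapshot.dealId, stageChanged, amountDelta });
    }
  }
  return deltas;
}
```
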