Practical AI · Built in Melbourne
Knowledge Management that cites its sources
Search across your OneDrive, SharePoint, or Drive. Every answer cites the document and page so your team can verify before they act.
Why corporate search still falls short
Most organisations have a document estate that grew across decades — multiple SharePoint sites, an old shared drive, a current OneDrive, a wiki nobody maintains. The information is in there. The retrieval is the broken part. Three failures keep showing up when we audit existing knowledge stacks.
Keyword search misses anything phrased differently.
SharePoint search returns 47 results for the exact phrase and zero for the synonym. Staff give up after the second non-result and ask a colleague who's been there longer.
Newer assistants ignore your existing permission rules.
An off-the-shelf RAG tool indexes everything it can read — including the documents your IT team set permissions on for a reason. The first audit finding is usually a privacy breach the team didn't realise they'd shipped.
Answers without citations get treated as opinion.
When the assistant says 'the policy is X' without a link to the source document, your staff reasonably refuse to act on it. Trust collapses on the first ambiguous answer; the tool gets sidelined.
How we build it
Reliable knowledge retrieval is a permission-aware indexing problem first and a retrieval problem second. We start by mapping your document estate — SharePoint sites, OneDrive folders, Drive shares, internal wikis — and the permission graph that already governs access. The crawler indexes documents with their ACL metadata attached, so retrieval respects the existing rules: a user only sees passages from documents they could already open.

LangGraph orchestrates the retrieval pipeline; pgvector and LlamaIndex handle the embedding store. Claude composes the answer over passages we've explicitly retrieved, with citations linking back to the source document and page. Freshness flags surface when the answer relies on a document older than your policy threshold.

The whole stack runs on Vercel infrastructure, or in your own cloud tenancy if data residency demands it. Recrawl runs on a schedule you set — typically daily for active SharePoint sites, weekly for stable archives.
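The query-time half of that rule is small enough to show. Below is a minimal sketch of ACL-filtered retrieval over pgvector; the `chunks` table, its `acl` column, and the group model are illustrative assumptions rather than a fixed schema, since real deployments map whatever permission graph the audit surfaces.

```python
# Minimal sketch: permission-aware retrieval over pgvector.
# Assumed (illustrative) schema:
#   chunks(doc_id text, page int, text text, acl text[], embedding vector(1536))
# where `acl` holds the group IDs allowed to open the source document.
import psycopg  # psycopg 3

def retrieve(conn: psycopg.Connection,
             query_embedding: list[float],
             user_groups: list[str],
             k: int = 8) -> list[tuple]:
    """Top-k passages the user is actually allowed to see.

    The ACL filter runs inside the SQL, so a passage from a restricted
    document never reaches the model, let alone the user.
    """
    qvec = "[" + ",".join(str(x) for x in query_embedding) + "]"  # pgvector text format
    sql = """
        SELECT doc_id, page, text,
               embedding <=> %(q)s::vector AS distance
        FROM chunks
        WHERE acl && %(groups)s::text[]  -- overlap: user holds at least one allowed group
        ORDER BY distance
        LIMIT %(k)s
    """
    with conn.cursor() as cur:
        cur.execute(sql, {"q": qvec, "groups": user_groups, "k": k})
        return cur.fetchall()
```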
Tools we lean on: LangGraph · Claude · pgvector · LlamaIndex · SharePoint API · OneDrive Graph
Pipeline shape · Knowledge Management
- 01 · Crawl
- 02 · Index
- 03 · Embed
- 04 · Retrieve
- 05 · Cite
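For the technically curious, here is a simplified sketch of those five stages wired as one linear LangGraph graph. In production the first three stages run on the recrawl schedule and the last two run per query, and the node bodies below are placeholders rather than the shipped implementation.

```python
# Simplified sketch of the five-stage pipeline as a single LangGraph graph.
# Node bodies are placeholders; real nodes wrap the source crawlers,
# chunker, embedder, ACL-filtered retriever, and citation composer.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class PipelineState(TypedDict, total=False):
    query: str
    documents: list[dict]   # raw docs with ACL metadata
    chunks: list[dict]      # chunked, tagged passages
    hits: list[dict]        # passages retrieved for this query
    answer: str             # composed answer with citations

def crawl(state: PipelineState) -> dict:
    return {"documents": []}   # fetch docs + ACLs from each source

def index(state: PipelineState) -> dict:
    return {"chunks": []}      # chunk and tag with metadata

def embed(state: PipelineState) -> dict:
    return {}                  # write embeddings to pgvector

def retrieve(state: PipelineState) -> dict:
    return {"hits": []}        # ACL-filtered vector search

def cite(state: PipelineState) -> dict:
    return {"answer": ""}      # Claude composes answer + citations

graph = StateGraph(PipelineState)
for name, fn in [("crawl", crawl), ("index", index), ("embed", embed),
                 ("retrieve", retrieve), ("cite", cite)]:
    graph.add_node(name, fn)
graph.add_edge(START, "crawl")
for a, b in [("crawl", "index"), ("index", "embed"),
             ("embed", "retrieve"), ("retrieve", "cite")]:
    graph.add_edge(a, b)
graph.add_edge("cite", END)
app = graph.compile()
```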
What the knowledge layer does, end to end
Six capabilities every knowledge deployment we ship includes by default.
Cross-source synthesis.
One question retrieves across SharePoint, OneDrive, Drive, internal wikis, and policy archives. Answers reconcile the sources rather than picking one and ignoring the rest.
Citation on every answer.
Every claim links back to the source document and page. Click-through verification for the user; full audit trail for governance reviews.
SharePoint, OneDrive, and Drive integration.
Native API connectors for Microsoft 365 and Google Workspace. Handles nested folder structures, shared drives, and permissioned sub-sites without flattening them.
Permission-aware retrieval.
The crawler indexes documents with their ACL attached. A user only sees passages from documents they could already open — privacy and permissions inherit from your existing setup.
Confidence and freshness flags.
Every answer carries a confidence score and a freshness signal. Answers relying on documents older than your policy threshold surface a flag; low-confidence answers route to a human reviewer (sketched in code after this list).
Document classification and metadata.
Documents tagged at index time — by topic, document type, business unit, or any taxonomy you already use. Retrieval can filter by metadata; archives stay searchable without polluting current results.
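To make the flagging concrete, here is an illustrative sketch of the confidence-and-freshness gate described above. The thresholds are configured per deployment; the names and values below are examples, not a fixed product API.

```python
# Illustrative sketch of the freshness / confidence gate.
from datetime import datetime, timedelta, timezone

FRESHNESS_THRESHOLD = timedelta(days=365)   # example: your policy threshold
CONFIDENCE_THRESHOLD = 0.70                 # example: below this, route to a reviewer

def flag_answer(confidence: float, source_dates: list[datetime]) -> dict:
    """Attach freshness and confidence flags to a composed answer.

    `source_dates` are the (timezone-aware) last-modified dates of the
    documents the answer cited.
    """
    now = datetime.now(timezone.utc)
    stale = any(now - d > FRESHNESS_THRESHOLD for d in source_dates)
    return {
        "stale_source": stale,                              # surfaces a freshness flag
        "needs_review": confidence < CONFIDENCE_THRESHOLD,  # routes to a human reviewer
    }
```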
Packages come in three sizes
Most clients land on Scale. We re-quote against estate size and source mix after the audit.
Automate
From $2,000 AUD
Single-source knowledge layer — typically one SharePoint site or one OneDrive folder — with citation-on-every-answer.
- Single source ingest
- Citation on every answer
- Web-based search interface
- Daily recrawl
Scale
From $5,000 AUD
Multi-source knowledge layer with permission-aware retrieval, freshness flags, and a search interface for non-technical staff.
- Up to 5 sources
- Permission-aware retrieval
- Freshness + confidence flags
- Document classification
- Quarterly retrieval tuning
Transform
From $10,000 AUD
Enterprise knowledge layer — unlimited sources, ongoing crawler, analytics dashboard, and custom classification taxonomy.
- Unlimited sources
- Ongoing crawler with alerting
- Search analytics dashboard
- Custom classification taxonomy
- Ongoing index + retrieval tuning
Real-world scenario · 2025
Decades of policy docs searchable in 2 weeks
An Australian university had a document estate that grew across thirty years and four content management systems. Around 200,000 policy documents, course outlines, governance records, and committee minutes lived across SharePoint, an older shared drive, and a wiki the IT team hadn't touched in five years. Staff couldn't find anything reliably; new hires asked the same question dozens of times before they got a clean answer.
The build ran for two weeks. We crawled the four sources, preserving each document's existing permissions, and built a unified embedding index over pgvector with LlamaIndex doing the chunking. Claude handled the composition; every answer cited the source document and page. Permission-aware retrieval meant the search results respected the access rules already in place — a casual lecturer saw fewer results than the dean, and that was correct.
By the end of week two, the search was answering 87% of internal queries with a citation that staff could verify. The IT team kept the existing access controls; governance reviewers got an audit trail showing every retrieval and every cited source. Six months later the index was being recrawled nightly with no maintenance from the operations team.
Read the full case study

200,000 · Documents indexed across 4 sources
87% · Accuracy with citation match
2 weeks · Build window from kickoff to live
Questions clients ask before they book the call
Will it expose documents people are not supposed to see?
No. The crawler indexes documents with their existing ACL attached, and retrieval enforces those rules at query time. A user only ever sees passages from documents they could already open through SharePoint or OneDrive directly. We map the permission graph during the discovery audit, document the boundary, and do a privacy review before the search goes live. Permission-aware retrieval is the load-bearing differentiator — generic RAG tools that ignore ACLs are not safe for enterprise knowledge bases.
What if our documents are old and contradict each other?
They almost certainly do. Older policy estates accumulate contradictions over years of governance updates. We surface this rather than hide it: the answer cites every source it pulled from, freshness flags appear when the answer relies on a document older than your policy threshold, and contradicted passages get flagged for governance review. The tool becomes a contradiction-finder as well as a search interface.
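One workable shape for that contradiction check, sketched with the Anthropic SDK: after retrieval, ask the model whether any two cited passages disagree, and queue flagged pairs for governance review. The prompt and model ID below are illustrative.

```python
# Hypothetical sketch: after retrieval, ask the model whether any cited
# passages disagree, and queue flagged pairs for governance review.
import anthropic

client = anthropic.Anthropic()  # ANTHROPIC_API_KEY in the environment

def find_contradictions(passages: list[dict]) -> str:
    """Return the model's verdict on contradictions among retrieved passages."""
    numbered = "\n\n".join(
        f"[{i}] ({p['doc_id']}, p.{p['page']}) {p['text']}"
        for i, p in enumerate(passages)
    )
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model ID
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": "Do any of these policy passages contradict each other? "
                       "List conflicting pairs by number with a one-line reason, "
                       "or reply NONE.\n\n" + numbered,
        }],
    )
    return response.content[0].text
```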
How accurate are the answers really?
Answer accuracy with citation match lands at 85%+ on a domain-specific evaluation set. Where the corpus is large and contradicts itself, accuracy can drop into the high 70s — the assistant flags those cases for human review rather than answering outright. We measure per-category accuracy at week two so your governance team can tune the confidence threshold rather than discover it in production. If a category can't reach the threshold reliably, it routes to human hand-off by default.
Where does our document data live?
Your choice. The default deployment runs on Vercel infrastructure in Sydney, with the embedding store in the AU region. For stricter sovereignty, we deploy the pipeline to your own AWS or Azure tenancy. The crawler reads documents in place from your existing SharePoint or OneDrive; by default nothing is copied out of your tenant except the embeddings and metadata. Anthropic offers Claude through AWS Bedrock in the Sydney region for clients with explicit data residency requirements.
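For the Bedrock path, the call looks roughly like this. The sketch uses the `AnthropicBedrock` client from the Anthropic Python SDK pinned to the Sydney region; the model ID is an example, so substitute whichever Claude model your Bedrock account has enabled.

```python
# Sketch: Claude via AWS Bedrock pinned to Sydney (ap-southeast-2), so
# prompts and retrieved passages stay in-region.
from anthropic import AnthropicBedrock

client = AnthropicBedrock(aws_region="ap-southeast-2")

message = client.messages.create(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",  # example Bedrock model ID
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is our leave policy? Cite sources."}],
)
print(message.content[0].text)
```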
How long does a typical project take?
Knowledge Management projects on the Scale tier ship in 4-6 weeks from kickoff. Automate-tier projects ship in 2-3 weeks. Transform-tier projects (custom taxonomy, ongoing crawler, analytics dashboard) typically run 8-12 weeks. The first two weeks are always discovery — mapping the estate, walking through the permission rules, and selecting the initial classification taxonomy.
Free 30-min audit · No prep required
See what your archives could answer.
Book a free 30-minute audit. We'll walk through your document estate, sketch what a RAG layer could surface, and tell you whether the corpus is ready — or not.