AskSHU: University RAG Chatbot
A RAG chatbot that makes university information searchable through conversation. Built in two weeks, runs at zero cost.
Why I Built This
I’ve watched people waste 20+ minutes searching university websites for basic information: admission deadlines, fee structures, faculty contacts. The information exists; it’s just buried across dozens of pages and PDFs that nobody wants to dig through.
I wanted to see if I could make an entire university website searchable through a simple chat interface. Two weeks later, I had a working prototype.
How It Works
The system is a Retrieval-Augmented Generation (RAG) chatbot. When someone asks a question:
- The query gets converted to an embedding
- Vector search finds the 20 most relevant content chunks from the database
- Those chunks + the question go to Llama 3.3 70B
- The model generates a response grounded in actual university content
If the answer isn’t in the database, the model says so. No hallucinations about fake deadlines.
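Sketched as a Worker, the whole path is only a few calls. This is a minimal sketch, assuming Workers AI and Vectorize bindings named `AI` and `VECTORIZE`; the embedding and chat model IDs are illustrative, not necessarily the exact ones AskSHU uses:

```ts
export interface Env {
  AI: Ai;
  VECTORIZE: VectorizeIndex;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { question } = await request.json<{ question: string }>();

    // 1. Convert the query to an embedding.
    const embedded = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: [question],
    });

    // 2. Vector search for the 20 most relevant chunks.
    const results = await env.VECTORIZE.query(embedded.data[0], {
      topK: 20,
      returnMetadata: "all",
    });
    const context = results.matches
      .map((m) => String(m.metadata?.text ?? ""))
      .join("\n---\n");

    // 3. Chunks + question go to the model; the system prompt keeps
    //    answers grounded in the retrieved university content.
    const answer = await env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
      messages: [
        {
          role: "system",
          content: `Answer using only this context. If the answer is not in it, say so.\n\n${context}`,
        },
        { role: "user", content: question },
      ],
    });

    return Response.json(answer);
  },
};
```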
Tech Stack
I needed this cheap and simple. No budget, no ops overhead.
Cloudflare’s edge stack handled everything:
| Layer | Tool | Why |
|---|---|---|
| Compute | Workers | No cold starts, runs globally |
| Vector DB | Vectorize | Managed index, pairs with Workers AI embeddings |
| Storage | D1 | SQLite, co-located with Workers |
| LLM | Llama 3.3 70B | Free via Workers AI |
| Sessions | KV | Chat history persistence |
Frontend is Next.js 15 on Vercel with streaming responses.
Total cost: $0. Workers AI has generous free limits, and this project is nowhere near hitting them.
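The sessions row is the simplest piece: chat history is just a JSON array in KV. A rough sketch, assuming a KV binding named `SESSIONS` (the key scheme and TTL here are illustrative):

```ts
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Append one message to a session's history in KV.
async function appendToSession(
  sessions: KVNamespace,
  sessionId: string,
  message: ChatMessage,
): Promise<void> {
  const key = `session:${sessionId}`;
  const history = (await sessions.get<ChatMessage[]>(key, "json")) ?? [];
  history.push(message);
  // Let idle sessions expire after 24 hours.
  await sessions.put(key, JSON.stringify(history), { expirationTtl: 86_400 });
}
```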
The Content Pipeline
RAG is only as good as your content. I built a Python crawler that:
- Traverses the university site (BFS, respects robots.txt)
- Converts pages to clean markdown
- Chunks text recursively (1024 chars, 200-char overlap)
408 pages became 2,500+ searchable chunks. The whole pipeline runs in under 2 hours.
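The production chunker is Python, but the recursive strategy is easy to show. Here is the same idea sketched in TypeScript, with the 1024/200 parameters from above: prefer paragraph breaks, then line and sentence breaks, and only hard-split as a last resort.

```ts
const CHUNK_SIZE = 1024;
const OVERLAP = 200;
const SEPARATORS = ["\n\n", "\n", ". ", " "];

function chunk(text: string, separators: string[] = SEPARATORS): string[] {
  if (text.length <= CHUNK_SIZE) return [text];

  const [sep, ...rest] = separators;
  if (sep === undefined) {
    // Nothing left to split on: hard-split with a fixed overlap.
    const out: string[] = [];
    for (let i = 0; i < text.length; i += CHUNK_SIZE - OVERLAP) {
      out.push(text.slice(i, i + CHUNK_SIZE));
    }
    return out;
  }

  // Greedily pack pieces split on the current separator, recursing on
  // any single piece that is still too large for one chunk.
  const out: string[] = [];
  let current = "";
  for (const piece of text.split(sep)) {
    const candidate = current ? current + sep + piece : piece;
    if (candidate.length <= CHUNK_SIZE) {
      current = candidate;
      continue;
    }
    if (current) out.push(current);
    const tail = current.slice(-OVERLAP); // overlap carried into the next chunk
    if (piece.length > CHUNK_SIZE) {
      out.push(...chunk(piece, rest));
      current = "";
    } else if (tail && tail.length + sep.length + piece.length <= CHUNK_SIZE) {
      current = tail + sep + piece;
    } else {
      current = piece;
    }
  }
  if (current) out.push(current);
  return out;
}
```

Each chunk then gets embedded and inserted into the vector index, which is presumably where the `metadata.text` in the earlier query sketch comes from.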
What I Learned
Chunking matters more than model size. I tested 3B and 7B models—they worked fine when the retrieved context was good. Bad chunking breaks even the best models.
Streaming changes perception. Users don’t care about total response time as much as time-to-first-token. Streaming made the whole thing feel instant.
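On Workers AI this is nearly free to implement. A sketch, reusing the `Env` binding from the earlier example (same caveat on the model ID): passing `stream: true` yields server-sent events you can hand straight to the client.

```ts
async function streamAnswer(
  env: Env,
  messages: RoleScopedChatInput[],
): Promise<Response> {
  // With stream: true, Workers AI returns a ReadableStream of
  // server-sent events instead of a finished string.
  const stream = await env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
    messages,
    stream: true,
  });

  // Pass the stream through untouched; the browser sees the first
  // token as soon as the model emits it.
  return new Response(stream as ReadableStream, {
    headers: { "content-type": "text/event-stream" },
  });
}
```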
Edge deployment is underrated. Sub-400ms TTFB globally. No regional latency surprises.
Results
Students get answers in seconds instead of digging through the website. Staff spend less time answering the same questions. The university has a 24/7 information system that costs nothing to run.
For a two-week project, that’s a solid outcome.