An AI-powered opportunity marketplace that helps students discover personalised scholarships, grants, and programmes using a RAG pipeline over curated opportunity data.
The repository is private; the live deployment runs in a controlled production environment.
The hardest problem on OpHunter wasn't generating AI responses — it was making retrieval reliable at low latency without unpredictable infrastructure costs. Opportunity data varies wildly in structure: some records are richly detailed, others sparse. Early iterations produced inconsistent retrieval where semantically similar queries returned different results depending on minor wording changes. We needed a system that could surface the right opportunities predictably, quickly, and cost-efficiently — built and maintained by a small team without large-scale infrastructure spend.
I owned the full system — from the FastAPI retrieval service and LangChain pipeline to the Next.js search interface. The core focus was making the RAG pipeline robust, observable, and independently deployable so it could be iterated on without disrupting the user experience.
Decision
Separated the retrieval pipeline from application logic as a standalone modular service
Why
Tightly coupling the RAG pipeline to the web app would have made it impossible to iterate on embeddings, chunking strategies, and prompt logic without risking regressions in the user experience. Keeping them decoupled meant each layer could evolve independently — the prompt could be tuned without touching the frontend, and chunking strategies could be updated without redeploying the API.
Outcome
Allowed the team to iterate on chunking, similarity thresholds, and prompt structure without touching the frontend or the core API layer — significantly reducing the cost of each experiment.
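One way to picture the decoupling is a narrow retrieval interface that the application depends on, so chunking, embeddings, and thresholds can change behind it. This is a minimal sketch, not the production service — the class and field names (`Retriever`, `StubRetriever`, `RetrievedChunk`) are illustrative assumptions:

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class RetrievedChunk:
    """One chunk returned by the retrieval layer (illustrative fields)."""
    opportunity_id: str
    text: str
    score: float


class Retriever(Protocol):
    """Boundary between the app and the RAG pipeline: the app depends only
    on this interface, so embedding models, chunking strategies, and
    similarity thresholds can be swapped without touching the frontend."""

    def search(self, query: str, top_k: int = 5) -> list[RetrievedChunk]: ...


@dataclass
class StubRetriever:
    """Hypothetical in-memory stand-in for tests and local development."""
    index: list[RetrievedChunk] = field(default_factory=list)
    threshold: float = 0.0  # similarity cut-off, tunable independently

    def search(self, query: str, top_k: int = 5) -> list[RetrievedChunk]:
        # A real implementation would embed `query` and run a vector
        # search; here we just filter and rank the stubbed index by score.
        hits = [c for c in self.index if c.score >= self.threshold]
        return sorted(hits, key=lambda c: c.score, reverse=True)[:top_k]
```

Because the frontend only ever calls `search()`, an experiment with a new chunking strategy is a change to one implementation of the interface, not a cross-cutting redeploy.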
Decision
Vector search over keyword search for opportunity retrieval
Why
Keyword search would have required users to know exact terminology — 'scholarship', 'grant', 'fellowship'. Vector search enables semantic matching, so a query like 'funding for computer science students in Africa' correctly retrieves relevant opportunities regardless of exact phrasing.
Outcome
Retrieval success rate improved from 72% to 91% with semantic matching enabled.
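The difference between the two approaches can be shown with a toy example. The three-dimensional "embeddings" below are made-up values for illustration (the real system uses text-embedding-3-small's high-dimensional vectors), but they demonstrate why keyword overlap fails where cosine similarity succeeds:

```python
import math

# Toy embeddings (hypothetical values): related funding terms point in
# similar directions, unrelated topics do not.
EMBEDDINGS = {
    "funding for students": (0.8, 0.2, 0.1),
    "scholarship": (0.9, 0.1, 0.0),
    "campus housing": (0.1, 0.1, 0.9),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_match(query, doc):
    """Naive keyword overlap: fails when wording differs."""
    return bool(set(query.lower().split()) & set(doc.lower().split()))


query = "funding for students"
keyword_match(query, "scholarship")                          # False: no shared words
cosine(EMBEDDINGS[query], EMBEDDINGS["scholarship"])         # high similarity
cosine(EMBEDDINGS[query], EMBEDDINGS["campus housing"])      # low similarity
```

Keyword search sees no overlap between "funding" and "scholarship" and returns nothing; the vector comparison ranks the scholarship record close to the query anyway, which is the behaviour behind the improved retrieval success rate.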
Decision
Server-side retrieval — all embedding and similarity search runs on the backend
Why
Running retrieval client-side would have exposed embedding logic, similarity thresholds, and prompt structure to end users. Beyond the security concern, client-side retrieval would produce inconsistent latency on the low-powered devices common among the target demographic.
Outcome
Kept proprietary retrieval logic secure and maintained consistent sub-second response times across all user devices.
Models
GPT-4o for response generation. text-embedding-3-small for vector indexing and query embedding — chosen for its balance of accuracy and cost at scale.
Retrieval Pipeline
Opportunity content is ingested, normalised, and split into chunks of 300–500 tokens. Each chunk is embedded and stored in Supabase Vector. At query time, the user's query is embedded with the same model, a similarity search retrieves the closest chunks above the configured threshold, and the top matches are passed to GPT-4o for response generation.
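The chunking step of the ingestion side can be sketched as a simple overlapping splitter. The exact chunk size and overlap below are assumptions — the write-up states only the 300–500 token range:

```python
def chunk(tokens: list[str], size: int = 400, overlap: int = 50) -> list[list[str]]:
    """Split a token sequence into overlapping chunks of up to `size` tokens.

    `size=400` sits inside the 300-500 range used by the pipeline; the
    overlap keeps context that straddles a chunk boundary retrievable
    from either side.
    """
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # final chunk reached the end of the document
    return chunks
```

Each resulting chunk would then be embedded and upserted into Supabase Vector alongside its opportunity ID, so query-time similarity search can map hits back to full records.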
System Prompt (Simplified)
User-Facing Feature
Users see a search interface labelled "Find Opportunities." They enter a natural-language query — for example, "internships for computer science students in West Africa" — and the system returns matching opportunities with title, description, eligibility criteria, deadline, and an apply link. The AI assistant summarises the most relevant matches and explains why they fit the query. Users can refine the search or save opportunities to revisit.
If I rebuilt OpHunter today, I'd invest much earlier in formalising the data pipeline and standardising how opportunity data is structured before building features on top of it. Early iterations focused on getting retrieval working and the UI functional — but because the input data schema wasn't locked down, debugging retrieval inconsistencies often meant chasing problems that were actually data quality issues, not model failures. Defining a strict schema, implementing validation at ingestion, and adding structured logging from day one would have cut debugging time significantly and made it easier to evaluate retrieval quality objectively rather than by feel.
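The "strict schema with validation at ingestion" idea can be illustrated with a small stdlib-only sketch. The field names mirror what the search interface displays (title, description, eligibility, deadline, apply link) but are assumptions, not the production schema:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Opportunity:
    """Hypothetical strict ingestion schema: a record that fails
    validation is rejected at ingestion instead of silently degrading
    retrieval quality downstream."""
    title: str
    description: str
    eligibility: str
    deadline: date
    apply_url: str

    def __post_init__(self):
        if not self.title.strip():
            raise ValueError("title must be non-empty")
        if not self.description.strip():
            raise ValueError("description must be non-empty")
        if not self.apply_url.startswith(("http://", "https://")):
            raise ValueError(f"apply_url looks malformed: {self.apply_url!r}")
```

With a gate like this at the ingestion boundary, a retrieval inconsistency can be debugged against data that is known to be well-formed — separating data-quality failures from genuine model or pipeline failures, which is exactly the distinction that was hard to make by feel.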