Semantic Search for Procurement

2025 · OpenAI Embeddings, FAISS, Python, Streamlit


Problem

Procurement decisions relied on manual document review — slow, inconsistent, and hard to scale. Three specific gaps:


Solution

Items are clustered based on keywords into a three-level hierarchy: main category → subcategory → leaf category. At the leaf level, items from different suppliers should represent the same product — that's what makes supplier price comparison meaningful.
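To make the leaf-level idea concrete, here is a minimal sketch of how items tagged with a three-level path could be grouped and compared across suppliers. It is an illustration only: the `Item` fields, the sample data, and the helper names `group_by_leaf` and `price_spread` are assumptions, not the project's actual schema or code, and the category paths are written by hand rather than produced by clustering.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    # Hypothetical record layout; the real data model is not described in the source.
    supplier: str
    description: str
    unit_price: float
    path: tuple  # (main category, subcategory, leaf category)


def group_by_leaf(items):
    """Bucket items by their full (main, sub, leaf) path."""
    leaves = defaultdict(list)
    for item in items:
        leaves[item.path].append(item)
    return dict(leaves)


def price_spread(leaf_items):
    """Return (cheapest item, most expensive item, relative spread) for one leaf."""
    cheapest = min(leaf_items, key=lambda i: i.unit_price)
    priciest = max(leaf_items, key=lambda i: i.unit_price)
    spread = (priciest.unit_price - cheapest.unit_price) / cheapest.unit_price
    return cheapest, priciest, spread


# Made-up sample data for illustration.
PAPER = ("Office", "Paper", "A4 80gsm 500 sheets")
GLOVES = ("Safety", "Gloves", "Nitrile M box of 100")
ITEMS = [
    Item("Supplier A", "A4 copy paper 80gsm, 500 sheets", 4.00, PAPER),
    Item("Supplier B", "Copy paper A4 80 g/m2, ream", 5.00, PAPER),
    Item("Supplier A", "Nitrile gloves M, box of 100", 8.00, GLOVES),
]

if __name__ == "__main__":
    for path, leaf_items in group_by_leaf(ITEMS).items():
        # A comparison only makes sense when at least two suppliers share the leaf.
        if len({i.supplier for i in leaf_items}) < 2:
            continue
        cheapest, priciest, spread = price_spread(leaf_items)
        print(" > ".join(path))
        print(f"  cheapest: {cheapest.supplier} at {cheapest.unit_price:.2f}")
        print(f"  priciest: {priciest.supplier} at {priciest.unit_price:.2f}")
        print(f"  spread:   {spread:.0%}")
```

The comparison is only as trustworthy as the leaf assignment: if two different products land in the same leaf, the spread reflects a product difference rather than a supplier price difference.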

A Streamlit UI sits on top and allows procurement analysts to:


Architecture


Status

Still in evaluation. The product isn't perfect: embedding quality at the leaf-category level is the main challenge, since listings of the same product must be grouped together while items that differ only slightly must be kept apart. But it already unlocks insights that weren't possible before and saves real time in benchmarking.
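The leaf-level difficulty can be illustrated with a small cosine-similarity sketch. The three-dimensional vectors below are made-up stand-ins for real embeddings, and both the `same_leaf` helper and the 0.97 threshold are hypothetical; the point is only that a matching listing and a slightly different item can score close together, so the cut-off leaves little margin.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_leaf(a, b, threshold=0.97):
    """Hypothetical rule: treat two items as the same product above the threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy vectors, invented for illustration; real embeddings have far more dimensions.
REFERENCE = [1.0, 0.0, 0.0]       # an item already assigned to a leaf
SAME_PRODUCT = [0.99, 0.10, 0.0]  # another supplier's listing of the same product
NEAR_VARIANT = [0.95, 0.30, 0.0]  # a slightly different item

if __name__ == "__main__":
    for name, vec in [("same product", SAME_PRODUCT), ("near variant", NEAR_VARIANT)]:
        score = cosine_similarity(REFERENCE, vec)
        print(f"{name}: similarity={score:.3f}, same leaf={same_leaf(REFERENCE, vec)}")
```

With these toy numbers the two candidates fall on opposite sides of the threshold, but only by a few hundredths, which is the kind of narrow margin that makes leaf-level grouping sensitive to embedding quality.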


Next Steps