Semantic Search for Procurement
2025 · OpenAI Embeddings, FAISS, Python, Streamlit
Problem
Procurement decisions relied on manual document review, which was slow, inconsistent, and hard to scale. Three specific gaps:
- No visibility into how item prices develop over time
- No easy way to compare prices across suppliers for the same or similar items
- Benchmarking new price offers against existing ones required significant manual effort
Solution
Items are clustered by keyword into a three-level hierarchy: main category → subcategory → leaf category. At the leaf level, items from different suppliers should represent the same product; that is what makes supplier price comparison meaningful.
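A minimal sketch of keyword-based assignment into the three levels, assuming a hand-written rule table. `TAXONOMY_RULES`, the category names and the item fields (`supplier`, `description`) are hypothetical illustrations, not the project's actual taxonomy or schema. An item lands in the first leaf whose keywords all appear in its description, so equivalent offers from different suppliers end up side by side.

```python
import re
from collections import defaultdict

# Hypothetical keyword rules: (main, sub, leaf) path -> keywords that must all occur.
TAXONOMY_RULES = [
    (("Office supplies", "Paper", "A4 copy paper 80g"), {"a4", "paper", "80g"}),
    (("Office supplies", "Paper", "A3 copy paper 80g"), {"a3", "paper", "80g"}),
    (("Office supplies", "Writing", "Blue ballpoint pen"), {"ballpoint", "blue"}),
]
UNASSIGNED = ("Unassigned", "Unassigned", "Unassigned")


def tokenize(text):
    """Lower-case alphanumeric tokens, so '80G,' and '80g' both become '80g'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def assign_category(description):
    """Return the first (main, sub, leaf) path whose keywords all occur in the text."""
    tokens = tokenize(description)
    for path, keywords in TAXONOMY_RULES:
        if keywords <= tokens:  # subset test: every keyword is present
            return path
    return UNASSIGNED


def build_hierarchy(items):
    """Group items into a nested dict: main -> sub -> leaf -> list of items."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for item in items:
        main, sub, leaf = assign_category(item["description"])
        tree[main][sub][leaf].append(item)
    return tree
```

With items such as "A4 Copy Paper, 80g, 500 sheets" from one supplier and "Paper A4 80g white" from another, both fall into the same leaf, which is exactly the grouping that supplier comparison needs. Rule order matters here: more specific rules should come first.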
A Streamlit UI sits on top and allows procurement analysts to:
- Monitor price trends across any cluster level
- Compare items between suppliers
- Drag in a PDF, Word document, or Excel file containing a new price offer and immediately find similar products from the same supplier or comparable alternatives
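The first two analyst views can be sketched as plain aggregations over price records, assuming each record carries a hypothetical `path` (main, sub, leaf), `period`, `supplier` and `unit_price` field; the project's real data model isn't described. A category prefix tuple selects any cluster level: one element for a main category, two for a subcategory, three for a leaf.

```python
from collections import defaultdict
from statistics import mean


def matches(path, prefix):
    """True if a (main, sub, leaf) path falls under the given prefix tuple."""
    return tuple(path[:len(prefix)]) == tuple(prefix)


def price_trend(records, prefix):
    """Mean unit price per period for all items under a category prefix, ordered by period."""
    by_period = defaultdict(list)
    for r in records:
        if matches(r["path"], prefix):
            by_period[r["period"]].append(r["unit_price"])
    return {period: mean(prices) for period, prices in sorted(by_period.items())}


def compare_suppliers(records, leaf_path):
    """Mean unit price per supplier within one leaf category, cheapest first.

    Averages over every period in `records`; filter by period first for a point-in-time view.
    """
    by_supplier = defaultdict(list)
    for r in records:
        if tuple(r["path"]) == tuple(leaf_path):
            by_supplier[r["supplier"]].append(r["unit_price"])
    return sorted(((s, mean(p)) for s, p in by_supplier.items()), key=lambda sp: sp[1])
```

Supplier comparison is restricted to full leaf paths on purpose: only there are items meant to be the same product, so averaging across a broader level would compare different goods.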
Architecture
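A minimal, offline sketch of the search path behind the drag-in feature, under stated assumptions. The project uses OpenAI embeddings and FAISS; here a hypothetical `toy_embed` (a hashed bag of words) stands in for the embedding API so the example runs without network access, and a brute-force inner-product search stands in for the FAISS index. On L2-normalised vectors the inner product equals cosine similarity, which is what a FAISS `IndexFlatIP` computes exactly. Text extraction from PDF, Word or Excel files is left out.

```python
import math
import re
import zlib

DIM = 64  # hypothetical toy dimension; real OpenAI embedding vectors are much larger


def toy_embed(text):
    """Stand-in for an embedding API: hashed bag of words, L2-normalised.

    zlib.crc32 is used instead of hash() because str hashing is randomised per process.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero on empty text
    return [v / norm for v in vec]


def search(query_vec, index_vecs, k):
    """Exact top-k inner-product search; on unit vectors this is cosine similarity."""
    scored = [(sum(q * x for q, x in zip(query_vec, vec)), i) for i, vec in enumerate(index_vecs)]
    scored.sort(reverse=True)
    return scored[:k]


def find_matches(offer_line, offer_supplier, catalogue, k=5):
    """Split the k nearest catalogue items into same-supplier hits and alternatives."""
    index_vecs = [toy_embed(item["description"]) for item in catalogue]
    same_supplier, alternatives = [], []
    for score, i in search(toy_embed(offer_line), index_vecs, k):
        item = catalogue[i]
        bucket = same_supplier if item["supplier"] == offer_supplier else alternatives
        bucket.append((item["supplier"], item["description"], round(score, 3)))
    return same_supplier, alternatives
```

In a real deployment, `toy_embed` would be replaced by a call to `client.embeddings.create(model=..., input=texts)` from the `openai` package, and `search` by a FAISS index: `faiss.normalize_L2` on float32 arrays, then `index.add(...)` and `index.search(query, k)` on a `faiss.IndexFlatIP(dim)`. The catalogue embeddings would be computed once and stored rather than rebuilt per query as in this sketch.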
Status
Still in evaluation. The product isn't perfect: embedding quality at the leaf-category level is the main challenge, because near-identical items have to be told apart from ones that differ only slightly. Even so, it already surfaces insights that weren't possible before and saves real time in benchmarking.
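To make the leaf-level difficulty concrete, here is one possible guard, sketched as an assumption rather than the project's method: similarity alone decides between "comparable alternative" and "unrelated", but a "same product" label additionally requires the numeric specifications (grammage, pack size, dimensions) to match. The cut-offs `SAME_PRODUCT_MIN` and `ALTERNATIVE_MIN` and the regex heuristic in `numeric_specs` are hypothetical and would need tuning on labelled item pairs.

```python
import re

# Hypothetical cut-offs; real values would have to be tuned on labelled item pairs.
SAME_PRODUCT_MIN = 0.92
ALTERNATIVE_MIN = 0.75


def numeric_specs(text):
    """Numbers with an optional unit suffix, e.g. '80g', '500', '2.5mm'."""
    return set(re.findall(r"\d+(?:[.,]\d+)?[a-z]*", text.lower()))


def classify_match(query, candidate, similarity):
    """Label a candidate relative to the query item.

    High similarity alone is not enough for 'same product': if the numeric specs
    differ (80g vs 90g paper), the pair is downgraded to an alternative.
    """
    if similarity >= SAME_PRODUCT_MIN and numeric_specs(query) == numeric_specs(candidate):
        return "same product"
    if similarity >= ALTERNATIVE_MIN:
        return "comparable alternative"
    return "unrelated"
```

For example, "A4 copy paper 80g, 500 sheets" against "A4 copy paper 90g, 500 sheets" would score very high on text similarity, yet the differing "80g"/"90g" tokens keep the pair out of the "same product" band.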
Next Steps
- Improve embedding quality for fine-grained product distinction
- Scale the FAISS index as the document corpus grows
- Integrate with procurement workflows so decisions are logged alongside the search that informed them