Warehouse Stock Scraper

2020 · Python · BeautifulSoup · Gmail API · openpyxl

Impact

  • Replaced 20 minutes of daily manual copy-paste with a single button click
  • Eliminated transcription errors that came from copying HTML tables into Excel by hand
  • Rolled out to operations teams in Singapore, Australia, the US, and the Netherlands

Business Problem

The logistics partner sent stock update emails in HTML format: tables embedded in the email body, one email per warehouse. The operations team would manually copy these tables into Excel, once per warehouse, across multiple countries. The work was repetitive, error-prone, and done daily across teams in different time zones.

Solution Design

A local application with a single button. On click: authenticate with Gmail, find the relevant emails, parse the HTML tables with BeautifulSoup, format the data, and write it to a structured Excel file with one sheet per warehouse, correctly laid out for the operations workflow.

No server, no infrastructure, no maintenance overhead. A button that replaces 20 minutes of copy-paste.

3PL stock update email scraper architecture
architecture_v1.svg

Technical Challenges

HTML email inconsistency. The 3PL stock update emails were not produced from a single template: different warehouses had slightly different table layouts, and formatting changed over time. BeautifulSoup parsing had to handle column ordering differences, extra header rows, and inconsistent whitespace without brittle column-index assumptions.

Local OAuth flow without a server. Gmail API authentication for a locally-run app with no backend means handling the OAuth redirect in a browser and storing credentials safely on the user's machine. Getting this to work reliably for non-technical operations users in four countries, without requiring any infrastructure, was the main setup challenge.

Per-country configuration. Each team had different Gmail account setups and warehouse naming conventions in the emails. The tool needed enough configuration flexibility to handle these differences without requiring technical knowledge from the end users.

Status

  • Delivered and deployed in 2020
  • Used daily by operations teams in Singapore, Australia, the US, and the Netherlands at time of deployment
  • Current status uncertain: internal workflows and the logistics partner's email format may have changed since 2020

Next Steps

  • Resilience to template changes: make the HTML parsing self-healing when the 3PL changes email formats, rather than requiring a code fix each time
  • Stock level alerting: flag warehouses that breach a minimum threshold automatically, rather than requiring the operations team to read the output manually