Skill Similarity Engine, Workforce Mobility at Scale

When an organisation reshapes its job architecture, the first question is not “what jobs exist?” but “which roles could someone move between, and what skills would they need?” Answering that at enterprise scale needs more than a spreadsheet join, and answering it fast enough to be useful in a planning conversation needs the work done before anyone asks.

The problem

A large employer can maintain thirty-five thousand active job profiles, each carrying dozens of skills. Pairwise similarity across that population is combinatorially expensive. Computing it on demand, every time an analyst opens a workbook, is the difference between a tool that gets used and a tool that gets quoted by the people who decided not to use it.

The analysts asking for this work do not want a black box. They want mobility pathways they can defend, skill-gap analysis that identifies what is actually missing rather than what is roughly missing, and outputs that drop into the BI surface they already trust.

The constraint, put simply: compute once, query fast, explain everything.

The approach

A precompute-then-query architecture with explicit configuration and a CLI that respects the platform analysts are on.

Load and normalise. Role and skill data ingested through configurable YAML environments. default, production, and local profiles override paths, thresholds, and category proportions without code changes. Reproducible runs across environments because the inputs are described, not assumed.

Precompute similarities. All job-to-job similarity scores calculated in advance using memory-aware chunking and parallel processing. The arithmetic that would crush an on-demand query becomes a one-off batch job, and the results sit ready for sub-second lookup.

Asymmetric similarity. Roles are not symmetric. Moving from a senior analyst to a junior manager is different from the reverse, and the score reflects it. Symmetric similarity averages away exactly the information a workforce planner needs.

Skill gap analysis. For any job pair, the engine names the skills present in one role and absent in the other. The output is a list a development plan can be written against, not a number that needs translating.

BI-ready exports. Similarity matrices, gap reports, and mobility summaries shaped for Power BI consumption. Analysts open the dataset and the data model is already correct, with column names that mean what they say.

CLI operations. Click-based command line for batch runs, environment switching, and pipeline stages, with Windows-friendly path handling throughout. The tool runs where the analyst’s laptop is, not only where a Linux server might be.

Precompute-then-query architecture for job-to-job skill similarity

What the engine supports

The same precomputed surface answers three planning scenarios without re-running the heavy work.

Reskilling programmes. Identify specific skill gaps between an employee’s current role and a target role, and target development effort on the gap rather than the broader profile.
Cross-department mobility. Surface non-obvious similar roles in business units the employee has never worked in. The most efficient moves are often the least visible ones.
Scenario modelling. When the job architecture itself changes, re-run the precompute and compare the mobility graph before and after. The platform tells you what mobility you have lost and what you have gained.

Conceptual mobility map showing similar roles connected by skill overlap

Evidence

35,000+ active role profiles supported
Pairwise similarity precomputed, queries return in sub-second time
Asymmetric similarity scoring, gap analysis surfaced per pair
Configuration-driven environment switching with no code changes
Power BI-ready outputs that drop into existing analyst workbooks

This is the enterprise counterpart to the rest of my people-analytics portfolio: large-scale data preparation, analyst-facing BI outputs, and insight that lands on a reskilling programme or a mobility review rather than dying in a notebook.