Skills Normalization

How intelligent skills matching works

Every parsed resume includes normalized skills, matched against a taxonomy of more than 32,000 skills and automatically expanded with implied skills.

How It Works

1. Skill Extraction

The AI model extracts skills from:

Explicit skills sections
Job descriptions
Project descriptions
Certifications

2. Normalization

Each extracted skill is matched to our standard taxonomy:

Exact matches: "JavaScript" → skill: "JavaScript", match_method: "exact"
Aliases: "React JS", "ReactJS" → skill: "React", match_method: "alias"

On a match, the skill field is set to the canonical name (the taxonomy's preferred synonym), and all known synonyms are returned in the synonyms array.
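The matching step above can be sketched as a synonym lookup. This is a minimal illustration, not the service's actual implementation: the two-entry TAXONOMY, the normalizeSkill function, and the rule that "exact" means a case-insensitive match on the canonical name are all assumptions made for the example.

```javascript
// Hypothetical two-entry taxonomy; the real taxonomy and its lookup rules are not shown here.
const TAXONOMY = [
  { skill_id: "javascript", skill: "JavaScript", synonyms: ["javascript"] },
  { skill_id: "react", skill: "React", synonyms: ["react", "reactjs", "react js", "react.js"] },
]

// Build a lookup from lowercased synonym to taxonomy entry.
const bySynonym = new Map()
for (const entry of TAXONOMY) {
  for (const synonym of entry.synonyms) bySynonym.set(synonym, entry)
}

function normalizeSkill(rawText) {
  const key = rawText.trim().toLowerCase()
  const entry = bySynonym.get(key)
  if (!entry) {
    // Unmatched: keep the original text and leave the taxonomy fields null.
    return { skill: rawText, skill_id: null, synonyms: null, match_method: "none", implied_by: null }
  }
  // Assumption: "exact" means the text equals the canonical name (ignoring case);
  // a match on any other synonym counts as an "alias".
  const matchMethod = key === entry.skill.toLowerCase() ? "exact" : "alias"
  return {
    skill: entry.skill,
    skill_id: entry.skill_id,
    synonyms: entry.synonyms,
    match_method: matchMethod,
    implied_by: null,
  }
}
```

Under these assumptions, normalizeSkill("ReactJS") returns the canonical name "React" with match_method "alias", while text that is not in the taxonomy comes back unchanged with skill_id set to null.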

3. Implied Skills

Skills implied by a matched skill are added automatically. For example, matching "Django" implies knowledge of "Python". Implied skills behave as follows:

Implied skills have match_method: "implied" and implied_by listing the originating skill IDs
Derived attributes (proficiency, years_experience, last_used) are calculated from the originating skills
Confidence uses a diminishing-returns formula based on implication strength
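One way to picture the expansion step is a table that maps each source skill to the skills it implies. The sketch below is illustrative only: the IMPLICATIONS table, its strength values, the collectImplied function, and the choice to skip skills that were already matched directly are assumptions, not documented behavior.

```javascript
// Hypothetical implication table: source skill ID -> implied skills, each with a strength in 0-1.
const IMPLICATIONS = {
  django: [{ skill_id: "python", strength: 0.9 }],
  flask: [{ skill_id: "python", strength: 0.8 }],
}

function collectImplied(matchedSkillIds) {
  const matched = new Set(matchedSkillIds)
  const implied = new Map() // implied skill ID -> partial skill entry
  for (const sourceId of matched) {
    for (const { skill_id, strength } of IMPLICATIONS[sourceId] ?? []) {
      // Assumption: a skill that was matched directly is not added again as implied.
      if (matched.has(skill_id)) continue
      const entry = implied.get(skill_id) ??
        { skill_id, match_method: "implied", implied_by: [], strengths: [] }
      entry.implied_by.push(sourceId) // record the originating skill ID
      entry.strengths.push(strength)  // kept so a confidence can be computed later
      implied.set(skill_id, entry)
    }
  }
  return [...implied.values()]
}
```

With this hypothetical table, collectImplied(["django", "flask"]) yields a single "python" entry whose implied_by lists both originating skills.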

4. Proficiency Estimation

Each skill is assessed for detailed proficiency based on the full resume context:

proficiency_score: Numeric score (0-1) representing estimated proficiency level
proficiency: Text label derived from the score: basic (<0.375), intermediate (0.375 to <0.625), advanced (0.625 to <0.875), expert (>=0.875)
years_experience: Approximate years of experience with the skill
last_used: When the skill was last used (YYYY-MM format)
confidence: Certainty in the proficiency estimate (0-1)
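The score-to-label mapping can be reproduced client-side, for example to label a score you have adjusted yourself. This sketch uses the thresholds listed above and assumes each band includes its lower bound (so 0.625 counts as advanced); proficiencyLabel is a hypothetical helper name, not part of the API.

```javascript
// Map a proficiency_score in 0-1 to its text label.
// Assumption: each band includes its lower bound and excludes its upper bound.
function proficiencyLabel(score) {
  if (typeof score !== "number" || Number.isNaN(score) || score < 0 || score > 1) {
    throw new RangeError("proficiency_score must be a number between 0 and 1")
  }
  if (score >= 0.875) return "expert"
  if (score >= 0.625) return "advanced"
  if (score >= 0.375) return "intermediate"
  return "basic"
}
```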

For implied skills, proficiency_score is derived from the parent skills' scores, weighted by each parent's confidence.
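The exact weighting is not specified here. A plausible reading is a confidence-weighted average of the parent scores, sketched below; impliedProficiencyScore is a hypothetical name, and returning null when every parent has zero confidence is an assumption.

```javascript
// Confidence-weighted mean of parent proficiency scores.
// Assumption: "weighted by each parent's confidence" means a weighted average.
function impliedProficiencyScore(parents) {
  const totalWeight = parents.reduce((sum, p) => sum + p.confidence, 0)
  if (totalWeight === 0) return null // no usable parents (assumed behavior)
  const weightedSum = parents.reduce(
    (sum, p) => sum + p.proficiency_score * p.confidence,
    0
  )
  return weightedSum / totalWeight
}
```

Under this reading, a parent with low confidence pulls the implied score toward its own value less than a high-confidence parent does.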

Output Format

Matched Skill (alias)

{
  "skill": "React",
  "skill_id": "react",
  "synonyms": ["react", "reactjs", "react js", "react.js"],
  "proficiency": "expert",
  "proficiency_score": 0.92,
  "category": "Web Frameworks",
  "skill_type": "hard",
  "years_experience": 6,
  "last_used": "2026-01",
  "confidence": 1.0,
  "match_method": "alias",
  "implied_by": null
}

Implied Skill

{
  "skill": "Python",
  "skill_id": "python",
  "synonyms": ["python", "python3", "cpython"],
  "skill_type": "hard",
  "proficiency": "advanced",
  "proficiency_score": 0.75,
  "years_experience": 5.0,
  "last_used": "2026-01",
  "confidence": 0.98,
  "match_method": "implied",
  "implied_by": ["django", "flask"]
}

Unmatched Skill

{
  "skill": "Internal Tool",
  "skill_id": null,
  "synonyms": null,
  "match_method": "none",
  "implied_by": null
}

Fields

skill: Canonical skill name (preferred synonym on match, original text otherwise)
skill_id: Taxonomy skill identifier (string, null if unmatched)
synonyms: All known synonym names for the matched skill (null if unmatched)
proficiency: basic | intermediate | advanced | expert (derived from proficiency_score)
proficiency_score: Numeric proficiency (0-1); for implied skills, derived from parent skills
years_experience: Approximate years of experience with the skill
last_used: When skill was last used (YYYY-MM format)
category: Skill category (e.g., "Programming Languages")
skill_type: hard | soft (overwritten from taxonomy on match)
confidence: 1.0 for exact/alias matches; diminishing-returns formula for implied
match_method: exact | alias | implied | none
implied_by: List of skill IDs that implied this skill (null unless match_method is implied)
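Because match_method tells you how each entry was produced, a consumer may want to separate direct matches from implied and unmatched skills before using them. The following sketch shows one way to do that; groupByMatchMethod is a hypothetical helper, and silently skipping unrecognized values is an assumption.

```javascript
// Split a skills array into buckets keyed by match_method.
function groupByMatchMethod(skills) {
  const groups = { exact: [], alias: [], implied: [], none: [] }
  for (const skill of skills) {
    // Skip values outside the four documented methods (assumed handling).
    if (Object.prototype.hasOwnProperty.call(groups, skill.match_method)) {
      groups[skill.match_method].push(skill)
    }
  }
  return groups
}
```

For instance, the none bucket collects entries whose skill_id is null, which can be reviewed separately rather than counted in taxonomy-based reports.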

Implied Skill Confidence

For implied skills, confidence is calculated using a diminishing-returns formula:

Sort implication strengths descending
Start: c = s₁ (strongest)
Each subsequent: c = c + (1 - c) × sₙ

Example with strengths [0.9, 0.8]: c = 0.9, then c = 0.9 + 0.1 × 0.8 = 0.98
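The steps above translate directly into a short function. This is a sketch for checking values by hand rather than the service's own code; impliedConfidence is a hypothetical name, and returning 0 for an empty list is an assumption.

```javascript
// Combine implication strengths with the diminishing-returns rule:
// start from the strongest, then apply c = c + (1 - c) * s for each remaining strength.
function impliedConfidence(strengths) {
  if (strengths.length === 0) return 0 // assumed behavior for no implications
  const sorted = [...strengths].sort((a, b) => b - a) // descending, without mutating the input
  return sorted.slice(1).reduce((c, s) => c + (1 - c) * s, sorted[0])
}

// Strengths from the example above; the result is approximately 0.98.
const exampleConfidence = impliedConfidence([0.9, 0.8])
```

The recurrence is algebraically equal to 1 minus the product of (1 - s) over all strengths, so each additional implication raises the confidence by a smaller amount and the result never exceeds 1.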

Benefits

Consistency

Whether a candidate writes "React", "ReactJS", or "React.js", the skill resolves to the same canonical entry, so candidates can be compared directly.

Searchability

Search your candidate database by skill_id rather than by fragmented text variants.

Richer Profiles

Implied skills automatically expand candidate profiles — a "Django" developer is recognized as knowing "Python".

Analytics

Track skill trends using consistent taxonomy IDs and synonyms.

Using Normalized Data

Compare Candidates

// Look up the same canonical skill on both candidates by skill_id
const react1 = candidate1.skills.find(s => s.skill_id === "react")
const react2 = candidate2.skills.find(s => s.skill_id === "react")

Filter by Proficiency

const experts = candidates.filter(c =>
  c.skills.some(s =>
    s.skill_id === "react" &&
    s.proficiency === "expert"
  )
)

Aggregate Skills

const skillCounts = resumes
  .flatMap(r => r.skills)
  .filter(s => s.skill_id)
  .reduce((acc, s) => {
    acc[s.skill_id] = (acc[s.skill_id] || 0) + 1
    return acc
  }, {})
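A count object like the one above is often easier to read as a ranked list. The sketch below is one way to do that; topSkills and its limit parameter are hypothetical names introduced for this example.

```javascript
// Turn a { skill_id: count } object into the most frequent skills, highest count first.
function topSkills(skillCounts, limit = 10) {
  return Object.entries(skillCounts)
    .sort(([, countA], [, countB]) => countB - countA)
    .slice(0, limit)
    .map(([skill_id, count]) => ({ skill_id, count }))
}
```

Calling topSkills(skillCounts, 5) would return the five most common taxonomy IDs across the aggregated resumes.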

Coming Soon

Custom skills taxonomy for Professional tier subscribers (define your own mappings and categories).
