Skills Normalization

How intelligent skills matching works

All parsed resumes include normalized skills matched against a taxonomy of more than 32,000 skills, with automatic expansion of implied skills.

How It Works

1. Skill Extraction

The AI model extracts skills from:

  • Explicit skills sections
  • Job descriptions
  • Project descriptions
  • Certifications

2. Normalization

Each extracted skill is matched to our standard taxonomy:

  • Exact matches: "JavaScript" → skill: "JavaScript", match_method: "exact"
  • Aliases: "React JS", "ReactJS" → skill: "React", match_method: "alias"

When a skill matches, its skill field is replaced with the canonical name (the taxonomy's preferred synonym), and all known synonyms are returned in the synonyms array.
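The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual implementation: the in-memory `taxonomy` array, the `normalizeText` helper, and the rule that a case-insensitive match on the canonical name counts as "exact" are all assumptions made for the example.

```javascript
// Hypothetical two-entry taxonomy; the real taxonomy structure is not documented here.
const taxonomy = [
  { skill_id: "javascript", skill: "JavaScript", synonyms: ["javascript"] },
  { skill_id: "react", skill: "React", synonyms: ["react", "reactjs", "react js", "react.js"] },
];

// Assumed text normalization: trim, lowercase, collapse runs of whitespace.
const normalizeText = (text) => text.trim().toLowerCase().replace(/\s+/g, " ");

function matchSkill(rawText) {
  const key = normalizeText(rawText);
  for (const entry of taxonomy) {
    if (entry.synonyms.includes(key)) {
      // Assumption: matching the canonical name itself is "exact", any other synonym is "alias".
      const isExact = normalizeText(entry.skill) === key;
      return {
        skill: entry.skill, // canonical name replaces the original text
        skill_id: entry.skill_id,
        synonyms: entry.synonyms,
        match_method: isExact ? "exact" : "alias",
        implied_by: null,
      };
    }
  }
  // No taxonomy entry found: keep the original text and null out the taxonomy fields.
  return { skill: rawText, skill_id: null, synonyms: null, match_method: "none", implied_by: null };
}
```

With this sketch, `matchSkill("ReactJS")` yields the canonical "React" entry with match_method "alias", while `matchSkill("Internal Tool")` falls through to the unmatched shape.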

3. Implied Skills

Skills implied by matched skills are added to the result automatically. For example, matching "Django" implies knowledge of "Python":

  • Implied skills have match_method: "implied" and implied_by listing the originating skill IDs
  • Derived attributes (proficiency, years_experience, last_used) are calculated from the originating skills
  • Confidence uses a diminishing-returns formula based on implication strength
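One way to picture the expansion step is the sketch below. The `implications` table, its strength values, and the choice to skip skills that were already matched directly are assumptions for illustration; the document does not specify how the taxonomy stores implications.

```javascript
// Hypothetical implication table: source skill_id -> implied skills with a strength each.
const implications = {
  django: [{ skill_id: "python", strength: 0.9 }],
  flask: [{ skill_id: "python", strength: 0.8 }],
};

function collectImplied(matchedIds) {
  const matched = new Set(matchedIds);
  const implied = new Map();
  for (const sourceId of matchedIds) {
    for (const { skill_id, strength } of implications[sourceId] ?? []) {
      // Assumption: a skill that was matched directly is not also emitted as implied.
      if (matched.has(skill_id)) continue;
      if (!implied.has(skill_id)) {
        implied.set(skill_id, { skill_id, match_method: "implied", implied_by: [], strengths: [] });
      }
      const entry = implied.get(skill_id);
      entry.implied_by.push(sourceId); // originating skill IDs
      entry.strengths.push(strength); // kept so confidence can be computed afterwards
    }
  }
  return [...implied.values()];
}
```

Calling `collectImplied(["django", "flask"])` produces a single "python" entry whose implied_by lists both originating skills.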

4. Proficiency Estimation

Each skill is assessed for detailed proficiency based on the full resume context:

  • proficiency_score: Numeric score (0-1) representing estimated proficiency level
  • proficiency: Text label derived from the score — basic (<0.375), intermediate (0.375-0.625), advanced (0.625-0.875), expert (>=0.875)
  • years_experience: Approximate years of experience with the skill
  • last_used: When the skill was last used (YYYY-MM format)
  • confidence: Certainty in the proficiency estimate (0-1)

For implied skills, proficiency_score is derived from the parent skills' scores, weighted by each parent's confidence.
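The label thresholds and the implied-skill score can be sketched as below. Two points are assumptions: that a score sitting exactly on a shared boundary (0.375 or 0.625) falls into the higher band, and that "weighted by each parent's confidence" means a confidence-weighted mean. Both function names are hypothetical.

```javascript
// Map a 0-1 proficiency_score to its text label.
// Assumption: boundary values belong to the higher band, mirroring "expert (>=0.875)".
function proficiencyLabel(score) {
  if (score >= 0.875) return "expert";
  if (score >= 0.625) return "advanced";
  if (score >= 0.375) return "intermediate";
  return "basic";
}

// Assumed interpretation: confidence-weighted mean of the parents' proficiency scores.
function impliedProficiencyScore(parents) {
  const totalWeight = parents.reduce((sum, p) => sum + p.confidence, 0);
  if (totalWeight === 0) return null; // no usable parents
  const weighted = parents.reduce((sum, p) => sum + p.proficiency_score * p.confidence, 0);
  return weighted / totalWeight;
}
```

For instance, `proficiencyLabel(0.92)` returns "expert" and `proficiencyLabel(0.75)` returns "advanced", which agrees with the sample output in this document.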

Output Format

Matched Skill (alias)

{
  "skill": "React",
  "skill_id": "react",
  "synonyms": ["react", "reactjs", "react js", "react.js"],
  "proficiency": "expert",
  "proficiency_score": 0.92,
  "category": "Web Frameworks",
  "skill_type": "hard",
  "years_experience": 6,
  "last_used": "2026-01",
  "confidence": 1.0,
  "match_method": "alias",
  "implied_by": null
}

Implied Skill

{
  "skill": "Python",
  "skill_id": "python",
  "synonyms": ["python", "python3", "cpython"],
  "skill_type": "hard",
  "proficiency": "advanced",
  "proficiency_score": 0.75,
  "years_experience": 5.0,
  "last_used": "2026-01",
  "confidence": 0.98,
  "match_method": "implied",
  "implied_by": ["django", "flask"]
}

Unmatched Skill

{
  "skill": "Internal Tool",
  "skill_id": null,
  "synonyms": null,
  "match_method": "none",
  "implied_by": null
}

Fields

  • skill: Canonical skill name (preferred synonym on match, original text otherwise)
  • skill_id: Taxonomy skill identifier (string, null if unmatched)
  • synonyms: All known synonym names for the matched skill (null if unmatched)
  • proficiency: basic | intermediate | advanced | expert (derived from proficiency_score)
  • proficiency_score: Numeric proficiency (0-1); for implied skills, derived from parent skills
  • years_experience: Approximate years of experience with the skill
  • last_used: When skill was last used (YYYY-MM format)
  • category: Skill category (e.g., "Programming Languages")
  • skill_type: hard | soft (overwritten from taxonomy on match)
  • confidence: 1.0 for exact/alias matches; diminishing-returns formula for implied
  • match_method: exact | alias | implied | none
  • implied_by: List of skill IDs that implied this skill (set only for implied skills; null otherwise)
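Because unmatched skills carry null in skill_id and synonyms, consumers usually need to branch on match_method before reading taxonomy fields. The helper below is a hypothetical sketch of one way to do that; `groupByMatchMethod` is not part of any API described here.

```javascript
// Bucket skill objects by their match_method value.
function groupByMatchMethod(skills) {
  const groups = { exact: [], alias: [], implied: [], none: [] };
  for (const skill of skills) {
    const bucket = groups[skill.match_method];
    if (bucket) bucket.push(skill); // ignore values outside the four documented methods
  }
  return groups;
}

// Abbreviated sample objects following the field list above.
const sampleSkills = [
  { skill: "React", skill_id: "react", match_method: "alias", implied_by: null },
  { skill: "Python", skill_id: "python", match_method: "implied", implied_by: ["django", "flask"] },
  { skill: "Internal Tool", skill_id: null, match_method: "none", implied_by: null },
];

const grouped = groupByMatchMethod(sampleSkills);
// Only entries outside the "none" bucket have a skill_id that can be used as a key.
const keyedIds = [...grouped.exact, ...grouped.alias, ...grouped.implied].map((s) => s.skill_id);
```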

Implied Skill Confidence

For implied skills, confidence is calculated using a diminishing-returns formula:

  1. Sort implication strengths descending
  2. Start: c = s₁ (strongest)
  3. Each subsequent: c = c + (1 - c) × sₙ

Example with strengths [0.9, 0.8]: c = 0.9, then c = 0.9 + 0.1 × 0.8 = 0.98
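The three steps above translate directly into a short function. This is a sketch of the stated formula rather than the production code; the name `impliedConfidence` and the choice to return 0 for an empty list are assumptions.

```javascript
// Diminishing-returns confidence: start with the strongest implication,
// then let each further one close a fraction of the remaining gap to 1.
function impliedConfidence(strengths) {
  if (strengths.length === 0) return 0; // assumption: no implications means no confidence
  const sorted = [...strengths].sort((a, b) => b - a); // descending, without mutating the input
  return sorted.slice(1).reduce((c, s) => c + (1 - c) * s, sorted[0]);
}

// Strengths from the worked example above; the result is approximately 0.98.
const exampleConfidence = impliedConfidence([0.8, 0.9]);
```

Each step multiplies the remaining gap (1 - c) by (1 - s), so the result equals 1 minus the product of all (1 - s) terms. Every additional implication therefore raises confidence, but by less the closer it already is to 1.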

Benefits

Consistency

Candidates who write "React", "ReactJS", or "React.js" all resolve to the same canonical skill, which makes comparison straightforward.

Searchability

Search your candidate database by skill_id rather than by fragmented text variants.

Richer Profiles

Implied skills automatically expand candidate profiles: a "Django" developer is recognized as knowing "Python".

Analytics

Track skill trends using consistent taxonomy IDs and synonyms.

Using Normalized Data

Compare Candidates

// Both candidates resolve to the same canonical skill, so look it up by skill_id
const react1 = candidate1.skills.find(s => s.skill_id === "react")
const react2 = candidate2.skills.find(s => s.skill_id === "react")

Filter by Proficiency

const experts = candidates.filter(c =>
  c.skills.some(s =>
    s.skill_id === "react" &&
    s.proficiency === "expert"
  )
)
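Filtering can also use the numeric proficiency_score and last_used fields. The sketch below is illustrative only: `hasRecentSkill` and its default thresholds are hypothetical. It relies on the fact that zero-padded YYYY-MM strings sort chronologically under plain string comparison.

```javascript
// True if the candidate has the skill at or above minScore and used it in or after usedSince.
function hasRecentSkill(candidate, skillId, { minScore = 0.625, usedSince = "2024-01" } = {}) {
  return candidate.skills.some(
    (s) =>
      s.skill_id === skillId &&
      typeof s.proficiency_score === "number" && // some entries may omit the score
      s.proficiency_score >= minScore &&
      typeof s.last_used === "string" &&
      s.last_used >= usedSince // lexicographic comparison is safe for YYYY-MM
  );
}

// Abbreviated sample candidate built from the output format in this document.
const sampleCandidate = {
  skills: [{ skill_id: "react", proficiency_score: 0.92, last_used: "2026-01" }],
};
const recentReact = hasRecentSkill(sampleCandidate, "react", { usedSince: "2025-06" });
```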

Aggregate Skills

// Count occurrences of each matched skill; unmatched skills (null skill_id) are skipped
const skillCounts = resumes
  .flatMap(r => r.skills)
  .filter(s => s.skill_id)
  .reduce((acc, s) => {
    acc[s.skill_id] = (acc[s.skill_id] || 0) + 1
    return acc
  }, {})

Coming Soon

Custom skills taxonomy for Professional tier subscribers (define your own mappings and categories).