
Reimagining LinkedIn’s search tech stack

2026-01-21 · blog

LinkedIn has transformed its search experience by moving from keyword matching to semantic search powered by Large Language Models (LLMs). The shift lets the system interpret user intent more accurately while handling millions of queries per second and balancing quality against efficiency.

Key Concepts

  • Semantic Search Infrastructure:

    • Query Understanding: A unified LLM layer interprets intent and extracts structured facets (e.g., title, company), replacing brittle named-entity recognition (NER) pipelines; see the facet-extraction sketch after this list.
    • Retrieval: GPU-enabled Embedding-Based Retrieval (EBR) encodes queries and documents into a shared semantic space and finds candidates with exhaustive k-nearest-neighbor (k-NN) vector search; see the retrieval sketch after this list.
    • Ranking: A Cross-Encoder Small Language Model (SLM) served on SGLang reranks the candidates, combining query, job, and member features into relevance scores; see the reranking sketch after this list.
  • Quality Measurement:

    • Product Policy: Product Managers define "golden" grades and policies, acting as a "Supreme Court" to resolve ambiguities.
    • LLM Judge: A pipeline in which large LLMs, distilled into 8B-parameter models, grade tens of millions of query-document pairs daily, keeping evaluation aligned with product policy; see the judge sketch after this list.
  • Efficiency & Scalability:

    • Model Pruning: Structured pruning removes entire neurons, attention heads, or layers to shrink models for efficient GPU execution; see the pruning sketch after this list.
    • Context Pruning: Long descriptions are summarized by a 1.7B LLM to fit context windows without losing semantic value.
    • Embedding Compression: Text is condensed into single-token embeddings to reduce inference costs.
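
The sketches below are minimal approximations of the mechanisms summarized above, not LinkedIn's implementation. First, query understanding: a single LLM prompt returns intent and facets as JSON in place of separate NER models. The complete() helper, the prompt wording, and the facet schema are placeholders for whatever serving stack and facet taxonomy you actually use.

```python
# Sketch of unified query understanding: one LLM call returns intent plus
# structured facets as JSON. complete() is a hypothetical stand-in for an LLM
# endpoint; a canned response keeps the example runnable.
import json

FACET_PROMPT = """Extract search facets from the job-search query below.
Return JSON with keys: intent, title, company, location (use null if absent).

Query: {query}
JSON:"""

def complete(prompt: str) -> str:
    # Placeholder: call your LLM here.
    return ('{"intent": "job_search", "title": "machine learning engineer", '
            '"company": "LinkedIn", "location": null}')

def understand(query: str) -> dict:
    return json.loads(complete(FACET_PROMPT.format(query=query)))

print(understand("machine learning engineer jobs at LinkedIn"))
```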
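
Retrieval can be sketched with open-source embeddings: queries and documents are encoded into one vector space, and candidates come from exhaustive k-NN over normalized vectors. The sentence-transformers model name and the tiny in-memory corpus are stand-ins; the production system uses LinkedIn's own encoders over a GPU-resident index.

```python
# Sketch of embedding-based retrieval (EBR): encode query and documents into a
# shared space, then brute-force k-NN by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

jobs = [
    "Senior Machine Learning Engineer, Search Relevance",
    "Staff Software Engineer, Distributed Systems",
    "Product Manager, Job Marketplace",
]
query = "ML engineer working on search ranking"

# Normalized embeddings make the dot product equal to cosine similarity.
doc_vecs = encoder.encode(jobs, normalize_embeddings=True)          # (N, d)
query_vec = encoder.encode([query], normalize_embeddings=True)[0]   # (d,)

# Exhaustive (brute-force) k-NN: score every document, keep the top k.
scores = doc_vecs @ query_vec
for rank, idx in enumerate(np.argsort(-scores)[:2], start=1):
    print(f"{rank}. {jobs[idx]}  (cosine={scores[idx]:.3f})")
```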
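
Reranking can be illustrated with a cross-encoder that reads the query and each candidate together and emits one relevance score per pair. The public MS MARCO checkpoint is a stand-in for LinkedIn's SLM served on SGLang, and the member features mentioned above are omitted.

```python
# Sketch of cross-encoder reranking over retrieved candidates. The checkpoint
# is a public stand-in; the production SLM also consumes job and member
# features and runs on SGLang.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # placeholder model

query = "ML engineer working on search ranking"
candidates = [
    "Senior Machine Learning Engineer, Search Relevance",
    "Staff Software Engineer, Distributed Systems",
]

# Each (query, document) pair is scored jointly, unlike the bi-encoder above.
scores = reranker.predict([(query, doc) for doc in candidates])
for doc, score in sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True):
    print(f"{score:.3f}  {doc}")
```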
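
For quality measurement, an LLM judge grades query-document pairs against a rubric derived from the product policy, and the grades feed offline relevance metrics. The complete() stub, the grade scale, and the rubric wording below are illustrative assumptions, not LinkedIn's golden grades.

```python
# Sketch of an LLM-judge grading loop over (query, document) pairs. complete()
# is a hypothetical call to a distilled judge model; the 1-3 grade scale and
# rubric text are illustrative only.
JUDGE_PROMPT = """Grade job-search relevance according to the product policy.
Grades: 3 = excellent match, 2 = partial match, 1 = off-topic.

Query: {query}
Job posting: {doc}
Answer with the grade only:"""

def complete(prompt: str) -> str:
    # Placeholder: call the distilled judge model here.
    return "3"

def judge(query: str, doc: str) -> int:
    return int(complete(JUDGE_PROMPT.format(query=query, doc=doc)).strip())

pairs = [("ml engineer", "Senior Machine Learning Engineer, Search Relevance")]
print([(q, d, judge(q, d)) for q, d in pairs])
```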
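
Finally, a toy view of structured pruning: whole hidden neurons are removed by slicing the weight matrices, so the pruned layer remains a smaller dense matmul that maps cleanly onto GPU kernels. Real pruning of attention heads or layers adds importance estimation and retraining, which this sketch skips.

```python
# Toy structured pruning of a two-layer MLP: drop the least important hidden
# neurons (rows of W1 / columns of W2) so the layer shrinks but stays dense.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 8, 16, 4
W1, b1 = rng.normal(size=(d_hidden, d_in)), np.zeros(d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))

# Importance of each hidden neuron: norm of its incoming plus outgoing weights.
importance = np.linalg.norm(W1, axis=1) + np.linalg.norm(W2, axis=0)
keep = np.sort(np.argsort(-importance)[: d_hidden // 2])  # keep the top half

W1_p, b1_p, W2_p = W1[keep], b1[keep], W2[:, keep]

x = rng.normal(size=d_in)
full = W2 @ np.maximum(W1 @ x + b1, 0.0)
pruned = W2_p @ np.maximum(W1_p @ x + b1_p, 0.0)
print("hidden units:", d_hidden, "->", len(keep))
print("max output drift:", float(np.abs(full - pruned).max()))
```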