Reimagining LinkedIn’s search tech stack
2026-01-21 • blog
LinkedIn has transformed its search experience by moving from keyword matching to semantic search powered by large language models (LLMs). This shift lets the system interpret user intent more accurately while handling millions of queries per second, balancing quality with efficiency.
Key Concepts
- Semantic Search Infrastructure:
- Query Understanding: A unified LLM layer interprets intent and extracts facets (e.g., title, company), replacing brittle named-entity recognition (NER) pipelines (a facet-extraction sketch follows this list).
- Retrieval: GPU-enabled Embedding-Based Retrieval (EBR). Queries and documents are encoded into a shared semantic space, and exhaustive vector search (k-NN) finds candidates.
- Ranking: A cross-encoder Small Language Model (SLM) deployed on SGLang refines the candidates, combining query, job, and member features to produce relevance scores (a retrieve-then-rerank sketch also follows below).
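To make the query-understanding step concrete, here is a minimal sketch of facet extraction with a single LLM call. The prompt wording, the facet schema, and the `call_llm` stub are illustrative assumptions; LinkedIn's internal model and prompts are not public.

```python
import json

# Hypothetical stub for whichever LLM endpoint serves query understanding;
# LinkedIn's internal model and serving API are not public.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

FACET_PROMPT = """Extract structured search facets from the query below.
Return JSON with keys "title", "company", "location", and "skills" (a list).
Use null for any facet that is not present.

Query: {query}
JSON:"""

def understand_query(query: str) -> dict:
    # One LLM call replaces a pipeline of brittle per-facet NER taggers.
    raw = call_llm(FACET_PROMPT.format(query=query))
    return json.loads(raw)

# Example: understand_query("senior rust engineer at fintech startups in Berlin")
# might yield {"title": "senior rust engineer", "company": None,
#              "location": "Berlin", "skills": ["rust"]}
```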
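And a compact sketch of the retrieve-then-rerank flow, using public sentence-transformers models as stand-ins for LinkedIn's GPU-served bi-encoder and cross-encoder SLM. The two-stage structure, not the specific models, is the point.

```python
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

# Public models used as stand-ins for the internal EBR encoder and ranking SLM.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def search(query: str, docs: list[str], k: int = 50, top_n: int = 10):
    # Stage 1, embedding-based retrieval: encode query and documents into a
    # shared vector space and run exhaustive (brute-force) k-NN by dot product.
    doc_embs = bi_encoder.encode(docs, normalize_embeddings=True)
    q_emb = bi_encoder.encode([query], normalize_embeddings=True)[0]
    sims = doc_embs @ q_emb
    candidates = np.argsort(-sims)[: min(k, len(docs))]

    # Stage 2, cross-encoder reranking: score each (query, candidate) pair
    # jointly for a sharper relevance estimate than the bi-encoder alone.
    scores = reranker.predict([(query, docs[i]) for i in candidates])
    order = np.argsort(-scores)
    return [(docs[candidates[i]], float(scores[i])) for i in order[:top_n]]
```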
- Quality Measurement:
- Product Policy: Product Managers define "golden" grades and policies, acting as a "Supreme Court" to resolve ambiguities.
- LLM Judge: A pipeline in which large LLMs, distilled into 8B-parameter models, grade tens of millions of query-document pairs daily, keeping scores aligned with product policy (a grading sketch follows this list).
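A minimal sketch of what an LLM-judge grading call can look like. The rubric text, the JSON schema, and the `call_judge` stub are assumptions; the post only describes the scale (a distilled ~8B judge scoring tens of millions of pairs per day) and the alignment with product policy.

```python
import json

GRADING_POLICY = (
    "You are a relevance judge. Apply the product policy: "
    "2 = fully relevant, 1 = partially relevant, 0 = not relevant. "
    'Return JSON like {"grade": 1, "reason": "<one sentence>"}.'
)

# Hypothetical client for the distilled judge model; plug in whatever
# serving stack hosts it.
def call_judge(system_prompt: str, user_prompt: str) -> str:
    raise NotImplementedError

def grade_pair(query: str, document: str) -> dict:
    user = f"Query: {query}\nDocument: {document}\nGrade this pair."
    return json.loads(call_judge(GRADING_POLICY, user))

def grade_batch(pairs: list[tuple[str, str]]) -> list[dict]:
    # At production scale (tens of millions of pairs per day) this would run
    # as a distributed batch job rather than a Python loop.
    return [grade_pair(q, d) for q, d in pairs]
```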
- Efficiency & Scalability:
- Model Pruning: Structured pruning removes entire neurons, attention heads, or layers, shrinking the model for efficient GPU execution (see the pruning sketch below).
- Context Pruning: Long descriptions are summarized by a 1.7B-parameter LLM so they fit the context window without losing semantic value.
- Embedding Compression: Text is condensed into single-token embeddings to reduce inference cost (both techniques are sketched after this list).
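As one concrete example of structured pruning, the sketch below drops whole hidden neurons from a feed-forward block so that both weight matrices physically shrink, which is what makes the result GPU-friendly, unlike unstructured zero-masking. The magnitude-based importance score is a simplifying assumption; production systems typically use learned importance signals.

```python
import torch
import torch.nn as nn

def prune_ffn_neurons(linear_in: nn.Linear, linear_out: nn.Linear,
                      keep_ratio: float = 0.5):
    """Structured pruning sketch: remove whole hidden neurons from an FFN
    block, shrinking both weight matrices instead of zeroing entries."""
    hidden = linear_in.out_features
    keep = max(1, int(hidden * keep_ratio))
    # Rank neurons by the L2 norm of their incoming weights (a simple
    # importance proxy for illustration).
    importance = linear_in.weight.norm(dim=1)
    kept = torch.topk(importance, keep).indices.sort().values

    new_in = nn.Linear(linear_in.in_features, keep, bias=linear_in.bias is not None)
    new_out = nn.Linear(keep, linear_out.out_features, bias=linear_out.bias is not None)
    with torch.no_grad():
        new_in.weight.copy_(linear_in.weight[kept])
        if linear_in.bias is not None:
            new_in.bias.copy_(linear_in.bias[kept])
        new_out.weight.copy_(linear_out.weight[:, kept])
        if linear_out.bias is not None:
            new_out.bias.copy_(linear_out.bias)
    return new_in, new_out
```

For example, pruning an FFN block of `nn.Linear(768, 3072)` and `nn.Linear(3072, 768)` with `keep_ratio=0.5` yields `nn.Linear(768, 1536)` and `nn.Linear(1536, 768)`, roughly halving that block's FLOPs.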
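Context pruning and embedding compression can be sketched together, since both shrink what the online ranker has to read. The `summarize` stub stands in for the 1.7B-parameter summarizer mentioned in the post (the size is given, the model is not), and the public encoder is a stand-in for the internal embedding model.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for the internal embedder

# Hypothetical stub for the small (~1.7B-parameter) summarizer.
def summarize(text: str, max_words: int = 200) -> str:
    raise NotImplementedError("plug in a small instruction-tuned LLM")

def prune_context(description: str, budget_words: int = 200) -> str:
    # Context pruning: keep short descriptions verbatim, summarize long ones so
    # the ranker's context window is spent on semantically useful text.
    if len(description.split()) <= budget_words:
        return description
    return summarize(description, max_words=budget_words)

def compress_to_single_token(description: str) -> np.ndarray:
    # Embedding compression: fold the whole text into one dense vector, computed
    # offline, so online inference pays for one "token" instead of hundreds.
    return encoder.encode([description], normalize_embeddings=True)[0]
```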