Elasticsearch

Last updated on Dec 15, 2025


Basic Concepts
Types of Nodes
APIs
Mapping & Schema
Search & Query Concepts
FAQs

Basic Concepts

Document - The basic unit of data, stored as JSON
Index - A collection of documents
Mapping - The schema of the documents in an index. It defines the fields a document can have, the data type of each field, and other properties such as indexing behavior
Field - A key-value pair inside a document
Lucene - The core search library underlying Elasticsearch. It provides indexing, scoring, searching, segment management, etc.
Segment - A low-level storage unit inside a Lucene index. It is an immutable block of indexed documents.
Shard - A Lucene instance that stores part of an index
Primary Shard - The original shard that receives documents for indexing. All write operations go here first; after indexing into the primary, Elasticsearch replicates the operation to the replicas. Primary shards are created when the index is created.
Replica Shard - A copy of a primary shard. Replicas provide high availability and spread the load of search queries. They serve only read/search requests from clients. A replica is never allocated on the same node as its primary shard.
Cluster - A collection of one or more nodes working together
Node - A single Elasticsearch instance in the cluster
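
To make the shard and replica terms concrete, here is a minimal sketch of the body one might send with a create-index request. The index name `articles` and the shard counts are illustrative assumptions, and the snippet only builds and prints the JSON; it does not contact a cluster.

```python
import json


def build_create_index_body(primary_shards: int, replicas_per_primary: int) -> dict:
    """Build the settings body for a create-index request (PUT /<index>)."""
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas_per_primary < 0:
        raise ValueError("replica count cannot be negative")
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_primary,
        }
    }


def total_shard_copies(body: dict) -> int:
    """Primaries plus every replica copy the cluster will try to allocate."""
    settings = body["settings"]
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])


body = build_create_index_body(primary_shards=3, replicas_per_primary=1)
print("PUT /articles")  # hypothetical index name
print(json.dumps(body, indent=2))
print("shard copies:", total_shard_copies(body))
```

With 3 primaries and 1 replica each, the cluster has 6 shard copies to place, and each replica must land on a different node from its primary.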

Types of Nodes

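Master Node
- A master-eligible node is one that can be elected as the cluster's master.
- The elected master maintains the cluster state.
- Master-eligible nodes take part in electing the master.
- The master manages creating/deleting indices, assigning shards to nodes, and other cluster-wide operations.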
Data Node
- Stores the actual data and handles indexing, search queries and aggregations
- Manages shard storage and operations
Ingest Node
- Preprocesses and transforms data before indexing
Coordinating only Node
- A node becomes coordinating only when all other roles are disabled
- Coordinates search requests across the cluster. It receives the search request from the client, forwards it to the nodes holding the relevant shards, and merges their results.
ML Node
- Supports the machine learning features of Elasticsearch, such as anomaly detection, regression, classification, and outlier detection.
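
In recent Elasticsearch versions, roles are typically assigned through the `node.roles` setting in `elasticsearch.yml`. The sketch below is only an illustration of how a role list maps to the node types above; the helper `describe_node` is hypothetical and not part of any Elasticsearch client.

```python
from typing import List

# Role names as they would appear in node.roles, with a readable label.
ROLE_LABELS = {
    "master": "master-eligible",
    "data": "data",
    "ingest": "ingest",
    "ml": "machine learning",
}


def describe_node(roles: List[str]) -> str:
    """Summarise what a node does based on its configured roles."""
    if not roles:
        # An empty role list leaves only request coordination.
        return "coordinating only"
    # Unknown role names are passed through unchanged.
    return ", ".join(ROLE_LABELS.get(role, role) for role in roles)


print(describe_node(["master", "data"]))
print(describe_node([]))
```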

APIs

Index API - add or replace a single document in the index
Get API - get a single document by id
Update API - modify a single document by id using partial updates (send only fields that need to be modified), scripted update (using painless script to perform complex logic like incrementing a counter), upserts (if document doesn’t exist then create it, else update it)
Delete API - delete a document by id
Bulk API - batch index/update/delete/create operations
Multi-Get (MGET) API - retrieve multiple documents from one or more indices by id
Delete By Query API - Delete all documents that match a query
Update By Query API - Update all documents that match a query
Reindex API - Copy data from one index to another. This is usually used when changing field mappings, upgrading cluster versions, changing index settings, transforming data, etc.
Refresh API
- A refresh makes recent operations performed on indices available for search. By default, Elasticsearch performs a refresh every 1 second. This can be changed using the index.refresh_interval setting
- Write requests also accept a refresh query parameter with the following values
  - false (default) - Do not refresh after this operation. The document becomes searchable after the next scheduled refresh
  - true - performs a refresh immediately after the operation
  - wait_for - the request waits until the next scheduled refresh, ensuring the operation is searchable before returning. No refresh is forced
- The refresh parameter can be used with the following operations - index, update, update_by_query, delete, delete_by_query, _bulk
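
As a rough illustration of the three `refresh` values, the sketch below builds the method and path for indexing a single document. The index name, document id, and the helper `index_request` are assumptions for the example; nothing is sent over the network.

```python
from urllib.parse import urlencode

ALLOWED_REFRESH = {"true", "false", "wait_for"}


def index_request(index: str, doc_id: str, refresh: str = "false") -> tuple:
    """Return (method, path) for indexing one document with a refresh policy."""
    if refresh not in ALLOWED_REFRESH:
        raise ValueError(f"refresh must be one of {sorted(ALLOWED_REFRESH)}")
    path = f"/{index}/_doc/{doc_id}"
    if refresh != "false":
        # "false" is the default, so the parameter is only added when needed.
        path += "?" + urlencode({"refresh": refresh})
    return ("PUT", path)


print(index_request("articles", "1"))
print(index_request("articles", "1", refresh="wait_for"))
```

`wait_for` is usually the gentler choice when a caller must read its own write, because it does not force an extra refresh.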
Search API - Perform full-text search, filtering, sorting, and aggregations
Explain API - a debugging tool used to understand why a specific document received its search score, or why it matched (or didn’t match) a specific query.
Profile API - to analyze query execution performance
Multi Search API - run multiple search queries in a single request
Validate Query API - used to validate a query without executing it
Field Capabilities API - used to get information about the capabilities of specific fields across one or more indices
Rank Evaluation API - evaluate quality of ranked search results
Open/Close index API - close indices to reduce resource usage. Can be reopened later
Aliases API - add/remove/update index aliases
Mapping API - view field mappings or add new fields to an existing mapping
Shrink / Split / Clone index API - used to manage and optimize shard distribution within a cluster
Force Merge API - merges the segments of a shard into fewer, larger segments, which helps read-heavy workloads
Flush API - to permanently store in-memory index operations to disk and clear the internal transaction log (translog)
Clear Cache API - to manually evict data from internal memory caches to free up resources or prepare for performance testing
Index Stats API - get statistics on indexing, search, caching, merges
Segments API - inspect lucene segments inside shards.
Cluster Health API - view cluster health (green/yellow/red)
Cluster State API - used for debugging, diagnostics, and monitoring
Cluster Stats API - view operational statistics across nodes
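Cluster Settings API - view or change cluster-wide settings such as shard allocation and recovery behavior. Index-level settings like the number of replicas and the refresh interval are changed through the index settings API instead, and the number of primary shards is fixed at index creation unless the index is shrunk or split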
Pending Tasks API - view tasks waiting to be processed by the cluster
Reroute API - used to manually move shards across the cluster
Allocation Explain API - used to identify why shards are not being distributed as expected within a cluster
Nodes Info API - view modules, plugins, roles, JVM info etc
Nodes Stats API - view node-level metrics such as memory, CPU, cache, utilization, latency, throughput etc.
Hot Threads API - a diagnostic tool to identify performance bottlenecks by providing a snapshot of the busiest Java threads running on a node
Usage API - provides insights into how system features are being utilized, how much storage different components consume, and how frequently specific data structures are accessed
Cat APIs - provide a human-readable, command-line friendly way to quickly monitor and troubleshoot your cluster
- /_cat/indices - lists all the indices in the cluster
- /_cat/nodes - shows a summary of all the nodes in the cluster
- /_cat/shards - shows shard level details for every index
- /_cat/health - shows quick view of cluster wide health
- /_cat/count - gives total number of documents in selected indices
- /_cat/aliases - shows alias to index mapping
- /_cat/repositories - lists snapshot repositories registered in the cluster
- /_cat/snapshots - lists snapshots inside a specific repository
Ingest Pipeline API - create/manage pipelines made of processors: Put Pipeline / Get Pipeline / Delete Pipeline / Simulate Pipeline
Nodes Stats for Ingest - track ingest performance.
Snapshot API - create/restore/get/delete snapshots of indices/cluster state.
Snapshot Repository API - Register local or cloud-backed repositories.
User APIs - create/update/delete users
Role APIs - assign permissions
Role Mapping API - map users/groups to roles
API Keys API - create/manage API keys
Token API - manage access tokens for login flows
SSL / Certificates API - view information about the TLS certificates used by the cluster
ML APIs - Anomaly Detection, Data Frame Analytics, Model Management, Deployment etc
Index Lifecycle Management API - define policies: hot, warm, cold, delete phases.
Explain Lifecycle API - see which lifecycle phase an index is in
Script API - used for managing, storing, testing, and executing custom scripts within the cluster
Template API - manages index templates, the mechanism by which Elasticsearch applies settings, mappings, and other configurations when creating indices
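
Of the APIs listed above, the Bulk API has the least obvious request format: the body is newline-delimited JSON in which each action line is followed, for index operations, by the document source. The following sketch assembles such a body under assumed index and field names; `build_bulk_body` is a hypothetical helper, and the result would be sent to `POST /_bulk`.

```python
import json


def build_bulk_body(index: str, docs_to_index: dict, ids_to_delete: list) -> str:
    """Assemble an NDJSON body mixing index and delete actions."""
    lines = []
    for doc_id, doc in docs_to_index.items():
        # Action line, then the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    for doc_id in ids_to_delete:
        # Delete actions have no source line.
        lines.append(json.dumps({"delete": {"_index": index, "_id": doc_id}}))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body(
    "articles",
    {"1": {"title": "Intro to shards"}, "2": {"title": "Refresh explained"}},
    ["7"],
)
print(bulk_body, end="")
```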

Mapping & Schema

Mapping defines the data type of each field and its indexing behavior
Automatic field creation and mapping can be achieved using dynamic mapping. Elasticsearch creates new fields automatically based on incoming document structure.
Manual schema can be achieved using explicit mapping
Use {index_name}/_mapping to view mappings or add new fields. The type of an existing field cannot be changed in place; that requires reindexing into a new index
The following field types are supported - keyword, text, numeric types, date, boolean, geo_point, geo_shape, nested, object
Analyzers on text fields determine the tokenization strategy, stemming, case folding, stopwords, etc.
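
A minimal sketch of an explicit mapping may help here. Every field name below is an assumption chosen for illustration; the body would be sent when creating the index, and `"dynamic": "strict"` is used so that documents with unmapped fields are rejected rather than mapped automatically.

```python
import json


def build_mapping_body() -> dict:
    """Explicit mapping for a hypothetical articles index."""
    return {
        "mappings": {
            "dynamic": "strict",  # reject documents containing unmapped fields
            "properties": {
                "title": {"type": "text", "analyzer": "english"},  # analyzed full text
                "status": {"type": "keyword"},  # exact-match value
                "views": {"type": "integer"},
                "published_at": {"type": "date"},
                "location": {"type": "geo_point"},
                "comments": {  # array of objects queried independently
                    "type": "nested",
                    "properties": {
                        "author": {"type": "keyword"},
                        "body": {"type": "text"},
                    },
                },
            },
        }
    }


mapping_body = build_mapping_body()
print(json.dumps(mapping_body, indent=2))
```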

Search & Query Concepts

Full-text Queries - used for human-language search; the query text is analyzed before matching
- match - analyzes search text and finds relevant docs
- multi_match - searches multiple fields at once
- match_phrase - matches an exact sequence of words in order, with optional proximity control (slop)
- query_string - Lucene syntax, supports AND/OR/NOT, fields, wildcards
- simple_query_string - a safer but more limited version of query_string
Term-level Queries - no analysis; useful for exact matches
- term - exact match for keyword, numbers, boolean
- terms - match any from a list
- range - filter by numeric, date, or string ranges
- exists - field must be present
- prefix, wildcard, regexp - pattern-based queries
Boolean
- bool - combines clauses: must (AND), should (OR), must_not (NOT), filter (AND without affecting the score)
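
The query types above are usually combined inside a `bool` query. The sketch below is one possible combination, with assumed field names (`title`, `status`, `views`, `deleted_at`): a full-text `match` in `must`, exact-match and range conditions in `filter` (which behaves like `must` but does not contribute to the score), and an `exists` check in `must_not`.

```python
import json


def build_bool_query(text: str, status: str, min_views: int) -> dict:
    """Combine full-text and term-level clauses in a single bool query."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],  # analyzed, scored
                "filter": [
                    {"term": {"status": status}},  # exact match, not scored
                    {"range": {"views": {"gte": min_views}}},
                ],
                "must_not": [{"exists": {"field": "deleted_at"}}],
            }
        }
    }


search_body = build_bool_query("shard allocation", "published", 100)
print(json.dumps(search_body, indent=2))
```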
Pagination
- from/size - basic pagination. By default limited to the first 10,000 results (index.max_result_window)
- search_after - for deep pagination. It uses the sort values of the last result as the starting point for the next page.
- scroll - used for large exports or batch processing. Not meant for real-time user queries
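
To show how `search_after` carries the sort values forward, here is a small sketch that builds the request body for the next page from the hits of the previous one. The sort fields `published_at` and `doc_id` are assumptions (a unique tiebreaker field is assumed to exist), and the sample hits are made up to mimic the shape of a search response.

```python
def next_page_body(previous_hits: list, page_size: int) -> dict:
    """Build a search body; continue after the last hit of the previous page."""
    body = {
        "size": page_size,
        "query": {"match_all": {}},
        # A stable sort with a unique tiebreaker keeps pages from overlapping.
        "sort": [{"published_at": "desc"}, {"doc_id": "asc"}],
    }
    if previous_hits:
        # Each hit returns the sort values it was ordered by.
        body["search_after"] = previous_hits[-1]["sort"]
    return body


first_page = next_page_body([], page_size=2)
sample_hits = [  # made-up hits shaped like a search response
    {"_id": "a", "sort": [1734220800000, "a"]},
    {"_id": "b", "sort": [1734134400000, "b"]},
]
second_page = next_page_body(sample_hits, page_size=2)
print(second_page["search_after"])
```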
Aggregations
- Metric aggregations - compute values
  - min/max - minimum/maximum value of numeric field
  - avg/sum - average/sum of numeric field
  - stats - returns count, min, max, sum, average
  - extended_stats - returns variance, standard deviation, sum of squares, standard deviation bounds
  - cardinality - approximate count of unique values
  - percentiles / percentile_ranks - compute percentiles
  - top_hits - return sample documents from each bucket
  - value_count - count number of values for a field
- Bucket aggregations - group documents into buckets
  - terms - group by a field
  - range - group into buckets based on the ranges provided for numeric fields
  - date_range - group into buckets based on the ranges provided for date fields
  - histogram - group numeric data based on provided fixed interval
  - date_histogram - group date data based on a provided calendar or fixed interval
  - filters - manually define buckets based on queries
- Nested aggregations - work on nested fields
  - nested - if your document contains an array of objects, nested is used
- Pipeline aggregations - perform computations on the output of other aggregations, rather than directly on the documents in an index. They enable complex statistical and mathematical calculations like moving averages, cumulative sums, and derivatives by chaining aggregation results together
- Multi-Level aggregations (Sub-Aggs) - aggregations can be nested to build hierarchical analytics
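
The aggregation families above can be combined in one request. The following sketch, using assumed field names, shows a bucket aggregation (`terms`) with a metric sub-aggregation (`avg`), and a `date_histogram` whose monthly `sum` feeds a `cumulative_sum` pipeline aggregation. Setting `size` to 0 skips returning the matching documents.

```python
import json


def build_aggregation_body() -> dict:
    """Bucket, metric, and pipeline aggregations in a single search body."""
    return {
        "size": 0,  # only aggregation results are needed
        "aggs": {
            "by_status": {
                "terms": {"field": "status"},  # one bucket per status value
                "aggs": {"avg_views": {"avg": {"field": "views"}}},
            },
            "per_month": {
                "date_histogram": {
                    "field": "published_at",
                    "calendar_interval": "month",
                },
                "aggs": {
                    "monthly_views": {"sum": {"field": "views"}},
                    # Pipeline aggregation: running total over the monthly sums.
                    "running_views": {
                        "cumulative_sum": {"buckets_path": "monthly_views"}
                    },
                },
            },
        },
    }


agg_body = build_aggregation_body()
print(json.dumps(agg_body, indent=2))
```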

FAQs

Why doesn’t an update in Elasticsearch show up in search results immediately?
- Elasticsearch applies the update immediately: it is written to the in-memory indexing buffer and the transaction log (translog). The delay comes from how Elasticsearch handles index refreshes. The update is not visible to searches until the next refresh, and the default refresh interval is 1 second.
- Elasticsearch avoids refreshing on every update because a refresh is expensive. Refreshing on every update would create too many small segments and hurt performance significantly. Instead it batches changes and refreshes periodically based on the refresh_interval setting.
- When a document is updated, Elasticsearch fetches the old version, applies the changes to it, marks the old version as deleted, and indexes the new version.
- If you want to force a refresh, add the refresh=true query parameter to the update request.
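
The behavior described in this answer can be illustrated with a toy model. This is not how Elasticsearch is implemented; `ToyShard` is a hypothetical class that only mimics the visible effect: writes land in a buffer, get-by-id sees them right away, and search sees them only after a refresh.

```python
class ToyShard:
    """Toy model of refresh visibility; not Elasticsearch's real internals."""

    def __init__(self):
        self.buffer = {}      # written but not yet searchable
        self.searchable = {}  # visible to search

    def get(self, doc_id):
        """Get by id sees the latest write even before a refresh."""
        if doc_id in self.buffer:
            return self.buffer[doc_id]
        return self.searchable.get(doc_id)

    def update(self, doc_id, changes, refresh=False):
        """Fetch the current version, merge the changes, write a new version."""
        current = self.get(doc_id) or {}
        self.buffer[doc_id] = {**current, **changes}
        if refresh:
            self.refresh()

    def refresh(self):
        """Make buffered writes searchable; new versions replace old ones."""
        self.searchable.update(self.buffer)
        self.buffer.clear()

    def search(self, field, value):
        """Return ids of searchable documents whose field equals value."""
        return [i for i, doc in self.searchable.items() if doc.get(field) == value]


shard = ToyShard()
shard.update("1", {"status": "draft"})
before = shard.search("status", "draft")   # nothing yet: no refresh happened
shard.refresh()
after = shard.search("status", "draft")    # now visible
shard.update("1", {"status": "published"}, refresh=True)
print(before, after, shard.search("status", "published"))
```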
