Elasticsearch: configuration options

Elasticsearch comes with a lot of setup and configuration that can confuse even expert engineers. Although it follows a convention-over-configuration paradigm and ships with sensible defaults for most settings, custom configuration is essential before putting an application into production.

Here, we’ll introduce some properties that fall into different categories, and discuss their importance and how to tune them. There are three configuration files we can tweak:

  • elasticsearch.yml – This configuration file is the most frequently edited, where we can set the cluster name, node information, data and log paths, and network and security settings.
  • log4j2.properties – Here we can set the logging levels of the Elasticsearch node.
  • jvm.options – Here we can set the heap memory of the running node.

These files are located in the following directory of the Elasticsearch installation:

$ pwd
/Users/liuxg/elastic/elasticsearch-8.6.1
$ tree config/ -L 2
config/
├── certificates
│   ├── http.p12
│   ├── http_ca.crt
│   └── transport.p12
├── elasticsearch-plugins.example.yml
├── elasticsearch.keystore
├── elasticsearch.yml
├── jvm.options
├── jvm.options.d
├── log4j2.properties
├── role_mapping.yml
├── roles.yml
├── users
└── users_roles

These files are read by Elasticsearch nodes from the config directory. For archive (zip or tar.gz) installations, this directory defaults to $ES_HOME/config (the ES_HOME variable points to the directory where Elasticsearch was installed). If you installed from a package manager such as the Debian or RPM distribution, the situation is different, and the default is /etc/elasticsearch instead.

If you wish to keep your config files in a different directory, you can set and export the ES_PATH_CONF environment variable to point to the new config location. In the next few sections, we’ll cover some important settings to understand, not only for administrators but for developers as well.
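
For example, a minimal sketch of overriding the config location for an archive installation (the path below is purely illustrative):

# Point Elasticsearch at a custom config directory before starting the node
export ES_PATH_CONF=/opt/elasticsearch/config
./bin/elasticsearch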

Main configuration file

Although the folks at Elastic developed Elasticsearch to run with defaults (convention over configuration), it’s unlikely we’ll rely on the defaults when putting nodes into production. We should adjust properties to set specific network information, data or log paths, security aspects, and so on. To do this, we can modify the elasticsearch.yml file, which holds most of the properties we need for our running application.

Elasticsearch exposes its network settings as http.*, transport.*, and network.* properties, which let us set the bind host and the port numbers. For example, instead of keeping the default port of 9200, we can change the HTTP port of Elasticsearch to 9900 with http.port: 9900. You can also set transport.port if you want to change the port on which the node communicates with other nodes internally.
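
For example, a small elasticsearch.yml sketch that overrides both ports might look like this (the port numbers are illustrative):

# Illustrative port overrides in elasticsearch.yml
http.port: 9900        # REST/HTTP port (default 9200)
transport.port: 9500   # node-to-node transport port (default range 9300-9400)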

Depending on your requirements, you may need to change many properties. If you want to learn more about these properties, please refer to the official documentation: Networking | Elasticsearch Guide [8.6] | Elastic

If you want Elasticsearch to be accessible from outside the machine rather than only on localhost, you can change the following setting:

network.host: 192.168.0.1

Above, we bind Elasticsearch to the machine’s private address so that it can be reached from the LAN it sits on. We can also set it like this:

network.host: 0.0.0.0

This binds Elasticsearch to all of the host’s IP interfaces. We can usually list those addresses as follows:

$ ifconfig | grep inet
inet 127.0.0.1 netmask 0xff000000
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
inet6 fe80::acbc:f2ff:fe5e:d6e9%anpi0 prefixlen 64 scopeid 0x4
inet6 fe80::acbc:f2ff:fe5e:d6eb%anpi2 prefixlen 64 scopeid 0x5
inet6 fe80::acbc:f2ff:fe5e:d6ea%anpi1 prefixlen 64 scopeid 0x6
inet6 fe80::f4d4:88ff:fe6a:c36d%ap1 prefixlen 64 scopeid 0xe
inet6 fe80::c6b:334b:459e:a8fb%en0 prefixlen 64 secured scopeid 0xf
inet 192.168.0.101 netmask 0xffffff00 broadcast 192.168.0.255
inet6 fe80::a082:13ff:fe68:d82f%awdl0 prefixlen 64 scopeid 0x10
inet6 fe80::a082:13ff:fe68:d82f%llw0 prefixlen 64 scopeid 0x11
inet6 fe80::1699:5325:c1de:b41%utun0 prefixlen 64 scopeid 0x12
inet6 fe80::ce81:b1c:bd2c:69e%utun1 prefixlen 64 scopeid 0x13
inet6 fe80::c22c:882d:15c7:d083%utun2 prefixlen 64 scopeid 0x14
inet6 fe80::10cf:86ce:6771:979%en4 prefixlen 64 secured scopeid 0x1a
inet 192.168.0.3 netmask 0xffffff00 broadcast 192.168.0.255

As shown above, when we set network.host to 0.0.0.0, Elasticsearch binds to 127.0.0.1 as well as 192.168.0.3 and 192.168.0.101. In addition to explicit addresses, we can also use the following special values:

  • _local_ – Any loopback address on the system, such as 127.0.0.1.
  • _site_ – Any site-local (private) address on the system, e.g. 192.168.0.1. This way we don’t have to hard-code private addresses.
  • _global_ – Any globally scoped address on the system, such as 8.8.8.8.
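
For example, a hedged sketch that binds using special values instead of hard-coded addresses:

# Bind to the loopback interface and to any site-local address
network.host: [_local_, _site_]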

Configuration file format

The configuration format is YAML. Here is an example of changing the data and log directory paths:

path:
    data: /var/lib/elasticsearch
    logs: /var/log/elasticsearch

Settings can also be flattened as follows:

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

In YAML, you can format non-scalar values as sequences:

discovery.seed_hosts:
   - 192.168.1.10:9300
   - 192.168.1.11
   - seeds.mydomain.com

Although less common, you can also format non-scalar values as arrays:

discovery.seed_hosts: ["192.168.1.10:9300", "192.168.1.11", "seeds.mydomain.com"]

Environment variable substitution

Environment variables referenced with the ${…} notation in configuration files are replaced with the value of the environment variable. For example:

node.name: ${HOSTNAME}
network.host: ${ES_NETWORK_HOST}

The value of an environment variable must be a simple string. Use a comma-separated string to provide values that Elasticsearch will parse into a list. For example, Elasticsearch will split the following string into a list of values for the ${HOSTNAME} environment variable:

export HOSTNAME="host1,host2"
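
To make the list expansion concrete, here is a small sketch (the SEED_HOSTS variable name is illustrative, not a built-in):

# In the shell, before starting the node:
export SEED_HOSTS="192.168.1.10:9300,192.168.1.11"

# In elasticsearch.yml – the comma-separated value is parsed into a list:
discovery.seed_hosts: ${SEED_HOSTS}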

Logging options

Elasticsearch is developed in Java and, like most Java applications, uses Log4j 2 as its logging library. A running node writes INFO-level log messages to the console and to log files (using the Console and RollingFile appenders, respectively).

The Log4j properties file (log4j2.properties) contains system properties (sys:es.logs.base_path, sys:es.logs.cluster_name, and so on) that are resolved at runtime. Because Elasticsearch exposes these properties, Log4j can use them to set its log file directory, log file naming pattern, and other settings. For example, sys:es.logs.base_path points to the directory where Elasticsearch writes logs, which by default resolves to $ES_HOME/logs.
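
For instance, a representative excerpt from the default log4j2.properties shows these properties being used to build the server log file name (the exact appender names and layout vary by version):

appender.rolling.type = RollingFile
appender.rolling.name = rolling
appender.rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_server.json
appender.rolling.layout.type = ECSJsonLayout
appender.rolling.layout.dataset = elasticsearch.server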

By default, most Elasticsearch loggers run at the INFO level, but we can customize the level on a per-package basis. For example, we can edit the log4j2.properties file and add a logger for the index package, as shown in the listing below.

Set the logging level for a specific package:

logger.index.name = org.elasticsearch.index
logger.index.level = DEBUG

By doing this, we allow the index package to emit logs at DEBUG level. Instead of editing this file on every node and restarting each of them, we can set the DEBUG log level for this package at the cluster level. The next listing demonstrates this setup:

Globally set the temporary log level:

PUT _cluster/settings
{
  "transient": { #A
    "logger.org.elasticsearch.index":"DEBUG" #B
  }
}

As the query shows, we set the logger level for the index package to DEBUG inside the transient block. A transient setting is not persistent: it is only in effect while the cluster is up and running. If we restart the cluster, or it crashes, the setting is lost because it is never stored permanently on disk.

We can set this property by calling the cluster settings API (_cluster/settings), as shown in the code in the listing. When this property is set, any further logging information related to indexes in the org.elasticsearch.index source package will be output at DEBUG level.
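
To verify that the setting was applied, we can read the cluster settings back (the flat_settings flag simply flattens the keys for readability):

GET _cluster/settings?flat_settings=true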

Elasticsearch also provides a way to persist cluster properties. If we need to store properties permanently, we can use a persistent block. The following listing replaces the transient block with a persistent one.

Set log level permanently:

PUT _cluster/settings
{
  "persistent": {
    "logger.org.elasticsearch.index": "DEBUG",
    "logger.org.elasticsearch.http": "TRACE"
  }
}

This code sets the DEBUG level on the org.elasticsearch.index package and the TRACE level on the org.elasticsearch.http package. Because both are persistent properties, the loggers write verbose logs at these levels and the settings survive cluster restarts.

Note that such properties are set permanently via the persistent attribute. My suggestion is to enable the DEBUG or TRACE logging level only while troubleshooting or debugging. When you’re done with the “firefighting” episode in production, reset it back to INFO to avoid writing large volumes of log data to disk.
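
For example, one way to reset these loggers back to their defaults is to null out the settings:

PUT _cluster/settings
{
  "persistent": {
    "logger.org.elasticsearch.index": null,
    "logger.org.elasticsearch.http": null
  }
}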

For more reading, see the article “Elastic: Configuring Elasticsearch server logs”.

JVM options

Because Elasticsearch is written in Java, a lot of tuning can be done at the JVM level. For obvious reasons, we can’t do such a vast topic justice in this article. However, if you’re curious and want to understand the nitty-gritty of the JVM or fine-tune performance at a lower level, see books like Optimizing Java (Benjamin Evans and James Gough) or Java Performance (Scott Oaks). I highly recommend them, as they provide not only the basics but also how-to tips and tricks.

Elasticsearch provides a jvm.options file in the config directory, which contains its JVM settings. However, this file is for informational purposes only (for example, to check a node’s memory settings) and should not be edited: Elasticsearch automatically sizes the heap of the server based on the node’s available memory.

Warning: Under no circumstances should we edit the jvm.options file. Doing so may disrupt the inner workings of Elasticsearch.

If we want to change the heap size or any other JVM setting, we have to create a new file with .options as the file extension, provide the appropriate tuning parameters, and place that file in the config/jvm.options.d directory for archive (tar.gz or zip) installations. We can give our custom file any name, as long as it keeps the fixed .options extension.

For RPM/Debian package installations, this file should be placed in the /etc/elasticsearch/jvm.options.d/ directory. Likewise, for Docker installations, mount the custom options file into the /usr/share/elasticsearch/config/jvm.options.d/ directory.
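
A hedged sketch for Docker (the file name and image tag are illustrative, and settings a real deployment would need, such as memory limits, are omitted):

# Mount a custom JVM options file into the container's jvm.options.d directory
docker run -d --name es01 -p 9200:9200 \
  -e "discovery.type=single-node" \
  -v "$PWD/jvm_custom.options:/usr/share/elasticsearch/config/jvm.options.d/jvm_custom.options" \
  docker.elastic.co/elasticsearch/elasticsearch:8.6.1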

We can edit the settings in this custom JVM options file. For example, to increase the heap memory in a file called jvm_custom.options, we can use the code in the following listing.

Upgrade heap memory:

# Setting the JVM heap memory in jvm_custom.options file
-Xms4g
-Xmx4g

The -Xms flag sets the initial heap size, while -Xmx sets the maximum heap size. The unwritten rule is not to let the -Xms and -Xmx settings exceed 50% of the node’s total RAM: Apache Lucene, running under the hood, relies on the remaining memory (largely through the operating system’s file-system cache) for its caching and other work.