Docker Elasticsearch Tutorial
Elasticsearch is a document-oriented, schema-free, distributed search engine based on Apache Lucene. This powerful tool allows us to index huge volumes of data and then perform complex searches on it, including full-text searches. This tutorial will show how to use it with Docker.
For this tutorial, Linux Mint 18 and Docker version 1.12.1 have been used.
You may skip installation and configuration, and jump directly to the beginning of the tutorial below.
1. Installation
Note: Docker requires a 64-bit system with a kernel version equal to or higher than 3.10.
We can install Docker simply via apt-get, without adding any repository, just by installing the docker.io package. We will also need cURL for this tutorial.
sudo apt-get update
sudo apt-get install docker.io curl
For more details, you can follow the Install Docker on Ubuntu Tutorial.
1.1. Increasing the virtual memory
In order to use Elasticsearch, we have to increase the virtual memory limits, specifically the vm.max_map_count kernel setting. To do so, we execute the following command:
sudo sysctl -w vm.max_map_count=262144
Note: this change will be lost after a reboot. To make it permanent, we have to modify the /etc/sysctl.conf file.
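For example, appending the following line to /etc/sysctl.conf makes the setting permanent (we can verify the current value at any time with sysctl vm.max_map_count):

# Line to append to /etc/sysctl.conf so the setting survives reboots:
vm.max_map_count=262144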
2. Using the official image
We can find the official Elasticsearch Docker repository on DockerHub, from which we can pull the latest version:
docker pull elasticsearch
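Pulling the latest version is fine for this tutorial, but for reproducible setups we may prefer to pin a concrete version by pulling a tagged image (the tag below is just an example; check DockerHub for the available tags):

docker pull elasticsearch:5.1.1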
2.1. Creating the container
To create an Elasticsearch container, we have to bind the container's 9200 port to the host:
docker run -p 9200:9200 --name=elasticsearch1 elasticsearch
Note: we will usually want to run the container in detached mode; for that, add the -d option.
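For instance, the detached equivalent of the previous command would be the following (note that container names must be unique, so any previous elasticsearch1 container has to be removed first):

docker run -d -p 9200:9200 --name=elasticsearch1 elasticsearch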
Once the container is running, we should be able to access it, in this case at http://localhost:9200, which will return a JSON object:
{ "name" : "nZBbX-z", "cluster_name" : "elasticsearch", "cluster_uuid" : "kysJT3MbR5iBOHKPb6VlTA", "version" : { "number" : "5.1.1", "build_hash" : "5395e21", "build_date" : "2016-12-06T12:36:15.409Z", "build_snapshot" : false, "lucene_version" : "6.3.0" }, "tagline" : "You Know, for Search" }
So, that’s it! We now have an Elasticsearch node running within that container. Let’s make a quick test before going further with its configuration within the container.
Let’s start by indexing a simple document like the following:
{ "message": "Hello world!" }
With cURL, we would have to execute:
curl -XPOST localhost:9200/mails/message/1 -d '
{
    "message": "Hello world!"
}'
If everything worked, we should receive a response like:
{"_index":"mails","_type":"message","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true}
Now, we can GET that document, either by accessing the URL with a browser, or with cURL:
curl -XGET localhost:9200/mails/message/1
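Since full-text search is Elasticsearch’s main strength, we can also try a simple search over the index we just created, here using the URI search shorthand (the _search endpoint also accepts a much richer query DSL in the request body):

curl -XGET 'localhost:9200/mails/_search?q=message:hello'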
3. Installing plugins
Elasticsearch has many plugins available to extend its default functionality. In this section, we will see how to install them. But, as a Docker user, you should already know that every change we make within a container will be lost when the container is removed, because the changes only live in that specific instance.
That said, to install plugins in an Elasticsearch instance running within a Docker container, we first need to open a shell in the container:
docker exec -it elasticsearch1 /bin/bash
Note: the shell runs as the superuser, since root permissions are needed for plugin installation.
Now we have to execute the Elasticsearch binary for installing plugins. The binaries are located in the bin directory of the Elasticsearch installation directory, which, by default, is /usr/share/elasticsearch.
So, once inside the container, we just have to execute:
/usr/share/elasticsearch/bin/elasticsearch-plugin install <plugin-name>
For example, for installing the EC2 Discovery Plugin, we would execute:
/usr/share/elasticsearch/bin/elasticsearch-plugin install discovery-ec2
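To double-check that everything went well, we can list the installed plugins with the same binary, and then restart the container so the node picks the plugin up; the first command runs inside the container, the second one on the host:

/usr/share/elasticsearch/bin/elasticsearch-plugin list
docker restart elasticsearch1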
4. Using Dockerfiles
In most cases, we will want to customize our Elasticsearch instance, for example by installing plugins, as we did in the previous section. In these cases, writing a Dockerfile is the recommended practice, instead of manually executing the commands within the container.
A Dockerfile for doing so would be just the following:
Dockerfile
FROM elasticsearch

# We use the '-b' ('--batch') option for automatic confirmation.
RUN /usr/share/elasticsearch/bin/elasticsearch-plugin install -b discovery-ec2
Now we have to build the image:
docker build -t myelasticsearch . # The '.' is the build context: the directory containing the Dockerfile.
And we could create a container from the image:
docker run -p 9200:9200 --name=elasticsearch2 myelasticsearch
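Once the new container is up, we can verify that the plugin was actually loaded by asking the node itself, through the _cat/plugins endpoint:

curl -XGET localhost:9200/_cat/plugins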
5. Using data containers
A typical pattern in these cases is to use containers just for the data that is going to be used by the application. This allows us to keep the data layer untouched, regardless of the changes we make to the running service.
For this, we will use the following directory structure and files:
.
├── elasticsearch-data
│   ├── Dockerfile
│   ├── elasticsearch.yml
│   └── log4j2.properties
└── elasticsearch-master
    └── Dockerfile
These are the config files (just default values):
elasticsearch.yml
network.host: 0.0.0.0

# this value is required because we set "network.host"
# be sure to modify it appropriately for a production cluster deployment
discovery.zen.minimum_master_nodes: 1
log4j2.properties
appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%m%n

rootLogger.level = info
rootLogger.appenderRef.console.ref = console
And here is the Dockerfile for the data volume, based on an Ubuntu image:
elasticsearch-data/Dockerfile
# Dockerfile for Elasticsearch data volume.
FROM ubuntu

# We need to create the Elasticsearch user.
RUN useradd -d "/home/elasticsearch" -m -s /bin/bash elasticsearch

VOLUME /var/log/elasticsearch
VOLUME /usr/share/elasticsearch/config
VOLUME /usr/share/elasticsearch/config/scripts
VOLUME /usr/share/elasticsearch/data
VOLUME /usr/share/elasticsearch/plugins

# Required config files.
ADD ./elasticsearch.yml /usr/share/elasticsearch/config/elasticsearch.yml
ADD ./log4j2.properties /usr/share/elasticsearch/config/log4j2.properties

USER elasticsearch

CMD ["echo", "Data volume for Elasticsearch"]
Now, we have to build the image and create the container, just as with any other container:
docker build -t myelasticsearchdata elasticsearch-data/.
docker run --name=elasticsearch-data myelasticsearchdata
Note that data containers don’t have to be running for their volumes to be available to other containers.
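If we want to double-check which volumes the data container exposes, we can inspect it; the --format template below just filters the output down to the mounts:

docker inspect --format '{{json .Mounts}}' elasticsearch-data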
Having the data container ready, we need the “main” container, the one running the service. We don’t need to modify the Dockerfile for the Elasticsearch master container, nor rebuild the image; we just have to create another container, indicating that it has to use the data volumes:
docker run -p 9200:9200 --volumes-from=elasticsearch-data --name=elasticsearch-master -d myelasticsearch
With this, we have completely isolated the data layer from the application layer. We can test it by indexing some data:
curl -XPOST localhost:9200/mails/message/1 -d '
{
    "message": "Elasticsearch with data volumes"
}'
Deleting the master container:
docker stop elasticsearch-master
docker rm elasticsearch-master
And creating another instance of the Elasticsearch image, with the data volume:
docker run -p 9200:9200 --volumes-from=elasticsearch-data --name=elasticsearch-master2 myelasticsearch
Now, if we GET the data, we will see that it is still there:
curl -XGET localhost:9200/mails/message/1
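The response should confirm that the document survived the container replacement, looking something like the following (identifiers and values are illustrative):

{"_index":"mails","_type":"message","_id":"1","_version":1,"found":true,"_source":{"message":"Elasticsearch with data volumes"}}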
6. Summary
This tutorial has shown how to get an Elasticsearch instance running with Docker: from the fastest way, just pulling the official image and creating a container, to writing our own Dockerfiles, also covering how to use data volumes for data persistence.