Docker Elasticsearch Tutorial

Elasticsearch is a document-oriented, schema-free, distributed search engine based on Apache Lucene. This powerful tool allows us to index huge volumes of data and then perform complex searches on it, including full-text searches. This tutorial will show how to use it with Docker.

For this tutorial, Linux Mint 18 and Docker version 1.12.1 have been used.

Tip
You may skip installation and configuration, and jump directly to the beginning of the tutorial below.

1. Installation

Note: Docker requires a 64-bit system with a kernel version equal to or higher than 3.10.

We can install Docker via apt-get, without the need to add any repository: we just have to install the docker.io package. We will also need cURL for this tutorial.

sudo apt-get update
sudo apt-get install docker.io curl
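
We can quickly verify that Docker was installed correctly, for example:

docker --version
sudo docker info # Requires root permissions (or membership in the docker group).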

For more details, you can follow the Install Docker on Ubuntu Tutorial.

1.1. Increasing the virtual memory

In order to use Elasticsearch, we have to increase the kernel limit on virtual memory map areas (vm.max_map_count). To do so, we have to execute the following command:

sudo sysctl -w vm.max_map_count=262144

Note: this change will be lost after a reboot. To make it permanent, we have to modify the /etc/sysctl.conf file.
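
For example, a minimal way to persist the setting (assuming the same 262144 value):

echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p # Reload the settings from /etc/sysctl.conf.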

2. Using the official image

We can find the official Elasticsearch Docker repository on Docker Hub. We can pull the latest version:

docker pull elasticsearch
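
If we need a specific version rather than the latest one, we can pull a tag instead; for example, assuming a tag matching the version shown below:

docker pull elasticsearch:5.1.1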

2.1. Creating the container

To create an Elasticsearch container, we have to bind the container's port 9200 to the host:

docker run -p 9200:9200 --name=elasticsearch1 elasticsearch

Note: we will usually want to run the container in detached mode; for that, add the -d option.
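
For example, the equivalent run in detached mode would be:

docker run -d -p 9200:9200 --name=elasticsearch1 elasticsearch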

Once the container is running, we should be able to access it, in this case, at http://localhost:9200, which will return a JSON object:

{
  "name" : "nZBbX-z",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "kysJT3MbR5iBOHKPb6VlTA",
  "version" : {
    "number" : "5.1.1",
    "build_hash" : "5395e21",
    "build_date" : "2016-12-06T12:36:15.409Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}

So, that’s it! We now have an Elasticsearch node running within that container. Let’s make a quick test before going further with its configuration within the container.

Let’s start by indexing a simple document like the following:

{
    "message": "Hello world!"
}

With cURL, we would have to execute:

curl -XPOST localhost:9200/mails/message/1 -d '
{
    "message": "Hello world!"
}
'

If everything worked, we should receive a response like:

{"_index":"mails","_type":"message","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true}

Now, we can GET that document, either by accessing the URL in a browser or with cURL:

curl -XGET localhost:9200/mails/message/1
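
We can also run a simple full-text search over the indexed documents, for example using the query string syntax (the standard analyzer lowercases terms, so "hello" matches our document):

curl -XGET 'localhost:9200/mails/_search?q=message:hello'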

3. Installing plugins

Elasticsearch has many plugins available to extend its default functionality. In this section we will see how to install them. But, as a Docker user, you should already know that every change we make within a container will be lost if the container is removed, since the instance where we made the changes is deleted.

That said, to install Elasticsearch plugins within a running Docker container, we first need to open a shell in the container:

docker exec -it elasticsearch1 /bin/bash

Note: the shell is executed as the superuser, since root permissions are needed for plugin installation.

Now we have to execute the Elasticsearch plugin binary. The binaries are located in the bin directory of the Elasticsearch installation, which, by default, is /usr/share/elasticsearch.

So, being inside the container, we would just have to execute:

/usr/share/elasticsearch/bin/elasticsearch-plugin install <plugin-name>

For example, to install the EC2 Discovery Plugin, we would execute:

/usr/share/elasticsearch/bin/elasticsearch-plugin install discovery-ec2
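
Note that the node has to be restarted for a newly installed plugin to be loaded; after exiting the container's shell, for example:

docker restart elasticsearch1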

4. Using Dockerfiles

In most cases, we will want to customize our Elasticsearch instance, for instance by installing plugins, as we did in the previous section. In these cases, writing a Dockerfile is the recommended practice, instead of manually executing the commands within the container.

A Dockerfile for doing so would be just the following:

Dockerfile

FROM elasticsearch

# We use '-b' ('--batch') option for automatic confirmation.
RUN /usr/share/elasticsearch/bin/elasticsearch-plugin install -b discovery-ec2

Now we have to build the image:

docker build -t myelasticsearch . # '.' is the path to the directory containing the Dockerfile.

And we could create a container from the image:

docker run -p 9200:9200 --name=elasticsearch2 myelasticsearch
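
To check that the plugin was actually installed, we can ask the cat API of this new instance, which should list discovery-ec2:

curl -XGET localhost:9200/_cat/plugins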

5. Using data containers

A typical scenario in these cases is to use containers just for the data that is going to be used by the application. This allows us to keep the data layer untouched, regardless of the changes we make to the running service.

For this, we will use the following directory structure and files:

.
├── elasticsearch-data
│   ├── Dockerfile
│   ├── elasticsearch.yml
│   └── log4j2.properties
└── elasticsearch-master
    └── Dockerfile

These are the config files (just default values):

elasticsearch.yml

network.host: 0.0.0.0

# this value is required because we set "network.host"
# be sure to modify it appropriately for a production cluster deployment
discovery.zen.minimum_master_nodes: 1

log4j2.properties

appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%m%n

rootLogger.level = info
rootLogger.appenderRef.console.ref = console

And the Dockerfile for the data volume, based on an Ubuntu image:

elasticsearch-data/Dockerfile

# Dockerfile for Elasticsearch data volume.

FROM ubuntu

# We need to create the Elasticsearch user.
RUN useradd -d "/home/elasticsearch" -m -s /bin/bash elasticsearch

VOLUME /var/log/elasticsearch
VOLUME /usr/share/elasticsearch/config
VOLUME /usr/share/elasticsearch/config/scripts
VOLUME /usr/share/elasticsearch/data
VOLUME /usr/share/elasticsearch/plugins

# Required config files.
ADD ./elasticsearch.yml /usr/share/elasticsearch/config/elasticsearch.yml
ADD ./log4j2.properties /usr/share/elasticsearch/config/log4j2.properties

USER elasticsearch

CMD ["echo", "Data volume for Elasticsearch"]

Now, we have to build the image and create the container, just as with any other container:

docker build -t myelasticsearchdata elasticsearch-data/.
docker run --name=elasticsearch-data myelasticsearchdata

Note that data containers don’t have to be running for their volumes to be accessible.
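
We can still verify that its volumes were created, for example with docker inspect:

docker inspect --format='{{json .Mounts}}' elasticsearch-data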

Having the data container ready, we need the “main” container, the one running the service. We don’t need to modify the Dockerfile for the Elasticsearch master container, nor rebuild the image; we just have to create another container, indicating that it has to use the data volumes:

docker run -p 9200:9200 --volumes-from=elasticsearch-data --name=elasticsearch-master -d myelasticsearch

With this, we have completely isolated the data layer from the application layer. We can test it by indexing some data:

curl -XPOST localhost:9200/mails/message/1 -d '
{
    "message": "Elasticsearch with data volumes"
}
'

Deleting the master container:

docker stop elasticsearch-master
docker rm elasticsearch-master

And creating another instance of the Elasticsearch image, with the data volume:

docker run -p 9200:9200 --volumes-from=elasticsearch-data --name=elasticsearch-master2 myelasticsearch

Now, if we GET the data, we will see that it actually exists:

curl -XGET localhost:9200/mails/message/1
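
If the data layer survived the container replacement, we should get a response similar to the following (metadata values may vary):

{"_index":"mails","_type":"message","_id":"1","_version":1,"found":true,"_source":{"message":"Elasticsearch with data volumes"}}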

6. Summary

This tutorial has shown how to get an Elasticsearch instance running with Docker: from the fastest way, just pulling the official image and creating a container, to writing our own Dockerfiles, also seeing how to use data volumes for data persistence.
