Exercises for the Elastic Certified Engineer Exam: Deploy and Operate a Cluster

This is the first in a series of blog posts to challenge your preparation for the certification exam and includes various tasks and problems to be solved.

As I began to prepare for the Elastic Certified Engineer exam, I felt overwhelmed by the number of resources available on the web for self-study. In addition to the monumental Elastic documentation, I could get lost in all sorts of articles, tutorials, open books, webinars, and course materials. Despite this wealth of information, what I missed most were exercises to practice the six objectives of the certification exam.

I briefly mentioned the shortage of training exercises in a previous article on tips and advice for Elastic Certified Engineer candidates. I also promised to remedy the problem in the near future. Well, the future has come! Each post in this series will focus on selected Exam Objectives, and today we will start by exercising your ability to operate an Elasticsearch cluster.

DISCLAIMER: All exercises in this series are based on the Elasticsearch version currently used for the exam, that is, v6.5. Please always refer to the Elastic certification FAQs to check the latest exam version.

Install and Configure Elasticsearch

There is no better way to test your Elasticsearch skills than having an Elasticsearch instance to work with. Testing the first Exam Objective serves this purpose. By the end of this section, you will have an Elasticsearch cluster running on your local machine, and configured to satisfy a given set of network and security requirements.

Exercise 1

In this exercise, you will deploy a cluster of three Elasticsearch nodes with fine-tuned network and software settings. The code blocks contain information and/or exercise instructions.

# ** EXAM OBJECTIVE: INSTALLATION AND CONFIGURATION **
# GOAL: Set up an Elasticsearch cluster that satisfies a given set of requirements
# REQUIRED SETUP: /

Let’s begin by getting the right Elasticsearch package for your machine and providing an initial setup for each node.

# Download the exam version of Elasticsearch
# Deploy the cluster `eoc-01-cluster`, so that it satisfies the following  
  requirements:
  (i)   has three nodes, named `node1`, `node2`, and `node3`,
  (ii)  all nodes are eligible master nodes
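A minimal sketch of each node's `elasticsearch.yml` for this step (in v6.x all nodes are master-eligible by default, so `node.master: true` is shown only for clarity):

```yaml
# elasticsearch.yml for node1 (repeat for node2 and node3, changing node.name)
cluster.name: eoc-01-cluster
node.name: node1
node.master: true
```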

Now, let’s configure the network and discovery settings.

# Bind `node1` to the IP address “151.101.2.217” and port “9201”
# Bind `node2` to the IP address “151.101.2.218” and port “9202”
# Bind `node3` to the IP address “151.101.2.219” and port “9203”
# Configure the cluster discovery module of `node2` and `node3` so as to use `node1` 
  as seed host
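Assuming the ports in the requirements refer to the HTTP port, the network settings could be sketched as follows; the seed host entry points at `node1`'s transport layer, which listens on the default transport port range (9300-9400):

```yaml
# elasticsearch.yml for node1 (adapt the IP address and port for node2 and node3)
network.host: 151.101.2.217
http.port: 9201

# elasticsearch.yml for node2 and node3 only: use node1 as the seed host
discovery.zen.ping.unicast.hosts: ["151.101.2.217"]
```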

⚠ : The cluster coordination layer of Elasticsearch has been completely rebuilt for v7.x (see the Elastic blog post). As a consequence, the node discovery and cluster formation processes and settings differ significantly from those of the Elasticsearch exam version (see “Discovery module and its configuration”).

Elasticsearch is a resilient creature, and if a master node goes down, then another eligible master node is promoted. However, in the case of a temporary network partition, this might lead to the promotion of one master node per partition. A cluster with many master nodes? It sounds bad because it is bad. This condition is called a split brain and must be avoided.

⚠ : Unnecessary in Elasticsearch v7.x (see “Voting Configurations”).

# Configure the nodes to avoid the split brain scenario
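In v6.x, the standard safeguard is to require a quorum of master-eligible nodes before a master can be elected: (master_eligible_nodes / 2) + 1, which is 2 for our three-node cluster.

```yaml
# elasticsearch.yml (every node): quorum = (3 master-eligible nodes / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
```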

A node can serve multiple purposes, and you should know how to specify them.

# Configure `node1` to be a data node but not an ingest node
# Configure `node2` and `node3` to be both an ingest and data node
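As a sketch (all three role settings default to `true` in v6.x, so strictly only the deviations matter):

```yaml
# elasticsearch.yml for node1: data node, but not an ingest node
node.data: true
node.ingest: false

# elasticsearch.yml for node2 and node3: both ingest and data node
node.data: true
node.ingest: true
```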

The configuration files of a node support much more than just Elasticsearch-specific properties: for instance, system settings, logging configuration, and even index policies.

# Configure `node1` to disallow swapping on its host
# Configure the JVM settings of each node so that it uses a minimum and maximum of  
  8 GB for the heap
# Configure the logging settings of each node so that
  (i)  the logs directory is not the default one,
  (ii) the log level for transport-related events is set to "debug"
# Configure the nodes so as to disable the possibility to delete indices using 
  wildcards
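A sketch of these settings (the logs directory below is illustrative; pick any non-default path):

```yaml
# elasticsearch.yml (node1): lock the process address space to prevent swapping
bootstrap.memory_lock: true

# elasticsearch.yml (every node): non-default logs directory
path.logs: /var/log/eoc-01-cluster
# log transport-related events at debug level
logger.org.elasticsearch.transport: debug

# elasticsearch.yml (every node): require explicit index names on deletion
action.destructive_requires_name: true
```

For the heap, set both `-Xms8g` and `-Xmx8g` in each node's `jvm.options` file. Note that `bootstrap.memory_lock` only takes effect if the operating system also grants the process `memlock` permissions.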

Exercise 2

In this exercise, you will secure the cluster data using Elasticsearch Security. The exercise requires a running Elasticsearch cluster with at least one node and a Kibana instance. You can spin up such a cluster in no time by using a docker-compose file from my elastic-training-repo on GitHub. So, download the file and run it with docker-compose.

# ** EXAM OBJECTIVE: INSTALLATION AND CONFIGURATION **
# GOAL: Secure a cluster and an index using Elasticsearch Security
# REQUIRED SETUP:
  (i)  a running Elasticsearch cluster with at least one node and a Kibana instance,
  (ii) no index with name `hamlet` is indexed on the cluster

Most of the security features of Elasticsearch have been free since v6.8.0 and v7.1.0 (see the Elastic Blog Post). However, the certification exam is currently based on an earlier version, which requires you to first unlock a trial license. This will give you 30 days to enable, set up, and evaluate all security settings.

# Enable X-Pack security on the cluster
# Set the password of the `elastic` and `kibana` built-in users.
  Use the pattern "{{username}}-password" (e.g., "elastic-password")
# Login to Kibana using the `elastic` user credentials
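A sketch of this setup on v6.5 (the exact steps may vary with your installation):

```yaml
# elasticsearch.yml (every node): turn on X-Pack security
xpack.security.enabled: true
```

After restarting, you can unlock the trial license with `POST /_xpack/license/start_trial?acknowledge=true`, and then set the passwords of the built-in users with `bin/elasticsearch-setup-passwords interactive`.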

We are now going to use the _bulk API to index some documents into the cluster. The documents are lines from Hamlet by William Shakespeare, and have the following structure:

{
  "line_number": "String",
  "speaker": "String",
  "text_entry": "String"
}

Let’s continue with the exercise.

# Create the index `hamlet` and add some documents by running the following _bulk command:
  PUT hamlet/_doc/_bulk
  {"index":{"_index":"hamlet","_id":0}}
  {"line_number":"1","speaker":"BERNARDO","text_entry":"Whos there?"}
  {"index":{"_index":"hamlet","_id":1}}
  {"line_number":"2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
  {"index":{"_index":"hamlet","_id":2}}
  {"line_number":"3","speaker":"BERNARDO","text_entry":"Long live the king!"}
  {"index":{"_index":"hamlet","_id":3}}
  {"line_number":"4","speaker":"FRANCISCO","text_entry":"Bernardo?"}
  {"index":{"_index":"hamlet","_id":4}}
  {"line_number":"5","speaker":"BERNARDO","text_entry":"He."}
  {"index":{"_index":"hamlet","_id":5}}
  {"line_number":"6","speaker":"FRANCISCO","text_entry":"You come most carefully upon your hour."}
  {"index":{"_index":"hamlet","_id":6}}
  {"line_number":"7","speaker":"BERNARDO","text_entry":"Tis now struck twelve; get thee to bed, Francisco."}
  {"index":{"_index":"hamlet","_id":7}}
  {"line_number":"8","speaker":"FRANCISCO","text_entry":"For this relief much thanks: tis bitter cold,"}
  {"index":{"_index":"hamlet","_id":8}}
  {"line_number":"9","speaker":"FRANCISCO","text_entry":"And I am sick at heart."}
  {"index":{"_index":"hamlet","_id":9}}
  {"line_number":"10","speaker":"BERNARDO","text_entry":"Have you had quiet guard?"}

You can specify authentication (“who are you”) and authorisation (“what you can do”) policies on the Elasticsearch resources by means of users, roles, and mappings between users and roles. Do you know how to do that?

# Create the security role `francisco_role` in the native realm, so that:
  (i)  the role has "monitor" privileges on the cluster,
  (ii) the role has all privileges on the `hamlet` index
# Create the user `francisco` with password "francisco-password"
# Assign the role `francisco_role` to the `francisco` user
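A sketch with the v6.x security APIs (these endpoints moved under `/_security` in v7.x):

```
# Create the role
POST /_xpack/security/role/francisco_role
{
  "cluster": ["monitor"],
  "indices": [
    {
      "names": ["hamlet"],
      "privileges": ["all"]
    }
  ]
}

# Create the user and assign the role in one call
POST /_xpack/security/user/francisco
{
  "password": "francisco-password",
  "roles": ["francisco_role"]
}
```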

Don’t forget to check that your configuration works as expected - especially during the exam!

# Login using the `francisco` user credentials, and run some queries on `hamlet` to 
  verify that the role privileges were correctly set

Not bad, right? Now, let’s create a more sophisticated security role, which assigns read-only permissions on indices, documents and fields.

# Create the security role `bernardo_role` in the native realm, so that:
  (i)   the role has "monitor" privileges on the cluster,
  (ii)  the role has read-only privileges on the `hamlet` index,
  (iii) the role can see only those documents having "BERNARDO" as a `speaker`,
  (iv)  the role can see only the `text_entry` field
# Create the user `bernardo` with password "bernardo-password"
# Assign the role `bernardo_role` to the `bernardo` user
# Login using the `bernardo` user credentials, and run some queries on `hamlet` to 
  verify that the role privileges were correctly set
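The document- and field-level restrictions can be expressed in the role definition itself; a sketch for `bernardo_role` with the v6.x security API:

```
POST /_xpack/security/role/bernardo_role
{
  "cluster": ["monitor"],
  "indices": [
    {
      "names": ["hamlet"],
      "privileges": ["read"],
      "query": {
        "match": { "speaker": "BERNARDO" }
      },
      "field_security": {
        "grant": ["text_entry"]
      }
    }
  ]
}
```

The user is then created exactly as for `francisco`, with `"roles": ["bernardo_role"]`.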

Whoops, I asked you to assign the wrong password to the “bernardo” user. My bad. Would you be so kind as to change it?

# Change the password of the `bernardo` user to "poor-bernardo"
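One way to do it, using the change-password API (under `/_security` in v7.x):

```
POST /_xpack/security/user/bernardo/_password
{
  "password": "poor-bernardo"
}
```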

(Never forget to check if it worked!)

Administer an Elasticsearch Cluster

Elasticsearch ships with reasonable defaults and usually requires little configuration. At the same time, it offers great flexibility to optimise your cluster for high availability, performance, security, and more. An Elastic Certified Engineer candidate should be very confident with this subject, and the “Cluster Administration” Exam Objective is there to prove it.

Exercise 3

In this exercise, you will optimise an Elasticsearch cluster for availability and robustness by configuring how the data is distributed across nodes as indices and shards. Furthermore, you will use the _cat API to interact with the cluster and check that all your operations have the expected result.

The exercise doesn’t require any preliminary set-up. Also, as in Exercise 2, we will use lines from Hamlet by William Shakespeare to index documents into the cluster.

# ** EXAM OBJECTIVE: CLUSTER ADMINISTRATION **
# GOAL: Allocate the shards in a way that satisfies a given set of requirements
# REQUIRED SETUP: /

By now, you should know how to install, deploy, and provide a basic configuration to an Elasticsearch node. If not, you can always practice some more with Exercise 1.

# Download the exam version of Elasticsearch
# Deploy the cluster `eoc-06-cluster`, with three nodes named `node1`, `node2`, and `node3`
# Configure the Zen Discovery module of each node so that they can communicate
# Start the cluster

⚠ : Remember that the node discovery and cluster formation have significantly changed in Elasticsearch v7.x (see updated “Network Settings”).

We’re now going to put some data in the cluster, and nothing is better than our good old Hamlet. Let’s create two new indices, each with two primary shards and one replica. Also, let’s add some documents by using the _bulk API again.

# Create the index `hamlet-1` with two primary shards and one replica
# Add some documents to `hamlet-1` by running the following _bulk command
  PUT hamlet-1/_doc/_bulk
  {"index":{"_index":"hamlet-1","_id":0}}  
  {"line_number":"1","speaker":"BERNARDO","text_entry":"Whos there?"}
  {"index":{"_index":"hamlet-1","_id":1}} 
  {"line_number":"2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
  {"index":{"_index":"hamlet-1","_id":2}}
  {"line_number":"3","speaker":"BERNARDO","text_entry":"Long live the king!"}
  {"index":{"_index":"hamlet-1","_id":3}}
  {"line_number":"4","speaker":"FRANCISCO","text_entry":"Bernardo?"}
  {"index":{"_index":"hamlet-1","_id":4}}
  {"line_number":"5","speaker":"BERNARDO","text_entry":"He."}

# Create the index `hamlet-2` with two primary shards and one replica
# Add some documents to `hamlet-2` by running the following _bulk command
  PUT hamlet-2/_doc/_bulk
  {"index":{"_index":"hamlet-2","_id":5}}
  {"line_number":"6","speaker":"FRANCISCO","text_entry":"You come most carefully upon your hour."}
  {"index":{"_index":"hamlet-2","_id":6}}
  {"line_number":"7","speaker":"BERNARDO","text_entry":"Tis now struck twelve; get thee to bed, Francisco."}
  {"index":{"_index":"hamlet-2","_id":7}}
  {"line_number":"8","speaker":"FRANCISCO","text_entry":"For this relief much thanks: tis bitter cold,"}
  {"index":{"_index":"hamlet-2","_id":8}}
  {"line_number":"9","speaker":"FRANCISCO","text_entry":"And I am sick at heart."}
  {"index":{"_index":"hamlet-2","_id":9}}
  {"line_number":"10","speaker":"BERNARDO","text_entry":"Have you had quiet guard?"}

You can always check the health and other stats of an index by using the cat APIs. For example, verify that all indices’ replicas have already been allocated - i.e., their health status is green. And while you are at it, use the right cat API to see the distribution of primary shards and replicas among the nodes (spoiler alert: it’s the cat shards API).

# Check that the replicas of indices `hamlet-1` and `hamlet-2` have been allocated 
# Check the distribution of the primary shards and replicas of indices `hamlet-1`  
  and `hamlet-2` across the nodes of the cluster
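For instance, with the cat APIs (the `v` parameter adds column headers):

```
# Index health: look for "green" in the health column
GET _cat/indices/hamlet-*?v

# Shard distribution: one row per primary (p) or replica (r), with its node
GET _cat/shards/hamlet-*?v
```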

Elasticsearch allows you to restrict the set of nodes that can allocate the shards of a given index. This feature is called shard allocation filtering, it can be configured either via APIs or in the configuration file of the node, and it’s what we’re going to practice with next.

# Configure `hamlet-1` to allocate both primary shards to `node2`, using the node name
# Verify the success of the last action by using the _cat API
# Configure `hamlet-2` so that no primary shard is allocated to `node3`
# Verify the success of the last action by using the _cat API
# Remove any allocation filter setting associated with `hamlet-1` and `hamlet-2`
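A sketch of these steps with the index update-settings API. One caveat: allocation filters apply to all copies of a shard, so pinning `hamlet-1` to a single node will leave its replicas unassigned and the index yellow.

```
# Pin the shards of hamlet-1 to node2 by node name
PUT hamlet-1/_settings
{
  "index.routing.allocation.require._name": "node2"
}

# Keep the shards of hamlet-2 off node3
PUT hamlet-2/_settings
{
  "index.routing.allocation.exclude._name": "node3"
}

# Remove a filter by setting it to null, e.g.:
PUT hamlet-1/_settings
{
  "index.routing.allocation.require._name": null
}
```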

Imagine that your cluster is distributed among different locations (e.g., availability zones, racks, continents, planets!). To increase availability, you want to avoid any data loss in the cluster if one location fails. To this end, you can make Elasticsearch aware of this physical configuration of the cluster so that it can distribute shards in such a way as to minimise the impact of failure - for example, by putting at least one replica of each shard in a different location. How to do that? Does shard allocation awareness ring any bells?

# Let's assume that we have deployed the `eoc-06-cluster` cluster across two  
  availability zones, named `earth` and `mars`. Add the attribute `AZ` to the nodes  
  configuration, and set its value to "earth" for `node1` and `node2`, and to "mars"  
  for `node3`
# Restart the cluster
# Configure the cluster to force shard allocation awareness based on the two  
  availability zones, and persist such configuration across cluster restarts
# Verify the success of the last action by using the _cat API
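One possible sketch: tag the nodes in their configuration files, then force awareness with a persistent cluster setting (persistent settings survive a full cluster restart).

```yaml
# elasticsearch.yml of node1 and node2
node.attr.AZ: earth

# elasticsearch.yml of node3
node.attr.AZ: mars
```

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "AZ",
    "cluster.routing.allocation.awareness.force.AZ.values": "earth,mars"
  }
}
```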

In the last part of this exercise, we will practice with the last strategy offered by Elasticsearch to control the allocation of shards: filtering based on custom node attributes. This strategy allows you to create some of the architectures described in the webinar “Elasticsearch Architecture Best Practices”, which I strongly encourage you to watch before the exam. As with the other allocation strategies, you will need to adapt the configuration file of each node and adjust the “index.routing.allocation.*” settings of your indices.

# Configure the cluster to reflect a hot/warm architecture, with `node1` as the only  
  hot node
# Configure the `hamlet-1` index to allocate its shards only to warm nodes
# Verify the success of the last action by using the _cat API
# Remove the hot/warm shard filtering configuration from the `hamlet-1` configuration
# Let's assume that the nodes have either a "large" or "small" local storage. Add the
  attribute `storage` to the nodes configuration, and set its value so that `node2` is
  the only one with a "small" storage
# Configure the `hamlet-2` index to allocate its shards only to nodes with a large 
  storage size
# Verify the success of the last action by using the _cat API
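A sketch of the hot/warm part, under the assumption that we name the attribute `node_type` (the attribute name is arbitrary):

```yaml
# elasticsearch.yml of node1 (the only hot node)
node.attr.node_type: hot

# elasticsearch.yml of node2 and node3
node.attr.node_type: warm
```

```
# Route hamlet-1 to warm nodes only
PUT hamlet-1/_settings
{
  "index.routing.allocation.require.node_type": "warm"
}
```

The `storage` attribute works the same way: tag the nodes, then require `"large"` for `hamlet-2`.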

Exercise 4

In this exercise, you will spin up a brand new cluster and create a backup repository for (re)storing snapshots of its data. Also, you will deploy a second cluster and enable cross-cluster search between the two. The exercise doesn’t require any preliminary set-up.

# ** EXAM OBJECTIVE: CLUSTER ADMINISTRATION **
# GOAL: Backup and cross-cluster search
# REQUIRED SETUP: /

Let’s create a one-node cluster and index some data in it.

# Download the exam version of Elasticsearch
# Deploy the cluster `eoc-06-original-cluster`, with one node named `node-1`
# Start the cluster
# Create the index `hamlet` and add some documents by running the following _bulk command
  PUT hamlet/_doc/_bulk
  {"index":{"_index":"hamlet","_id":0}}
  {"line_number":"1","speaker":"BERNARDO","text_entry":"Whos there?"}
  {"index":{"_index":"hamlet","_id":1}}
  {"line_number":"2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
  {"index":{"_index":"hamlet","_id":2}}
  {"line_number":"3","speaker":"BERNARDO","text_entry":"Long live the king!"}
  {"index":{"_index":"hamlet","_id":3}}
  {"line_number":"4","speaker":"FRANCISCO","text_entry":"Bernardo?"}
  {"index":{"_index":"hamlet","_id":4}}
  {"line_number":"5","speaker":"BERNARDO","text_entry":"He."}

In Elasticsearch, you can back up your data by creating snapshots of a set of indices or of the entire cluster. Snapshots can be stored either in local repositories or in the storage service of the main cloud providers. Do you know how to take and restore a snapshot? Well, prove it!

# Configure `node-1` to support a shared file system repository for backups  
  located in
  (i)  "[home_folder]/repo" and
  (ii) "[home_folder]/elastic/repo" - e.g., "glenacota/elastic/repo"
# Create the `hamlet_backup` shared file system repository in 
  "[home_folder]/elastic/repo" 
# Create a snapshot of the `hamlet` index, so that the snapshot
  (i)  is named `hamlet_snapshot_1`,
  (ii) is stored into `hamlet_backup` 
# Delete the index `hamlet`
# Restore the index `hamlet` using `hamlet_snapshot_1`
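A sketch of the whole snapshot round trip (keep the `[home_folder]` placeholders as given, substituting your actual home directory):

```yaml
# elasticsearch.yml (node-1): whitelist the repository locations, then restart
path.repo: ["[home_folder]/repo", "[home_folder]/elastic/repo"]
```

```
# Register the shared file system repository
PUT _snapshot/hamlet_backup
{
  "type": "fs",
  "settings": { "location": "[home_folder]/elastic/repo" }
}

# Snapshot only the hamlet index
PUT _snapshot/hamlet_backup/hamlet_snapshot_1?wait_for_completion=true
{ "indices": "hamlet" }

# Delete the index, then restore it from the snapshot
DELETE hamlet
POST _snapshot/hamlet_backup/hamlet_snapshot_1/_restore
{ "indices": "hamlet" }
```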

Now, imagine that you own another Elasticsearch cluster that contains data related to the first one, such as recent adaptations of Hamlet. Also, imagine that you want to run queries against both the original and all the adaptations of the play. Elasticsearch also offers this functionality, which is called cross-cluster search. To practice with it, we need a second cluster.

# Deploy a second cluster `eoc-06-adaptation-cluster`, with one node named `node-2`
# Start the cluster
# Create the index `hamlet-pirate` on `node-2` and add documents using the _bulk command
  PUT hamlet-pirate/_doc/_bulk
  {"index":{"_index":"hamlet-pirate","_id":5}}
  {"line_number":"6","speaker":"FRANCISCO","text_entry":"Ahoy Matey! Ye come most carefully upon yer hour."}
  {"index":{"_index":"hamlet-pirate","_id":6}}
  {"line_number":"7","speaker":"BERNARDO","text_entry":"Aye! Tis now struck twelve; get ye to bed, Francisco."}
  {"index":{"_index":"hamlet-pirate","_id":7}}
  {"line_number":"8","speaker":"FRANCISCO","text_entry":"For this relief much thanks, son of a biscuit eater"}
  {"index":{"_index":"hamlet-pirate","_id":8}}
  {"line_number":"9","speaker":"BERNARDO","text_entry":"Arrrrrrrrh!"}

To enable cross-cluster queries from “eoc-06-adaptation-cluster” to “eoc-06-original-cluster”, you must configure the latter as a remote cluster of the former. This can be done either in the configuration file of the node connecting to the remote cluster or by using the cluster settings API. Let’s do it.

# Enable cross cluster search on `eoc-06-adaptation-cluster`, so that
  (i)   the name of the remote cluster is `original`,
  (ii)  the seed is `node-1`, which is listening on the default transport port,
  (iii) the cross cluster configuration persists across multiple restarts 
# Run the following cross-cluster query to check that your setup is correct 
  GET /original:hamlet,hamlet-pirate/_search
  {
    "query": {
      "match": {
        "speaker": "BERNARDO"
      }
    }
  }
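The remote-cluster registration could look as follows on v6.5, assuming `node-1` is reachable at `127.0.0.1:9300`, the default transport port (adjust the address to your setup):

```
# Run on eoc-06-adaptation-cluster; "persistent" survives full cluster restarts
PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.original.seeds": ["127.0.0.1:9300"]
  }
}
```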

Conclusions

This blog post provided four handy exercises to train for two Exam Objectives of the Elastic Certified Engineer exam. In particular, we practised the “Installation and Configuration” and “Cluster Administration” objectives. You can also find the instructions-only version of the exercises on this GitHub repo.

Don’t forget that this article was just the beginning of a series! New exercises will be published here on the kreuzwerker blog next week.


Credits for the cover image go to Unsplash.