Elasticsearch

Elasticsearch Cluster Deployment on DigitalOcean

Elasticsearch is deployed on three droplets in DigitalOcean.
The Elasticsearch cluster consists of several nodes that communicate with each other over transport port 9300. Internal traffic between cluster nodes goes over the internal network 10.10.1.0/24.
Request load balancing is handled by a DigitalOcean Internal Load Balancer, reachable at the domain name elasticsearch-lb.prod.rockengroup.com on port 9200.
The load balancer also performs port forwarding:

HTTPS on port 9200 -> HTTP on port 9200

The same port is used for droplet health checks.
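
For example, the cluster can be reached through the load balancer like this (the elastic user's password is in 1Password; <password> is a placeholder):

    curl -u "elastic:<password>" "https://elasticsearch-lb.prod.rockengroup.com:9200/_cluster/health?pretty"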

List of Elasticsearch Droplets:

Droplet Name             Domain                                    IP Address
elasticsearch-01-prod    elasticsearch-01.prod.rockengroup.com     167.172.97.179
elasticsearch-02-prod    elasticsearch-02.prod.rockengroup.com     178.128.207.86
elasticsearch-03-prod    elasticsearch-03.prod.rockengroup.com     46.101.168.210

Access to the cluster is restricted with a login and password (password can be found in 1Password).
The cluster is deployed using Docker Compose files (GitLab URL).

Directory Structure and Description:

certs – folder with certificates used for internal communication between cluster nodes

docker-compose.yml – Docker Compose file with configuration

.env – file with variables used in the Docker Compose file

The connection between nodes is secured with SSL.
New nodes can be added by extending discovery.seed_hosts and creating a new Docker Compose file for each node; only a few per-node settings change, as shown in the sketch below.
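
For illustration, a hypothetical fourth node (the address 10.10.1.29 is an assumption) would differ from elasticsearch-01 only in these environment entries:

    - node.name=elasticsearch-04
    - network.publish_host=10.10.1.29
    - discovery.seed_hosts=10.10.1.26:9300,10.10.1.27:9300,10.10.1.28:9300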


Example of the Docker Compose file for elasticsearch-01-prod:

version: "3.8"

services:
  elasticsearch-node:
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    container_name: elasticsearch-docker-node-1
    volumes:
      - elastic-data1:/usr/share/elasticsearch/data
      - elastic-config:/usr/share/elasticsearch/config
    ports:
      - ${ES_PORT}:9200
      - ${ES_PORT_TRANSFER}:9300
    environment:
      - node.name=elasticsearch-01
      - cluster.name=elasticsearch-cluster
      - network.host=0.0.0.0
      - network.publish_host=10.10.1.26
      - xpack.security.enabled=true
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.keystore.path=/usr/share/elasticsearch/config/certs/elastic-certificates.p12
      - xpack.security.transport.ssl.keystore.password=${CERT_STORE_PASSWORD}
      - xpack.security.transport.ssl.truststore.path=/usr/share/elasticsearch/config/certs/elastic-certificates.p12
      - xpack.security.transport.ssl.truststore.password=${CERT_STORE_PASSWORD}
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - bootstrap.memory_lock=true
      - discovery.seed_hosts=10.10.1.27:9300,10.10.1.28:9300
      # - cluster.initial_master_nodes=elasticsearch-01,elasticsearch-02,elasticsearch-03
    ulimits:
      memlock:
        soft: -1
        hard: -1
    networks:
      elastic_network:
        ipv4_address: 172.20.1.11

networks:
  elastic_network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.1.0/24

volumes:
  elastic-data1:
    driver: local
  elastic-config:
    driver: local
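
The variables referenced in the Compose file are defined in .env. An illustrative sketch (all values below are placeholders; the real ones are in 1Password and the repository):

    STACK_VERSION=8.14.0      # placeholder Elasticsearch version
    ES_PORT=9200              # HTTP API port
    ES_PORT_TRANSFER=9300     # transport port for node-to-node traffic
    CERT_STORE_PASSWORD=<certificate keystore password from 1Password>
    ELASTIC_PASSWORD=<elastic user password from 1Password>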

To add a new node to an existing cluster, copy the docker-compose.yml, the certs folder, and the .env file from one of the existing nodes, then update node.name, the IP addresses, and other variables as needed. The cluster will automatically replicate indices and settings to the new node.


Key Parameters Explained:

  • image – docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION} – The image used to start the container. The ${STACK_VERSION} variable is stored in the .env file.

  • volumes – Local storage for persistent data and configuration:

    elastic-data1:/usr/share/elasticsearch/data
    elastic-config:/usr/share/elasticsearch/config

    The certs folder with the self-signed certificates must be placed inside the elastic-config volume.

  • xpack.security.enabled – Enables X-Pack Security to support SSL and password authentication in Elasticsearch. Other options such as xpack.security.transport.ssl.enabled, xpack.security.transport.ssl.verification_mode, etc., configure additional security settings.

  • ELASTIC_PASSWORD – Password for accessing Elasticsearch.

  • bootstrap.memory_lock=true – Prevents Elasticsearch from swapping memory to disk.

  • discovery.seed_hosts – List of IP addresses for other cluster nodes.

  • cluster.initial_master_nodes – Used to bootstrap the cluster on its very first start; once the cluster has formed, it should be removed or left commented out, as in the example above.

  • ulimits – Sets the memlock limit to unlimited so that bootstrap.memory_lock can actually lock the Elasticsearch memory in RAM.

  • networks – Creates a network using the bridge driver.

  • volumes – Declares the local volumes used for Elasticsearch data and configuration.
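
For reference, a keystore like elastic-certificates.p12 is typically generated with elasticsearch-certutil. A sketch of the usual steps, run from an Elasticsearch installation or container (the exact commands used for this cluster are not recorded here):

    # Create a CA, then a node certificate signed by that CA
    bin/elasticsearch-certutil ca --out config/certs/elastic-stack-ca.p12 --pass "$CERT_STORE_PASSWORD"
    bin/elasticsearch-certutil cert --ca config/certs/elastic-stack-ca.p12 --ca-pass "$CERT_STORE_PASSWORD" \
        --out config/certs/elastic-certificates.p12 --pass "$CERT_STORE_PASSWORD"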


Elasticsearch and mmapfs:

Elasticsearch uses mmapfs by default to store its indices. The default operating system limit on memory map areas (vm.max_map_count) is likely too low, so the Elasticsearch documentation recommends raising it to 262144.
To do this, add the parameter vm.max_map_count=262144 to the /etc/sysctl.conf file and execute the command sysctl -p to apply the changes.
You can also change this value temporarily using the command sysctl -w vm.max_map_count=262144, but this will only be applied until the next system reboot.
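
Both variants, for reference:

    # Permanent: persists across reboots
    echo "vm.max_map_count=262144" >> /etc/sysctl.conf
    sysctl -p

    # Temporary: lost on the next reboot
    sysctl -w vm.max_map_count=262144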


Checking Cluster Status:

The cluster status can be viewed via the following URL:

https://elasticsearch-01.prod.rockengroup.com:9200/_cat/nodes
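
For example, with curl (the v parameter adds column headers; the elastic password is in 1Password):

    curl -u "elastic:<password>" "https://elasticsearch-01.prod.rockengroup.com:9200/_cat/nodes?v"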

Elasticsearch Cluster Setup

  • Master Node Elections: Elasticsearch uses a master node election process to manage cluster state. With 3 master-eligible nodes, you can achieve a majority quorum (more than half) for decisions, ensuring that the cluster can function even if one master node fails.

  • If 2 out of 3 master-eligible nodes go offline, the cluster loses its quorum: no master can be elected, and the cluster stays unavailable until enough nodes return.

  • Once a second node comes back online, a quorum exists again and Elasticsearch will begin restoring the cluster state.
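
The currently elected master can be checked via the _cat API, for example:

    curl -u "elastic:<password>" "http://localhost:9200/_cat/master?v"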

Elasticsearch Backup Strategy

Snapshot Repository Setup

  • Repository Type: S3-compatible (DigitalOcean Spaces)

  • Repository Name: backup-elasticsearch-prod

  • Configured Location: DigitalOcean Spaces bucket backup-elasticsearch-prod, with base path es-snapshots.

  • Access: credentials (access key and secret key) are in 1Password under "DO Space: backup-elasticsearch-prod".
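
For reference, a repository of this type is registered through the _snapshot API. A sketch, assuming the Spaces access keys and the S3 client endpoint have already been configured in the Elasticsearch keystore and elasticsearch.yml:

    curl -u "elastic:<password>" -X PUT "http://localhost:9200/_snapshot/backup-elasticsearch-prod?pretty" \
        -H 'Content-Type: application/json' -d'
    {
      "type": "s3",
      "settings": {
        "bucket": "backup-elasticsearch-prod",
        "base_path": "es-snapshots"
      }
    }'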

Automation for Backups

Scripts for backups are configured on the elasticsearch-01-prod node.

  • Backup Script:

    #!/bin/bash
    
    DATE=$(date +%Y%m%d%H%M%S)
    ES_DIR="/root/elasticsearch"
    ES_HOST="http://localhost:9200"
    REPO_NAME="backup-elasticsearch-prod"
    SNAPSHOT_NAME="snapshot_$DATE"
    # -f2- keeps the full value even if the password itself contains '='
    ES_PASSWORD=$(grep "^ELASTIC_PASSWORD=" "$ES_DIR/.env" | cut -d '=' -f2-)
    LOG_NAME="$ES_DIR/es_snapshot.log"
    
    # Create a snapshot
    echo "$(date)" | tee -a "$LOG_NAME"
    echo "Creating a snapshot with name $SNAPSHOT_NAME" | tee -a "$LOG_NAME"
    
    # Perform the snapshot request and wait for it to complete
    time curl -u "elastic:$ES_PASSWORD" -X PUT "${ES_HOST}/_snapshot/${REPO_NAME}/${SNAPSHOT_NAME}?wait_for_completion=true&pretty" \
        -H 'Content-Type: application/json' -d'
    {
      "indices": "*",
      "ignore_unavailable": true,
      "include_global_state": true
    }
    ' | tee -a "$LOG_NAME"
    
    sleep 5
    # Check that the snapshot is now listed in the repository
    if curl -u "elastic:$ES_PASSWORD" -X GET "${ES_HOST}/_cat/snapshots/${REPO_NAME}?v&s=id&pretty" | grep "$SNAPSHOT_NAME"
    then
        echo "$(date) - Snapshot ${SNAPSHOT_NAME} created" | tee -a "$LOG_NAME"
    else
        echo "$(date) - !!! Snapshot was not found. Exit !!!" | tee -a "$LOG_NAME"
        exit 1
    fi
    
    echo "" >> "$LOG_NAME"
  • Execution Schedule:

    The script is scheduled to run automatically every night at 1:00 AM UTC using a cron job.

    0 1 * * * /root/elasticsearch/es_backup.sh
  • Verification

    Snapshots can be verified using the following command:

    curl -u "user:password" -X GET "http://localhost:9200/_cat/snapshots/backup-elasticsearch-prod?v&s=id&pretty"

Logs from the backup script are stored in /root/elasticsearch/es_snapshot.log.
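
Restoring from a snapshot goes through the _restore API. A sketch (the snapshot name is a placeholder; an index can only be restored if it has first been closed or deleted):

    curl -u "elastic:<password>" -X POST \
        "http://localhost:9200/_snapshot/backup-elasticsearch-prod/<snapshot_name>/_restore?pretty" \
        -H 'Content-Type: application/json' -d'
    {
      "indices": "*",
      "ignore_unavailable": true
    }'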

Retention Policy

To manage storage effectively and prevent the repository from filling up with excessive snapshots, a retention policy is implemented to keep only the last 7 backups. Older backups are automatically deleted.

  • Cleanup Script

    #!/bin/bash
    
    ES_DIR="/root/elasticsearch"
    ES_HOST="http://localhost:9200"
    REPO_NAME="backup-elasticsearch-prod"
    # -f2- keeps the full value even if the password itself contains '='
    ES_PASSWORD=$(grep "^ELASTIC_PASSWORD=" "$ES_DIR/.env" | cut -d '=' -f2-)
    LOG_NAME="$ES_DIR/es_snapshot_cleanup.log"
    
    KEEP_COUNT=7  # Number of snapshots to keep
    
    echo "$(date)" | tee -a "$LOG_NAME"
    echo "Retrieving snapshot list..." | tee -a "$LOG_NAME"
    # Snapshot ids, oldest first (sorted by start_epoch)
    SNAPSHOT_LIST=$(curl -u "elastic:$ES_PASSWORD" -X GET "${ES_HOST}/_cat/snapshots/${REPO_NAME}?h=id&s=start_epoch" | awk '{print $1}')
    TOTAL_SNAPSHOTS=$(echo "$SNAPSHOT_LIST" | grep -c .)
    
    # Check if cleanup is needed
    if [ "$TOTAL_SNAPSHOTS" -gt "$KEEP_COUNT" ]; then
      echo "Cleaning up old snapshots. Total: $TOTAL_SNAPSHOTS, Keeping: $KEEP_COUNT" | tee -a "$LOG_NAME"
      DELETE_COUNT=$((TOTAL_SNAPSHOTS - KEEP_COUNT))
      DELETE_SNAPSHOTS=$(echo "$SNAPSHOT_LIST" | head -n "$DELETE_COUNT")
    
      for SNAPSHOT in $DELETE_SNAPSHOTS; do
        echo "Deleting snapshot: $SNAPSHOT" | tee -a "$LOG_NAME"
        # Deletion goes through the _snapshot API, not _cat
        curl -u "elastic:$ES_PASSWORD" -X DELETE "${ES_HOST}/_snapshot/${REPO_NAME}/${SNAPSHOT}?pretty" | tee -a "$LOG_NAME"
      done
    else
      echo "No cleanup needed. Total snapshots: $TOTAL_SNAPSHOTS" | tee -a "$LOG_NAME"
    fi
    
    echo "" >> "$LOG_NAME"
  • Execution Schedule:

    The script is scheduled to run automatically every night at 2:00 AM UTC using a cron job.

    0 2 * * * /root/elasticsearch/es_backup_cleanup.sh
  • Verification

    Snapshots can be verified using the following command:

    curl -u "user:password" -X GET "http://localhost:9200/_cat/snapshots/backup-elasticsearch-prod?v&s=id&pretty"

    Verify that only the last 7 snapshots are retained.

  • Logs from the cleanup script are stored in /root/elasticsearch/es_snapshot_cleanup.log.
