Introduction to Elasticsearch Using Node.js—Part 2

Practical use: explanation and example

Pankaj Panigrahi
Better Programming

--

This is the 11th article in a series that will help you grasp different concepts behind Node.js and empower you to create production-ready applications.

This article expects the reader to know Babel. Please read this article if you need to know how to set it up.

Before reading this article, you must go through the previous article, as it covers all the concepts you need to know.

You can download the archive files of Elasticsearch and Kibana from here and extract them.

Kibana is mostly used to visualize your Elasticsearch data and navigate the Elastic Stack. But it also has a dev console, which you can use to interact with your Elasticsearch data. Elasticsearch exposes a REST API by default, so every action you perform in the Kibana dev console can also be done through a simple curl request.

Once you extract both the Elasticsearch and Kibana archive files, you can start them by running bin/elasticsearch and bin/kibana respectively.

You can then access Kibana from the following URL: http://localhost:5601.

You can go to Kibana’s dev console tab and hit the following API to get a verbose output showing details of all indices created: _cat/indices?v
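In the Kibana dev console, that request looks like this:

GET _cat/indices?v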

You can also hit the Elasticsearch instance with a simple curl request, as below. Here, for obvious reasons, you have to explicitly mention the host and port of your Elasticsearch instance.
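For example, the same indices listing can be fetched with curl, assuming Elasticsearch is running on its default port, 9200:

curl -X GET "http://localhost:9200/_cat/indices?v"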

Let us first create an index with our custom mapping and analyzer. Then we will add documents to the index. We will store lines and dialogues from Shakespeare’s plays. Our sample document will look like below:
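Each document represents a single line from a play. Based on the fields in our mapping, a sample document looks roughly like this (the values are illustrative):

{
  "line_id": 4,
  "play_name": "Henry IV",
  "speech_number": 1,
  "line_number": "1.1.1",
  "speaker": "KING HENRY IV",
  "text_entry": "So shaken as we are, so wan with care,"
}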

Copy and paste the following request to your Kibana dev console.
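Assembled from the mappings and analyzer settings analyzed below, the index-creation request should look roughly like this (shakespeare is the index name used throughout this article):

PUT shakespeare
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": "1",
          "max_gram": "40"
        }
      },
      "analyzer": {
        "pk_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": ["html_strip"],
          "filter": ["lowercase"]
        },
        "autocomplete": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "autocomplete_filter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "speaker": { "type": "keyword" },
      "play_name": { "type": "text", "analyzer": "pk_custom_analyzer" },
      "line_id": { "type": "integer" },
      "speech_number": { "type": "integer" },
      "line_number": { "type": "keyword" },
      "text_entry": { "type": "text", "analyzer": "autocomplete" }
    }
  }
}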

Before hitting the API, let’s analyze each section of the request in detail.

Mappings

In the mappings, we have kept the keyword data type for the fields speaker and line_number, as we don’t have to perform a partial string search on these fields. Similarly, we kept line_id and speech_number as integers.

"mappings": {
"properties": {
"speaker": {
"type": "keyword"
},
"play_name": {
"type": "text",
"analyzer": "pk_custom_analyzer"
},
"line_id": {
"type": "integer"
},
"speech_number": {
"type": "integer"
},
"line_number": {
"type": "keyword"
},
"text_entry": {
"type": "text",
"analyzer": "autocomplete"
}
}
}

play_name and text_entry are the important fields. For both, we have kept the datatype as text. Additionally, we have attached a different custom analyzer to each of them.

pk_custom_analyzer is a basic custom analyzer like the one we created in the last article, but the autocomplete analyzer also uses a custom token filter. Let’s look at the analyzer settings of our index.

Analyzer Settings

As the name suggests, the autocomplete analyzer provides autocompletion for text input/search. This can be achieved with either the edge_ngram tokenizer or the edge_ngram token filter. I prefer the token filter: since an analyzer can have only one tokenizer, it is better to keep the tokenizer free for other purposes. edge_ngram tokenizers and filters emit N-grams of each word, where the start of the N-gram is anchored to the beginning of the word. So Quick becomes [‘Q’, ‘Qu’, ‘Qui’, ‘Quic’, ‘Quick’].

First, we create the custom token filter using edge_ngram. autocomplete_filter is the filter name. min_gram defines the minimum length of the token generated by edge_ngram and max_gram defines the maximum length.

"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": "1",
"max_gram": "40"
}
}

Then we define our custom analyzers by explicitly mentioning the filters and tokenizers we want to use:

      "analyzer": {
"pk_custom_analyzer": {
"type": "custom",
"tokenizer": "standard",
"char_filter": [
"html_strip"
],
"filter": [
"lowercase"
]
},
"autocomplete": {
"filter": ["lowercase", "autocomplete_filter"],
"type": "custom",
"tokenizer": "whitespace"
}
}

Now let’s hit the API in the Kibana console. You should see a response like below:
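For a successful index creation, the acknowledgment should look roughly like this:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "shakespeare"
}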

Whenever you work with Elasticsearch, it is very useful to know that you can test your analyzers with any custom input. You can hit the following API:

POST shakespeare/_analyze
{
  "analyzer": "pk_custom_analyzer",
  "text": "<p>I'm Inevitable</p>"
}

It should return the tokens produced after analyzing the given input. This is very helpful, as we can find out whether our analyzer works properly for various inputs before uploading documents to our index. You can do the same for the autocomplete analyzer too.
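For the input above, pk_custom_analyzer should strip the HTML tags, tokenize on word boundaries, and lowercase the tokens, so the response should contain tokens roughly like this (offsets omitted):

{
  "tokens": [
    { "token": "i'm", ... },
    { "token": "inevitable", ... }
  ]
}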

Now, let us upload some documents to our index. Download the sample file from this link, and I will add the same file to my Git repo too.

Go to the folder containing the downloaded file. You can use a simple curl request to hit the bulk API provided by Elasticsearch. The API URL is /index-name/_bulk?pretty. You can hit the API like below:

curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/shakespeare/_bulk?pretty' --data-binary @shakespeare_lines.json

It will take a few seconds for all the documents to get uploaded. You can use _cat/indices?v to see details of indices again.

The search query to hit our autocomplete data will be as follows:

GET /shakespeare/_search
{
  "query": {
    "match": {
      "text_entry": "Where"
    }
  }
}

Hitting this API will give a response similar to the structure below:

{
  "took": 6,
  "timed_out": false,
  "_shards": {...},
  "hits": {
    "total": {...},
    "max_score": 1.9957395,
    "hits": [
      {
        "_index": "shakespeare",
        "_type": "_doc",
        "_id": "s1AotWwBbPS4PkMGTqTW",
        "_score": 1.9957395,
        "_source": {
          ...
          "speaker": "Second Servingman",
          "text_entry": "Wherefore? wherefore?"
        }
      }
      ...
    ]
  }
}

The took field gives the time, in milliseconds, it took to execute the query. _shards holds details of the shards the data was fetched from. hits.total contains the count of matching results, and hits.hits contains the array of matching documents. _score signifies the relevance of a document with respect to the search term, and the _source field holds the actual document.

Now, to integrate with Node.js, we can either use Elasticsearch’s official client library (@elastic/elasticsearch) or use the popular request library to hit Elasticsearch’s REST APIs. It is very common to have the request library as part of a project, so I will use that in this article.

Let’s start with a boilerplate I created earlier. If you have followed the series from the start, the boilerplate should already be familiar. Install the request library:

npm i request --save

Create the following file: services/PlayService.js
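Here is a minimal sketch of what the service can look like. The exported function name searchLinesByText and the promise wrapper are my assumptions, but the request options match the reqObject discussed next:

import request from 'request';

// Note: searchLinesByText is an illustrative name for the service function
// Search the shakespeare index for lines whose text_entry matches the given text
export const searchLinesByText = (searchText) => {
  return new Promise((resolve, reject) => {
    // Query DSL: a match query against the autocomplete-analyzed text_entry field
    let queryClause = {
      query: {
        match: {
          text_entry: searchText
        }
      }
    };

    let reqObject = {
      url: "http://localhost:9200/shakespeare/_search?pretty",
      json: queryClause
    };

    // Because `json` is set, the request library sends the body as JSON
    // and parses the JSON response for us
    request.post(reqObject, (err, response, body) => {
      if (err) {
        return reject(err);
      }
      resolve(body);
    });
  });
};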

In the above API call, we pass two parameters to Elasticsearch, as below:

let reqObject = {
  url: "http://localhost:9200/shakespeare/_search?pretty",
  json: queryClause
};

The queryClause contains the search object written in Elasticsearch’s Query DSL.

Now create a route file routes/play.js like below:
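A minimal sketch of the route file, assuming the service function from the sketch above:

import express from 'express';
// searchLinesByText is the illustrative service function sketched earlier
import { searchLinesByText } from '../services/PlayService';

const router = express.Router();

// GET /play/line?q=<search text>
router.get('/line', async (req, res) => {
  try {
    const result = await searchLinesByText(req.query.q);
    res.status(200).json(result);
  } catch (err) {
    res.status(500).json({ error: 'Search failed' });
  }
});

export default router;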

We are defining a GET route here, where the path will be /play/line, and it expects a query parameter q.

Open the app.js file and import the play route.

import play from './routes/play';

And inject the play route just under the existing user route injection.

app.use('/user', user); // Existing code in boilerplate
app.use('/play', play);

Now let’s start our API server by running node index.js. You can hit the API in Postman and get the results.
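Assuming the boilerplate server listens on port 3000, the same request can also be made with curl:

curl "http://localhost:3000/play/line?q=Where"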

In real-life use cases, you won’t return the results as we did here. You will either restructure and filter the data, or you will do another operation based on these results.

Elasticsearch is a vast topic, but I hope this gave you a quick and clear introduction.
