Index, Filter and Query Strategy


(Simon Taylor) #1

I have moved to a search tool (Elastic) after using many filters, REST queries, etc. It has been amazing so far, but I am very 'green' on the subject.

I have two main problems:

  1. Bulk importing and a strategy for using ElasticSearch in the future
  2. Running Filter Only, Query Only and Filter + Query searches

Background
I am using Elastic in an Exercise Prescription mobile application in which users search for exercises and add them to Programmes. I currently have 3500 exercises, structured in JSON as below:

{
    "difficulty": "Easy",
    "exerciseDescription": "Lie prone on a bed with the unaffected leg off, Bend the knee of the leg that is on the bed, Lower the leg and repeat",
    "exerciseID": "179",
    "exerciseName": "Femoral Nerve Mobilisation 1",
    "images": [
        683,
        684
    ],
    "tags": [
        "Neural",
        "Hip \u0026 Pelvis",
        "Knee",
        "Balance and Proprioception"
    ],
    "words": [
        "femoral",
        "nerve",
        "mobilisation",
        "1"
    ]
}

Problem 1
The exercise database gets updated every week so I would need to import a new index weekly.
For my bulk import I have been using elasticsearch-tools and providing it data in the format below:

{ "index":  { "_index": "exercises", "_type": "exercise" } }
{"difficulty":"Easy","exerciseDescription":"Lie prone on a bed with the unaffected leg off, Bend the knee of the leg that is on the bed, Lower the leg and repeat","exerciseID":"179","exerciseName":"Femoral Nerve Mobilisation 1","images":[683,684],"objectId":"AcHqZjCSV6","tags":["Neural","Hip & Pelvis","Knee"],"updatedAt":"2015-06-13T21:39:00.084Z","words":["femoral","nerve","mobilisation","1"]}
{ "index":  { "_index": "exercises", "_type": "exercise" } }
{"difficulty":"Easy","exerciseDescription":"Turn head to the side and raise the hand to the mouth as if to smoke, Lower hand to to the side","exerciseID":"216","exerciseName":"Ulnar Nerve Fingers to Lips","images":[788,789],"objectId":"udJSAa5rtH","tags":["Neural","Elbow, Wrist & Hand","Shoulder","No Equipment","Self Treatment"],"updatedAt":"2015-07-02T20:28:48.909Z","words":["ulnar","nerve","fingers","to","lips"]}

I managed to create this by minifying my pretty-printed JSON, finding the end of each exercise using Sublime Text, and then pasting { "index": { "_index": "exercises", "_type": "exercise" } } above each line.

Question 1
How can I improve my workflow to index my exercises without having to do all this laborious JSON manipulation before pressing GO on the bulk import? Additionally, it would be ideal if I could index by the exercise ID, but I have not figured out how to do that.

Problem 2
There are three methods of search in my App.

  1. Freetext / Query
  2. Tags Only
  3. Freetext / Query + Tags.

The user activates 'Tags' in the UI, which pushes the name of each Tag to an array. In the above case the exercise has the tags ["Neural", "Hip & Pelvis", "Knee", "Balance and Proprioception"]. I believe that when my index is mapped the exercise tags are analysed and therefore split into tokens, so when I send the array of tags ["Balance and Proprioception"] it is not found and returns an error.
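To illustrate the tokenisation concern described above, here is a rough JavaScript sketch. The roughAnalyze function is a hypothetical stand-in that only approximates what an analyser does (lowercasing and splitting on non-alphanumeric characters); it is not Elasticsearch's actual analyser, and the real token output may differ.

```javascript
// Rough stand-in for an analyser: lowercase, then split on anything that is
// not a letter or digit. This approximates, but is not, Elasticsearch's
// standard analyser. The function name is made up for this example.
function roughAnalyze(text) {
    return text
        .toLowerCase()
        .split(/[^\p{L}\p{N}]+/u)
        .filter(function (token) { return token.length > 0; });
}

var tags = ["Neural", "Hip & Pelvis", "Knee", "Balance and Proprioception"];

// Tokens that an analysed "tags" field would roughly hold for this exercise.
var indexedTokens = [];
tags.forEach(function (tag) {
    indexedTokens = indexedTokens.concat(roughAnalyze(tag));
});

// An exact-value lookup compares the whole, unmodified string against tokens.
var exactHit = indexedTokens.indexOf("Balance and Proprioception") !== -1;
var tokenHit = indexedTokens.indexOf("balance") !== -1;

console.log(indexedTokens);
console.log(exactHit, tokenHit);
```

Under this approximation the full tag string never appears among the indexed tokens, which is consistent with an exact-match lookup on an analysed field finding nothing.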

I have been using the query below to work around the problem above. I understand that it is far from an ideal way to use ElasticSearch (I have to say I am getting fantastic results, but it's not correct).

var searchQuery = {
    "query": {
        "match": {
            "_all": {
                "query": term + " " + filterArray,
                "operator": "and"
            }
        }
    },
    "aggs": {},
    "size": 20
};

In this case an example of term would be "squat with dumbbells" and an example of filterArray would be ["Ball","Leg","Strength"].

Question 2
How can I change my query so that, if a filter is active, it filters on the tags field first and then performs the text search on the filtered results?

Many thanks for taking the time to read this, after a week I have had to resort to a forum! Both of my questions have been asked on Stackoverflow here and here.

Kind regards

Simon


(Colin Goodheart-Smithe) #2

For Question 1:

You could create a Logstash config that collects the data from your source (the database, I presume) and outputs to Elasticsearch. This has the benefit that Logstash deals with the bulk requests for you: you just need to configure the input to point to your data, and the Elasticsearch output to send the index requests to Elasticsearch. The other advantage is that you can use Logstash filters to enrich or massage your data if you need to.

There are also many language clients where you can programmatically create the bulk requests for your documents instead of manually editing the JSON. Documents for how to use those clients are here (under the clients section).
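As a minimal sketch of building the bulk request programmatically, the JavaScript below generates the action and source lines from an array of exercise objects and sets each document's _id from exerciseID. The helper name buildBulkBody and the trimmed sample data are assumptions made for illustration; a language client would normally handle this step for you.

```javascript
// Build an Elasticsearch bulk request body (newline-delimited JSON) from an
// array of exercise objects, using exerciseID as the document _id.
// buildBulkBody is a hypothetical helper, not part of any client library.
function buildBulkBody(exercises, indexName, typeName) {
    var lines = [];
    exercises.forEach(function (exercise) {
        // Action line: tells Elasticsearch where to index the next document.
        lines.push(JSON.stringify({
            index: { _index: indexName, _type: typeName, _id: exercise.exerciseID }
        }));
        // Source line: the document itself, minified onto a single line.
        lines.push(JSON.stringify(exercise));
    });
    // The bulk API expects a newline after the last line as well.
    return lines.join("\n") + "\n";
}

// Trimmed-down sample documents based on the exercises in this thread.
var exercises = [
    { exerciseID: "179", exerciseName: "Femoral Nerve Mobilisation 1", tags: ["Neural", "Knee"] },
    { exerciseID: "216", exerciseName: "Ulnar Nerve Fingers to Lips", tags: ["Neural", "Shoulder"] }
];

var bulkBody = buildBulkBody(exercises, "exercises", "exercise");
console.log(bulkBody);
```

Because each action line carries an explicit _id, re-running the import with the same exerciseID replaces the existing document rather than creating a duplicate, which may suit a weekly update.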

For Question 2:

Consider using a filtered query here. You can add your free-text search as a match query in the query section (similar to how you currently do it), and the filterArray can be added to the filter section using the terms filter. What you would end up with would look something like the following:

var searchQuery = {
    "query": {
        "filtered": {
            "query": {
                "match": {
                    "_all": {
                        "query": term,
                        "operator": "and"
                    }
                }
            },
            "filter": {
                "terms": {
                    "tags": filterArray
                }
            }
        }
    },
    "size": 20
};
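To cover the three search modes from the original question (query only, tags only, query + tags), one possible approach is to build the request body conditionally. The sketch below is only an illustration: buildSearchQuery is a hypothetical helper, and it assumes the same filtered query and terms filter syntax shown above.

```javascript
// Hypothetical helper: builds the search body for the three search modes.
function buildSearchQuery(term, filterArray) {
    var hasTerm = typeof term === "string" && term.trim().length > 0;
    var hasTags = Array.isArray(filterArray) && filterArray.length > 0;

    // Free-text part, or match_all when the user typed nothing.
    var textQuery = hasTerm
        ? { match: { _all: { query: term, operator: "and" } } }
        : { match_all: {} };

    // Query only: no filter wrapper needed.
    if (!hasTags) {
        return { query: textQuery, size: 20 };
    }

    // Tags only, or query + tags: wrap the text query in a filtered query.
    // A terms filter matches documents carrying at least one listed tag.
    return {
        query: {
            filtered: {
                query: textQuery,
                filter: { terms: { tags: filterArray } }
            }
        },
        size: 20
    };
}

console.log(JSON.stringify(buildSearchQuery("squat", []), null, 2));
console.log(JSON.stringify(buildSearchQuery("", ["Ball", "Leg"]), null, 2));
console.log(JSON.stringify(buildSearchQuery("squat", ["Ball", "Leg"]), null, 2));
```

Note that the terms filter matches an exercise that has any one of the selected tags. If every selected tag must be present, one option is to combine one term filter per tag inside a bool filter's must clause.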

Hope that helps


(Simon Taylor) #3

@colings86 - Fantastic, many thanks.

One quick question on using the terms filter to match my exercise tags: on indexing it seems that they are mapped as Strings (analysed), so if I try to match terms using an array like ["Balance and Proprioception"] it does not match, due to it not matching the tags index exactly?

In order for the filter (bool) to work does my tags field need to be non_analysed?


(Colin Goodheart-Smithe) #4

Yes, that's exactly right. If you want to do exact matching on the tags you will need the tags field to be "index": "not_analyzed" in your mappings. If you want both exact matching and keyword searching on the tags field, have a look at multifields. This allows you to create two fields in your index from one field in your JSON document. So you could set one to analyzed to let you do keyword searching and another to not_analyzed to let you do exact matching.
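As a sketch of what such a multi-field mapping might look like, the JavaScript object below defines tags as an analysed string with a not_analyzed sub-field. The sub-field name raw is an assumption chosen for this example, and the body uses the same "string" / "not_analyzed" mapping vocabulary as this thread; it would be supplied when creating the index.

```javascript
// Index creation body with a multi-field mapping for "tags".
var indexBody = {
    mappings: {
        exercise: {
            properties: {
                tags: {
                    type: "string",                 // analysed: keyword searching
                    fields: {
                        raw: {                      // "raw" is a hypothetical name
                            type: "string",
                            index: "not_analyzed"   // stored as-is: exact matching
                        }
                    }
                }
            }
        }
    }
};

// A terms filter aimed at the exact-match sub-field instead of "tags".
var tagFilter = { terms: { "tags.raw": ["Balance and Proprioception"] } };

console.log(JSON.stringify(indexBody, null, 2));
console.log(JSON.stringify(tagFilter));
```

With a mapping along these lines, free-text matching can keep using the analysed tags field while the filter targets tags.raw with the full tag string.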


(Simon Taylor) #5

So I am up and running with logstash (kind of!) and have the following config

input {
  file {
    path => "/Users/taylorsuk/Desktop/Exercises_Single.json"
    type => "exercises"
    codec => "json"
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    host => localhost
    protocol => http
  }
}

I am getting "waited for 30s and no initial state was set by the discovery". Do you have any ideas where I can get info on setting up a simple pipeline: input (JSON) > filter / sort > ElasticSearch index?

Many thanks


JSON > Logstash > ElasticSearch
(Colin Goodheart-Smithe) #6

Hmmm, unfortunately I only have limited knowledge about troubleshooting Logstash. I have checked my Logstash configs and I usually specify the protocol as protocol => "http", but I'm not sure the lack of double quotes is the issue here. It might be worth starting a topic in the Logstash category for this issue. The users there will almost certainly know how to solve this for you.

