Universal search

Nikita_Krasnov · December 26, 2018, 3:19pm

Hello everyone.
I have a string "Jhon Abraham 18". I want to build a search query that splits this string on spaces and searches an index for each of the resulting words. The search has to run against all fields of the index, because I don't know in advance which word belongs to which field.
So, I have a document:

{
  "_index": "recipient",
  "_type": "recipient",
  "_id": "37a15258d9",
  "_version": 1,
  "_score": 1,
  "_source": {
    "name": "Jhon ",
    "surname": "Abraham",
    "age": "18 "
  }
}

and I don't know which fields of the index the values Jhon, Abraham and 18 correspond to. I only have the string, and I want to use it to search all fields of the documents in the index. I can split it into separate words on spaces, but I don't know the exact fields to search. Also, I want to do this in Java.
I'd appreciate any help.

Igor_Motov · December 26, 2018, 4:15pm

The answer depends on the version of Elasticsearch that you are using. In 6.x and above you can use a multi_match query with "*" in fields. For the best results, numeric fields should be mapped as strings, otherwise you might run into issues:

DELETE test

PUT test
{
  "settings": {
    "number_of_shards": 1
  }
}

PUT test/doc/1
{
    "name": "Jhon",
    "surname": "Abraham",
    "age": "18 "
}


PUT test/doc/2
{
    "name": "John",
    "surname": "Smith",
    "age": "19 "
    
}

POST test/_search
{
  "query": {
    "multi_match": {
      "query": "John Abraham 18",
      "fields": ["*"],
      "lenient": "true",
      "type": "most_fields"
    }
  }
}

Nikita_Krasnov · December 26, 2018, 4:25pm

Thanks a lot. Do you have an example of this in Java?

Igor_Motov · December 26, 2018, 4:36pm

No, I don't. What's the issue with Java?

Nikita_Krasnov · December 27, 2018, 12:37pm

I have Java code:

// multiMatchQuery and MOST_FIELDS are static imports from
// org.elasticsearch.index.query.QueryBuilders and
// org.elasticsearch.index.query.MultiMatchQueryBuilder.Type
List<Recipient> searchRecipients = new ArrayList<>();
SearchRequest idSearchRequest = new SearchRequest("test");
SearchSourceBuilder idSearchSourceBuilder = new SearchSourceBuilder();

// Match the words in searchParameters against every field ("*")
QueryBuilder qb = multiMatchQuery(searchParameters, "*").type(MOST_FIELDS);
idSearchSourceBuilder.query(qb);
idSearchRequest.source(idSearchSourceBuilder);

SearchResponse searchResponse;
try {
    searchResponse = esClient.search(idSearchRequest, RequestOptions.DEFAULT);
    SearchHit[] searchHits = searchResponse.getHits().getHits();
    for (SearchHit searchHit : searchHits) {
        // Deserialize each hit's _source JSON into a Recipient
        Recipient searchHitRecipient = modelMapper().readValue(searchHit.getSourceAsString(), Recipient.class);
        searchRecipients.add(searchHitRecipient);
    }
} catch (IOException e) {
    log.error(e.getMessage());
}

When I run this code, I get the following request:

POST test/_search
{
  "multi_match" : {
    "query" : "1 254898",
    "fields" : [
      "*^1.0"
    ],
    "type" : "most_fields",
    "operator" : "OR",
    "slop" : 0,
    "prefix_length" : 0,
    "max_expansions" : 50,
    "zero_terms_query" : "NONE",
    "auto_generate_synonyms_phrase_query" : true,
    "fuzzy_transpositions" : true,
    "boost" : 1.0
  }
}

I got this error:

{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "Unknown key for a START_OBJECT in [multi_match].",
        "line": 2,
        "col": 19
      }
    ],
    "type": "parsing_exception",
    "reason": "Unknown key for a START_OBJECT in [multi_match].",
    "line": 2,
    "col": 19
  },
  "status": 400
}
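
The parsing error above is consistent with the request body lacking its top-level "query" key: the JSON printed for a query builder is the query itself, while the _search endpoint expects that query nested under "query". Below is a minimal plain-Java sketch of the wrapping step. The class and method names are hypothetical, and the string concatenation only stands in for what happens when the query is attached to a SearchSourceBuilder and sent through the client.

```java
// Hypothetical helper for illustration; not part of the Elasticsearch client.
class SearchBodyWrapper {

    // Nests a printed query under the top-level "query" key expected by _search.
    static String wrapInQuery(String queryJson) {
        return "{\n  \"query\": " + queryJson.trim() + "\n}";
    }

    public static void main(String[] args) {
        String printedQuery =
                "{ \"multi_match\": { \"query\": \"1 254898\", \"fields\": [\"*^1.0\"], \"type\": \"most_fields\" } }";
        // Prints a body that can be pasted after "POST test/_search"
        System.out.println(wrapInQuery(printedQuery));
    }
}
```

When the request goes through esClient.search(...), as in the Java code above, this wrapping is not needed; it only matters when a printed query is pasted into the console by hand.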

After changing the query to:

POST test/_search
{
  "query": {
    "multi_match" : {
      "query" : "1 254898",
      "fields" : [
        "recipient.document.number^1.0",
        "recipient.document.type^1.0"
      ],
      "type" : "most_fields",
      "operator" : "OR",
      "slop" : 0,
      "prefix_length" : 0,
      "max_expansions" : 50,
      "zero_terms_query" : "NONE",
      "auto_generate_synonyms_phrase_query" : true,
      "fuzzy_transpositions" : true,
      "boost" : 1.0
    }
  }
}

I get this error:

{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: {\n  \"multi_match\" : {\n    \"query\" : \"1 254898\",\n    \"fields\" : [\n      \"*^1.0\"\n    ],\n    \"type\" : \"most_fields\",\n    \"operator\" : \"OR\",\n    \"slop\" : 0,\n    \"prefix_length\" : 0,\n    \"max_expansions\" : 50,\n    \"zero_terms_query\" : \"NONE\",\n    \"auto_generate_synonyms_phrase_query\" : true,\n    \"fuzzy_transpositions\" : true,\n    \"boost\" : 1.0\n  }\n}",
        "index_uuid": "kZAjfQWHQ6SkNIETBxhfOA",
        "index": "portal-smm-recipient"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "portal-smm-recipient",
        "node": "JEwfXRPHQSWjPn4L4U6AVQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: {\n  \"multi_match\" : {\n    \"query\" : \"1 254898\",\n    \"fields\" : [\n      \"*^1.0\"\n    ],\n    \"type\" : \"most_fields\",\n    \"operator\" : \"OR\",\n    \"slop\" : 0,\n    \"prefix_length\" : 0,\n    \"max_expansions\" : 50,\n    \"zero_terms_query\" : \"NONE\",\n    \"auto_generate_synonyms_phrase_query\" : true,\n    \"fuzzy_transpositions\" : true,\n    \"boost\" : 1.0\n  }\n}",
          "index_uuid": "kZAjfQWHQ6SkNIETBxhfOA",
          "index": "portal-smm-recipient",
          "caused_by": {
            "type": "number_format_exception",
            "reason": "For input string: \"1 254898\""
          }
        }
      }
    ]
  },
  "status": 400
}

What am I doing wrong?

Igor_Motov · December 27, 2018, 2:30pm

Which version of Elasticsearch is this?

As I mentioned before, if some of your fields are indexed as numeric fields (unlike in your original example, where all fields are text), this approach is not going to work. You need to either reindex the numeric fields as text fields, exclude the numeric fields from the search, or ignore the failures on them by adding "lenient": "true", as in my example.
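
The number_format_exception in the error above reports the whole query string "1 254898" as its input, which suggests that a numeric field receives the text as a single value rather than word by word. The sketch below is only a plain-Java analogy of that failure, not Elasticsearch code; the class and method names are hypothetical.

```java
// Plain-Java analogy for illustration; this is not Elasticsearch code.
class NumericFieldParsing {

    // Returns true when the whole text can be read as a long.
    static boolean parsesAsLong(String text) {
        try {
            Long.parseLong(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String query = "1 254898";
        // The full string, space included, is not a valid number
        System.out.println(query + " -> " + parsesAsLong(query));
        // Each word on its own is a valid number
        for (String word : query.split("\\s+")) {
            System.out.println(word + " -> " + parsesAsLong(word));
        }
    }
}
```

With "lenient" set to true, a format failure of this kind on a numeric field is ignored instead of failing the whole search.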

Nikita_Krasnov · December 27, 2018, 3:56pm

I did it with this code:
QueryBuilder qb = multiMatchQuery(searchParameters, "*").type(MOST_FIELDS).lenient(true);

Then I put 5 documents into one index:

PUT test/doc/1
{
    "name": "Smith",
    "surname": "19",
    "age": "Jhon"
    
}

PUT test/doc/2
{
    "name": "Long",
    "surname": "Jhon",
    "age": "19"
    
}


PUT test/doc/3
{
    "name": "Smith",
    "surname": "Jhon",
    "age": "19"
    
}

PUT test/doc/4
{
    "name": "Jhon",
    "surname": "Smith",
    "age": "19"
    
}

PUT test/doc/5
{
    "name": "1",
    "surname": "Jhon",
    "age": "19"
    
}

ran this query:

POST test/_search
{
  "query": {
    "multi_match" : {
      "query" : "Jhon Smith",
      "fields" : [
        "*^1.0"
      ],
      "type" : "most_fields",
      "operator" : "OR",
      "slop" : 0,
      "prefix_length" : 0,
      "max_expansions" : 50,
      "lenient" : true,
      "zero_terms_query" : "NONE",
      "auto_generate_synonyms_phrase_query" : true,
      "fuzzy_transpositions" : true,
      "boost" : 1.0
    }
  }
}

and got this result:

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.3862944,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "4",
        "_score": 1.3862944,
        "_source": {
          "name": "Jhon",
          "surname": "Smith",
          "age": "19"
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "2",
        "_score": 0.6931472,
        "_source": {
          "name": "Long",
          "surname": "Jhon",
          "age": "19"
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "name": "Smith",
          "surname": "Jhon",
          "age": "19"
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "1",
        "_score": 0.36464313,
        "_source": {
          "name": "Smith",
          "surname": "19",
          "age": "Jhon"
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "5",
        "_score": 0.2876821,
        "_source": {
          "name": "1",
          "surname": "Jhon",
          "age": "19"
        }
      }
    ]
  }
}

The scores differ. My logic depends on the score, so I need the same score whenever Jhon Smith is found, no matter in which fields or in which order the words appear. In other words, every document that contains both Jhon and Smith should get the same score, and it should be the highest one. How can I do that?
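
One likely contributor to the differing scores, offered as an assumption rather than a diagnosis: Elasticsearch 6.x scores with BM25 by default, and each of the 5 shards listed in the response computes term statistics from its own documents only. The sketch below evaluates the BM25 inverse document frequency, ln(1 + (N - n + 0.5) / (n + 0.5)), for a few small per-shard counts, where N is the number of documents that have the field and n is the number that contain the term. The counts are hypothetical values chosen for illustration, not statistics read from the cluster.

```java
// Illustration of the BM25 idf term only; the counts below are hypothetical.
class Bm25Idf {

    // BM25 inverse document frequency: ln(1 + (N - n + 0.5) / (n + 0.5)).
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        // {documents with the field, documents containing the term} on one shard
        long[][] counts = { {1, 1}, {2, 1}, {2, 2} };
        for (long[] c : counts) {
            System.out.printf("N=%d n=%d idf=%.7f%n", c[0], c[1], idf(c[0], c[1]));
        }
    }
}
```

For N = 1, n = 1 the formula gives about 0.2877, and for N = 2, n = 1 about 0.6931, the same magnitudes as two of the scores above. So the same term match can be worth different amounts depending on which shard and field it occurs in.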

Igor_Motov · December 27, 2018, 5:50pm

Which version of Elasticsearch is this?

Nikita_Krasnov · December 27, 2018, 11:15pm

6.4.1

Nikita_Krasnov · December 28, 2018, 1:29pm

I solved my task with this:

QueryBuilder qb = multiMatchQuery(searchParameters, "*")
        .type(CROSS_FIELDS)
        .operator(Operator.AND)
        .lenient(true);

The query in Elasticsearch:

    {
      "multi_match" : {
        "query" : "1 19",
        "fields" : [
          "*^1.0"
        ],
        "type" : "cross_fields",
        "operator" : "AND",
        "slop" : 0,
        "prefix_length" : 0,
        "max_expansions" : 50,
        "lenient" : true,
        "zero_terms_query" : "NONE",
        "auto_generate_synonyms_phrase_query" : true,
        "fuzzy_transpositions" : true,
        "boost" : 1.0
      }
    }

But I have a template for my data:

PUT _template/test
{
  "index_patterns": "test",
  "settings": {
    "number_of_shards": 5,
    "analysis": {
      "normalizer": {
        "useLowercase": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "normalizer": "useLowercase",
          "type": "keyword"
        },
        "surname": {
          "normalizer": "useLowercase",
          "type": "keyword"
        },
        "age": {
          "type": "long"
        }
      }
    }
  }
}

I use a lowercase normalizer so that searches are case-insensitive. But I can't use it with multi_match: the search doesn't find anything.

I have this data in Elasticsearch:

PUT test/test/1
{
    "name": "Smith",
    "surname": "19",
    "age": 19
    
}

PUT test/test/2
{
    "name": "Long",
    "surname": "Jhon",
    "age": 19
    
}


PUT test/test/3
{
    "name": "Smith",
    "surname": "Jhon",
    "age": 19
    
}

PUT test/test/4
{
    "name": "Jhon",
    "surname": "Smith",
    "age": 19
    
}

PUT test/test/5
{
    "name": "1",
    "surname": "Jhon",
    "age": 19
    
}

and with the query I wrote above, I can't find anything because of the normalizer. What do I have to add or change for this case? I want the search to be case-insensitive and to match whenever 1 and 19 are both in the document, no matter in which fields.

Igor_Motov · December 28, 2018, 2:18pm

I see. That gets way too complicated compared to the original question, then. I think it would be easier for you to just copy all fields into a single text field using copy_to.
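
A plain-Java sketch of the idea behind this suggestion, under simplifying assumptions: all field values of a document are combined into one bag of lowercase words, and a document matches when every word of the query is in that bag. This only imitates what a single combined text field with an AND operator would give; the class and method names are hypothetical, and the whitespace split is a rough stand-in for a real analyzer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration of the copy_to idea; not Elasticsearch code.
class CombinedFieldSketch {

    // Lowercases the text and splits it on whitespace.
    static Set<String> tokens(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return new HashSet<>();
        }
        return new HashSet<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // True when every query word occurs somewhere among the document's field values.
    static boolean matchesAll(String query, String... fieldValues) {
        Set<String> combined = tokens(String.join(" ", fieldValues));
        return combined.containsAll(tokens(query));
    }

    public static void main(String[] args) {
        // Field values taken from the test documents in this thread
        System.out.println(matchesAll("1 19", "1", "Jhon", "19"));
        System.out.println(matchesAll("1 19", "Jhon", "Smith", "19"));
        System.out.println(matchesAll("jhon SMITH", "Smith", "Jhon", "19"));
    }
}
```

Because the match is decided on the combined words, it does not depend on which field a word came from or on its letter case.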

Nikita_Krasnov · December 29, 2018, 8:47am

As I understand it, I have to use copy_to when creating the index:

PUT test
{
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "keyword",
          "copy_to": "full_name" 
        },
        "surname": {
          "type": "keyword",
          "copy_to": "full_name" 
        },
        "age": {
          "type": "integer"
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}

But I also define a mapping like this when creating the template:

PUT _template/test
{
  "index_patterns": "test",
  "settings": {
    "number_of_shards": 5
  },
  "mappings": {
    "test": {
      "properties": {
        "name": {"type": "keyword"},
        "surname": {"type": "keyword"},
        "age": {"type": "long"}
      }
    }
  }
}

Will the two conflict in some way? Also, with this setup I can't find any documents by the full_name field in the index.

Igor_Motov · December 31, 2018, 9:25pm

I don't understand what the issue is here. If you are creating the index explicitly, you don't need the template. If you are creating the index with a template, you need to specify copy_to in the template.

system · January 28, 2019, 9:25pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
