Show specific documents first


(Rodolfo Edu Lugo Garcia) #1

Hello guys!
I'm creating an autocomplete query for hotels and destinations.
When I type Can or Canc, I want to see Cancun among the first results of my query.
I already have a list of popular cities and hotels that I want to show first in my autocomplete.

I think I could set a "relevance" value when I build the bulk request, but I'm not sure which approach to follow.

Can anyone help me? I'd appreciate it a lot!

Thanks!!
My PHP code is the following:

$params = [
    'index' => ['hotels', 'destination_ngrams'],
    'type'  => ['hotel', 'city'],
    'size'  => 100,
    'body'  => [
        'query' => [
            'multi_match' => [
                'type'      => 'best_fields',
                'query'     => $text,
                'fields'    => ['destination_name_*^3', 'hotel_name'],
                'fuzziness' => 'AUTO',
            ],
        ],
    ],
];

(Rodolfo Edu Lugo Garcia) #2

Any suggestions, guys?


(Byron Voorbach) #3

I think a bit more context is needed in order to help you out.
For example:

  • What is going wrong at the moment (e.g. no results / weird results / wrong order of results)?
  • What does your mapping look like for both destination_name & hotel_name?
  • How much data do you have?
  • The more context the better :slight_smile:

(Rodolfo Edu Lugo Garcia) #4

Thanks, I will send more info shortly.

Thanks


(Rodolfo Edu Lugo Garcia) #5

@byronvoorbach thanks for your recommendation. Here is the index/search/bulk code (I omitted the bulk code for the hotels).

There are around 100k hotels.
There are around 5k destinations.

My Hotel Index:

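$params = [
    'index' => 'hotels',
    'body'  => [
        'settings' => [
            'analysis' => [
                'analyzer' => [
                    'hotel_analyzer' => [
                        'type'      => 'custom',
                        // With a repeated PHP array key the last value wins,
                        // so "standard" is the tokenizer actually sent.
                        'tokenizer' => 'standard',
                        'filter'    => ['lowercase', 'destination_ngram'],
                        // NOTE: "stopwords" is not one of the documented options of a
                        // custom analyzer; stop words normally go through a "stop" token filter.
                        'stopwords' => ['the', 'and', '&', 'hotel', 'all inclusive', 'resort'],
                    ],
                ],
                'filter' => [
                    'destination_ngram' => [
                        'type'     => 'edge_ngram',
                        'min_gram' => 2,
                        'max_gram' => 5,
                    ],
                ],
            ],
        ],
    ],
];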

Destination_bulk_query:

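// Assumes $total_destinations is count($destinations); with <= the last
// iteration would read one element past the end of the array.
for ($i = 0; $i < $total_destinations; $i++) {
    // Action line: target index and type for the next document.
    $params['body'][] = [
        'index' => [
            '_index' => 'destination_ngrams',
            '_type'  => 'city',
        ]
    ];

    // Document line: the destination itself.
    $params['body'][] = [
        'destination_name_en' => $destinations[$i]->destination_name_en,
        'destination_name_es' => $destinations[$i]->destination_name_es,
        'country_code'        => $destinations[$i]->country_code,
        'destination_id'      => $destinations[$i]->destination_id
    ];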
...

Destination Index:

$params = [
    'index' => 'destination_ngrams',
    'body'  => [
        'settings' => [
            'analysis' => [
                'analyzer' => [
                    'product_analyzer' => [
                        'type'      => 'custom',
                        // With a repeated PHP array key the last value wins,
                        // so "standard" is the tokenizer actually sent.
                        'tokenizer' => 'standard',
                        'filter'    => ['lowercase', 'destination_ngram'],
                    ],
                ],
                'filter' => [
                    'destination_ngram' => [
                        'type'     => 'edge_ngram',
                        'min_gram' => 2,
                        'max_gram' => 15,
                    ],
                ],
            ],
        ],
    ],
];

Search Query:

$params = [
    'index' => ['hotels', 'destination_ngrams'],
    'type'  => ['hotel', 'city'],
    'size'  => 20,
    'body'  => [
        'query' => [
            'multi_match' => [
                'type'      => 'best_fields',
                'query'     => $text,
                'fields'    => ['destination_name_*^3', 'name'],
                'fuzziness' => 'AUTO',
            ],
        ],
    ],
];

Current results when I type Can:

        [city]Van 
	[city]Nan 
	[city]Can Tho
	[city]San Juan 
	[city]Caen 
	[city]San Francisco
	[city]FAirmont  San
	[Hotel]Fairmont san Francisco
	[Hotel]Fairmont san Francisco
	[Hotel]Francisco bay inn
	[Hotel]Freenwich Inn
	 and Many hotels...

I expect the following results:

	[city]Cancun 
	[city]Van 
	[city]Nan 
	[city]Can Tho
	[city]San Juan 
	[city]Caen 
	[city]San Francisco
	[city]FAirmont  San
	[Hotel]Fairmont san Francisco
	[Hotel]Fairmont san Francisco
	[Hotel]Francisco bay inn
	[Hotel]Freenwich Inn
	 and Many hotels... 

I have a list of popular cities [Cancun, New York, Paris], and I want these cities to appear first in my searches.
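
One possible way to carry such a list into the index is to store a numeric value on each document while building the bulk body. The sketch below is only an illustration under assumptions: the helper name buildDestinationBulkBody, the `popularity` field, the 0/1 values and the sample rows are all hypothetical and not part of the code above.

```php
<?php
// Hypothetical helper: builds the body of a bulk request and tags each
// destination with a popularity value (1 = on the popular list, 0 = not).
function buildDestinationBulkBody(array $destinations, array $popularNames): array
{
    // Lowercase the popular names once so the lookup is case-insensitive.
    $popular = array_flip(array_map('strtolower', $popularNames));
    $body = [];
    foreach ($destinations as $destination) {
        // Action line, same index and type as the bulk code above.
        $body[] = ['index' => ['_index' => 'destination_ngrams', '_type' => 'city']];
        // Document line, with the extra (assumed) popularity field.
        $body[] = [
            'destination_name_en' => $destination->destination_name_en,
            'destination_name_es' => $destination->destination_name_es,
            'country_code'        => $destination->country_code,
            'destination_id'      => $destination->destination_id,
            'popularity'          => isset($popular[strtolower($destination->destination_name_en)]) ? 1 : 0,
        ];
    }
    return $body;
}

// Made-up sample rows, for illustration only.
$destinations = [
    (object) ['destination_name_en' => 'Cancun', 'destination_name_es' => 'Cancún', 'country_code' => 'MX', 'destination_id' => 1],
    (object) ['destination_name_en' => 'Caen', 'destination_name_es' => 'Caen', 'country_code' => 'FR', 'destination_id' => 2],
];
$params = ['body' => buildDestinationBulkBody($destinations, ['Cancun', 'New York', 'Paris'])];
```

A query can then sort on that field or feed it into the score; which of the two fits better depends on how strongly popularity should override text relevance.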

Thanks for your time.

PS: Right now I haven't defined any mapping.
PS2: I added the type (city or hotel) to my current/expected results just to be clear.
PS3: If anyone thinks the hotels bulk code is necessary, I can post it.


(Byron Voorbach) #6

Hey @BlindCode

Sorry for my late reply, it's been quite busy for me the last few days.
I don't think I have time at the moment to write the full, lengthy post that I would like to write, so I'll give you some quick pointers for now:

  • You definitely need a mapping. You created an analyzer, but you never specified that either destination_name or name should use it. Check this
  • Fuzziness is actually taking care of your matching at the moment, which you probably want to turn off as soon as you've added the analyzer. Fuzziness on small ngram tokens will generate a lot of weird results.

And some questions:

  • Is there a specific reason for having 2 indices?
  • Did you add fuzziness for a reason?
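
To illustrate why fuzziness alone produces results such as Van and Nan for the input Can, here is a rough simulation in plain PHP. It is a sketch, not Elasticsearch's implementation: it assumes names are split on whitespace and lowercased, uses PHP's built-in levenshtein() (Elasticsearch additionally counts a transposition as one edit), and follows the documented AUTO thresholds (0 edits up to 2 characters, 1 edit for 3 to 5, 2 edits above that). The function names are hypothetical.

```php
<?php
// Edit distance allowed by fuzziness "AUTO" for a term of the given length.
function autoFuzziness(string $term): int
{
    $length = strlen($term);
    if ($length <= 2) {
        return 0;
    }
    return $length <= 5 ? 1 : 2;
}

// Returns the names having at least one word within the allowed edit distance.
function fuzzyMatches(string $query, array $names): array
{
    $query = strtolower($query);
    $maxEdits = autoFuzziness($query);
    $hits = [];
    foreach ($names as $name) {
        foreach (preg_split('/\s+/', strtolower(trim($name))) as $token) {
            if (levenshtein($query, $token) <= $maxEdits) {
                $hits[] = $name;
                break; // one matching word is enough for this name
            }
        }
    }
    return $hits;
}

$names = ['Cancun', 'Van', 'Nan', 'Can Tho', 'San Juan', 'Caen', 'San Francisco'];
$hits = fuzzyMatches('Can', $names);
```

With these inputs every name except Cancun is within one edit of "can", while "cancun" is three edits away, which is consistent with the result list posted earlier in the thread.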

(Rodolfo Edu Lugo Garcia) #7

Hey @byronvoorbach, thanks for your advice.

This is my first time with Elastic, and any comment or suggestion is very useful to me. Right now I don't understand mappings very well, but I'll follow your advice and read more about analyzers and mappings.

About the questions:

1. I created two indices because my plan is to have one index for destinations with 3 "types": cities, city zones and airports. The other index is just for hotel names.

Do you think it is better to create just one index, Search_Index, with many types such as city, city zone, airport and hotel? I can change it (maybe I don't understand the index concept very well).

2. About my reason for using fuzziness: I checked some tutorials about "simple autocompletes" and saw the idea of using fuzziness. Now I can see that it may not be the best option.

Any other suggestions or links with examples are welcome.
Thanks a lot.


(Byron Voorbach) #8

Hi @BlindCode

There are 2 steps required to improve on the way search works out of the box:

1: Configure an analyzer in your index settings
2: Assign your analyzer to the specific field(s) in your mapping

For example:

PUT index
{
  "settings": {
    "number_of_shards": 1, 
    "number_of_replicas": 0, 
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_custom_analyzer"
        }
      }
    }
  }
}

Try this out with your own fields + analyzers and query again. You'll see some different results :smiley:
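
As a rough picture of what an edge n-gram analyzer stores once it is assigned to a field, the plain PHP sketch below imitates lowercase + edge_ngram on a single token. It is an approximation for illustration only (ASCII input, one token, hypothetical function name); the min_gram/max_gram values 2 and 15 are taken from the destination settings posted earlier.

```php
<?php
// Rough imitation of lowercase + edge_ngram applied to one token (ASCII only).
function edgeNgrams(string $token, int $minGram, int $maxGram): array
{
    $token = strtolower($token);
    $grams = [];
    // Prefixes never get longer than the token itself or max_gram.
    $longest = min(strlen($token), $maxGram);
    for ($length = $minGram; $length <= $longest; $length++) {
        $grams[] = substr($token, 0, $length);
    }
    return $grams;
}

// Index time: the field analyzer expands "Cancun" into its prefixes.
$indexed = edgeNgrams('Cancun', 2, 15);

// Query time: the plain lowercased input already equals one stored prefix,
// so no fuzziness is needed for "Can" to find "Cancun".
$matches = in_array(strtolower('Can'), $indexed, true);
```

This is also why the query side is usually analyzed without the n-gram filter: the user's input should be compared as typed against the stored prefixes rather than being chopped into prefixes itself.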

Note that mapping types are on their way out: since ES6 an index can hold only a single type, and later major versions remove types altogether. Not sure what version you're running on, but it's good practice to act as if they don't exist anymore :wink:

Depending on your data (and how you would like it analyzed), you might be able to keep all data in 1 index and just add a separate field to each document to denote the document type.
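
A minimal sketch of that single-index idea, under stated assumptions: the index name `search_index`, the field name `doc_type`, the helper bulkLines and the sample documents are all hypothetical, and `doc_type` is assumed to be mapped as a keyword field so that a term filter matches it exactly.

```php
<?php
// Hypothetical helper: one bulk action line plus one document line, with
// the kind of document stored in an ordinary field instead of a mapping type.
function bulkLines(string $docType, array $source): array
{
    return [
        ['index' => ['_index' => 'search_index', '_type' => '_doc']],
        array_merge($source, ['doc_type' => $docType]),
    ];
}

$params = ['body' => array_merge(
    bulkLines('city', ['name' => 'Cancun']),
    bulkLines('hotel', ['name' => 'Fairmont San Francisco'])
)];

// Restricting a search to one kind of document is then a filter on that field.
$search = [
    'index' => 'search_index',
    'body'  => [
        'query' => [
            'bool' => [
                'must'   => ['match' => ['name' => 'can']],
                'filter' => ['term' => ['doc_type' => 'city']],
            ],
        ],
    ],
];
```

Whether one index is preferable still depends on the data: if hotels and destinations need different analyzers or very different fields, separate indices can remain the simpler option.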

Fuzzy is in most cases too fuzzy. I don't have a direct link to any resources, but there are multiple (maybe better) ways to create a good autocomplete.

Hope this helps!


(Rodolfo Edu Lugo Garcia) #9

Hello @byronvoorbach, sorry for my late response, but I needed to try many things and read more about Elasticsearch before replying to this message.

I followed many of your recommendations and now I have an autocomplete with better results.

These are my mappings:

Hotel Mapping:

$params = [
	    'index' => 'hotels',
	    'body' => [
	        'settings' => [ 
	            'analysis' => [ 	
	                'filter' =>  [
		                'ngram_filter' => [
		                    'type' => 'edge_ngram',
		                    'min_gram' => 2,
                    		'max_gram' => 20,
		                ]
		            ],
		            'analyzer' => [
	                    'ngram_analyzer' => [
	                        'type'      => 'custom',
	                        'tokenizer' => 'standard',
	                        'filter'    => ['lowercase', 'ngram_filter'],
	                        // NOTE: "stopwords" is not one of the documented options of a
	                        // custom analyzer; stop words normally go through a "stop" token filter.
	                        'stopwords' => ['the', "and", "&", "hotel", "all inclusive", "resort"]
	                    ]

                	]
	            ],   
	        ],
	        'mappings' =>[
            	'doc' => [
			        "properties"=> [
			            "name"=> [
			               "type"=> "text",
			               "term_vector"=> "yes",
			               "analyzer"=> "ngram_analyzer",
			               "search_analyzer"=> "standard",
			            ],
			            "popularity"=> [
			               "type"=> "integer",
			            ]
			            
			        ]
            	]
	        ] 
		]
	];

Destination_mapping:

$params = [
	    'index' => 'destinations',
	    'body' => [
	        'settings' => [ 
	            'analysis' => [ 	
	                'filter' =>  [
		                'ngram_filter' => [
		                    'type' => 'edge_ngram',
		                    'min_gram' => 2,
                    		'max_gram' => 20,
		                ]
		            ],
		            'analyzer' => [
	                    'ngram_analyzer' => [
	                        'type'      => 'custom',
	                        "tokenizer" => "standard",
	                        'filter'    => ['lowercase', 'ngram_filter'],
	                    ]

                	]
	            ],   
	        ],
	        'mappings' =>[
            	'doc' => [
			        "properties"=> [
			            "destination_name_en"=> [
			               "type"=> "text",
			               "term_vector"=> "yes",
			               "analyzer"=> "ngram_analyzer",
			               "search_analyzer"=> "standard",
			            ],
			            "destination_name_es"=> [
			               "type"=> "text",
			               "term_vector"=> "yes",
			               "analyzer"=> "ngram_analyzer",
			               "search_analyzer"=> "standard",
			            ],
			            "destination_name_pt"=> [
			               "type"=> "text",
			               "term_vector"=> "yes",
			               "analyzer"=> "ngram_analyzer",
			               "search_analyzer"=> "standard",
			            ],
			            "popularity"=> [
			               "type"=> "integer",
			            ]
			        ]
            	]
	        ] 
		]
	];

And this is my search_query:

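$params = array(
    'index' => ['destinations', 'hotels'],
    'size'  => 20,
    'body'  => array(
        // Popularity decides the order first; _score only breaks ties.
        'sort' => array(
            array('popularity' => array('order' => 'desc')),
            '_score'
        ),
        'query' => array(
            'bool' => array(
                'should' => array(
                    // Typo-tolerant match across all name fields.
                    array(
                        'multi_match' => array(
                            'type'      => 'best_fields',
                            'query'     => $text,
                            'fields'    => ['destination_name_*^3', 'name'],
                            'fuzziness' => 1
                        )
                    ),
                    // Extra boost for non-fuzzy destination matches. A term query
                    // takes one concrete field name, so the wildcard pattern
                    // "destination_name_*" is expressed with multi_match instead.
                    array(
                        'multi_match' => array(
                            'query'  => $text,
                            'fields' => ['destination_name_*'],
                            'boost'  => 10
                        )
                    ),
                ),
            )
        )
    )
);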

The idea behind my autocomplete is the following:
1. Show the hotels/destinations with the highest popularity value first.
2. Then show the exact text matches (destinations first, then hotels).
3. Allow users to make some mistakes, for example: Cancn, vancun, tancun => Cancun (this is why I use fuzziness).
4. The search can be made in many languages, which is why I use "destination_name_*".

I read about "type" being obsolete in ES 6.x and decided not to use types; however, I kept the 2 indices because I think they are very different objects.

Maybe I'm not using the best scheme to get the results I want. If you have any comments or suggestions in general, I'll be happy to hear them.

Thanks for your time.
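
One alternative that may be worth trying, given the goals listed in this post: instead of sorting on popularity before _score, fold popularity into the score with a function_score query, so that a highly popular but weak text match does not automatically outrank a strong match. This is only a sketch under assumptions, not a recommendation from the thread: the helper name buildAutocompleteQuery is hypothetical, and the `log1p` modifier and `sum` boost mode are arbitrary starting points that would need tuning against real data.

```php
<?php
// Hypothetical helper: same indices and fields as the search query above,
// but popularity is added to the text score instead of being sorted on first.
function buildAutocompleteQuery(string $text): array
{
    return [
        'index' => ['destinations', 'hotels'],
        'size'  => 20,
        'body'  => [
            'query' => [
                'function_score' => [
                    // The text match decides which documents are candidates.
                    'query' => [
                        'multi_match' => [
                            'type'      => 'best_fields',
                            'query'     => $text,
                            'fields'    => ['destination_name_*^3', 'name'],
                            'fuzziness' => 1,
                        ],
                    ],
                    // log1p dampens large popularity values; documents without
                    // the field count as popularity 0.
                    'field_value_factor' => [
                        'field'    => 'popularity',
                        'modifier' => 'log1p',
                        'missing'  => 0,
                    ],
                    // Add the popularity value to the text score.
                    'boost_mode' => 'sum',
                ],
            ],
        ],
    ];
}

$params = buildAutocompleteQuery('Canc');
```

With this shape the explicit sort clause is dropped, because the default ordering by _score already reflects both the text match and the popularity value.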