Ngram not working for multivalued field

krishna · February 17, 2015, 8:09pm

Hi,

Couple of questions:

We are trying to create an index with an analyzed multi-valued field
(using an n-gram filter), but we are not able to query on partial values.
With a single-valued field and the same filter, everything works as
expected, i.e. we can retrieve results for partial queries as well.

Create index:
curl -X PUT "http://localhost:9200/xxxx-test" -d '{
  "mappings" : {
    "test" : {
      "properties" : {
        "lists" : {
          "properties" : {
            "url_domain" : {
              "type" : "string",
              "search_analyzer" : "str_search_analyzer",
              "index_analyzer" : "str_index_analyzer"
            }
          }
        }
      }
    }
  },
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "str_search_analyzer" : {
          "tokenizer" : "keyword",
          "filter" : ["lowercase"]
        },
        "str_index_analyzer" : {
          "tokenizer" : "keyword",
          "filter" : ["lowercase", "substring"]
        }
      },
      "filter" : {
        "substring" : {
          "type" : "nGram",
          "min_gram" : 2,
          "max_gram" : 5
        }
      }
    }
  }
}';

Sample values inserted:

curl -X POST "http://localhost:9200/xxx-test/test" -d '{ "url_domain" : "slkd" }'
curl -X POST "http://localhost:9200/xxx-test/test" -d '{ "url_domain" : ["a1b2c","c1de"] }'

Search query that returned results as expected (a full-string match):

curl "http://localhost:9200/xxx-test/_search" -d '{ "query": { "match": {"url_domain": "a1b2c"} } }'

Search query that returned no results (a partial match):

curl "http://localhost:9200/xxx-test/_search" -d '{ "query": { "match": {"url_domain": "1b2"} } }'

As the field is n-gram analyzed, we expected a result for this query.
Please let us know if our understanding is wrong.

We have queries built from a collection of dynamic terms, e.g. title:test AND
title:west AND desc:world AND desc:hello. Our objective is to drop terms
whose document frequency (df) within the specific field is greater than 10.
I.e., if title:west has a df of 11 and desc:world has a df of 20,
Elasticsearch should internally change the query to title:test AND
desc:hello. Please let us know if this can be done efficiently, as our
query volume is very high.

We are using n-grams for prefix, suffix, and fuzzy queries. Are there any
more efficient ways to store the index for these?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/034fca16-9fb0-4830-8fec-9184a42ba866%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

krishna · February 18, 2015, 3:30am

Has anyone faced the same issue?


masaru · February 18, 2015, 5:13am

Hi,

Check your mapping. url_domain is defined inside the object lists, while your documents and queries use a plain top-level url_domain. So the standard analyzer is used for that field.

Masaru
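
To make the path mismatch concrete, here is a minimal Python sketch (not Elasticsearch code) that flattens the posted mapping into dotted field paths. The helper name mapped_paths is hypothetical, and the fallback label simply restates the point above that a field without an explicit mapping gets the standard analyzer.

```python
def mapped_paths(properties, prefix=""):
    """Flatten a mapping's nested "properties" into dotted field paths."""
    paths = {}
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: recurse and extend the dotted path.
            paths.update(mapped_paths(spec["properties"], path + "."))
        else:
            paths[path] = spec
    return paths


# The "properties" section of the mapping from the first post.
mapping = {
    "lists": {
        "properties": {
            "url_domain": {
                "type": "string",
                "search_analyzer": "str_search_analyzer",
                "index_analyzer": "str_index_analyzer",
            }
        }
    }
}

paths = mapped_paths(mapping)
for field in ("url_domain", "lists.url_domain"):
    spec = paths.get(field)
    # Assumption: a field with no explicit mapping falls back to the default analyzer.
    analyzer = spec["index_analyzer"] if spec else "default (standard)"
    print(field, "->", analyzer)
# url_domain -> default (standard)
# lists.url_domain -> str_index_analyzer
```

Under this reading, either the mapping should declare url_domain at the top level, or the documents and queries should use lists.url_domain.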


krishna · February 18, 2015, 5:38am

We added lists after seeing that n-grams were not working for the multi-valued field. With or without it, the partial match was not working as expected.

"mappings" : {
  "test" : {
    "properties" : {
      "url_domain" : {
        "type" : "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer"
      }
    }
  }
},
"settings" : {
  "analysis" : {
    "analyzer" : {
      "str_search_analyzer" : {
        "tokenizer" : "keyword",
        "filter" : ["lowercase"]
      },
      "str_index_analyzer" : {
        "tokenizer" : "keyword",
        "filter" : ["lowercase", "substring"]
      }
    },
    "filter" : {
      "substring" : {
        "type" : "nGram",
        "min_gram" : 2,
        "max_gram" : 5
      }
    }
  }
}
}';


masaru · February 18, 2015, 6:02am

I forgot to point this out: in the first email, you created the xxxx-test index but indexed and queried against the xxx-test index.
Make sure you use the correct index name.

Ngram works regardless of whether the field is single-valued or multi-valued.
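
As a rough illustration of why the number of values does not matter, the following Python sketch approximates the two analyzers from the posted settings (keyword tokenizer, lowercase, and an n-gram filter with min_gram 2 and max_gram 5). It is a simplification that ignores token positions and offsets, and the function names are made up for this example.

```python
def ngrams(token, min_gram=2, max_gram=5):
    """All substrings of token with length between min_gram and max_gram."""
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }


def index_terms(values):
    """Approximate str_index_analyzer: keyword tokenizer, lowercase, n-grams."""
    if isinstance(values, str):
        values = [values]  # a single value is treated as a one-element list
    terms = set()
    for value in values:
        # Each value is analyzed on its own; the terms end up in the same field.
        terms |= ngrams(value.lower())
    return terms


def search_term(text):
    """Approximate str_search_analyzer: keyword tokenizer plus lowercase."""
    return text.lower()


single = index_terms("slkd")
multi = index_terms(["a1b2c", "c1de"])

print(search_term("lk") in single)   # True
print(search_term("1b2") in multi)   # True
print(search_term("c1d") in multi)   # True
```

Each value of a multi-valued field contributes its own n-grams to the same field, so a partial query such as 1b2 should find a matching term once the analyzer is actually applied to the field being indexed and queried.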


krishna · February 18, 2015, 9:00am

We build queries from live terms, e.g. title:test AND
title:west AND desc:world AND desc:hello. Our objective is to drop terms
whose document frequency (df) within the specific field is greater than 10.
I.e., if title:west has a df of 11 and desc:world has a df of 20,
Elasticsearch should internally change the query to title:test AND
desc:hello. Please let us know if this can be done efficiently, as the
number of search queries is very high.
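
For clarity, the pruning we have in mind can be sketched in Python as below. This only illustrates the desired behaviour on the client side; it is not an Elasticsearch feature. The function name and the df values for title:test and desc:hello are made up, and how the frequencies would be fetched is left out.

```python
def prune_frequent_terms(terms, doc_freq, max_df=10):
    """Keep only (field, term) pairs whose document frequency is <= max_df."""
    return [
        (field, term)
        for field, term in terms
        if doc_freq.get((field, term), 0) <= max_df
    ]


terms = [
    ("title", "test"),
    ("title", "west"),
    ("desc", "world"),
    ("desc", "hello"),
]

# Per-field document frequencies. The values 11 and 20 come from the example
# above; 3 and 7 are invented purely for illustration.
doc_freq = {
    ("title", "test"): 3,
    ("title", "west"): 11,
    ("desc", "world"): 20,
    ("desc", "hello"): 7,
}

kept = prune_frequent_terms(terms, doc_freq)
query = " AND ".join(f"{field}:{term}" for field, term in kept)
print(query)  # title:test AND desc:hello
```

With a threshold of 10, title:west (df 11) and desc:world (df 20) are dropped, leaving title:test AND desc:hello.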

