Ngram not working for multivalued field

Hi,

A couple of questions:

  1. We are trying to create an index with an analyzed multi-valued field
    (the filter used is n-gram), but we are not able to query partial values.
    With a single-valued field and the same filter, everything works as
    expected, i.e. we are able to retrieve partial-match results as well.

Create index:

curl -X PUT "http://localhost:9200/xxxx-test" -d '{
  "mappings" : {
    "test" : {
      "properties" : {
        "lists" : {
          "properties" : {
            "url_domain" : {
              "type" : "string",
              "search_analyzer" : "str_search_analyzer",
              "index_analyzer" : "str_index_analyzer"
            }
          }
        }
      }
    }
  },
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "str_search_analyzer" : {
          "tokenizer" : "keyword",
          "filter" : ["lowercase"]
        },
        "str_index_analyzer" : {
          "tokenizer" : "keyword",
          "filter" : ["lowercase", "substring"]
        }
      },
      "filter" : {
        "substring" : {
          "type" : "nGram",
          "min_gram" : 2,
          "max_gram" : 5
        }
      }
    }
  }
}'

Sample values inserted:

curl -X POST "http://localhost:9200/xxx-test/test" -d '{ "url_domain" : "slkd" }'
curl -X POST "http://localhost:9200/xxx-test/test" -d '{ "url_domain" : ["a1b2c","c1de"] }'

Search query that returned results as expected (this is an entire-string
match):

curl "http://localhost:9200/xxx-test/_search" -d '{ "query": { "match": {"url_domain": "a1b2c"} } }'

Search query that did not return any results (this is a partial match):

curl "http://localhost:9200/xxx-test/_search" -d '{ "query": { "match": {"url_domain": "1b2"} } }'

As the field is n-gram analyzed, we expect a result for this query. Please
let us know if our understanding is wrong.

  2. We have a query built from a collection of dynamic terms, e.g. title:test
    AND title:west AND desc:world AND desc:hello. Our objective is to drop
    terms from the query whose document frequency within the specific field
    is > 10, i.e. if title:west has a df of 11 and desc:world has a df of 20,
    Elasticsearch should internally change the query to title:test AND
    desc:hello. Let us know if this can be done in an effective way, as our
    search queries are very high in number!

  3. We are using n-grams for prefix, suffix and fuzzy queries. Are there any
    effective ways to store the index for these?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/034fca16-9fb0-4830-8fec-9184a42ba866%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Has anyone faced the same issue?

On Wednesday, 18 February 2015 01:39:26 UTC+5:30, sri krishna wrote:

> We are trying to create an index having an analyzed multi value field
> (filter used is n-gram). But we are not able to query the partial values.
> [...]

Hi,

Check your mapping. url_domain is defined inside the object field lists, while your documents and queries use a plain top-level url_domain. So the standard analyzer is used for that field.

Masaru
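To make the point above concrete, here is a minimal Python sketch (not Elasticsearch code) that lists the full dotted path of each field declared in the "properties" section of the mapping from the first post. The helper name field_paths is hypothetical. It only illustrates that the custom analyzers are attached to lists.url_domain, so a top-level url_domain in a document is a different field.

```python
def field_paths(properties, prefix=""):
    """Return the full dotted path of every leaf field in a mapping's "properties"."""
    paths = []
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: its children are addressed as "<object>.<child>".
            paths.extend(field_paths(spec["properties"], path + "."))
        else:
            paths.append(path)
    return paths


# "properties" section of the mapping from the first post.
mapping_properties = {
    "lists": {
        "properties": {
            "url_domain": {
                "type": "string",
                "search_analyzer": "str_search_analyzer",
                "index_analyzer": "str_index_analyzer",
            }
        }
    }
}

mapped = field_paths(mapping_properties)
print(mapped)                  # ['lists.url_domain']
print("url_domain" in mapped)  # False
```

Under this reading, either the documents and queries would use lists.url_domain, or the mapping would declare url_domain at the top level.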

On February 18, 2015 at 12:30:23, sri krishna (krishnainet@gmail.com) wrote:

> Has anyone faced the same issue?

We added lists after seeing that n-gram was not working for the multi-valued field. With or without it, it was not working as expected.

  "mappings" : {
    "test" : {
      "properties" : {
        "url_domain" : {
          "type" : "string",
          "search_analyzer" : "str_search_analyzer",
          "index_analyzer" : "str_index_analyzer"
        }
      }
    }
  },
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "str_search_analyzer" : {
          "tokenizer" : "keyword",
          "filter" : ["lowercase"]
        },
        "str_index_analyzer" : {
          "tokenizer" : "keyword",
          "filter" : ["lowercase", "substring"]
        }
      },
      "filter" : {
        "substring" : {
          "type" : "nGram",
          "min_gram" : 2,
          "max_gram" : 5
        }
      }
    }
  }
}'

On Wednesday, 18 February 2015 10:45:01 UTC+5:30, Masaru Hasegawa wrote:

> Check your mapping. [...]

I forgot to point this out: in the first email, you created the xxxx-test index but indexed and queried against the xxx-test index.
Make sure you use the correct index name.

N-gram works regardless of whether the field is single-valued or multi-valued.
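As a rough illustration of why the number of values should not matter, the Python sketch below models the index-time chain from the first post (keyword tokenizer, lowercase, n-grams with min_gram 2 and max_gram 5). It is a simplified model, not Elasticsearch's implementation: the function names are hypothetical, and token order and positions are ignored by collecting the grams in a set.

```python
def ngrams(value, min_gram=2, max_gram=5):
    """All substrings of `value` whose length is between min_gram and max_gram."""
    value = value.lower()  # mirrors the "lowercase" filter
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(value) - size + 1):
            grams.add(value[start:start + size])
    return grams


def index_terms(field_values):
    """Union of the n-grams of every value of a (possibly multi-valued) field."""
    if isinstance(field_values, str):
        field_values = [field_values]
    terms = set()
    for value in field_values:
        # The keyword tokenizer keeps each value as a single token before n-gramming.
        terms |= ngrams(value)
    return terms


terms = index_terms(["a1b2c", "c1de"])
print("1b2" in terms)    # True
print("a1b2c" in terms)  # True
print("b2cc1" in terms)  # False
```

Each value is analyzed on its own, so "1b2" is among the indexed terms of ["a1b2c", "c1de"], while no gram spans two values. The search analyzer in the thread (keyword plus lowercase) leaves the query "1b2" as a single term, which would match once the field is really analyzed with str_index_analyzer.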

On February 18, 2015 at 14:38:54, sri krishna (krishnainet@gmail.com) wrote:

> We have added lists [...]

We have live queries made up of terms, e.g. title:test AND title:west AND
desc:world AND desc:hello. Our objective is to drop terms from the query
whose document frequency within the specific field is > 10, i.e. if
title:west has a df of 11 and desc:world has a df of 20, Elasticsearch
should internally change the query to title:test AND desc:hello. Let us
know if this can be done in an effective way, as our search queries are
very high in number!
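The thread does not say whether Elasticsearch can do this internally. The Python sketch below only shows the intended rewrite on the client side, assuming the per-field document frequencies are already available in a lookup table. The function names and the doc_freq table are hypothetical, and only the values 11 and 20 come from the post.

```python
def drop_frequent_terms(clauses, doc_freq, max_df=10):
    """Keep only the (field, term) clauses whose document frequency is <= max_df."""
    return [
        (field, term)
        for field, term in clauses
        # A term missing from the lookup is treated as df 0 and kept.
        if doc_freq.get((field, term), 0) <= max_df
    ]


def to_query_string(clauses):
    """Join the remaining clauses with AND, in the field:term syntax of the post."""
    return " AND ".join(f"{field}:{term}" for field, term in clauses)


clauses = [("title", "test"), ("title", "west"), ("desc", "world"), ("desc", "hello")]

# Hypothetical per-field document frequencies; only the 11 and 20 come from the post.
doc_freq = {
    ("title", "test"): 3,
    ("title", "west"): 11,
    ("desc", "world"): 20,
    ("desc", "hello"): 7,
}

print(to_query_string(drop_frequent_terms(clauses, doc_freq)))
# title:test AND desc:hello
```

Dropping title:west (df 11) and desc:world (df 20) leaves title:test AND desc:hello. How to obtain the frequencies cheaply at a high query rate is the open part of the question.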

On Wed, Feb 18, 2015 at 1:39 AM, sri krishna krishnainet@gmail.com wrote:

> We are trying to create an index having an analyzed multi value field
> (filter used is n-gram). But we are not able to query the partial values.
> [...]