Need mapping (or query) tips


(Loïc Wenkin) #1

Hello everybody,

I am working on a multilingual project and I need some tips about mapping
(or queries). I have two requirements:

  • I would like to benefit from full-text search and from language-specific
    analyzers on one field of my documents. I need that for some of my
    searches: users can run a search like (Hello toto) that returns
    documents containing (Hello, my name is toto) (1).
  • On the other hand, users can also run a search like ("Hello toto"),
    which must not return documents containing ("Hello, my name is toto")
    but only documents containing ("Hello toto"), a bit like Google does
    when we use quotes (2).

Currently, my mapping looks something like this:

{
  "myfieldinfrench": {
    ...
    "analyzer": "french"
  }
}

This allows me to easily meet my first need, but not my second one.

I was thinking of indexing the field twice (using multi-field types), but is
that a good idea? Won't it increase my index size?

Is there a way other than multi-fields to meet both needs?

Thanks a lot for your replies.

Regards,
Loïc Wenkin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/70b46c78-5e03-4ff9-8cd6-ba2217642fa2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Alexander Reelsen) #2

Hey,

you might want to use the phrase match query and check out its slop
parameter for your second requirement. See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_phrase

--Alex
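
For reference, a phrase query along these lines might look like the request
body below. This is only a sketch: the field name myfieldinfrench comes from
the mapping in the first post, the query text is the example from the
question, and the index and endpoint it would be sent to are left out. A slop
of 0 (the default) asks for the terms at adjacent positions and in the same
order. Note that the query text still goes through the field's analyzer, so
matching is done on analyzed terms rather than on the raw characters.

```json
{
  "query": {
    "match_phrase": {
      "myfieldinfrench": {
        "query": "Hello toto",
        "slop": 0
      }
    }
  }
}
```

Unquoted searches (need 1) could keep using a plain match query on the same
field, so a second copy of the field should not be required for this.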



(Loïc Wenkin) #3

Hi Alexander,

Thanks a lot for your reply! Could you explain a bit what this "slop"
parameter is?

Loïc



(Loïc Wenkin) #4

To answer my own question: slop is the maximum number of times a term may be
moved (a "permutation") to make the query and a document match.
(See http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/slop.html)
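
As an illustration of slop, the sketch below reuses the example from the first
post. It assumes the analyzer keeps all five words of "Hello, my name is toto"
as tokens at consecutive positions (an assumption; a different analyzer or
stop-word list would change the count). In that case "toto" sits three
positions further from "Hello" than it does in the query, so a slop of at
least 3 should be needed for that document to match, while a slop of 0 keeps
the strict quoted behaviour.

```json
{
  "query": {
    "match_phrase": {
      "myfieldinfrench": {
        "query": "Hello toto",
        "slop": 3
      }
    }
  }
}
```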


