I can't find anything after hyphens or underscores

bnf_lsn · November 12, 2014, 1:15pm

Hi, I'm very new to Elasticsearch.
I'm trying to index a set of biological data. Some fields, like
'gene_id' or 'gene_shortname', should be processed as literal strings.
When I search for 'ZNF6092' in a field containing 'linc-ZNF6092-6',
I can't find anything. When I search for 'linc', however, I find the
correct document.
This looks like a problem with the ES analyzer. I tried to configure the
fields so that they are not analyzed, but nothing seems to change.
I tried with:

curl -XPOST 'localhost:9200/a3' -d @tracking_map.json

where tracking_map.json is

{
  "mappings": {
    "tracking": {
      "properties": {
        "tracking_id": {
          "type": "string",
          "index": "not_analyzed"
        },
        "nearest_ref_id": {
          "type": "string",
          "index": "not_analyzed"
        },
        "gene_id": {
          "type": "string",
          "index": "not_analyzed"
        },
        "gene_short_name": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

Then I re-indexed all the documents. Where did I go wrong?
Thanks in advance,

Alessandro

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ce070db4-dee9-42e2-9f5a-ee8aa645e2f5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

nik9000 · November 12, 2014, 2:25pm

It's an analyzer problem, certainly. You've turned off analysis with
"index":"not_analyzed". What you probably want is for gene_short_name
to be analyzed so that dashes are treated as word separators. If you do
that, you can find linc-ZNF6092-6 by performing a simple_query_string (or
match) search for ZNF6092, ZNF6092 6, 6, or linc. Have a look at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-tokenizer.html
and go from there. You may also want to use a lowercase filter so you can
search for znf6092 and still find it.

This is a good read on how to change the mapping as well:
http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/
Even if you don't need all the information in there, it is nice to know.

Nik
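
As a rough sketch of the suggestion above (not something posted in the thread): the index could be created with a custom analyzer built from a pattern tokenizer that splits on non-word characters and underscores, followed by a lowercase filter. The names split_on_separators and gene_name are made up for illustration, and approximate_tokens is only a local stand-in for what such an analyzer would plausibly emit; it does not call Elasticsearch.

```ruby
require 'json'

# Index body in the Elasticsearch 1.x format used in this thread.
# Tokenizer and analyzer names are hypothetical.
def gene_index_body
  {
    settings: {
      analysis: {
        tokenizer: {
          split_on_separators: { type: 'pattern', pattern: '[\W_]+' }
        },
        analyzer: {
          gene_name: {
            type: 'custom',
            tokenizer: 'split_on_separators',
            filter: ['lowercase']
          }
        }
      }
    },
    mappings: {
      tracking: {
        properties: {
          gene_short_name: { type: 'string', analyzer: 'gene_name' }
        }
      }
    }
  }
end

# Local approximation of the analyzer's output, for illustration only.
def approximate_tokens(text)
  text.split(/[\W_]+/).reject(&:empty?).map(&:downcase)
end

puts JSON.pretty_generate(gene_index_body)
p approximate_tokens('linc-ZNF6092-6')  # => ["linc", "znf6092", "6"]
```

The printed JSON could be saved as tracking_map.json and sent with the same curl command as in the first post; documents indexed earlier would have to be re-indexed into the new index.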

bnf_lsn · November 12, 2014, 4:13pm

Thank you very much for your answer.

What I want is for ES to store the fields as literals, so that I can find
ZNF6092 with a wildcard search (*ZNF6092*, for example).

I tried setting "pattern" to "*" for testing (* doesn't occur in
gene_shortname, so I assumed the entire string would be stored), but I
still find nothing.

nik9000 · November 12, 2014, 4:20pm

You'd have to post your queries for me to help more, but in general it is
better to analyze the content up front and perform basic match queries
without wildcards than it is to search with wildcards. Wildcards are way,
way, way slower.

Nik
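
To make the "analyze up front, then use match" advice concrete, here is a minimal sketch. It assumes gene_short_name is indexed with an analyzer that splits on hyphens, underscores, and other non-word characters and then lowercases (an assumption, not the mapping from the first post); analyze is a hypothetical local stand-in for that analyzer. A match query analyzes the query text as well and, by default, matches documents that share at least one resulting term.

```ruby
# Hypothetical stand-in for an analyzer that splits on separators and lowercases.
def analyze(text)
  text.split(/[\W_]+/).reject(&:empty?).map(&:downcase)
end

# Body for a match query on one field; the query text is analyzed before matching.
def match_body(field, text)
  { query: { match: { field => text } } }
end

# Local approximation of the default (OR) match semantics.
def match_hit?(indexed_value, query_text)
  (analyze(indexed_value) & analyze(query_text)).any?
end

p match_body(:gene_short_name, 'ZNF6092')
p match_hit?('linc-ZNF6092-6', 'ZNF6092')  # => true
p match_hit?('linc-ZNF6092-6', 'znf6092')  # => true
p match_hit?('linc-ZNF6092-6', 'ZNF609')   # => false (no partial-term matching)

# With the Ruby client, the body could be passed along the lines of:
# @client.search index: @index, body: match_body(:gene_short_name, 'ZNF6092')
```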

bnf_lsn · November 12, 2014, 4:43pm

This is my query (in Ruby):

@client.search index: @index, body: {query: {wildcard: {_all: query_text}}}

The variable names should be self-explanatory.

I have read that wildcards are slower. If you have a cleaner solution, it
would be very welcome (I still need to be able to search for "linc-ZNF6092"
in addition to "ZNF6092").
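
One possible explanation for the behavior described so far, offered as an assumption rather than a confirmed diagnosis: the query targets _all, which is analyzed with the default analyzer regardless of the not_analyzed setting on individual fields, so its terms are lowercased and split at the hyphens, while the text of a wildcard query is not analyzed. Under that assumption an uppercase pattern such as ZNF6092 cannot match the lowercase term znf6092, whereas linc matches. The sketch below only simulates this locally; the term lists are assumed, and File.fnmatch stands in for wildcard matching.

```ruby
# Assumption: terms the default analyzer would produce in _all for
# the value 'linc-ZNF6092-6' (lowercased, split at the hyphens).
def all_field_terms
  %w[linc znf6092 6]
end

# A not_analyzed field keeps the whole value as a single term.
def not_analyzed_terms
  ['linc-ZNF6092-6']
end

# Wildcard patterns are compared with each indexed term as-is;
# File.fnmatch is used here as a case-sensitive approximation.
def wildcard_hit?(pattern, terms)
  terms.any? { |term| File.fnmatch(pattern, term) }
end

# Body for a wildcard query on one specific field instead of _all.
def wildcard_body(field, pattern)
  { query: { wildcard: { field => pattern } } }
end

p wildcard_hit?('linc', all_field_terms)              # => true
p wildcard_hit?('ZNF6092', all_field_terms)           # => false
p wildcard_hit?('*ZNF6092*', not_analyzed_terms)      # => true
p wildcard_hit?('linc-ZNF6092*', not_analyzed_terms)  # => true
p wildcard_body(:gene_short_name, '*ZNF6092*')
```

If this is what is happening, pointing the wildcard at gene_short_name rather than _all, with * around the search text, would cover both "ZNF6092" and "linc-ZNF6092", at the speed cost of wildcard queries mentioned earlier in the thread.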

bnf_lsn · December 2, 2014, 8:21am

I have made many attempts, but the problem persists. Could it be caused by a setting other than the analyzer?

bnf_lsn · January 21, 2015, 10:43am

What I really need is a step-by-step method for disabling the analyzer,
or setting it to 'keyword', on all fields of an index. I have made many
attempts, but none of them seems to work.

This is causing me a lot of trouble. I need ES not to tokenize my
literal strings; why isn't there a clear way to switch tokenization off?

Thanks, everyone.

bnf_lsn · January 26, 2015, 3:37pm

OK, after many attempts I have finally managed to set the default
analyzer to 'keyword'. I do this with:

@es_client.indices.create index: "test", body: { "index" => { "analysis" => { "analyzer" => { "default" => { "type" => "keyword" }}}}}

This solves some of my problems: I can finally do exact matching with a
'term' query, for example on a path such as '/home/data/foo.bar' or on
a gene ID such as 'ENSG00000186092'.
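
A toy model may help show why exact matching now works. The sketch below is plain Ruby, not Elasticsearch code: `build_index`, `term_query` and the two lambdas are hypothetical names, and splitting on `[-_]` is only an assumption about how a separator-based tokenizer would treat these values. It illustrates that a lookup on unanalyzed query text only hits when the stored token is the whole value.

```ruby
# Toy inverted index; this only imitates the idea, it is not Elasticsearch.

# 'keyword'-style analysis: the whole value becomes a single token.
KEYWORD = ->(value) { [value] }

# Separator-style analysis (assumption): split on hyphens and underscores.
SPLIT = ->(value) { value.split(/[-_]/) }

# Map every token produced by the analyzer to the ids of the documents
# that contain it.
def build_index(docs, analyzer)
  index = Hash.new { |hash, token| hash[token] = [] }
  docs.each do |id, value|
    analyzer.call(value).each { |token| index[token] << id }
  end
  index
end

# A term-style lookup uses the query text exactly as given.
def term_query(index, text)
  index.fetch(text, [])
end

DOCS = { 1 => "linc-ZNF6092-6", 2 => "ENSG00000186092" }

keyword_index = build_index(DOCS, KEYWORD)
split_index   = build_index(DOCS, SPLIT)
```

In this model `term_query(keyword_index, "ZNF6092")` finds nothing, because the only stored token for document 1 is the full string, while the same lookup against `split_index` finds document 1.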


The bad news is that my problems with 'query_string' have got even
worse. It seems that query_string does not work on fields that are not
analyzed.


If I try a trivial query:

@es_client.search index: "test", body: {"query" => { "query_string" => { "query" => "ENSG00000186092" }}}

it finds nothing (0 results). The text has no spaces or other special
characters that could cause problems with tokenization. So what is the
problem?


Could a solution be a 'fake' pattern tokenizer with the pattern "$^"
(which should never match, so the result would be like that of the
'keyword' analyzer)?
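
The idea behind the "$^" pattern can be checked outside Elasticsearch. The following is only a plain-Ruby approximation: `pattern_tokens` is a hypothetical helper, and Elasticsearch evaluates patterns with Java's regex engine, whose anchor rules differ slightly from Ruby's. It shows that a separator pattern which never matches leaves the value as one piece.

```ruby
# Plain-Ruby approximation of splitting a value on a separator pattern.
# Not Elasticsearch code; Java's regex anchors differ slightly from Ruby's.

# "$^" asks for end-of-line immediately followed by start-of-line,
# which cannot occur inside a non-empty single-line value.
NEVER_MATCHES = /$^/

# Split a value wherever the separator pattern matches (hypothetical helper).
def pattern_tokens(value, separator)
  value.split(separator)
end

whole = pattern_tokens("linc-ZNF6092-6", NEVER_MATCHES)
parts = pattern_tokens("linc-ZNF6092-6", /-/)
```

Here `whole` holds the untouched string as its only element, while `parts` holds the three hyphen-separated pieces. If this carries over to the pattern tokenizer, the built-in 'keyword' tokenizer is probably the more direct way to get the same effect.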


Any other ideas would be much appreciated.

bnf_lsn · January 28, 2015, 9:58am

The search problems are probably due to query_string automatically lowercasing all words, a behavior caused by a 'lowercase' filter that is inserted automatically.

I can't find any examples on the web of setting analyzers, tokenizers
or filters through the Ruby API. The only thing that seems to work well
is the method from my previous post for setting the default analyzer
when a new index is created. Any suggestions?

I need a valid method for setting them on custom fields, searches, etc.
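
One possible shape for such a setup is sketched below. It is an assumption, not a tested recipe: the analyzer name `exact_lowercase` and the helper `create_index` are made up for illustration, the `tracking` type and `gene_id` field are borrowed from the mapping in the first post, and the body follows the 1.x-style `string` mappings used in this thread. It defines a custom analyzer that keeps each value as one token but lowercases it, so that a lowercased query could still match.

```ruby
# Index body with a custom analyzer: 'keyword' tokenizer (value stays one
# token) plus the 'lowercase' token filter. Names are illustrative only.
INDEX_BODY = {
  settings: {
    analysis: {
      analyzer: {
        exact_lowercase: {
          type: "custom",
          tokenizer: "keyword",
          filter: ["lowercase"]
        }
      }
    }
  },
  mappings: {
    tracking: {
      properties: {
        # Only this field uses the custom analyzer; others keep the default.
        gene_id: { type: "string", analyzer: "exact_lowercase" }
      }
    }
  }
}.freeze

# Create the index through a client object that responds to
# indices.create, as the Ruby client used earlier in this thread does.
def create_index(client, name)
  client.indices.create index: name, body: INDEX_BODY
end
```

With a real client this would be called as `create_index(@es_client, "test")`. Whether it removes the query_string problem depends on the lowercasing guess above being right.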

bnf_lsn · January 30, 2015, 11:00am

I have fixed one of the problems: I can now set the 'keyword' analyzer
on only some fields. I do this by running:

# Map the sub-fields of the 'position' object to the 'keyword' analyzer.
@client.indices.put_mapping index: index_name, type: '_default_', body: {
  _default_: {
    properties: {
      position: {
        properties: {
          dir:       { type: "string", analyzer: "keyword" },
          name:      { type: "string", analyzer: "keyword" },
          extension: { type: "string", analyzer: "keyword" }
        }
      }
    }
  }
}

after index creation.


Previously this did not work because I had declared the 'dir', 'name'
and 'extension' fields as flat fields (without their parent
'position'). I did it that way because a 'term' query needs the
flattened field names when searching.
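
The difference between the two shapes can be sketched as follows. This assumes that "flattened" means the dotted path (for example 'position.dir'); `keyword_mapping` and `term_query` are hypothetical helpers that only build Ruby hashes and do not talk to Elasticsearch. The mapping nests each field under its parent's `properties`, while the query names the field by its full path.

```ruby
# Build a nested "properties" mapping from dotted field paths, giving
# every leaf field the 'keyword' analyzer (1.x-style 'string' type).
def keyword_mapping(paths)
  paths.each_with_object({}) do |path, mapping|
    *parents, leaf = path.split(".")
    # Walk down (creating as needed) one "properties" level per parent.
    node = parents.inject(mapping) do |props, parent|
      (props[parent] ||= { "properties" => {} })["properties"]
    end
    node[leaf] = { "type" => "string", "analyzer" => "keyword" }
  end
end

# A term query addresses the same field by its dotted path (assumption).
def term_query(path, value)
  { "query" => { "term" => { path => value } } }
end

mapping = keyword_mapping(%w[position.dir position.name position.extension])
query   = term_query("position.dir", "/home/data")
```

The resulting `mapping` has a single top-level 'position' entry whose `properties` hold the three fields, which is the shape passed to put_mapping above, while `query` keeps the flat 'position.dir' name.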


I hope this post is useful to ES newbies like me; mapping, analysis
and tokenization are very poorly documented for the Ruby API.
