On Wed, Nov 12, 2014 at 8:15 AM, Alessandro Bonfanti <bnf.lsn@gmail.com> wrote:
Hi, I'm very new to Elasticsearch.
I'm trying to index a set of biological data. Some fields, like
'gene_id' or 'gene_shortname', should be processed as literal strings.
When I search for 'ZNF6092' in a field containing 'linc-ZNF6092-6',
I can't find anything, whereas when I search for 'linc' I do find the
correct document.
It seems to be a problem with the ES analyzer, but I tried setting the
fields to not be analyzed and nothing seems to change.
I tried with:
And then I re-indexed all the documents. It didn't work; where did I go wrong?
Thanks in advance,
Alessandro
On 12/11/2014 15:25, Nikolas Everett wrote:
It's an analyzer problem, certainly. You've turned off analysis with
"index":"not_analyzed". What you probably want is for gene_short_name
to be analyzed so that dashes are treated as word separators. If you do
that, you can find linc-ZNF6092-6 with a simple_query_string (or
match) search for ZNF6092, ZNF6092 6, 6, or linc. Have a look at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-tokenizer.html
and go from there. You may also want to use a lowercase filter so you can
search for znf6092 and still find it.
This is a good read on how to change the mapping as well:
even if you don't need all the information in there, it's nice to know.
Nik
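A minimal sketch of the suggestion above, assuming 1.x-era syntax. The names `non_word_tokenizer`, `gene_name_analyzer`, the `gene` type and the `genes` index are hypothetical. The `pattern` tokenizer, its default `\W+` pattern and the `lowercase` filter are standard Elasticsearch pieces. The Python function only approximates locally what the analyzer should emit; Python's and Java's `\W` agree for ASCII input like these gene names.

```python
import json
import re
from typing import List

# Hypothetical names throughout; only the tokenizer/filter types are ES built-ins.
GENE_NAME_INDEX_BODY = {
    "settings": {
        "analysis": {
            "tokenizer": {
                # \W+ (the pattern tokenizer's default) splits on dashes and spaces
                "non_word_tokenizer": {"type": "pattern", "pattern": r"\W+"}
            },
            "analyzer": {
                "gene_name_analyzer": {
                    "type": "custom",
                    "tokenizer": "non_word_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "gene": {
            "properties": {
                # "string" is the 1.x field type; later releases use "text".
                "gene_shortname": {"type": "string", "analyzer": "gene_name_analyzer"}
            }
        }
    },
}


def simulate_gene_name_analyzer(value: str) -> List[str]:
    """Local approximation: split on runs of non-word characters,
    drop empty pieces, lowercase each token."""
    return [token.lower() for token in re.split(r"\W+", value) if token]


if __name__ == "__main__":
    # Body you would send when creating the (hypothetical) index, e.g. PUT /genes
    print(json.dumps(GENE_NAME_INDEX_BODY, indent=2))
    print(simulate_gene_name_analyzer("linc-ZNF6092-6"))  # ['linc', 'znf6092', '6']
```

Because the query text goes through the same analyzer, a match query for `ZNF6092`, `ZNF6092 6` or `znf6092` would produce tokens that are present in the indexed list.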
--
You received this message because you are subscribed to a topic in the Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/elasticsearch/Y6I2qNZxR-s/unsubscribe.
To unsubscribe from this group and all its topics, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPmjWd06sKTVS6JC8q7x7R37gUEnsHEiuar0-yy_ZdOJQhKYzQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
On Wed, Nov 12, 2014 at 11:13 AM, Alessandro Bonfanti <bnf.lsn@gmail.com> wrote:
Many thanks for your answer.
What I want is for ES to store these fields as literals, so that I can
find ZNF6092 with a wildcard search (*ZNF6092*, for example).
For testing, I tried setting "pattern" to "*" (* isn't in gene_shortname,
so I assumed the entire string would be stored). But I still find
nothing.
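Two points are worth checking here, hedged because the actual mapping wasn't posted. First, a lone `"*"` is not a valid regular expression. Python's `re` rejects it, and Java's regex engine (which the pattern tokenizer uses) reports a dangling metacharacter. So that tokenizer setting would most likely fail rather than keep the string whole; the built-in `keyword` tokenizer is the direct way to get a single token. Second, once the whole string is one term, a wildcard match against it is case-sensitive. The sketch below simulates both points locally. The helper names are hypothetical, and `fnmatch` stands in for Elasticsearch's `*`/`?` wildcards.

```python
import fnmatch
import re
from typing import Iterable, List


def is_valid_regex(pattern: str) -> bool:
    """True if `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def keyword_tokens(value: str) -> List[str]:
    """What the built-in 'keyword' tokenizer produces: the whole value as one token."""
    return [value]


def wildcard_hits(indexed_terms: Iterable[str], wildcard: str) -> bool:
    """Case-sensitive * / ? matching against whole indexed terms, roughly what a
    wildcard query does on a not_analyzed field. (fnmatch also understands [...]
    classes, which Elasticsearch wildcards do not; irrelevant for these values.)"""
    return any(fnmatch.fnmatchcase(term, wildcard) for term in indexed_terms)


if __name__ == "__main__":
    print(is_valid_regex("*"))  # False: a lone "*" has nothing to repeat
    terms = keyword_tokens("linc-ZNF6092-6")
    print(wildcard_hits(terms, "*ZNF6092*"))  # True
    print(wildcard_hits(terms, "*znf6092*"))  # False: case differs
```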
On 12/11/2014 17:20, Nikolas Everett wrote:
You'd have to post your queries for me to help more, but in general it's
better to analyze the content up front and perform basic match queries
without wildcards than to search with wildcards. Wildcards are way, way,
way slower.
Nik
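Why wildcards cost more can be seen with a toy model. This is an illustration, not Lucene's implementation: Lucene keeps a sorted term dictionary and runs wildcard queries as automata. Still, a pattern that starts with `*` gives it no prefix to skip ahead with. In the sketch, an analyzed match is one dictionary lookup, while a leading-wildcard query has to test every term. All function names are hypothetical.

```python
import fnmatch
import re
from collections import defaultdict
from typing import Dict, Set, Tuple


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Toy inverted index: lowercase, split on non-word characters,
    map each term to the ids of the documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.split(r"\W+", text.lower()):
            if term:
                index[term].add(doc_id)
    return dict(index)


def match_query(index: Dict[str, Set[int]], word: str) -> Tuple[Set[int], int]:
    """Analyzed match on a single word: one dictionary lookup.
    Returns (matching doc ids, number of terms examined)."""
    return set(index.get(word.lower(), set())), 1


def leading_wildcard_query(index: Dict[str, Set[int]], pattern: str) -> Tuple[Set[int], int]:
    """Pattern starting with '*': nothing narrows the search, so every term
    in the dictionary is tested. Returns (doc ids, terms examined)."""
    hits: Set[int] = set()
    examined = 0
    for term, doc_ids in index.items():
        examined += 1
        if fnmatch.fnmatchcase(term, pattern):
            hits |= doc_ids
    return hits, examined


if __name__ == "__main__":
    idx = build_index({1: "linc-ZNF6092-6", 2: "ENSG00000186092", 3: "BRCA1"})
    print(match_query(idx, "ZNF6092"))              # ({1}, 1)
    print(leading_wildcard_query(idx, "*znf6092*"))  # ({1}, 5)
```

With millions of distinct terms, the gap between "1 term examined" and "all terms examined" is what makes the wildcard version slow.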
On 12/11/2014 17:43, Alessandro Bonfanti wrote:
I've read that wildcards are slower. If you have a cleaner solution, it
would be very welcome (I still need to be able to search for
"linc-ZNF6092" in addition to "ZNF6092").
Variables' name should be auto-explicative of its content.
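One common way to get both kinds of search without wildcards is a multi-field, sketched here under assumptions. The main field is analyzed, so a `match_phrase` for `linc-ZNF6092` or a match for `ZNF6092` finds the document. A `raw` sub-field keeps the literal string for exact `term` queries on `gene_shortname.raw`. The mapping uses 1.x syntax; `gene_name_analyzer` is a hypothetical custom analyzer that splits on non-word characters and lowercases. The functions only imitate the two query types locally.

```python
import re
from typing import List

# 1.x-style mapping fragment (hypothetical analyzer name).
GENE_SHORTNAME_MAPPING = {
    "gene_shortname": {
        "type": "string",
        "analyzer": "gene_name_analyzer",
        "fields": {
            # exact, untokenized copy, queried as "gene_shortname.raw"
            "raw": {"type": "string", "index": "not_analyzed"}
        },
    }
}


def analyze(text: str) -> List[str]:
    """Assumed behaviour of gene_name_analyzer: split on non-word chars, lowercase."""
    return [token.lower() for token in re.split(r"\W+", text) if token]


def phrase_matches(field_value: str, query: str) -> bool:
    """Imitates match_phrase: the query's tokens must appear consecutively,
    in order, among the field's tokens."""
    doc, q = analyze(field_value), analyze(query)
    if not q:
        return False
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


def raw_term_matches(field_value: str, term: str) -> bool:
    """Imitates a term query on the not_analyzed "raw" sub-field: exact equality."""
    return field_value == term


if __name__ == "__main__":
    value = "linc-ZNF6092-6"
    print(phrase_matches(value, "linc-ZNF6092"))    # True
    print(phrase_matches(value, "ZNF6092"))         # True
    print(raw_term_matches(value, "linc-ZNF6092-6"))  # True
```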
On 02/12/2014 09:21, Alessandro Bonfanti wrote:
I have made a lot of attempts, but the problem persists. Could it be
caused by some setting other than the analyzer?
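Without the posted mapping it's hard to say, but one thing worth ruling out is that the mapping Elasticsearch actually holds differs from the one you think you sent. That can happen when a field was dynamically mapped before your mapping arrived, or when a change to an existing field's analysis was rejected. The sketch below scans the `properties` of a mapping, for example as returned by `GET /<index>/_mapping`, and lists string fields that will still be analyzed. It assumes 1.x-style mappings, where an omitted `"index"` means analyzed. It ignores any index-level default analyzer and sub-fields under `"fields"`. The function name and sample field names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Finding = Tuple[str, str, Optional[str]]  # (field path, index mode, analyzer)


def analyzed_string_fields(properties: Dict, prefix: str = "") -> List[Finding]:
    """List string fields in a 1.x mapping that will still be analyzed.

    A string field counts as analyzed unless its "index" is "not_analyzed"
    or "no", or it names the "keyword" analyzer.
    """
    findings: List[Finding] = []
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:  # object field: walk into it
            findings.extend(analyzed_string_fields(spec["properties"], path + "."))
            continue
        if spec.get("type") != "string":
            continue
        index_mode = spec.get("index", "analyzed")  # omitted means analyzed
        analyzer = spec.get("analyzer")
        if index_mode in ("not_analyzed", "no") or analyzer == "keyword":
            continue
        findings.append((path, index_mode, analyzer))
    return findings


if __name__ == "__main__":
    sample = {
        "gene_id": {"type": "string", "index": "not_analyzed"},
        "gene_shortname": {"type": "string"},
    }
    print(analyzed_string_fields(sample))  # [('gene_shortname', 'analyzed', None)]
```

With a real response, you would pass something like `response["genes"]["mappings"]["gene"]["properties"]`. The index and type names there are hypothetical, and the nesting is an assumption about the 1.x response shape.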
On 21/01/2015 11:43, Alessandro Bonfanti wrote:
In short, I need a step-by-step method for disabling the analyzer, or
setting it to 'keyword', on all fields of an index. I have made a lot of
attempts, but none of them seem to work.
This situation is causing me a lot of problems. I need ES not to tokenize
my literal strings; why isn't there a clear way to switch it off?
Thanks, everyone.
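A sketch of two index-wide options in 1.x syntax, plus the order of operations. The sketch builds the request bodies but does not send them. Option A registers an analyzer under the name `default` with type `keyword`; in 1.x, that analyzer is used by any field that doesn't name its own. Option B uses a dynamic template on the `_default_` mapping so every new string field is mapped `not_analyzed`. Existing documents aren't re-analyzed, and an existing field's analysis can't be changed in place, so the simplest route is: delete, recreate, resend. The index name `genes`, the template name and the bulk placeholder are hypothetical.

```python
import json
from typing import List, Optional, Tuple

# Option A: "keyword" as the index-wide default analyzer.
KEYWORD_DEFAULT_BODY = {
    "settings": {
        "analysis": {"analyzer": {"default": {"type": "keyword"}}}
    }
}

# Option B: every dynamically added string field becomes not_analyzed.
NOT_ANALYZED_STRINGS_BODY = {
    "mappings": {
        "_default_": {
            "dynamic_templates": [
                {
                    "strings_as_literals": {  # hypothetical template name
                        "match_mapping_type": "string",
                        "mapping": {"type": "string", "index": "not_analyzed"},
                    }
                }
            ]
        }
    }
}

Step = Tuple[str, str, Optional[str]]  # (HTTP method, path, body)


def rebuild_steps(index_name: str, body: dict) -> List[Step]:
    """Ordered REST calls: drop the index, recreate it with the new
    settings/mappings, then send every document again."""
    return [
        ("DELETE", f"/{index_name}", None),
        ("PUT", f"/{index_name}", json.dumps(body)),
        # Placeholder: your own documents, as _bulk index actions.
        ("POST", f"/{index_name}/_bulk", "<bulk index actions for every document>"),
    ]


if __name__ == "__main__":
    for method, path, payload in rebuild_steps("genes", KEYWORD_DEFAULT_BODY):
        print(method, path, payload or "")
```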
On 26/01/2015 16:37, Alessandro Bonfanti wrote:
OK, after a lot of attempts I was finally able to set the default
analyzer to 'keyword'. I did it with:
Now some problems are solved: I can finally do exact matching with a
'term' query, for example on a path '/home/data/foo.bar' or on a gene ID
'ENSG00000186092'.
The bad news is that the problems with 'query_string' have gotten even
worse. It seems that query_string can't work with non-analyzed fields.
If I try something trivial like:
nothing works (0 results found). The text has no spaces or other special
characters that could cause problems with tokenization. So what's the
problem?
Could a solution be to use a 'fake' pattern tokenizer with the pattern
"$^" (which should never match, giving the same result as the 'keyword'
analyzer)?
Any other ideas would be much appreciated.
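On the `"$^"` idea: for single-line values, that pattern should never match, so a pattern tokenizer using it would very likely emit the whole string, just like the built-in `keyword` tokenizer. That makes it redundant rather than a fix. A different option, sketched here as an assumption rather than a confirmed cure, is a custom analyzer with the `keyword` tokenizer plus the `lowercase` filter. The string stays whole, but indexed terms are lowercased, so they agree with query text that ends up lowercased. The trade-off is that unanalyzed `term` queries must then be sent lowercase (e.g. `ensg00000186092`). `literal_lowercase` is a hypothetical analyzer name; the functions only imitate tokenization locally.

```python
import re
from typing import List


def pattern_tokenize(text: str, pattern: str) -> List[str]:
    """Approximate a pattern tokenizer used as a splitter: split on every
    match of `pattern` and drop empty pieces."""
    return [piece for piece in re.split(pattern, text) if piece]


def keyword_tokenize(text: str) -> List[str]:
    """The built-in keyword tokenizer: the whole input is one token."""
    return [text]


# Hypothetical analyzer name; "keyword" and "lowercase" are built-ins.
LITERAL_LOWERCASE_SETTINGS = {
    "analysis": {
        "analyzer": {
            "literal_lowercase": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": ["lowercase"],
            }
        }
    }
}


def literal_lowercase(text: str) -> List[str]:
    """Local imitation of literal_lowercase: keep the string whole, lowercase it."""
    return [token.lower() for token in keyword_tokenize(text)]


if __name__ == "__main__":
    value = "linc-ZNF6092-6"
    print(pattern_tokenize(value, "$^") == keyword_tokenize(value))  # True
    print(literal_lowercase(value))  # ['linc-znf6092-6']
```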
Il 26/01/2015 16:37, Alessandro
Bonfanti ha scritto:
Il 21/01/2015 11:43, Alessandro
Bonfanti ha scritto:
Il 02/12/2014 09:21, Alessandro
Bonfanti ha scritto:
Il 12/11/2014 17:43, Alessandro
Bonfanti ha scritto:
Il 12/11/2014 17:20, Nikolas
Everett ha scritto:
On Wed, Nov 12, 2014 at 11:13
AM, Alessandro Bonfanti <bnf.lsn@gmail.com>
wrote:
Il 12/11/2014 15:25, Nikolas Everett ha
scritto:
On Wed, Nov
12, 2014 at 8:15 AM, Alessandro
Bonfanti <bnf.lsn@gmail.com>
wrote:
Hi, I'm very newbie
on ElasticSearch.
I'm try to indexing a set of
biological data. There are some
fields like 'gene_id' or
'gene_shortname' that should be
processed as literal strings.
When I try to search for
'ZNF6092' in a field filled with
'linc-ZNF6092-6', I can't find
anything. When I search for
'linc' I find correct document
elsewhere.
It seems that this is a problem
with ES analyzer, but I tried to
set it for do not analyze
fields, but it seems that
nothing changes.
I try with:
And then re-indexing of all
documents. I failed, but where?
Thanks in advance,
Alessandro
Its an analyzer problem,
certainly. You've turned off
analyzers with
"index":"not_analazyed". What you
probably want is for the
gene_short_name to be analyzed so
that dashes are considered "word
separators". If you do that you
can find linc-ZNF6092-6 by
performing a simple_query_string
(or match) search for
<code>ZNF6092</code>
or <code>ZNF6092
6</code> or
<code>6</code> or
<code>linc</code>.
Have a look at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-tokenizer.html
and go from there. You may also
want to use a lowercase filter so
you can search for
<code>znf6092</code>
and still find it.
This is a good read on how to
change the mapping as well:
even if you don't need all the
information in there it is nice to
know.
Nik
--
You received this message because you are
subscribed to a topic in the Google Groups
"elasticsearch" group.
To unsubscribe from this topic, visit <a moz-do-not-send="true" href="https://groups.google.com/d/topic/elasticsearch/Y6I2qNZxR-s/unsubscribe" target="_blank">https://groups.google.com/d/topic/elasticsearch/Y6I2qNZxR-s/unsubscribe</a>.
To unsubscribe from this group and all its
topics, send an email to <a moz-do-not-send="true" href="mailto:elasticsearch+unsubscribe@googlegroups.com" target="_blank">elasticsearch+unsubscribe@googlegroups.com</a>.
To view this discussion on the web visit <a moz-do-not-send="true" href="https://groups.google.com/d/msgid/elasticsearch/CAPmjWd06sKTVS6JC8q7x7R37gUEnsHEiuar0-yy_ZdOJQhKYzQ%40mail.gmail.com?utm_medium=email&utm_source=footer" target="_blank">https://groups.google.com/d/msgid/elasticsearch/CAPmjWd06sKTVS6JC8q7x7R37gUEnsHEiuar0-yy_ZdOJQhKYzQ%40mail.gmail.com</a>.
For more options, visit <a moz-do-not-send="true" href="https://groups.google.com/d/optout" target="_blank">https://groups.google.com/d/optout</a>.
Very thanks for your answer,
What I want is that ES store fields as literals,
so I should find ZNF6092 with a wilcard search
(*ZNF6092* for example).
I tried set "pattern" to "*" for testing (*
isn't in gene_shortname, so I suppose that
entire string is stored. But anyway I still find
nothing.
You'd have to post your queries for me to help
more but in general if best to analyze the content
up front and perform basic match queries without
wildcards than it is to search with wildcards.
Wildcards are way way way slower.
Nik
--
You received this message because you are subscribed to a
topic in the Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit <a moz-do-not-send="true" href="https://groups.google.com/d/topic/elasticsearch/Y6I2qNZxR-s/unsubscribe">https://groups.google.com/d/topic/elasticsearch/Y6I2qNZxR-s/unsubscribe</a>.
To unsubscribe from this group and all its topics, send an
email to <a moz-do-not-send="true" href="mailto:elasticsearch+unsubscribe@googlegroups.com">elasticsearch+unsubscribe@googlegroups.com</a>.
To view this discussion on the web visit <a moz-do-not-send="true" href="https://groups.google.com/d/msgid/elasticsearch/CAPmjWd0itbdHQ-maOuOmrrYf2QCqMFORTG21QpFHOCrp9E0rmg%40mail.gmail.com?utm_medium=email&utm_source=footer">https://groups.google.com/d/msgid/elasticsearch/CAPmjWd0itbdHQ-maOuOmrrYf2QCqMFORTG21QpFHOCrp9E0rmg%40mail.gmail.com</a>.
For more options, visit <a moz-do-not-send="true" href="https://groups.google.com/d/optout">https://groups.google.com/d/optout</a>.
Variables' name should be auto-explicative of its content.
I read that wildcards are slower, if you have a more clean
solution (I need anyway that I still can search for
"linc-ZNF6092" in addiction for "ZNF6092") it will be very
welcome.
I have tried a lot of attempts, but the problem still resist.
Maybe could it be caused by another setting than analyzer?
Definitely, I need a step-to-step method for disabling the
analyzer or set it to 'keyword' on all fields of an index. I
tried a lot of attempts but no-one seems to work.
This situation cause me much problems, I need that ES do not
tokenize my literal strings, why there isn't a clear method to
switch of it?
Thanks everyones.
OK, after many attempts I finally managed to set 'keyword' as the default analyzer. I did it with:
Now some problems are solved: I can finally do exact matching with a 'term' query, for example on a path '/home/data/foo.bar' or on a gene ID 'ENSG00000186092'.
The bad news is that the problems with 'query_string' got even worse. It seems that query_string can't work with non-analyzed fields.
If I try a trivial query:
Nothing works (0 results found). The text has no spaces or other special characters that could cause tokenization problems. So what's the problem?
Could a solution be to use a 'fake' pattern tokenizer with the pattern "$^" (this should be a pattern that never matches, giving the same result as the 'keyword' analyzer)?
Any other ideas would be much appreciated.
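The "$^" idea can be checked offline. Elasticsearch's pattern tokenizer uses Java regular expressions. This sketch uses Python's re module (3.7+), which behaves the same for this simple case: in a non-empty value without newlines, "$" and "^" never hold at the same position, so nothing is split and the whole value stays one token, just as with the keyword tokenizer. `split_tokens` is a hypothetical helper, not an Elasticsearch API. The built-in keyword analyzer remains the more direct way to get that result.

```python
import re

def split_tokens(value, pattern):
    # Simulates a pattern tokenizer: regex matches mark the separators between tokens.
    # (Splitting on a pattern that can only match empty strings needs Python 3.7+.)
    return [tok for tok in re.split(pattern, value) if tok]

value = "linc-ZNF6092-6"
print(split_tokens(value, r"$^"))  # ['linc-ZNF6092-6'] (one token, keyword-like)
print(split_tokens(value, r"-"))   # ['linc', 'ZNF6092', '6']
```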
The search problems probably come from the fact that query_string automatically lowercases all terms. This behavior is caused by the 'lowercase' filter that is inserted automatically.
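In the Elasticsearch 1.x query_string query, this lowercasing is applied by the lowercase_expanded_terms option (default true). It affects wildcard, prefix, fuzzy and range terms, because those terms are not analyzed. So *ZNF6092* is looked up as *znf6092* and can't match a keyword-analyzed term that still contains capitals. The sketch below approximates that with Python's case-sensitive fnmatchcase, which is not Elasticsearch's matcher, and builds a request body that switches the option off. Later Elasticsearch versions removed this option, so it applies only to the 1.x-era API.

```python
import fnmatch
import json

# A keyword-analyzed field keeps the whole value as one case-sensitive term.
stored_term = "linc-ZNF6092-6"

def wildcard_hits(term, pattern, lowercase_expanded_terms=True):
    # Approximates query_string's wildcard handling: optionally lowercase the
    # pattern, then match it case-sensitively against the stored term.
    if lowercase_expanded_terms:
        pattern = pattern.lower()
    return fnmatch.fnmatchcase(term, pattern)

print(wildcard_hits(stored_term, "*ZNF6092*"))                                  # False
print(wildcard_hits(stored_term, "*ZNF6092*", lowercase_expanded_terms=False))  # True

# Request body with the lowercasing switched off (field name from the thread).
query = {
    "query": {
        "query_string": {
            "default_field": "gene_shortname",
            "query": "*ZNF6092*",
            "lowercase_expanded_terms": False,
        }
    }
}
print(json.dumps(query))
```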
I can't find any examples on the web of setting analyzers/tokenizers/filters through the Ruby API. The only thing that seems to work well is the method I posted above for setting the default analyzer when a new index is created. Any suggestions?
I need a working method for setting them on specific fields, searches, etc.
On 28/01/2015 10:58, Alessandro Bonfanti wrote:
On 26/01/2015 16:37, Alessandro Bonfanti wrote:
On 21/01/2015 11:43, Alessandro Bonfanti wrote:
On 02/12/2014 09:21, Alessandro Bonfanti wrote:
On 12/11/2014 17:43, Alessandro Bonfanti wrote:
On 12/11/2014 17:20, Nikolas Everett wrote:
On Wed, Nov 12, 2014 at 11:13 AM, Alessandro Bonfanti <bnf.lsn@gmail.com> wrote:
On 12/11/2014 15:25, Nikolas Everett wrote:
It's an analyzer problem, certainly. You've turned off analysis with "index":"not_analyzed". What you probably want is for gene_short_name to be analyzed so that dashes are treated as "word separators". If you do that, you can find linc-ZNF6092-6 by performing a simple_query_string (or match) search for "ZNF6092", "ZNF6092 6", "6" or "linc".
Have a look at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-tokenizer.html and go from there. You may also want to use a lowercase filter so that you can search for "znf6092" and still find it.
This is a good read on how to change the mapping as well:
even if you don't need all the information in there, it is nice to know.
Nik
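A minimal sketch of the analyzer Nik describes, written in Elasticsearch 1.x syntax ("type": "string"; later versions use text/keyword instead). It defines a pattern tokenizer that splits on dashes, chains it with the lowercase filter, and assigns the result to the gene field. The names "dash", "dash_lowercase" and the mapping type "gene" are hypothetical. The Python function only approximates the analyzer's output offline.

```python
import json
import re

# Hypothetical names: "dash" tokenizer, "dash_lowercase" analyzer, "gene" mapping type.
index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "dash": {"type": "pattern", "pattern": "-"}
            },
            "analyzer": {
                "dash_lowercase": {
                    "type": "custom",
                    "tokenizer": "dash",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "gene": {
            "properties": {
                "gene_shortname": {"type": "string", "analyzer": "dash_lowercase"}
            }
        }
    },
}

def simulate_dash_lowercase(text):
    # Offline approximation of "dash_lowercase": split on "-", then lowercase.
    return [tok.lower() for tok in re.split(r"-", text) if tok]

print(json.dumps(index_body, indent=2))
print(simulate_dash_lowercase("linc-ZNF6092-6"))  # ['linc', 'znf6092', '6']
```

Since a match query runs its text through the same analyzer, a search for ZNF6092 or znf6092 is reduced to the token znf6092, which is among the indexed tokens.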
Thank you very much for your answer.
What I want is for ES to store fields as literals, so that I can find ZNF6092 with a wildcard search (*ZNF6092*, for example).
For testing, I tried setting "pattern" to "*" (* doesn't occur in gene_shortname, so I assumed the entire string would be stored), but I still find nothing.
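One possible reason the "pattern": "*" test changed nothing: the pattern tokenizer's setting is a regular expression, and a lone * is not a valid one because it has nothing to repeat. Java, whose regex engine Elasticsearch uses, rejects it as a dangling metacharacter. The sketch shows Python's re module rejecting it the same way. How a given Elasticsearch version reports the bad setting is not shown here. To split on a literal asterisk, the pattern must be escaped as \*.

```python
import re

def is_valid_pattern(pattern):
    # Returns False when the regex engine refuses to compile the pattern.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(is_valid_pattern("*"))    # False: nothing to repeat
print(is_valid_pattern(r"\*"))  # True: a literal asterisk
# No asterisk in the value, so splitting on a literal "*" keeps it whole.
print(re.split(r"\*", "linc-ZNF6092-6"))  # ['linc-ZNF6092-6']
```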
You'd have to post your queries for me to help more, but in general it's better to analyze the content up front and perform basic match queries without wildcards than to search with wildcards. Wildcards are way, way, way slower.
Nik
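Nik's advice can be combined with the need for literal matching by indexing the value twice through a multi-field (Elasticsearch 1.x syntax). The main field is analyzed for token searches, and a not_analyzed sub-field keeps the literal value for exact term queries. The "gene" type and the "raw" sub-field name are hypothetical. For match to find ZNF6092 inside linc-ZNF6092-6, the main field's analyzer still has to split on dashes, as Nik explained earlier.

```python
import json

# Hypothetical mapping type "gene"; "raw" is a conventional but arbitrary sub-field name.
mapping = {
    "gene": {
        "properties": {
            "gene_shortname": {
                # Analyzed with the index's default analyzer unless one is set here.
                "type": "string",
                "fields": {
                    # The same value indexed once more as a single, untouched term.
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            }
        }
    }
}

# Token search on the analyzed field: no wildcards needed.
partial_query = {"query": {"match": {"gene_shortname": "ZNF6092"}}}

# Exact, case-sensitive lookup of the literal value on the not_analyzed sub-field.
exact_query = {"query": {"term": {"gene_shortname.raw": "linc-ZNF6092-6"}}}

for body in (mapping, partial_query, exact_query):
    print(json.dumps(body))
```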
I managed to fix one problem: I can now set the 'keyword' analyzer for only some fields. I do this by running:
after the index is created.
Previously this didn't work because I had defined the 'dir', 'name' and 'extension' fields as flat fields (without their parent 'position' object). I did it that way because the search with the 'term' query needs flattened fields.
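The fix described above can be sketched as a mapping that declares 'dir', 'name' and 'extension' inside their parent 'position' object, each with the keyword analyzer (Elasticsearch 1.x syntax). A term query then addresses a field by its full dotted path, such as position.dir, rather than by a separate flat field. The mapping type "file" and the sample document are hypothetical, and `dotted_paths` is a hypothetical helper that shows how the dotted names arise from a nested document.

```python
import json

# Hypothetical mapping type "file"; the field names come from the thread.
keyword_string = {"type": "string", "analyzer": "keyword"}
file_mapping = {
    "file": {
        "properties": {
            "position": {
                "type": "object",
                "properties": {
                    "dir": dict(keyword_string),
                    "name": dict(keyword_string),
                    "extension": dict(keyword_string),
                },
            }
        }
    }
}

def dotted_paths(doc, prefix=""):
    # Flattens a nested document into the dotted field names used in queries.
    paths = {}
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            paths.update(dotted_paths(value, path + "."))
        else:
            paths[path] = value
    return paths

# Hypothetical document for the path /home/data/foo.bar
doc = {"position": {"dir": "/home/data", "name": "foo", "extension": "bar"}}
print(dotted_paths(doc))
# {'position.dir': '/home/data', 'position.name': 'foo', 'position.extension': 'bar'}

term_query = {"query": {"term": {"position.dir": "/home/data"}}}
print(json.dumps(file_mapping))
print(json.dumps(term_query))
```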
I hope this post is useful to ES newbies like me; mapping, analysis and tokenization in the Ruby API are very poorly documented.