Hello, I'm new to Elasticsearch, so maybe this is a basic question...
Is there any way to see how the text is "stored", or at least how it would
look once the filters defined for a field (in the analyzer) are applied?
I know that the actual field is stored "as is" and that the filters only
affect what gets indexed, but I want to see the result of that operation.
The reason I ask is that I'm applying filters to a field (for example,
removing URLs) and running aggregations on that field, but the aggregations
return "http" as a used word. I guess that is not correct, since it should
have been removed...
Use the analyze API to see how your analysis chain (tokenizer and filters)
affects text.
The index itself has all the documents jumbled together, and there isn't a
good way to dig the data for a single document out of it.
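For readers who want to try the suggestion above, here is a minimal sketch of an analyze request that runs a sample string through the analyzer configured for a field. It only builds the request and does not send it. The host, the index name `tweets`, and the field name `body` are hypothetical, and the JSON body form (`field` plus `text`) is an assumption that matches recent Elasticsearch releases; older releases took these as query-string parameters, so check the documentation for your version.

```python
import json
from urllib.request import Request


def build_analyze_request(base_url, index, field, text):
    """Build (but do not send) an _analyze request for one field of an index."""
    # Assumption: JSON body with "field" and "text", as in recent releases.
    body = json.dumps({"field": field, "text": text}).encode("utf-8")
    return Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_texts(analyze_response):
    """Return just the token strings from a parsed _analyze response."""
    return [entry["token"] for entry in analyze_response.get("tokens", [])]


# Hypothetical index and field names, for illustration only.
request = build_analyze_request(
    "http://localhost:9200", "tweets", "body", "see http://example.com"
)
```

Sending the request (for example with `urllib.request.urlopen`) and passing the parsed JSON to `token_texts` gives the list of terms the analyzer would index. If "http" shows up in that list, the URL filter is not removing what you expect.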
Besides the analyze API, the explain API can tell you why a document was
included in the result set, which lets you draw conclusions about how the
terms are stored in the index.
Jörg
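As a rough illustration of the explain suggestion, the sketch below builds an explain request that asks whether a single term matches a given document. It does not send anything. The index name `tweets`, the field `body`, and the document id `1` are hypothetical, and the `/{index}/_explain/{id}` path is an assumption based on recent Elasticsearch releases; older releases also included the document type in the path.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_explain_request(base_url, index, doc_id, field, term):
    """Build (but do not send) an _explain request for a term query on one document."""
    # A term query is not analyzed, so it matches only if this exact term
    # is present in the index for the given field.
    body = json.dumps({"query": {"term": {field: term}}}).encode("utf-8")
    return Request(
        # Assumption: typeless path, as in recent releases.
        url=f"{base_url}/{index}/_explain/{quote(str(doc_id), safe='')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical index, document id, and field, for illustration only.
explain_request = build_explain_request(
    "http://localhost:9200", "tweets", 1, "body", "http"
)
```

If the response reports a match for the term "http", that term was indexed for the document, which would be consistent with it appearing in the aggregation.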
Thanks for your replies. They gave me the clue I was looking for, and now
it is solved!