Otis
Performance Monitoring - http://sematext.com/spm
On Nov 17, 2012 11:01 AM, elasticsearch@googlegroups.com wrote:
Today's Topic Summary
Group: http://groups.google.com/group/elasticsearch/topics
- Storing table-like data in Elastic Search [3 Updates]
- Issue with my template creation [1 Update]
- Lost shards and cluster state stays red [1 Update]
- how to one result when search nested? [2 Updates]
- Control shard placement [2 Updates]
- Multiple synonyms contribute to the score [1 Update]
- org.elasticsearch.transport.TransportSerializationException: Failed to deserialize exception response from stream when one node is still starting [1 Update]
- [ANN] elasticsearch-equilibrium plugin version 0.19.4 [2 Updates]
- [ANN] geocluster-facet 0.0.1 [2 Updates]
- carrot2 error on elasticsearch version 0.19.11 [2 Updates]
- how to update cluster setting? [6 Updates]
- [Autocomplete] Cleo or ElasticSearch with NGram [1 Update]
- Too many open files but nofile set to 256000 [1 Update]
Storing table-like data in Elastic Search - http://groups.google.com/group/elasticsearch/t/e0ef5d7dfd923618
Clinton Gormley clint@traveljury.com Nov 17 01:23PM +0100
> However I'm now struggling with how to give priority to matches from
> the same row. I.e. currently a text search for "Irland Setter" gives
> the second document a much higher score (0.21 and 0.13 respectively).

First, you're experimenting with very few documents (I assume), which
means that your terms are unevenly distributed across your shards. For
testing purposes, I would either add "search_type=dfs_query_then_fetch"
to your search query string, or I would create a test index with only 1
shard.

> I need the first document to have a higher score because it has both
> "Irland" and "Setter" in the same row.

Use the match_phrase query with a high "slop" value, eg:

{ "query": {
    "match_phrase": {
        "row": {
            "query": "irland setter",
            "slop": 100
        }
    }
}}

This will incorporate token distance into the relevance calculation.

Also, when you're indexing arrays of analyzed strings, it may be worth
setting the position_offset_gap in the mapping. If you index
["quick brown", "fox"], by default it would be indexed as:

- position 1 : quick
- position 2 : brown
- position 3 : fox

If you set the position_offset_gap, ie map the "row" field as:

{ "type": "string", "position_offset_gap": 100 }

it would be indexed as:

- position 1 : quick
- position 2 : brown
- position 103 : fox

This of course depends on what you are trying to achieve with your
data.

clint
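For reference, a minimal sketch of the single-shard test index described
above, with the position_offset_gap mapping applied (the index and type
names "test" and "doc" are illustrative):

curl -XPUT 'localhost:9200/test' -d '{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 },
  "mappings": {
    "doc": {
      "properties": {
        "row": { "type": "string", "position_offset_gap": 100 }
      }
    }
  }
}'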
Zaar Hai haizaar@gmail.com Nov 17 06:51AM -0800
On Saturday, November 17, 2012 2:24:02 PM UTC+2, Clinton Gormley wrote:

> testing purposes, I would either add "search_type=dfs_query_then_fetch"
> to your search query string, or I would create a test index with only 1
> shard.

Yes, I'm currently experimenting with just two documents to make sure
I'm on the right track. I've recreated them on a single shard following
your advice. This does not help. The "wrong" (second) document still
gets a much higher score.

I think it's because, after analysis, the first document looks like:

"boxer", "good", "dog", "germany", "irish", "setter", "great", "dog", "irland"

And the second:

"setter", "important", "irland", "green"

So in the second document "setter" is actually closer to "irland" than
in the first one.

> - position 2 : brown
> - position 103 : fox
> This of course depends on what you are trying to achieve with your data.

This looks like an interesting approach. However I need gaps between
rows and not between row members.

Strangely enough, changing the mapping for "row" as you've suggested
caused no results at all. Also, running the analyzer shows that
position_offset_gap is disregarded completely. Here is my query:

{
    "query": {
        "match_phrase": {
            "row": {
                "query": "setter ireland", "slop": 100
            }
        }
    }
}

And here is my mapping:

{
    "table" : {
        "properties" : {
            "title" : {"type" : "string"},
            "col_names" : {"type" : "string"},
            "rows" : {
                "properties" : {
                    "row" : { "type" : "string", "position_offset_gap" : 100 }
                }
            }
        }
    }
}

Thank you very much for your help and time!
Zaar

Clinton Gormley clint@traveljury.com Nov 17 04:11PM +0100
"dog", "irland"
And the second:
"setter", "important", "irland", "green"Ah right, yes. And probably the fact that that row is shorter makes it
appear to be more relevant. You could try setting omit_norms to true,
to ignore field length normalization.http://www.elasticsearch.org/guide/reference/mapping/core-types.html
This looks like an interesting approach. However I need gaps between
rows and not between row members.True, sorry!
you may want to try an approach where you make "rows" type "nested". So
that would store each "row" as a separate sub-document, which you could
query individually.Then you can also add {include_in_root: true} to the "rows" mapping, so
that all the data would also be indexed in the root document.I've put together a demo here:
https://gist.github.com/4096675
clint
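For the record, a rough sketch of that nested approach; this is not the
contents of the gist, and the index, type and field names are illustrative:

curl -XPUT 'localhost:9200/test' -d '{
  "mappings": {
    "table": {
      "properties": {
        "rows": {
          "type": "nested",
          "include_in_root": true,
          "properties": {
            "row": { "type": "string" }
          }
        }
      }
    }
  }
}'

curl -XGET 'localhost:9200/test/table/_search' -d '{
  "query": {
    "nested": {
      "path": "rows",
      "query": {
        "match_phrase": {
          "rows.row": { "query": "irland setter", "slop": 100 }
        }
      }
    }
  }
}'

Because of include_in_root, the same data can still be queried on the
root document via "rows.row" when per-row matching is not needed.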
Issue with my template creation - http://groups.google.com/group/elasticsearch/t/d4b4438aec0749f0
Radu Gheorghe radu.gheorghe@sematext.com Nov 17 04:37PM +0200
Hello Praveen,
If you use curl from the command line, you'll probably have to escape
the quotes, like:

"date_formats" : ["yyyy-MM-dd'"'T'"'HH:mm:ss"]

So instead of single-quote-T-single-quote (which will translate into
just the string T: since you have single quotes at the beginning and
the end of your data, the single quotes there only end the quoted
string started first, add a T, then begin a new quoted string), you can
put:

single-quote-double-quote-single-quote-T-single-quote-double-quote-single-quote

which will translate to 'T', which is what you want. That's because
the first single quote ends the first part of your JSON, then you
start a double-quoted string which contains 'T', then you use
single quotes again to continue your JSON. That's a lot of quotes!

Hope it helps, though.

Best regards,
Radu
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene
On Fri, Nov 16, 2012 at 11:36 PM, Praveen Kariyanahalli
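To illustrate the escaping in context, a hypothetical template creation
command using that trick (the template name, index pattern and type are
made up; only the quoting is the point):

curl -XPUT 'localhost:9200/_template/my_template' -d '{
  "template": "logs-*",
  "mappings": {
    "event": {
      "date_formats": ["yyyy-MM-dd'"'T'"'HH:mm:ss"]
    }
  }
}'

The shell concatenates the single-quoted pieces and the double-quoted
single quotes, so Elasticsearch receives the literal pattern
yyyy-MM-dd'T'HH:mm:ss.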
Lost shards and cluster state stays red - http://groups.google.com/group/elasticsearch/t/a9b93dd7c0f0f649
Radu Gheorghe radu.gheorghe@sematext.com Nov 17 04:08PM +0200
Hello,
I'm not sure if I understood your question correctly, but what you can
do is:

- delete the indices which have missing shards. Something like:
  curl -XDELETE localhost:9200/corrupted_index/
- reindex data belonging to those "incomplete" indices

Then your cluster state should be back to yellow/green again. Until
then, if you have indices that have missing shards but also allocated
shards, ES will still run your searches on the data you have. If
that's important to you, then you might prefer to do things like this:

- reindex data belonging to incomplete indices into new indices with
  different names
- delete indices with missing shards
- add aliases[0] to the new indices with the old index names, so that
  searches will run as before

[0] http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html

Best regards,
Radu
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene
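For example, a hypothetical alias swap after reindexing (the index and
alias names here are made up):

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "add": { "index": "logs_reindexed", "alias": "logs" } }
  ]
}'

Searches against "logs" then hit the reindexed data without any client
changes.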
how to one result when search nested? - http://groups.google.com/group/elasticsearch/t/dc396ad9bb71e198
softtech Wonder wondersofttech@gmail.com Nov 17 01:43AM -0800
This is the data:

{
    "firstname": "Nicolas",
    "lastname": "Ippolito",
    "books": [
        { "name": "php",    "rating": 3 },
        { "name": "nodejs", "rating": 5 },
        { "name": "guitar", "rating": 3 }
    ]
}

I want the result to contain only the one nested book whose books.rating
is the maximum. Example of the result I want:

{
    "firstname": "Nicolas",
    "lastname": "Ippolito",
    "books": [
        { "name": "nodejs", "rating": 5 }
    ]
}

Radu Gheorghe radu.gheorghe@sematext.com Nov 17 03:46PM +0200
Hi,
You mean, when a search hits a document, you want ES to return only
parts of that document?

If so, I'm not sure how you can do this other than at the client side.
Or, by changing your data structure. For example, you might want to use
parent-child and search for what you want in the children, then use
the has_parent[0] query to search for what you want in the parent. In
this case Elasticsearch would return only the matching children. And
you can fetch their parents at the client side using the Multi Get API[1].

[0] http://www.elasticsearch.org/guide/reference/query-dsl/has-parent-query.html
[1] http://www.elasticsearch.org/guide/reference/api/multi-get.html

Best regards,
Radu
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene
On Sat, Nov 17, 2012 at 11:43 AM, softtech Wonder
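A rough sketch of that parent-child idea, assuming books are indexed as
children of an "author" parent type (all names here are illustrative):

curl -XGET 'localhost:9200/test/book/_search' -d '{
  "query": {
    "has_parent": {
      "parent_type": "author",
      "query": {
        "match": { "lastname": "Ippolito" }
      }
    }
  }
}'

This returns only the matching book documents; the parent author can then
be fetched with the Multi Get API.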
Control shard placement - http://groups.google.com/group/elasticsearch/t/c5e87efa6889a07f
elasticuser merik2004-elastic@yahoo.fr Nov 17 01:12AM -0800
My goal is to save space on the HDD. In my case, I have 5 TB on my
cluster, but with replica shards that leaves just 2.5 TB. So I would
like to keep the 5 TB for my primary shards and store the replica
shards on a SAN.

I am aware that if a node with primary shards goes down, all
corresponding replicas will be promoted to primaries, but only for the
time it takes to repair the node.

Indeed, I would like to use the nodes with HDD for requests, and the
nodes with SAN only in case a node with primary shards has a problem.

On Saturday, November 17, 2012 12:11:15 AM UTC+1, Igor Motov wrote:
Igor Motov imotov@gmail.com Nov 17 05:23AM -0800
I see. That's an interesting idea. Unfortunately, I cannot think of a
mechanism that would allow you to do something like this. You can
configure a set of nodes to use HDD, and you can configure another set
of nodes to use SAN. You can use Allocation Awareness
<http://www.elasticsearch.org/guide/reference/modules/cluster.html> to
make sure that if one shard is allocated on an HDD node, its replica
would be allocated on SAN, and vice versa. You can start the HDD nodes
first to make sure they get all the primary shards. But that's it.
There is really no way to reassign primaries back to HDD nodes if they
get moved to SAN nodes, or to limit searches to HDD nodes only when
primaries are no longer there.

On Saturday, November 17, 2012 4:12:31 AM UTC-5, elasticuser wrote:
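A minimal sketch of the Allocation Awareness setup mentioned above. The
attribute name "disktype" is made up; the setting can also live in
elasticsearch.yml instead of being applied dynamically:

# In elasticsearch.yml on each node, tag the node with its storage type:
#   node.disktype: hdd     (on HDD nodes)
#   node.disktype: san     (on SAN nodes)

# Then tell the cluster to spread copies of each shard across values of
# that attribute:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "disktype"
  }
}'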
Multiple synonyms contribute to the score - http://groups.google.com/group/elasticsearch/t/ef36599c6b76655c
Clinton Gormley clint@traveljury.com Nov 17 01:14PM +0100
Hi Kevin
> document only has one mention of 'sutent' and none of its synonyms.
> The net result is that words with more synonyms artificially get a
> boost in the results.

There are various ways to approach this problem. Either you:

- expand your synonym list at index time (ie you store all variations
  of the synonym in your index), but then you search on just one
  variation (by using a different analyzer at search or index time), or
- contract your synonym list at index and search time: eg foo, bar
  or baz all get indexed as just 'foo'. A search for 'bar' becomes a
  search for 'foo'.

I have put together a gist demonstrating how this all works:
https://gist.github.com/4095280

The question remains: which should I prefer? expand: true or false?

I'm open to disagreement, but my vote would be for expand: false, ie
index just the first word in the synonym list, not all the words. My
reasons are:

- fewer terms to index
- replacing synonyms with all variations or just one variation implies
  the same loss of original information (ie which synonym appeared in
  the original text)
- synonyms can be of different lengths (eg "wi fi" vs "wifi"), which
  means that (with expand: true) the phrase "wifi router" would be
  indexed as:

  Pos:  1     2       3
        wifi  router
        wi    fi      router

  which can mess up eg phrase queries which depend on token positions,
  and can also mess up snippet highlighting.

hth
clint
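As an illustration of the contract (expand: false) option, a minimal
analysis setup; this is only a sketch, not the contents of the gist, and
the index, filter and analyzer names are made up:

curl -XPUT 'localhost:9200/test' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "expand": false,
          "synonyms": ["wifi, wi fi, wireless"]
        }
      },
      "analyzer": {
        "syn_text": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonyms"]
        }
      }
    }
  }
}'

With expand: false, "wifi", "wi fi" and "wireless" are all indexed and
searched as the first entry, "wifi", so no term gets an artificial boost
from having more synonyms.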
org.elasticsearch.transport.TransportSerializationException: Failed to deserialize exception response from stream when one node is still starting - http://groups.google.com/group/elasticsearch/t/a5d2557800cb33a

Chris Male gento0nz@gmail.com Nov 17 01:35AM -0800
Hi,
Are you able to share your logs around the Exception? Just so we can
see what's going on leading up to it.

On Friday, November 16, 2012 11:45:03 PM UTC+13, Barbara Ferreira wrote:

[ANN] elasticsearch-equilibrium plugin version 0.19.4 - http://groups.google.com/group/elasticsearch/t/484bd10e8f3fab74
Otis Gospodnetic otis.gospodnetic@gmail.com Nov 16 06:39PM -0800
Hi Lee,

Thanks, this sounds nice.

Does this take into account shards and their replicas to ensure that no
more than 1 copy of a shard is placed on any 1 server?

Otis
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html

On Friday, November 16, 2012 12:15:17 PM UTC-5, Lee Hinman wrote:
Lee Hinman matthew.hinman@gmail.com Nov 16 10:10PM -0800
On Friday, November 16, 2012 9:39:04 PM UTC-5, Otis Gospodnetic wrote:

> Does this take into account shards and their replicas to ensure that no
> more than 1 copy of a shard is placed on any 1 server?

Hi Otis,

Yes, this plugin takes into account all the same Deciders that the
original shard allocator does, just with an additional check for
available disk space before giving the "thumbs up" for shard allocation
or relocation.

- Lee
[ANN] geocluster-facet 0.0.1 - http://groups.google.com/group/elasticsearch/t/6c3f95687578e6db
Eric Jain eric.jain@gmail.com Nov 16 06:20PM -0800
Here's a (somewhat simplistic) facet that clusters geo_points:
https://github.com/zenobase/geocluster-facet
You can see this plugin in action here:
https://zenobase.com/#/buckets/u07qih0a27/
Hoping to get some feedback, suggestions for improvements etc!
Medcl Zen medcl2000@gmail.com Nov 17 10:37AM +0800
nice plugin, thanks for sharing~
carrot2 error on elasticsearch version 0.19.11 - http://groups.google.com/group/elasticsearch/t/c0d6b6e932f92d52
Jalal Mohammed jalalm@algotree.com Nov 16 01:40PM +0530
Thanks Chris,

The error was with elasticsearch version 0.19.11; the carrot plugin
used to work well with 0.19.8. I will raise this issue with the Carrot
plugin project.

Medcl Zen medcl2000@gmail.com Nov 17 10:34AM +0800

hi, already fixed in 1.1.1,
have fun

how to update cluster setting? - http://groups.google.com/group/elasticsearch/t/4645c6f083f09ef9
Igor Motov imotov@gmail.com Nov 16 04:56PM -0800
Did you run it like this?
curl -XPUT localhost:9200/_cluster/settings -d '{
"persistent": {
"indices.store.throttle.type": "merge",
"indices.store.throttle.max_bytes_per_sec": "50mb"
}
}'

Which version of elasticsearch are you using?
On Friday, November 16, 2012 7:48:14 PM UTC-5, Jae wrote:
Jae metacret@gmail.com Nov 16 04:59PM -0800
Every setting I tried is throwing the same error message. I think that
I should see the full list of cluster-wide settings with 'curl -XGET
http://localhost:7104/_cluster/settings', but I am seeing empty
persistent and transient settings, like {"persistent":{},"transient":{}}.
What did I do wrong?
On Friday, November 16, 2012 4:48:14 PM UTC-8, Jae wrote:
Igor Motov imotov@gmail.com Nov 16 05:06PM -0800
When you run it with -XGET, you only get back the settings that you set
there using -XPUT.

On Friday, November 16, 2012 7:59:58 PM UTC-5, Jae wrote:
Jae metacret@gmail.com Nov 16 05:40PM -0800
So, how can I add an updatable setting using -XPUT?
On Friday, November 16, 2012 5:06:30 PM UTC-8, Igor Motov wrote:
Igor Motov imotov@gmail.com Nov 16 05:42PM -0800
Like this:
curl -XPUT localhost:7104/_cluster/settings -d '{
"persistent": {
"indices.store.throttle.type": "merge",
"indices.store.throttle.max_bytes_per_sec": "50mb"
}
}'

On Friday, November 16, 2012 8:06:30 PM UTC-5, Igor Motov wrote:
Jae metacret@gmail.com Nov 16 05:48PM -0800
what the heck... when I specify an option as a file name, such as

curl -XPUT localhost:7104/_cluster/settings -d @filename

and filename contains the following settings, it didn't work! What's
the difference? Anyway, thank you so much for your patience.
On Friday, November 16, 2012 5:42:20 PM UTC-8, Igor Motov wrote:
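In other words, a cluster setting only shows up in GET after it has been
set with PUT; a quick check might look like this (port and values here
are illustrative):

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": { "indices.store.throttle.type": "merge" }
}'

curl -XGET 'localhost:9200/_cluster/settings'
# expected: {"persistent":{"indices.store.throttle.type":"merge"},"transient":{}}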
[Autocomplete] Cleo or ElasticSearch with NGram - http://groups.google.com/group/elasticsearch/t/657c9bd4c63477c2
kidkid zkidkid@gmail.com Nov 16 12:11AM -0800
Hi All,

Currently, I am running search with ES. We use 3 servers, each with 24
cores and 30GB of RAM.

I want to build an index with NGram for autocomplete, but my friend
tells me to use Cleo. I have tried to google Cleo, but I can't find any
useful article about Cleo vs ES (or Lucene).

The problem is that we have done autocomplete with Lucene and found it
is not good enough.

Could someone help me?
Thanks in advance.
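As a starting point, a minimal edge-ngram autocomplete index along those
lines (the index, type, field and analyzer names are made up):

curl -XPUT 'localhost:9200/autocomplete_test' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_ngram": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "autocomplete_ngram"]
        }
      }
    }
  },
  "mappings": {
    "item": {
      "properties": {
        "name": {
          "type": "string",
          "index_analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }
}'

Indexing with the edge-ngram analyzer and searching with the standard
analyzer means a query for "nod" matches "nodejs" without ngramming the
query itself.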
Too many open files but nofile set to 256000 - http://groups.google.com/group/elasticsearch/t/ae3dd2647a813a81
Derry O' Sullivan derryos@gmail.com Nov 15 11:54PM -0800
For anyone with a problem like this, it may be worth confirming the
numbers within elasticsearch as well, using the nodes info API.

/_nodes?process

gives the max_file_descriptors in the output:

{
    "refresh_interval": 1000,
    "id": 13919,
    "max_file_descriptors": 25000
}

/_nodes/process/stats

gives:

open_file_descriptors: 516
On Thursday, 15 November 2012 11:12:43 UTC, mohsin husen wrote:
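For example, hitting the endpoints quoted above (assuming the default
HTTP port 9200):

curl 'localhost:9200/_nodes?process&pretty'
curl 'localhost:9200/_nodes/process/stats?pretty'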