Hi Ümit
The reason why we store them in HDF5 files is that these files are
self-contained, self-describing, portable and usually allow for
implicit sharding when fetching the data. What I mean by sharding is
that typically I am only interested in the top 6,000 entries of one
analysis at a time (for visualization).
To get the same kind of sharding, I would have to store each analysis
result in its own index or create a separate type, right?
No, I was thinking of storing each analysis as a single doc.
Typically the results with low scores are not of interest, and thus I
didn't want to pollute the index with those values (if I have
thousands of analyses with 6M entries each, the index might get huge).
You can limit it to including only the first 6,000 annotations if you
like.
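As a back-of-the-envelope check, assuming 1,000 analyses purely for illustration (the question only says "thousands"), indexing every entry versus only the top 6,000 per analysis gives:

```latex
\underbrace{10^{3}}_{\text{analyses}} \times \underbrace{6 \times 10^{6}}_{\text{entries each}} = 6 \times 10^{9}
\qquad \text{vs.} \qquad
10^{3} \times 6 \times 10^{3} = 6 \times 10^{6}
```

That is a thousandfold reduction in indexed entries, regardless of the actual number of analyses.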
However, I have a use case where I have an id and want to retrieve all
the analyses in which that id had a significant score. So I am actually
thinking of taking a combined approach:
Continue to store the 6M entries in HDF5 (basically as an archive) and
store the top 6,000 entries, together with some information about the
analysis, in ES.
Each annotation would be stored once, and then I could use a
parent/child mapping to connect them.
Not sure that you even need parent/child for this. I've put together an
example below:
Create your index with the types 'annotation' and 'analysis':
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
   "mappings" : {
      "analysis" : {
         "properties" : {
            "name" : {
               "type" : "string"
            },
            "annotations" : {
               "type" : "integer"
            }
         }
      },
      "annotation" : {
         "_id" : {
            "path" : "position"
         },
         "properties" : {
            "name" : {
               "type" : "string"
            },
            "position" : {
               "type" : "integer"
            },
            "score" : {
               "type" : "double"
            },
            "description" : {
               "type" : "string"
            }
         }
      }
   }
}
'
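The `_id.path` setting tells Elasticsearch (in the versions current at the time; it was removed in later releases) to take each annotation's document ID from its `position` field, so the analysis documents can refer to annotations by position alone. A rough in-memory model of that behaviour, where the `AnnotationStore` class is hypothetical and not an Elasticsearch API:

```python
class AnnotationStore:
    """Toy stand-in for the 'annotation' type: documents keyed by _id."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}

    def index(self, doc: dict) -> str:
        # Mimic "_id": {"path": "position"}: the ID is the position as a string,
        # so indexing the same position again replaces the earlier document.
        doc_id = str(doc["position"])
        self.docs[doc_id] = doc
        return doc_id


store = AnnotationStore()
print(store.index({"name": "foo_1", "position": 54023, "score": 10.2}))
# prints 54023
```

This matches the `_id` values ("54023", "230", ...) that show up in the search results.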
Add some annotation data:
curl -XPOST 'http://127.0.0.1:9200/test/annotation?pretty=1' -d '
{
   "name" : "foo_1",
   "position" : 54023,
   "score" : 10.2
}
'
curl -XPOST 'http://127.0.0.1:9200/test/annotation?pretty=1' -d '
{
   "name" : "foo_2",
   "position" : 123410,
   "score" : 9.5
}
'
curl -XPOST 'http://127.0.0.1:9200/test/annotation?pretty=1' -d '
{
   "name" : "foo_3",
   "position" : 230,
   "score" : 7.4
}
'
curl -XPOST 'http://127.0.0.1:9200/test/annotation?pretty=1' -d '
{
   "name" : "foo_4",
   "position" : 12304,
   "score" : 4.4
}
'
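With 6,000 annotations per analysis, one HTTP request per document gets slow. The `_bulk` API instead accepts newline-delimited pairs of an action line and a source line. A sketch that builds such a body, assuming the same `test` index and `annotation` type as above (`bulk_body` is a hypothetical helper). The result would be sent as the body of a POST to `http://127.0.0.1:9200/_bulk`; with curl, use `--data-binary` so the newlines are preserved.

```python
import json


def bulk_body(index: str, doc_type: str, docs: list[dict]) -> str:
    """Build a newline-delimited _bulk body with one 'index' action per doc."""
    lines = []
    for doc in docs:
        # Action line: target index/type, with _id set to the position
        # explicitly (the same ID the _id.path mapping would derive).
        action = {"index": {"_index": index, "_type": doc_type,
                            "_id": str(doc["position"])}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


annotations = [
    {"name": "foo_1", "position": 54023, "score": 10.2},
    {"name": "foo_2", "position": 123410, "score": 9.5},
]
print(bulk_body("test", "annotation", annotations), end="")
```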
And some analysis data:
curl -XPUT 'http://127.0.0.1:9200/test/analysis/1?pretty=1' -d '
{
   "name" : "First analysis",
   "annotations" : [
      54023,
      123410,
      230,
      12304
   ]
}
'
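In the combined approach the full result stays in HDF5, and the analysis document only lists the positions of its best-scoring entries. A minimal Python sketch of that step, assuming the entries have already been read from the HDF5 file into dicts with `position` and `score` keys (`analysis_doc` is a hypothetical helper; the default of 6,000 is the cutoff discussed in this thread):

```python
import heapq
from typing import Iterable


def analysis_doc(name: str, entries: Iterable[dict], n: int = 6000) -> dict:
    """Build an 'analysis' document that references its top-n annotations.

    heapq.nlargest keeps only n items at a time, so the full (~6M-entry)
    result never has to be sorted as a whole. Output is best score first.
    """
    top = heapq.nlargest(n, entries, key=lambda e: e["score"])
    return {"name": name, "annotations": [e["position"] for e in top]}


entries = [
    {"name": "foo_3", "position": 230, "score": 7.4},
    {"name": "foo_1", "position": 54023, "score": 10.2},
    {"name": "foo_4", "position": 12304, "score": 4.4},
    {"name": "foo_2", "position": 123410, "score": 9.5},
]
print(analysis_doc("First analysis", entries))
# prints {'name': 'First analysis', 'annotations': [54023, 123410, 230, 12304]}
```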
Return all annotations for analysis '1' in order of score:
curl -XGET 'http://127.0.0.1:9200/test/annotation/_search?pretty=1' -d '
{
   "sort" : {
      "score" : "desc"
   },
   "query" : {
      "constant_score" : {
         "filter" : {
            "terms" : {
               "position" : {
                  "index" : "test",
                  "path" : "annotations",
                  "id" : 1,
                  "type" : "analysis"
               }
            }
         }
      }
   }
}
'
{
   "hits" : {
      "hits" : [
         {
            "_source" : {
               "position" : 54023,
               "name" : "foo_1",
               "score" : 10.2
            },
            "sort" : [
               10.2
            ],
            "_score" : null,
            "_index" : "test",
            "_id" : "54023",
            "_type" : "annotation"
         },
         {
            "_source" : {
               "position" : 123410,
               "name" : "foo_2",
               "score" : 9.5
            },
            "sort" : [
               9.5
            ],
            "_score" : null,
            "_index" : "test",
            "_id" : "123410",
            "_type" : "annotation"
         },
         {
            "_source" : {
               "position" : 230,
               "name" : "foo_3",
               "score" : 7.4
            },
            "sort" : [
               7.4
            ],
            "_score" : null,
            "_index" : "test",
            "_id" : "230",
            "_type" : "annotation"
         },
         {
            "_source" : {
               "position" : 12304,
               "name" : "foo_4",
               "score" : 4.4
            },
            "sort" : [
               4.4
            ],
            "_score" : null,
            "_index" : "test",
            "_id" : "12304",
            "_type" : "annotation"
         }
      ],
      "max_score" : null,
      "total" : 4
   },
   "timed_out" : false,
   "_shards" : {
      "failed" : 0,
      "successful" : 5,
      "total" : 5
   },
   "took" : 12
}
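The terms lookup filter fetches analysis `1`, reads its `annotations` field, and keeps the annotations whose `position` is in that list; the `sort` then orders them by `score`. The in-memory Python model below (hypothetical names, not Elasticsearch code) spells out that logic. It also mirrors the fact that a search returns 10 hits by default, so retrieving all 6,000 annotations of an analysis would need a larger `size` in the request body.

```python
def annotations_for_analysis(analyses, annotations, analysis_id, size=10):
    """Model the terms-lookup query: join on position, sort by score desc."""
    # Step 1: fetch the analysis document and read its "annotations" path.
    wanted = set(analyses[analysis_id]["annotations"])
    # Step 2: the terms filter keeps annotations whose position is in that set.
    hits = [a for a in annotations if a["position"] in wanted]
    # Step 3: "sort": {"score": "desc"}, then return at most `size` hits.
    hits.sort(key=lambda a: a["score"], reverse=True)
    return hits[:size]


analyses = {"1": {"name": "First analysis",
                  "annotations": [54023, 123410, 230, 12304]}}
annotations = [
    {"name": "foo_3", "position": 230, "score": 7.4},
    {"name": "foo_1", "position": 54023, "score": 10.2},
    {"name": "foo_4", "position": 12304, "score": 4.4},
    {"name": "foo_2", "position": 123410, "score": 9.5},
]
print([a["name"] for a in annotations_for_analysis(analyses, annotations, "1")])
# prints ['foo_1', 'foo_2', 'foo_3', 'foo_4']
```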
Find analyses which contain particular annotations:
curl -XGET 'http://127.0.0.1:9200/test/analysis/_search?pretty=1' -d '
{
   "query" : {
      "constant_score" : {
         "filter" : {
            "terms" : {
               "annotations" : [
                  12304,
                  230
               ]
            }
         }
      }
   }
}
'
{
   "hits" : {
      "hits" : [
         {
            "_source" : {
               "name" : "First analysis",
               "annotations" : [
                  54023,
                  123410,
                  230,
                  12304
               ]
            },
            "_score" : 1,
            "_index" : "test",
            "_id" : "1",
            "_type" : "analysis"
         }
      ],
      "max_score" : 1,
      "total" : 1
   },
   "timed_out" : false,
   "_shards" : {
      "failed" : 0,
      "successful" : 5,
      "total" : 5
   },
   "took" : 4
}
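This query covers the use case of starting from an id and finding every analysis in which it scored well. A `terms` filter on a multi-valued field matches a document if any one of its values is in the given list. Modelled in Python below; `analyses_containing` is a hypothetical function, and the second analysis is made-up data added for contrast.

```python
def analyses_containing(analyses, positions):
    """Model the terms filter on the array field "annotations".

    An analysis matches if ANY of its annotation positions is in `positions`.
    """
    wanted = set(positions)
    return [aid for aid, doc in analyses.items()
            if wanted.intersection(doc["annotations"])]


analyses = {
    "1": {"name": "First analysis", "annotations": [54023, 123410, 230, 12304]},
    "2": {"name": "Second analysis", "annotations": [99, 100]},  # hypothetical
}
print(analyses_containing(analyses, [12304, 230]))
# prints ['1']
```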
Clint
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.