Get & Query


(Rich Kroll) #1

I have a use case where I have just indexed a new document and need to run a
query against it to determine if it is a hit. To further explain the use
case, I have users who specify a query to run against incoming documents,
and if there is a hit, they would like to be notified by email.

I looked at the docs but could not find a way to specify an ID and a
queryString when searching for a document. Can anyone recommend a way to
approach this problem?

Thanks!

--
“We can't solve problems by using the same kind of thinking we used when we
created them.” ~ Albert Einstein


(Lukáš Vlček) #2

Hi,

I think you want to look at the index API docs:
http://www.elasticsearch.com/docs/elasticsearch/rest_api/index/

As you can see, you can specify the document id yourself (PUT) or have
ElasticSearch create the document id for you (POST). Either way, you get the
"_id" of the newly indexed document in the response.
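A minimal shell sketch of the index call described above, written against the same REST API the thread uses: a POST to the type endpoint lets ElasticSearch generate the ID, and the "_id" is then pulled out of the JSON response. The host, the function names "index_doc" and "extract_id", and the example document are illustrative assumptions; the sed expression is a quick text match, not a JSON parser.

```shell
#!/usr/bin/env bash
# Sketch: index a document and capture the ID ElasticSearch generated for it.
# ES_URL, the function names and the example document are illustrative.

ES_URL="${ES_URL:-http://localhost:9200}"

# Print the value of the "_id" field found in a JSON response on stdin.
# A plain sed match rather than a JSON parser; enough for the flat
# response returned by the index call.
extract_id() {
  sed -n 's/.*"_id" *: *"\([^"]*\)".*/\1/p'
}

# POST to /index/type/ lets ElasticSearch generate the ID; a PUT to
# /index/type/<your_id> would use an ID you supply instead.
index_doc() {
  local index="$1" type="$2" body="$3"
  curl -s -XPOST "$ES_URL/$index/$type/" -d "$body" | extract_id
}
```

For example, doc_id=$(index_doc my_index my_type '{"title" : "hello world"}') leaves the generated ID in $doc_id, ready to be used in a search.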

As for searching, check the query_string query:
http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/query_string_query/

Regards,
Lukas

On Sat, Dec 4, 2010 at 6:25 PM, Rich Kroll kroll.rich@gmail.com wrote: (see #1)


(Rich Kroll) #3

Lukas,
Thanks for taking the time to respond. I have not been using the index API to
specify IDs, as I do not want to write an ID generation algorithm when the IDs
provided by ES already work perfectly for me.

I believe I have found a way to get what I need with the following:

curl -XGET 'http://localhost:9200/my_index/my_type/_search?pretty=true' -d '
{
    "query" : {
        "filtered" : {
            "query" : {
                "query_string" : {
                    "query" : "my search query"
                }
            },
            "filter" : {
                "term" : { "_id" : "Yg3FWjfHRm62uLdqkl832g" }
            }
        }
    }
}'

My question now is: will I run into any problems with the document's
visibility in the index using this strategy? I am hoping that, since I was
returned the ID of the indexed document, I can search for it immediately.

Regards,

Rich


(Shay Banon) #4

Yes, you will: the indexed doc will not be visible to search until a scheduled refresh kicks in or you call the refresh API. Regarding the filtered query, that's the way to go, with one enhancement: use "my_type._id" for the field name, not just _id.
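A hedged sketch combining both points above: call the refresh API so the new document becomes searchable, then run the filtered query with the term filter on "my_type._id". The function names and the localhost URL are assumptions, and the JSON follows the filtered-query syntax used in this thread (later Elasticsearch versions removed the "filtered" query).

```shell
#!/usr/bin/env bash
# Sketch: refresh so a just-indexed document becomes searchable, then test
# one user query against that document only. ES_URL and the function names
# are illustrative assumptions.

ES_URL="${ES_URL:-http://localhost:9200}"

# Filtered query: the user's query_string, restricted by a term filter on
# the type-prefixed "<type>._id" field. The query text is inserted verbatim,
# so escape any double quotes in it beforehand.
filtered_query_body() {
  local type="$1" query="$2" id="$3"
  printf '{"query":{"filtered":{"query":{"query_string":{"query":"%s"}},"filter":{"term":{"%s._id":"%s"}}}}}' \
    "$query" "$type" "$id"
}

# Refresh the whole index (the costly step under heavy writes), then search.
refresh_and_match() {
  local index="$1" type="$2" query="$3" id="$4"
  curl -s -XPOST "$ES_URL/$index/_refresh" > /dev/null
  curl -s -XGET "$ES_URL/$index/$type/_search?pretty=true" \
    -d "$(filtered_query_body "$type" "$query" "$id")"
}
```

For example, refresh_and_match my_index my_type 'my search query' Yg3FWjfHRm62uLdqkl832g; a non-zero "total" under "hits" in the response means the user's query matches that document. Note that the refresh here runs once per document, which is exactly the cost raised later in the thread.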

-shay.banon
On Saturday, December 4, 2010 at 9:53 PM, Rich Kroll wrote: (see #3)


(Lukáš Vlček) #5

Hi,

As Shay has already responded, I am just adding links to the relevant docs:
http://www.elasticsearch.com/docs/elasticsearch/rest_api/admin/indices/refresh/
http://www.elasticsearch.com/docs/elasticsearch/rest_api/index/#Refresh
refresh_interval:
http://www.elasticsearch.com/docs/elasticsearch/index_modules/

Regards,
Lukas

On Sat, Dec 4, 2010 at 8:53 PM, Rich Kroll kroll.rich@gmail.com wrote: (see #3)


(Rich Kroll) #6

Shay,
Thanks for the tip on the type._id! I thought the docs stated that calling
refresh manually carries a performance hit, especially under large write
volumes. I'm concerned about that since I plan to put a very high write load
on ES. Since I already have the document in hand (so to speak), is there any
kind of Java API I could use to check whether it would be a hit once it is
indexed? If that is not an option, I thought of creating short-lived indices
(say, 5 minutes each), inserting the incoming documents into the current one,
refreshing it, and then searching it. My reasoning is that such a temporary
index would hold a much smaller volume of documents and would therefore be
much cheaper to refresh. Is that reasoning sound? I would love to hear any
ideas you have on how to solve this challenge in ES.

Regards,
Rich

On Sat, Dec 4, 2010 at 3:16 PM, Shay Banon shay.banon@elasticsearch.com wrote: (see #4)


(Shay Banon) #7

The second-index option might add overhead. If you index each doc with a timestamp, you can have a process that checks all the registered user queries against the docs from the last X minutes (let's say 5 minutes). Before checking the user queries, it can issue a refresh request to make sure everything is up to date. Then, once the user queries have been checked against that time batch, the process can sleep and then check the next batch.
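One way to sketch the batch process described above, again in shell with curl. Assumptions (all hypothetical): each document carries a "timestamp" field in epoch milliseconds, the user queries sit one per line in the file named by QUERIES_FILE, and the range filter accepts "gte"/"lt" bounds (check that syntax against your version). Consecutive windows are half-open, so no timestamp falls into two batches.

```shell
#!/usr/bin/env bash
# Sketch of a batch matching loop: one refresh per cycle, then every
# registered user query is run against the docs timestamped in that window.
# Hypothetical assumptions: a "timestamp" field in epoch milliseconds and a
# queries file with one query_string query per line.

ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="my_index"
TYPE="my_type"
QUERIES_FILE="${QUERIES_FILE:-queries.txt}"
WINDOW_SECS=300   # 5-minute batches

# User query plus a range filter on the half-open window [from_ms, to_ms).
# The query text is inserted verbatim: escape double quotes beforehand.
batch_query_body() {
  local query="$1" from_ms="$2" to_ms="$3"
  printf '{"query":{"filtered":{"query":{"query_string":{"query":"%s"}},"filter":{"range":{"timestamp":{"gte":%s,"lt":%s}}}}}}' \
    "$query" "$from_ms" "$to_ms"
}

run_batches() {
  local from_ms to_ms q
  from_ms=$(( ($(date +%s) - WINDOW_SECS) * 1000 ))
  while true; do
    to_ms=$(( $(date +%s) * 1000 ))
    # Refresh first so everything indexed so far is visible to search.
    curl -s -XPOST "$ES_URL/$INDEX/_refresh" > /dev/null
    while IFS= read -r q; do
      echo "== hits for: $q"
      curl -s -XGET "$ES_URL/$INDEX/$TYPE/_search" \
        -d "$(batch_query_body "$q" "$from_ms" "$to_ms")"
      echo
    done < "$QUERIES_FILE"
    from_ms=$to_ms   # the next batch starts where this one ended
    sleep "$WINDOW_SECS"
  done
}
```

Calling run_batches loops forever: one refresh per cycle regardless of write volume, then one search per registered query, then a sleep of WINDOW_SECS. Turning the hits into email notifications is left out.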
On Sunday, December 5, 2010 at 12:21 AM, Rich Kroll wrote: (see #6)

