Search by field type, etc.


(flowdave) #1

Hello List!

I am very interested in using Elastic Search! I've read the mailing list archives as well as the documentation, and I have to say that I am impressed by the features it offers.
However, I have some questions that I have not been able to answer myself, though I've tried.

Let me first say that I would prefer to use the Java client rather than the REST/JSON API.

With that in mind, here is a description of my situation:

I have objects that I would like to index. These objects are all of the same class, but can have any number of dynamically generated fields. These fields can only be of certain fixed types.

Here are two examples:

Object 1:

"tweet1":{
"type":"tweet",
"guid":{"type":"GUID", "value":"1234"},
"author":{"type":"email_string","value":"me@example.com"},
"description":{"type":"string","value":"a description of the tweet"},
"text":{"type":"string","value":"the actual tweet, 140 characters"},
"length":{"type":"int","value":"28"},
"user_tagged_relev_score":{"type":"double","value":"3.4"}
}

Object 2:

"tweet2":{
"type":"tweet",
"guid":{"type":"string", "value":"5678"},
"author":{"type":"string","value":"Jeff"},
"description":{"type":"string","value":"a description of the tweet"},
"text":{"type":"string","value":"the actual tweet, 140 characters"},
"location":{"type":"location","value":{"lat":"3.4","long":"2.3"}}
}

You can see from these examples that Object 1 and Object 2 are both of type "tweet", but they have different fields, and fields with the same name can have different types (for example, the "guid" field has type "GUID" in Object 1 and "string" in Object 2). Additionally, some fields can be "complex", as in the case of the "location" field, which is a lat/long pair.

I have been able to index these as follows:

$ curl -XPOST localhost:9200/twitter/tweet/1 -d '{ "tweet":{
"type" : "tweet",
"guid": {"type":"GUID", "value":"1234"},
"author": {"type":"email_string", "value":"me@example.com"},
"description":{"type":"string", "value":"this is a tweet"},
"text": {"type":"string", "value":"the actual tweet 140 characters"},
"length": {"type":"int", "value":"28"},
"score": {"type":"double", "value":"3.4"}
}}'

$ curl -XPOST localhost:9200/twitter/tweet/2 -d '{ "tweet":{
"type": "tweet",
"guid": {"type":"string", "value":"5678"},
"author": {"type":"EMAIL", "value":"you@example.com"},
"description":{"type":"string", "value":"this is a different tweet"},
"text": {"type":"string", "value":"some test tweet"},
"score": {"type":"int", "value":"1"},
"location": {"type":"location", "value":{"lat":"2.4","long":"4.0"}}
}}'

$ curl localhost:9200/twitter/tweet/1
{"_index":"twitter","_type":"tweet","_id":"1", "_source" : { "tweet":{
"type" : "tweet",
"guid": {"type":"GUID", "value":"1234"},
"author": {"type":"email_string", "value":"me@example.com"},
"description":{"type":"string", "value":"this is a tweet"},
"text": {"type":"string", "value":"the actual tweet 140 characters"},
"length": {"type":"int", "value":"28"},
"score": {"type":"double", "value":"3.4"}
}}}

$ curl -XGET localhost:9200/twitter/tweet/2
{"_index":"twitter","_type":"tweet","_id":"2", "_source" : { "tweet":{
"type": "tweet",
"guid": {"type":"string", "value":"5678"},
"author": {"type":"EMAIL", "value":"you@example.com"},
"description":{"type":"string", "value":"this is a different tweet"},
"text": {"type":"string", "value":"some test tweet"},
"score": {"type":"int", "value":"1"},
"location": {"type":"location", "value":{"lat":"2.4","long":"4.0"}}
}}}
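Because every "value" above is sent as a JSON string, dynamic mapping will index numbers like the score as text unless told otherwise, and the "type" markers will be analyzed rather than matched exactly. A minimal sketch of an explicit index-creation body that addresses both is below. It assumes Elasticsearch 7 or later (mappings are no longer keyed by type name), and the template name `type_markers_as_keywords` and the helper `build_twitter_mapping` are hypothetical, chosen only for illustration.

```python
import json


def build_twitter_mapping():
    """Hypothetical body for creating the "twitter" index (Elasticsearch 7+ assumed)."""
    return {
        "mappings": {
            "dynamic_templates": [
                {
                    # Map every "tweet.<field>.type" marker as an exact-match
                    # keyword so a term query for "GUID" or "EMAIL" works.
                    "type_markers_as_keywords": {
                        "path_match": "tweet.*.type",
                        "mapping": {"type": "keyword"},
                    }
                }
            ],
            "properties": {
                "tweet": {
                    "properties": {
                        # Index score values numerically. String inputs such as
                        # "3.4" or "1" are coerced to doubles by default.
                        "score": {
                            "properties": {"value": {"type": "double"}}
                        }
                    }
                }
            },
        }
    }


if __name__ == "__main__":
    # Print the JSON body, e.g. to send with: curl -XPUT localhost:9200/twitter
    print(json.dumps(build_twitter_mapping(), indent=2))
```

The mapping has to be in place before the first document is indexed, since an existing field's type cannot be changed afterwards without reindexing.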

My question is about searching for these items.
For example, I want to be able to search for:
- "tweets that have a 'location' field" - this should return 2 and not 1
- "tweets that have a 'guid' field of type 'GUID'" - which should return 1 and not 2
- "any field of type 'EMAIL' whose value is 'me@example.com'" - which should return 1 and not 2
- "a tweet with a 'score' field of any type, so long as its value is below 4" - which should return both 1 and 2
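The four searches can be sketched as Query DSL bodies for the `_search` endpoint (for example `localhost:9200/twitter/_search`). This is a sketch under stated assumptions, not a confirmed answer: it assumes Elasticsearch 7 or later, that every `tweet.<field>.type` is mapped as `keyword`, that the `value` sub-fields being compared are exact-matchable `keyword` fields, and that `tweet.score.value` is mapped as a numeric type. The helper names and the `candidate_fields` list are hypothetical.

```python
import json


def has_field(name):
    # Every typed field carries a "type" leaf, so checking that leaf
    # tells us whether tweet.<name> is present at all.
    return {"query": {"exists": {"field": f"tweet.{name}.type"}}}


def field_of_type(name, type_name):
    # Exact match on the type marker of one named field (assumed keyword).
    return {"query": {"term": {f"tweet.{name}.type": type_name}}}


def any_field_with(type_name, value, candidate_fields):
    # Without a nested mapping, a type and a value can only be tied together
    # by naming the field, so OR together one (type AND value) clause per
    # candidate field name.
    clauses = [
        {
            "bool": {
                "filter": [
                    {"term": {f"tweet.{f}.type": type_name}},
                    {"term": {f"tweet.{f}.value": value}},
                ]
            }
        }
        for f in candidate_fields
    ]
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


def score_below(limit):
    # The declared type ("int", "double", ...) is ignored here; this only
    # works if score.value is indexed numerically for every document.
    return {"query": {"range": {"tweet.score.value": {"lt": limit}}}}


if __name__ == "__main__":
    for body in (
        has_field("location"),
        field_of_type("guid", "GUID"),
        any_field_with("EMAIL", "me@example.com", ["author", "guid"]),
        score_below(4),
    ):
        print(json.dumps(body))
```

The third search is the awkward one: "any field" has to be spelled out as a list of field names. Storing the typed fields as an array of `nested` objects with `name`, `type` and `value` keys would avoid that, at the cost of reshaping the documents.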

I have not been able to get these searches to work correctly, and I have not found, on this list or in the documentation, any obvious way to deal with the notion of typed fields.
Perhaps I am looking in the wrong place, but after a week of searching, I have decided to ask the list for help.

I hope I have included enough information to explain my problem here.

Thank you very much.
-dave

