Search by field type etc


(saintdave) #1
  • deleted -

(Karussell) #2

it would help to see your queries. I'm not sure if it's a good idea to
have different types for the same field name.

what are your requirements? why not include a type field
(type:tweetWithGUID) or put them into different indices?
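
One possible reading of the "include a type field" suggestion, as a rough sketch: before indexing, add a single top-level field that lists every declared field type, so "has a field of type X" becomes a lookup on one known field. The helper name `add_type_tags` and the field name `field_types` are made up for illustration and are not part of Elasticsearch.

```python
def add_type_tags(doc):
    """Return a copy of doc with a 'field_types' list naming every declared type.

    Assumes typed fields look like {"type": ..., "value": ...}, as in this thread.
    'field_types' is a hypothetical field name chosen for this sketch.
    """
    tagged = dict(doc)
    tagged["field_types"] = sorted({
        field["type"]
        for field in doc.values()
        if isinstance(field, dict) and "type" in field
    })
    return tagged


tweet1 = {
    "type": "tweet",
    "guid": {"type": "GUID", "value": "1234"},
    "text": {"type": "string", "value": "the actual tweet, 140 characters"},
}

tagged = add_type_tags(tweet1)
print(tagged["field_types"])  # ['GUID', 'string']
```

The original document is left untouched; only the copy carries the extra field.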

for Java API you can have a look into:

or creating the query via:
https://github.com/karussell/Jetwick/blob/withes/src/main/java/de/jetwick/es/Solr2Elastic.java

On 17 Jan., 22:11, saintdave saint.d...@gmail.com wrote:

Hello List!

I am very interested in using Elastic Search! I've read the archived
mailing list, as well as the documentation, and I have to say that I
am impressed by the features it offers.
Having said that, I've some questions that I have not been able to
answer for myself, though I've tried.

Let me first say that I would prefer to use the java client, rather
than the REST/JSON API.

Having said that, here is first a description of my situation:

I have objects that I would like to index. These objects are all of
the same class, but can have any number of dynamically generated
fields. These fields can only be of certain fixed types.

Here is an example....

Object 1:

"tweet1":{
"type":"tweet",
"guid":{"type":"GUID", "value":"1234"},
"author":{"type":"email_string","value":"m...@example.com"},
"description":{"type":"string","value":"a description of the tweet"},
"text":{"type":"string","value":"the actual tweet, 140 characters"},
"length":{"type":"int","value":"28"},
"user_tagged_relev_score":{"type":"double","value":"3.4"}

}

Object 2:

"tweet2":{
"type":"tweet",
"guid":{"type":"string", "value":"5678"},
"author":{"type":"string","value":"Jeff"},
"description":{"type":"string","value":"a description of the tweet"},
"text":{"type":"string","value":"the actual tweet, 140 characters"},
"location":{"type":"location","value":["3.4","2.3"]}

}

You can see from these examples that Obj 1 and Obj 2 are both of type
"tweet", but they have different fields, and they can have different
types for fields of the same name (as in the "guid" field which here
has types "string" and "GUID"). Additionally, some fields can be
"complex", as in the case of the "location" field, which is a lat/long
tuple.

I have been able to index these. As per:

$ curl -XPOST localhost:9200/twitter/tweet/1 -d '{ "tweet":{
"type" : "tweet",
"guid": {"type":"GUID", "value":"1234"},
"author": {"type":"email_string", "value":"m...@example.com"},
"description":{"type":"string", "value":"this is a tweet"},
"text": {"type":"string", "value":"the actual tweet 140 characters"},
"length": {"type":"int", "value":"28"},
"score": {"type":"double", "value":"34"}
}}'

$ curl -XPOST localhost:9200/twitter/tweet/2 -d '{ "tweet":{
"type": "tweet",
"guid": {"type":"string", "value":"5678"},
"author": {"type":"EMAIL", "value":"y...@example.com"},
"description":{"type":"string", "value":"this is a different tweet"},
"text": {"type":"string", "value":"some test tweet"},
"score": {"type":"int", "value":"1"},
"location": {"type":"location", "value":
{"lat":"2.4","long":"4.0"}}

}}'

$ curl localhost:9200/twitter/tweet/1
{"_index":"twitter","_type":"tweet","_id":"1", "_source" : { "tweet":{
"type" : "tweet",
"guid": {"type":"GUID", "value":"1234"},
"author": {"type":"email_string", "value":"m...@example.com"},
"description":{"type":"string", "value":"this is a tweet"},
"text": {"type":"string", "value":"the actual tweet 140 characters"},
"length": {"type":"int", "value":"28"},
"score": {"type":"double", "value":"34"}
}}}

$ curl -XGET localhost:9200/twitter/tweet/2
{"_index":"twitter","_type":"tweet","_id":"2", "_source" : { "tweet":{
"type": "tweet",
"guid": {"type":"string", "value":"5678"},
"author": {"type":"EMAIL", "value":"y...@example.com"},
"description":{"type":"string", "value":"this is a different tweet"},
"text": {"type":"string", "value":"some test tweet"},
"score": {"type":"int", "value":"1"},
"location": {"type":"location", "value":{"lat":"2.4","long":"4.0"}}
}}}

My question now is about performing searches for these items.
In some cases, I want to be able to specify that I want to search for:
"tweets that have a location field" - this should return 2 and not 1
"tweets that have a 'guid' field of type 'GUID'" - which should return 1 and not 2
"any field that has type='EMAIL' which is of value='...@example.com'" - which should return 1 and not 2
"a tweet with a 'score' field of any type, so long as it is below 4" - which should return both 1 and 2
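
As a rough sketch, the four searches above could be written as search request bodies like the ones built below. The clause names (`exists`, `match`, `match_phrase`, `range`, `bool`) come from the current Elasticsearch query DSL; the release discussed in this thread may spell them differently, so treat this as an illustration rather than something verified against that version. The helper names, the `candidate_names` list, and the address `user@example.com` are placeholders invented for the example.

```python
import json

ROOT = "tweet"  # the indexed documents wrap every field in a top-level "tweet" object


def has_field(name):
    """Documents where the named field is present (checked via its 'type' leaf)."""
    return {"query": {"exists": {"field": f"{ROOT}.{name}.type"}}}


def field_has_type(name, type_name):
    """Documents whose named field declares the given type.

    'match' analyzes the query text, so it still works if the type string
    was lowercased at index time.
    """
    return {"query": {"match": {f"{ROOT}.{name}.type": type_name}}}


def any_field_typed_value(candidate_names, type_name, value):
    """Documents where one of the candidate fields has both the type and the value.

    Type and value are paired per field name so that a type on one field
    cannot combine with a value on another. The caller must supply the
    field names to try (an assumption of this sketch).
    """
    clauses = [
        {"bool": {"must": [
            {"match": {f"{ROOT}.{n}.type": type_name}},
            {"match_phrase": {f"{ROOT}.{n}.value": value}},
        ]}}
        for n in candidate_names
    ]
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


def field_below(name, limit):
    """Documents whose named field value is below the limit.

    The values in this thread are indexed as quoted strings; a numeric
    comparison assumes the field is mapped as a number instead.
    """
    return {"query": {"range": {f"{ROOT}.{name}.value": {"lt": limit}}}}


for body in (
    has_field("location"),
    field_has_type("guid", "GUID"),
    any_field_typed_value(["author", "guid"], "EMAIL", "user@example.com"),
    field_below("score", 4),
):
    print(json.dumps(body))
```

Each printed body could be passed to a search request with `curl -d`, or rebuilt with the equivalent builders in the Java client.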

I have not been able to get these searches to work correctly. I have
not found on this list, or in the documentation, any obvious way to
deal with the notion of typed fields.
Perhaps I am looking in the wrong place, but after looking for a week
now, I'm willing to ask the list for help.

I hope I have included enough information to explain my problem here.

Thank you very much.


(flowdave) #3

Thanks for the prompt reply!

I agree that it would be best not to have typed fields with shared names. However, user input being what it is, we must support the case where a user does not specify the input type correctly and simply falls back to the default "string" type. Hence, though I agree that it isn't ideal, I find myself having to support fields with identical names and different types.

Right now, my problem is that I do not have any queries that provide a way to perform the searches I want as I detailed in my original post.
To reiterate though, roughly speaking I would like to be able to search for fields with and without a specified type, as well as to query for only the existence of a field with a specified type.

So, if I had a hypothetical 'tweet' object with a field named "today" with a type of "Date_Type" and a value of "Dec 12, 2001, 12:45:09" I would want to be able to query for:
"objects with a field called 'today'"
"objects with a field of type 'Date_Type'"
"objects with a field called 'today' and a value of 'Dec 12, 2001, 12:45:09'"
"objects with a field of type 'Date_Type' and a value of 'Dec 12, 2001, 12:45:09'"
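
To pin down what each of these four lookups should return, here is a small in-memory sketch. It is not an Elasticsearch query: it flattens a document into name/type/value entries and checks them directly. The helpers `to_entries` and `matches` are hypothetical names. If documents were indexed in this entry form, matching type and value within the same entry would presumably need per-entry matching (a nested mapping in current Elasticsearch, which is an assumption here).

```python
def to_entries(doc):
    """Flatten {"name": {"type": ..., "value": ...}} fields into a list of entries."""
    return [
        {"name": name, "type": field["type"], "value": field["value"]}
        for name, field in doc.items()
        if isinstance(field, dict) and "type" in field and "value" in field
    ]


def matches(entries, name=None, type_name=None, value=None):
    """True if a single entry satisfies every criterion that was given."""
    return any(
        (name is None or e["name"] == name)
        and (type_name is None or e["type"] == type_name)
        and (value is None or e["value"] == value)
        for e in entries
    )


tweet = {
    "type": "tweet",
    "today": {"type": "Date_Type", "value": "Dec 12, 2001, 12:45:09"},
}
entries = to_entries(tweet)

print(matches(entries, name="today"))                                     # True
print(matches(entries, type_name="Date_Type"))                            # True
print(matches(entries, name="today", value="Dec 12, 2001, 12:45:09"))     # True
print(matches(entries, type_name="Date_Type", value="Dec 12, 2001, 12:45:09"))  # True
print(matches(entries, type_name="GUID"))                                 # False
```

Because every criterion is checked against the same entry, a type on one field can never be combined with a value from another.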

I hope this makes sense.

(Currently, I'm working in the java api, so I don't have example queries in curl/json format. If it would help I can supply my java code.)

Regards,
-dave

