Bool filter is not searching field content


(Bruno Galindro da Costa) #1

This query returns 7 items:

curl -XGET "http://localhost:9200/modmine/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "type": {
                        "value": "sequencefeature"
                     }
                  }
               ],
               "must": [
                  {
                     "term": {
                        "sequenceid": "179781000"
                     }
                  }
               ]
            }
         }
      }
   }
}'
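
One detail in the request above is worth noting: the bool object contains the key "must" twice. JSON does not define what a parser should do with a repeated key, and many parsers silently keep only the last one. The Python sketch below (standard library only) shows that behaviour and builds an equivalent body with both clauses in a single "must" array. How Elasticsearch's own parser treats the repeated key is not established in this thread, so this is a precaution rather than a diagnosis. The helper name build_filter_body is made up for illustration.

```python
import json

# The "bool" object as written in the request above, with "must" repeated.
raw_bool = """
{
  "must": [{"type": {"value": "sequencefeature"}}],
  "must": [{"term": {"sequenceid": "179781000"}}]
}
"""

# Python's json module keeps only the last value of a repeated key.
parsed = json.loads(raw_bool)
print(len(parsed["must"]))  # only the term clause is left


def build_filter_body(doc_type, field, value):
    """Build a filtered query with both conditions in a single must array.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "bool": {
                        "must": [
                            {"type": {"value": doc_type}},
                            {"term": {field: value}},
                        ]
                    }
                },
            }
        }
    }


body = build_filter_body("sequencefeature", "sequenceid", 179781000)
# The printed JSON can be passed to curl with -d.
print(json.dumps(body, indent=2))
```

With a single "must" array the request no longer depends on how a given parser resolves duplicate keys.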

Part of the result:

"hits": {
   "total": 7,
   "max_score": 1,
   "hits": [
      {
         "_index": "modmine",
         "_type": "sequencefeature",
         "_id": "111",
         "_score": 1,
         "_source": {
            "sequenceid": 179781000,
            "name": null,
            "symbol": null,
            "scoretype": "RNAseq___L4_soma_JK1107_no_DNaseI_genelets_revised_100320:L4_soma_JK1107_no_DNaseI",
            "note": null,
            "organismid": 11000000,
            "length": 2,
            "score": null,
            "predictionstatus": null,
            "chromosomelocationid": 102573059,
            "chromosomeid": 11000003,
            "secondaryidentifier": "RNAseq___L4_soma_JK1107_no_DNaseI_genelets_revised_100320.exon_I_10074047_10074048_-wb170",
            "cytolocation": null,
            "scoreprotocolid": 76003281,
            "class": "org.intermine.model.bio.Exon",
            "id": 102573058,
            "primaryidentifier": "RNAseq___L4_soma_JK1107_no_DNaseI_genelets_revised_100320.exon_I_10074047_10074048_-_wb170",
            "sequenceontologytermid": 1000002
         }
      },

If I change the query to filter on the id field instead of sequenceid, no
results are returned. Why?

curl -XGET "http://localhost:9200/modmine/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "type": {
                        "value": "sequencefeature"
                     }
                  }
               ],
               "must": [
                  {
                     "term": {
                        "id": "102573058"
                     }
                  }
               ]
            }
         }
      }
   }
}'

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 2,
      "successful": 2,
      "failed": 0
   },
   "hits": {
      "total": 0,
      "max_score": null,
      "hits": []
   }
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0f86ce39-f622-48bc-9d4f-54f07d8d0a59%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

Is this your actual query?
I'm asking because I can't see a type field in your docs, so I'm wondering how the first query could match.

Could you create a full curl recreation and gist it?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr



(Bruno Galindro da Costa) #3

> Is this your actual query?

Yes.

> Could you create a full curl recreation and gist it?

Sorry, I didn't understand that.

> I'm asking because I can't see a type field in your docs, so I'm
> wondering how the first query could match.

Correct, the documents do not have a "type" field.

What I need is a query that filters by the id field (not _id) and is
restricted to the sequencefeature mapping type.

I have two mapping types in my index (location and sequencefeature). I need
to filter the query by the sequencefeature mapping type and by a document
field. The problem is: if I use the id field in the filter, the query
returns zero results, but if I use another field (sequenceid), the query
returns 7 results as expected.

As you can see, I'm passing an existing value for the id field (102573058).
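
Since the goal is to restrict the search to one mapping type, an alternative to the type filter is to put the type in the request path, which Elasticsearch versions that have mapping types accept as /{index}/{type}/_search. The Python sketch below only assembles the URL and body and does not send anything. The host, port and helper name are assumptions, and nothing in the thread confirms that this changes the zero-hit result for id.

```python
import json


def typed_search_request(index, doc_type, field, value,
                         host="http://localhost:9200"):
    """Return (url, body) for a search limited to one mapping type.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    url = "{}/{}/{}/_search".format(host, index, doc_type)
    body = {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                # The type is in the URL, so only the term filter remains.
                "filter": {"term": {field: value}},
            }
        }
    }
    return url, body


url, body = typed_search_request("modmine", "sequencefeature", "id", 102573058)
print(url)
print(json.dumps(body, indent=2))
```

The printed URL and JSON can be used with curl in the same way as the requests earlier in the thread (curl -XGET "<url>" -d '<body>').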



(Bruno Galindro da Costa) #4

Hmmm. I think I know why, but I don't know how to work around it.

First, let me clarify the environment:

The data was imported from a genome database called modmine. Location and
Sequencefeature are tables in it. So I built an index called modmine to
represent the database and two mapping types (location and sequencefeature)
to represent the tables:

Index name: modmine
Mapping types: location and sequencefeature

As you can see below, the id field is present in both mapping types, as are
others (organismid, chromosomeid, ...). So if I try to do a bool filter by
type AND by one of those shared fields, the query returns zero results. If I
use a field that exists in only one type, the query returns the results as
expected.

How can I solve this? Will I need to split the data into two indexes instead
of two mapping types?

Here are the type mappings:

  "mappings": {
     "location": {
        "properties": {
           "id": {
              "type": "integer"
           },
           "featureid": {
              "type": "integer"
           },
           "strand": {
              "type": "multi_field",
              "fields": {
                 "strand": {
                    "index": "analyzed",
                    "type": "string"
                 },
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 }
              }
           },
           "organismid": {
              "type": "integer"
           },
           "intermine_start": {
              "type": "integer"
           },
           "class": {
              "type": "multi_field",
              "fields": {
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 },
                 "class": {
                    "index": "analyzed",
                    "type": "string"
                 }
              }
           },
           "intermine_end": {
              "type": "integer"
           },
           "locatedonid": {
              "type": "integer"
           },
           "chromosomeid": {
              "type": "integer"
           }
        }
     },
     "sequencefeature": {
        "properties": {
           "sequenceid": {
              "type": "integer"
           },
           "symbol": {
              "type": "multi_field",
              "fields": {
                 "symbol": {
                    "index": "analyzed",
                    "type": "string"
                 },
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 }
              }
           },
           "secondaryidentifier": {
              "type": "multi_field",
              "fields": {
                 "secondaryidentifier": {
                    "index": "analyzed",
                    "type": "string"
                 },
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 }
              }
           },
           "primaryidentifier": {
              "type": "multi_field",
              "fields": {
                 "primaryidentifier": {
                    "index": "analyzed",
                    "type": "string"
                 },
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 }
              }
           },
           "score": {
              "type": "double"
           },
           "class": {
              "type": "multi_field",
              "fields": {
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 },
                 "class": {
                    "index": "analyzed",
                    "type": "string"
                 }
              }
           },
           "scoreprotocolid": {
              "type": "integer"
           },
           "id": {
              "type": "integer"
           },
           "predictionstatus": {
              "type": "multi_field",
              "fields": {
                 "predictionstatus": {
                    "index": "analyzed",
                    "type": "string"
                 },
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 }
              }
           },
           "chromosomelocationid": {
              "type": "integer"
           },
           "sequenceontologytermid": {
              "type": "integer"
           },
           "organismid": {
              "type": "integer"
           },
           "name": {
              "type": "multi_field",
              "fields": {
                 "name": {
                    "index": "analyzed",
                    "type": "string"
                 },
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 }
              }
           },
           "length": {
              "type": "integer"
           },
           "cytolocation": {
              "type": "multi_field",
              "fields": {
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 },
                 "cytolocation": {
                    "index": "analyzed",
                    "type": "string"
                 }
              }
           },
           "chromosomeid": {
              "type": "integer"
           },
           "scoretype": {
              "type": "multi_field",
              "fields": {
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 },
                 "scoretype": {
                    "index": "analyzed",
                    "type": "string"
                 }
              }
           },
           "note": {
              "type": "multi_field",
              "fields": {
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 },
                 "note": {
                    "index": "analyzed",
                    "type": "string"
                 }
              }
           }
        }
     }
  }

}
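
Both types map id as integer, so the mappings alone do not explain the difference. Elasticsearch releases of this era allowed a field name in a query to be prefixed with its mapping type (for example sequencefeature.id) to disambiguate fields that share a name across types; that prefix was removed in later major versions. Whether it helps in this case is untested in the thread, so the Python sketch below is only something to try. The helper name is hypothetical.

```python
import json


def prefixed_term_filter(doc_type, field, value):
    """Term filter whose field name carries the mapping type as a prefix.

    Hypothetical helper; the "type.field" form is only understood by older
    Elasticsearch versions that still have mapping types.
    """
    return {"term": {"{}.{}".format(doc_type, field): value}}


body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": prefixed_term_filter("sequencefeature", "id", 102573058),
        }
    }
}
# The printed JSON can be passed to curl with -d.
print(json.dumps(body, indent=2))
```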



(David Pilato) #5

"Gist a curl recreation" refers to what we describe here: http://www.elasticsearch.org/help/
It helps a lot in understanding your problem and getting the best possible answer.

As for your problem, maybe you should consider denormalizing your two tables into a single document?

It's hard to say more without some sample documents and a description of your actual problem.

HTH
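
As a rough illustration of the denormalization idea, the Python sketch below embeds a location row inside the sequencefeature document it belongs to before indexing. It assumes, without confirmation from the thread, that chromosomelocationid on a sequencefeature row references id on a location row. The function name and the location values are made up for the example.

```python
def denormalize(feature, locations_by_id):
    """Embed the matching location row inside a sequencefeature document.

    Assumes (not confirmed in the thread) that chromosomelocationid on a
    sequencefeature row references id on a location row.
    """
    doc = dict(feature)  # shallow copy, leave the input untouched
    location = locations_by_id.get(feature.get("chromosomelocationid"))
    if location is not None:
        # Nested under its own key, so the location's id lives at the
        # path location.id instead of competing with the feature's id.
        doc["location"] = dict(location)
    return doc


# Illustrative rows; the location values are made up.
feature = {"id": 102573058, "sequenceid": 179781000,
           "chromosomelocationid": 102573059}
locations = {102573059: {"id": 102573059, "featureid": 102573058,
                         "intermine_start": 10074047,
                         "intermine_end": 10074048}}

doc = denormalize(feature, locations)
print(doc)
```

Each indexed document then carries its own location data, at the cost of no longer mirroring the relational tables one to one.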



(Bruno Galindro da Costa) #6

> As for your problem, maybe you should consider denormalizing your two
> tables into a single document?

The fields have the same name in different tables, but they can hold
different values too. I know I could rename the fields and add a suffix
identifying the table, but I need to load the data into Elasticsearch in a
form as close as possible to the relational database, to avoid confusion.

I'll run more tests here and report back ASAP.

Em segunda-feira, 23 de dezembro de 2013 14h03min42s UTC-2, David Pilato
escreveu:

Gist a curl recreation means what we wrote here:
http://www.elasticsearch.org/help/
It helps a lot to understand your concern and get the best answer as
possible.

About your concern, may be you should consider to denormalize your two
tables into a single document?

Hard to say more without some samples and your actual problem.

HTH

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfrhttps://twitter.com/elasticsearchfr

Le 23 décembre 2013 at 16:57:40, Bruno Galindro da Costa (
bruno.g...@gmail.com <javascript:>) a écrit:

Hmmm. I think I know why, but I don't know how to work around it.

First I'll clarify the environment:

The data was imported from a genome database called modmine. Location
and Sequencefeature are tables in it. So I built an index called
modmine to represent the database, and two mapping types (location and
sequencefeature) to represent the tables:

Index name: modmine
Mapping types: location and sequencefeature

As you can see below, the id field is present in both mapping types, as
are others (organismid, chromosomeid, ...). So if I apply a bool filter
by type AND by one of those fields, the query returns zero results. If I
use a "unique" field, the query returns the results as expected.

How can I solve my problem? Will I need to split the data into two
indexes instead of two mapping types?

Here are the type mappings:

   "mappings": {
     "location": {
        "properties": {
           "id": {
              "type": "integer"
           },
           "featureid": {
              "type": "integer"
           },
           "strand": {
              "type": "multi_field",
              "fields": {
                 "strand": {
                    "index": "analyzed",
                    "type": "string"
                 },
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 }
              }
           },
           "organismid": {
              "type": "integer"
           },
           "intermine_start": {
              "type": "integer"
           },
           "class": {
              "type": "multi_field",
              "fields": {
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 },
                 "class": {
                    "index": "analyzed",
                    "type": "string"
                 }
              }
           },
           "intermine_end": {
              "type": "integer"
           },
           "locatedonid": {
              "type": "integer"
           },
           "chromosomeid": {
              "type": "integer"
           }
        }
     },
     "sequencefeature": {
        "properties": {
           "sequenceid": {
              "type": "integer"
           },
           "symbol": {
              "type": "multi_field",
              "fields": {
                 "symbol": {
                    "index": "analyzed",
                    "type": "string"
                 },
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 }
              }
           },
           "secondaryidentifier": {
              "type": "multi_field",
              "fields": {
                 "secondaryidentifier": {
                    "index": "analyzed",
                    "type": "string"
                 },
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 }
              }
           },
           "primaryidentifier": {
              "type": "multi_field",
              "fields": {
                 "primaryidentifier": {
                    "index": "analyzed",
                    "type": "string"
                 },
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 }
              }
           },
           "score": {
              "type": "double"
           },
           "class": {
              "type": "multi_field",
              "fields": {
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 },
                 "class": {
                    "index": "analyzed",
                    "type": "string"
                 }
              }
           },
           "scoreprotocolid": {
              "type": "integer"
           },
           "id": {
              "type": "integer"
           },
           "predictionstatus": {
              "type": "multi_field",
              "fields": {
                 "predictionstatus": {
                    "index": "analyzed",
                    "type": "string"
                 },
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 }
              }
           },
           "chromosomelocationid": {
              "type": "integer"
           },
           "sequenceontologytermid": {
              "type": "integer"
           },
           "organismid": {
              "type": "integer"
           },
           "name": {
              "type": "multi_field",
              "fields": {
                 "name": {
                    "index": "analyzed",
                    "type": "string"
                 },
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 }
              }
           },
           "length": {
              "type": "integer"
           },
           "cytolocation": {
              "type": "multi_field",
              "fields": {
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 },
                 "cytolocation": {
                    "index": "analyzed",
                    "type": "string"
                 }
              }
           },
           "chromosomeid": {
              "type": "integer"
           },
           "scoretype": {
              "type": "multi_field",
              "fields": {
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 },
                 "scoretype": {
                    "index": "analyzed",
                    "type": "string"
                 }
              }
           },
           "note": {
              "type": "multi_field",
              "fields": {
                 "original": {
                    "index": "not_analyzed",
                    "type": "string"
                 },
                 "note": {
                    "index": "analyzed",
                    "type": "string"
                 }
              }
           }
        }
     }
  }

}

On Monday, 23 December 2013 at 13:38:30 UTC-2, Bruno Galindro da Costa
wrote:

"Is it your actual query?"
Yes.

"Any chance that you create a full curl recreation and gist it?"
Sorry, I didn't understand...

"Asking this because I can't see type field in your docs. Wondering how
1st query could match?"
Right, the documents do not have a "type" field.

What I need is a query that filters by the id field (not _id), where the
mapping type must be sequencefeature.

I have two mapping types in my index (location and sequencefeature). I
need to filter the query by the sequencefeature mapping type and by a
document field. The problem is: if I use the id field in the filter, the
query returns zero results; but if I use another field (sequenceid), the
query returns 7 results as expected.

As you can see, I'm passing an existing value for the id field (102573058).

On Monday, 23 December 2013 at 11:37:20 UTC-2, David Pilato wrote:

Is it your actual query?
Asking this because I can't see type field in your docs. Wondering
how 1st query could match?

Any chance that you create a full curl recreation and gist it?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr

On 23 December 2013 at 14:24:23, Bruno Galindro da Costa
(bruno.g...@gmail.com) wrote:

This query returns 7 items:

curl -XGET "http://localhost:9200/modmine/_search" -d'
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"type": {
"value": "sequencefeature"
}
}
],
"must": [
{
"term": {
"sequenceid": "179781000"
}
}
]
}
}
}
}
}'

Piece of result:

"hits": {
  "total": 7,
  "max_score": 1,
  "hits": [
     {
        "_index": "modmine",
        "_type": "sequencefeature",
        "_id": "111",
        "_score": 1,
        "_source": {
           "sequenceid": 179781000,
           "name": null,
           "symbol": null,
           "scoretype": "RNAseq___L4_soma_JK1107_no_DNaseI_genelets_revised_100320:L4_soma_JK1107_no_DNaseI",
           "note": null,
           "organismid": 11000000,
           "length": 2,
           "score": null,
           "predictionstatus": null,
           "chromosomelocationid": 102573059,
           "chromosomeid": 11000003,
           "secondaryidentifier": "RNAseq___L4_soma_JK1107_no_DNaseI_genelets_revised_100320.exon_I_10074047_10074048_-wb170",
           "cytolocation": null,
           "scoreprotocolid": 76003281,
           "class": "org.intermine.model.bio.Exon",
           "id": 102573058,
           "primaryidentifier": "RNAseq___L4_soma_JK1107_no_DNaseI_genelets_revised_100320.exon_I_10074047_10074048-_wb170",
           "sequenceontologytermid": 1000002
        }
     },

If I modify the query to search for id field instead of sequenceid,
no results are displayed. Why?

curl -XGET "http://localhost:9200/modmine/_search" -d'
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"type": {
"value": "sequencefeature"
}
}
],
"must": [
{
"term": {
"id": "102573058"
}
}
]
}
}
}
}
}'

{
"took": 2,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
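
Both requests above repeat the `must` key inside a single `bool` object. How Elasticsearch treats that repetition is not established in this thread, but many generic JSON parsers keep only the last occurrence of a duplicate key, so a less ambiguous form puts both clauses in one `must` array. A minimal sketch using only Python's standard `json` module follows; the helper name `build_filtered_query` is hypothetical, and this is not presented as the fix for the zero-hit result.

```python
import json


def build_filtered_query(doc_type, field, value):
    """Build a filtered query whose bool filter has a single must array."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "bool": {
                        "must": [
                            {"type": {"value": doc_type}},
                            {"term": {field: value}},
                        ]
                    }
                },
            }
        }
    }


# Python's json module keeps only the last value of a repeated key,
# so the first "must" array is silently dropped here.
duplicated = json.loads('{"must": [{"type": {}}], "must": [{"term": {}}]}')
print(duplicated)

body = build_filtered_query("sequencefeature", "id", 102573058)
print(json.dumps(body, indent=2))
```

The printed body can be passed to `curl -d` in place of the hand-written JSON in the requests above.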



(Bruno Galindro da Costa) #7

After running some tests, I found the problem and a workaround to "solve" it.

To index documents, I was using the official low-level Python client for
Elasticsearch, https://github.com/elasticsearch/elasticsearch-py. For some
reason, after indexing some documents in the location type, if I index
documents in another type (sequencefeature in my case), the reported
problem occurs. I did a curl recreation to validate this, and from the
curl command line everything works fine.
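
A curl recreation of the kind described above could be generated along these lines. This is only a sketch: the helper `curl_recreation`, the index name `modmine_test`, and the sample documents are assumptions for illustration, the URLs use the typed `index/type/id` layout of the Elasticsearch version discussed in this thread, and the commands are printed rather than executed.

```python
import json


def curl_recreation(index, docs, query):
    """Return curl commands that rebuild a small test index and query it."""
    base = "http://localhost:9200/%s" % index
    # Start from a clean index so old documents cannot affect the result.
    commands = ["curl -XDELETE '%s'" % base]
    for doc_type, doc_id, source in docs:
        # Index each document under its mapping type with an explicit ID.
        commands.append(
            "curl -XPUT '%s/%s/%s?refresh=true' -d '%s'"
            % (base, doc_type, doc_id, json.dumps(source, sort_keys=True))
        )
    commands.append(
        "curl -XGET '%s/_search?pretty' -d '%s'"
        % (base, json.dumps(query, sort_keys=True))
    )
    return commands


# Hypothetical sample data: one document per mapping type, sharing the id field.
docs = [
    ("location", 1, {"id": 1}),
    ("sequencefeature", 1, {"id": 102573058}),
]
query = {"query": {"term": {"id": 102573058}}}

for command in curl_recreation("modmine_test", docs, query):
    print(command)
```

Posting the printed commands as a gist gives others a way to reproduce the behaviour without access to the original data.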

Another strange behaviour is that the client above was duplicating the
first document indexed. I thought this might be because the index method
call is inside a loop, so I moved a call with static data outside the
loop, but the same behaviour occurred.

So I decided to use another client, pyes
(http://pyes.readthedocs.org/en/latest/).
Now everything is working properly.

This is my final script:

--
Att.
Bruno Galindro da Costa



(system) #8