Elasticsearch not returning all relevant results

I am using Elasticsearch to search for files stored in MongoDB. I would
like to retrieve all files whose names match a pattern. When I query
MongoDB directly, it returns 6754 files.

FSsearch:PRIMARY> db.fs.files.find({"filename":/.*Mail.*/}).count();

6754

But when I try the same with Elasticsearch, it returns only 85 files.
Is there any way to get all the files from Elasticsearch?

curl -XGET "localhost:9200/submission_idx/files/_search?search_type=scan&scroll=10m&size=7000&pretty=1" -d '{
  "query" : {
    "field" : {
      "filename" : "*Mail*"
    }
  }
}'

{
  "_scroll_id" : "c2Nhbjs1OzIyMDpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzIxODpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzIxNjpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzIxOTpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzIxNzpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzE7dG90YWxfaGl0czo4NTs=",
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 85,
    "max_score" : 0.0,
    "hits" : [ ]
  }
}
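
A side note on the response above: with search_type=scan, the first response carries only the scroll ID and the total hit count. The empty "hits" array is expected, and the documents themselves come back from follow-up scroll requests that pass the scroll ID. In this version the scroll ID appears to be plain Base64, so it can be decoded to see what Elasticsearch is tracking. The following is a minimal Python sketch under that assumption. The ID layout is an internal detail rather than a documented API, the function name decode_scroll_id is made up for illustration, and reading the second field as a shard count is my interpretation.

```python
import base64

# Scroll ID copied from the response above, split into pieces for readability.
SCROLL_ID = (
    "c2Nhbjs1"
    "OzIyMDpDV21tamdEbVEyZUhOcVcwYWVnVU9R"
    "OzIxODpDV21tamdEbVEyZUhOcVcwYWVnVU9R"
    "OzIxNjpDV21tamdEbVEyZUhOcVcwYWVnVU9R"
    "OzIxOTpDV21tamdEbVEyZUhOcVcwYWVnVU9R"
    "OzIxNzpDV21tamdEbVEyZUhOcVcwYWVnVU9R"
    "OzE7"
    "dG90YWxfaGl0czo4NTs="
)


def decode_scroll_id(scroll_id):
    """Base64-decode a scroll ID and split it into its ';'-separated fields."""
    text = base64.b64decode(scroll_id).decode("ascii")
    return [part for part in text.split(";") if part]


parts = decode_scroll_id(SCROLL_ID)
print(parts[0])   # search type recorded in the ID
print(parts[1])   # assumed to be the number of shard contexts
print(parts[-1])  # total hits recorded in the ID
```

The decoded ID carries the same total of 85 that the "hits" section reports, so the scan itself is behaving normally. The small number comes from what is in the index, not from how the results are being fetched.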

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

How did you index those documents in Elasticsearch?
What does this return:

curl -XGET "localhost:9200/submission_idx/files/_count?pretty"

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Hi David,

Please find the detail you asked for below:

curl -XGET "localhost:9200/submission_idx/files/_count?pretty"
{
  "count" : 34591,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}

I used the following river configuration to create the index in Elasticsearch:

curl -XPUT "localhost:9200/_river/fs/_meta" -d '
{
  "type": "mongodb",
  "mongodb": {
    "db": "submission_data",
    "collection": "fs",
    "gridfs": true
  },
  "index": {
    "name": "submission_idx",
    "type": "files",
    "content_type": "text/plain"
  }
}'

After running the query you gave, I found that not all of my files in
MongoDB have been indexed by Elasticsearch. I have 1637870 files in the DB
but only 34591 have been indexed. Why did Elasticsearch not index the rest
of the files?

Appreciate your help.

Thanks,
Kiruthika

And what does this return:

curl "localhost:9200/submission_idx/files/_count?pretty&q=filename:Mail"

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Hi David, I ran the curl command and it returns only 85 files. Could
Elasticsearch be returning only 85 files because it has not indexed all
the files present in MongoDB? Elasticsearch has indexed only 34591 files,
whereas I have 1637870 files in the DB. Any idea why it has not indexed
all the files?

curl "localhost:9200/submission_idx/files/_count?pretty&q=filename:Mail"
{
"count" : 85,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
}
}

Thanks much,
Kiruthika

Probably. I was not aware that you don't have the same number of docs on both systems.
I don't have any idea why you have this difference.
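
To put rough numbers on that "probably", the counts reported earlier in this thread can be compared directly. This is only a back-of-the-envelope sketch in Python: it assumes the indexed documents are a roughly uniform sample of the MongoDB files, which nothing in the thread confirms, and the function names are made up for illustration.

```python
# Counts reported in this thread.
MONGO_TOTAL = 1637870   # files in MongoDB
ES_TOTAL = 34591        # documents in the Elasticsearch index
MONGO_MATCHES = 6754    # MongoDB files matching /.*Mail.*/
ES_MATCHES = 85         # Elasticsearch hits for filename:Mail


def index_coverage(indexed, total):
    """Fraction of source documents that made it into the index."""
    return indexed / total


def expected_matches(source_matches, coverage):
    """Matches to expect if the indexed documents were a uniform sample."""
    return source_matches * coverage


coverage = index_coverage(ES_TOTAL, MONGO_TOTAL)
estimate = expected_matches(MONGO_MATCHES, coverage)
print(f"coverage: {coverage:.2%}")
print(f"expected matches: {estimate:.0f}, observed: {ES_MATCHES}")
```

Only about 2% of the files are indexed, which would predict on the order of 140 matches. The observed 85 is in the same range, so the missing documents account for most of the gap, although analysis or tokenization differences could still contribute.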

I would recommend checking your logs, and checking whether some docs share the same ID and are therefore updated rather than added.
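
The ID check can be sketched as follows. This is a hypothetical Python helper, not part of the river or of Elasticsearch: it assumes you have exported the list of IDs that the indexing process sends, and the sample values are invented. When two source records share an ID, the second one overwrites the first, so the index can never hold more documents than there are distinct IDs.

```python
from collections import Counter


def find_duplicate_ids(ids):
    """Return {id: occurrences} for every ID seen more than once."""
    counts = Counter(ids)
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


def max_indexable(ids):
    """Upper bound on index size: one document per distinct ID."""
    return len(set(ids))


# Hypothetical sample; in practice the IDs would be exported from the source.
sample_ids = ["a1", "b2", "a1", "c3", "b2", "a1"]
print(find_duplicate_ids(sample_ids))
print(max_indexable(sample_ids))
```

If the distinct-ID count is close to the document count in Elasticsearch, overwrites explain the difference. If it is close to the MongoDB total, the cause lies elsewhere, for example in the river itself.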

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Hi David, sorry, it was my mistake. The problem was that Elasticsearch was
not indexing all the documents. I tried to re-index them, but I am still
not able to index all of them. When I checked the log file there was no
error, but I found the line below for some files. Is this an error?

[2013-09-11 01:14:10,955][INFO ][river.mongodb ] [Jennifer Walters] [mongodb][subfs] Caught file: 522bef2325d9bfa684efe087 - /data/Test.java

Thanks,
Kiruthika

I don't know.
I did not create the mongodb river.

I think Richard Louapre could answer here.

What I have seen in the past with attachments is that you can hit the maximum number of extracted characters (defaults to 100000). See the Mapper Attachments Type plugin for Elasticsearch on GitHub (elastic/elasticsearch-mapper-attachments).

HTH

David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

Thank you David!

On Wednesday, September 11, 2013 2:35:39 AM UTC-7, David Pilato wrote:

I don't know.
I did not create the mongodb river.

I think Richard Louapre could answer here.

What I have seen in the past about attachments is that you can hit the
maximum default extracted chars (defaults to 100000). See
GitHub - elastic/elasticsearch-mapper-attachments: Mapper Attachments Type plugin for Elasticsearch

HTH

David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfrhttps://twitter.com/elasticsearchfr
| @scrutmydocs https://twitter.com/scrutmydocs

Le 11 sept. 2013 à 10:37, KS <kirut...@gmail.com <javascript:>> a écrit :

Hi David, sorry it was my mistake. The problem was that elasticsearch was
not indexing all the documents. I tried to re-index all the documents but
still I am not able to index all the documents. When I checked the log file
there is no error but I found the below line for some files. Is this an
error?

[2013-09-11 01:14:10,955][INFO ][river.mongodb ] [Jennifer
Walters] [mongodb][subfs] Caught file: 522bef2325d9bfa684efe087 -
/data/Test.java

Thanks,
Kiruthika

On Tuesday, September 10, 2013 11:46:33 PM UTC-7, David Pilato wrote:

Probably. I was not aware that you don't have the same number of docs on
both systems.
I don't have any idea why you have this difference.

I would recommend to check your logs, check if some docs have the same Id
and then are updated and not added.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 11 sept. 2013 à 08:09, KS kirut...@gmail.com a écrit :

Hi David, I ran the curl command and it returns only 85 files. Could the
reason Elasticsearch returns only 85 files be that it has not indexed all
the files present in MongoDB? Elasticsearch has indexed only 34591 files,
whereas I have 1637870 files in the DB. Any idea why it has not
indexed all the files?

curl "localhost:9200/submission_idx/files/_count?pretty&q=filename:Mail"
{
"count" : 85,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
}
}
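
To put the two counts quoted in this message side by side, here is a small sketch of the arithmetic. The numbers are taken from the thread; the function name `index_coverage` is just an illustrative choice.

```python
def index_coverage(indexed, total):
    """Return (number of documents missing from the index, percent indexed)."""
    missing = total - indexed
    percent = 100.0 * indexed / total
    return missing, percent


# Counts reported in the thread: 34591 documents in Elasticsearch,
# 1637870 files in MongoDB.
missing, percent = index_coverage(indexed=34591, total=1637870)
print(f"{missing} files missing, {percent:.2f}% indexed")
```

With only about 2% of the files indexed, a much lower hit count for the `Mail` query is to be expected regardless of how the query itself is written.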

Thanks much,
Kiruthika

On Tuesday, September 10, 2013 10:38:37 PM UTC-7, David Pilato wrote:

And what gives:

curl "localhost:9200/submission_idx/files/_count?pretty&q=filename:Mail"

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 11 Sept. 2013 at 06:24, KS kirut...@gmail.com wrote:

Hi David,

Please find the detail you asked for below:

curl -XGET "localhost:9200/submission_idx/files/_count?pretty"
{
"count" : 34591,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
}
}

I also used the command below to create the index in Elasticsearch:

curl -XPUT "localhost:9200/_river/fs/_meta" -d'
{
"type": "mongodb",
"mongodb": {
"db": "submission_data",
"collection": "fs",
"gridfs": true
},
"index": {
"name": "submission_idx",
"type": "files",
"content_type": "text/plain"
}
}'

After running the query you gave, I found out that not all of my files in
MongoDB have been indexed by Elasticsearch. I have 1637870 files in the DB
but only 34591 files have been indexed. Why did Elasticsearch not index
the rest of the files?

Appreciate your help.

Thanks,
Kiruthika

On Tuesday, September 10, 2013 6:35:31 PM UTC-7, David Pilato wrote:

How did you index those documents in elasticsearch?
What gives:

curl -XGET "localhost:9200/submission_idx/files/_count?pretty"

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 11 Sept. 2013 at 01:04, KS kirut...@gmail.com wrote:

I am using Elasticsearch to search for files stored in MongoDB. I
would like to retrieve all files whose names match a pattern. When I queried
MongoDB, it returned 6754 files.

FSsearch:PRIMARY> db.fs.files.find({"filename":/.*Mail.*/}).count();

6754

But when I tried to do the same with Elasticsearch, it returned only 85
files. Is there any way to get all the files in Elasticsearch?

curl -XGET "localhost:9200/submission_idx/files/_search?search_type=scan&scroll=10m&size=7000&pretty=1" -d '{"query" : {
"field" : {
"filename" : "Mail"
}
}
}'

{
"_scroll_id" : "c2Nhbjs1OzIyMDpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzIxODpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzIxNjpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzIxOTpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzIxNzpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzE7dG90YWxfaGl0czo4NTs=",
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 85,
"max_score" : 0.0,
"hits" : [ ]
}
}
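
As a side note, the `_scroll_id` in this response happens to be plain base64. The value is meant to be opaque and its layout is an internal detail that may differ between versions, so the sketch below is only a curiosity for this particular response, not something to rely on. It simply decodes the string quoted above.

```python
import base64

# The _scroll_id copied verbatim from the response in this thread.
scroll_id = (
    "c2Nhbjs1OzIyMDpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzIxODpDV21tamdEbVEyZUhOcVcw"
    "YWVnVU9ROzIxNjpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzIxOTpDV21tamdEbVEyZUhOcVcw"
    "YWVnVU9ROzIxNzpDV21tamdEbVEyZUhOcVcwYWVnVU9ROzE7dG90YWxfaGl0czo4NTs="
)

# Decode the opaque id into text; for this response the payload is ASCII.
decoded = base64.b64decode(scroll_id).decode("ascii")
print(decoded)
```

The decoded text ends with the same total of 85 that the response reports. The empty `hits` array is expected here: with `search_type=scan`, the first response carries only the scroll id and the total, and the documents are returned by subsequent scroll requests that pass this id.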
