Common lines for multiple files

Hello everyone,

I have a lot of files, each containing many short lines (~45,000 per file).
Each line consists of a keyword and some additional data.
I store each file and its metadata as an object of the form {index: "default",
_type: "file", id: filename, _source: {various metadata} }
I store each line as a child of its file:
body_mapping = {"line": {
    "_parent": {
        "type": "file"
    }
}}

{"_index": "default",
 "_type": "line",
 "_id": line_number,
 "_parent": filename,
 "_source": {"keyword": keyword,
             ...}}

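For illustration, the child documents described above could be generated per file roughly as follows. This is only a minimal sketch: the function name `line_actions` and the `data` field are hypothetical stand-ins (the post does not show the rest of `_source`), and the metadata keys simply mirror the layout shown above.

```python
def line_actions(filename, lines, index="default"):
    """Yield one bulk-style action dict per line of a file.

    `lines` is an iterable of (keyword, data) tuples. `data` is a
    hypothetical placeholder for the additional per-line data.
    """
    for line_number, (keyword, data) in enumerate(lines, start=1):
        yield {
            "_index": index,
            "_type": "line",
            "_id": line_number,   # as in the post; repeats from file to file
            "_parent": filename,  # the id of the parent "file" document
            "_source": {"keyword": keyword, "data": data},
        }
```

With elasticsearch-py, a generator like this could presumably be handed to `elasticsearch.helpers.bulk(es, line_actions(filename, lines))`, assuming a client and server version that still support `_type` and `_parent`.
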
My goal is to search across all my files by keyword:
{"query":
    ...
        {"query" : keyword,
         "fields" : ["keyword"]

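To make the msearch usage concrete, here is a minimal sketch of how one request per keyword could be assembled. The query type is not recoverable from the fragment above, so a plain `match` query on the `keyword` field is assumed here; `build_msearch_body` and the `size` default are hypothetical as well.

```python
def build_msearch_body(keywords, index="default", size=100):
    """Build the alternating header/body list expected by es.msearch.

    Each keyword contributes two entries: a header naming the index,
    followed by the search body itself.
    """
    body = []
    for keyword in keywords:
        body.append({"index": index})  # header line for this search
        body.append({
            # Assumption: a simple match query stands in for the original one.
            "query": {"match": {"keyword": keyword}},
            "size": size,
        })
    return body
```

The result would then be sent in a single round trip with `es.msearch(body=build_msearch_body(keywords))`, which returns one response per keyword, in request order, under the `responses` key.
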
But there is more to it: I want to search a bunch of keywords from a given
file (all lines from an existing file or a new one) and aggregate the
results by filename.
For example, the result would be:
{filename1: [{keyword: my_search_keyword, metadata_for_this_keyword_in_file1,
              _id: line_number},
             {keyword: my_search_keyword, metadata_for_this_keyword_in_file1,
              _id: line_number}, ...],
 filename2: [{keyword: my_search_keyword, metadata_for_this_keyword_in_file2,
              _id: line_number},
             {keyword: my_search_keyword, metadata_for_this_keyword_in_file2,
              _id: line_number}, ...],
 filename5: [{keyword: my_search_keyword, metadata_for_this_keyword_in_file5,
              _id: line_number},
             {keyword: my_search_keyword, metadata_for_this_keyword_in_file5,
              _id: line_number}, ...]}
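
The grouping by filename can also be done on the client side once the msearch responses are back. The sketch below assumes that each hit exposes its parent filename under `_parent` (depending on the Elasticsearch version this may have to be requested explicitly); `group_hits_by_file` is a hypothetical helper, not part of elasticsearch-py.

```python
from collections import defaultdict


def group_hits_by_file(keywords, responses):
    """Group msearch hits by parent filename.

    `keywords` and `responses` must be in the same order, which is how
    msearch returns its results. Responses without hits (for example
    per-search errors) are skipped.
    """
    grouped = defaultdict(list)
    for keyword, response in zip(keywords, responses):
        for hit in response.get("hits", {}).get("hits", []):
            filename = hit.get("_parent")  # assumption: parent id is in the hit
            entry = dict(hit.get("_source", {}))
            entry.setdefault("keyword", keyword)
            entry["_id"] = hit["_id"]
            grouped[filename].append(entry)
    return dict(grouped)
```

Given the raw msearch result `res`, this would be called as `group_hits_by_file(keywords, res["responses"])`.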

Important point: There are a lot of collisions, keyword-wise.

At the moment I am using elasticsearch-py with the es.msearch function, and
my query is the one shown above. However, this is quite slow, so I suspect
that my object design, mapping, or search strategy is wrong.

Do you have any insights to share? Thanks a lot!
