Creating dynamic fields from a field


(Pablo Musa) #1

Hey guys,
I have the following problem: given a title, I want to record that title,
but I also want to record two fields for each word in the title, using the
word itself as part of the field name.
For example:
given the title "The greatest band ever - Urban Legion", I would like to
have a document like:

{
"title":"The greatest band ever - Urban Legion",
"greatest_x" : 1,
"band_x" : 1,
"ever_x" : 1,
"Urban_x": 1,
"Legion_x" : 1,
"greatest_y" : [],
"band_y" : [],
"ever_y" : [],
"Urban_y": [],
"Legion_y" : []
}

I was reading about dynamic mapping, but I am not sure whether I can accomplish
the above. Is it possible to do this inside ES?

I could easily do it in an auxiliary application.

Thanks,
Pablo

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAF6PhFJisN_puhY0xq-tvRe9gx0jLRReheRzjL9n_PfxFAZ7pQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Binh Ly-2) #2

No, I don't believe you can automatically create fields based on the token
values of another field. You'd probably have to do this outside of ES.

If it matters, you can call the _analyze API to produce the tokens before
you inject your fields.



(Clinton Gormley) #3

To add to what Binh said, you really shouldn't add field names like this:
On 14 March 2014 21:20, Pablo Musa pablitomusa@gmail.com wrote:

{
"title":"The greatest band ever - Urban Legion",
"greatest_x" : 1,
"band_x" : 1,
"ever_x" : 1,
"Urban_x": 1,
"Legion_x" : 1,
"greatest_y" : [],
"band_y" : [],
"ever_y" : [],
"Urban_y": [],
"Legion_y" : []
}

You end up with an explosion of fields, and each field has an inverted
index associated with it. Your cluster state will eventually become
enormous. Any change to the cluster state (e.g. adding a field, changing an
index, changes to nodes, etc.) results in the cluster state being copied to
every node in the cluster. If the state is very large, you will experience a
significant slowdown.

Instead of:

[{ custom_foo: xxx }, { custom_bar: yyy }]

Use nested fields, e.g.:
[
{ type: "custom_foo", value: "xxx" },
{ type: "custom_bar", value: "yyy" }
]

That way you have only two fields.
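
A minimal sketch of that layout, applied to the title from the original question. The array name `words`, the helper `build_nested_doc`, the regex tokenization and the starting value of 1 are all assumptions made for illustration; in practice the tokens would come from an analyzer, and the array would need to be mapped with the `nested` type so each entry is matched as a unit.

```python
import re

def build_nested_doc(title):
    """Build a document with one {type, value} entry per distinct word."""
    # Naive tokenization (assumption): runs of word characters only,
    # so the "-" separator in the title is dropped.
    tokens = re.findall(r"\w+", title)
    seen = set()
    words = []
    for token in tokens:
        if token in seen:
            continue  # keep a single entry per distinct word
        seen.add(token)
        # The word goes into a value, not into a field name.
        words.append({"type": token, "value": 1})
    return {"title": title, "words": words}

doc = build_nested_doc("The greatest band ever - Urban Legion")
print(doc)
```

However many distinct words appear across all titles, the mapping only ever contains `words.type` and `words.value`.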

clint



(Pablo Musa) #4

Thank you very much for the hints :wink:

If it matters, you can call the _analyze API to produce the tokens before
you inject your fields.

Is there a URL I can call, or is it only available through the internal API?

Your cluster state will eventually become enormous.

Yes, I saw it coming but was postponing it during the dev phase. Thanks for the
solution, it will help very much!!

Thanks again guys!

--Pablo



(Ivan Brusic) #5

There is a REST API:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html

If you are using Java, you can avoid the network roundtrip by creating the
AnalysisService locally. For hints, see the test class:
https://github.com/elasticsearch/elasticsearch/blob/master/src/test/java/org/elasticsearch/index/analysis/AnalysisModuleTests.java
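
For the REST route, a rough sketch of preparing an _analyze call and reading the tokens back might look like the following. It assumes a recent Elasticsearch version that accepts a JSON body with `analyzer` and `text` (older releases took the text differently), and the host URL, helper names and sample response are made up for illustration.

```python
import json
import urllib.request

def build_analyze_request(base_url, text, analyzer="standard"):
    """Prepare (but do not send) a POST request to the _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def extract_tokens(response_body):
    """Pull the token strings out of an _analyze JSON response."""
    payload = json.loads(response_body)
    return [entry["token"] for entry in payload.get("tokens", [])]

# Sending the request needs a reachable cluster, so it is left commented out:
# req = build_analyze_request("http://localhost:9200", "Urban Legion")
# with urllib.request.urlopen(req) as resp:
#     print(extract_tokens(resp.read()))

# Hand-written sample in the shape of an _analyze response (assumption).
sample = '{"tokens": [{"token": "urban", "position": 0}, {"token": "legion", "position": 1}]}'
print(extract_tokens(sample))
```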

--
Ivan



(Pablo Musa) #6

OK, I tried all the hints, but now I can't solve my original problem.

I need to update a value of type custom_foo.
In my previous approach I would do ctx._source.custom_foo.value+=1.
But now there is an array, and I don't know which index holds custom_foo.

Is there any fast method to get a nested object by value?

Thanks,
Pablo
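
The thread does not answer this directly. One straightforward option, sketched here in Python rather than in an update script, is a linear scan over the array that stops at the first entry whose type matches; an update script could apply the same loop to the array under ctx._source. The array name `words` and the helper `increment_by_type` are hypothetical.

```python
def increment_by_type(entries, wanted_type, amount=1):
    """Add `amount` to the value of the first entry whose type matches.

    Returns True if an entry was updated, False if none matched.
    """
    for entry in entries:
        if entry["type"] == wanted_type:
            entry["value"] += amount
            return True
    return False

# Stand-in for a document's _source (assumption).
source = {
    "words": [
        {"type": "custom_foo", "value": 1},
        {"type": "custom_bar", "value": 5},
    ]
}
updated = increment_by_type(source["words"], "custom_foo")
print(updated, source["words"])
```

The scan is linear in the number of entries per document, which is usually small for a title.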



(Pablo Musa) #7

I solved my problem by using parent/child.
Now I can use: must + should -> has_child -> function_score.

It is working just fine. Just need to check performance.
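
The actual query is not shown in the thread. As a guess at its general shape, the sketch below builds a bool query whose should clause is a has_child query wrapping a function_score. The child type `word`, the field names `title`, `type` and `value`, and the choice of `field_value_factor` as the scoring function are all assumptions.

```python
import json

def build_query(title_text, words):
    """Combine a title match with a score contribution from child documents."""
    child_score = {
        "has_child": {
            "type": "word",       # hypothetical child type name
            "score_mode": "sum",  # add the matching children's scores together
            "query": {
                "function_score": {
                    # Select the children for the requested words...
                    "query": {"terms": {"type": words}},
                    # ...and factor each child's stored value into its score.
                    "field_value_factor": {"field": "value"},
                }
            },
        }
    }
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": title_text}}],
                "should": [child_score],
            }
        }
    }

query = build_query("Urban Legion", ["urban", "legion"])
print(json.dumps(query, indent=2))
```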

Thanks

--Pablo


