Adding analyzer to dynamic fields

Hi,

I'm new to ES and need some advice about mapping / adding analyzers to
dynamic fields. The product documents I want to index in ES have some dynamic
fields (fields starting with "f_") that depend on the product type.

Product A:
{
  "name": "Product A",
  "f_material": ["Material A", "Material B"],
  "f_type": "Type A"
}

Product B:
{
  "name": "Product B",
  "f_gender": "Male",
  "f_size": ["M", "L", "XL", "XXL"]
}

As you can see, the fields starting with "f_" are dynamic and change with the
product type: one product has f_material and f_type, while the other has
f_size and f_gender.
So:

  1. How can I add a comma-separated analyzer to dynamic fields? Can we
    define analyzers for field names that match a regex (e.g. "^f_.+")?
  2. There may be around 200 to 300 dynamically named fields. From other
    topics I read that this many fields can cause memory issues in Lucene. If
    I named them "f_1", "f_2", ... instead of "f_material", "f_type", would
    that prevent the memory issues?

Thanks for your advice,
Onur

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Adding this to the type mapping resolves my problem.

"dynamic_templates": [
  {
    "template_feature": {
      "mapping": {
        "type": "multi_field",
        "fields": {
          "{name}": {
            "type": "{dynamic_type}",
            "index": "analyzed"
          },
          "org": {
            "type": "{dynamic_type}",
            "index": "not_analyzed"
          }
        }
      },
      "match": "f_*"
    }
  }
]
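
For anyone reading along, here is a minimal Python sketch of the same template as a dict that can be dumped as valid JSON, plus a rough check of which field names the "f_*" glob would catch. The matches_template helper is hypothetical and only approximates the wildcard matching with fnmatch. The regex_template variant (using "match_pattern": "regex") is an assumption about how the regex from the original question could be expressed, so check it against the documentation for your ES version.

```python
import fnmatch
import json
import re

# The template from the message above, as a Python dict.
feature_template = {
    "template_feature": {
        "match": "f_*",
        "mapping": {
            "type": "multi_field",
            "fields": {
                "{name}": {"type": "{dynamic_type}", "index": "analyzed"},
                "org": {"type": "{dynamic_type}", "index": "not_analyzed"},
            },
        },
    }
}

# Assumed regex variant: same mapping, but matching field names by regex.
regex_template = {
    "template_feature": {
        "match": "^f_.+",
        "match_pattern": "regex",
        "mapping": feature_template["template_feature"]["mapping"],
    }
}


def matches_template(field_name, pattern="f_*"):
    """Approximate the glob-style `match` check (hypothetical helper)."""
    return fnmatch.fnmatchcase(field_name, pattern)


def matches_regex(field_name, pattern=r"^f_.+"):
    """Approximate the regex-style check (hypothetical helper)."""
    return re.match(pattern, field_name) is not None


if __name__ == "__main__":
    print(json.dumps({"dynamic_templates": [feature_template]}, indent=2))
    for name in ["name", "f_material", "f_size"]:
        print(name, matches_template(name), matches_regex(name))
```

With this template each f_* field is searchable under its own name (analyzed) and facetable under the ".org" sub-field (not_analyzed).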

(Sent in reply to the original post by Onur Aktaş, Monday, April 29, 2013 11:08:34 AM UTC+3.)

If you are worried about memory issues, are you sure you want to have a
multi-field on every dynamic property? In your example, the values are
already multi-valued arrays (the correct way to do things), not a single
comma-delimited string. Are you sure you need a custom analyzer?
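
To illustrate the distinction, here is a small Python sketch (plain Python, not Elasticsearch code). It assumes a not_analyzed field indexes each array element as one exact term, and contrasts that with the same values sent as a single comma-delimited string. The helper name exact_terms and the comma-delimited sample document are made up for illustration.

```python
def exact_terms(value):
    """Terms a not_analyzed field would hold under the assumption above:
    one term per array element, or the whole string for a single value."""
    if isinstance(value, list):
        return list(value)
    return [value]


# Multi-valued array, as in the example documents.
multi_valued = {"f_size": ["M", "L", "XL", "XXL"]}

# Hypothetical alternative: one comma-delimited string.
comma_delimited = {"f_size": "M,L,XL,XXL"}

if __name__ == "__main__":
    print(exact_terms(multi_valued["f_size"]))
    print(exact_terms(comma_delimited["f_size"]))
```

Only the second form would need a comma-splitting analyzer; the array form already yields one term per value.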

--
Ivan

(Sent in reply to Onur Aktaş's message of Mon, Apr 29, 2013 at 2:26 AM.)

With multi-field I was able to get facets by "f_type.org", "f_material.org",
etc.; however, they didn't work for comma-separated multi-values, as you said.
Could you please tell me how I can achieve this without creating a multi-field
on every dynamic property?

All I want to do is get facets on the dynamic fields without breaking the
values into words; only values containing commas should be split.

Material:

  • Material A (5)
  • Material B (10)

Size:

  • L(1)
  • XL(5)
  • XL(10)

instead of

Material:

  • Material (5)
  • A (5)
  • B (5)
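
The difference between the two facet outputs above can be reproduced with a rough Python simulation. It assumes the analyzed field splits values on whitespace (a simplification: a real analyzer may also lowercase the tokens) and that a terms facet counts each distinct term once per document. The sample documents and the facet_counts function are invented for illustration.

```python
from collections import Counter

# Invented sample documents with a multi-valued dynamic field.
docs = [
    {"f_material": ["Material A", "Material B"]},
    {"f_material": ["Material B"]},
]


def facet_counts(documents, field, analyzed):
    """Count documents per term, with or without whitespace tokenization."""
    counts = Counter()
    for doc in documents:
        terms = set()
        for value in doc.get(field, []):
            if analyzed:
                # Analyzed: the value is broken into words.
                terms.update(value.split())
            else:
                # not_analyzed: the whole value is a single term.
                terms.add(value)
        counts.update(terms)
    return dict(counts)


if __name__ == "__main__":
    print(facet_counts(docs, "f_material", analyzed=False))
    print(facet_counts(docs, "f_material", analyzed=True))
```

The not_analyzed variant keeps "Material A" and "Material B" as whole facet entries, while the analyzed variant produces separate entries for "Material", "A" and "B".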

Onur

(Sent in reply to Ivan Brusic's message of Monday, April 29, 2013 5:19:59 PM UTC+3.)

To me, it sounds like the processing needs to happen once at index time,
unless you need the original value to be returned. Your client-side
indexing process can do the comma tokenization and you can just
index the tokens as not_analyzed. Or you can set up an index analyzer that
knows how to process the text and set your search analyzer to a simple
KeywordAnalyzer (identical to using not_analyzed).
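
Both options can be sketched in Python. The first part is a client-side pre-processing step that turns comma-delimited f_* strings into arrays before indexing. The second part is an assumed settings/mapping fragment for the index-analyzer route, using a pattern tokenizer that splits on commas. The function names, the "comma" and "comma_separated" names, and the exact mapping keys (index_analyzer, search_analyzer) are illustrative assumptions for a 2013-era ES and should be checked against your version's documentation.

```python
import json


def split_commas(raw):
    """Split a comma-delimited string into trimmed, non-empty values."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def prepare_document(doc, prefix="f_"):
    """Return a copy of doc where comma-delimited f_* strings become arrays."""
    prepared = {}
    for key, value in doc.items():
        if key.startswith(prefix) and isinstance(value, str) and "," in value:
            prepared[key] = split_commas(value)
        else:
            prepared[key] = value
    return prepared


# Option 2 (assumed): split on commas at index time, match exact terms at search time.
comma_analysis = {
    "analysis": {
        "tokenizer": {"comma": {"type": "pattern", "pattern": ","}},
        "analyzer": {"comma_separated": {"type": "custom", "tokenizer": "comma"}},
    }
}
field_mapping = {
    "type": "string",
    "index_analyzer": "comma_separated",
    "search_analyzer": "keyword",
}

if __name__ == "__main__":
    print(prepare_document({"name": "Product B", "f_size": "M, L, XL, XXL"}))
    print(json.dumps({"settings": comma_analysis, "field": field_mapping}, indent=2))
```

The client-side route keeps the mapping simple (the f_* fields can stay not_analyzed), at the cost of doing the split in every indexing client.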

--
Ivan

(Sent in reply to Onur Aktaş's message of Tue, Apr 30, 2013 at 4:32 AM.)