Hello,
I want to know if it's possible to detect duplicate values in a field and create another field based on whether the value has been seen before.
Suppose I have these logs:
Field1, Field2, Field3
A, B, C
A, D, E
B, E, T
B, R, H
S, V, X
What I want to have is:
Field1, Field2, Field3, Field4
A, B, C, 1
A, D, E, 0 (because we already know A)
B, E, T, 1
B, R, H, 0
S, V, X, 1
You could probably do that with an aggregate or ruby filter. Use the Field1 value (A, B, etc.) as the task_id. Check whether the map already contains something for that task_id. If it does not, set the new field to 1, otherwise set it to 0. Then add a value to the map. You would never purge entries from the map, so this would leak memory.
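To illustrate the idea, here is a minimal sketch in plain Ruby of the "first time seen" check, run against the sample rows from the question. The class name FirstSeenTagger and the method flag_for are hypothetical names chosen for this example, not part of Logstash. It uses a Set rather than the aggregate filter's map, which is a simplification.

```ruby
require 'set'

# Hypothetical helper: remembers every Field1 value seen so far.
class FirstSeenTagger
  def initialize
    # Never purged, so it grows with each new distinct value.
    @seen = Set.new
  end

  # Returns 1 the first time a value appears, 0 on every later occurrence.
  # Set#add? returns nil when the value is already in the set.
  def flag_for(value)
    @seen.add?(value) ? 1 : 0
  end
end

tagger = FirstSeenTagger.new
rows = [%w[A B C], %w[A D E], %w[B E T], %w[B R H], %w[S V X]]

# Append the flag as Field4, keyed on Field1 (the first column).
tagged = rows.map { |row| row + [tagger.flag_for(row[0])] }
tagged.each { |row| puts row.join(', ') }
```

In a ruby filter, the same logic would presumably go in the init option (creating the set) and the code option (reading Field1 with event.get and writing Field4 with event.set). This assumes events arrive in order, which in practice means running the pipeline with a single worker.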
Thank you for the response.
But the problem is that I have many values in Field1, not just A or B.
The example has three distinct values, but in the real logs I have at least 4000 distinct values.
I do not see why having thousands of possible values makes a difference. If it were millions of values I would be concerned, but thousands should be OK.