Elasticsearch on Spark - case class limit of 22 fields in Scala


(eliasah) #1

I'm working with the KDD99 dataset for fraud detection. In the dataset, a record looks like this:

0,tcp,http,SF,215,45076,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,0,0,0.00,0.00,0.00,0.00,0.00,.00,0.00,0.00,normal.

A record represents a connection. For each connection, the data set contains information like the number of bytes sent, login attempts, TCP errors, and so on. Each connection is one line of CSV-formatted data, containing 38 features.

I want to write the data into Elasticsearch using Spark, so that I can first explore it visually with Kibana before doing deeper computation with Spark to predict whether a record is fraudulent.

The issue is that, up to and including Scala 2.10, a case class is limited to 22 fields.

This means I can't create a case class to represent a record.

How can I work around this limitation without switching to Scala 2.11, which seems to solve the issue (SI-7296)?

I appreciate your help. Thanks in advance!


(Costin Leau) #2

Case classes are just one of the types that can be serialized out of the box. You can just as well use a Map (whether in Scala or Java) or a JavaBean, though I would recommend the former, especially considering the large number of parameters involved.
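
A minimal sketch of the Map approach, assuming one CSV line per record as in the sample above. The field names `f0` to `f40` and `label` are hypothetical placeholders, not the real KDD99 column names, and `KddRecordAsMap` is just an illustrative name. The code is plain Scala with no Spark dependency; in a Spark job the same `toMap` function would presumably be applied to each line before handing the resulting Maps to the Elasticsearch connector.

```scala
object KddRecordAsMap {
  // Hypothetical placeholder names; replace with the real column names.
  val fieldNames: Seq[String] = (0 until 41).map(i => "f" + i) :+ "label"

  // The sample record from the question (41 feature values plus a label).
  val sampleLine: String =
    "0,tcp,http,SF,215,45076,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1," +
    "0.00,0.00,0.00,0.00,1.00,0.00,0.00,0,0,0.00,0.00,0.00,0.00,0.00," +
    ".00,0.00,0.00,normal."

  // Pair each field name with its value; a Map has no 22-field limit.
  def toMap(line: String): Map[String, String] =
    fieldNames.zip(line.split(',')).toMap

  def main(args: Array[String]): Unit = {
    val doc = toMap(sampleLine)
    println(doc.size)      // number of fields in the document
    println(doc("f1"))     // second column of the record
    println(doc("label"))  // last column of the record
  }
}
```

Because the values are kept as strings here, Elasticsearch would have to infer or be told the field types; converting numeric columns before indexing is a separate choice.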


(eliasah) #3

OK, thanks! So you'd actually recommend using a plain Map structure?


(Costin Leau) #4

A case class is just that - a strongly-typed Map. Why not use it? Especially if the properties fall under the same type and you can add some generics to it, the Map should work just fine and have no size issue.
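
To illustrate the point about properties sharing a type, here is a rough sketch that keeps only the columns that parse as numbers in a `Map[String, Double]`. The column names (`f0`, `f1`, ...) and the object name `KddTypedMap` are hypothetical, and treating "parses as a Double" as the test for a numeric column is an assumption made for the example.

```scala
import scala.util.Try

object KddTypedMap {
  // The sample record from the question.
  val sampleLine: String =
    "0,tcp,http,SF,215,45076,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1," +
    "0.00,0.00,0.00,0.00,1.00,0.00,0.00,0,0,0.00,0.00,0.00,0.00,0.00," +
    ".00,0.00,0.00,normal."

  // Keep only the values that parse as Double, keyed by a placeholder name.
  def numericFields(line: String): Map[String, Double] = {
    val values = line.split(',').toSeq
    val names  = values.indices.map(i => "f" + i) // hypothetical names
    names.zip(values).flatMap { case (name, value) =>
      Try(value.toDouble).toOption.map(d => name -> d)
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val numeric = numericFields(sampleLine)
    println(numeric.size)   // how many columns were numeric
    println(numeric("f4"))  // fifth column as a Double
  }
}
```

The non-numeric columns (protocol, service, flag, label) could go into a second `Map[String, String]` in the same way, so each Map stays uniformly typed.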


(eliasah) #5

Great. Thanks! Your solution seems quite logical now. :smile:

