How to implement a river that indexes external data sources using Java?


(Sam-4) #1

Hi All

I am new to ES and am using version 0.13.0. A river is a built-in service
that indexes external data sources (MySQL, in my case). I have searched
Google for how to implement a river in Java, but I could not find anything
about it. Could anyone guide me on how to start indexing the database?

Thanks


(Shay Banon) #2

A river is simply a process that runs within the elasticsearch cluster,
retrieves data from an external data source, and indexes it. You can start by
writing it yourself using the elasticsearch Java API (read from MySQL, and
index the data you want). Once you have that, and you want ES to run it as a
river, we can help transform the code into a river.
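
For readers following along, here is a minimal sketch of the "read rows, index each one" loop described above. It is not the river API and makes no elasticsearch calls: `RowIndexerSketch`, `DocumentSink`, and `indexRows` are hypothetical names, and the rows are plain in-memory maps. In a real program the rows would presumably come from a `java.sql.ResultSet`, and the sink would wrap whatever indexing call the Java client provides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowIndexerSketch {

    /** Hypothetical stand-in for the indexing call (e.g. a wrapper around the ES client). */
    public interface DocumentSink {
        void index(String id, Map<String, Object> source);
    }

    /**
     * Turns each database row into its own document and hands it to the sink.
     * Rows are plain maps here; real code would read them from a java.sql.ResultSet.
     * Returns the number of rows that were handed to the sink.
     */
    public static int indexRows(List<Map<String, Object>> rows, String idColumn, DocumentSink sink) {
        int indexed = 0;
        for (Map<String, Object> row : rows) {
            Object id = row.get(idColumn);
            if (id == null) {
                continue; // skip rows without a usable document id
            }
            // Copy the row so the sink cannot mutate the caller's data.
            sink.index(String.valueOf(id), new LinkedHashMap<>(row));
            indexed++;
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("Id", "first");
        row.put("Keyword", "pointing");
        rows.add(row);

        // The sink here only prints; it is an assumption, not an ES call.
        int count = indexRows(rows, "Id",
                (id, source) -> System.out.println(id + " -> " + source));
        System.out.println("indexed " + count + " row(s)");
    }
}
```

Keeping the sink behind a small interface is only one possible design; it lets the row-reading loop be tried out before any cluster is involved.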

On Fri, Dec 17, 2010 at 11:18 AM, sam mishra.sameek@gmail.com wrote:

Hi All

I am new to ES and am using version 0.13.0. A river is a built-in service
that indexes external data sources (MySQL, in my case). I have searched
Google for how to implement a river in Java, but I could not find anything
about it. Could anyone guide me on how to start indexing the database?

Thanks


(Sam-4) #3

Thanks for the quick reply.

On Dec 19, 5:52 am, Shay Banon shay.ba...@elasticsearch.com wrote:

A river is simply a process that runs within the elasticsearch cluster,
retrieves data from an external data source, and indexes it. You can start by
writing it yourself using the elasticsearch Java API (read from MySQL, and
index the data you want). Once you have that, and you want ES to run it as a
river, we can help transform the code into a river.


(Sam-4) #4

As we discussed, I wrote Java code that reads the database and indexes
it. My database contains the following fields:

  1. Id
  2. keyword

From the database I prepared the following JSON document:

    {"Id":["fifth","first","four","second","thrid"],"Keyword":["michel","pointing","sam","jerry","sanJohn"]}

code:

    public class Example {

        Logger logger;

        Example() {
            DOMConfigurator.configure(getClass().getClassLoader().getResource("resources/log4j.xml"));
            logger = Logger.getLogger(Example.class);
            logger.info("***** In the Example() constructor");
        } // end Constructor Example()

        private Node createNode() {
            int numberOfNodes = 3;
            Settings settings = ImmutableSettings.settingsBuilder()
                    .put("gateway.type", "fs")
                    .put("index.shard.check_index", true)
                    .put("index.store.fs.memory.enabled", true)
                    .put("index.mapper.dynamic", false)
                    .put("gateway.recover_after_nodes", numberOfNodes)
                    .put("index.number_of_shards", 1)
                    .put("index.number_of_replicas", 0)
                    .build();
            Settings settings1 = ImmutableSettings.settingsBuilder()
                    // .put("path.data", "myindex")
                    .put("index.number_of_shards", 1)
                    .build();
            // Only settings1 is passed to the node; the settings object above is unused.
            Node node = NodeBuilder.nodeBuilder().settings(settings1)
                    .client(true)
                    .local(false)
                    .node().start();
            System.out.println("GateWay type: " + node.settings().get("index.number_of_shards"));
            return node;
        }

        void creatingIndex(Client client) throws IOException {
            MyJSONDocument jc = new MyJSONDocument();
            JSONObject jsonObject = jc.createJSONDocument();
            // The whole JSON document is indexed as a single document with id "1".
            IndexResponse indexResponse = client.prepareIndex("facebook", "face", "1")
                    .setSource(jsonObject)
                    .execute()
                    .actionGet();
            System.out.println("** Total Lengts of field is: " + indexResponse.getId());
            logger.info("****Index Created");
        }

        void gettingResponse(Client client) throws Exception {
            logger.info("******* Creting GetResponse Object **********");
            GetResponse getResponse = client.prepareGet("facebook", "face", "1")
                    .setRefresh(true)
                    .setOperationThreaded(false)
                    .execute()
                    .actionGet();

            logger.info("****Does this Exists: " + getResponse.isExists());
            logger.info("****getResponse is: " + getResponse);
            logger.info("****Id Obtained is: " + getResponse.getId());
            logger.info("****Index Obtained is: " + getResponse.getIndex());
            logger.info("****field for user is : " + getResponse.getFields().keySet().size());
            System.out.println("Response Type is : " + getResponse.type());
            System.out.println("Source as String is: " + getResponse.sourceAsString());
            logger.info("****Response Obtained");

            Map<String, GetField> map = new HashMap<String, GetField>();
            map = getResponse.fields();
            System.out.println("Size of the map of fields is: " + map.size());
        }
        void gettingCount(Client client) throws IOException {
            logger.info("******* Creting CountResponse Object *********");
            CountResponse countResponse = client.prepareCount("facebook")
                    .setQuery(wildcardQuery("Keyword", "s*"))
                    .execute()
                    .actionGet();
            System.out.println("Total Number of Count is : " + countResponse.getCount());
        } // end of gettingCount()

        void searchingData(Client client) {
            logger.info("******* Creting SearchResponse Object **********");

            SearchResponse searchResponse = client.prepareSearch("facebook")
                    .setSearchType(SearchType.DEFAULT)
                    .setQuery(wildcardQuery("Keyword", "s*"))
                    .setFrom(0).setSize(60).setExplain(true)
                    .execute()
                    .actionGet();
            SearchHits sh = searchResponse.getHits();
            logger.info("Total Hits are sh.totalHits() : " + sh.totalHits());
            System.out.println("Total Hits are sh.totalHits() : " + sh.hits().length);
            System.out.println("Total Number of shards are : " + searchResponse.totalShards());
            System.out.println("Size: " + searchResponse);
        }

        public static void main(String[] s) {
            Example example = new Example();
            try {
                Node node = example.createNode();
                node.start();
                Client client = node.client();
                example.creatingIndex(client);
                example.gettingResponse(client);
                example.gettingCount(client);
                example.searchingData(client);
                node.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        } // end of main()

    } // end of class

When I execute the code above, I run into the following questions:

  1. Is my prepared JSON document correct? If not, what would the correct
    version be?
  2. If it is correct, why do I get a hit count and a count value of only
    one? My JSON document contains two words matching s*, so a result of
    one looks wrong to me.

Could anyone point out where the problem is and what my mistakes are?

Thanks
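
One possible reading of the result above, sketched without elasticsearch: search hits and counts refer to matching documents, not matching values, and the code indexes the whole table as one document with id "1". The toy model below is an assumption-laden illustration only. `HitCountSketch`, `countMatchingDocs`, and `onePerRow` are hypothetical names, and a lowercase prefix test stands in for the `s*` wildcard query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HitCountSketch {

    /** Counts documents in which at least one value of the field starts with the prefix. */
    public static int countMatchingDocs(List<Map<String, List<String>>> docs,
                                        String field, String prefix) {
        int hits = 0;
        for (Map<String, List<String>> doc : docs) {
            List<String> values = doc.get(field);
            if (values == null) {
                continue; // the document has no such field
            }
            for (String value : values) {
                if (value.toLowerCase().startsWith(prefix.toLowerCase())) {
                    hits++; // a document counts once, however many of its values match
                    break;
                }
            }
        }
        return hits;
    }

    /** Builds one document per row from parallel column arrays. */
    public static List<Map<String, List<String>>> onePerRow(String[] ids, String[] keywords) {
        List<Map<String, List<String>>> docs = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            Map<String, List<String>> doc = new HashMap<>();
            doc.put("Id", Collections.singletonList(ids[i]));
            doc.put("Keyword", Collections.singletonList(keywords[i]));
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        String[] ids = {"fifth", "first", "four", "second", "thrid"};
        String[] keywords = {"michel", "pointing", "sam", "jerry", "sanJohn"};

        // Shape used in the post: every row packed into a single document.
        Map<String, List<String>> single = new HashMap<>();
        single.put("Id", Arrays.asList(ids));
        single.put("Keyword", Arrays.asList(keywords));
        List<Map<String, List<String>>> oneDoc = Collections.singletonList(single);

        System.out.println(countMatchingDocs(oneDoc, "Keyword", "s"));                   // prints 1
        System.out.println(countMatchingDocs(onePerRow(ids, keywords), "Keyword", "s")); // prints 2
    }
}
```

Under this reading, indexing one document per database row (each with its own id) would be what makes "sam" and "sanJohn" show up as two separate hits.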

