Elasticsearch provides a River module for pulling data from other data sources. Rivers are delivered as plugins; the River plugins currently available include:
River Plugins
Supported by Elasticsearch
Supported by the community
(by Dominik Dorn)
(by Alex Bogdanovski)
(by Martin Bednar)
(by David Pilato)
(by David Pilato)
(by Olivier Bazoud)
(by uberVU)
(by Steve Samuel)
(by Jörg Prante)
(by Steve Sarandos)
(by Endgame Inc.)
(by Mariam Hakobyan)
(by Tanguy Leroux)
(by Richard Louapre)
(by Steve Samuel)
(by Jörg Prante)
(by Steve Samuel)
(by RethinkDB)
(by David Pilato)
(by adamlofts)
(by Luca Cavanna)
(by Sunny Gleason)
(by Pascal Lombard)
(by Kevin Wang)
(by Hendrik Saly)
(by CodeLibs Project)
(by the European Environment Agency)
(by Laurent Broudoux)
(by Laurent Broudoux)
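As the list shows, most common data sources are already covered; in particular, relational databases share a single jdbc-river for data access. The source code for elasticsearch-river-jdbc is at github.com/jprante/elasticsearch-river-jdbc, and the project is documented in detail. The rest of this post uses SQL Server as an example to briefly show how to use it.
First, install elasticsearch-river-jdbc by running the following from the elasticsearch directory: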
./bin/plugin --install jdbc --url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-river-jdbc/1.5.0.0/elasticsearch-river-jdbc-1.5.0.0.zip
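Next, install the SQL Server JDBC library, available as the Microsoft JDBC Driver, and copy its 'sqljdbc4.jar' into the lib folder of the elasticsearch installation directory.
In an elasticsearch cluster, both of the steps above must be carried out on every node.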
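Because both steps have to be repeated on every node, it can help to generate the commands once and reuse them. The following is only a minimal sketch: the node names and the /opt/elasticsearch path are hypothetical, and it assumes that ssh and scp access to each node is already set up. By default it only prints the commands instead of running them.

```python
import shlex

# Plugin archive URL, copied from the install command above.
PLUGIN_URL = (
    "http://xbib.org/repository/org/xbib/elasticsearch/plugin/"
    "elasticsearch-river-jdbc/1.5.0.0/elasticsearch-river-jdbc-1.5.0.0.zip"
)


def install_commands(node, es_home, driver_jar="sqljdbc4.jar"):
    """Return the two commands (as argument lists) that prepare one node."""
    remote_install = (
        f"cd {shlex.quote(es_home)} && "
        f"./bin/plugin --install jdbc --url {PLUGIN_URL}"
    )
    return [
        # Step 1: install the river plugin on the node.
        ["ssh", node, remote_install],
        # Step 2: copy the SQL Server JDBC driver into the node's lib folder.
        ["scp", driver_jar, f"{node}:{es_home}/lib/"],
    ]


# Hypothetical node names and install path; replace with your own.
nodes = ["es-node-1", "es-node-2"]
for node in nodes:
    for cmd in install_commands(node, "/opt/elasticsearch"):
        print(" ".join(shlex.quote(part) for part in cmd))
        # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Printing first makes it easy to review what would run on each node before executing anything.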
The last step is also the most important one: create a river in elasticsearch so that it pulls data from SQL Server automatically.
PUT /_river/mytest_river/_meta
{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "url" : "jdbc:sqlserver://MYSQLSERVERNAME;databaseName=MYProductDatabase",
        "user" : "admin",
        "password" : "Password",
        "sql" : "select ProductID as _id, CategoryID,ManufacturerID,MfName,ProductTitle,MfgPartNumber from MyProductsTable(nolock)",
        "poll" : "10m",
        "strategy" : "simple",
        "index" : "myinventory",
        "type" : "product",
        "bulk_size" : 100,
        "max_retries" : 5,
        "max_retries_wait" : "30s",
        "max_bulk_requests" : 5,
        "bulk_flush_interval" : "5s"
    }
}
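The same request can also be sent from a script, which is convenient when the river definition needs to be parameterized. The sketch below uses only the Python standard library. The helper names build_river_meta and put_river are made up for illustration, and http://localhost:9200 is an assumption about where a node is listening. The actual PUT is left commented out so the script can be run safely without a cluster.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a node on the default HTTP port


def build_river_meta(jdbc_url, user, password, sql, index, doc_type, poll="10m"):
    """Return the river definition as a dict, mirroring the request body above."""
    return {
        "type": "jdbc",
        "jdbc": {
            "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
            "url": jdbc_url,
            "user": user,
            "password": password,
            "sql": sql,
            "poll": poll,
            "strategy": "simple",
            "index": index,
            "type": doc_type,
            "bulk_size": 100,
            "max_retries": 5,
            "max_retries_wait": "30s",
            "max_bulk_requests": 5,
            "bulk_flush_interval": "5s",
        },
    }


def put_river(name, meta, es_url=ES_URL):
    """PUT the definition to /_river/<name>/_meta and return the decoded reply."""
    request = urllib.request.Request(
        url=f"{es_url}/_river/{name}/_meta",
        data=json.dumps(meta).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


meta = build_river_meta(
    jdbc_url="jdbc:sqlserver://MYSQLSERVERNAME;databaseName=MYProductDatabase",
    user="admin",
    password="Password",
    sql=(
        "select ProductID as _id, CategoryID,ManufacturerID,MfName,"
        "ProductTitle,MfgPartNumber from MyProductsTable(nolock)"
    ),
    index="myinventory",
    doc_type="product",
)
print(json.dumps(meta, indent=4))
# put_river("mytest_river", meta)  # uncomment to send it to a running cluster
```

Note that the query aliases ProductID as _id, so each row's ProductID becomes the document ID in the myinventory index.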