What is the best way to stream MySQL table data to Elasticsearch in real time?

Question

I need to feed MySQL table data (deployed on AWS RDS) into Elasticsearch in real time or near real time (a delay of a few minutes is acceptable), joining a couple of tables in the process.

The first option I investigated was Flink, but after some research I couldn't find a way to stream table changes, because the tables are not append-only (rows are updated and deleted, not just inserted).

Then I found people talking about CDC (Change Data Capture): streaming MySQL binlog changes to a Lambda function, parsing them, and then posting them to Elasticsearch. But this sounds too complicated and error-prone.

Is there a tried-and-true industry approach for syncing non-append-only tables to Elasticsearch?
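As a rough illustration of what the CDC step described above involves once a binlog change has been decoded: each insert or update can become an `index` action keyed by the primary key, and each delete a `delete` action, in an Elasticsearch `_bulk` request body. The sketch below is pure Python with no network calls; the event shape is an assumption loosely modeled on Debezium's change-event envelope (`op`, `before`, `after`), and `to_bulk_ndjson` is a hypothetical helper, not part of any library.

```python
from __future__ import annotations

import json
from typing import Any, Iterable


def to_bulk_ndjson(events: Iterable[dict[str, Any]], index: str, key: str = "id") -> str:
    """Turn row-level change events into an Elasticsearch _bulk request body (NDJSON).

    Assumed (hypothetical) event shape, loosely modeled on Debezium's envelope:
    {"op": "c" | "u" | "d" | "r", "before": row or None, "after": row or None}
    """
    lines: list[str] = []
    for event in events:
        op = event["op"]
        if op in ("c", "u", "r"):  # create, update, snapshot read
            row = event["after"]
            # "index" creates the document or fully replaces the one with this _id,
            # so replaying an event after a crash does not create duplicates.
            lines.append(json.dumps({"index": {"_index": index, "_id": str(row[key])}}))
            # default=str serializes values such as datetimes as strings.
            lines.append(json.dumps(row, default=str))
        elif op == "d":  # delete: only the "before" image is available
            row = event["before"]
            # Delete actions have no source line in the bulk format.
            lines.append(json.dumps({"delete": {"_index": index, "_id": str(row[key])}}))
        else:
            raise ValueError(f"unexpected op: {op!r}")
    # A bulk body is newline-delimited JSON and must end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""


# Hypothetical events for one row that is created, renamed, then deleted.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "Ann"}},
    {"op": "u", "before": {"id": 1, "name": "Ann"}, "after": {"id": 1, "name": "Anne"}},
    {"op": "d", "before": {"id": 1, "name": "Anne"}, "after": None},
]
print(to_bulk_ndjson(events, index="customers"), end="")
```

The resulting string could be POSTed to the `_bulk` endpoint with `Content-Type: application/x-ndjson`. Keying every action on the primary key is what lets updates and deletes be applied in place; reading the binlog reliably, ordering events, and retrying failed bulk items are the parts that make a hand-rolled version of this error-prone.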

Answer 1

Score: 1

You can use a Logstash pipeline with the JDBC input plugin to pull data from MySQL into Elasticsearch.

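Sample Logstash configuration:

    input {
      jdbc {
        # Path to the MySQL Connector/J jar (placeholder: use your own path)
        jdbc_driver_library => "<pathToYourDataBaseDriver>/mysql-connector-java-5.1.39.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_connection_string => "jdbc:mysql://localhost:3306/ecomdb"
        jdbc_user => "<db username>"
        jdbc_password => "<db password>"
        # Remember the last seen value of tracking_column between runs
        use_column_value => true
        tracking_column => "regdate"
        # Assumes regdate is a DATETIME/TIMESTAMP column (the default type is numeric)
        tracking_column_type => "timestamp"
        # Fetch only rows newer than the last run; ORDER BY keeps sql_last_value correct
        statement => "SELECT * FROM ecomdb.customer WHERE regdate > :sql_last_value ORDER BY regdate"
        # Cron syntax: run once a minute
        schedule => "* * * * *"
      }
    }
    output {
      elasticsearch {
        # Use the primary key as the document ID so re-sent rows overwrite instead of duplicating
        document_id => "%{id}"
        # Only meaningful for Elasticsearch 6.x and earlier; drop it on 7.x+ (mapping types are deprecated)
        document_type => "doc"
        index => "test"
        hosts => ["http://localhost:9200"]
      }
      # Print each event to the console for debugging
      stdout {
        codec => rubydebug
      }
    }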
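For non-append-only tables it helps to see what this polling approach does and does not pick up. The Python sketch below simulates the `WHERE <column> > :sql_last_value` loop and the `document_id => "%{id}"` overwrite entirely in memory (no Logstash or MySQL involved); the `updated_at` column, the helper names, and the data are hypothetical. It suggests that tracking `regdate` misses later updates, tracking a last-modified column catches them, and neither sees hard deletes, which would need a soft-delete flag or a CDC pipeline.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any


def poll(rows: list[dict[str, Any]], last_value: Any, column: str) -> tuple[list[dict[str, Any]], Any]:
    """Mimic `WHERE <column> > :sql_last_value ORDER BY <column>`.

    Returns the matching rows and the new sql_last_value (taken from the last row).
    """
    batch = sorted((r for r in rows if r[column] > last_value), key=lambda r: r[column])
    return batch, (batch[-1][column] if batch else last_value)


def upsert(index: dict[str, dict[str, Any]], batch: list[dict[str, Any]], key: str = "id") -> None:
    """Mimic document_id => "%{id}": a re-sent row overwrites its document instead of duplicating it."""
    for row in batch:
        index[str(row[key])] = dict(row)


def sync(snapshots: list[list[dict[str, Any]]], column: str) -> dict[str, dict[str, Any]]:
    """Poll each successive table snapshot once, tracking `column`; return the resulting index."""
    index: dict[str, dict[str, Any]] = {}
    last_value: Any = datetime.min
    for rows in snapshots:
        batch, last_value = poll(rows, last_value, column)
        upsert(index, batch)
    return index


# Hypothetical customer table at two points in time: between the polls,
# row 1 is renamed (regdate unchanged, updated_at bumped) and row 2 is deleted.
before = [
    {"id": 1, "name": "Ann", "regdate": datetime(2023, 1, 1), "updated_at": datetime(2023, 1, 1)},
    {"id": 2, "name": "Bob", "regdate": datetime(2023, 1, 2), "updated_at": datetime(2023, 1, 2)},
]
after = [
    {"id": 1, "name": "Anne", "regdate": datetime(2023, 1, 1), "updated_at": datetime(2023, 3, 1)},
]

by_regdate = sync([before, after], "regdate")
by_updated_at = sync([before, after], "updated_at")
print(by_regdate["1"]["name"], sorted(by_regdate))        # prints: Ann ['1', '2']
print(by_updated_at["1"]["name"], sorted(by_updated_at))  # prints: Anne ['1', '2']
```

In both runs the deleted row 2 stays in the index, because a `SELECT` can never return a row that no longer exists.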
