DSBulk cannot connect to cluster to load CSV data
Question
I am trying to load CSV files into a Cassandra cluster using the dsbulk utility. I have a local copy of the CSV file and am trying to connect to a remote cluster and load the CSV into a table. However, dsbulk fails to reach the remote cluster and reports:
Could not reach any contact point, make sure you've provided valid addresses
and
Caused by: An existing connection was forcibly closed by the remote host.
I am using the same connection parameters from IntelliJ to connect to the SSL-enabled cluster, and it works fine. I can't figure out why it is not working with dsbulk. Below are the application.conf for dsbulk and the commands I am trying to run:
dsbulk {
--dsbulk.connector.name = csv
--dsbulk.connector.csv.url = <CSV_Path>
--dsbulk.connector.csv.header true
--datastax-java-driver.basic.contact-points = [ "169.XX.XXX.XX", "169.XX.XXX.XX", "169.XX.XXX.XX" ]
--datastax-java-driver.advanced.auth-provider.username = <user_name>
--datastax-java-driver.advanced.auth-provider.password = <pwd>
--dsbulk.schema.keyspace = <key space>
--dsbulk.schema.table = <table>
--datastax-java-driver.advanced.ssl-engine-factory.truststore-path = <cacerts path>
--datastax-java-driver.advanced.ssl-engine-factory.truststore-password = <pwd>
--datastax-java-driver.advanced.resolve-contact-points = true
}
Commands:
$ dsbulk load -url <CSV Path>
The above command doesn't pick up the application.conf properties and tries to connect to 127.0.0.1. Error:
[driver] Error connecting to Node(endPoint=/127.0.0.1:9042, hostId=null, hashCode=2c61adb4)
I'm not really sure why the conf file is not being picked up by dsbulk.
$ dsbulk load -url <CSV Path> -k keyspace -t table -h "[ "169.XX.XXX.XX", "169.XX.XXX.XX", "169.XX.XXX.XX" ]" -u userName -p pwd
The above command fails to connect to the explicitly supplied cluster nodes. Error:
[driver] Error connecting to Node(endPoint=/169.XX.XXX.XX:9042, hostId=null, hashCode=2a38b2fe),
Suppressed: [driver|control|id: 0x17d0139b, L:/172.31.50.184:59702 - R:/169.XX.XXX.XXX:9042] Protocol initialization request, step 1 (OPTIONS): unexpected failure (com.datastax.oss.driver.api.core.connection.ClosedConnectionException: Unexpected error on channel).
Caused by: Unexpected error on channel.
Caused by: An existing connection was forcibly closed by the remote host.
dsbulk retries on all nodes and gives the same error.
Auth is defaulting to plain text, which I believe will work for my use case:
Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
Could you please suggest what the problem is with my config or my connection to the remote cluster?
My actual use case is to archive millions of records from Sybase to Cassandra every week, for which I am trying to create a simple Java utility that executes dsbulk. Any other approach is also appreciated.
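For context, the utility I have in mind would simply shell out to dsbulk, roughly along these lines (a rough sketch only; the dsbulk path, file locations, and schema names are placeholders):

import java.io.IOException;

public class DsbulkArchiver {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Invoke the dsbulk CLI; all paths and names below are placeholders
        ProcessBuilder pb = new ProcessBuilder(
                "/path/to/dsbulk", "load",
                "-url", "/path/to/weekly_export.csv",
                "-k", "mykeyspace",
                "-t", "mytable");
        pb.inheritIO();                  // forward dsbulk's console output
        int exit = pb.start().waitFor(); // block until the load finishes
        System.exit(exit);
    }
}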
Many thanks in advance.
Answer 1
Score: 1
Your application.conf file has incorrect contents. See this documentation on how to construct the config files.
Answer 2
Score: 1
The problem is that you have not formatted the entries in the configuration file correctly, so DSBulk cannot parse them. Since the configuration file is not usable, DSBulk defaults to connecting to localhost (127.0.0.1).
The correct format looks like this:
dsbulk {
connector.name = csv
schema.keyspace = "keyspacename"
schema.table = "tablename"
}
Then you need to define the Java driver options separately which looks like this:
datastax-java-driver {
basic {
contact-points = [ "cp1", "cp2", "cp3"]
}
advanced {
ssl-engine-factory {
keystore-password = "keystorepass"
keystore-path = "/path/to/keystore.file"
class = DefaultSslEngineFactory
truststore-password = "truststorepass"
truststore-path = "/path/to/truststore.file"
}
}
}
If you don't configure SSL correctly, the driver will not be able to connect to any of the nodes, which is the reason for the errors you mentioned.
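Putting the two pieces together for a setup like yours, a complete application.conf might look something like this (a sketch only: the contact points, credentials, keyspace/table names, and truststore path are placeholders, and plain-text auth is assumed since DSBulk inferred PlainTextAuthProvider for you):

dsbulk {
  connector.name = csv
  connector.csv.url = "/path/to/export.csv"   # placeholder
  connector.csv.header = true
  schema.keyspace = "mykeyspace"              # placeholder
  schema.table = "mytable"                    # placeholder
}
datastax-java-driver {
  basic {
    # Java driver 4.x contact points are host:port strings
    contact-points = [ "169.XX.XXX.XX:9042", "169.XX.XXX.XX:9042", "169.XX.XXX.XX:9042" ]
  }
  advanced {
    auth-provider {
      class = PlainTextAuthProvider
      username = "myuser"                     # placeholder
      password = "mypass"                     # placeholder
    }
    ssl-engine-factory {
      class = DefaultSslEngineFactory
      truststore-path = "/path/to/truststore.jks"   # placeholder
      truststore-password = "mypass"                # placeholder
    }
  }
}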
Note that you can place the Java driver configuration in a separate driver.conf file, but you need to make sure you reference it in the application configuration with the line:
include classpath("/path/to/driver.conf")
For details, see Using SSL with DSBulk. Cheers!
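Once the file parses correctly, you can also point DSBulk at it explicitly with the -f flag instead of relying on the default conf/application.conf location (the path below is a placeholder):

$ dsbulk load -f /path/to/application.conf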