batchPoints with influxdb-java getting overwritten unless forced unique time?
Question
My project is currently using influxdb-java to connect to Influx 1.8.2. My code looks something like this:
InfluxDB connection = InfluxDBFactory.connect(server, client);
connection.enableBatch(100, 10, TimeUnit.MILLISECONDS);
connection.setDatabase(database);
BatchPoints batchPoints = BatchPoints.database(database).build();
long currTime = System.currentTimeMillis() * 1000000;
double[] data1 = getInfluxData();
for (int i = someInt; i < data1.length; i++) {
    if (i % someInt == 0) {
        double[] data2 = processData(data1, i - someInt, i);
        for (int j = 0; j < data2.length; j++) {
            Point p = Point.measurement(someTable)
                .time(currTime + i + j, TimeUnit.NANOSECONDS) // Line A (j < someInt)
                .tag("someTag", "someTagValue")
                .addField("someField", data2[j])
                .build();
            batchPoints.point(p);
        }
    }
}
connection.write(batchPoints);
connection.disableBatch();
connection.close();
Currently, Line A needs to be present, otherwise it seems my points overwrite each other and only one point actually gets written out to someTable. Line A is thus a hacky solution, but it's the only way I can get all points written out. The inspiration for using batchPoints like this came from Influx's own example performance tests, where they don't even specify a time. So if not for Line A, what am I doing wrong that's causing the points to overwrite each other? For context, I might switch to influxdb-client-java instead, in case that alleviates this issue.
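For clarity, here is a minimal sketch of what Line A does to the timestamps (the loop values below are made up for illustration): the base time is wall-clock milliseconds scaled to nanoseconds, and the i + j offset nudges each point forward by a few nanoseconds so that no two points in the batch share a timestamp.

long baseTimeNs = System.currentTimeMillis() * 1000000; // same base as currTime above

// e.g. i = someInt = 5, with j = 0 and j = 1:
long firstPointTime = baseTimeNs + 5 + 0;  // timestamp for the first point
long secondPointTime = baseTimeNs + 5 + 1; // 1 ns later, so it is stored as a separate point

// If both points carried the same timestamp (and the same tags), one would replace the other.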
Answer 1
Score: 1
@Isaac -
If you are using InfluxDB 1.8+, I would consider switching to influxdb-client-java, as it seems more frequently maintained.
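Here is a rough sketch of what the same kind of write could look like with influxdb-client-java talking to a 1.8 server through its v1-compatibility factory (untested; serverUrl, username, password, database, and retentionPolicy are placeholders you would fill in):

import java.util.ArrayList;
import java.util.List;

import com.influxdb.client.InfluxDBClient;
import com.influxdb.client.InfluxDBClientFactory;
import com.influxdb.client.WriteApiBlocking;
import com.influxdb.client.domain.WritePrecision;
import com.influxdb.client.write.Point;

// Connect to InfluxDB 1.8 via the v1 compatibility factory.
InfluxDBClient client = InfluxDBClientFactory.createV1(
        serverUrl, username, password.toCharArray(), database, retentionPolicy);
WriteApiBlocking writeApi = client.getWriteApiBlocking();

List<Point> points = new ArrayList<>();
long baseTimeNs = System.currentTimeMillis() * 1000000L;
// Each point still needs a unique timestamp (or a differing tag set) to avoid overwrites.
points.add(Point.measurement("someTable")
        .addTag("someTag", "someTagValue")
        .addField("someField", 1.23)
        .time(baseTimeNs, WritePrecision.NS));

writeApi.writePoints(points);
client.close();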
For your code sample using influxdb-java, the BatchPoints builder is not being used idiomatically. I'd say testWritePerformance in the example tests is also strange, and that test is disabled.
I'm eliding much of your code.
// setup ...
BatchPoints.Builder batchPointsBuilder = BatchPoints.database(database);
// start for loops ...
batchPointsBuilder.point(p);
// finish loops ...
connection.write(batchPointsBuilder.build());
// setdown
BUT looking at the Java code and yours, I don't think this is the problem, as point(Point p) is appending Points to a List. So it shouldn't be overwriting here.
If you leave out the timestamp, all the points will get the same timestamp because the batch arrives all at once (this is how an omitted timestamp is determined in InfluxDB - when the Point arrives). I can't see your data and tags, but if all the points are identical in field keys, tag keys, tag values, and timestamp, then they will overwrite/replace each other. I suspect your data/tags/fields are supposed to be different; double check it.
Lastly, I'd have you check (and share, if you will) how you know that all the points are overwriting each other. What query are you using?
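For example, a quick check along these lines (a sketch that reuses the open connection from your snippet; the measurement and field names are your placeholders):

import org.influxdb.dto.Query;
import org.influxdb.dto.QueryResult;

// Count how many field values actually landed, and peek at the newest rows
// so you can compare their tags and timestamps.
QueryResult count = connection.query(
        new Query("SELECT COUNT(\"someField\") FROM \"someTable\"", database));
QueryResult latest = connection.query(
        new Query("SELECT * FROM \"someTable\" ORDER BY time DESC LIMIT 10", database));
System.out.println(count);
System.out.println(latest);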
~There's a typo in your sample code at for (int i=someInt; i < data.length; i++){ - it should be data1.~
Answer 2
Score: 1
In InfluxDB, all points with the same tag values and timestamp are considered duplicates, even when they have different field values, so such data points are silently overwritten. This is per the InfluxDB design.
Is it a limitation of influx that only one point can exist with the same tags and timestamp?
> Yes, this is by design.
>
> If a collision does occur, does the old one get overwritten?
>
> Yes, it is silently overwritten.
>
> This is documented but not well. I opened influxdata/influxdb.com#324 to address that.
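To see this behaviour in isolation, here is a minimal sketch using influxdb-java (it reuses the connection, imports, and placeholder names from the question): both points share the same measurement, tag set, and timestamp, so the second write silently replaces the first field value and a query returns a single row.

long ts = System.currentTimeMillis() * 1000000L;

// Same measurement, same tag set, same timestamp -> InfluxDB treats these as
// one point, and the conflicting field value from the second write wins.
Point first = Point.measurement("someTable")
        .time(ts, TimeUnit.NANOSECONDS)
        .tag("someTag", "someTagValue")
        .addField("someField", 1.0)
        .build();
Point second = Point.measurement("someTable")
        .time(ts, TimeUnit.NANOSECONDS)
        .tag("someTag", "someTagValue")
        .addField("someField", 2.0)
        .build();

connection.write(first);
connection.write(second);
// SELECT * FROM "someTable" now returns a single row with someField = 2.0.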