2023年2月6日 21:17:47go评论54阅读模式

英文:

Compression algorithm for contiguous numbers

问题

I'm looking for an efficient encoding for storing simulated coefficients.

The data has thousands of curves with each 512 contiguous numbers with single precision. The data may be stored as fixed point while it should preserve about 23-bit precision (compared to unity level).

The curves could look like those:

My best approach was to convert the numbers to 24-bit fixed point. Repeatedly I took the adjacent difference as long as the sum-of-squares decreases. When compressing the resulting data using LZMA (xz,lzip) I get about 7.5x compression (compared to float32).

Adjacent differences are good at the beginning, but they emphasize the quantization noise at each turn.

I've also tried the cosine transform after subtracting the slope/curve at the boundaries. The resulting compression was much weaker.

I tried AEC but LZMA compressed much stronger. The highest compression was using bzip3 (after adjacent differences).

I found no function to fit the data with high precision and a limited parameter count.

Is there a way to reduce the penalty of quantization noise when using adjacent differences?

Are there encodings which are better suited for this type of data?

英文:

I'm looking for an efficient encoding for storing simulated coefficients.

The curves could look like those:

Adjacent differences are good at the beginning, but they emphasize the quantization noise at each turn.

I've also tried the cosine transform after subtracting the slope/curve at the boundaries. The resulting compression was much weaker.

I tried AEC but LZMA compressed much stronger. The highest compression was using bzip3 (after adjacent differences).

I found no function to fit the data with high precision and a limited parameter count.

Is there a way to reduce the penalty of quantization noise when using adjacent differences?

Are there encodings which are better suited for this type of data?

答案1

得分: 2

你可以尝试使用高阶预测器。你的"邻近差异"是零阶预测器，其中下一个样本被预测为与上一个样本相等。你获取实际值和预测值之间的差异，然后压缩这些差异。

你可以尝试一阶、二阶等预测器。一阶预测器将查看最后两个样本，连接这两个样本，然后预测下一个样本将落在这条直线上。二阶预测器将查看最后三个样本，拟合它们成一个抛物线，并预测下一个样本将落在这个抛物线上。以此类推。

假设你的样本在x轴上等间隔分布，那么对于x[0]到三阶预测器的预测如下：

x[-1]（你现在正在使用的）
2*x[-1] - x[-2]
3*x[-1] - 3*x[-2] + x[-3]
4*x[-1] - 6*x[-2] + 4*x[-3] - x[-4]

（请注意，系数是交替符号的二项式系数。）

我怀疑三阶多项式预测器对你来说可能不太有用，但可以尝试所有这些预测器，看看哪一个有帮助。

假设差异很小，你应该使用可变长度的整数来表示它们。想法是大多数时候使用一个字节来编码每个差异。例如，你可以在一个字节中编码七位差异，例如-64到63，高位清零。如果差异不适应这个范围，那么将设置高位，并使用第二个字节来编码另外七位，总共有14位，第二个高位清零。以此类推，对于更大的差异采用相应的编码方式。

英文:

You could try a higher-order predictor. Your "adjacent difference" is a zeroth-order predictor, where the next sample is predicted to be equal to the last sample. You take the differences between the actuals and the predictions, and then compress those differences.

You can try first, second, etc. order predictors. A first-order predictor would look at the last two samples, draw a line between those, and predict that the next sample will fall on the line. A second-order predictor would look at the last three samples, fit those to a parabola, and predict that the next sample will fall on the parabola. And so on.

Assuming that your samples are equally spaced on your x-axis, then the predictors for x[0] up through cubics are:

x[-1] (what you're using now)
2*x[-1] - x[-2]
3*x[-1] - 3*x[-2] + x[-3]
4*x[-1] - 6*x[-2] + 4*x[-3] - x[-4]

(Note that the coefficients are alternating-sign binomial coefficients.)

I doubt that the cubic polynomial predictor will be useful for you, but experiment with all of them to see if any help.

Assuming that the differences are small, you should use a variable-length integer to represent them. The idea would be to use one byte for each difference most of the time. For example, you could code seven bits of difference, say -64 to 63, in one byte with the high bit clear. If the difference doesn't fit in that, then make the high bit set, and have a second byte with another seven bits for a total of 14 with that second high bit clear. And so on for larger differences.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

压缩连续数字的算法

问题

答案1

使Go编译器默认使用int64

Information theory problem, sets of numbers 信息论问题，数集

Convert any encoding to UTF 8 in Go

如何将具有相同内部结构的两个 JSON 解组为一个单独的 Golang 结构体？

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论