Question

I'm new to ARM NEON intrinsics. I would like to scale a float32 array by a scalar (2^13 = 8192) and convert the result to an int16_t array.
I believe I need to perform the following steps:

1. Load the float buffer.
2. Multiply by the scalar (2^13 = 8192).
3. Convert the products to 32-bit integers.
4. Narrow the 32-bit integers to 16-bit integers.
5. Store them into the 16-bit buffer.

Could you please check and correct the code below?

    // convert float to int16
    uint32_t blk_cnt;
    float32x4_t f32x4;
    int32x4_t i32x4;
    int16x4_t i16x4;
    float32_t scale = 8192.0;
    /* Compute 4 complex samples at a time */
    blk_cnt = sz >> 2U;
    while (blk_cnt > 0U) {
        f32x4 = vld1q_f32 ((float32_t *) inpout);
        f32x4 = vmulq_n_f32(f32x4, scale);
        i32x4 = vcvtq_s32_f32 (f32x4);
        i16x4 = vmovn_s32 (i32x4);
        vst1_s16 (out, i16x4);      
        /* Increment pointers */
        out += 4;
        inpout += 4;
        /* Decrement the loop counter */
        blk_cnt--;
    }
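The following portable scalar model shows what the loop above does to a single lane. It is written for illustration only, and the function names are hypothetical. It assumes the documented NEON semantics:

- vcvtq_s32_f32 rounds toward zero and saturates to the int32 range, with NaN becoming 0.
- vmovn_s32 keeps only the low 16 bits, so values outside [-32768, 32767] wrap around instead of clamping.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper modelling one lane of vcvtq_s32_f32:
 * round toward zero, saturate to int32, NaN -> 0. */
static int32_t model_cvt_s32_f32(float x)
{
    if (isnan(x)) return 0;
    if (x >= 2147483648.0f) return INT32_MAX;
    if (x < -2147483648.0f) return INT32_MIN;
    return (int32_t)x; /* in range: the C cast truncates toward zero */
}

/* Hypothetical helper modelling one lane of vmovn_s32:
 * keep the low 16 bits, no saturation. */
static int16_t model_movn_s32(int32_t v)
{
    uint16_t low = (uint16_t)v; /* conversion to unsigned is modulo 2^16 */
    if (low >= 0x8000u)
        return (int16_t)((int32_t)low - 65536); /* reinterpret as negative */
    return (int16_t)low;
}

/* One lane of the question's loop: multiply by 8192, convert, narrow. */
static int16_t model_question_lane(float x)
{
    return model_movn_s32(model_cvt_s32_f32(x * 8192.0f));
}
```

For example, 4.0f scales to 32768, which does not fit in int16_t. The narrowing step turns it into -32768 instead of clamping it to 32767. Likewise, 0.99999f becomes 8191 rather than 8192, because of truncation.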

Answer 1

Score: 1

You are most likely dealing with Q13 fixed-point values (13 fractional bits).
You just need to convert the floats directly to Q13 int32 values with vcvt_n_s32_f32 (vcvtq_n_s32_f32 for a float32x4_t), then narrow them to int16 with vqmovn_s32, which saturates instead of wrapping.
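For reference, the Q13 interpretation works out as below. This is standard fixed-point arithmetic, not something the answer states. It also shows why out-of-range inputs need saturation.

```latex
x = \frac{q}{2^{13}}, \qquad q \in \left[-2^{15},\; 2^{15}-1\right]
\;\Longrightarrow\;
x \in \left[-4,\; 4 - 2^{-13}\right] \approx [-4,\; 3.99988],
\qquad \text{step size } 2^{-13} \approx 1.22 \times 10^{-4}
```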

https://developer.arm.com/documentation/dui0473/m/vfp-instructions/vcvt--between-floating-point-and-fixed-point-
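Below is a minimal sketch of the answer's approach applied to a whole buffer. Its assumptions:

- `sz` is the element count.
- Truncation toward zero is acceptable. vcvtq_n_s32_f32 rounds toward zero, like the question's vcvtq_s32_f32.
- `float_to_q13` and `q13_from_float_scalar` are hypothetical names.

The NEON path is compiled only when `__ARM_NEON` is defined. The scalar fallback mirrors the same truncate-then-saturate semantics, so the sketch also builds on other targets. It also converts the `sz % 4` leftover elements that the question's loop skips.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar equivalent of vcvtq_n_s32_f32(x, 13) followed by vqmovn_s32:
 * scale by 2^13, truncate toward zero, saturate to int16, NaN -> 0. */
static int16_t q13_from_float_scalar(float x)
{
    if (isnan(x)) return 0;
    float scaled = x * 8192.0f;       /* power-of-two scale: exact */
    if (scaled >= 32767.0f) return INT16_MAX;
    if (scaled <= -32768.0f) return INT16_MIN;
    return (int16_t)scaled;           /* truncates toward zero */
}

/* Hypothetical name: converts sz floats to Q13 int16 values. */
void float_to_q13(const float *in, int16_t *out, size_t sz)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 4 <= sz; i += 4) {
        float32x4_t f = vld1q_f32(in + i);
        int32x4_t q = vcvtq_n_s32_f32(f, 13); /* x * 2^13, toward zero */
        int16x4_t h = vqmovn_s32(q);          /* saturating narrow */
        vst1_s16(out + i, h);
    }
#endif
    /* Leftover elements (or the whole array on non-NEON targets). */
    for (; i < sz; ++i)
        out[i] = q13_from_float_scalar(in[i]);
}
```

If round-to-nearest is preferred, vcvtnq_s32_f32 (an ARMv8 intrinsic, not available on ARMv7 NEON) could be applied to the product of vmulq_n_f32(f, 8192.0f) in place of the vcvtq_n_s32_f32 call. The scalar fallback would then need matching rounding.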
