How to compile Rust for use with WASM's Shared Memory?
When I run a loop in different Web Workers, the loop shares the counter variable across threads, even though that variable should be thread-local. It should **not** do this, but I don't know how to fix it.

The offending loop is in the `run` function, shown below in the Rust code being compiled to WASM:
```rust
#![no_main]
#![no_std]
use core::panic::PanicInfo;
use js::*;

mod js {
    #[link(wasm_import_module = "imports")]
    extern "C" {
        pub fn abort(msgPtr: usize, filePtr: usize, line: u32, column: u32) -> !;
        pub fn _log_num(number: usize);
    }
}

#[no_mangle]
pub unsafe extern "C" fn run(worker_id: i32) {
    let worker_index = worker_id as u32 - 1;
    let chunk_start = 100 * worker_index;
    let chunk_end = chunk_start + 100; //Total pixels may not divide evenly into number of worker cores.
    for n in chunk_start as usize..chunk_end as usize {
        _log_num(n);
    }
}

#[panic_handler]
unsafe fn panic(_: &PanicInfo) -> ! { abort(0, 0, 0, 0) }
```
`run` is passed the thread ID, ranging from 1 to 3 inclusive, and prints out a hundred numbers - so between them the three threads should log the numbers 0 to 299, albeit in mixed order. I expect to see 1, 2, 3... from thread 1, 101, 102, 103... from thread 2, and 201, 202, 203... from thread 3. If I run the functions sequentially, that is indeed what I see. But if I run them in parallel, each thread "helps" the others, so they'll log something like 1, 4, 7... on the first thread, 2, 6, 9... on the second, and 3, 5, 8... on the third, up to 99, where all three threads stop. Each thread behaves as if it shares `chunk_start`, `chunk_end`, and `n` with the other threads.
It should not do this, because `.cargo/config.toml` specifies `--shared-memory`, so the compiler should use the appropriate locking mechanisms when allocating memory.
```toml
[target.wasm32-unknown-unknown]
rustflags = [
    "-C", "target-feature=+atomics,+mutable-globals,+bulk-memory",
    "-C", "link-args=--no-entry --shared-memory --import-memory --max-memory=2130706432",
]
```
I know this is being picked up, because if I change the `--shared-memory` flag to something else, `rust-lld` complains that it does not know what it is.
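One way to double-check the link step (a debugging sketch, not part of the original setup; whether internal symbols show up by name in the export list depends on the toolchain) is to list the compiled module's exports:

```javascript
//Sketch: dump what sim.wasm actually exports, to confirm the link flags
//took effect. Works in any async context that can fetch the binary.
const module = await WebAssembly.compileStreaming(fetch("sim.wasm"))
console.log(WebAssembly.Module.exports(module).map(e => `${e.kind} ${e.name}`))
//This build's exports include __wasm_init_tls (it's called in the worker
//code below), alongside the exported run function.
```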
wasm-bindgen's parallel demo works fine, so I know it's possible to do this. I just can't spot what they've set to make theirs work.
Perhaps it is something in the way I load my module in the web worker?
```javascript
const wasmSource = fetch("sim.wasm") //kick off the request now, we're going to need it

//See message sending code for why we use multiple messages.
let messageArgQueue = [];
addEventListener("message", ({data}) => {
    messageArgQueue.push(data)
    if (messageArgQueue.length === 4) {
        self[messageArgQueue[0]].apply(0, messageArgQueue.slice(1))
    }
})

self.start = async (workerID, worldBackingBuffer, world) => {
    const wasm = await WebAssembly.instantiateStreaming(wasmSource, {
        env: { memory: worldBackingBuffer },
        imports: {
            abort: (messagePtr, locationPtr, row, column) => {
                throw new Error(`? (?:${row}:${column}, thread ${workerID})`)
            },
            _log_num: num => console.log(`thread ${workerID}: n is ${num}`),
        },
    })

    //Initialise thread-local storage, so we get separate stacks for our local variables.
    wasm.instance.exports.__wasm_init_tls(workerID-1)

    //Loop, running the Rust logging loop when the "tick" advances.
    let lastProcessedTick = 0
    while (1) {
        Atomics.wait(world.globalTick, 0, lastProcessedTick)
        lastProcessedTick = world.globalTick[0]
        wasm.instance.exports.run(workerID)
    }
}
```
`worldBackingBuffer` here is the shared memory for the WASM module, and it's created in the main thread:
```javascript
//Let's count to 300. We'll have three web workers, each taking ⅓rd of the task. 0-100, 100-200, 200-300...

//First, allocate some shared memory. (The original task wants to share some values around.)
const memory = new WebAssembly.Memory({
    initial: 23,
    maximum: 23,
    shared: true,
})

//Then, allocate the data views into the memory.
//This is shared memory which will get updated by the worker threads, off the main thread.
const world = {
    globalTick: new Int32Array(memory.buffer, 1200000, 1), //Current global tick. Increment to tell the workers to count up in scratchA!
}

//Load a core and send the "start" event to it.
const startAWorkerCore = coreIndex => {
    const worker = new Worker('worker/sim.mjs', {type:'module'})
    //Marshal the "start" message across multiple postMessages because of the following bugs:
    //1. Must transfer memory BEFORE world. https://bugs.chromium.org/p/chromium/issues/detail?id=1421524
    //2. Must transfer world BEFORE memory. https://bugzilla.mozilla.org/show_bug.cgi?id=1821582
    ;['start', coreIndex+1, memory, world].forEach(arg => worker.postMessage(arg))
}

//Now, let's start some worker threads! They will work on different memory locations, so they don't conflict.
startAWorkerCore(0) //works fine
startAWorkerCore(1) //breaks counting - COMMENT THIS OUT TO FIX COUNTING
startAWorkerCore(2) //breaks counting - COMMENT THIS OUT TO FIX COUNTING

//Run the simulation thrice. Each thread should print a hundred numbers in order, thrice.
//For thread 1, it should print 0, then 1, then 2, etc. up to 99.
//Thread 2 should run from 100 to 199, and thread 3 200 to 299.
//But when they're run simultaneously, all three threads seem to use the same counter.
setTimeout(tick, 500)
setTimeout(tick, 700)
setTimeout(tick, 900)

function tick() {
    Atomics.add(world.globalTick, 0, 1)
    Atomics.notify(world.globalTick, 0)
}
```
But this looks pretty normal. Why am I seeing memory corruption in my Rust for-loop?
Answer 1

Score: 1

There is some magic being done in wasm-bindgen - the start section is replaced/injected with code that fixes up memory. Although there seem to be issues with it:

https://github.com/rustwasm/wasm-bindgen/discussions/3474
https://github.com/rustwasm/wasm-bindgen/discussions/3487
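For reference, here is a minimal sketch of what that injected per-thread setup conceptually does, adapted to the names in the question. It is not the actual wasm-bindgen transform, and the export names are assumptions: it presumes the stack pointer is exported as a mutable global `__stack_pointer` and the TLS block size as `__tls_size` (a stock rust-lld build may not export the stack pointer by name, which is a large part of why wasm-bindgen rewrites the binary instead), and that the memory from `scratchBase` upward is otherwise unused.

```javascript
//A sketch only, not the real wasm-bindgen transform. Assumed exports:
//__stack_pointer (mutable i32 global - exporting it is only legal with
//+mutable-globals) and __tls_size (immutable i32 global).
const STACK_SIZE = 1 << 20 //1 MiB of stack per worker - an arbitrary choice.

function initWorkerThread(exports, workerIndex, scratchBase) {
    //Give this worker its own stack region. The shadow stack grows downward,
    //so point the stack pointer at the region's high end. Globals are
    //per-instance, but every instance starts from the same initial value, so
    //without this step all threads spill their locals into the same memory.
    exports.__stack_pointer.value = scratchBase + (workerIndex + 1) * STACK_SIZE

    //__wasm_init_tls expects the *address* of a free block of __tls_size
    //bytes, not a thread index like the workerID-1 in the question.
    //Here the TLS blocks are placed above the three stack regions.
    const tlsSize = exports.__tls_size.value
    const tlsBase = scratchBase + 3 * STACK_SIZE + workerIndex * tlsSize
    exports.__wasm_init_tls(tlsBase)
}
```

Under those assumptions, each worker would call `initWorkerThread(wasm.instance.exports, workerID - 1, scratchBase)` right after instantiation, in place of the bare `__wasm_init_tls(workerID - 1)` call.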