The image buffer stops getting produced after a few times
Question
I'm using the Core Image framework to detect smiles in the VIPER architecture.
ViewController
extension ViewController: AVCaptureDataOutputSynchronizerDelegate {
    func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer, didOutput synchronizedDataCollection: AVCaptureSynchronizedDataCollection) {
        guard let syncedDepthData: AVCaptureSynchronizedDepthData =
                synchronizedDataCollection.synchronizedData(for: multifaceCamera.depthDataOutput) as? AVCaptureSynchronizedDepthData,
              let videoDataOutput = self.videoDataOutput,
              let syncedVideoData: AVCaptureSynchronizedSampleBufferData =
                synchronizedDataCollection.synchronizedData(for: videoDataOutput) as? AVCaptureSynchronizedSampleBufferData else {
            return
        }
        if syncedDepthData.depthDataWasDropped || syncedVideoData.sampleBufferWasDropped {
            return
        }
        let depthData = syncedDepthData.depthData
        let _depthData = depthData.converting(toDepthDataType: kCVPixelFormatType_DepthFloat16)
        let depthPixelBuffer = _depthData.depthDataMap
        let sampleBuffer = syncedVideoData.sampleBuffer
        guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            return
        }
        Task {
            await presenter.sendOutput(output: imageBuffer, depthPixelBuffer: depthPixelBuffer)
        }
    }
}
The view controller captures the sample buffer and the depth data and sends them to the presenter. The depth data comes from the TrueDepth camera, but its use is not shown in this example code because it is irrelevant to the problem.
Presenter
func sendOutput(output: CVPixelBuffer, depthPixelBuffer: CVPixelBuffer?) async {
    await interactor.sendOutput(output: output, depthPixelBuffer: depthPixelBuffer)
}
The sendOutput method in the presenter class relays the data to the interactor.
Interactor
func sendOutput(output: CVPixelBuffer, depthPixelBuffer: CVPixelBuffer?) async {
    let image = CIImage(cvPixelBuffer: output)
    let result = await isSmiling(image: image)
    /// Do something with the result
}

func isSmiling(image: CIImage) async -> Bool {
    let detector = CIDetector(ofType: CIDetectorTypeFace, context: nil, options: [CIDetectorAccuracy: CIDetectorAccuracyHigh, CIDetectorSmile: true])
    let faces = detector?.features(in: image, options: [CIDetectorSmile: true as AnyObject]) as? [CIFaceFeature]
    if let facesFound = faces {
        for face in facesFound {
            return face.hasSmile
        }
    } else {
        return false
    }
    return false
}
The CIDetector in the interactor class uses the image buffer sent from the presenter to detect the smile status of faces. The problem I'm experiencing is that detector?.features(in: image, options: [CIDetectorSmile: true as AnyObject]) as? [CIFaceFeature] doesn't produce any results. I don't mean that it returns nil. The method is called a few times and then simply stops, as if it had realized that the features method produces nothing or that the task is complete. It doesn't crash and doesn't gracefully return nil or an error; the image buffer simply stops being produced by the view controller. Logging shows that the detector object is created properly with let detector = CIDetector(ofType: CIDetectorTypeFace, context: nil, options: [CIDetectorAccuracy: CIDetectorAccuracyHigh, CIDetectorSmile: true]).
Just to test that the image buffer is being created properly, I tried using Vision's VNDetectFaceLandmarksRequest to detect faces, and it turns out this works perfectly.
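A minimal sketch of that kind of Vision check (the function name here is a placeholder, not from the actual project; only VNDetectFaceLandmarksRequest and VNImageRequestHandler are the types referred to above):

import Vision
import CoreVideo

/// Sanity check: run face-landmark detection on the incoming buffer and return the
/// number of faces found. The request is performed synchronously on the calling thread.
func visionFaceCount(in pixelBuffer: CVPixelBuffer) -> Int {
    let request = VNDetectFaceLandmarksRequest()
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    do {
        try handler.perform([request])   // throws if the request cannot be performed
    } catch {
        return 0                         // treat a failed request as "no faces found"
    }
    return request.results?.count ?? 0   // number of VNFaceObservation results
}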
What's confusing is that if I use the same isSmiling method in the view controller, it works fine, as shown in the following example. This makes me wonder whether the issue is related to Task and async/await (although Vision's face detection worked fine in the interactor).
ViewController
extension ViewController: AVCaptureDataOutputSynchronizerDelegate {
    func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer, didOutput synchronizedDataCollection: AVCaptureSynchronizedDataCollection) {
        guard let syncedDepthData: AVCaptureSynchronizedDepthData =
                synchronizedDataCollection.synchronizedData(for: multifaceCamera.depthDataOutput) as? AVCaptureSynchronizedDepthData,
              let videoDataOutput = self.videoDataOutput,
              let syncedVideoData: AVCaptureSynchronizedSampleBufferData =
                synchronizedDataCollection.synchronizedData(for: videoDataOutput) as? AVCaptureSynchronizedSampleBufferData else {
            return
        }
        if syncedDepthData.depthDataWasDropped || syncedVideoData.sampleBufferWasDropped {
            return
        }
        let depthData = syncedDepthData.depthData
        let _depthData = depthData.converting(toDepthDataType: kCVPixelFormatType_DepthFloat16)
        let depthPixelBuffer = _depthData.depthDataMap
        let sampleBuffer = syncedVideoData.sampleBuffer
        guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            return
        }
        Task {
            let image = CIImage(cvPixelBuffer: imageBuffer)
            let result = await isSmiling(image: image)
            /// Do something with the result
        }
    }
}
The only error message I'm getting in the Xcode console is:
>The input tensor width must be aligned to 64-byte.
which I'm not even sure is related to CIDetector.
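For what it's worth, the alignment of the incoming buffer can be checked directly (a small diagnostic sketch for illustration; it is not part of the original code):

import CoreVideo

/// Log the row stride of the captured pixel buffer to see whether it is actually
/// 64-byte aligned. (Assumed diagnostic, not from the original project.)
func logRowAlignment(of pixelBuffer: CVPixelBuffer) {
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
    print("width: \(width) px, bytesPerRow: \(bytesPerRow), 64-byte aligned: \(bytesPerRow % 64 == 0)")
}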
I tried setting AVCaptureVideoDataOutput as follows:
videoDataOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: Int(kCVPixelFormatType_64RGBALE)]
/// or kCVPixelFormatType_64RGBALE
or even selecting an available format whose width is divisible by 16 or 64 and assigning it to the device's active format, thinking that, given that each pixel is 4 bytes, the width of the image in pixels needs to be a multiple of 16 for the total width in bytes to be a multiple of 64:
for format in availableFormats {
    let dimensions = CMVideoFormatDescriptionGetDimensions(format.formatDescription)
    if dimensions.width % 16 == 0 {
        byteAlignedFormats.append(format)
    }
}

/// assign one of the byteAlignedFormats to be the device's active format
let deviceInput = try AVCaptureDeviceInput(device: device)
try device.lockForConfiguration()
device.activeFormat = byteAlignedFormat
device.unlockForConfiguration()
Unfortunately, none of the above attempts have worked.
Answer 1
Score: 1
You are most likely running out of buffers.
AVFoundation only reserves a limited number of frame buffers for camera capture. When this pool is empty, it simply stops delivering new frames – without any warning or error... 🙄 And it will not restart automatically, even if buffers are available again.
The problem here is that you are doing the face detection async, which is a good idea in theory. However, since the detection takes longer than capturing a new frame, your async queue will fill up with detection tasks, each of them retaining the corresponding capture buffer. And as soon as the capture buffer pool is empty, the session stops capturing.
What you could do is, as soon as a new buffer arrives from the camera, check if the detector is still busy checking the previous frame for faces. If this is the case, you can discard the frame and check again when the next frame arrives. This way, buffers are freed immediately instead of being retained by the async queue.
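A rough sketch of that idea (the names isDetecting and detectSmile(in:) are placeholders, not from the question's code; the important part is the flag that lets frames be dropped while detection is still running):

import CoreImage
import CoreVideo
import Dispatch

/// Drops incoming frames while a detection task is still in flight, so capture buffers
/// are released immediately instead of piling up behind async work.
/// The flag is only mutated via the capture delegate's queue to avoid races.
final class SmileDetectionGate {
    private var isDetecting = false
    private let captureQueue: DispatchQueue   // the queue the delegate callback runs on

    init(captureQueue: DispatchQueue) {
        self.captureQueue = captureQueue
    }

    /// Call this from dataOutputSynchronizer(_:didOutput:) for every synchronized frame.
    func handle(_ imageBuffer: CVPixelBuffer) {
        guard !isDetecting else { return }    // detector still busy, so drop this frame
        isDetecting = true
        Task {
            let image = CIImage(cvPixelBuffer: imageBuffer)
            let smiling = await self.detectSmile(in: image)   // e.g. the existing isSmiling(image:)
            // ... do something with `smiling`, e.g. forward it to the presenter ...
            _ = smiling
            self.captureQueue.async { self.isDetecting = false }   // reopen the gate
        }
    }

    private func detectSmile(in image: CIImage) async -> Bool {
        false   // placeholder for the question's isSmiling(image:) implementation
    }
}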