Tflite custom object detection model for flutter application using google_mlkit_object_detection package
Question
I'm trying to create a custom object detection model in tflite format so that I can use it in a flutter application with the google_mlkit_object_detection package.
I created the first few models using YOLOv8 and converted them to TFLite. The annotations were made with Roboflow, and the Google Colab notebook I used was provided by them; the metadata of the converted TFLite model looks like the following image.
On this model I was getting the error:
Input tensor has type kTfLiteFloat32: it requires specifying NormalizationOptions metadata to preprocess input images.
So, as suggested, I tried to change the metadata and add NormalizationOptions, but failed to do so. My second alternative was to train a model with the official TensorFlow Google Colab notebook, TensorFlow Lite Model Maker, which generated a model with the following metadata
For this model the error was:
Unexpected number of dimensions for output index 1: got 3D, expected either 2D (BxN with B=1) or 4D
So I checked the model from the example app of the package I am using, google_mlkit_object_detection, and its metadata looks like this
So my question is: how can I alter the models I already trained (whichever is easier) to match this, for both input and output? Do I have to alter my model's architecture, or just the metadata? For the second model, trained with the official TensorFlow notebook, it seems that all I have to do is produce the correct output shape [1, N], but again I might have to change the architecture.
Answer 1
Score: 1
The Custom Models page on the Google documentation says this:
> Note: ML Kit only supports custom image classification models.
> Although AutoML Vision allows training of object detection models,
> these cannot be used with ML Kit.
So there it is: you can use custom models for object detection, but only if they are image classification models. I thought this must be impossible, since an object detection model outputs bounding boxes while a classification model outputs class scores. However, I tried it with YOLOv8: the standard object detection model wouldn't work, but the classification model, with its [1, 1000] output shape, actually works with the Google MLKit example application, and you can extract bounding boxes from it.
I'm not 100% sure how this works, but what I suspect is that a default object detector is bundled with the package, which identifies where there could be objects, and you can only swap in your own classification model on top of it.
Anyway, the simple answer is: use a classification model with a [1, N] or [1, 1, 1, N] output, where N is the number of classes. If you have a model with a different architecture, then you should change its output to this format; otherwise it is not supposed to work.
Answer 2
Score: 0
Metadata is just for providing information about the model. Besides adding metadata, you need to make sure your model really meets the requirements.
More details about how to get such a model can be found here: https://developers.google.com/ml-kit/custom-models
The ML Kit Object Detection custom model is a classifier model.
Answer 3
Score: 0
For on-device object detection, I would suggest you use tflite_flutter.
Whatever you use, you will need to normalize the image inputs. You can use something like this:
import 'dart:typed_data';

import 'package:image/image.dart' as img;

// Converts an image into normalized Float32 input of shape
// [1, inputSize, inputSize, 3], returned as the raw bytes TFLite expects.
Future<Uint8List> imageToByteListFloat32(
    img.Image image, int inputSize, double mean, double std) async {
  var convertedBytes = Float32List(1 * inputSize * inputSize * 3);
  var buffer = Float32List.view(convertedBytes.buffer);
  int pixelIndex = 0;
  for (var i = 0; i < inputSize; i++) {
    for (var j = 0; j < inputSize; j++) {
      var pixel = image.getPixel(j, i);
      // Normalize each RGB channel: (value - mean) / std.
      buffer[pixelIndex++] = (pixel.r - mean) / std;
      buffer[pixelIndex++] = (pixel.g - mean) / std;
      buffer[pixelIndex++] = (pixel.b - mean) / std;
    }
  }
  return convertedBytes.buffer.asUint8List();
}
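For example, for a model that expects 320x320 inputs normalized to the [-1, 1] range, the call would look like this (the input size and normalization constants here are assumptions; use whatever your model expects):
final inputBytes = await imageToByteListFloat32(decodedImage, 320, 127.5, 127.5);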
Then you will have to figure out the outputs. Use something like Netron to identify the output shape. In your first YOLOv8 model the output shape is [1,13,13125]: 1 is the batch size; of the 13 values per box, the first 4 represent the bounding box coordinates (x, y, width, height) and the rest denote the score for each class (I assume you have 9 classes in total); and 13125 is the number of possible bounding boxes. So you will have to loop over each and every box and filter out what you need, as in the sketch below.