Problem with test and validation generators

Question

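There are a few possible problems in your code. The main candidates are a mismatch between the file paths and the column names, and a mismatch between the training data and the test data.

  1. First, make sure the path in IMAGE_DIR is correct and points to the folder that contains the test images.

  2. In the test dataframe test_df, make sure the column named Image is the one that actually holds the image file names. This column is selected by the argument x_col="Image". If it does not match, you get the invalid filename message.

  3. The training dataframe train_df does not seem to provide valid image file names or label columns, because the training data generator reported Found 0 validated image filenames. Make sure the data in train_df is correct and contains the file names and labels of the training images.

  4. Also make sure the file paths and data for the test set and the training set are consistent. You can print the head of the test dataframe to check that its column names and file paths are correct.

  5. Finally, make sure the image files are in the specified folder and that their names match the file names in the dataframe.

With these checks you should be able to resolve the problem. If it persists, more information is needed to diagnose it further.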
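
The checks in the list above can be sketched as a small helper. This is a minimal sketch, not part of the original post: the helper name `find_missing_files` is hypothetical, and it assumes the file names in the dataframe are relative to the image directory.

```python
import os


def find_missing_files(filenames, image_dir):
    """Return the names from `filenames` that are not files inside `image_dir`.

    `find_missing_files` is a hypothetical helper for illustration only.
    """
    missing = []
    for name in filenames:
        # A name without its extension (e.g. "image1") will not be found here.
        if not os.path.isfile(os.path.join(image_dir, str(name))):
            missing.append(name)
    return missing
```

With the names from the question, this could be called as `find_missing_files(test_df["Image"], IMAGE_DIR)` after inspecting `test_df.head()`. An empty result would suggest the file names and the folder agree.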


I've been working on my model and have reached the generators part.

When I read the Test.csv file like this:

test_df = pd.read_csv("Test.csv")

it works fine.

But when I get to the generator part:

def get_test_generator(test_df,  train_df, image_dir, x_col, y_cols, sample_size=100, batch_size=8, seed=1, target_w = 320, target_h = 320):
    """
    Return generator for test set using 
    normalization statistics from training set.

    Args:
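      test_df (dataframe): dataframe specifying test data.
      train_df (dataframe): dataframe specifying training data, used to compute normalization statistics.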
      image_dir (str): directory where image files are held.
      x_col (str): name of column in df that holds filenames.
      y_cols (list): list of strings that hold y labels for images.
      sample_size (int): size of sample to use for normalization statistics.
      batch_size (int): images per batch to be fed into model during training.
      seed (int): random seed.
      target_w (int): final width of input images.
      target_h (int): final height of input images.
    
    Returns:
        test_generator (DataFrameIterator): iterators over test set
    """
    print("getting train generators...")
    # get generator to sample dataset
    raw_train_generator = ImageDataGenerator().flow_from_dataframe(
        dataframe=train_df, 
        directory=IMAGE_DIR, 
        x_col="Image", 
        y_col=labels, 
        class_mode="raw", 
        batch_size=sample_size, 
        shuffle=True, 
        target_size=(target_w, target_h))
    
    # get data sample
    batch = raw_train_generator.next()
    data_sample = batch[0]

    # use sample to fit mean and std for test set generator
    image_generator = ImageDataGenerator(
        featurewise_center=True,
        featurewise_std_normalization= True)
    
    # fit generator to sample from training data
    image_generator.fit(data_sample)

    # get test generator
 

    test_generator = image_generator.flow_from_dataframe(
            dataframe=test_df,
            directory=image_dir,
            x_col=x_col,
            y_col=y_cols,
            class_mode="raw",
            batch_size=batch_size,
            shuffle=False,
            seed=seed,
            target_size=(target_w,target_h))
    return test_generator

When I run this cell in Jupyter:

IMAGE_DIR = '/Users/awabe/Desktop/Project/PapilaDB/FundusImages test'

test_generator= get_test_generator(test_df, train_df, IMAGE_DIR, "Image", labels)

to read the images, it gives me this output:

getting train generators...
Found 0 validated image filenames.
Found 488 validated image filenames.
/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/keras/preprocessing/image.py:1139: UserWarning: Found 488 invalid image filename(s) in x_col="Image". These filename(s) will be ignored.
  warnings.warn(
/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/numpy/core/_methods.py:182: RuntimeWarning: invalid value encountered in divide
  ret = um.true_divide(
/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/numpy/core/_methods.py:265: RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/numpy/core/_methods.py:223: RuntimeWarning: invalid value encountered in divide
  arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/numpy/core/_methods.py:254: RuntimeWarning: invalid value encountered in divide
  ret = um.true_divide(

(The 488 images in the second line belong to the train generator, which works fine.)

What is wrong here?
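
A note on the NumPy warnings in the log: they are what NumPy emits when a mean or standard deviation is computed over an array with no elements. That is consistent with a generator that found 0 images handing an empty sample to image_generator.fit. A minimal sketch with plain NumPy, where the array shape is an assumption chosen to mimic an empty batch of 320x320 RGB images:

```python
import warnings

import numpy as np

# An empty "batch": zero images of shape 320x320x3 (assumed shape).
empty_batch = np.empty((0, 320, 320, 3), dtype=np.float32)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mean = np.mean(empty_batch)  # triggers "Mean of empty slice"
    std = np.std(empty_batch)    # triggers "Degrees of freedom <= 0 for slice"

# Both statistics are NaN, so any normalization built on them is unusable.
print(np.isnan(mean), np.isnan(std))
print([str(w.message) for w in caught])
```

So the warnings are a symptom rather than the cause: the thing to fix is whichever generator reports 0 validated image filenames.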

Answer 1

Score: 0

The image names in the Image column must include both the file name and the extension. For example:

Assume the file in the directory is named: image1.jpg

Then the name in the CSV should be: image1.jpg

If you write it as: image1

the file will not be found and you will get the invalid filename warning.

So simple 🤦.

Thanks to TFer2 above.
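
If the CSV stores bare names such as image1, one way to repair the column is to look up each name among the files actually present in the image directory. This is a minimal sketch, not from the original answer: the helper name `add_extensions` is hypothetical, and it assumes each bare name matches at most one file stem in the directory.

```python
import os


def add_extensions(names, image_dir):
    """Map bare names (e.g. "image1") to real file names (e.g. "image1.jpg").

    Hypothetical helper for illustration. Names that already match a file,
    or that match nothing in `image_dir`, are returned unchanged.
    """
    files = os.listdir(image_dir)
    present = set(files)
    # Index files by their name without extension: "image1" -> "image1.jpg".
    by_stem = {os.path.splitext(f)[0]: f for f in files}
    return [n if n in present else by_stem.get(n, n) for n in names]
```

With the names from the question, the column could be rewritten as `test_df["Image"] = add_extensions(test_df["Image"].astype(str), IMAGE_DIR)` before calling get_test_generator. Editing the CSV directly works just as well.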

huangapple
  • Posted on February 24, 2023, 06:04:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75550794.html