What is the difference between “BinaryCrossentropy” and “binary_crossentropy” in tf.keras.losses?


Question

I'm training a model with TensorFlow 2.0 using tf.GradientTape(), and I find that the model's accuracy is 95% if I use tf.keras.losses.BinaryCrossentropy, but drops to 75% if I use tf.keras.losses.binary_crossentropy. So I'm confused about the difference between the two.

```python
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split

def read_data():
    red_wine = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep=";")
    white_wine = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv", sep=";")
    red_wine["type"] = 1
    white_wine["type"] = 0
    wines = red_wine.append(white_wine)
    return wines

def get_x_y(df):
    x = df.iloc[:, :-1].values.astype(np.float32)
    y = df.iloc[:, -1].values.astype(np.int32)
    return x, y

def build_model():
    inputs = layers.Input(shape=(12,))
    dense1 = layers.Dense(12, activation="relu", name="dense1")(inputs)
    dense2 = layers.Dense(9, activation="relu", name="dense2")(dense1)
    outputs = layers.Dense(1, activation="sigmoid", name="outputs")(dense2)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model

def generate_dataset(df, batch_size=32, shuffle=True, train_or_test="train"):
    x, y = get_x_y(df)
    ds = tf.data.Dataset.from_tensor_slices((x, y))
    if shuffle:
        ds = ds.shuffle(10000)
    if train_or_test == "train":
        ds = ds.batch(batch_size)
    else:
        ds = ds.batch(len(df))
    return ds

# loss_object = tf.keras.losses.binary_crossentropy
loss_object = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

def train_step(model, optimizer, x, y):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = loss_object(y, pred)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

def train_model(model, train_ds, epochs=10):
    for epoch in range(epochs):
        print(epoch)
        for x, y in train_ds:
            train_step(model, optimizer, x, y)

def main():
    data = read_data()
    train, test = train_test_split(data, test_size=0.2, random_state=23)
    train_ds = generate_dataset(train, 32, True, "train")
    test_ds = generate_dataset(test, 32, False, "test")
    model = build_model()
    train_model(model, train_ds, 10)
    model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    model.evaluate(test_ds)

main()
```

Answer 1

Score: 3

They should indeed work the same; BinaryCrossentropy uses binary_crossentropy, with the difference apparent in the docstring descriptions: the former is intended for two class labels, whereas the latter supports an arbitrary class count. However, if targets are passed in the expected format, both apply the same preprocessing before calling the backend's binary_crossentropy, which does the actual computing.

The difference you observe is likely a reproducibility issue; make sure you set the random seeds - see the function below. For a more complete answer on reproducibility, see here.

Function

```python
import random

import numpy as np
import tensorflow as tf

def reset_seeds(reset_graph_with_backend=None):
    if reset_graph_with_backend is not None:
        K = reset_graph_with_backend
        K.clear_session()
        tf.compat.v1.reset_default_graph()
        print("KERAS AND TENSORFLOW GRAPHS RESET")  # optional

    np.random.seed(1)
    random.seed(2)
    tf.compat.v1.set_random_seed(3)
    print("RANDOM SEEDS RESET")  # optional
```

Usage:

```python
import tensorflow as tf
import tensorflow.keras.backend as K

reset_seeds(K)
```
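
To confirm that the two entry points compute the same thing when the shapes line up, here is a minimal sketch (the label and probability values are made up for illustration). Note that the class reduces to a scalar mean while the function returns one loss per sample, so the function's output is averaged before comparing:

```python
import tensorflow as tf

# Illustrative labels and sigmoid outputs, both shaped (4, 1).
y_true = tf.constant([[1.0], [0.0], [1.0], [0.0]])
y_pred = tf.constant([[0.9], [0.2], [0.7], [0.4]])

# The class applies a mean reduction over the batch.
loss_class = tf.keras.losses.BinaryCrossentropy()(y_true, y_pred)

# The function returns a per-sample loss; average it to compare.
loss_fn = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))

print(float(loss_class), float(loss_fn))  # both print the same value
```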

Answer 2

Score: 1

Thanks, I found the reasons for the inconsistent accuracy:

  1. The shape of the model's outputs is (None, 1), but the labels fed in have shape (None,), which gives the loss a wrong meaning through Python's broadcasting mechanism (see the sketch after this list).
  2. In the source code of tf.keras.losses.BinaryCrossentropy(), both y_pred and y_true are run through a function called squeeze_or_expand_dimensions before the loss is computed; this step is missing from tf.keras.losses.binary_crossentropy.
  3. Note: take care that the shapes of the labels and the model outputs are consistent.
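
A minimal sketch of the mismatch (the label and prediction values are illustrative):

```python
import tensorflow as tf

y_true = tf.constant([1.0, 0.0, 1.0, 0.0])          # labels, shape (4,)
y_pred = tf.constant([[0.9], [0.2], [0.7], [0.4]])  # sigmoid outputs, shape (4, 1)

# The class squeezes y_pred to (4,) first, so each label stays paired
# with its own prediction.
print(float(tf.keras.losses.BinaryCrossentropy()(y_true, y_pred)))

# The function broadcasts (4,) against (4, 1) into a (4, 4) grid,
# pairing every label with every prediction before reducing over the
# last axis, which silently corrupts the loss.
print(tf.keras.losses.binary_crossentropy(y_true, y_pred).shape)  # (4,)

# Fix: reshape the labels to (4, 1) so both entry points agree.
y_true_2d = tf.reshape(y_true, (-1, 1))
print(float(tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true_2d, y_pred))))
```

In the question's code, the same fix amounts to reshaping y in get_x_y (for example, y = df.iloc[:, -1].values.astype(np.float32).reshape(-1, 1)) so the labels match the (None, 1) model output.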
