Understanding tf.keras.layers.Dense()
Question
I am trying to understand why there is a difference between calculating a dense layer operation directly and using the `keras` implementation.
According to the documentation (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense), `tf.keras.layers.Dense()` should implement the operation `output = activation(dot(input, kernel) + bias)`, but `result` and `result1` below are not the same.
```python
import tensorflow as tf

tf.random.set_seed(1)

bias = tf.Variable(tf.random.uniform(shape=(5,1)), dtype=tf.float32)
kernel = tf.Variable(tf.random.uniform(shape=(5,10)), dtype=tf.float32)
x = tf.constant(tf.random.uniform(shape=(10,1), dtype=tf.float32))

# Direct calculation: relu(kernel @ x + bias)
result = tf.nn.relu(tf.linalg.matmul(a=kernel, b=x) + bias)
tf.print(result)

# Same weights passed to a Dense layer through constant initializers
test = tf.keras.layers.Dense(units = 5,
                            activation = 'relu',
                            use_bias = True,
                            kernel_initializer = tf.keras.initializers.Constant(value=kernel),
                            bias_initializer = tf.keras.initializers.Constant(value=bias),
                            dtype=tf.float32)

result1 = test(tf.transpose(x))

print()
tf.print(result1)
```

Output:

```
[[2.87080455]
 [3.25458574]
 [3.28776264]
 [3.14319134]
 [2.04760242]]

[[2.38769 3.63470697 2.62423944 3.31286287 2.91121125]]
```
Using `test.get_weights()`, I can see that the kernel and bias (`b`) are getting set to the correct values. I am using TF version 2.12.0.
Answer 1
Score: 0
After some experimentation, I realized that the kernel for the dense layer needs to have `shape=(10,5)`, as opposed to `(5,10)` as in the code from the original question above. This is implicit: with `units=5` and an input vector of size `10`, the kernel has to map 10 inputs to 5 outputs (hence `input_shape=(10,)` is commented out as a reminder). Below is the corrected code:
```python
import tensorflow as tf

tf.random.set_seed(1)

bias = tf.Variable(tf.random.uniform(shape=(5,1)), dtype=tf.float32)
kernel = tf.Variable(tf.random.uniform(shape=(10,5)), dtype=tf.float32)
x = tf.constant(tf.random.uniform(shape=(10,1), dtype=tf.float32))

# Direct calculation: relu(kernel^T @ x + bias)
result = tf.nn.relu(tf.linalg.matmul(a=kernel, b=x, transpose_a=True) + bias)
tf.print(result)

test = tf.keras.layers.Dense(units = 5,
                            # input_shape=(10,),
                            activation = 'relu',
                            use_bias = True,
                            kernel_initializer = tf.keras.initializers.Constant(value=kernel),
                            bias_initializer = tf.keras.initializers.Constant(value=bias),
                            dtype=tf.float32)

result1 = test(tf.transpose(x))

print()
tf.print(result1)
```

Output:

```
[[2.38769]
 [3.63470697]
 [2.62423944]
 [3.31286287]
 [2.91121125]]

[[2.38769 3.63470697 2.62423944 3.31286287 2.91121125]]
```
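To make the shape convention concrete, here is a minimal plain-Python sketch. Nested lists stand in for tensors, and the helper names `matmul`, `transpose`, and `relu` are illustrative, not TensorFlow APIs. It assumes the documented convention that `Dense` multiplies an input of shape `(batch, input_dim)` by a kernel of shape `(input_dim, units)`, and checks that this gives the same numbers as the column-vector form `kernel^T x + bias` used in the direct calculation above.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    """Swap rows and columns of a nested-list matrix."""
    return [list(row) for row in zip(*m)]

def relu(m):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in m]

# Small made-up values: input_dim = 3, units = 2.
kernel = [[1.0, -2.0],
          [0.5, 1.0],
          [-1.0, 3.0]]          # shape (input_dim, units)
bias = [0.1, -0.2]              # shape (units,)
x_col = [[1.0], [2.0], [3.0]]   # column vector, shape (input_dim, 1)

# Dense-style: row input (1, input_dim) times kernel, one bias per unit.
x_row = transpose(x_col)
dense_out = relu([[v + b for v, b in zip(row, bias)]
                  for row in matmul(x_row, kernel)])

# Column-vector style: kernel^T times x, one bias per output row.
direct = matmul(transpose(kernel), x_col)
direct_out = relu([[row[0] + b] for row, b in zip(direct, bias)])

print(dense_out)                          # shape (1, units)
print(direct_out)                         # shape (units, 1)
print(transpose(direct_out) == dense_out) # True
```

The two results hold the same values, one as a row and one as a column, which mirrors the relationship between `result1` and `result` in the corrected code.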
Ultimately, I am not entirely sure what was happening under the hood and why `keras` did not raise an error. I will check the `tf.keras.layers.Dense()` implementation, but any thoughts or suggestions from someone who already knows the code would be highly appreciated!
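One plausible explanation for the silent mismatch, offered here as an assumption rather than something verified against the Keras source in this thread: if the constant initializer reshapes the supplied value to the expected `(10, 5)` kernel shape whenever the element counts match, a `(5, 10)` array would be accepted without error, but its entries would land in different positions than in its transpose. The plain-Python sketch below (the helpers `reshape` and `transpose` are illustrative, not library functions) shows on a small matrix that a row-major reshape and a transpose are not the same operation.

```python
def reshape(m, rows, cols):
    """Row-major reshape of a nested-list matrix (element counts must match)."""
    flat = [v for row in m for v in row]
    if len(flat) != rows * cols:
        raise ValueError("element count does not match target shape")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

def transpose(m):
    """Swap rows and columns of a nested-list matrix."""
    return [list(row) for row in zip(*m)]

# A (2, 3) matrix standing in for the (5, 10) kernel from the question.
k = [[1, 2, 3],
     [4, 5, 6]]

reshaped = reshape(k, 3, 2)    # same data order, regrouped into 3 rows
transposed = transpose(k)      # rows and columns actually swapped

print(reshaped)                # [[1, 2], [3, 4], [5, 6]]
print(transposed)              # [[1, 4], [2, 5], [3, 6]]
print(reshaped == transposed)  # False
```

If that is what happens, comparing `test.get_weights()[0].shape` with the shape of the array that was passed in should reveal the mismatch: the values look familiar, but the stored kernel is `(10, 5)` rather than `(5, 10)`.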