What's the difference between a convolutional autoencoder (CAE) and a convolutional neural network (CNN)?



I'm working on a bachelor's project that involves using a convolutional autoencoder [1]. I used the code from this blog. The goal was to make a model that takes as input a pixelated image containing text and outputs a prediction of the image with the text depixelated. The only change I made from the "convolutional autoencoder" code in the reference is that I also gave labels to my training process. After training several models, I concluded that it is pretty easy to reconstruct pixelated text.

Now, while I'm writing a paper about the project, I'm really struggling to understand what exactly a convolutional autoencoder is and what makes it a convolutional autoencoder.

I'm completely new to any type of ML. When I did research on autoencoders in general, I found that autoencoders are neural networks that aim to minimize the difference between the output and the input. What makes it a "convolutional" autoencoder is the fact that it uses convolutions in its encoder to detect edges, etc.

- But in my case I depixelated pixelated text in images, so the output is not meant to be as close as possible to the input image, but rather to be the depixelated version.

- Another thing is that everywhere I look for answers, it is stated that autoencoders are primarily used for unsupervised learning, while in my case I use supervised learning, since I pass the labels to the training process.

- Lastly, since I don't try to minimize the difference between the output and input image, AND I use supervised learning, then what exactly is the difference between a convolutional autoencoder and a convolutional neural network?

[1] https://blog.keras.io/building-autoencoders-in-keras.html
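The training pairs described above (pixelated input, clean target) can be generated by block-averaging each image. The `pixelate` helper below is a hypothetical NumPy sketch of that corruption step, not the code actually used in the project:

```python
import numpy as np

def pixelate(img: np.ndarray, block: int = 4) -> np.ndarray:
    """Pixelate a 2D grayscale image by replacing each block x block
    patch with its mean value (edges not divisible by `block` are trimmed)."""
    h, w = img.shape
    out = img[:h - h % block, :w - w % block]
    hb, wb = out.shape[0] // block, out.shape[1] // block
    # Average within each block, then expand back to full resolution.
    blocks = out.reshape(hb, block, wb, block).mean(axis=(1, 3))
    return np.repeat(np.repeat(blocks, block, axis=0), block, axis=1)

clean = np.random.rand(28, 28)          # stand-in for a clean text image
corrupt = pixelate(clean, block=4)      # network input; `clean` is the target
```

Training then maps `corrupt` back to `clean`, which is exactly the denoising setup: the "label" is just the uncorrupted image.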

Answer 1

Score: 1


An auto-encoder (AE) is a self-supervised neural network model that aims to learn a representation of its inputs, e.g. images. Let me explain:

  • The AE is made of two NNs: the encoder f(x) -> z and the decoder g(z) -> x; a third important piece is z, called the latent space or bottleneck.
  • AEs aim at learning z, whose size is smaller than that of x, by minimizing a reconstruction objective. Basically, the latent space is a compact representation of x.
  • There are many variations of AEs; one is called the Denoising AE (DAE), and it is the one you are using, because a DAE aims at recovering a "clean" version of an x that has been corrupted by some noise.
  • A CAE is an AE in which the encoder and decoder are CNNs. Actually, there are stricter definitions which require z to be a 3D tensor (4D if you also count the batch size), called the convolutional bottleneck (because it has the same three dimensions as an image).
  • The AE is usually self-supervised because its own inputs serve as the supervision signal: the images themselves.

So a CAE is an AE with convolutional layers and a convolutional bottleneck. A CNN, instead, can be seen as a building block for other kinds of models too: classifiers, regressors, auto-encoders themselves, and others.
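The structure described above can be sketched in Keras, in the spirit of the blog post referenced in the question. The layer sizes and the 28x28 grayscale input shape are illustrative assumptions, not the exact code from the blog:

```python
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(28, 28, 1))  # grayscale image x

# Encoder f(x) -> z: convolutions + downsampling
h = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
h = layers.MaxPooling2D(2)(h)
h = layers.Conv2D(8, 3, activation="relu", padding="same")(h)
z = layers.MaxPooling2D(2)(h)  # convolutional bottleneck: a 7x7x8 tensor

# Decoder g(z) -> x: convolutions + upsampling back to the input size
h = layers.Conv2D(8, 3, activation="relu", padding="same")(z)
h = layers.UpSampling2D(2)(h)
h = layers.Conv2D(16, 3, activation="relu", padding="same")(h)
h = layers.UpSampling2D(2)(h)
outputs = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(h)

cae = models.Model(inputs, outputs)
cae.compile(optimizer="adam", loss="binary_crossentropy")

# Denoising setup: corrupted (pixelated) images in, clean images as targets.
# cae.fit(x_pixelated, x_clean, epochs=..., batch_size=...)
```

Note that z keeps the three image-like dimensions (height, width, channels), which is what makes this a convolutional bottleneck rather than a flat latent vector.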

huangapple
  • Posted on May 21, 2023, 00:00:16
  • When reposting, please retain the link to this article: https://go.coder-hub.com/76296117.html