What is padding in a convolutional network?

What is the difference between dilated convolution and deconvolution?


In sort of mechanistic / pictorial / image-based terms:


Dilated convolution is largely the same as ordinary convolution (and, frankly, so is deconvolution), except that it introduces gaps into the kernel: while a standard kernel slides over contiguous patches of the input, its dilated counterpart can, for example, "span" a larger portion of the image, while still having only as many weights/inputs as the standard form.

(Note well: while dilation inserts zeros into the *kernel* in order to reduce the spatial dimensions/resolution of the output more quickly, transposed convolution injects zeros into the *input* in order to increase the resolution of the output.)

To make this more concrete, let's take a very simple example:
Suppose you have a 9x9 image, x, with no padding. If you take a standard 3x3 kernel with stride 2, the first subset of the input it attends to is x[0:2, 0:2], and all nine points within those bounds are considered by the kernel. You would then sweep over x[0:2, 2:4], and so on.

Clearly, the output has smaller spatial dimensions, specifically 4x4. The next layer's neurons thus have receptive fields of exactly the size of these kernel passes. But if you need or want neurons with more global spatial knowledge (e.g. if an important feature is only definable over larger regions), you will need to convolve this layer a second time, creating a third layer in which the effective receptive field is some union of the previous layers' receptive fields.
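The standard stride-2 pass above can be sketched with a minimal hand-rolled convolution; `conv2d` is an illustrative helper name of my own, not a library function:

```python
import numpy as np

def conv2d(x, k, stride=1, dilation=1):
    """Minimal 2-D convolution (really cross-correlation) with stride and dilation."""
    kh, kw = k.shape
    # effective kernel extent once dilation spreads the taps apart
    eh, ew = (kh - 1) * dilation + 1, (kw - 1) * dilation + 1
    oh = (x.shape[0] - eh) // stride + 1
    ow = (x.shape[1] - ew) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride : i * stride + eh : dilation,
                      j * stride : j * stride + ew : dilation]
            out[i, j] = (patch * k).sum()
    return out

x = np.arange(81, dtype=float).reshape(9, 9)  # the 9x9 image
k = np.ones((3, 3))                           # a standard 3x3 kernel
y = conv2d(x, k, stride=2)
print(y.shape)  # -> (4, 4), matching the text
```

The first output, `y[0, 0]`, is exactly the kernel applied to x[0:3, 0:3], the second to x[0:3, 2:5], and so on.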

However, if you don't want to add more layers and/or you feel that the information being passed along is overly redundant (i.e. your 3x3 receptive fields in the second layer really only carry "2x2" worth of distinct information), you can use a dilated filter. For clarity, let's be extreme about it and say we'll use a 9x9, 4-dilated filter. Now our filter "spans" the entire input, so we don't have to slide it at all. We will still, however, only be taking 3x3 = 9 data points from the input x, typically:

x[0,0] ∪ x[0,4] ∪ x[0,8] ∪ x[4,0] ∪ x[4,4] ∪ x[4,8] ∪ x[8,0] ∪ x[8,4] ∪ x[8,8]

Now the neuron in our next layer (we'll only have one) will have data "representing" a much larger portion of our image, and if the image's data is highly redundant between adjacent points, we may well have captured the same information and learned an equivalent transformation, but with fewer layers and fewer parameters. I think it's clear within the bounds of this description that, while it can be framed as resampling, what we are performing here is downsampling, for each kernel.
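With NumPy slicing, the extreme 4-dilated case is one line: the taps of a 3x3 kernel with spacing 4 land exactly on rows/columns 0, 4, 8 of the 9x9 image, so a single output neuron sees the whole input. A sketch (the variable names are mine):

```python
import numpy as np

x = np.arange(81, dtype=float).reshape(9, 9)  # the 9x9 image
k = np.ones((3, 3))                           # 3x3 kernel, 4-dilated

# The nine sampled points: x[0,0], x[0,4], x[0,8], x[4,0], ... x[8,8]
taps = x[::4, ::4]
print(taps.shape)        # (3, 3) -- still only nine inputs/weights

out = (taps * k).sum()   # the single output neuron
print(out)
```

Nine weights, one output, but a 9x9 receptive field: that is the trade the dilated filter makes.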

Fractionally strided, transposed, or "deconvolution":

This kind is still convolution at heart. The difference, again, is that we will be moving from a smaller input volume to a larger output volume. The OP posed no questions about what upsampling is, so I'll save a little breadth this time and go straight to the relevant example.

For our previous 9x9 case, let's say we now want to upsample to 11x11. In this case, we have two common options: we can take a 3x3 kernel with stride 1 and sweep it, with 2-padding, over our 3x3 input, so that our first pass would be over the region [left-pad-2 : 1, top-pad-2 : 1], then [left-pad-1 : 2, top-pad-2 : 1], and so on.

Alternatively, we can insert padding *between* the input data points and sweep the kernel over them without as much border padding. Of course, we will then sometimes be dealing with exactly the same input points more than once for a single kernel; this is where the term "fractionally strided" seems more reasonable. I think the animation below (borrowed from here and, I believe, based on this work) will help clear things up, even though it is of different dimensions. The input is blue, the injected zeros and padding are white, and the output is green:

Of course, we are dealing with all of the input data, as opposed to dilation, where some regions may be ignored entirely. And since we clearly end up with more data than we started with: "upsampling".
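The zero-injection route can be sketched directly: stuff zeros between the input elements, pad the border, then run an ordinary stride-1 convolution. The function name and the particular stride/padding choice (which happen to map a 3x3 input to an 11x11 output) are my own illustration, not the only valid configuration:

```python
import numpy as np

def fractionally_strided_conv(x, k, stride=2, pad=2):
    """Transposed convolution done the 'fractionally strided' way:
    inject (stride - 1) zeros between input elements, pad the border,
    then run a plain stride-1 convolution over the result."""
    h, w = x.shape
    # zero-stuffed input: size (h - 1) * stride + 1 per dimension
    z = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1))
    z[::stride, ::stride] = x
    z = np.pad(z, pad)
    kh, kw = k.shape
    oh, ow = z.shape[0] - kh + 1, z.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = (z[i:i + kh, j:j + kw] * k).sum()
    return out

x = np.ones((3, 3))   # a small 3x3 input
k = np.ones((3, 3))
y = fractionally_strided_conv(x, k, stride=4, pad=2)
print(y.shape)  # -> (11, 11)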

I encourage you to read the excellent document I have linked to for a more informed, abstract definition and explanation of transposition convolution, and to learn why the examples shared are illustrative but largely inadequate forms for the actual computation of those presented Transformation are.

Although they both seem to be doing the same thing, which is to sample up a level, there is clear leeway between them.

First, let's talk about the advanced folding

I found this lovely blog on the above topic. As I understand it, this is more of a comprehensive study of the Input data points . Or increase the receive field of the convolution operation.

Here is an advanced convolution diagram from the paper.

This is more of a normal convolution, but it helps capture more and more global context from input pixels without increasing the size of the parameters. This can also help increase the spatial size of the output. The main thing here, however, is that this increases the size of the receiving field exponentially with the number of layers. This is very common in the field of signal processing.

This blog really explains what's new in expanded fold and how it compares to normal fold.

Blog: Dilated Convolutions and Kronecker Factored Convolutions

Now I'll explain what Unfolding is

This is called transposed convolution. This corresponds to the function we used for the convolution in back propagation.

In backprop we simply distribute gradients from a neuron in the output feature map to all elements in the receive fields. Then we also add gradients where they coincide with the same receiving elements

Here's a good resource with pictures.

So the basic idea is that unfolding works in the output space. No input pixels. An attempt is made to create wider spatial dimensions in the output map. This is done in fully folded neural networks are used for semantic segmentation .

So more of deconvolution is a learnable up-sampling layer.

An attempt is made to learn how to create a sample while combining it with the final loss

This is the best explanation I have found for unfolding. Lecture 13 in cs231 from 9:21 p.m.

We use cookies and other tracking technologies to improve your browsing experience on our website, to show you personalized content and targeted ads, to analyze our website traffic, and to understand where our visitors are coming from.

By continuing, you consent to our use of cookies and other tracking technologies and affirm you're at least 16 years old or have consent from a parent or guardian.

You can read details in our Cookie policy and Privacy policy.