Starting today, I want to practice my English writing skills, so I will write all my future blog posts in English. There may be many errors, but I will try my best to make everything clear for all of you.

TCN was proposed in the paper “An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling”. The paper was published in 2018 and may seem a bit dated, but my friend tells me that TCN is still a very strong baseline for time series tasks.

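The model is built on a simple idea: **TCN = 1D FCN + causal convolutions**, where FCN stands for fully convolutional network. “Causal” means that the output at time step $t$ is computed only from inputs at time $t$ and earlier, so no information leaks from the future. This leaves two key problems to solve: (1) How can a 1D FCN capture a long history of the sequence? (2) How should the convolutional layers be organized into a deep network?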
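To make “causal” concrete, here is a minimal single-channel sketch in Python with NumPy. The function name `causal_conv1d` is hypothetical (it is not taken from the paper's code), and the sketch assumes no bias and a single channel. It left-pads the input with `k - 1` zeros, so the output has the same length as the input and `y[t]` only uses `x[t-k+1]` through `x[t]`.

```python
import numpy as np

def causal_conv1d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Single-channel causal convolution: y[t] = sum_j w[j] * x[t - j].

    Inputs before the start of the sequence are treated as zeros, so the
    output length equals the input length and y[t] never sees x[t+1:].
    """
    k = len(w)
    padded = np.concatenate([np.zeros(k - 1), x])  # padded[t + k - 1] == x[t]
    # padded[t:t + k] holds x[t-k+1], ..., x[t]; reversing w aligns w[0] with x[t]
    return np.array([np.dot(w[::-1], padded[t:t + k]) for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(causal_conv1d(x, np.ones(3)))  # -> [ 1.  3.  6.  9. 12.]
```

Changing `x[4]` (the "future" for every earlier step) leaves the first four outputs unchanged, which is exactly the causality property.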

The answer to the first problem is **Dilated Convolutions**. Although the paper describes them with an equation, the idea can be stated simply: **enlarge the interval (the dilation factor) between the input elements that a convolution combines, layer by layer.**
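For reference, this is the equation the paper uses for the idea (it calls it a dilated convolution), restated in the paper's notation: for a 1-D input sequence $\mathbf{x}$ and a filter $f: \{0, \dots, k-1\} \to \mathbb{R}$, the operation at element $s$ is

$$
F(s) = (\mathbf{x} *_d f)(s) = \sum_{i=0}^{k-1} f(i) \cdot \mathbf{x}_{s - d \cdot i}
$$

where $d$ is the dilation factor and $k$ is the filter size. Because the index $s - d \cdot i$ never exceeds $s$, only present and past elements are used; with $d = 1$ this is an ordinary causal convolution.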

For example, given a sequence $x_1, x_2, \cdots, x_n$, the first convolution layer (kernel size 3) uses $x_{i-2}, x_{i-1}, x_i$ to compute $x^{(1)}_i$. The second convolution layer then uses $x^{(1)}_{i-4}, x^{(1)}_{i-2}, x^{(1)}_i$ to compute $x^{(2)}_i$. So the interval (dilation) of the first layer is 1, the second is 2, and the third could be 4; the choice is up to you, although the paper increases the dilation exponentially with depth. In this way, $x^{(2)}_i$ already aggregates information from $x_{i-6}$ through $x_i$, and the covered distance grows quickly as more layers are added. The figure below shows this procedure.
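The two-layer example above can be checked with a small NumPy sketch. It assumes one single-channel convolution per layer, all-ones weights, no bias and no nonlinearity; the names `dilated_causal_conv1d` and `receptive_field` are hypothetical helpers, not from the paper. Feeding in a single impulse shows which outputs it can reach, and `receptive_field` uses the standard count $1 + (k-1)\sum_l d_l$ for a plain stack like this one.

```python
import numpy as np

def dilated_causal_conv1d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """y[t] = sum_j w[j] * x[t - dilation * j], treating inputs before t=0 as zeros."""
    k = len(w)
    pad = (k - 1) * dilation                      # zeros needed on the left
    padded = np.concatenate([np.zeros(pad), x])   # padded[t + pad] == x[t]
    y = np.zeros(len(x))
    for t in range(len(x)):
        y[t] = sum(w[j] * padded[t + pad - dilation * j] for j in range(k))
    return y

def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Input steps one output can see, assuming one convolution per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)

# Put an impulse at position 10 and see which outputs of the stack it reaches.
x = np.zeros(20)
x[10] = 1.0
h = dilated_causal_conv1d(x, np.ones(3), dilation=1)  # layer 1: x_{i-2}, x_{i-1}, x_i
y = dilated_causal_conv1d(h, np.ones(3), dilation=2)  # layer 2: h_{i-4}, h_{i-2}, h_i
print(np.nonzero(y)[0].tolist())  # -> [10, 11, 12, 13, 14, 15, 16]
print(receptive_field(3, [1, 2]))  # -> 7
```

The impulse reaches exactly 7 outputs, matching the $x_{i-6}, \dots, x_i$ window; with dilations `[1, 2, 4, 8]` the same formula gives 31. The actual TCN residual block applies two convolutions per dilation level, so its receptive field is larger than this single-convolution count.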

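For the second problem, TCN uses residual connections in the same way as ResNet: each residual block contains two layers of dilated causal convolution (in the paper, each is followed by weight normalization, ReLU, and spatial dropout), and the block's input is added to its output. When the input and output widths (numbers of channels) differ, an additional $1 \times 1$ convolution is applied to the input so that the element-wise addition receives tensors of the same shape. The figure below shows more details.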
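The residual block can be sketched as a NumPy forward pass. This is a simplified, hypothetical version for illustration only: it uses random weights and no bias, and it omits the weight normalization and dropout the paper uses. The matrix `w_down` stands in for the $1 \times 1$ convolution, which amounts to multiplying every time step by the same `(C_out, C_in)` matrix.

```python
import numpy as np

def causal_conv(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """Multi-channel dilated causal convolution.

    x: (C_in, T), w: (C_out, C_in, k) -> (C_out, T), where
    y[o, t] = sum_{c, j} w[o, c, j] * x[c, t - dilation * j].
    """
    c_out, _, k = w.shape
    T = x.shape[1]
    pad = (k - 1) * dilation
    xp = np.pad(x, ((0, 0), (pad, 0)))  # zeros on the left of the time axis only
    y = np.zeros((c_out, T))
    for j in range(k):
        start = pad - dilation * j       # xp[:, start + t] == x[:, t - dilation * j]
        y += w[:, :, j] @ xp[:, start:start + T]
    return y

def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2, dilation, w_down=None):
    """o = ReLU(skip + F(x)), where F is two (dilated causal conv -> ReLU) layers.

    w_down: optional (C_out, C_in) matrix acting as the 1x1 convolution on the
    skip path; it is required when C_in != C_out so the shapes match.
    """
    h = relu(causal_conv(x, w1, dilation))
    h = relu(causal_conv(h, w2, dilation))
    skip = x if w_down is None else w_down @ x  # 1x1 conv = same matmul at every step
    if skip.shape != h.shape:
        raise ValueError("channel mismatch: pass w_down for the 1x1 convolution")
    return relu(skip + h)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 16))      # 3 input channels, 16 time steps
w1 = rng.standard_normal((5, 3, 3))   # 3 -> 5 channels, kernel size 3
w2 = rng.standard_normal((5, 5, 3))   # 5 -> 5 channels, kernel size 3
w_down = rng.standard_normal((5, 3))  # 1x1 conv on the skip path: 3 -> 5 channels
out = residual_block(x, w1, w2, dilation=2, w_down=w_down)
print(out.shape)  # -> (5, 16)
```

Stacking such blocks with dilations 1, 2, 4, ... gives the growing receptive field from the dilation example, while the skip path keeps each block close to an identity mapping.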

In addition, the paper summarises the advantages and disadvantages of TCN. The advantages are:

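- Parallelism: the convolution for every time step can be computed at once, instead of step by step as in an RNN.
- Flexible receptive field size: it can be enlarged by stacking more layers, using larger dilation factors, or increasing the kernel size.
- Stable gradients: the backpropagation path does not run along the temporal direction of the sequence, which helps avoid exploding or vanishing gradients.
- Low memory requirement for training: filters are shared across a layer, and the backpropagation path depends only on the network depth.
- Variable length inputs: like an RNN, the 1D kernels slide over the sequence, so inputs of any length can be processed.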

There are also two notable disadvantages to using TCNs:

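- Data storage during evaluation: an RNN only needs to keep its hidden state, while a TCN must keep the raw inputs covering its whole effective history.
- Potential parameter change for a transfer of domain: a new domain may need a longer history, which can require a larger receptive field (larger kernels or dilations).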

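I think TCN is a pretty simple idea, yet it is powerful for time series tasks. To some extent, it is similar to attention: each output can be seen as a weighted summation of the inputs within its receptive field, although here the weights are learned convolution filters rather than being computed from the input itself.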