Awesome, really nice! A few small remarks; not sure if they should change the text of the blog post though:
1) The model is essentially a mix between a TCN and WaveNet, combining the strong components of both (not sure if you want to mention that though).
2) For our library, I implemented a direct method (the original method is autoregressive), so it doesn't follow the paper completely (simply because it's faster and performs better). Also, the original paper uses an embedding for the time series ID, which I omitted (again, basically following how we implement stuff in NeuralForecast).
3) I'd remove the sentence about the Student's t-distribution, and maybe change it to 'BiTCN can output point and probabilistic predictions, depending on your choice of loss function. In the paper the authors used a Student's t-distribution to produce probabilistic forecasts'.
4) The key thing is the backward and forward encoding of the sequence data (Fig. 2 of the paper); in particular, the forward dilated convolutions that encode future data 'backwards' to the current timestamp are the main innovation. The overall architecture, activation function etc. are somewhat secondary.
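On point 2, the direct vs. autoregressive distinction might be worth a tiny illustration in the post. This is just a toy numpy sketch with a dummy one-step predictor (not the actual NeuralForecast implementation); the point is only that the autoregressive variant feeds its own predictions back in, while the direct variant produces all horizon steps from the observed history in one shot:

```python
import numpy as np

def one_step(window):
    """Dummy stand-in for a model: predicts the mean of the last 3 values."""
    return float(np.mean(window[-3:]))

def autoregressive_forecast(y, horizon):
    """Predict one step at a time, feeding each prediction back as input."""
    history = list(y)
    preds = []
    for _ in range(horizon):
        yhat = one_step(np.array(history))
        preds.append(yhat)
        history.append(yhat)  # the prediction becomes input for the next step
    return preds

def direct_forecast(y, horizon):
    """Predict all horizon steps in one pass from the observed history only.

    In a neural model this would be a single forward pass with `horizon`
    outputs; no prediction is ever fed back, so errors don't accumulate.
    """
    return [one_step(np.array(y)) for _ in range(horizon)]
```

The direct method avoids the error accumulation (and repeated forward passes) of the autoregressive loop, which matches the "faster and performs better" remark above.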
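The forward/backward encoding in point 4 might also benefit from a minimal sketch. Below is a toy numpy dilated convolution (not the paper's or NeuralForecast's implementation, which use stacked convolutional layers): the 'backward' direction is the usual causal convolution over past values, while the 'forward' direction taps future values, encoding them back to the current timestamp:

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1, direction="backward"):
    """1D dilated convolution of sequence x with kernel w.

    direction='backward': out[t] depends on x[t], x[t-d], ... (causal, past only)
    direction='forward':  out[t] depends on x[t], x[t+d], ... (anti-causal, future only)
    Out-of-range taps are treated as zero (zero padding).
    """
    T, k = len(x), len(w)
    out = np.zeros(T)
    for t in range(T):
        for i in range(k):
            idx = t - i * dilation if direction == "backward" else t + i * dilation
            if 0 <= idx < T:
                out[t] += w[i] * x[idx]
    return out
```

Feeding an impulse through both directions shows the asymmetry: the backward convolution spreads information to later timestamps, the forward one to earlier timestamps, which is exactly how future covariates reach the current prediction step.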