Detailed Explanation of the Transformer

Encoder

Each word is first mapped to an embedding vector, and words with similar meanings get vectors that lie close together.
A positional encoding is then computed for each position in the sequence.
Finally, the embedding vector and the position vector are added together elementwise, giving a single input vector per token.
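The step above can be sketched with the sinusoidal positional encoding from the original Transformer paper; the function name and the toy sizes (6 tokens, width 16) are my own choices:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # One row per position; even columns use sine, odd columns cosine.
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                 # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

# "Combined into a single vector" means simple elementwise addition.
embeddings = np.random.randn(6, 16)   # 6 tokens, model width 16
x = embeddings + sinusoidal_positional_encoding(6, 16)
```

Because the encoding is added rather than concatenated, the model width stays the same and position information is mixed directly into every embedding dimension.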

Self-Attention Mechanism

Within each encoder layer, self-attention comes first: every token's vector is updated as a weighted mix of all token vectors, with the weights derived from query-key similarity.

Feedforward Neural Network

The attention output then passes through a position-wise two-layer network, applied to each token vector independently.
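A minimal sketch of one encoder layer combining these two sublayers. This is a single-head simplification with randomly initialized weights, and it omits the residual connections and layer normalization a real Transformer layer uses; all names and sizes here are my own:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Queries, keys, and values are linear projections of the same input.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # scaled dot-product similarity
    return softmax(scores) @ v               # attention-weighted mix of values

def feed_forward(x, W1, b1, W2, b2):
    # Position-wise two-layer MLP with a ReLU, same weights for every token.
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

d_model, d_ff, seq_len = 16, 64, 6
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

h = self_attention(x, Wq, Wk, Wv)    # sublayer 1: self-attention
y = feed_forward(h, W1, b1, W2, b2)  # sublayer 2: feed-forward network
```

Note that the feed-forward sublayer never mixes information across positions; all cross-token interaction happens in the attention step.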

Decoder

The decoder stacks the same two sublayers, with two differences: its self-attention is masked so each position can attend only to earlier positions, and an extra cross-attention sublayer reads the encoder's output.
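The masking used in decoder self-attention can be sketched as follows; the helper name is my own, and a real implementation would apply this inside the attention function before the softmax:

```python
import numpy as np

def causal_mask(seq_len):
    # True above the diagonal marks future positions that must be hidden.
    return np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

mask = causal_mask(4)
# Before the softmax, masked scores are set to -inf so their weight becomes 0,
# meaning position i can only attend to positions 0..i.
scores = np.random.randn(4, 4)
scores[mask] = -np.inf
```

This is what lets the decoder be trained on whole sequences at once while still generating one token at a time at inference.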