target encoding

Published 2025-05-18

Target encoding encodes a categorical variable by replacing each category with a statistic of the target computed over the rows in that category. A common (and often most effective) statistic is the mean: each category is then encoded as the mean of the target across its occurrences. More formally, writing x_i for the category of row i, y_i for its target, and N_c for the number of rows with category c:

$$\operatorname{TE}(c)=\frac{1}{N_c}\sum_{i:\;x_i=c} y_i$$
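As a minimal sketch of the formula, the snippet below computes the per-category mean in plain Python. The function name `mean_target_encode` and the toy data are made up for illustration; in practice a dataframe group-by would do the same job.

```python
from collections import defaultdict


def mean_target_encode(xs, ys):
    """Return TE(c) for every category c: the mean of y over rows where x == c."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y in zip(xs, ys):
        sums[x] += y
        counts[x] += 1
    return {c: sums[c] / counts[c] for c in sums}


# Toy data, invented for this example.
xs = ["a", "a", "b", "b", "b", "c"]
ys = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]

te = mean_target_encode(xs, ys)   # category -> encoding
encoded = [te[x] for x in xs]     # encoding applied row by row
```

Here the category `"c"` occurs only once, so its encoding is exactly that row's target value.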

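Target means are usually computed out-of-fold (OOF) to avoid leakage: each row is encoded using means computed only from folds that do not contain that row. As an extreme case, naively target encoding a category with a single occurrence would encode the target directly.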
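The out-of-fold idea can be sketched as follows, again in plain Python. Several choices here are assumptions made for illustration rather than part of the method as described: the function name `oof_mean_target_encode`, the deterministic `i % n_folds` fold assignment (real pipelines usually shuffle or stratify), and the fallback to the training-fold mean for categories that do not appear in the training folds.

```python
from collections import defaultdict


def oof_mean_target_encode(xs, ys, n_folds=3):
    """Encode each row with category means computed from the other folds only."""
    n = len(xs)
    # Assumption: simple deterministic fold assignment, no shuffling.
    fold_of = [i % n_folds for i in range(n)]
    encoded = [0.0] * n

    for fold in range(n_folds):
        sums = defaultdict(float)
        counts = defaultdict(int)
        train_sum, train_count = 0.0, 0
        # Accumulate statistics from every row NOT in the held-out fold.
        for x, y, f in zip(xs, ys, fold_of):
            if f != fold:
                sums[x] += y
                counts[x] += 1
                train_sum += y
                train_count += 1
        # Assumption: unseen categories fall back to the training-fold mean.
        fallback = train_sum / train_count if train_count else 0.0

        # Encode the held-out rows using only the training-fold statistics.
        for i in range(n):
            if fold_of[i] == fold:
                c = counts.get(xs[i], 0)
                encoded[i] = sums[xs[i]] / c if c else fallback
    return encoded


# Toy data, invented for this example.
xs = ["a", "a", "b", "b", "b", "c"]
ys = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]

oof = oof_mean_target_encode(xs, ys, n_folds=3)
```

With this scheme the single-occurrence category `"c"` is never encoded from its own target; it receives the fallback value instead.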

Variants of target encoding: