target encoding

Published 2025-05-18

Target encoding encodes a categorical variable by replacing each category with a statistic of the target computed over the rows in that category. A common (and often the most effective) statistic is the mean: each category is then encoded as the mean of the target across its occurrences. More formally, with N_c denoting the number of rows whose category x_i equals c:

\operatorname{TE}(c)=\frac{1}{N_c}\sum_{i:\;x_i=c} y_i
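As a minimal sketch of the formula above, the following pure-Python snippet computes the per-category mean on a toy dataset. The function name `mean_target_encoding` and the sample data are illustrative assumptions, not part of any library.

```python
from collections import defaultdict


def mean_target_encoding(categories, targets):
    """Map each category c to the mean of y_i over rows where x_i == c."""
    sums = defaultdict(float)   # running sum of targets per category
    counts = defaultdict(int)   # N_c: number of rows per category
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}


# Toy data (hypothetical): one categorical feature and a binary target.
x = ["a", "b", "a", "c", "b", "a"]
y = [1, 0, 0, 1, 1, 1]

te = mean_target_encoding(x, y)
naive_encoded = [te[c] for c in x]  # every row replaced by its category mean
```

Here `te["b"]` is 0.5 and `te["c"]` is 1.0. Category "c" occurs only once, so its encoding is exactly its own target value.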

Target means are usually computed out-of-fold (OOF) to avoid leakage: the encoding for each row is computed only from folds that do not contain that row. As an extreme case, naively target encoding a category with a single occurrence would encode the target directly.
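One way the out-of-fold computation might look is sketched below. Several choices here are assumptions made for illustration: the function name `oof_target_encoding`, the round-robin fold assignment (`i % n_folds`; in practice folds are usually shuffled), and the fallback to the training-fold target mean for categories unseen in the training folds.

```python
from collections import defaultdict


def oof_target_encoding(categories, targets, n_folds=3):
    """Encode each row with the category mean computed on the other folds."""
    n = len(categories)
    if n_folds < 2 or n < n_folds:
        raise ValueError("need at least 2 folds and at least one row per fold")

    encoded = [0.0] * n
    for fold in range(n_folds):
        sums = defaultdict(float)
        counts = defaultdict(int)
        total, total_n = 0.0, 0
        # Accumulate statistics from rows outside the held-out fold.
        for i in range(n):
            if i % n_folds != fold:  # assumed round-robin fold assignment
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
                total += targets[i]
                total_n += 1
        prior = total / total_n  # assumed fallback for unseen categories
        # Encode the held-out rows using only those statistics.
        for i in range(n):
            if i % n_folds == fold:
                c = categories[i]
                encoded[i] = sums[c] / counts[c] if counts[c] else prior
    return encoded


# Toy data (hypothetical): category "c" appears exactly once.
x_cat = ["a", "b", "a", "c", "b", "a"]
y_tgt = [1, 0, 0, 1, 1, 1]

oof_encoded = oof_target_encoding(x_cat, y_tgt, n_folds=3)
```

The single-occurrence category "c" (row index 3) now receives the fallback value from the other folds rather than its own target, which is the leakage the OOF scheme is meant to prevent.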

Variants of and alternatives to target encoding:

frequency encoding: replace each category with the number of times it appears in the data
feature crossing: combine the values of several categorical features into a single feature using n-gram encoding
embedding encoding: for high-cardinality features, learn an embedding for each category
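The first two items in the list above can be sketched in a few lines of pure Python. The helper names `frequency_encoding` and `cross_features` are hypothetical, and joining values with a separator is only one assumed reading of "n-gram encoding". Embedding encoding needs a trainable model and is not shown.

```python
from collections import Counter


def frequency_encoding(categories):
    """Replace each category with the number of times it appears."""
    counts = Counter(categories)
    return [counts[c] for c in categories]


def cross_features(*columns, sep="|"):
    """Join the values of several categorical columns row by row."""
    return [sep.join(str(v) for v in row) for row in zip(*columns)]


# Toy data (hypothetical): two categorical features.
city = ["NY", "SF", "NY", "LA", "SF", "NY"]
device = ["ios", "ios", "web", "web", "ios", "ios"]

city_freq = frequency_encoding(city)
crossed = cross_features(city, device)
crossed_freq = frequency_encoding(crossed)
```

A crossed feature is itself a categorical variable, so it can be passed to frequency encoding (as here) or to target encoding like any other column.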