target encoding
Published 2025-05-18
Target encoding is a method to encode a categorical variable by grouping some function of the target by category. A common (and often most effective) function is the mean; target encoding is then using the mean of the target across occurrences of the category as the encoding. More formally:
Target means are usually computed OOF to avoid leakage — as an extreme case, naively target encoding a category with a single occurrence would encode the target directly.
Variants of target encoding:
- frequency encoding: replace each category with the count of how often it appears in the data
- feature crossing: combine categorical values using n-gram encoding
- embedding encoding: for high-cardinality features, learn an embedding for each category