Short Season, Long Models - Dealing with Seasonality
- by Michel Adar
Accounting for seasonality presents a challenge for the accurate prediction of events. Examples of seasonality include:
· Boxed cosmetics sets are more popular during Christmas. They sell at other times of the year, but their sales rise higher than those of other products during the holiday season.
· Interest in a promotion rises around the time its TV advertising airs.
· Interest in the Sports section of a newspaper rises when there is a big football match.
There are several ways of dealing with seasonality in predictions.
Time Windows
If the length of the model's time windows is short enough relative to the seasonal effect, then the models will only ever see in-season data, and therefore will be accurate in their predictions. For example, a model with a weekly time window may be quick enough to adapt during the holiday season.
In order for time windows to be useful in dealing with seasonality it is necessary that:
· The time window is significantly shorter than the season itself.
· There is enough volume of data in the short time window to produce an accurate model.
An additional issue to consider is that sometimes the season
may have an abrupt end, for example the day after Christmas.
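A minimal sketch of this approach, assuming a pandas DataFrame with a "week" column, some feature columns, and a binary "bought" outcome (all hypothetical names), and using logistic regression purely for illustration:
```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def predict_with_weekly_windows(df, feature_cols):
    """Train on each week's data only, so the model sees purely
    in-season behavior, then score the following week."""
    predictions = {}
    weeks = sorted(df["week"].unique())
    for train_week, score_week in zip(weeks, weeks[1:]):
        train = df[df["week"] == train_week]
        score = df[df["week"] == score_week]
        # Each short window must still hold enough events (and both
        # outcome classes) to produce a stable fit.
        if len(train) < 500 or train["bought"].nunique() < 2:
            continue
        model = LogisticRegression()
        model.fit(train[feature_cols], train["bought"])
        predictions[score_week] = model.predict_proba(score[feature_cols])[:, 1]
    return predictions
```
The guard inside the loop reflects the second requirement above: a window that is short enough for the season but too thin on data is skipped rather than trusted.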
Input Data
If available, it is possible to include the seasonality effect in the input data for the model. For example, the customer record may include a list of all the promotions advertised in the customer's area of residence.
A model with these inputs will have to learn the effect of each input. It is possible to learn the effect of each specific promotion (and, incidentally, about cross-feeding between promotions) by leaving the list of ads as it is; or it is possible to learn the general effect by using a single flag that indicates whether the promotion is being advertised. A sketch of both encodings follows below.
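As an illustration, here is one way the two encodings could look in Python; the promotion names and the record layout are invented for the example:
```python
ADS_RUNNING = {"spring_sale", "loyalty_bonus"}  # ads currently airing in the area (hypothetical)

def promotion_specific_features(all_promotions):
    """One indicator per promotion: lets the model learn each
    promotion's effect, including cross-feeding between promotions."""
    return {f"ad_{p}": int(p in ADS_RUNNING) for p in all_promotions}

def general_flag_feature(promotion):
    """A single flag: the model learns only the general effect of
    'advertising is currently running for this promotion'."""
    return {"ad_running": int(promotion in ADS_RUNNING)}
```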
For inputs to properly represent the effect in the model it is necessary that:
· The model sees enough events with the input present, for example by virtue of the model lifetime (or time window) being long enough to see several "seasons", or by having enough data volume for the model to learn seasonality quickly.
Proportional Frequency
If we create a model that ignores seasonality, it is possible to use that model to predict how a specific person's likelihood differs from the average. If we have a divergence from the average, then we can transfer that divergence proportionally to the frequency observed at the time of the prediction.
Definitions:
Ft = trailing average frequency of the event at time "t". The average is taken over a period long enough to achieve a statistically significant estimate.
F = average frequency as seen by the model.
L = likelihood predicted by the model for a specific person.
Lt = predicted likelihood proportionally scaled for time "t".
If the model is good at predicting deviation from average,
and this holds over the interesting range of seasons, then we can estimate Lt as:
Lt = L * (Ft / F)
Considering that:
L = (L – F) + F
Substituting we get:
Lt = [(L – F) + F] * (Ft / F)
Which simplifies to:
(i) Lt = (L – F) * (Ft / F) + Ft
This last expression can be understood as: "the adjusted likelihood at time t is the average likelihood at time t plus the effect from the model, which is calculated as the difference from the average times the ratio of the frequencies".
The formula above assumes a linear translation of the
proportion. It is possible to generalize the formula using a factor which we
will call “a” as follows:
(ii) Lt = (L – F) * (Ft / F) * a + Ft
It is also possible to use a formula that does not scale the
difference, like:
(iii) Lt = (L – F) * a + Ft
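To make the three variants concrete, here is a direct transcription into Python; the function and argument names are mine, but the math follows the formulas above:
```python
def lt_i(L, F, Ft):
    # (i)  Lt = (L - F) * (Ft / F) + Ft, which simplifies to L * Ft / F
    return (L - F) * (Ft / F) + Ft

def lt_ii(L, F, Ft, a):
    # (ii) generalizes (i) with a tuning factor "a"
    return (L - F) * (Ft / F) * a + Ft

def lt_iii(L, F, Ft, a):
    # (iii) shifts by the unscaled difference instead
    return (L - F) * a + Ft

# Worked example: model average F = 5%, seasonal frequency Ft = 10%,
# and a person the model scores at L = 8%:
print(lt_i(0.08, 0.05, 0.10))  # 0.16 -> the 3-point edge over average doubles with the season
```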
While these formulas seem reasonable, they should be taken as hypotheses to be validated with empirical data. A theoretical analysis provides the following insights:
· The Cumulative Gains Chart (lift) should stay the same, as at any given time the ordering of the likelihoods for different customers is preserved.
· If F is equal to Ft, then the formula reverts to "L".
· If Ft = 0, then Lt in (i) and (ii) is 0.
· It is possible for Lt to be above 1.
If it is desired to avoid going over 1, which can happen with relatively high base frequencies, it is possible to use a relative interpretation of the multiplicative factor.
For example, if we say that Y is twice as likely as X, then we can interpret this sentence as:
· If X is 3%, then Y is 6%.
· If X is 11%, then Y is 22%.
· If X is 70%, then Y is 85%; in this case we interpret "twice as likely" as "half as likely to not happen".
Applying this reasoning to (i), for example, we would get:
If (L < F) or (Ft < 1 / ((L/F) + 1))
Then Lt = L * (Ft / F)
Else Lt = 1 – (F / L) + (Ft * F / L)
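A sketch of this piecewise rule in Python; the variable names are mine. The factor applied is L/F: while the result stays in the safe region the frequency is scaled directly, and beyond that the complement ("half as likely to not happen") is scaled instead:
```python
def lt_capped(L, F, Ft):
    """Seasonally adjusted likelihood that cannot exceed 1."""
    if L < F or Ft < 1.0 / ((L / F) + 1.0):
        return L * (Ft / F)              # plain proportional scaling, as in (i)
    return 1.0 - (F / L) + (Ft * F / L)  # scale the chance of NOT happening instead

# A person twice as likely as average (L/F = 2) in a season with Ft = 60%:
# the 40% chance of not happening is halved, so Lt = 1 - 0.20 = 0.80.
print(lt_capped(0.70, 0.35, 0.60))  # 0.8
```
The two branches agree exactly at the crossover point Ft = 1 / ((L/F) + 1), so the adjustment is continuous, and the second branch keeps Lt at or below 1.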