As an extension of the basic model, we made certain variations in
the inference. First, we allow a single patch in the image to be
mapped from a group of sub-patches located at different positions in
the epitome. For instance, a patch can correspond to a single patch
in the epitome, or to four sub-patches, or to nine. In this way, we
introduce variation in the patch sizes, so that the epitome
captures patterns from both large and small areas.
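The grouping idea can be illustrated with a minimal sketch. It assumes a square patch split into a regular grid of equally sized sub-patches, each copied from its own top-left position in the epitome. The function name `assemble_patch`, the 6-pixel patch size, and the toy 8x8 epitome are hypothetical choices for illustration, not values from the model above.

```python
from math import isqrt

def assemble_patch(epitome, positions, patch_size):
    """Build a patch_size x patch_size patch from g*g epitome sub-patches.

    positions: top-left (row, col) of each sub-patch in the epitome, in
    row-major order of the g x g grid. All sub-patches are assumed to lie
    inside the epitome (no wrap-around).
    """
    g = isqrt(len(positions))
    if g * g != len(positions) or patch_size % g != 0:
        raise ValueError("need g*g positions with g dividing patch_size")
    s = patch_size // g  # side length of each sub-patch
    patch = [[0.0] * patch_size for _ in range(patch_size)]
    for idx, (r0, c0) in enumerate(positions):
        br, bc = divmod(idx, g)  # block row and column inside the patch
        for dr in range(s):
            for dc in range(s):
                patch[br * s + dr][bc * s + dc] = epitome[r0 + dr][c0 + dc]
    return patch

# Toy 8x8 epitome whose entry at (r, c) is 10*r + c (illustrative only).
epitome = [[10 * r + c for c in range(8)] for r in range(8)]
whole = assemble_patch(epitome, [(1, 1)], 6)                          # one patch
quads = assemble_patch(epitome, [(0, 0), (0, 5), (5, 0), (5, 5)], 6)  # four sub-patches
```

With one position the patch is copied as a whole; with four or nine positions it is stitched together from smaller pieces, which is what lets the epitome serve both large and small image structures.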
However, how the patches are mapped (beyond where they are mapped)
becomes another set of hidden variables: for each patch, a grouping
variable denotes how that patch is assembled from the sub-patches in
the epitome. The dependency graph is depicted as follows, where the
mapping and grouping are independent given the epitome of the image.
The joint distribution hence becomes
[Equations (15)-(17)]
To train the epitome, we modify the target function as
[Equations (18)-(19)]
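Instead of looping over both the mappings and the grouping methods,
we can simplify the problem by only considering the most likely
grouping method for a given mapping. In other words, for every
possible mapping we select the single best grouping method given the
observed patch and the epitome parameters.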
In this way, we can apply the same EM formulas, except that in the
"expectation" step we first compute the best grouping method for
each mapping, and then replace the posterior probability in
Equation 11 by the one evaluated at that grouping.
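A minimal sketch of this modified expectation step for a single image patch is given here. It assumes that the log-likelihood of the patch has already been evaluated for every pair of mapping and grouping method and stored in a table `loglik[t][g]`. The function name, the table layout, and the uniform prior over mappings are assumptions made for illustration.

```python
import math

def hard_grouping_posterior(loglik, log_prior=None):
    """loglik[t][g]: log-likelihood of the patch under mapping t, grouping g.

    Returns (best_g, posterior): best_g[t] is the most likely grouping for
    mapping t, and posterior[t] is the normalized posterior over mappings
    computed with only that grouping.
    """
    n = len(loglik)
    if log_prior is None:
        log_prior = [0.0] * n  # uniform prior over mappings (assumption)
    # Step 1: pick the best grouping separately for each mapping.
    best_g = [max(range(len(row)), key=row.__getitem__) for row in loglik]
    # Step 2: score each mapping with its best grouping only.
    scores = [loglik[t][best_g[t]] + log_prior[t] for t in range(n)]
    m = max(scores)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return best_g, [w / z for w in weights]

# Two mappings, three grouping methods (made-up numbers).
loglik = [[-5.0, -2.0, -9.0],
          [-3.0, -4.0, -3.5]]
best_g, post = hard_grouping_posterior(loglik)
```

The cost per patch stays linear in the number of mapping and grouping pairs, but the posterior that feeds the parameter update is indexed by the mapping alone, so the remaining EM updates need no change.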
The second variation we made is that we allow linear transformations on a patch, such as flips and rotations. We expect the epitome can be more condensed, since the patches with symmetric properties in the original image can be generated from the same patch in the epitome.
Again, introducing transformations as hidden variables makes the
inference harder, but we can tackle this with the same
simplification as in Equations 20 and 22. We compute the best
transformation for every possible mapping, given the observed patch
and the epitome parameters. We then use that configuration in the
posterior calculations and update the model parameters accordingly.
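The search for the best transformation at one mapping can be sketched as follows. The sketch assumes the candidate set is the four rotations by multiples of 90 degrees, each with and without a left-right flip, and uses squared error as a stand-in for the patch log-likelihood (that is, it ignores the epitome variances). All function names are hypothetical.

```python
def rot90(p):
    """Rotate a square patch 90 degrees clockwise."""
    return [list(row) for row in zip(*p[::-1])]

def flip_lr(p):
    """Mirror a patch left to right."""
    return [row[::-1] for row in p]

def candidate_transforms(p):
    """Yield (name, patch) for four rotations, each with and without a flip."""
    q = [list(row) for row in p]
    for k in range(4):
        yield f"rot{90 * k}", q
        yield f"rot{90 * k}+flip", flip_lr(q)
        q = rot90(q)

def best_transform(epitome_patch, observed):
    """Return (name, squared error) of the transform closest to observed."""
    def sq_err(a, b):
        return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return min(
        ((name, sq_err(t, observed)) for name, t in candidate_transforms(epitome_patch)),
        key=lambda pair: pair[1],
    )

e_patch = [[1, 2], [3, 4]]
observed = [[3, 1], [4, 2]]  # e_patch rotated 90 degrees clockwise
name, err = best_transform(e_patch, observed)
```

Running this search once per candidate mapping yields the configuration that is then plugged into the posterior calculation.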