It's just an implementation detail. As scale increases, at some point you reach
an even multiple -- twice the size of the current base scale. At that point,
instead of continuing to blur the full current image, it's computationally
efficient to down-sample it, because that can be done directly from the 2X blur
(without interpolation) by skipping every other pixel.
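As a rough sketch of that shortcut (parameter names and values here are illustrative, not Lowe's exact choices): blur at geometrically spaced scales until the blur reaches twice the base scale, then take every other pixel of that 2X-blurred image to start the next, half-resolution level.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_octave(image, sigma0=1.6, steps=3):
    """Blur an image at successively larger scales within one octave.

    With `steps` intermediate scales per octave, the last image is blurred
    by 2 * sigma0 relative to the first. (Illustrative values, not a
    definitive implementation.)
    """
    k = 2.0 ** (1.0 / steps)  # multiplicative scale step per level
    return [gaussian_filter(image, sigma0 * k**i) for i in range(steps + 1)]

def next_octave_base(blurred_2x):
    """Down-sample the 2X-blurred image by skipping every other pixel --
    no interpolation needed, which is the efficiency gain described above."""
    return blurred_2x[::2, ::2]

img = np.random.rand(64, 64)
octave = build_octave(img)
base = next_octave_base(octave[-1])   # 32x32 base image for the next octave
```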

Historically, starting with Marr, then with Burt and Adelson, scale-space
pyramids were typically built in 2X increments called octaves. A few years
before SIFT appeared, several researchers (most notably Lindeberg and Crowley)
were starting to experiment with scale steps smaller than 2X. The terminology
in Lowe's work reflects the terminology of his contemporaries.

- Robin

________________________________

From: Stefán Freyr Stefánsson <[hidden email]>
To: [hidden email]
Sent: Thu, November 25, 2010 9:14:20 AM
Subject: [OpenCV] The relevance of the octaves in SIFT.

Hello.

First of all, I know this is a little off topic as it doesn't relate directly
to OpenCV, but I have already tried a Usenet newsgroup with no success (so,
also sorry for the cross-posting if anybody here follows that group as well).

I've been diving a little into the SIFT algorithm by D. G. Lowe, reading both
his 1999 and 2004 papers on it. I'll admit that I don't have much of a signal
processing background, so I'm having some difficulty understanding it, but I'm
mainly concerned with one question that I hope to get answered here.

The Lowe papers refer to quite a few papers regarding scale-space. I haven't
read them all, but I've skimmed a few of them, and nowhere did I find anything
about the "octaves" that are produced in SIFT -- that is, creating the
scale-space (DoGs) not only for the original image resolution, but also for
different sizes of the image, all the way from double the original image size
down to a few pixels.
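The structure described above might be sketched like this (a hedged illustration, assuming a simple Gaussian blur per level and subtraction of adjacent blurs; the function name, defaults, and stopping rule are my own, not Lowe's exact recipe):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, num_octaves=3, steps=3, sigma0=1.6):
    """Build a difference-of-Gaussians (DoG) pyramid across octaves.

    Within each octave the image is blurred at geometrically spaced
    scales, adjacent blurs are subtracted to form DoGs, and the next
    octave starts from the 2X-blurred image down-sampled by two.
    """
    k = 2.0 ** (1.0 / steps)
    pyramid = []
    base = image.astype(float)
    for _ in range(num_octaves):
        blurred = [gaussian_filter(base, sigma0 * k**i) for i in range(steps + 2)]
        dogs = [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]
        pyramid.append(dogs)
        base = blurred[steps][::2, ::2]   # 2X blur, every other pixel
        if min(base.shape) < 8:           # stop once the image is tiny
            break
    return pyramid

pyr = dog_pyramid(np.random.rand(64, 64))
```

Each entry of `pyr` holds the DoG images for one octave, at half the resolution of the previous one.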

Can anybody explain to me (in plain terms, preferably) what the purpose of the
octaves (different image resolutions) is? As I understand it, the scale-space
procedure (the DoG extrema detection) finds interest points that will be
invariant to changes in scale (size), but then I'm not quite getting what the
octaves are supposed to accomplish... even more scale invariance? Where does
this come from, since the Witkin paper (1983) that Lowe references doesn't
seem to mention anything about different image resolutions, just the Gaussian
method?

With kind regards,

Stefan Freyr.