The relevance of the octaves in SIFT.


Stefán Freyr Stefánsson
Hello.

First of all, I know this is a little off topic as it doesn't relate
directly to OpenCV, but I have already tried a Usenet newsgroup with no
success (so apologies for the cross-posting if anybody here follows that
group as well).

I've been diving a little into the SIFT algorithm by D. G. Lowe, reading
both his 1999 and 2004 papers on it.

I'll admit that I don't have much of a signal processing background, so I'm
having some difficulty understanding it, but I'm mainly concerned with one
question that I hope to get an answer to here.

The Lowe papers refer to quite a few papers on scale-space. I haven't read
them all, but I've skimmed a few of them and nowhere did I find anything
about the "octaves" that are produced in SIFT, that is, creating the
scale-space (DoGs) not only for the original image resolution but also for
different sizes of the image, all the way from double the original image
size down to a few pixels.

Can anybody explain to me (in plain terms preferably) what the purpose of
the octaves (different image resolutions) is? As I understand it, the
scale-space procedure (the DoG extrema detection) finds interest points that
will be invariant to changes in scale (size), so I'm not quite getting
what the octaves are supposed to accomplish... even more scale
invariance? And where does this come from, since the Witkin paper (1983) that
Lowe references doesn't seem to mention anything about different image
resolutions, just the Gaussian method?

With kind regards,
Stefan Freyr.

Re: The relevance of the octaves in SIFT.

Robin Hewitt
It's just an implementation detail. As scale increases, at some point you reach
a doubling: twice the current base scale. At that point, instead of continuing
to blur the full-resolution image, it's computationally efficient to down-sample
it, because that can be done directly from the 2X blur (without interpolation)
by skipping every other pixel.
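
A minimal sketch of that idea, in 1-D and pure Python to keep it short (an image is handled the same way along each axis). It is an illustration under stated assumptions, not Lowe's or OpenCV's implementation: the function names are made up for this example, and the defaults (base sigma 1.6, 3 scale steps per octave) are only the values commonly quoted from the 2004 paper. Within an octave the signal is blurred in steps of 2^(1/s); once the blur reaches twice the base sigma, every other sample is dropped and the next octave starts from there.

```python
import math

def gaussian_kernel(sigma):
    """Sampled, normalized 1-D Gaussian with a radius of about 4 sigma."""
    radius = max(1, int(math.ceil(4 * sigma)))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Convolve a 1-D signal with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def build_octaves(signal, num_octaves=3, scales_per_octave=3, base_sigma=1.6):
    """Return a list of octaves; each octave is a list of (sigma, samples).

    Sigma is measured in samples of that octave's own (down-sampled) grid.
    Assumed defaults, chosen for illustration only.
    """
    step = 2 ** (1.0 / scales_per_octave)
    octaves = []
    base = blur(signal, base_sigma)
    for _ in range(num_octaves):
        levels = [(base_sigma, base)]
        current = base
        for s in range(1, scales_per_octave + 1):
            sigma_prev = base_sigma * step ** (s - 1)
            sigma_next = base_sigma * step ** s
            # Gaussian blurs compose in quadrature, so only the missing
            # amount of blur is applied on top of the previous level.
            sigma_inc = math.sqrt(sigma_next ** 2 - sigma_prev ** 2)
            current = blur(current, sigma_inc)
            levels.append((sigma_next, current))
        octaves.append(levels)
        # The last level has twice the base sigma. Keeping every other
        # sample gives a half-size signal whose blur, measured on the new
        # grid, is base_sigma again: the start of the next octave.
        base = current[::2]
    return octaves

def difference_of_gaussians(levels):
    """Subtract each level from the next one in the same octave."""
    dogs = []
    for (_, lower), (_, upper) in zip(levels, levels[1:]):
        dogs.append([u - l for l, u in zip(lower, upper)])
    return dogs

if __name__ == "__main__":
    demo = [math.sin(i / 3.0) + (1.0 if 20 <= i < 30 else 0.0) for i in range(64)]
    for index, levels in enumerate(build_octaves(demo)):
        print(index, len(levels[0][1]), len(difference_of_gaussians(levels)))
```

The coarse scales are therefore computed on ever smaller arrays. For a 2-D image each octave has a quarter of the pixels of the one before it, so all the octaves after the first together cost less than a third of the first one.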

Historically, starting with Marr, then with Burt and Adelson, scale-space
pyramids were typically built in 2X increments called octaves. A few years before
SIFT appeared, several researchers (most notably Lindeberg and Crowley) were
starting to experiment with scale steps smaller than 2X. The terminology in
Lowe's work reflects the terminology of his contemporaries.

- Robin



________________________________
From: Stefán Freyr Stefánsson <[hidden email]>
To: [hidden email]
Sent: Thu, November 25, 2010 9:14:20 AM
Subject: [OpenCV] The relevance of the octaves in SIFT.

 

Re: The relevance of the octaves in SIFT.

Neon22
 

A visual explanation is here:
http://aishack.in/tutorials/computer-vision
under SIFT.

On 11/26/2010 6:48 AM, Robin Hewitt wrote:

 