16.485: Visual Navigation for Autonomous Vehicles (VNAV) — Fall 2021

Lecture 14: 2-view Geometry

Lecturer: Luca Carlone    Scribes: -

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Instructor(s).

This lecture discusses:

• basic principles of 2-view geometry for calibrated cameras (Epipolar constraint, Essential matrix),

• how to compute the essential matrix from pixel correspondences in 2 camera images,

• how to estimate the relative pose (up to scale) between the two cameras from the Essential matrix.

The presentation mostly follows Chapter 5 in [1, Sections 5.1-5.3].

14.1 Epipolar constraint and Essential matrix

From Lecture 12, we know how to compute keypoint correspondences in two images using feature tracking or descriptor-based feature matching. In other words, given a pixel $x_1$ in image $I_1$, we are able to compute the corresponding pixel $x_2$ in image $I_2$ (assuming the two images are picturing the same scene). Note that "corresponding pixels" refers to the fact that the two pixels picture the same 3D point.

In this lecture our goal is to compute the geometry of the two cameras taking images $I_1$ and $I_2$ (i.e., the relative pose between the cameras) given a number of pixel correspondences. Towards this goal, we make two main assumptions:

• the pixel correspondences are correct, i.e., for every correspondence $(x_1, x_2)$, the two pixels do represent the same 3D point. We will relax this assumption during the next lecture, since in practice many of the correspondences may be wrong. We also assume that the 3D point does not move.

• the cameras are calibrated, i.e., the calibration matrices

$$K_1 = \begin{bmatrix} s_x f_1 & s_\theta f_1 & o_{x_1} \\ 0 & s_y f_1 & o_{y_1} \\ 0 & 0 & 1 \end{bmatrix} \qquad K_2 = \begin{bmatrix} s_x f_2 & s_\theta f_2 & o_{x_2} \\ 0 & s_y f_2 & o_{y_2} \\ 0 & 0 & 1 \end{bmatrix} \tag{14.1}$$

of each camera are known.
This is an acceptable assumption in robotics, where we can typically calibrate the cameras on our robots and compute $K_1$ and $K_2$ before deployment (note: for a mobile robot $K_1 = K_2$, i.e., both images are collected by the same camera at different time instants).

The perspective projection of the 3D point $p^w$ onto the two cameras (Fig. 14.1) can be written as:

$$p_z^{c_1} \tilde{x}_1 = K_1 \left[ R_w^{c_1} \;\; t_w^{c_1} \right] \tilde{p}^w \qquad p_z^{c_2} \tilde{x}_2 = K_2 \left[ R_w^{c_2} \;\; t_w^{c_2} \right] \tilde{p}^w \tag{14.2}$$

where $(R_w^{c_i}, t_w^{c_i})$ is the (inverse of the) pose of camera $i$ in the world frame, $p_z^{c_i}$ is the depth of the point with respect to camera $i$, and $\tilde{p}^w$ contains the point coordinates with respect to the world frame (note: homogeneous coordinates).

Figure 14.1: 2-view geometry.

Since the point is unknown anyway and we only attempt to compute the relative pose between the two cameras, we may simply assume that $c_1$ is the $w$ frame, which allows simplifying the expressions above as:

$$d_1 \tilde{x}_1 = K_1 \left[ I_3 \;\; 0_3 \right] \tilde{p}^{c_1} = K_1 p^{c_1} \qquad d_2 \tilde{x}_2 = K_2 \left[ R_{c_1}^{c_2} \;\; t_{c_1}^{c_2} \right] \tilde{p}^{c_1} \tag{14.3}$$

where to simplify the notation we also defined $d_1 = p_z^{c_1}$ and $d_2 = p_z^{c_2}$.

Since the calibration matrices are known, we can pre-multiply both members of the equation on the left by $K_1^{-1}$ and both members of the equation on the right by $K_2^{-1}$:

$$d_1 \tilde{y}_1 = p^{c_1} \qquad d_2 \tilde{y}_2 = R_{c_1}^{c_2} p^{c_1} + t_{c_1}^{c_2} \tag{14.4}$$

where $\tilde{y}_1 = K_1^{-1} \tilde{x}_1$ and $\tilde{y}_2 = K_2^{-1} \tilde{x}_2$ (both still expressed in homogeneous coordinates). $\tilde{y}_1$ and $\tilde{y}_2$ are often called "calibrated" pixel coordinates.

Substituting $p^{c_1}$ from the expression on the left into the one on the right:

$$d_2 \tilde{y}_2 = d_1 R_{c_1}^{c_2} \tilde{y}_1 + t_{c_1}^{c_2} \tag{14.5}$$

To simplify notation further, we drop the super- and subscripts for the rotation and translation and write $t$ (instead of $t_{c_1}^{c_2}$) and $R$ (instead of $R_{c_1}^{c_2}$). This should not cause confusion since these are the only translation and rotation we are going to deal with in this lecture.
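The mapping from raw pixels to calibrated coordinates, $\tilde{y} = K^{-1}\tilde{x}$, can be sketched as follows. This is a minimal illustration with NumPy; the focal length and principal point values are made up for the example and do not come from the lecture:

```python
import numpy as np

# Hypothetical calibration matrix: focal length 500 px, zero skew,
# principal point (320, 240). These values are illustrative only.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def calibrated_coords(x, K):
    """Return y~ = K^{-1} x~, the 'calibrated' pixel coordinates of (14.4)."""
    x_h = np.array([x[0], x[1], 1.0])   # homogeneous pixel coordinates
    return np.linalg.solve(K, x_h)      # avoids forming K^{-1} explicitly

y = calibrated_coords((320.0, 240.0), K)
```

As a sanity check, the principal point maps to $(0, 0, 1)^T$, i.e., the direction of the optical axis.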
Therefore, (14.5) becomes:

$$d_2 \tilde{y}_2 = d_1 R \tilde{y}_1 + t \tag{14.6}$$

Pre-multiplying both members by $[t]_\times$:

$$d_2 [t]_\times \tilde{y}_2 = d_1 [t]_\times R \tilde{y}_1 \tag{14.7}$$

where we noticed that $[t]_\times t = 0_3$. Pre-multiplying both members by $\tilde{y}_2^T$:

$$d_2 \tilde{y}_2^T [t]_\times \tilde{y}_2 = d_1 \tilde{y}_2^T [t]_\times R \tilde{y}_1 \tag{14.8}$$

However, $\tilde{y}_2^T [t]_\times \tilde{y}_2 = 0$ (since $[t]_\times \tilde{y}_2$ is orthogonal to $\tilde{y}_2$). Therefore:

$$d_1 \tilde{y}_2^T [t]_\times R \tilde{y}_1 = 0 \tag{14.9}$$

Since $d_1$ is non-zero, this leads to the Epipolar constraint:

$$\tilde{y}_2^T [t]_\times R \, \tilde{y}_1 = 0 \tag{14.10}$$

which relates corresponding pixels in two images. The matrix

$$E = [t]_\times R \tag{14.11}$$

is known as the Essential matrix.

Definitions:

• epipolar plane: plane passing through the optical centers and the point $p$

• epipoles: intersection between the segment connecting the optical centers and the image planes

• epipolar line: line corresponding to the set of pixels $\tilde{y}_2$ in the second image that satisfy the epipolar constraint for a given pixel $\tilde{y}_1$ in the first image.

Geometric interpretation. The epipolar constraint can be written as (see slide 12):

$$(t_{c_1}^{c_2} \times \tilde{y}_2) \perp (R_{c_1}^{c_2} \tilde{y}_1) \iff (t_{c_1}^{c_2} \times \tilde{y}_2)^T R_{c_1}^{c_2} \tilde{y}_1 = 0 \iff ([t_{c_1}^{c_2}]_\times \tilde{y}_2)^T R_{c_1}^{c_2} \tilde{y}_1 = 0 \iff \tilde{y}_2^T [t_{c_1}^{c_2}]_\times R_{c_1}^{c_2} \tilde{y}_1 = 0$$

(the last step uses $([t]_\times \tilde{y}_2)^T = -\tilde{y}_2^T [t]_\times$, where the sign is immaterial since the expression equals zero).

Stereo example (slide 14). For a rectified stereo pair we have $R = I_3$ and $t = (-b, 0, 0)^T$, where $b$ is the baseline; writing $\tilde{y}_i = (u_i, v_i, 1)^T$:

$$\tilde{y}_2^T [t]_\times R \tilde{y}_1 = 0 \iff \tilde{y}_2^T \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & b \\ 0 & -b & 0 \end{bmatrix} \tilde{y}_1 = 0 \iff \tilde{y}_2^T \begin{bmatrix} 0 \\ b \\ -b v_1 \end{bmatrix} = 0 \iff b v_2 - b v_1 = 0 \iff v_2 = v_1$$

i.e., corresponding pixels lie on the same image row.

Properties of the essential matrix:

• The epipolar constraint does not constrain the scale of the translation.

• A matrix is an essential matrix if and only if it has singular values $\{\sigma, \sigma, 0\}$ (in particular $\sigma = \|t\|$), see Theorem 5.5 in [1].

Proof. We only prove that the largest singular value of $E = [t]_\times R$ is $\sigma_{\max}(E) = \|t\|$ and the smallest singular value is zero. A complete (but more involved) proof can be found in [1, Thm 5.5].
$$\sigma_{\max}^2(E) = \max_{\|d\|=1} \|Ed\|^2 = \max_{\|d\|=1} d^T E^T E d = \max_{\|d\|=1} d^T R^T [t]_\times^T [t]_\times R d \tag{14.12}$$

$$= \max_{\|d\|=1} d^T [t]_\times^T [t]_\times d = \max_{\|d\|=1} \|[t]_\times d\|^2 = \max_{\|d\|=1} \|t \times d\|^2 = \|t\|^2 \tag{14.13}$$

where the second line substitutes $d \leftarrow Rd$, which preserves $\|d\| = 1$ since $R$ is orthogonal; the maximum is attained for $d$ orthogonal to $t$. Similarly:

$$\sigma_{\min}^2(E) = \min_{\|d\|=1} \|Ed\|^2 = \min_{\|d\|=1} \|t \times d\|^2 = 0 \tag{14.14}$$

where the minimum is attained for $d$ parallel to $t$.

• The space of the essential matrices is called the Essential space $\mathcal{E}$ (i.e., the space of $3 \times 3$ matrices that can be written as $[t]_\times R$ for some $R \in SO(3)$ and $t \in \mathbb{R}^3$). The projection of a matrix $M$ onto the Essential space can be computed as prescribed in [1, Thm 5.9]:

$$\arg\min_{E \in \mathcal{E}} \|E - M\|_F^2 = U \begin{bmatrix} \frac{\lambda_1+\lambda_2}{2} & 0 & 0 \\ 0 & \frac{\lambda_1+\lambda_2}{2} & 0 \\ 0 & 0 & 0 \end{bmatrix} V^T \tag{14.15}$$

where $M = U \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3) V^T$ is a singular value decomposition of $M$.

14.2 How to estimate the Camera Poses from Correspondences?

In this section we address the following problem: given $N$ (calibrated) pixel correspondences, compute the relative pose (up to scale) between the cameras. This is typically done in 2 steps, discussed below:

• use the pixel correspondences and the epipolar constraint to estimate the essential matrix $E$

• retrieve the relative pose $(R, t)$ between the cameras from the essential matrix $E$

14.2.1 Compute the Essential Matrix from Pixel Correspondences

Assume that we are given $N$ (calibrated) pixel correspondences $(\tilde{y}_{1,k}, \tilde{y}_{2,k})$ for $k = 1, \ldots, N$. As mentioned at the beginning of this document, we assume that there are no outliers (i.e., we do not have wrong correspondences). Each of these correspondences needs to satisfy the epipolar constraint (14.10):

$$\tilde{y}_{2,k}^T E \, \tilde{y}_{1,k} = 0 \qquad k = 1, \ldots, N \tag{14.16}$$

Noticing that $(\tilde{y}_{1,k}, \tilde{y}_{2,k})$ are known pixel values, we realize that these are simply linear equalities in the entries of $E$. Note that the essential matrix can only be computed up to scale, since we can multiply (14.16) by an arbitrary constant without altering the equality. Recalling that $E = [t]_\times R$, this means that we can apply an arbitrary scaling to the vector $t$ without altering the epipolar constraint.
This is consistent with the geometric intuition that from a set of camera images we are not able to resolve the scale of the scene, i.e., we are not able to understand whether we are observing a small scene from a close distance or a large scene from a far distance. Therefore, it is customary to assume $\|t\| = 1$, which means we only try to estimate the direction of the translation rather than its scale.

Eight-point algorithm. In the absence of noise, we can compute the essential matrix by solving a linear system. Rearranging the entries of $E$ in a vector $e \in \mathbb{R}^9$, the set of linear equations (14.16) can be written as:

$$a_k^T e = 0 \qquad k = 1, \ldots, N \tag{14.17}$$

where the $a_k$ are known vectors whose entries are only a function of the pixel correspondences $(\tilde{y}_{1,k}, \tilde{y}_{2,k})$. Stacking the vectors $a_k^T$ as rows of a matrix $A$, the linear equations become:

$$A e = 0 \tag{14.18}$$

For this linear system to admit a unique solution (up to scale), $A \in \mathbb{R}^{N \times 9}$ should have rank 8; therefore, we need $N = 8$ point correspondences to compute the essential matrix using the linear system (14.18).
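The pipeline above (form the epipolar constraints, solve $Ae = 0$, project the result onto the essential space) can be sketched as follows. This is a minimal NumPy illustration on noise-free synthetic data, not the full pipeline: the relative pose and 3D points are made up, and the rows $a_k = \tilde{y}_{1,k} \otimes \tilde{y}_{2,k}$ assume $e$ stacks the columns of $E$, which is one common vectorization convention:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2],  0.0, -t[0]],
                     [-t[1], t[0],  0.0]])

def eight_point(y1s, y2s):
    """Estimate E (up to scale) from N >= 8 calibrated correspondences.

    Row k of A is a_k^T = (y1_k kron y2_k)^T, so that A e = 0 holds with
    e = vec(E) stacked column-wise, cf. (14.17)-(14.18).
    """
    A = np.array([np.kron(y1, y2) for y1, y2 in zip(y1s, y2s)])
    _, _, Vt = np.linalg.svd(A)
    e = Vt[-1]                          # null direction of A
    E = e.reshape(3, 3, order="F")
    # Project onto the essential space as in (14.15); with noise-free
    # data this leaves E unchanged up to round-off.
    U, lam, Vt = np.linalg.svd(E)
    s = (lam[0] + lam[1]) / 2.0
    return U @ np.diag([s, s, 0.0]) @ Vt

# Made-up ground-truth pose: 0.2 rad rotation about y, unit-norm translation.
th = 0.2
R = np.array([[ np.cos(th), 0.0, np.sin(th)],
              [ 0.0,        1.0, 0.0       ],
              [-np.sin(th), 0.0, np.cos(th)]])
t = np.array([1.0, 0.2, 0.1])
t /= np.linalg.norm(t)                  # customary normalization ||t|| = 1
E_true = skew(t) @ R                    # essential matrix (14.11)

# Random 3D points in front of camera 1, projected into both views (14.4).
rng = np.random.default_rng(0)
pts = rng.uniform([-1.0, -1.0, 3.0], [1.0, 1.0, 6.0], size=(8, 3))
y1s = [p / p[2] for p in pts]
y2s = [(R @ p + t) / (R @ p + t)[2] for p in pts]

# Every correspondence satisfies the epipolar constraint (14.10) ...
residuals = [y2 @ E_true @ y1 for y1, y2 in zip(y1s, y2s)]
# ... and the 8-point estimate recovers E_true up to scale and sign.
E_est = eight_point(y1s, y2s)
```

As predicted by Theorem 5.5, the singular values of `E_true` are $\{\|t\|, \|t\|, 0\} = \{1, 1, 0\}$; and since the data are noise-free and the 8 points are in general position, `E_est` agrees with `E_true` up to a global scale factor and sign. With noisy correspondences one would use $N > 8$ points, take the right singular vector of the smallest singular value of $A$, and rely on the projection step (14.15).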