Answering your questions:
1) This approach is difficult to apply in uncontrolled conditions. Do not expect good results if your object does not fit the constraints of the approach. Moreover, Photometric Stereo (PS) does not recover depth directly; it computes the surface normal and albedo for each pixel. Depth is then recovered by integrating the surface gradient. Each step has its own particularities.
Normal computation: This depends largely on the approach you want to use. The classical approach assumes that the surface finish is Lambertian (matte paint, ceramic, paper-like) and that the light source directions and intensities are well known, so you will need to calibrate your PS rig carefully. The example-based approach lets you reconstruct most materials, as long as you know how the normalized light intensities behave for a given normal; it therefore needs a reference table, normally built from a reference object with known normals under the same lighting conditions as the object to be reconstructed. Such approaches demand at least 3 light sources. SVD-based approaches do not need calibration (known light source positions and intensities), but they need a good number of light sources to recover good normals; using just 3 or 4 will give you "flattened" normals. Most assume that the surface finish is Lambertian or has a Lambertian component with a small specular component.
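As a rough illustration of the classical (calibrated, Lambertian) case: each pixel's intensities satisfy I = L (albedo * n), so with at least 3 known lights you can solve for albedo * n by least squares. The sketch below assumes NumPy, grayscale frames stacked in one array, and a light matrix whose rows are light directions scaled by intensity. The name photometric_stereo is only for illustration, and shadows and specularities are not handled at all.

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Per-pixel albedo and unit normals under a Lambertian model.

    images:     array (k, h, w), one grayscale frame per light source
    light_dirs: array (k, 3), light directions scaled by light intensity
    Assumes no shadows or specular highlights (illustrative sketch only).
    """
    images = np.asarray(images, dtype=float)
    light_dirs = np.asarray(light_dirs, dtype=float)
    k, h, w = images.shape
    intensities = images.reshape(k, h * w)      # one column per pixel

    # Least-squares solution of light_dirs @ g = intensities,
    # where g = albedo * normal for every pixel.
    g, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)

    albedo = np.linalg.norm(g, axis=0)
    # Avoid dividing by zero where the pixel is completely dark.
    normals = np.where(albedo > 1e-8, g / np.maximum(albedo, 1e-8), 0.0)
    return albedo.reshape(h, w), normals.T.reshape(h, w, 3)
```

With exactly 3 lights this reduces to inverting a 3x3 system; with more lights the least-squares fit averages out some noise.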
The worst problem, in fact, arises when the observed pixels do not closely follow the surface reflectance model expected by your PS method. This includes shadows (self or projected), interreflections, specularities, and non-uniform lighting (most methods assume a constant light field provided by a light source at infinity). In the presence of those effects, the normals computed in such regions are distorted or completely wrong.
Depth computation: First, since it is a monocular method, you cannot determine depth on a metric scale, only on an arbitrary scale relative to a reference pixel (let's say, the top-left one). Some methods, such as the Frankot-Chellappa integrator, are really fast and are used in real-time applications. The problem is that when you integrate the surface gradient, you have to know where the depth discontinuities are (for instance, in a face, the pixel on the chin is not connected to the neck); otherwise your depth map will be badly distorted. Robust integrators make use of weight maps to deal with this, but photometric stereo cannot locate the discontinuities by itself. You will need to determine them by another method, such as stereo correspondence. The relative depth between disconnected regions is also impossible to determine by PS alone.
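To give an idea of why Frankot-Chellappa is fast, here is a minimal sketch of the usual FFT formulation, assuming NumPy and gradient maps p = dz/dx and q = dz/dy expressed per pixel (for a unit normal n, p = -n_x / n_z and q = -n_y / n_z). It assumes periodic boundaries and has no weight map, so it does nothing about depth discontinuities; the function name is just for illustration.

```python
import numpy as np

def frankot_chellappa(p, q):
    """Integrate a gradient field (p = dz/dx, q = dz/dy) into a depth map.

    Projects the gradients onto the nearest integrable surface in the
    Fourier domain. The result is defined up to an additive constant;
    here the mean depth is fixed to zero.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    h, w = p.shape

    # Angular frequencies in radians per pixel along x (columns) and y (rows).
    u, v = np.meshgrid(2 * np.pi * np.fft.fftfreq(w),
                       2 * np.pi * np.fft.fftfreq(h))

    P = np.fft.fft2(p)
    Q = np.fft.fft2(q)

    denom = u ** 2 + v ** 2
    denom[0, 0] = 1.0                 # avoid 0/0 at the DC term
    Z = (-1j * u * P - 1j * v * Q) / denom
    Z[0, 0] = 0.0                     # unknown offset: set mean depth to 0

    return np.real(np.fft.ifft2(Z))
```

The whole integration is two forward FFTs and one inverse FFT, which is why it suits real-time use, but any discontinuity gets smeared across the whole map.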
2) It depends on the speed of your moving object and the number of light sources. I have seen some facial capture projects of people in motion which needed 200 fps for 5 light sources. A 15 fps, 4-light-source photometric stereo rig worked with more or less 150 fps. Remember that you need perfect synchronization between switching the lights on/off and capturing frames. If a light source's intensity varies between captures (for example, it is still fading in or out), your computed normals will be wrong. The observed light source intensity must always be constant for each captured frame.
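As a back-of-the-envelope check, assuming the lights are time-multiplexed with exactly one frame per light source per reconstruction, the camera rate is at least the reconstruction rate times the number of lights. The helper names below are hypothetical, and this is only a lower bound: it ignores synchronization overhead, and the object must barely move during one full light cycle.

```python
def min_camera_fps(reconstructions_per_second, num_lights):
    """Lower bound on the camera frame rate for time-multiplexed lights,
    assuming one captured frame per light source per reconstruction."""
    return reconstructions_per_second * num_lights

def reconstruction_rate(camera_fps, num_lights):
    """Reconstructions per second obtainable from a given camera rate,
    under the same one-frame-per-light assumption."""
    return camera_fps / num_lights

print(min_camera_fps(15, 4))        # 60
print(reconstruction_rate(200, 5))  # 40.0
```

If the 15 fps figure above is read as the reconstruction rate, 4 lights would need at least 60 fps, so a camera at around 150 fps leaves headroom for switching the lights.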
3) If your surface is arbitrary (concave-convex) with unknown albedo, you will need at least 3 light sources in non-coplanar positions. Fewer light sources, or coplanar/collinear arrangements, will lead to ambiguity (look up the bas-relief ambiguity), which you won't be able to resolve without assuming something about the nature of the surface (known albedo, known normal orientation, etc.). Keep in mind that regions with projected and self-shadows will require MORE than 3 frames to compute the normal, because a frame with a shadowed pixel is essentially useless. If the albedo is not important, you can relax the frame-rate requirements by using multi-spectral approaches, i.e. using lights that are each visible to only one of the camera's sensor channels (e.g. red, green, and blue lights), turning them on simultaneously, and capturing a single frame.
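One simple way to check a planned light arrangement, sketched here with NumPy under the assumption of distant lights described only by their directions: stack the unit directions in a matrix and look at its singular values. If the smallest one is close to zero relative to the largest, the directions are (nearly) coplanar and the per-pixel system cannot be solved reliably. The function name and the idea of using this ratio as a quality score are mine, not a standard API.

```python
import numpy as np

def light_config_quality(light_dirs):
    """Ratio of smallest to largest singular value of the unit light
    direction matrix: about 0 for coplanar or collinear directions,
    larger for well-spread lights. Fewer than 3 lights always gives 0."""
    L = np.asarray(light_dirs, dtype=float)
    if L.shape[0] < 3:
        return 0.0
    L = L / np.linalg.norm(L, axis=1, keepdims=True)   # normalize rows
    s = np.linalg.svd(L, compute_uv=False)             # sorted descending
    return float(s[-1] / s[0])

# Three directions in the plane y = 0: degenerate arrangement.
coplanar = [[1, 0, 1], [0, 0, 1], [-1, 0, 1]]
# Three directions spread around the viewing axis: usable arrangement.
spread = [[1, 0, 1], [0, 1, 1], [-1, -1, 1]]

print(light_config_quality(coplanar))
print(light_config_quality(spread))
```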
4) It has been a while since I last looked at state-of-the-art work in PS, but I do recommend reading the works of R.J. Woodham, Barsky and Petrou (classical), A. Hertzmann, Saracchini (example-based PS), Hayakawa (SVD-based), Frankot/Chellappa, Agrawal (integrators), Broadbent and Cipolla (applications, real-time capture), as well as works which cite them, so you can grasp how each methodology works.