Although it is sometimes possible to make facial recognition work on existing cameras, you should be methodical about re-using them. First, survey the locations you want to monitor to determine the goals at each location and the requirements for achieving those goals. Only then take stock of existing cameras to see whether they meet those requirements. If they do not, install new cameras that do.
The best results with facial recognition generally happen when you set up your facial recognition cameras at choke points such as narrow passages (e.g. doorways) or concentrated standing areas (e.g. bus stops).
Below are some considerations when evaluating for choke points:
Example choke points:
Scenario | Optimal Location | Challenges | Potential Mitigation Strategies |
---|---|---|---|
Doorway, elevator exit, or gateway | | | |
Hallway | | | |
Stairway/escalator | | | |
Front queue | | | |
Near artwork or other objects of interest | | | |
Below are the key factors that affect the success of camera facial recognition.
The number of pixels a camera allocates to a face is determined by three main variables, listed in order of impact: video resolution, angle of view, and distance from the subject.
Video resolution describes the number of pixels in each video frame. Video resolution is measured as width x height (in that order). For convenience, some people only cite the height measurement when talking about video resolution. Thus, cameras with resolutions of 1920x1080 might be said to have 1080p resolution.
Obviously, the higher the camera's video resolution, the better. The minimum video resolution that we recommend for successful facial recognition is about 2500p. Below you can see the effect of different video resolutions.
Note: "4K Ultra HD" (3840x2160) has a resolution of 2160p, which is close to this figure.
As you can see, the license plate is much more legible in the higher resolution.
Angle of View (AoV) is another significant factor affecting face image size; narrowing it can increase the face image size by orders of magnitude. A camera with a wide AoV spreads its limited number of pixels over a wide area, a problem that grows dramatically as subjects get farther from the camera. Conversely, a narrow AoV retains more pixels for the face image even as the distance increases.
Wide-angle lenses tend to be bad for facial recognition. They introduce significant perspective distortion and require closer distances for accurate results.
A camera's AoV is usually reported in its specification sheet. You can also get this information from tools such as the IPVM.com calculator.
Distance from the subject is measured from the subject to the camera lens. You want the camera to be as close to subjects as is practical.
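As a rough illustration of how the three variables above interact, the sketch below estimates how many horizontal pixels land on a face. It assumes an ideal, distortion-free lens and an average face width of 0.16 m; both are assumptions made for this example, and the function name `face_width_pixels` is hypothetical rather than part of any product.

```python
import math

def face_width_pixels(horizontal_resolution_px, horizontal_aov_deg,
                      distance_m, face_width_m=0.16):
    """Estimate the number of horizontal pixels that cover a face.

    face_width_m=0.16 is an assumed average face width, not a value
    taken from this guide.
    """
    # Width of the scene that fills the frame at the subject's distance.
    scene_width_m = 2.0 * distance_m * math.tan(math.radians(horizontal_aov_deg) / 2.0)
    # Pixels available per meter of scene at that distance.
    pixels_per_meter = horizontal_resolution_px / scene_width_m
    return pixels_per_meter * face_width_m

# Compare a narrow and a wide lens on a 3840-pixel-wide sensor.
for aov in (30, 90):
    for distance in (3, 10):
        px = face_width_pixels(3840, aov, distance)
        print(f"AoV {aov} deg at {distance} m: about {px:.0f} px across the face")
```

Running the loop shows the trend described above: widening the AoV or increasing the distance both reduce the pixels available for each face.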
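A camera's zoom functionality can mitigate distance from the subject. Be aware that there are fundamentally two different types of zoom: optical zoom, which magnifies the scene with the lens, and digital zoom, which only enlarges pixels that have already been captured and therefore adds no facial detail.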
Focus is critical for successful facial recognition. If a camera model provides a focus control, you should set the camera's focus to where you expect to capture subjects' facial images, as best you can.
Calibrating the camera using manual focusing and a good focus chart will almost always produce better results than using the camera's auto-focus, even if camera manufacturers claim otherwise. Auto-focus may simply lock onto a door or some other object at the far end of the area you want in focus.
When comparing cameras, a larger depth of field is desirable because it means the camera can maintain sharp, clear focus over a greater range of near and far distances.
Subjects are only in focus within a specific range of distances from the camera. Subjects that are either farther away or closer will be out of focus.
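To make the idea of depth of field concrete, the sketch below applies the standard thin-lens hyperfocal formulas to estimate the nearest and farthest distances that stay acceptably sharp. The circle-of-confusion value of 0.005 mm is an assumption chosen for a small surveillance sensor, and the function name `depth_of_field` is hypothetical; check your camera's sensor and lens data before relying on the numbers.

```python
import math

def depth_of_field(focal_length_mm, f_number, focus_distance_m, coc_mm=0.005):
    """Return (near_limit_m, far_limit_m) of acceptable focus.

    coc_mm is the circle of confusion; 0.005 mm is an assumed value,
    not a figure from this guide.
    """
    f = focal_length_mm
    s = focus_distance_m * 1000.0  # work in millimeters
    hyperfocal = f * f / (f_number * coc_mm) + f
    near = hyperfocal * s / (hyperfocal + (s - f))
    far_denominator = hyperfocal - (s - f)
    # Focusing at or beyond the hyperfocal distance keeps everything
    # out to infinity acceptably sharp.
    far = math.inf if far_denominator <= 0 else hyperfocal * s / far_denominator
    return near / 1000.0, far / 1000.0

near, far = depth_of_field(focal_length_mm=8, f_number=2.0, focus_distance_m=4)
print(f"In focus from about {near:.1f} m to {far:.1f} m")
```

In this model a smaller aperture (larger f-number) or a shorter focal length widens the in-focus range, which is one way to compare candidate cameras and lenses.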
Auto-focus is typically not used with facial recognition because multiple people at different distances will sometimes need to be recognized, and auto-focus typically only focuses on individual objects rather than a group of objects. Auto-focus usually prioritizes focusing on closer objects, which will cause objects further back to lose focus. Furthermore, that closer object might not even be a person's face.
Video compression is the process of encoding video such that it consumes less space and is easier to transmit over the network. Compression is often exposed as a setting for the number of bits per second (the bitrate) delivered by a video stream. To get the highest quality video, you will need to test each specific camera model that you intend to use. As an initial guide, a bitrate between 4K (4096 Kbps) and 8K (8192 Kbps) with VBR (variable bitrate, as opposed to CBR, constant bitrate) on the H.264 encoder is usually a good starting point.
It is critical that subjects' faces are illuminated well enough that facial details are clearly visible to the human eye. The color of the light should be white; colored light can alter or "flatten" people's skin tones.
If the environment behind people is brighter than the light illuminating their faces, they will appear dark and with reduced detail because the camera's sensor will be overwhelmed by the brightness behind them. In such situations, a bright white light located near the camera can illuminate people's faces. Such a light has the added benefit of causing most people to look directly at the camera as they seek the source of the bright light shining in their faces, which helps their Center Pose Quality. (See the Center Pose Quality section below for more information.)
A good low light camera will produce a video image that maintains image detail both within dark areas as well as within bright areas. A bad low light camera produces banding and noisy/grainy video when in low light. These functional differences are often the result of which sensor type the camera is using. Good low light cameras often use CCD sensors, while bad low light cameras often use less expensive CMOS sensors instead.
Center Pose Quality is a value from 0 to 1 that specifies how directly a face is looking at the camera. If a face is looking directly at the camera, this value is 1. The more the face turns away from the camera, the lower this value becomes.
Yaw is the horizontal angle between the direction a subject is looking and the camera's line of sight. The ideal angle for facial recognition is 0° (i.e. the subject is looking directly at the camera). Facial recognition works well for angles up to 30°. Between 30° and 60°, recognition still occurs, but only if motion is relatively low or the lighting is good. At angles above 60° and up to 90°, facial recognition is very challenging but still possible.
Depression angle is the vertical angle from the subject's face up (or down) to the camera. A value of 15° or less is best, though up to 30° is acceptable. Values greater than 45° will present a challenge to the face recognition software.
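The yaw ranges above can be written as a small lookup, which may be handy when reviewing test footage. This is only a sketch: the function name `yaw_band` and the label strings are made up for illustration, and the handling of angles beyond 90° is an assumption, since the text does not cover them.

```python
def yaw_band(yaw_deg):
    """Map a yaw angle in degrees to the bands described in the text."""
    yaw = abs(yaw_deg)  # left and right turns are treated the same
    if yaw <= 30:
        return "works well"
    if yaw <= 60:
        return "needs low motion or good lighting"
    if yaw <= 90:
        return "very challenging"
    # Assumption: angles beyond 90 degrees are outside the described range.
    return "outside described range"

for angle in (0, -25, 45, 75, 120):
    print(f"{angle:>4} deg: {yaw_band(angle)}")
```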
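The depression angle follows from simple geometry, so you can estimate it before mounting a camera. The sketch below assumes a typical face height of 1.6 m in the usage example and measures horizontal distance along the floor; the function names are hypothetical and the numbers are illustrative only.

```python
import math

def depression_angle_deg(camera_height_m, face_height_m, horizontal_distance_m):
    """Vertical angle between the face and the camera, in degrees."""
    height_difference = abs(camera_height_m - face_height_m)
    return math.degrees(math.atan2(height_difference, horizontal_distance_m))

def min_distance_for_angle(camera_height_m, face_height_m, max_angle_deg):
    """Closest horizontal distance at which the angle stays within max_angle_deg."""
    height_difference = abs(camera_height_m - face_height_m)
    return height_difference / math.tan(math.radians(max_angle_deg))

# A camera mounted at 3.0 m, with faces assumed to be at about 1.6 m.
angle = depression_angle_deg(3.0, 1.6, 4.0)
closest = min_distance_for_angle(3.0, 1.6, 15)
print(f"At 4 m the depression angle is about {angle:.1f} deg")
print(f"Subjects stay within 15 deg when at least {closest:.1f} m away")
```

Lowering the mounting height or capturing subjects farther away both reduce the angle, at the cost of fewer pixels per face at longer distances.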
Frame rate refers to the number of video frames delivered by the camera per second. In general, 15 frames per second is considered the minimum for real-time surveillance. When selecting which camera(s) to use for facial recognition, check to see if the frame rate changes significantly with resolution. If it does, that's an indication that after-capture software is scaling the video, which is bad for facial recognition.
The video bitrate should be selected to ensure the highest quality possible within the network's limitations. The table below provides the recommended maximum and average video bitrates for common resolutions.
Resolution | Bitrate (Kbps) | 30 fps | 20 fps | 15 fps | 10 fps |
---|---|---|---|---|---|
3000p | Max | 27000 | 20500 | 16400 | 12300 |
3000p | Avg | 11000 | 8200 | 6600 | 5200 |
2160p | Max | 20000 | 14300 | 11300 | 9200 |
2160p | Avg | 8000 | 6100 | 5100 | 4200 |
2048p | Max | 14000 | 9200 | 7700 | 6100 |
2048p | Avg | 6000 | 4200 | 3700 | 2900 |
1920p | Max | 11000 | 8200 | 6700 | 5100 |
1920p | Avg | 5000 | 3700 | 3200 | 2600 |
1440p | Max | 8000 | 5100 | 4400 | 3600 |
1440p | Avg | 4000 | 2600 | 2200 | 2200 |
1080p | Max | 5000 | 3100 | 2200 | 1500 |
1080p | Avg | 2500 | 1900 | 1600 | 1200 |
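When planning network capacity, the table above can be turned into a lookup. The sketch below copies the table values into a dictionary and sums the recommended bitrates for a set of cameras. The names `RECOMMENDED_KBPS` and `total_bandwidth_mbps` are hypothetical, and the conversion assumes 1 Mbps equals 1000 Kbps.

```python
# Recommended bitrates in Kbps, copied from the table above.
# Keys: resolution -> "max"/"avg" -> frames per second.
RECOMMENDED_KBPS = {
    "3000p": {"max": {30: 27000, 20: 20500, 15: 16400, 10: 12300},
              "avg": {30: 11000, 20: 8200, 15: 6600, 10: 5200}},
    "2160p": {"max": {30: 20000, 20: 14300, 15: 11300, 10: 9200},
              "avg": {30: 8000, 20: 6100, 15: 5100, 10: 4200}},
    "2048p": {"max": {30: 14000, 20: 9200, 15: 7700, 10: 6100},
              "avg": {30: 6000, 20: 4200, 15: 3700, 10: 2900}},
    "1920p": {"max": {30: 11000, 20: 8200, 15: 6700, 10: 5100},
              "avg": {30: 5000, 20: 3700, 15: 3200, 10: 2600}},
    "1440p": {"max": {30: 8000, 20: 5100, 15: 4400, 10: 3600},
              "avg": {30: 4000, 20: 2600, 15: 2200, 10: 2200}},
    "1080p": {"max": {30: 5000, 20: 3100, 15: 2200, 10: 1500},
              "avg": {30: 2500, 20: 1900, 15: 1600, 10: 1200}},
}

def total_bandwidth_mbps(cameras, kind="max"):
    """Sum the recommended bitrate for (resolution, fps) pairs, in Mbps."""
    total_kbps = sum(RECOMMENDED_KBPS[res][kind][fps] for res, fps in cameras)
    return total_kbps / 1000.0  # assumes 1 Mbps = 1000 Kbps

# Hypothetical site: two 2160p cameras at 15 fps and one 1080p camera at 30 fps.
site = [("2160p", 15), ("2160p", 15), ("1080p", 30)]
print(f"Peak:    {total_bandwidth_mbps(site, 'max'):.1f} Mbps")
print(f"Average: {total_bandwidth_mbps(site, 'avg'):.1f} Mbps")
```

Sizing the network against the "max" figures leaves headroom for busy scenes, while the "avg" figures are more useful for estimating storage.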