Camera Best Practices

Although it's sometimes possible to make facial recognition work on existing cameras, you should be methodical about re-using them. First, survey the locations you want to monitor to determine the goals at each location and the requirements needed to achieve those goals. Only then should you take stock of existing cameras to see whether they meet those requirements. If they don't, install new cameras that do.

Where to Set Up Your Cameras

The best results with facial recognition generally happen when you set up your facial recognition cameras at choke points such as narrow passages (e.g. doorways) or concentrated standing areas (e.g. bus stops).

Below are some considerations when evaluating for choke points:

Example choke points:

Doorway, elevator exit, or gateway
  Optimal Location:
  • Exit side of door/elevator, 5-10m away, and 3-4m high.
  • If a wall or post is 3-4m away, then you can place the camera 2.5m high.
  Challenges:
  • Subjects turn left/right as they pass through the doorway or exit the elevator.
  • Subjects look left/right as they pass through the doorway or exit the elevator.
  • Strong backlight (not applicable for elevator).
  • Automatic doors cause sudden changes in lighting.
  Potential Mitigation Strategies:
  • If possible, avoid backlight conditions. If not possible, add more light to subjects' faces to counter the backlight.
  • Use a camera with good Wide Dynamic Range (WDR) performance.
Hallway
  Optimal Location:
  • At end of hall where subjects turn left or right, 2.5m high and 2-4m back.
  • Hung from ceiling, 3-4m high and 5-10m back.
  Challenges:
  • Tall ceilings.
  Potential Mitigation Strategies:
  • If mounting on a wall, target subjects more than 10m away and use the camera's optical zoom.
  • If poor lighting, use SAFR's Contrast Enhancement feature.
Stairway/escalator
  Optimal Location:
  • 2-4m from top of stairs/escalator, pointing down, parallel to stairs.
  • Subjects tend to look up when going up, so a higher camera position is OK.
  Challenges:
  • Poor lighting.
  Potential Mitigation Strategies:
  • If poor lighting, use SAFR's Contrast Enhancement feature.
Front queue
  Optimal Location:
  • Exit side of queue, 5-10m away and 3-4m high, in line with queue.
  • Queues where subjects stand and wait.
  Challenges:
  • Subjects turn left/right as they exit queue.
  • Moving stations (e.g. airports).
  Potential Mitigation Strategies:
  • Add an object of interest such as a TV monitor to draw eyes towards the camera.
Near artwork or other objects of interest
  Optimal Location:
  • Centered above an object of interest.
  Challenges:
  • Distance from object.
  • Wide field of view.
  Potential Mitigation Strategies:
  • Use a high resolution camera.
  • Target a narrow field of view.

Camera Facial Recognition Factors

Below are the key factors that affect the success of camera facial recognition.

Face Image Size

The number of pixels a camera allocates to a face is determined by three main variables, listed in order of impact:

Video Resolution

Video resolution describes the number of pixels in each video frame. Video resolution is measured as width x height (in that order). For convenience, some people only cite the height measurement when talking about video resolution. Thus, cameras with resolutions of 1920x1080 might be said to have 1080p resolution.

The higher the camera's video resolution, the more pixels are available for each face. The minimum video resolution that we recommend for successful facial recognition is about 2500p. Below you can see the effect of different video resolutions.

Note: "4K Ultra HD" has a resolution of 3840x2160, or 2160p.

As you can see, the license plate is much more legible in the higher resolution.

Angle of View

Angle of View (AoV) is another significant factor impacting face image size; it can change the face image size by orders of magnitude. A camera with a wide AoV will spread its limited number of pixels over a wide area, a problem which increases dramatically as subjects get further from the camera. Conversely, a narrow AoV concentrates the same pixels on a smaller area, so faces keep more pixels as the distance increases.

Wide-angle lenses tend to be bad for facial recognition. They introduce significant perspective distortion, and they require subjects to be closer to the camera for accurate results.

A camera's AoV is usually reported in its specification sheet. You can also get this information from tools such as the IPVM.com calculator.

Distance to Subject

This is the distance from the subject to the camera lens. Obviously, you want the camera to be as close to subjects as possible.
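The interaction of the three variables above can be sketched with simple geometry. The following is a rough estimate only, not SAFR's own calculation: it assumes an ideal (distortion-free) lens, a subject near the center of the frame, and an average face width of 0.16m. The function name and the face width are illustrative assumptions.

```python
import math

def face_width_pixels(horizontal_pixels, horizontal_aov_deg, distance_m,
                      face_width_m=0.16):
    """Estimate how many horizontal pixels a face occupies in the frame.

    face_width_m=0.16 is an assumed average face width, not a SAFR value.
    """
    # Width of the scene that the frame covers at the subject's distance.
    scene_width_m = 2.0 * distance_m * math.tan(math.radians(horizontal_aov_deg) / 2.0)
    # Pixels available per meter of scene at that distance.
    pixels_per_meter = horizontal_pixels / scene_width_m
    return face_width_m * pixels_per_meter

# Same 1920-pixel-wide video and same 10m distance, two angles of view.
wide = face_width_pixels(1920, 90.0, 10.0)
narrow = face_width_pixels(1920, 30.0, 10.0)
print(f"90 degree AoV: {wide:.1f} px, 30 degree AoV: {narrow:.1f} px")
```

Under these assumptions, narrowing the AoV from 90° to 30° gives the face more than three times as many pixels at the same distance, while doubling the distance halves the pixel count.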

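A camera's zoom functionality can mitigate distance from the subject. Be aware that there are fundamentally two different types of zoom: optical zoom, which uses the lens to magnify the scene and so puts more real pixels on the face, and digital zoom, which crops and upscales the image and so adds no facial detail.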

Sharpness

Focus

Focus is critical for successful facial recognition. If a camera model provides a focus control, you should set the camera's focus to where you expect to capture subjects' facial images, as best you can.

Calibrating the camera using manual focusing and a good focus chart will almost always produce better results than using the camera's auto-focus, even if camera manufacturers claim otherwise. Auto-focus may just focus on a door or something at the extreme back end of where you want to focus.

Depth of Field

When considering different cameras, a larger depth of field is desirable because it means that the camera will be able to maintain sharp focus over a wider range of distances.

Subjects are in focus only within a specific range of distances from the camera. Subjects farther away or closer than that range will be out of focus.

Auto-focus is typically not used with facial recognition because multiple people at different distances will sometimes need to be recognized, and auto-focus typically only focuses on individual objects rather than a group of objects. Auto-focus usually prioritizes focusing on closer objects, which will cause objects further back to lose focus. Furthermore, that closer object might not even be a person's face.
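To compare candidate lenses, the in-focus range can be estimated with the standard thin-lens depth of field formulas. This is a sketch under stated assumptions: the circle of confusion (0.005mm here) depends on the sensor's pixel size and is an illustrative guess, and the lens values in the example are hypothetical.

```python
import math

def depth_of_field_m(focal_length_mm, f_number, focus_distance_m, coc_mm=0.005):
    """Return (near_limit_m, far_limit_m) of acceptable focus.

    coc_mm (circle of confusion) is an assumed value; choose it to match
    the pixel pitch of the actual sensor.
    """
    f = focal_length_mm
    s = focus_distance_m * 1000.0  # work in millimeters
    hyperfocal = f * f / (f_number * coc_mm) + f
    near = s * (hyperfocal - f) / (hyperfocal + s - 2.0 * f)
    if s >= hyperfocal:
        far = math.inf  # everything beyond the near limit is acceptably sharp
    else:
        far = s * (hyperfocal - f) / (hyperfocal - s)
    return near / 1000.0, far / 1000.0

# Hypothetical 12mm lens focused at 7.5m, wide open versus stopped down.
near_wide, far_wide = depth_of_field_m(12.0, 1.6, 7.5)
near_stopped, far_stopped = depth_of_field_m(12.0, 4.0, 7.5)
print(f"f/1.6: {near_wide:.1f}m to {far_wide:.1f}m")
print(f"f/4.0: {near_stopped:.1f}m to {far_stopped}m")
```

A smaller aperture (larger f-number) widens the in-focus range, but it also admits less light, so it has to be weighed against the lighting conditions described below.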

Video Compression

Video compression is the process of encoding video so that it consumes less space and is easier to transmit over the network. Compression is often exposed as a setting for the number of bits per second (the bitrate) delivered by a video stream. To get the highest quality video, you will need to test each specific camera model that you intend to use. As an initial guide, a bitrate between 4K (4096) and 8K (8192) with VBR (variable bitrate, as opposed to CBR, constant bitrate) on the H.264 encoder is usually a good starting point.

Lighting Conditions

It is critical that subjects' faces are illuminated well enough that facial details are clearly visible to the human eye. The color of the light should be white; colored light can alter or "flatten" people's skin tones.

Backlight

If the environment behind people is brighter than the light illuminating their faces, they will appear dark and with reduced detail because the camera's sensor will be overwhelmed by the brightness behind them. In such situations, a bright white light located near the camera can illuminate people's faces. Such a light has the added benefit of causing most people to look directly at the camera as they seek the source of the bright light shining in their faces, which helps their Center Pose Quality. (See the Center Pose Quality section below for more information.)

Low Light

A good low light camera will produce a video image that maintains image detail both within dark areas as well as within bright areas. A bad low light camera produces banding and noisy/grainy video when in low light. These functional differences are often the result of which sensor type the camera is using. Good low light cameras often use CCD sensors, while bad low light cameras often use less expensive CMOS sensors instead.

Center Pose Quality

Center Pose Quality is a value from 0 to 1 that specifies how directly a face is looking at the camera. If a face is looking directly at the camera, this value is 1. The more the face turns away from the camera, the lower this value becomes.

Yaw Angle (horizontal)

Yaw is the horizontal angle between the direction a subject is looking and the camera's line of sight. The ideal angle for facial recognition is 0° (i.e. the subject is looking directly at the camera). Facial recognition works well for angles up to 30°. Between 30° and 60°, recognition still occurs, but only if motion is relatively low or the lighting is good. At angles above 60° and up to 90°, facial recognition is very challenging but still possible.

Depression Angle (vertical)

Depression angle is the vertical angle from the subject's face up (or down) to the camera. A value of 15° or less is best though up to 30° is acceptable. Values greater than 45° will present a challenge to the face recognition software.
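The depression angle for a planned mounting position can be estimated from the camera height and the horizontal distance to the subject. The sketch below is illustrative only: the function names and the assumed face height of 1.6m are not from SAFR, and the text above does not rate angles between 30° and 45°, so the code reports that band as unrated.

```python
import math

def depression_angle_deg(camera_height_m, horizontal_distance_m, face_height_m=1.6):
    """Vertical angle from the subject's face up to the camera, in degrees.

    face_height_m=1.6 is an assumed average face height above the floor.
    """
    rise_m = camera_height_m - face_height_m
    return math.degrees(math.atan2(rise_m, horizontal_distance_m))

def rate_depression_angle(angle_deg):
    """Map an angle to the bands described in the text above."""
    angle = abs(angle_deg)
    if angle <= 15.0:
        return "best"
    if angle <= 30.0:
        return "acceptable"
    if angle <= 45.0:
        return "not rated"  # the text gives no guidance between 30 and 45 degrees
    return "challenging"

# Camera 3.5m high: subject 7.5m away versus 2m away.
far_angle = depression_angle_deg(3.5, 7.5)
close_angle = depression_angle_deg(3.5, 2.0)
print(f"{far_angle:.1f} degrees: {rate_depression_angle(far_angle)}")
print(f"{close_angle:.1f} degrees: {rate_depression_angle(close_angle)}")
```

This shows why a camera mounted 3-4m high generally needs several meters of standoff: the same mounting height that works at 7.5m produces a much steeper angle when the subject is only 2m away.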

Data Rate

Frame Rate

Frame rate refers to the number of video frames delivered by the camera per second. In general, 15 frames per second is considered the minimum for real-time surveillance. When selecting which camera(s) to use for facial recognition, check to see if the frame rate changes significantly with resolution. If it does, that's an indication that after-capture software is scaling the video, which is bad for facial recognition.

Video Bitrate

The video bitrate should be selected to ensure the highest quality possible within the network limitations. The table below provides the recommended video bitrate for common resolutions.

Resolution  Bitrate (Kbps)  30 fps  20 fps  15 fps  10 fps
3000p       Max             27000   20500   16400   12300
            Avg             11000    8200    6600    5200
2160p       Max             20000   14300   11300    9200
            Avg              8000    6100    5100    4200
2048p       Max             14000    9200    7700    6100
            Avg              6000    4200    3700    2900
1920p       Max             11000    8200    6700    5100
            Avg              5000    3700    3200    2600
1440p       Max              8000    5100    4400    3600
            Avg              4000    2600    2200    2200
1080p       Max              5000    3100    2200    1500
            Avg              2500    1900    1600    1200
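When planning network capacity for several cameras, the table can be used as a lookup. The following sketch copies the values from the table above; the function names are illustrative, and it assumes 1 Mbps = 1000 Kbps and that streams simply add up with no protocol overhead.

```python
FPS_COLUMNS = (30, 20, 15, 10)

# Values in Kbps, copied from the table above (columns follow FPS_COLUMNS).
RECOMMENDED_KBPS = {
    "3000p": {"max": (27000, 20500, 16400, 12300), "avg": (11000, 8200, 6600, 5200)},
    "2160p": {"max": (20000, 14300, 11300, 9200), "avg": (8000, 6100, 5100, 4200)},
    "2048p": {"max": (14000, 9200, 7700, 6100), "avg": (6000, 4200, 3700, 2900)},
    "1920p": {"max": (11000, 8200, 6700, 5100), "avg": (5000, 3700, 3200, 2600)},
    "1440p": {"max": (8000, 5100, 4400, 3600), "avg": (4000, 2600, 2200, 2200)},
    "1080p": {"max": (5000, 3100, 2200, 1500), "avg": (2500, 1900, 1600, 1200)},
}

def recommended_bitrate_kbps(resolution, fps, kind="max"):
    """Look up the recommended bitrate; only table frame rates are supported."""
    column = FPS_COLUMNS.index(fps)  # raises ValueError for other frame rates
    return RECOMMENDED_KBPS[resolution][kind][column]

def total_bandwidth_mbps(cameras, kind="max"):
    """Sum the recommended bitrates for a list of (resolution, fps) cameras."""
    total_kbps = sum(recommended_bitrate_kbps(res, fps, kind) for res, fps in cameras)
    return total_kbps / 1000.0  # assumes 1 Mbps = 1000 Kbps

# Hypothetical site: two 2160p cameras at 15 fps and one 1080p camera at 30 fps.
site = [("2160p", 15), ("2160p", 15), ("1080p", 30)]
print(f"Peak: {total_bandwidth_mbps(site, 'max'):.1f} Mbps")
print(f"Average: {total_bandwidth_mbps(site, 'avg'):.1f} Mbps")
```

Sizing the network for the Max figures rather than the Avg figures leaves headroom for busy scenes, where a variable bitrate stream rises toward its ceiling.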

See Also