Camera Best Practices

Where to Set Up Your Cameras
Camera Facial Recognition Factors
See Also

Although facial recognition can sometimes work with existing cameras, you should be methodical about re-using them. First, survey the locations you want to monitor to determine the goals at each location and the requirements for achieving them. Only then should you take stock of existing cameras to see whether they meet those requirements. If they do not, install new cameras that do.

Where to Set Up Your Cameras

The best results with facial recognition generally happen when you set up your facial recognition cameras at choke points such as narrow passages (e.g. doorways) or concentrated standing areas (e.g. bus stops).

Below are some considerations when evaluating for choke points:

Look for places where people are traveling slower or are stationary.
Find places where people are facing a consistent direction.
The narrower the choke point, the more pixels that can be devoted to each face.
- A door that's 6m wide yields half as many pixels across each face as a door that's 3m wide.
Lighting is critical. The lighting conditions section below goes into more detail about desired lighting.
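
The effect of choke point width on face size can be estimated with simple arithmetic. The sketch below assumes that the camera frame is exactly as wide as the choke point and that an average face is about 0.16m wide. The function name and these values are illustrative assumptions, not SAFR parameters.

```python
def face_pixels_across(frame_width_px: int, scene_width_m: float,
                       face_width_m: float = 0.16) -> float:
    """Approximate horizontal pixels covering one face.

    Assumes the frame spans exactly scene_width_m at the subject's position
    and that a face is about face_width_m wide (an assumed average).
    """
    if scene_width_m <= 0:
        raise ValueError("scene_width_m must be positive")
    pixels_per_meter = frame_width_px / scene_width_m
    return pixels_per_meter * face_width_m


# A 1920-pixel-wide frame covering a 3 m door versus a 6 m door.
narrow = face_pixels_across(1920, 3.0)
wide = face_pixels_across(1920, 6.0)
print(f"3 m door: {narrow:.1f} px per face")
print(f"6 m door: {wide:.1f} px per face")
```

Under these assumptions, the 6m door leaves half as many pixels across each face as the 3m door.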

Example choke points:

Scenario	Optimal Location	Challenges	Potential Mitigation Strategies
Doorway, elevator exit, or gateway	Exit side of door/elevator, 5-10m away, and 3-4m high. If a wall or post is 3-4m away, then you can place the camera 2.5m high.	Subjects turn left/right as they pass through the doorway or exit the elevator. Subjects look left/right as they pass through the doorway or exit the elevator. Strong backlight (not applicable for elevator). Automatic doors cause sudden changes in lighting.	If possible, avoid backlight conditions. If not possible, add more light to subjects' faces to counter the backlight. Use a camera with good Wide Dynamic Range (WDR) performance.
Hallway	At end of hall where subjects turn left or right: 2.5m high and 2-4m back. Hung from ceiling: 3-4m high and 5-10m back.	Tall ceilings.	If the ceiling is too tall, consider mounting the camera on a wall and using the camera's optical zoom to target subjects more than 10m away. If poor lighting, use SAFR's Contrast Enhancement feature.
Stairway/escalator	2-4m from top of stair/escalator pointing down, parallel to stairs. Subjects tend to look up when going up so a higher camera position is OK.	Poor lighting.	If poor lighting, use SAFR's Contrast Enhancement feature.
Front queue	Exit side of queue 5-10m away and 3-4m high in line with queue. Queues where subjects stand and wait.	Subjects turn left/right as they exit queue. Moving stations (e.g. airports).	Add object of interest such as a TV monitor to draw eyes towards camera.
Near artwork or other objects of interest	Centered above an object of interest.	Distance from object. Wide field of view.	Use a high resolution camera. Target a narrow field of view.

Camera Facial Recognition Factors

Below are the key factors that affect the success of camera facial recognition.

Face Image Size – The number of pixels that are present in a facial image.
- Video Resolution – The width and height of a video, measured in pixels.
- Angle of View – Determined by the angle of the camera lens.
- Distance to Subject – The distance from the camera to the subject of interest.
Sharpness – The degree to which edges remain crisp and pixels are not blurred together.
- Focus – The degree to which camera image is sharp.
- Depth of Field – The distance between the nearest and the furthest objects that can be in focus for a camera at the same time.
- Video Compression – The process of encoding video files such that they consume less space and are easier to transmit over the network. Video compression can have the effect of blurring video images, however.
Lighting Conditions – Adequate lighting conditions are critical for successful facial recognition. There are two aspects of lighting that are particularly important:
- Backlight – Bright lighting behind the subject of interest.
- Low Light – Nighttime or dim indoor environments.
Center Pose Quality – A value from 0 to 1 that specifies how directly a face is looking at the camera.
- Yaw Angle (horizontal) – The horizontal angle between the subject's gaze and the direct line to the camera.
- Depression Angle (vertical) – The vertical angle between the subject and the camera.
Data Rate – Data processing factors that affect performance and quality.
- Frame Rate – Number of video frames delivered by the camera per second.
- Video Bitrate – The amount of data allocated to the digitized video, measured in bits per second (bps).

Face Image Size

The number of pixels a camera allocates to a face is determined by three main variables, listed in order of impact:

Video resolution
Angle of view
Distance to subject

Video Resolution

Video resolution describes the number of pixels in each video frame. Video resolution is measured as width x height (in that order). For convenience, some people only cite the height measurement when talking about video resolution. Thus, cameras with resolutions of 1920x1080 might be said to have 1080p resolution.

Obviously, the higher the camera's video resolution, the better. The minimum video resolution that we recommend for successful facial recognition is about 2500p. Below you can see the effect of different video resolutions.

Note: "4K Ultra HD" has a resolution of 3840x2160, i.e. 2160p.

As you can see, the license plate is much more legible in the higher resolution.

Angle of View

Angle of View (AoV) is another significant factor impacting face image size; it can change the face image size by orders of magnitude. A camera with a wide AoV spreads its limited number of pixels over a wide area, a problem that grows as subjects get further from the camera. Conversely, a narrow AoV concentrates those pixels on a smaller area, so more pixels remain available for each face even as the distance increases.

Wide-angled lenses tend to be bad for facial recognition. They introduce significant perspective distortion, as well as requiring closer distances for accurate results.

Cameras' AoVs are usually reported in their specification sheets. You can also get this information from tools such as the IPVM.com calculator.

Distance to Subject

This is the distance from the subject to the camera lens. Obviously, you want the camera to be as close to subjects as possible.
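
To see how video resolution, angle of view, and distance combine, the following sketch estimates the number of horizontal pixels that land on a face. It uses a simple pinhole model and ignores lens distortion. The function name and the 0.16m average face width are assumptions made for illustration only.

```python
import math


def face_width_px(h_res_px: int, aov_deg: float, distance_m: float,
                  face_width_m: float = 0.16) -> float:
    """Estimate how many horizontal pixels fall on one face.

    Uses a simple pinhole model: the scene width at distance_m is
    2 * distance * tan(AoV / 2). Lens distortion is ignored, and
    face_width_m is an assumed average, not a SAFR value.
    """
    if not 0 < aov_deg < 180:
        raise ValueError("aov_deg must be between 0 and 180 (exclusive)")
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    scene_width_m = 2 * distance_m * math.tan(math.radians(aov_deg) / 2)
    return h_res_px * face_width_m / scene_width_m


# Compare a wide and a narrow lens, and two resolutions, at 5 m and 10 m.
for h_res in (1920, 3840):
    for aov in (90, 30):
        for dist in (5, 10):
            px = face_width_px(h_res, aov, dist)
            print(f"{h_res} px wide, AoV {aov} deg, {dist} m: {px:.0f} px per face")
```

In this model, doubling the distance halves the pixels across a face, while narrowing the angle of view or raising the horizontal resolution increases them.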

Cameras' zoom functionality can mitigate distance from the subject. Be aware that there are fundamentally two different types of zoom:

Optical zoom – Zooms achieved by using the camera's lens. The lens is used to bend the light onto the full region of the sensor, usually resulting in negligible image loss. Optical zooms are very helpful when performing facial recognition.
Digital zoom – Zooms achieved by scaling the video in software. Digital zoom takes a smaller region of the already digitized image and stretches its existing pixels to fill the frame. This adds no new detail and often degrades the image significantly, so it should be avoided.

Sharpness

Focus

Focus is critical for successful facial recognition. If a camera model provides a focus control, you should set the camera's focus to where you expect to capture subjects' facial images, as best you can.

Calibrating the camera using manual focusing and a good focus chart will almost always produce better results than using the camera's auto-focus, even if camera manufacturers claim otherwise. Auto-focus may lock onto a door or some other object at the far end of the area you want in focus.

Depth of Field

When comparing cameras, a larger depth of field is desirable because the camera can keep subjects sharp over a wider range of distances, from near to far.

Subjects are only in focus within a specific range of distances from the camera. Subjects that are farther away or closer than that range will be out of focus.

Auto-focus is typically not used with facial recognition because multiple people at different distances will sometimes need to be recognized, and auto-focus typically only focuses on individual objects rather than a group of objects. Auto-focus usually prioritizes focusing on closer objects, which will cause objects further back to lose focus. Furthermore, that closer object might not even be a person's face.
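
As a rough illustration of depth of field, the sketch below applies the standard thin-lens approximation to estimate the nearest and farthest distances that remain acceptably sharp for a manually focused lens. The lens values and the circle of confusion (which depends on the sensor) are hypothetical; check your camera's specifications before relying on the numbers.

```python
import math


def depth_of_field_mm(focal_length_mm: float, f_number: float,
                      focus_distance_mm: float, coc_mm: float = 0.005):
    """Return (near_limit_mm, far_limit_mm) of acceptable sharpness.

    Standard thin-lens approximation. coc_mm (circle of confusion) depends
    on the sensor; the default here is only an illustrative assumption.
    """
    f = focal_length_mm
    s = focus_distance_mm
    if s <= f:
        raise ValueError("focus distance must exceed the focal length")
    hyperfocal = f * f / (f_number * coc_mm) + f
    near = s * (hyperfocal - f) / (hyperfocal + s - 2 * f)
    # At or beyond the hyperfocal distance, everything out to infinity is sharp.
    far = math.inf if s >= hyperfocal else s * (hyperfocal - f) / (hyperfocal - s)
    return near, far


# Hypothetical 8 mm lens at f/2.0, focused 5 m away.
near, far = depth_of_field_mm(8, 2.0, 5000)
print(f"In focus from {near / 1000:.2f} m to {far / 1000:.2f} m")
```

A higher f-number (smaller aperture) or a shorter focal length widens this range, at the cost of less light or a wider angle of view respectively.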

Video Compression

Video compression is the process of encoding video files such that they consume less space and are easier to transmit over the network. Compression is often exposed as a setting for the number of bits per second (aka the bitrate) delivered by a video stream. To get the highest quality video, you will need to test each specific camera model that you intend to use. As an initial guide, a bitrate between 4K (4096 Kbps) and 8K (8192 Kbps) with VBR (variable bitrate, as opposed to CBR, constant bitrate) on the h.264 encoder is usually a good starting point.
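
To get a feel for what these bitrates mean for storage, the sketch below converts a sustained bitrate into gigabytes per day. It assumes the values are in Kbps with 1 Kbps = 1000 bits per second, and that the stream runs at that rate all day. With VBR the actual rate varies, so treat the result as a rough estimate. The function name is hypothetical.

```python
def storage_gb_per_day(bitrate_kbps: float) -> float:
    """Convert a sustained stream bitrate in Kbps to gigabytes per 24 hours.

    Assumes 1 Kbps = 1000 bits per second and 1 GB = 10**9 bytes.
    """
    bits_per_day = bitrate_kbps * 1000 * 24 * 60 * 60
    return bits_per_day / 8 / 1e9


for kbps in (4096, 8192):
    print(f"{kbps} Kbps is about {storage_gb_per_day(kbps):.1f} GB per camera per day")
```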

Lighting Conditions

It is critical that subjects' faces are illuminated well enough that facial details are clearly visible to the human eye. The color of the light should be white; colored light can alter or "flatten" people's skin tones.

Backlight

If the environment behind people is brighter than the light illuminating people's faces, the people will appear dark and with reduced details because the camera's sensor will be overwhelmed by the brightness behind the people. In such situations, a bright white light located near the camera can illuminate people's faces. Such a light has the added benefit of causing most people to look directly at the camera as they seek the source of the bright light shining in their faces, which helps their Center Pose Quality. (See the Center Pose Quality section below for more information.)

Low Light

A good low light camera will produce a video image that maintains image detail within both dark and bright areas. A bad low light camera produces banding and noisy/grainy video in low light. These functional differences are often the result of which sensor type the camera is using. Good low light cameras often use CCD sensors, while bad low light cameras often use less expensive CMOS sensors instead.

Center Pose Quality

A value from 0 to 1 that specifies how directly a face is looking at the camera. If a face is looking directly at the camera, this value is 1. The more that the face turns away from the camera, the lower this value becomes.

Yaw Angle (horizontal)

Yaw is the horizontal angle between the direction a subject is looking and the camera's line of sight. The ideal angle for facial recognition is 0° (i.e. the subject is looking directly at the camera). Facial recognition works well for angles up to 30°. Between 30° and 60°, recognition still occurs, but only if motion is relatively low or the lighting is good. At angles above 60° and up to 90°, facial recognition is very challenging but still possible.
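
The yaw bands above can be expressed as a small lookup, which may be handy when reviewing test footage. This is only a sketch of the guidance in this section: the function name and return strings are made up for illustration and are not part of SAFR.

```python
def yaw_guidance(yaw_deg: float) -> str:
    """Map a yaw angle in degrees to the guidance bands described above."""
    angle = abs(yaw_deg)  # left and right turns are treated the same
    if angle <= 30:
        return "works well"
    if angle <= 60:
        return "needs low motion or good lighting"
    if angle <= 90:
        return "very challenging"
    return "outside the documented range"


for yaw in (0, -25, 45, 75, 120):
    print(f"yaw {yaw:>4} deg: {yaw_guidance(yaw)}")
```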

Depression Angle (vertical)

Depression angle is the vertical angle from the subject's face up (or down) to the camera. A value of 15° or less is best though up to 30° is acceptable. Values greater than 45° will present a challenge to the face recognition software.
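
The depression angle for a planned mounting position can be estimated with basic trigonometry. The sketch below assumes the camera is mounted above face level, that the distance is measured horizontally along the floor, and that faces are about 1.6m above the floor. The function names and the face height are illustrative assumptions.

```python
import math


def depression_angle_deg(camera_height_m: float, horizontal_distance_m: float,
                         face_height_m: float = 1.6) -> float:
    """Vertical angle from a subject's face up to the camera, in degrees."""
    rise = camera_height_m - face_height_m
    return math.degrees(math.atan2(rise, horizontal_distance_m))


def min_distance_m(camera_height_m: float, max_angle_deg: float,
                   face_height_m: float = 1.6) -> float:
    """Closest horizontal distance that keeps the angle at or below max_angle_deg."""
    rise = camera_height_m - face_height_m
    return rise / math.tan(math.radians(max_angle_deg))


for height, dist in ((2.5, 3.0), (3.0, 5.0), (4.0, 5.0), (4.0, 10.0)):
    angle = depression_angle_deg(height, dist)
    print(f"camera at {height} m, subject {dist} m away: {angle:.1f} deg")

print(f"3 m camera stays within 15 deg beyond {min_distance_m(3.0, 15):.1f} m")
```

A higher mounting position needs a longer distance to the subject to keep the angle in the preferred range.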

Data Rate

Frame Rate

Frame rate refers to the number of video frames delivered by the camera per second. In general, 15 frames per second is considered the minimum for real-time surveillance. When selecting which camera(s) to use for facial recognition, check to see if the frame rate changes significantly with resolution. If it does, that's an indication that after-capture software is scaling the video, which is bad for facial recognition.

Video Bitrate

The video bitrate should be selected to ensure highest quality possible within the network limitations. The table below provides the recommended video bitrate for common resolutions.

Resolution	Bitrate (Kbps)	30 fps	20 fps	15 fps	10 fps
3000p	Max	27000	20500	16400	12300
	Avg	11000	8200	6600	5200
2160p	Max	20000	14300	11300	9200
	Avg	8000	6100	5100	4200
2048p	Max	14000	9200	7700	6100
	Avg	6000	4200	3700	2900
1920p	Max	11000	8200	6700	5100
	Avg	5000	3700	3200	2600
1440p	Max	8000	5100	4400	3600
	Avg	4000	2600	2200	2200
1080p	Max	5000	3100	2200	1500
	Avg	2500	1900	1600	1200

When using constant bitrate (CBR), the Max value shown above is recommended.
Select the best compression technology available for the camera (h.264, h.264+, or h.265). Some cameras offer custom technologies that reduce bandwidth usage even further. For example, ZipStream by Axis supports dynamic frame rate, dynamic GOP, and region of motion encoding which greatly reduce bandwidth usage while still maintaining compatibility with all standard decoders.
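
When planning network capacity for several cameras, the table values can be summed per camera. The sketch below transcribes two rows of the table above into a lookup and totals the Avg and Max bitrates for a hypothetical mix of cameras. The data structure and function name are assumptions made for illustration.

```python
# (Max, Avg) bitrates in Kbps, keyed by resolution and frame rate.
# Values are transcribed from the 2160p and 1080p rows of the table above.
RECOMMENDED_KBPS = {
    "2160p": {30: (20000, 8000), 20: (14300, 6100), 15: (11300, 5100), 10: (9200, 4200)},
    "1080p": {30: (5000, 2500), 20: (3100, 1900), 15: (2200, 1600), 10: (1500, 1200)},
}


def network_load_kbps(cameras):
    """Sum the table's Avg and Max bitrates over (resolution, fps) pairs.

    Returns (total_avg_kbps, total_max_kbps). For CBR streams, the text
    above recommends the Max value, so plan around the second number.
    """
    total_avg = 0
    total_max = 0
    for resolution, fps in cameras:
        max_kbps, avg_kbps = RECOMMENDED_KBPS[resolution][fps]
        total_avg += avg_kbps
        total_max += max_kbps
    return total_avg, total_max


# Hypothetical site: four 1080p cameras and two 2160p cameras, all at 15 fps.
site = [("1080p", 15)] * 4 + [("2160p", 15)] * 2
avg_kbps, max_kbps = network_load_kbps(site)
print(f"Average load: {avg_kbps / 1000:.1f} Mbps, peak load: {max_kbps / 1000:.1f} Mbps")
```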