Recognize and Track Persons

You use an instance of the ObjectTracker class to detect, recognize, and track persons in a video stream. The video stream may originate from a camera or a video file. The object tracker can process the video stream either in real time or as fast as possible. The former mode is typically used for a video stream that originates from a camera, while the latter mode can be used to process the content of a video file faster than its normal playback speed.

The following sections explain how to create and use an object tracker.

Use the C API to Run the Object Tracker

Initialize the ArgusKit Framework

You must initialize the ArgusKit framework before you call any of its APIs. You do this by calling the ARKInit() function at the beginning of your application.

Note: You may call this function only once per application session.

// Initialize ArgusKit
ARKInit()
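
For example, in an iOS application built with UIKit, the call can live in the application delegate's launch method. The following is a minimal sketch; the ArgusKit module name in the import statement is an assumption and may differ in your project.

// A minimal sketch: initialize ArgusKit exactly once at application launch.
// The module name "ArgusKit" is assumed; use the name of the framework you link against.
import UIKit
import ArgusKit

@UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {

    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        ARKInit()
        return true
    }
}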

Create the Object Tracker

Create an instance of an object tracker configuration object that stores all the configuration information an object tracker needs to do its work. Most of the configuration information has sensible default values, but some information, such as the cloud environment and the user credentials, must be provided explicitly.

The following code snippet shows how to set up the object tracker configuration and how to initiate the object tracker.

// Using the C API directly to configure the object tracker
 
let configRef = ARKObjectTrackerConfigurationCreate()!
let envRef = ARKEnvironmentCopyNamed("com.real.PROD")
ARKObjectTrackerConfigurationSetObject(configRef, kARKObjectTrackerConfigurationKey_Environment, envRef)
 
let userRef = ARKUserCreate("<Username>", "<Password>")
ARKUserSetDirectory(userRef, "main")
ARKEventReporterConfigurationSetObject(configRef, ARKEventReporterConfigurationKey(kARKEventReporterConfigurationKey_User.rawValue), userRef)
ARKObjectRelease(userRef)
ARKObjectRelease(envRef)
             
// Configure all the desired properties here
ARKObjectTrackerConfigurationSetString(configRef, kARKObjectTrackerConfigurationKey_SiteID, "Building 1")
ARKObjectTrackerConfigurationSetString(configRef, kARKObjectTrackerConfigurationKey_SourceID, "Camera 1")
             
 
// Create the object tracker
var callbacks1 : ARKObjectTrackerCallbacks = ARKObjectTrackerCallbacks()
callbacks1.context = Unmanaged.passUnretained(self).toOpaque()
callbacks1.willBeginTracking = object_tracker_will_begin_tracking_callback
callbacks1.didEndTracking = object_tracker_did_end_tracking_callback
callbacks1.didCreateTrackingResult = object_tracker_did_create_tracking_result_callback
 
var trackerRef = ARKObjectTrackerCreate(configRef, true, &callbacks1)
ARKObjectRelease(configRef)

This example code selects the desired cloud environment and creates a new user object with the required user identifier and password. It also sets the cloud directory where the cloud-based face recognizer should save recognition-related information.

It then creates an object tracker configuration object and sets the cloud environment, cloud user, and some additional information to help identify the camera. It then sets up the necessary callbacks the object tracker should invoke as it processes the video stream. Finally, the example code creates the actual object tracker object.

Note: The example code assumes that the input video stream originated from a camera. This is why true is passed to the real-time parameter of the ARKObjectTrackerCreate() function.
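
If the video frames come from a video file instead, pass false so the object tracker processes frames as fast as possible rather than pacing itself to the video clock. The following is a minimal sketch; it assumes a configuration object and callbacks set up as in the previous snippet, before the configuration is released.

// A minimal sketch: create an object tracker for processing a video file.
// configRef and callbacks1 are assumed to be set up as shown above.
let fileTrackerRef = ARKObjectTrackerCreate(configRef, false, &callbacks1)
ARKObjectRelease(configRef)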

Start a Tracking Session

Next, start a new tracking session. All tracking-related activities are done in the context of a tracking session. The object tracker uses a tracking session to maintain the necessary state. The following code snippet shows how to start a new tracking session.

ARKObjectTrackerBeginTracking(trackerRef)

Always end the current tracking session and start a new session if the video source has changed or if the resolution or frame rate of the video stream has changed. For example, end the current tracking session and start a new one if the user switches cameras or selects a different capture profile.
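
For example, a camera-switch handler could restart the session as in the following sketch; switchToCamera() is a placeholder for the application's own capture reconfiguration code.

// A minimal sketch: restart the tracking session when the video source changes.
ARKObjectTrackerEndTracking(trackerRef)      // end the session for the old source
switchToCamera(newCamera)                    // placeholder for the app's own capture code
ARKObjectTrackerBeginTracking(trackerRef)    // start a fresh session for the new source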

The object tracker invokes the begin-tracking callback at the start of a new tracking session. Your application can use this callback as a signal that a new tracking session has started. The following code snippet shows an example of such a callback.

private func object_tracker_will_begin_tracking_callback(_ trackerRef: ARKObjectTrackerRef, _ context: UnsafeMutableRawPointer?)
{
    print("willBeginTracking\n");
}

Run a Tracking Session

Create a new video frame object for every decoded video frame and pass it to the object tracker. The object tracker in turn runs a face detector on the video frame and triggers face recognitions as needed. It then updates its internal list of tracked objects with the results of the detectors and recognizers.

The object tracker invokes the application-provided callback with the current state of the tracked objects list. The application can inspect this list and trigger actions based on it. Note, however, that the application should create a copy of the tracked objects list if it wants to retain the data (for example, if it wants to process the tracked objects on a different thread).

The following code snippet shows how to create a video frame object and how to pass it to the object tracker.

let frameRef = ARKVideoFrameCreateWithPixelBuffer(imageBuffer, timestamp, false)!
 
// Pass this video frame to the object tracker
ARKObjectTrackerTrackObjects(trackerRef, frameRef)
ARKObjectRelease(frameRef)

You must provide a timestamp when you create a video frame object. This is typically the presentation timestamp of the video frame. The isSceneChange parameter of the ARKVideoFrameCreateWithPixelBuffer() function should be set to true if the video frame is the first video frame after a scene change. Scene changes in movies are often indicated by a cut transition from one scene to another. The object tracker uses this information to enhance its ability to disambiguate between persons in different scenes.
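
For example, when processing a video file you might run your own cut detection and flag the first frame after a cut. The following is a minimal sketch; isFirstFrameAfterCut is a placeholder for the application's own scene-change detection.

// A minimal sketch: mark a frame as the first one after a scene change.
let frameRef = ARKVideoFrameCreateWithPixelBuffer(imageBuffer, timestamp, isFirstFrameAfterCut)!
ARKObjectTrackerTrackObjects(trackerRef, frameRef)
ARKObjectRelease(frameRef)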

Note: Although a video stream from a camera includes key frames, these key frames do not indicate a scene change for tracking purposes, so the key frames should not be treated as a scene change.

The following code snippet shows an example implementation of the did-create-tracking-result callback, which simply prints a description of the new tracking result.

private func object_tracker_did_create_tracking_result_callback(_ trackerRef: ARKObjectTrackerRef, _ resultRef: ARKTrackingResultRef, _ context: UnsafeMutableRawPointer?)
{
    ARKObjectPrintDebugDescription(resultRef)
}

The object tracker invokes the end-tracking callback at the end of the tracking session. Your application code can use this callback as a signal that a tracking session has ended. The following code snippet shows an example implementation of such a callback:

private func object_tracker_did_end_tracking_callback(_ trackerRef: ARKObjectTrackerRef, _ context: UnsafeMutableRawPointer?)
{
    // Clean up any per-session state that the application maintains here
}

End a Tracking Session

You inform the object tracker about the end of a tracking session by invoking the ARKObjectTrackerEndTracking() function. This allows the object tracker to clean up its internal state and to execute any pending callbacks as soon as possible. The following code snippet shows how to end a tracking session.

ARKObjectTrackerEndTracking(trackerRef)

Use the Swift ArgusKit Classes to Run the Object Tracker

The steps below describe the integration points with the application view controller and the ArgusKit Swift classes. For additional information, look in the CameraViewController class for comments that begin with: // ArgusKit: Integration Point.

  1. Create a variable that holds an instance of the ArgusKitController.

    // ArgusKit: Integration Point
    // Instance variable for the ArgusKitController
    private var argusKitController: ArgusKitController = ArgusKitController()
  2. In the configureArgusKitController() method, locate the following code.

    // ArgusKit: Integration Point
    // Decide whether the application needs object tracking in video or image analyzing in photos or both.
    // Begin: Video Object Tracking - This code is only needed if the application wants to track objects in video
    //
    // Load the object tracker configuration from the preferences.  If there are no username/password credentials then ArgusKit will run in offline mode which means it will only detect faces and not recognize them.
    let objectTrackerConfiguration = ObjectTrackerConfiguration.objectTrackerConfigurationFromAppPreferences()
    
    // Create the object tracker with the specified configuration
    argusKitController.createObjectTracker(withConfiguration: objectTrackerConfiguration)
    
    // Set the tracking result handler.  This will be called every time there is an update in the tracking status.
    argusKitController.trackingResultHandler = {  [unowned self] (trackingResult: TrackingResult) in
        self.handleTrackingResult(trackingResult)
    }

    This code reads the object tracker configuration from the sample app's persistent preferences and combines it with a few other hard-coded values to create the object tracker configuration. It then creates an object tracker with that configuration and sets up the handler that is called whenever the tracking state changes. If no credentials are provided in the configuration, the object tracker runs in offline mode and only detects faces locally; the cloud server is not contacted for recognition in this case.

  3. After capturing a new frame of video, pass it to the object tracker. The following code illustrates how this is done in the demo application.

    @objc public func captureOutput(_ captureOutput: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    
        // ArgusKit: Integration Point
        // Pass the sample buffer to the object tracker
        let videoFrame = VideoFrame.videoFrame(fromSampleBuffer: sampleBuffer)
        argusKitController.trackObjects(videoFrame: videoFrame)
    
        // Save off these values here so the CMSampleBuffer isn't referenced while we update the properties later on the main thread.
        let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)!
        let videoFrameImage = CIImage(cvPixelBuffer: imageBuffer)
        let videoResolutionRect = CVImageBufferGetCleanRect(imageBuffer)
    
        // Update the video frame image on the main thread
        DispatchQueue.main.async {
            self.trackedObjectsVideoOverlayView.videoResolution = videoResolutionRect.size
            self.videoView.videoFrameImage = videoFrameImage
        }
    }
  4. Implement a completion handler to receive the tracking results.

    // ArgusKit: Integration Point
    // Receive tracking events from the object tracker
    private func handleTrackingResult(_ trackingResult: TrackingResult) {
    
        // This is called every time a change occurs in the current tracking result.
        DispatchQueue.main.async {
            self.trackedObjectsVideoOverlayView.trackedFaces = self.argusKitController.trackedFaces
        }
    }
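
    If you need a starting point for your own handling logic, the following sketch extends the handler to also log how many faces are currently tracked. It assumes that trackedFaces is an array; the actual type is defined by the ArgusKit Swift classes.

    // A minimal sketch: a variant of the handler that also logs the number of tracked faces.
    private func handleTrackingResult(_ trackingResult: TrackingResult) {
    
        // trackedFaces is assumed to be an array of tracked face objects
        print("Currently tracking \(argusKitController.trackedFaces.count) face(s)")
    
        DispatchQueue.main.async {
            self.trackedObjectsVideoOverlayView.trackedFaces = self.argusKitController.trackedFaces
        }
    }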

See Also