Session Analysis API
General Session Analysis
After a session is run, different analyses can be run, depending on how the session is configured. The following route allows for querying the analysis status.
Wait for Session Analysis to Complete
The following route can be used to wait for the session analysis to complete.
To wait up to 10 seconds for the analysis to complete, run:
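The sketch below illustrates such a request using Python's <code class="dcode">requests</code> library. The route path, API host, authorization header, and the assumption that <code class="dcode">timeout</code> is passed as a query parameter are placeholders; substitute the actual route and your API token.

```python
import requests

API_TOKEN = "<your_api_token>"   # assumption: token-based Authorization header
SESSION_ID = "<session_id>"
# Placeholder path -- substitute the session analysis wait route shown above.
URL = f"https://<api_host>/v0/sessions/analysis/<wait_route>/{SESSION_ID}"

# Wait up to 10 seconds for the session analysis to complete.
resp = requests.get(
    URL,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    params={"timeout": 10},   # assumption: timeout is a query parameter
)
print(resp.json())  # the "status" field is "timeout" if the wait expired
```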
The <code class="dcode">timeout</code> parameter is interpreted as follows:
- <code class="dcode">timeout < 0</code>: Wait indefinitely for the analysis to complete.
- <code class="dcode">timeout > 0</code>: Wait for up to <code classs="dcode">timeout</code> seconds for the analysis to complete.
- <code class="dcode">timeout = 0</code>: Check the current state of the analysis.
If <code class="dcode">timeout</code> is omitted entirely, it is treated as negative, and the request will block indefinitely until the analysis is complete.
Example Response
If the request times out, the <code class="dcode">status</code> field will be set to <code class="dcode">timeout</code>. For example:
NOTE: The <code class="dcode">status</code> field may also be <code class="dcode">error</code> if there is a problem with the request.
Wait for Specified Session Analyses to Complete
The following route can be used to wait for a particular set of analyses to complete.
The table below shows the analyses that are currently supported for tracking:
Example 1: Check the current state of the <code class="dcode">"video-quality-mos"</code> analysis
Example Response: The <code class="dcode">"video-quality-mos"</code> analysis completed
Example Response: The <code class="dcode">"video-quality-mos"</code> analysis is still ongoing
Example 2: Wait until both <code class="dcode">"audio-activity"</code> and <code class="dcode">"audio-loudness-issue"</code> analyses complete
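A minimal sketch of this request in Python; the route path, host, auth header, and the body format for naming the analyses are assumptions -- substitute the actual route and fields from the route description above.

```python
import requests

API_TOKEN = "<your_api_token>"
SESSION_ID = "<session_id>"
# Placeholder path -- substitute the analysis-specific wait route shown above.
URL = f"https://<api_host>/v0/sessions/analysis/<wait_route>/{SESSION_ID}"

# Assumption: the tracked analyses and the timeout are passed in a JSON body.
payload = {
    "analyses": ["audio-activity", "audio-loudness-issue"],
    "timeout": -1,  # negative timeout: wait indefinitely until both complete
}
resp = requests.post(URL, headers={"Authorization": f"Bearer {API_TOKEN}"},
                     json=payload)
print(resp.json())
```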
Example Response When Both Analyses Finish
Visual Page Load Analysis
Please see the "Page Load" section in our documentation on measuring visual load time in HeadSpin Sessions.
Loading Animation Analysis
Please see the "Loading Animation" section in our documentation on measuring visual load time in HeadSpin Sessions.
Video Quality Analyses
HeadSpin's suite of reference-free video quality analyses (see User-Perceived Video Quality for more details) measure the quality of video content in a session screen recording and plot the time series of the metrics in the Waterfall UI. The video quality analyses are currently applied automatically to time intervals in a session where the device is in the landscape orientation. To apply these analyses to custom time intervals in a session, you can use the endpoints documented in this section. Whereas it is appropriate to run these video quality analyses on time intervals in a session that contain video content (e.g., a video ad playing on the screen), it would not be useful to run them when there is no video content (e.g., a text-heavy news article is on the screen).
Video Quality Analysis Keys
The following table shows the available analysis keys along with what each analysis does:
For each <code class="dcode">"video-content"</code> label, you can provide the analysis keys in the <code class="dcode">"data"</code> field to specify the analyses you would like to run.
For example, to run the brightness and colorfulness analyses, add the corresponding analysis keys in the following format to the <code class="dcode">"data"</code> field:
Note that if no analysis keys are provided, then the entire suite of video quality analyses will be run for the <code class="dcode">"video-content"</code> label.
Video Quality Analyses via Session Labels
Add <code class="dcode">"video-content"</code> labels to provide the time intervals in a session to be analyzed for video quality. Optionally, you can do the following:
- Provide the bounding box on the screen where you wish to run video quality analyses.
- Provide the specific analyses you would like to run.
Run video quality analyses by adding <code class="dcode">"video-content"</code> labels
To run the video quality analyses, simply add labels on the time intervals of interest with the <code class="dcode">"label_type"</code> set to <code class="dcode">"video-content"</code>. Optionally, you can specify the coordinates of a bounding box in the <code class="dcode">"video_box"</code> field of a label. See our documentation on applying spatial filtering on video analyses. Also optionally, you can provide the analysis keys in the <code class="dcode">"data"</code> field to specify the analyses you would like to run for each label.
Example
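A sketch of adding three <code class="dcode">"video-content"</code> labels with Python. The label-add route, the request body format, and the time and bounding box field names are assumptions; only the <code class="dcode">"label_type"</code>, <code class="dcode">"video_box"</code>, and <code class="dcode">"data"</code> usage follow directly from the description above.

```python
import requests

API_TOKEN = "<your_api_token>"
SESSION_ID = "<session_id>"
# Placeholder path -- substitute the label-add route from the Label API docs.
URL = f"https://<api_host>/v0/sessions/{SESSION_ID}/<label_add_route>"

# Three "video-content" labels over the time intervals to analyze. The time
# and bounding-box field names/formats below are assumptions.
labels = [
    {"name": "intro video", "label_type": "video-content",
     "start_time": 1.0, "end_time": 6.0},
    {"name": "video ad", "label_type": "video-content",
     "start_time": 12.5, "end_time": 27.5,
     "video_box": [[100, 200], [500, 600]]},   # optional spatial filter
    {"name": "trailer", "label_type": "video-content",
     "start_time": 40.0, "end_time": 55.0},
]
resp = requests.post(URL, headers={"Authorization": f"Bearer {API_TOKEN}"},
                     json={"labels": labels})
print(resp.json())  # returns the IDs of the three labels that were added
```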
Response
These are the label IDs of the three <code class="dcode">video-content</code> labels that you added. See the video quality metric time series plotted on the time intervals specified by the labels on the Waterfall UI.
Video Quality API
WARNING: This API is deprecated. Please add <code class="dcode">"video-content"</code> labels to run the video quality analyses instead.
*Tags may be added to the session via the Session Tag API.
Request body
The request body must be a JSON object with the <code class="dcode">"session_id"</code> field. It can also optionally contain <code class="dcode">"regions"</code> to provide the time intervals in a session to be analyzed. <code class="dcode">"start_time"</code> and <code class="dcode">"end_time"</code> are timestamps relative to the session start. See Note on Time Formatting for a description of allowable <code class="dcode"><time></code> formatting. If the optional <code class="dcode">"regions"</code> argument is omitted, the API will run the analysis over the full session.
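An illustrative request body built from the description above; the time values are placeholders and must follow the allowable <code class="dcode"><time></code> formats.

```python
# Illustrative request body for this endpoint: "session_id" is required,
# "regions" is optional. Time values shown are placeholders.
request_body = {
    "session_id": "<session_id>",
    "regions": [
        {"start_time": "15.0", "end_time": "45.0"},
        {"start_time": "90.0", "end_time": "120.0"},
    ],
}
```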
Example 1: Run blurriness analysis
Recommended: You can run the suite of video quality analyses (which includes the blurriness analysis) by adding <code class="dcode">"video-content"</code> labels instead. Instead of the request above, you can use the example request below:
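A sketch of such a label; field names other than <code class="dcode">"label_type"</code> are assumptions, and the times are illustrative (see the Label API documentation and the labels example earlier in this section).

```python
# A single "video-content" label over the interval to analyze. With no
# analysis keys in "data", the entire video quality suite (including
# blurriness) runs on this interval.
label = {
    "name": "blurriness check",         # illustrative name
    "label_type": "video-content",
    "start_time": 10.0,                 # time field names are assumptions
    "end_time": 40.0,
}
```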
Response 1
If the request is successful:
A successful request also plots a time series of the blurriness scores in the Waterfall UI.
A <code class="dcode">HTTP 404</code> response is returned if the session is not found. Check that the <code class="dcode">session_id</code> is correct.
Example 2: Run screen freezing analysis
The <code class="dcode">analysis: screen freezing</code> session tag must be included in the session data prior to running the screen freezing analysis. You can add the session tag with the following request:
Once the tag has been successfully added, then you can run the screen freezing analysis:
Recommended: You can run the suite of video quality analyses (which includes the screen freezing analysis) by adding <code class="dcode">"video-content"</code> labels instead. The <code class="dcode">analysis: screen freezing</code> session tag is still required for the screen freezing analysis to run. Instead of the request above, you can use the example request below:
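A sketch of such a label; field names other than <code class="dcode">"label_type"</code> are assumptions, and the times are illustrative.

```python
# A single "video-content" label over the interval to check. The screen
# freezing analysis runs as part of the video quality suite, provided the
# "analysis: screen freezing" session tag has been added.
label = {
    "name": "screen freezing check",    # illustrative name
    "label_type": "video-content",
    "start_time": 10.0,                 # time field names are assumptions
    "end_time": 40.0,
}
```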
Response 2
If the request is successful:
If screen freezing regions are found, then the Screen Freezing Issue card and highlights are rendered in the Waterfall UI.
A <code class="dcode">HTTP 404</code> response is returned if the session is not found. Check that the <code class="dcode">session_id</code> is correct.
VMAF Analysis
Run the Video Multi-Method Assessment Fusion (VMAF) perceptual video quality algorithm developed by Netflix. This analysis requires a reference video, which must be specified in the API request body or in a session tag. The analysis will produce the following set of time series, as generated by the Netflix VMAF algorithm:
- Video Multi-Method Assessment Fusion (VMAF)
- Detail Loss Metric (ADM2)
- Multi-Scale Structural Similarity Index (MS-SSIM)
- Peak Signal To Noise Ratio (PSNR)
- Structural Similarity Index (SSIM)
- Visual Information Fidelity (VIF)
For more information on the VMAF video quality time series metrics, refer to the Waterfall UI documentation.
The VMAF analysis is designed for strict frame-by-frame comparison between two videos; the two videos being compared should have identical content frame to frame as well as identical fps and dimensions. An example scenario where the VMAF analysis is useful is optimizing compression of a video to minimize the file size while preserving the subjective quality of the content. VMAF scores may only be meaningfully compared across variants of a single reference video (e.g., multiple compressed versions of a single source video). Comparison of scores across multiple reference videos is not meaningful.
This analysis can also be launched by applying a session tag with key <code class="dcode">vmaf</code> and value <code class="dcode">{reference_session_id}</code>. For information on adding session tags, see the Session Tag API.
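A sketch of launching VMAF via a session tag in Python; the tag-add route and body format are assumptions (see the Session Tag API), while the <code class="dcode">vmaf</code> key and reference session ID value follow from the description above.

```python
import requests

API_TOKEN = "<your_api_token>"
SESSION_ID = "<session_id>"                       # session to analyze
REFERENCE_SESSION_ID = "<reference_session_id>"   # session holding the reference video
# Placeholder path -- substitute the tag-add route from the Session Tag API docs.
URL = f"https://<api_host>/v0/sessions/<tag_add_route>/{SESSION_ID}"

# Tagging the session with key "vmaf" and the reference session ID as the
# value launches the VMAF analysis.
resp = requests.post(URL, headers={"Authorization": f"Bearer {API_TOKEN}"},
                     json=[{"vmaf": REFERENCE_SESSION_ID}])
print(resp.status_code)
```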
Example 1: Run VMAF analysis & wait indefinitely until it finishes
Response 1
If the request is successful, and the analysis ran successfully:
Note that although the analysis ran successfully, <code class="dcode">warnings</code> may indicate that the videos being compared are not suitable for the VMAF analysis. Please keep in mind that only videos that can be compared pixel-by-pixel and frame-by-frame can provide meaningful VMAF scores.
If the request is successful, but an error occurred during the analysis:
Example 2: Start the VMAF analysis & return immediately
Response 2
If the request is successful:
Checking the VMAF analysis status
You can check the status of the analysis with the following route:
Example
Response
If the analysis is still in progress:
If the analysis finished, and the analysis ran successfully:
Note that although the analysis ran successfully, warnings may indicate that the videos being compared are not suitable for the VMAF analysis. Please keep in mind that only videos that can be compared pixel-by-pixel and frame-by-frame can provide meaningful VMAF scores.
If the analysis finished, but an error occurred during the analysis:
Audio activity analysis
The audio activity analysis finds time intervals of audio activity based on the Audio Volume time series (which tracks the intensity of the audio waveform). This analysis is useful for finding the exact timing of audio events and is highly customizable. See the section below on Tunable Thresholds for details regarding customization.
Run audio activity analysis by adding labels
To run the audio activity analysis, simply add labels on the regions of interest with the <code class="dcode">"label_type"</code> set to <code class="dcode">"audio-activity-request"</code>. For each <code class="dcode">"audio-activity-request"</code> label that you add, you can provide custom thresholds in the <code class="dcode">"data"</code> field. If none are provided, then the default thresholds (see the Tunable Thresholds section below) are used.
Example: Add <code class="dcode">"audio-activity-request"</code> labels
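A sketch in Python of adding two <code class="dcode">"audio-activity-request"</code> labels, one with the default thresholds and one with custom thresholds in the <code class="dcode">"data"</code> field. The label-add route, body format, and time field names are assumptions, and the threshold values are illustrative.

```python
import requests

API_TOKEN = "<your_api_token>"
SESSION_ID = "<session_id>"
# Placeholder path -- substitute the label-add route from the Label API docs.
URL = f"https://<api_host>/v0/sessions/{SESSION_ID}/<label_add_route>"

labels = [
    # First label: default thresholds.
    {"name": "notification sound", "label_type": "audio-activity-request",
     "start_time": 5.0, "end_time": 20.0},
    # Second label: custom thresholds (values illustrative, not the defaults).
    {"name": "voice prompt", "label_type": "audio-activity-request",
     "start_time": 30.0, "end_time": 60.0,
     "data": {"volume": 0.05, "duration_ms": 100, "merge_ms": 500}},
]
resp = requests.post(URL, headers={"Authorization": f"Bearer {API_TOKEN}"},
                     json={"labels": labels})
print(resp.json())  # returns the IDs of the two labels that were added
```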
Response
These are the label IDs of the two <code class="dcode">"audio-activity-request"</code> labels that you added. If the audio activity regions are found, then "audio-activity-result" labels are created on them during the analysis.
Retrieve audio activity analysis results by fetching labels
You can check the session Waterfall UI to see if any <code class="dcode">"audio-activity-result"</code> labels have been created.
You can also look up the labels with a label group ID since an <code class="dcode">"audio-activity-request"</code> label and its resulting <code class="dcode">"audio-activity-result"</code> label(s) are "linked" by the same label group ID.
Example: Look up a <code class="dcode">"audio-activity-request"</code> label
First, look up the "audio-activity-request" label with its label ID to obtain its label group ID:
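A sketch of the label lookup in Python; the lookup route and the response field name are assumptions -- see the Label API documentation.

```python
import requests

API_TOKEN = "<your_api_token>"
SESSION_ID = "<session_id>"
LABEL_ID = "<audio-activity-request label id>"
# Placeholder path -- substitute the single-label lookup route from the Label API docs.
URL = f"https://<api_host>/v0/sessions/{SESSION_ID}/<label_route>/{LABEL_ID}"

resp = requests.get(URL, headers={"Authorization": f"Bearer {API_TOKEN}"})
label_group_id = resp.json()["label_group_id"]   # assumption: response field name
print(label_group_id)
```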
Response
Example: Get <code class="dcode">"audio-activity-result"</code> labels by looking up all labels associated with a label group
Then look up the labels that share the same label group ID to find the <code class="dcode">"audio-activity-result"</code> label(s):
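A sketch of the label group lookup in Python; the group lookup route and the response shape are assumptions -- see the Label API documentation.

```python
import requests

API_TOKEN = "<your_api_token>"
SESSION_ID = "<session_id>"
LABEL_GROUP_ID = "<label_group_id from the previous lookup>"
# Placeholder path -- substitute the label-group lookup route from the Label API docs.
URL = f"https://<api_host>/v0/sessions/{SESSION_ID}/<label_group_route>/{LABEL_GROUP_ID}"

resp = requests.get(URL, headers={"Authorization": f"Bearer {API_TOKEN}"})
for label in resp.json()["labels"]:               # assumption: response shape
    if label["label_type"] == "audio-activity-result":
        print(label["name"], label.get("start_time"), label.get("end_time"))
```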
Tunable Thresholds
The following thresholds can be tuned (default values shown) in the audio activity analysis:
The thresholds are used in the following ways:
- Step 1: Regions of the session timeline where the audio volume time series values are above the <code class="dcode">"volume"</code> threshold for more than <code class="dcode">"duration_ms"</code> milliseconds are identified.
- Step 2: If the gaps between any of the consecutive regions are less than <code class="dcode">"merge_ms"</code>, then the consecutive regions are merged into one.
For example, let's say two regions, 100-200 ms and 500-800 ms, are identified after Step 1. Since the gap between these two regions is 300 ms (which is greater than the default <code class="dcode">"merge_ms"</code> threshold of 200 ms), these regions are recognized as two separate audio activity regions, creating two <code class="dcode">"audio-activity-result"</code> labels. If the <code class="dcode">"merge_ms"</code> threshold had been set to 500, then these regions would be merged into one, and only one <code class="dcode">"audio-activity-result"</code> label would be created.
If you wish to use custom thresholds on a region of interest, specify the thresholds in the <code class="dcode">"data"</code> field when adding an <code class="dcode">"audio-activity-request"</code> label.
Integrated Loudness in LUFS
LUFS (Loudness Units relative to Full Scale) is a standardized measurement of audio loudness that factors human perception and electrical signal intensity together. Standardized in EBU R 128 and ITU-R BS.1770, LUFS is synonymous with LKFS (Loudness, K-weighted, relative to Full Scale), and one LUFS or LKFS unit is equal to one dB. LUFS are used to set targets for audio normalization in broadcast systems for cinema, TV, radio, and music streaming. For example, Spotify and YouTube stream audio at around -14 LUFS.
You can get the integrated loudness measurement (loudness over the entire duration of the audio track) of the session video using the following route:
Note that integrated LUFS is not artificially skewed by silence or very low-level audio, because audio below a certain loudness is excluded from the measurement. Please refer to ITU-R BS.1770-4 for details on how the integrated loudness in LUFS is computed.
Example
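A sketch of the request in Python; the route path, host, and auth header are placeholders for the route named above.

```python
import requests

API_TOKEN = "<your_api_token>"
SESSION_ID = "<session_id>"
# Placeholder path -- substitute the integrated loudness route shown above.
URL = f"https://<api_host>/v0/sessions/analysis/<loudness_route>/{SESSION_ID}"

resp = requests.get(URL, headers={"Authorization": f"Bearer {API_TOKEN}"})
print(resp.json())  # integrated loudness in LUFS, if it could be computed
```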
Response
If the session video does not contain audio:
If the session video does contain audio but integrated loudness in LUFS cannot be computed:
If integrated loudness in LUFS is successfully computed:
OCR
The HeadSpin platform provides the capability to run OCR (Optical Character Recognition) to extract text from the frames in the session video capture. We use Tesseract, a popular open-source OCR engine, and the specific version we support is 4.1.1, which is backed by an LSTM (Long Short-Term Memory)-based neural network.
Currently the only languages we support are English and Japanese.
Run OCR by adding labels
To run OCR, simply add labels on the regions of interest with the <code class="dcode">"label_type"</code> set to <code class="dcode">"ocr-request"</code>. You must specify the coordinates of the bounding box where you wish to run OCR in the <code class="dcode">"video_box"</code> field. See our documentation on applying spatial filtering on video analyses.
For each <code class="dcode">"ocr-request"</code> label that you add, you can optionally provide custom analysis parameters in the <code class="dcode">"data"</code> field. If none are provided, then the default parameters are used. See the Configurable Parameters section below for details.
Example: Add <code class="dcode">"ocr-request"</code> labels
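A sketch in Python of adding two <code class="dcode">"ocr-request"</code> labels. The label-add route, body format, time field names, and bounding box format are assumptions; the <code class="dcode">"label_type"</code>, <code class="dcode">"video_box"</code> requirement, and <code class="dcode">"data"</code> parameters follow from the description above.

```python
import requests

API_TOKEN = "<your_api_token>"
SESSION_ID = "<session_id>"
# Placeholder path -- substitute the label-add route from the Label API docs.
URL = f"https://<api_host>/v0/sessions/{SESSION_ID}/<label_add_route>"

# Two "ocr-request" labels; a bounding box is required for OCR. The box
# coordinate format and time field names are assumptions.
labels = [
    {"name": "page title", "label_type": "ocr-request",
     "start_time": 3.0, "end_time": 8.0,
     "video_box": [[40, 60], [680, 120]]},
    {"name": "price text", "label_type": "ocr-request",
     "start_time": 15.0, "end_time": 18.0,
     "video_box": [[200, 900], [520, 960]],
     # Optional custom parameters: single line of text, digits only.
     "data": {"ocr_config": "--psm 7 -c tessedit_char_whitelist=0123456789"}},
]
resp = requests.post(URL, headers={"Authorization": f"Bearer {API_TOKEN}"},
                     json={"labels": labels})
print(resp.json())  # returns the IDs of the two labels that were added
```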
Response
These are the label IDs of the two <code class="dcode">"ocr-request"</code> labels that were added. For each <code class="dcode">"ocr-request"</code> label, <code class="dcode">"ocr-result"</code> labels are created for each of the text segments detected with the given analysis parameters. The <code class="dcode">"name"</code> field in an <code class="dcode">"ocr-result"</code> label contains the detected text, and the <code class="dcode">"confidence"</code> key in the <code class="dcode">"data"</code> field maps to the model's confidence value for each respective word.
Retrieve OCR results by fetching labels
You can check the session Waterfall UI to see if any "ocr-result" labels have been created.
You can also look up the labels with a label group ID since an <code class="dcode">"ocr-request"</code> label and its resulting <code class="dcode">"ocr-result"</code> label(s) are "linked" by the same label group ID.
Example: Look up an <code class="dcode">"ocr-request"</code> label
First, look up the "ocr-request" label with its label ID to obtain its label group ID:
Response
Example: Get <code class="dcode">"ocr-result"</code> labels by looking up all labels associated with a label group
Then look up the labels that share the same label group ID to find the <code class="dcode">"ocr-result"</code> label(s):
Response
Configurable Parameters
The following parameters can be configured (default values shown) for OCR:
"ocr_config" (Default: "--psm 7")
This is the custom configuration that gets passed directly to the Tesseract API. Multiple config options can be specified with space delimiters, but the only config options currently accepted on the HeadSpin platform are shown below:
--psm N
Specify the Page Segmentation Mode to set Tesseract to only run a subset of layout analysis and assume a certain form of image. Note that the automatic modes are much slower than the specific modes. The accepted options for N are:
-l LANG
Set the language to detect. If none is specified, English is assumed. Multiple languages may be specified, separated by plus characters. Tesseract uses 3-character ISO 639-2 language codes. The accepted options for LANG are:
For example, if you're trying to detect both English and Japanese, then set <code class="dcode">-l eng+jpn</code>.
-c CONFIGVAR=VALUE
Set value for parameter CONFIGVAR to VALUE. Multiple -c arguments are allowed. The currently accepted options for CONFIGVAR are:
For example, if you only want to detect numeric digits, then set <code class="dcode">-c tessedit_char_whitelist=0123456789</code>.
"confidence_threshold" (Default: 80)
This is the threshold for the confidence for the text detected by Tesseract, and it should be a numeric value between 0 and 100. Any detected word with a confidence value below this threshold will be dropped from the result output. For example, say the detected text for an <code class="dcode">"ocr-request"</code> label is <code class="dcode">big fluffy cat</code>, and the confidence values for each word are <code class="dcode">[80, 30, 80]</code>. If the <code class="dcode">"confidence_threshold"</code> is set at <code class="dcode">80</code>, then <code class="dcode">fluffy</code> with its confidence value of 30 is dropped, and the <code class="dcode">"ocr-result"</code> label is created with <code class="dcode">big cat</code> as the name.
"landscape" (Default: <code class="dcode">false</code>)
This parameter is applicable only to sessions captured on mobile devices and indicates whether the time interval on which you're running OCR was captured in landscape mode. If the video frames in your time interval of interest are in landscape mode, you must indicate <code class="dcode">"landscape": true</code> so that the text can be oriented correctly for analysis.
"target_height" (Default: <code class="dcode">null</code>)
This is the target height (in pixels) that the bounding box should be resized to (with the width resized proportionally). The default value is <code class="dcode">null</code>, which means no resizing is to take place. If provided, it must be an integer value that's larger than 20. See below in Troubleshooting for the explanation on when you would want to set this parameter.
Example: Custom Parameters
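An illustrative <code class="dcode">"data"</code> field that overrides all four parameters; the values shown are examples, not recommendations.

```python
# Custom OCR parameters for the "data" field of an "ocr-request" label.
# Defaults: ocr_config "--psm 7", confidence_threshold 80, landscape false,
# target_height null. Values below are illustrative.
data = {
    "ocr_config": "--psm 7 -l eng+jpn",   # single line of English and/or Japanese text
    "confidence_threshold": 60,           # keep words with confidence >= 60
    "landscape": True,                    # the interval was captured in landscape mode
    "target_height": 40,                  # resize the box to 40 px tall before OCR
}
```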
Troubleshooting
If you're not getting the desired results with OCR, following one or more of the recommended actions below may help improve the quality of the output.
Please note that you can retrieve keyframe screenshots to see how well the bounding box contains the text you're trying to detect. An example request to download the screenshot of the frame at the start of a label is shown below -- the screenshot will be cropped to the bounding box that you specified in the label:
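A sketch of such a request in Python; the keyframe screenshot route is a placeholder -- see the screenshot documentation referenced above for the actual path.

```python
import requests

API_TOKEN = "<your_api_token>"
SESSION_ID = "<session_id>"
LABEL_ID = "<ocr-request label id>"
# Placeholder path -- substitute the keyframe screenshot route from the docs.
URL = f"https://<api_host>/v0/sessions/{SESSION_ID}/<screenshot_route>/{LABEL_ID}"

# The downloaded screenshot is cropped to the label's bounding box, so you can
# verify how well the box frames the text you are trying to detect.
resp = requests.get(URL, headers={"Authorization": f"Bearer {API_TOKEN}"})
with open("label_start_frame.png", "wb") as f:
    f.write(resp.content)
```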
- [Only applicable to mobile sessions] Is the text you're trying to detect from video frames that are captured in landscape mode?
For a session captured on a mobile device, the screen may have been captured in landscape mode for the entire session duration or parts of the session. If the time interval of interest was captured in landscape mode, you must indicate so when running OCR (see <code class="dcode">landscape</code> in Configurable Parameters) since the text is assumed to be vertically oriented by default.
- What kind of text is in the bounding box?
The default Page Segmentation Mode (psm) is 7 (see <code class="dcode">ocr_config</code> in Configurable Parameters), which means that Tesseract is expecting a single line of text. Does that apply to the text you're trying to detect? If not, try a different Page Segmentation Mode. Please keep in mind that compared to the specific modes, the automatic modes are much slower and may underperform in accuracy.
- How are you cropping the text you're trying to detect?
Tesseract expects the image to have some empty margins around the text. How close are the boundaries of the bounding box to the text you're trying to detect? Download the screenshot to make sure the bounding box you selected is not cropped too close to the text.
- In the bounding box, how well does the text contrast against the background relative to other objects?
Tesseract internally applies various image pre-processing steps prior to running the actual OCR, one of which is binarization. Using Otsu's Binarization, Tesseract converts an image to black and white in order to better segment the text from the background. Tesseract assumes that the image, when converted to grayscale, has a bimodal distribution of pixel values (there are two distinct regions in the image histogram). Are there primarily two contrasting colors -- one for the background and the other for the text you're trying to detect -- in the bounding box? And does the bounding box contain other objects that also contrast against the background? If there are other objects that contrast against the background just as much as the text does, then OCR will try to interpret these objects as text, which can produce undesirable results. In the worst case, if the other objects contrast much more strongly against the background than the text does, then when binarized, the text can get lumped in with the background, leaving only the objects distinguishable. Download the screenshot to see if you can specify the coordinates for a "cleaner" bounding box that consists only of the contrasting text and background and is free of interfering objects.
- How big is the text you're trying to detect?
The accuracy of Tesseract can drop significantly if the text in the bounding box is too small. If the text that you're trying to detect in the bounding box is smaller than 20 pixels in height, we recommend that you set <code class="dcode">"target_height"</code> in your custom parameters, which will resize the bounding box to this target height (with the width resized proportionally) prior to running the OCR. You must select an appropriate value for <code class="dcode">"target_height"</code> so that the text within the bounding box (not just the bounding box itself) will be resized to at least 20 pixels tall. Note that resizing adds a computational overhead, so the analysis will take longer to complete.
Audio Match Analysis
The Audio Match Analysis finds where the "reference audio" (assumed to be the original audio resource) exists in the "test audio" (assumed to be the longer, captured audio that contains the "reference audio") and outputs metadata about the quality of the match. This analysis is equivalent to the Audio Match analysis used with audio files in the Audio API and can run on sessions that have been captured with audio. When the analysis is used on sessions, the "test audio" is the audio segment of a session (represented by a label), whereas the "reference audio" can be another audio segment of any session (represented by a label) or any media file (from the Audio API or AVBox API). Please note that the current version of the analysis is optimized for "test audio" shorter than 5 minutes and "reference audio" shorter than 40 seconds.
Run audio match analysis by adding labels
To run audio match analysis, simply add labels on the time intervals of interest with the <code class="dcode">"label_type"</code> set to <code class="dcode">"audio-match-request"</code>. For each <code class="dcode">"audio-match-request"</code> label that you add, the audio segment in the time interval indicated by the label will serve as the <code class="dcode">"test audio"</code>, whereas the ID of the <code class="dcode">"reference audio"</code> (that you wish to find in your <code class="dcode">"test audio"</code>) should be provided as either <code class="dcode">"ref_audio_label_id"</code> or <code class="dcode">"ref_audio_media_id"</code> in the <code class="dcode">"data"</code> field of the label.
If using <code class="dcode">"ref_audio_label_id"</code>, then you must provide a label ID, and the <code class="dcode">"reference audio"</code> is the audio segment in the time interval indicated by this label ID. If using <code class="dcode">"ref_audio_media_id"</code>, then you must provide either an audio ID or media ID, and the <code class="dcode">"reference audio"</code> is either an audio file or video file that contains audio that you uploaded via Audio API or AVBox API.
For each <code class="dcode">"audio-match-request"</code> label that you add, you can optionally provide custom thresholds for determining the match status in the <code class="dcode">"data"</code> field. If none are provided, then the default thresholds are used. See the Tunable Match Status Thresholds section below for details.
Example: Add <code class="dcode">"audio-match-request"</code> labels
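A sketch in Python of adding two <code class="dcode">"audio-match-request"</code> labels, one referencing another label's audio segment and one referencing an uploaded media file. The label-add route, body format, and time field names are assumptions; the <code class="dcode">"ref_audio_label_id"</code> and <code class="dcode">"ref_audio_media_id"</code> usage follows from the description above.

```python
import requests

API_TOKEN = "<your_api_token>"
SESSION_ID = "<session_id>"
# Placeholder path -- substitute the label-add route from the Label API docs.
URL = f"https://<api_host>/v0/sessions/{SESSION_ID}/<label_add_route>"

labels = [
    # Test audio: 0-90 s of this session; reference audio: another label's segment.
    {"name": "find jingle", "label_type": "audio-match-request",
     "start_time": 0.0, "end_time": 90.0,
     "data": {"ref_audio_label_id": "<label id of the reference audio segment>"}},
    # Test audio: 90-240 s; reference audio: a file uploaded via the Audio/AVBox API.
    {"name": "find intro music", "label_type": "audio-match-request",
     "start_time": 90.0, "end_time": 240.0,
     "data": {"ref_audio_media_id": "<audio or media id>"}},
]
resp = requests.post(URL, headers={"Authorization": f"Bearer {API_TOKEN}"},
                     json={"labels": labels})
print(resp.json())  # returns the IDs of the two labels that were added
```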
Response
These are the label IDs of the two <code class="dcode">"audio-match-request"</code> labels that were added. For each of these labels, an <code class="dcode">"audio-match-result"</code> label is created on the matched time interval if all the criteria are met. See the Audio Match Result Explained section for details relating to the matching criteria and analysis result.
Retrieve audio match analysis results by fetching labels
You can check the session Waterfall UI to see if any <code class="dcode">"audio-match-result"</code> labels have been created.
You can also look up the labels with a label group ID since a <code class="dcode">"audio-match-request"</code> label and its resulting <code class="dcode">"audio-match-result"</code> label are <code class="dcode">"linked"</code> by the same label group ID.
Example: Look up a <code class="dcode">"audio-match-request"</code> label
First, look up the "audio-match-request" label with its label ID to obtain its label group ID:
Response
Example: Get <code class="dcode">"audio-match-result"</code> labels by looking up all labels associated with a label group
Then look up the labels that share the same label group ID to find the <code class="dcode">"audio-match-result"</code> label:
Response
Audio Match Result Explained
For each of the <code class="dcode">"audio-match-request"</code> labels, an <code class="dcode">"audio-match-result"</code> label is created on the matched time interval if the following criteria are met:
- The "reference audio" is found in its entirety in the "test audio".
- The <code class="dcode">"match_score"</code> is high enough (according to the <code class="dcode">"match_status_thresholds"</code>) to be a <code class="dcode">"partial"</code> or <code class="dcode">"full"</code> match.
The <code class="dcode">"data"</code> field of an <code class="dcode">"audio-match-result"</code> label is a JSON object with the following keys:
Tunable Match Status Thresholds
The following thresholds for determining the match status can be configured (default values shown):
See Audio Match Result Explained for details on <code class="dcode">"match_status_thresholds"</code>.
Note that if using custom <code class="dcode">"match_status_thresholds"</code>, all four thresholds must be provided -- e.g., using <code class="dcode">"match_status_thresholds": {"full": {"match_score": 0.8, "prominence_ratio": 1.1}, "partial": {"match_score": 0.5}}</code> will result in an error.
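For reference, a complete <code class="dcode">"match_status_thresholds"</code> object looks like the sketch below; the values are illustrative, not the defaults.

```python
# All four thresholds provided (values illustrative).
data = {
    "ref_audio_label_id": "<label id of the reference audio segment>",
    "match_status_thresholds": {
        "full":    {"match_score": 0.8, "prominence_ratio": 1.1},
        "partial": {"match_score": 0.5, "prominence_ratio": 1.05},
    },
}
```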
Image Match Analysis
The Image Match Analysis suite consists of algorithms to match segments of session video against user-uploaded query images. For example, performing a template match -- one of the analyses in the image match category -- determines time intervals in session video that contain a provided image anywhere on the screen. The general workflow is similar for all of the image match analyses:
- Upload a query image through the provided API endpoint and receive a corresponding <code class="dcode">image_id</code>.
- Create a request label over the region of interest and specify the <code class="dcode">image_id</code> of the desired query image.
- After the analysis finishes running, head to the Waterfall UI or use the API to fetch your result labels.
Query images
Query images describe what to find in a session video during an image match request. For example, during a template match, a query image corresponds to the sub-image in a session video the analysis should identify. There are a number of endpoints for interacting with and managing query images, which are each assigned and referred to by a unique <code class="dcode">image_id</code>. Query images are stored at the organization level, so other users within your organization may use your uploaded query images given the <code class="dcode">image_id</code>, or vice versa.
Upload query images
Oftentimes, query images originate from snippets of session video, in which case it may be useful to first retrieve a keyframe screenshot (with spatial filtering to crop as needed) and then upload it as a query image. With a desired image in mind, the following endpoint can then be used to upload the query image and retrieve its unique <code class="dcode">image_id</code>.
Query images support all OpenCV-compatible formats, as described in the OpenCV documentation. This includes most common image types, such as <code class="dcode">.png</code> and <code class="dcode">.jpg</code>/<code class="dcode">.jpeg</code>.
Example: upload a query image with an optional name
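A sketch of the upload in Python; the upload route and the way the image bytes and optional name are passed (multipart file part plus a query parameter) are assumptions -- check the endpoint description above for the exact request format.

```python
import requests

API_TOKEN = "<your_api_token>"
# Placeholder path -- substitute the query image upload route shown above.
URL = "https://<api_host>/v0/<image_upload_route>"

# Assumption: the image is sent as a multipart file part and the optional
# name as a query parameter.
with open("play_button.png", "rb") as f:
    resp = requests.post(
        URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        params={"name": "play button"},
        files={"image": f},
    )
print(resp.json())  # contains the image_id of the uploaded query image
```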
Response
Download query images
Example
Response
- <code class="dcode">HTTP 200 OK</code> if the request is successful. The file is saved as specified by the <code class="dcode">-o</code> option.
- <code class="dcode">HTTP 400</code> if the provided <code class="dcode">image_id</code> is not a valid UUID.
- <code class="dcode">HTTP 404</code> if no query image exists with the provided <code class="dcode">image_id</code>.
Delete query images
Example
Response
- <code class="dcode">HTTP 200 OK</code> if the query image is successfully deleted.
- <code class="dcode">HTTP 400</code> if the provided <code class="dcode">image_id</code> is not a valid UUID.
- <code class="dcode">HTTP 404</code> if no query image exists with the provided image_id.
Retrieve query image metadata
Example
Response
List query images
The query image listing endpoint is used to fetch all <code class="dcode">image_id</code> values for a user's organization. Additionally, these results can optionally be filtered by the email address of the uploader and the value of the name field. Filtering on email address requires the exact email address, while filtering on name uses standard MySQL pattern matching syntax.
Example: list query images with optional name and email address
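A sketch of the listing request in Python; the route and the query parameter names for the email and name filters are assumptions.

```python
import requests

API_TOKEN = "<your_api_token>"
# Placeholder path -- substitute the query image listing route shown above.
URL = "https://<api_host>/v0/<image_list_route>"

# Assumption: filters are query parameters; the name filter uses MySQL
# pattern matching (e.g., "%" as a wildcard).
resp = requests.get(
    URL,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    params={"email": "tester@example.com", "name": "play%"},
)
print(resp.json())  # image_id values for the organization's matching query images
```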
Response
Run image match analysis by adding labels
To run an image match analysis, simply add labels on the time intervals of interest with the <code class="dcode">"label_type"</code> set to <code class="dcode">"image-match-request"</code>. Additionally, specify the following information in the label's <code class="dcode">"data"</code> field:
- <code class="dcode">"method"</code>: the type of image match analysis to run. See the section on image match methods for more information.
- <code class="dcode">"image_id"</code>: the id of the query image to use for the analysis.
- <code class="dcode">"threshold"</code>: a number between 0 and 1 representing the confidence threshold for a positive match.
Specific image match methods may require additional items in the <code class="dcode">"data"</code> field.
Example: Add an <code class="dcode">"image-match-request"</code> label (template match)
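A sketch in Python of adding a single <code class="dcode">"image-match-request"</code> label for a template match. The label-add route, body format, and time field names are assumptions; the <code class="dcode">"data"</code> fields follow from the list above.

```python
import requests

API_TOKEN = "<your_api_token>"
SESSION_ID = "<session_id>"
# Placeholder path -- substitute the label-add route from the Label API docs.
URL = f"https://<api_host>/v0/sessions/{SESSION_ID}/<label_add_route>"

label = {
    "name": "find play button",
    "label_type": "image-match-request",
    "start_time": 0.0, "end_time": 60.0,   # time field names are assumptions
    "data": {
        "method": "template",              # template match
        "image_id": "<query image_id>",
        "threshold": 0.9,                  # true template matches score very high
    },
}
resp = requests.post(URL, headers={"Authorization": f"Bearer {API_TOKEN}"},
                     json={"labels": [label]})
print(resp.json())
```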
Response
Retrieve image match analysis results by fetching labels
You can check the session Waterfall UI to see if any <code class="dcode">"image-match-result"</code> labels have been created.
You can also look up the labels with a label group ID since an <code class="dcode">"image-match-request"</code> label and its resulting <code class="dcode">"image-match-result"</code> labels are "linked" by the same label group ID.
Example: Look up an <code class="dcode">"image-match-request"</code> label
First, look up the "image-match-request" label with its label ID to obtain its label group ID:
Response
Example: Get <code class="dcode">"image-match-result"</code> labels by looking up all labels associated with a label group
Using the <code class="dcode">"label_group_id"</code> value from the previous step:
Response
Image match result explained
In the example label group response above, one of the labels is the previously created <code class="dcode">"image-match-request"</code> label, and the other two -- the <code class="dcode">"image-match-result"</code> labels -- contain the results of the template match analysis. Each of the result labels corresponds to a separate time interval within the range of the original request label where the query image was found in the session video.
You may notice that the <code class="dcode">"video_box"</code> field is populated in the result labels despite being left blank in the request label. Some image match analyses (such as the template match analysis) set the video box of each individual result label to the spatial region where the match was found. If the match moves around during the result interval, the video box describes the region of the highest confidence match in the interval. For all analyses that do not localize a match region, the video box is simply the video box of the request label.
Image match methods
In the table above, each image match method is aliased to a <code class="dcode">"method"</code> name that may be used in the <code class="dcode">"data"</code> field of a request label (see creating an image match request label). The video box column indicates whether or not the respective image match method sets the spatial region of each match in the video box of the result labels, as described above. The pros and cons of each method are also listed.
Additionally, the recommended values for the <code class="dcode">"threshold"</code> field may vary based on the selected method, and some methods may require or allow additional values in the <code class="dcode">"data"</code> field; see the sections on individual methods below for more details.
Template match
Running a template match determines time intervals in the session video which contain a given query image. Template matching is not scale- or rotation-invariant, so matching is only possible for target images of equivalent size and orientation. Under the hood, template matching uses the industry-standard <code class="dcode">matchTemplate</code> method packaged in the OpenCV library.
Template match is accessible under the method name <code class="dcode">"template"</code>.
Generally, true positive matches produce very high confidence values, so <code class="dcode">"threshold"</code> values may hover around 0.9, if not higher.
An example <code class="dcode">"data"</code> field for template matching is shown below:
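The <code class="dcode">image_id</code> and threshold values below are placeholders:

```python
data = {
    "method": "template",
    "image_id": "<query image_id>",
    "threshold": 0.9,
}
```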
Template match analysis will set the video box of result labels to the spatial region of the match, as described above.
See the Template matching section in the Image Matching Methods Explained documentation for a more in-depth explanation of the template matching method.
Multiscale template match
Multiscale template match works in the same way as template match, except the template matching operation is done for each scale in a range of pre-defined scales. This allows the query image to be detected in a frame even if a larger or smaller version of it appears in the frame. Note that because information is compressed or approximated when the query image is scaled up or down, matches at different scales will no longer be perfect. The optimal threshold to achieve the desired matching results will be lower than if regular template matching were performed on images at the same scale. We recommend initially testing with a <code class="dcode">"threshold"</code> value around 0.7.
Multiscale matching can be accessed by setting the flag <code class="dcode">"use_multiscale"</code> to <code class="dcode">true</code> in the <code class="dcode">"data"</code> field. The <code class="dcode">"template"</code> method defaults to regular template matching if <code class="dcode">"use_multiscale"</code> is not set.
Additionally, the default scales used by multiscale template matching are <code class="dcode">[0.25, 0.5, 0.75, 1.0, 1.5, 2.0]</code> if the <code class="dcode">"scaling_factors"</code> field is omitted, but these scales can be optionally customized. The analysis will not be carried out for any scaling factor that will reduce the template size to 0 or enlarge it to a size bigger than the frame.
An example <code class="dcode">"data"</code> field for multiscale template matching (with default <code class="dcode">"scaling_factors"</code> of <code class="dcode">[0.25, 0.5, 0.75, 1.0, 1.5, 2.0]</code>) is shown below:
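Omitting <code class="dcode">"scaling_factors"</code> uses the defaults; the <code class="dcode">image_id</code> and threshold below are placeholders:

```python
data = {
    "method": "template",
    "use_multiscale": True,
    "image_id": "<query image_id>",
    "threshold": 0.7,
}
```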
And if custom scales are to be used:
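Here the scales, <code class="dcode">image_id</code>, and threshold are illustrative:

```python
data = {
    "method": "template",
    "use_multiscale": True,
    "scaling_factors": [0.5, 0.75, 1.0, 1.25, 1.5],
    "image_id": "<query image_id>",
    "threshold": 0.7,
}
```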
See the Multiscale template matching section in the Image Matching Methods Explained documentation for a more in-depth explanation of the multiscale template matching method.
Keypoint match
Running a keypoint match determines time intervals in the session video which contain a given query image. This method uses the BRISK algorithm to generate interesting keypoints and corresponding feature vectors which describe the query image and video frames. Similar keypoints are matched between the query image and video frames to find time intervals in the session video where the query image may have appeared.
Unlike template match, the keypoint matching method is tolerant to changes in scale and rotation, provided that enough interesting features are present in both the query image and the object of interest in the video frame. Although measures are taken to maximize the number of interesting points found, the analysis will not be carried out if fewer than 10 keypoints are detected in the query image (for example, because it is too small or lacks distinctive detail). The matching may also produce undesired results if not enough features are detected from the target of interest in the video frames.
See the Keypoint matching section in the Image Matching Methods Explained documentation for a more in-depth explanation of the keypoint matching method.
The <code class="dcode">"threshold"</code> parameter falls within [0, 1] and controls the sensitivity of the algorithm in returning the final desired time intervals. For keypoint match, the analysis can be performed without having to specify the <code class="dcode">"threshold"</code> parameter beforehand. If not provided, the top 5 suggestions will be made and returned via the <code class="dcode">"image-match-result"</code> labels belonging to the same <code class="dcode">"label_group_id"</code>. The result labels will have <code class="dcode">"<label name> : <suggested threshold>"</code> as their label names. The suggested thresholds will also be available in the <code class="dcode">"Data"</code> field of the result labels.
Since the thresholds are suggested based on the results of the analysis, the suggested thresholds and results will change from one query image to another as well as from one session to another (if the content of the sessions changes). Threshold tuning should be done outside the range of the suggested values, as there will be no difference in results if tuned within the suggested values. For example, in the case shown above, running the analysis with the threshold set to anything between 0.8 and 0.935 (non-inclusive) will return the same results. Since the optimal threshold may vary from case to case, we advise that you initially run the analysis without setting a threshold to get a sense of what range of values may be most appropriate for further testing.
Keypoint match is accessible under the method name <code class="dcode">"keypoint"</code>.
An example <code class="dcode">"data"</code> field for keypoint matching (without setting a threshold) is shown below:
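The <code class="dcode">image_id</code> below is a placeholder:

```python
data = {
    "method": "keypoint",
    "image_id": "<query image_id>",
    # No "threshold": the top 5 suggested thresholds are returned via the
    # "image-match-result" labels in the same label group.
}
```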
And if specifying a threshold:
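The threshold value here is illustrative:

```python
data = {
    "method": "keypoint",
    "image_id": "<query image_id>",
    "threshold": 0.85,
}
```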
Keypoint match analysis does not set the video box and returns only the time intervals in which the keypoint score exceeds the given threshold.
Session PCAP Stats Analysis
The session PCAP Stats Analysis provides an interface for extracting various statistical properties derived from the session PCAP. For example, the caller may be interested in the number of bytes that were sent over a particular connection, and what fraction of those bytes were TCP vs UDP.
Query for PCAP Stats
The request body can be configured to tailor the kind of analysis that is performed. The request body supports the following keys:
- <code class="dcode">"timeout"</code>: This field is optional. How long to wait (in seconds) for the analysis before timing out. By default, the request will block indefinitely.
- <code class="dcode">"protocol": "TCP"</code> or <code class="dcode">"UDP"</code>. This field is optional. If specified, the results will only include results pertaining to the supplied protocol.
- <code class="dcode">"units"</code>: <code class="dcode">"frame"</code> (default) or <code class="dcode">"bytes"</code>. The units that should be counted for each packet. <code class="dcode">frame</code> simply counts the packet, whereas <code class="dcode">bytes</code> counts the size of the packet (in bytes).
- <code class="dcode">"index"</code>: <code class="dcode">"protocol"</code> (default),
- <code class="dcode">"direction"</code>, or <code class="dcode">"interpacket"</code>. The kind of index to maintain. <code class="dcode">protocol</code> will return the counts by protocol, <code class="dcode">direction</code> will return the counts by direction, and <code class="dcode">interpacket</code> will return the distribution of inter-packet times (i.e., the time between packet timestamps).
Example: get byte counts indexed by protocol
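A sketch of the request in Python; the PCAP stats route is a placeholder and the use of a JSON body is an assumption, while the <code class="dcode">"units"</code>, <code class="dcode">"index"</code>, and <code class="dcode">"timeout"</code> fields follow from the list above.

```python
import requests

API_TOKEN = "<your_api_token>"
SESSION_ID = "<session_id>"
# Placeholder path -- substitute the PCAP stats route shown above.
URL = f"https://<api_host>/v0/sessions/analysis/<pcap_stats_route>/{SESSION_ID}"

# Count bytes (rather than frames), indexed by protocol; wait up to 60 seconds.
payload = {"units": "bytes", "index": "protocol", "timeout": 60}
resp = requests.post(URL, headers={"Authorization": f"Bearer {API_TOKEN}"},
                     json=payload)
print(resp.json())  # {"hosts": {...}} keyed by connection, direction, and protocol
```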
Response
The data returned is a JSON object representing the requested index. The top level key for each object is <code class="dcode">hosts</code>, which maps to an object with keys corresponding to IP addresses in a connection. The connection keys map to an object with directional keys (either <code class="dcode">forward</code> or <code class="dcode">backward</code>). The directional keys map to objects that have protocol keys that map to the values for that type of index. The schema is likely to change as this feature develops.
Time Series Data Analysis
The HeadSpin platform generates many time series for a session either automatically or via analyses that the user specifically requests to run. In each time series, a metric that pertains to HeadSpin Issue activity, network activity, device metrics, or screen activity is tracked along the session timeline. Although a time series is rich in information and useful for visualizing how a metric changes over time, it can often be challenging to extract useful insights from it. In order to make the time series data more easily digestible, we provide methods of analyzing the time series data.
The general workflow is similar for all of the time series data analyses:
- For a given session, choose the time series that you're interested in analyzing. The time series must be available in the session, and you must retrieve the "time_series_key" of the time series.
- Choose a <code class="dcode">"method"</code> from the table of available time series data analysis methods.
- Create a request label over the time interval of interest. In the <code class="dcode">"data"</code> field of the label, specify the <code class="dcode">"time_series_key"</code>, <code class="dcode">"method"</code>, and <code class="dcode">"parameters"</code> that are specific to the method.
- To check the results of the time series data analysis, depending on the <code class="dcode">"method"</code> used, either 1) fetch your result labels via the session Waterfall UI or the label API, or 2) use the method-specific API to fetch the results.
Retrieve the time series key
The time series keys can be accessed via the following request (see the "Retrieve Available Session Time Series" section in the Session API documentation for more details):
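A sketch of the request in Python; the route path is a placeholder for the route described in the Session API documentation.

```python
import requests

API_TOKEN = "<your_api_token>"
SESSION_ID = "<session_id>"
# Placeholder path -- substitute the "Retrieve Available Session Time Series"
# route from the Session API documentation.
URL = f"https://<api_host>/v0/sessions/<timeseries_route>/{SESSION_ID}"

resp = requests.get(URL, headers={"Authorization": f"Bearer {API_TOKEN}"})
print(resp.json())  # maps each time series key to its name, category, and units
```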
This outputs a JSON dictionary that maps each time series key to its time series name, category, and units for all the time series available in the given session. You must pick one of the time series keys from this output.
Time series data analysis methods
The table below shows information about the available time series data analysis methods. Each time series data analysis method is aliased to a <code class="dcode">"method"</code> name, which is to be used in the <code class="dcode">"data"</code> field of a request label (see creating a time series request label).
Range
The <code class="dcode">"range"</code> method identifies time intervals in a time series that lie within a certain range of values. The following parameters can be configured (default values shown) with the <code class="dcode">"range"</code> method:
Only one of the limits is required, and you can specify whether the limits should be inclusive or exclusive (by default, the lower limit is inclusive, whereas the upper limit is exclusive). For example, if you are interested in identifying time intervals in a time series that have:
- values between 0.3 and 0.6 (both inclusive): <code class="dcode">{"lower_limit": 0.3, "upper_limit": 0.6, "include_lower_limit": true, "include_upper_limit": true}</code>
- values 0.5 and below: <code class="dcode">{"upper_limit": 0.5, "include_upper_limit": true}</code>
- values above 0.5: <code class="dcode">{"lower_limit": 0.5, "include_lower_limit": false}</code>
The optional <code class="dcode">"duration_threshold_ms"</code> and <code class="dcode">"merge_threshold_ms"</code> parameters are used in the following ways:
- If the duration of the identified time interval (that meets the range condition) is less than <code class="dcode">"duration_threshold_ms"</code>, the time interval is discarded.
- After the <code class="dcode">"duration_threshold_ms"</code> has been applied, we then apply the <code class="dcode">"merge_threshold_ms"</code>. If the gaps between any of the consecutive time intervals are equal to or less than <code class="dcode">"merge_threshold_ms"</code>, then the consecutive time intervals are merged into one.
For example, say the identified time intervals in the specified range are (0 ms, 500 ms), (500 ms, 550 ms), and (700 ms, 800 ms). If <code class="dcode">"duration_threshold_ms"</code> is set to 100 ms, then the second time interval (with a duration of 50 ms) is discarded, which leaves (0 ms, 500 ms) and (700 ms, 800 ms). If <code class="dcode">"merge_threshold_ms"</code> is set to 200 ms, then the two time intervals are merged into (0 ms, 800 ms) since the gap between the two is 200 ms.
Summary Statistics
The <code class="dcode">"stats"</code> method computes the summary statistics of a time series in the time intervals of interest. The table below lists the summary statistics that we support:
You must indicate which summary statistics you would like to compute in the parameters. Using the alias names, provide a list of summary statistics in the following format:
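An illustrative <code class="dcode">"parameters"</code> object for the <code class="dcode">"stats"</code> method (the metric names shown are examples of the aliases from the table above):

```python
parameters = {
    "metrics": ["mean", "percentile 50", "percentile 90"],
}
```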
Make sure to check out the section on running the time series data analysis on a group of labels if you wish to compute summary statistics of a time series over multiple time intervals.
Run time series data analysis by adding labels
To run a time series data analysis, simply add labels on the time intervals of interest with the <code class="dcode">"label_type"</code> set to <code class="dcode">"time-series-request"</code>. Additionally, specify the following information in the label's <code class="dcode">"data"</code> field:
- <code class="dcode">"method"</code>: the type of time series data analysis to run. See the section on time series data anslysis methods for more information.
- <code class="dcode">"time_series_key"</code>: the key of the time series you wish to analyze. See the section on retrieving the time series key for more information.
- <code class="dcode">"parameters""</code>: json dictionary of fields specific to the method of your choosing. See the section specific to the method for more information.
Example Request: Add a <code class="dcode">"time-series-request"</code> label with the <code class="dcode">"range"</code> method
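A sketch in Python of adding the label. The label-add route, body format, and time field names are assumptions; the <code class="dcode">"data"</code> fields follow from the descriptions above, and the threshold values are illustrative.

```python
import requests

API_TOKEN = "<your_api_token>"
SESSION_ID = "<session_id>"
# Placeholder path -- substitute the label-add route from the Label API docs.
URL = f"https://<api_host>/v0/sessions/{SESSION_ID}/<label_add_route>"

label = {
    "name": "values of 15 and below",
    "label_type": "time-series-request",
    "start_time": 0.0, "end_time": 120.0,   # time field names are assumptions
    "data": {
        "time_series_key": "<time series key from the session>",
        "method": "range",
        "parameters": {
            "upper_limit": 15,
            "include_upper_limit": True,
            "duration_threshold_ms": 100,    # drop intervals shorter than 100 ms
            "merge_threshold_ms": 200,       # merge intervals separated by <= 200 ms
        },
    },
}
resp = requests.post(URL, headers={"Authorization": f"Bearer {API_TOKEN}"},
                     json={"labels": [label]})
print(resp.json())
```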
Example Response
Run time series data analysis on a group of labels
For time series data analyses such as computing summary statistics, you may wish to run the analysis on multiple merged time intervals instead of being limited to just one time interval. For instance, instead of separately computing two mean values of a time series from 0 to 5 seconds and then from 20 to 30 seconds, you would like to compute one mean value over the merged time series obtained by combining one slice from 0 to 5 seconds with a second slice from 20 to 30 seconds. You can indicate that the time series data analysis should run on multiple time intervals by grouping related <code class="dcode">"time-series-request"</code> labels (i.e., assigning the same label group ID to the labels). As an example, follow the steps below to compute the summary statistics of a time series over multiple (merged) time intervals:
[Step 1] Add one "time-series-request" label with the "stats" method:
Example Request
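A minimal sketch of Step 1 in Python; the label-add route, body format, and time field names are assumptions, and the metrics shown are illustrative.

```python
import requests

API_TOKEN = "<your_api_token>"
SESSION_ID = "<session_id>"
# Placeholder path -- substitute the label-add route from the Label API docs.
URL = f"https://<api_host>/v0/sessions/{SESSION_ID}/<label_add_route>"

label = {
    "name": "stats: first interval",
    "label_type": "time-series-request",
    "start_time": 0.0, "end_time": 5.0,       # time field names are assumptions
    "data": {
        "time_series_key": "<time series key from the session>",
        "method": "stats",
        "parameters": {"metrics": ["mean", "percentile 50"]},
    },
}
resp = requests.post(URL, headers={"Authorization": f"Bearer {API_TOKEN}"},
                     json={"labels": [label]})
print(resp.json())  # note the new label's ID for the next step
```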
Example Response
[Step 2] Look up the label you just created with its label ID to obtain its label group ID:
Example Request
Example Response
[Step 3] Add more <code class="dcode">"time-series-request"</code> label(s) with the <code class="dcode">"stats"</code> method as well as the <code class="dcode">"label_group_id"</code> value of the first label:
Example Request
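A sketch of Step 3 in Python, continuing the example above; the label-add route, body format, and time field names are assumptions, and the metrics are illustrative.

```python
import requests

API_TOKEN = "<your_api_token>"
SESSION_ID = "<session_id>"
LABEL_GROUP_ID = "<label_group_id obtained in Step 2>"
# Placeholder path -- substitute the label-add route from the Label API docs.
URL = f"https://<api_host>/v0/sessions/{SESSION_ID}/<label_add_route>"

label = {
    "name": "stats: second interval",
    "label_type": "time-series-request",
    "label_group_id": LABEL_GROUP_ID,          # groups this label with the first
    "start_time": 20.0, "end_time": 30.0,      # time field names are assumptions
    "data": {
        "time_series_key": "<same time series key as the first label>",
        "method": "stats",
        "parameters": {"metrics": ["percentile 25", "percentile 75", "percentile 90"]},
    },
}
resp = requests.post(URL, headers={"Authorization": f"Bearer {API_TOKEN}"},
                     json={"labels": [label]})
print(resp.json())
```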
Notice that each label in the label group specifies different <code class="dcode">"metrics"</code>. The union of all the <code class="dcode">"metrics"</code> specified in the label group will be computed, so for this label group, <code class="dcode">["mean", "percentile 25", "percentile 50", "percentile 75", "percentile 90"]</code> will be computed.
Example Response
[Optional Step 4] To retrieve the analysis results, see the section on retrieving the computed summary statistics for the label group id.
Retrieve time series data analysis results by fetching labels
If you used a method that outputs analysis results as labels (see the time series data analysis methods table), then you can check the session Waterfall UI to see if any <code class="dcode">"time-series-result"</code> labels have been created.
You can also look up the labels with a label group ID since a <code class="dcode">"time-series-request"</code> label and its resulting <code class="dcode">"time-series-result"</code> labels are <code class="dcode">"linked"</code> by the same label group ID.
Example Request: Look up a <code class="dcode">"time-series-request"</code> label
First, look up the "time-series-request" label with its label ID to obtain its label group ID:
Example Response
Example Request: Get <code class="dcode">"time-series-result"</code> labels by looking up all labels associated with a label group
Using the <code class="dcode">"label_group_id"</code> value from the previous step:
Example Response
Retrieve time series data analysis results via method-specific API
Method: Summary Statistics
For the summary statistics analysis of time series data, the analysis results do not get output as labels (see the table of the time series data analysis methods). You can instead query the computed summary statistics for a session and optionally a label group.
Example Request: Get the computed summary statistics for a label group
Example Response