This is the second in a two-part series on interacting with screen regions that match a certain image in Appium. If you haven't read Part 1 yet, do that first!
In this edition, we'll take a look at some of the advanced usage techniques that go along with Appium's "find by image" feature, made available as the -image locator strategy. Why do we need any "advanced" techniques? The trouble is that image recognition is a bit complicated, and lots of things can get in the way of a successful image match.
For example, you could send in a reference image that's larger than the device screenshot. This would cause the image match algorithm (implemented in the OpenCV library) to blow up: how can you look for a reference image inside a screenshot if it's bigger than the screenshot? Or what if the element you found via image has changed position between the time you found it and the time you initiated a tap command on it? You'd end up tapping a screen coordinate which no longer forms part of your element!
In each of these cases, Appium can run some logic to help you out (for example, downscaling reference images for you, or automatically re-finding an image element when you attempt to tap it, so that its position is updated if necessary). Appium could just do this on its own, delivering the most robust and reliable image finding experience without you being any the wiser about its magic. The problem is that each of these "fixes" requires Appium to spend potentially a considerable amount of time working or talking to the automation engine. In the interests of speed, only basic image element finding functionality is turned on by default. So let's look at the options available to us if, for whatever reason, that basic functionality is not enough.
Each of these options is made available through the Appium Settings API. This is an API only available in Appium (not Selenium) which enables the toggling or resetting of capabilities mid-test. It's used for the same type of parameters as Desired Capabilities, except it allows the Appium client to update the settings at any time and as many times as you like. In the Java client, the Settings API is hidden behind the HasSettings interface. If we have an AndroidDriver or IOSDriver object, it will implement this interface and expose the setSetting method. (If we have an AppiumDriver, on the other hand, we'll need to typecast to HasSettings first.)
Usage is extremely simple:
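With the Java client, a call presumably has the shape `driver.setSetting(Setting.IMAGE_MATCH_THRESHOLD, 0.4)`. Since that needs the Appium client and a live session, here is a dependency-free sketch of the same name/value idea. The `SettingsSketch` class and its `ImageSetting` enum are hypothetical stand-ins, and the wire names are an assumption about how Appium names its image-element settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for the Settings API, not the Appium client: it only models
// "a named setting plus a value, changeable at any point in the session".
final class SettingsSketch {

    // Hypothetical local enum. The wire names are assumed to match the
    // names Appium uses for its image-element settings.
    enum ImageSetting {
        IMAGE_MATCH_THRESHOLD("imageMatchThreshold"),
        IMAGE_ELEMENT_TAP_STRATEGY("imageElementTapStrategy"),
        UPDATE_IMAGE_ELEMENT_POSITION("autoUpdateImageElementPosition");

        private final String wireName;

        ImageSetting(String wireName) {
            this.wireName = wireName;
        }

        String wireName() {
            return wireName;
        }
    }

    private final Map<String, Object> current = new LinkedHashMap<>();

    // Same shape as setSetting(name, value): a later call overwrites an earlier one.
    SettingsSketch setSetting(ImageSetting setting, Object value) {
        current.put(setting.wireName(), value);
        return this;
    }

    // Copy of the settings as they stand right now.
    Map<String, Object> snapshot() {
        return new LinkedHashMap<>(current);
    }

    public static void main(String[] args) {
        SettingsSketch session = new SettingsSketch();
        session.setSetting(ImageSetting.IMAGE_MATCH_THRESHOLD, 0.4);
        // ...later in the same test, loosen the threshold for one tricky find
        session.setSetting(ImageSetting.IMAGE_MATCH_THRESHOLD, 0.2);
        System.out.println(session.snapshot());
    }
}
```

The second call simply replaces the first value, which is the property that makes settings useful mid-test.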
Basically, we provide a setting name (actually, an element of the Setting enum), and a value. Let's take a look at how we can use this to modify image element finding.
Change the image match threshold
With OpenCV, image matching is not a binary outcome. Instead, there are degrees of match, on a scale between 0 and 1. The scale itself is arbitrary, but 1 represents a pixel-for-pixel perfect match, and 0 represents no similarity whatsoever. Based on a little bit of experimentation, Appium's default match threshold has been set to 0.4. This means that, by default, any match attempt which results in a similarity measure of less than 0.4 will be rejected.
For whatever reason, you might wind up in a situation where 0.4 is too stringent (not letting you find your element), or not stringent enough (giving you matches even when your element is not present). You can adjust it using the IMAGE_MATCH_THRESHOLD setting:
How do you know what value to put? You'll have to experiment, because the scale itself is arbitrary (and non-linear besides). Luckily you can change the threshold for different image element find operations using the Settings API as shown above.
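With the Java client, the threshold would presumably be changed with something like `driver.setSetting(Setting.IMAGE_MATCH_THRESHOLD, 0.2)`. To get a feel for how a threshold turns a degree of match into a yes/no answer, here is a small dependency-free sketch. It uses a deliberately simple similarity measure (one minus the mean absolute pixel difference) on grayscale values, which is an assumption made for illustration and not the measure OpenCV computes. `ThresholdSketch` and its methods are hypothetical names.

```java
final class ThresholdSketch {

    // Similarity in [0, 1] between the template and the screen window whose
    // top-left corner is (top, left). 1.0 means every pixel is identical.
    static double similarity(int[][] screen, int[][] template, int top, int left) {
        long diff = 0;
        int count = 0;
        for (int y = 0; y < template.length; y++) {
            for (int x = 0; x < template[0].length; x++) {
                diff += Math.abs(screen[top + y][left + x] - template[y][x]);
                count++;
            }
        }
        return 1.0 - (diff / (255.0 * count));
    }

    // Slides the template over every position on the screen and keeps the best score.
    static double bestScore(int[][] screen, int[][] template) {
        double best = 0.0;
        for (int top = 0; top + template.length <= screen.length; top++) {
            for (int left = 0; left + template[0].length <= screen[0].length; left++) {
                best = Math.max(best, similarity(screen, template, top, left));
            }
        }
        return best;
    }

    // A match is accepted only when the best score reaches the threshold.
    static boolean isMatch(int[][] screen, int[][] template, double threshold) {
        return bestScore(screen, template) >= threshold;
    }

    public static void main(String[] args) {
        int[][] screen = {{0, 0, 0}, {0, 255, 255}, {0, 255, 255}};
        int[][] faded = {{128, 128}, {128, 128}}; // a washed-out copy of the bright square
        System.out.println(isMatch(screen, faded, 0.4));
        System.out.println(isMatch(screen, faded, 0.6));
    }
}
```

In this toy case the faded template scores roughly 0.5 against the bright square, so it is accepted at a threshold of 0.4 and rejected at 0.6. Real screenshots behave less predictably, which is why the right value has to be found by trial.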
Change the image element tap strategy
Once you've found an image element and called element.click() on it, how does Appium know how to tap the element? Without magic, unfortunately. Appium simply takes the bounds of the matched screen region and performs a tap on the center coordinate of those bounds. Of course, Appium has a couple of different ways to tap at a point, for example using the W3C Actions API or the older Touch Actions API (see the beginning of this Appium Pro edition for an explanation of the history behind these two APIs).
If Appium's default tap strategy (using W3C Actions) does not work for you (say because you're using a driver which has not been updated to support the W3C Actions API), you can always fall back to the older API, using the IMAGE_ELEMENT_TAP_STRATEGY setting:
(The two valid options are "w3cActions" and "touchActions".)
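With the Java client, switching strategies is presumably a call like `driver.setSetting(Setting.IMAGE_ELEMENT_TAP_STRATEGY, "touchActions")`. Whichever strategy is chosen, the tap target is the center of the matched bounds. The sketch below shows that arithmetic and a check on the two option strings. `TapPointSketch` is a hypothetical helper, not part of Appium, and the integer rounding is an assumption.

```java
final class TapPointSketch {

    // Center of a matched region, given its top-left corner and its size.
    // Integer division rounds down for odd widths and heights.
    static int[] center(int x, int y, int width, int height) {
        return new int[] {x + width / 2, y + height / 2};
    }

    // Only the two strategy names listed above are accepted.
    static boolean isValidTapStrategy(String strategy) {
        return "w3cActions".equals(strategy) || "touchActions".equals(strategy);
    }

    public static void main(String[] args) {
        // A 50x20 match whose top-left corner is at (100, 200)
        int[] tap = center(100, 200, 50, 20);
        System.out.println(tap[0] + "," + tap[1]);
        System.out.println(isValidTapStrategy("touchActions"));
    }
}
```

For a 50x20 region at (100, 200), the tap lands at (125, 210).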
Fix screenshot and device size mismatches
Appium's image match algorithms operate on two images: a base image (the one we will attempt to locate the element within), and a reference image or template (the one corresponding to the element we are trying to find). You don't have to worry about the base image: Appium just uses a screenshot from your device, since it represents what's happening at the moment on the screen. But what if the base image (screenshot) and the screen itself don't have the same dimensions? Then the match coordinates returned by the match algorithm would correspond to the screenshot, but not the device. Unfortunately the device is where the eventual tap happens, which means the tap would be happening somewhere other than intended.
Why would a screenshot not match the dimensions of the screen itself? For a variety of reasons, not least because of the way pixel scaling is handled across platforms. iOS, for example, will say that a screen is 375 pixels wide ("logical" width), then happily generate a screenshot which is 750 pixels wide ("Retina" width)!
Getting these dimensions matched correctly is so important that Appium does it by default. If, however, you'd prefer not to have Appium spend the CPU cycles making this happen, you can opt out:
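Assuming the Java client's `Setting` enum, opting out would look roughly like `driver.setSetting(Setting.FIX_IMAGE_FIND_SCREENSHOT_DIMENSIONS, false)`. To see what you would be giving up, the sketch below shows the arithmetic needed to turn a point found in a 750-pixel-wide screenshot into a point on a 375-pixel-wide screen. It assumes the mismatch is a uniform scale, and it is not a description of how Appium performs the correction internally. `ScreenshotScaleSketch` is a hypothetical name.

```java
final class ScreenshotScaleSketch {

    // Converts a point in screenshot pixels to logical screen coordinates,
    // assuming the screenshot is a uniformly scaled copy of the screen.
    static int[] toScreenCoords(int shotX, int shotY,
                                int shotWidth, int shotHeight,
                                int screenWidth, int screenHeight) {
        double scaleX = (double) screenWidth / shotWidth;
        double scaleY = (double) screenHeight / shotHeight;
        return new int[] {
            (int) Math.round(shotX * scaleX),
            (int) Math.round(shotY * scaleY)
        };
    }

    public static void main(String[] args) {
        // A 750x1334 screenshot of a 375x667 logical screen
        int[] tap = toScreenCoords(600, 1000, 750, 1334, 375, 667);
        System.out.println(tap[0] + "," + tap[1]);
    }
}
```

Here a match at (600, 1000) in the screenshot corresponds to (300, 500) on the screen. Without the correction, a tap at (600, 1000) would fall outside a 375-wide screen entirely.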
Fix the reference image size
As mentioned above, it's necessary for the reference image (template) to have a size smaller than the screenshot for the algorithm to work. Sometimes, when we're generating our reference images (say by snapping a manual screenshot of an element), we're not entirely sure of the dimensions of our template. It could be that the way we captured the template has resulted in its having dimensions larger than what Appium has determined the screenshot dimensions to be.
If you want to be sure that you can use your template image, no matter what size it happens to be, let Appium know it's OK to resize it for you with this setting:
By default, Appium doesn't do this for you because it could hide valuable feedback about potentially incorrect templates (not to mention that it takes some computational time and energy to do the resizing).
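Assuming the Java client's `Setting` enum, the setting is presumably enabled with `driver.setSetting(Setting.FIX_IMAGE_TEMPLATE_SIZE, true)`. One plausible way to do such a resize, sketched below, is to shrink the template just enough to fit inside the screenshot while keeping its aspect ratio. This is an illustration under that assumption, not a statement of what Appium does internally. `TemplateFitSketch` is a hypothetical name.

```java
final class TemplateFitSketch {

    // Returns {width, height} for a template that must fit within the
    // screenshot. Templates that already fit are returned unchanged.
    static int[] fitWithin(int tplWidth, int tplHeight, int maxWidth, int maxHeight) {
        if (tplWidth <= maxWidth && tplHeight <= maxHeight) {
            return new int[] {tplWidth, tplHeight};
        }
        // Use the tighter of the two ratios so both dimensions fit.
        double scale = Math.min((double) maxWidth / tplWidth,
                                (double) maxHeight / tplHeight);
        int width = Math.max(1, (int) Math.floor(tplWidth * scale));
        int height = Math.max(1, (int) Math.floor(tplHeight * scale));
        return new int[] {width, height};
    }

    public static void main(String[] args) {
        // An 800x400 template against a 375x667 screenshot
        int[] size = fitWithin(800, 400, 375, 667);
        System.out.println(size[0] + "x" + size[1]);
    }
}
```

An 800x400 template checked against a 375x667 screenshot comes out as 375x187. A resize this drastic usually means the template was captured at the wrong scale, which is the kind of feedback that automatic resizing can hide.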
Check for image element staleness
Something you might be familiar with from the Selenium world is the concept of a "stale element". This is an element which, in the time between its being found and its being interacted with, has somehow disappeared. In this case it's not possible to complete the desired interaction (tap, send keys, etc.), and so a StaleElementReferenceException is thrown.
The same kind of situation can happen with image elements. What if the element (represented by a set of coordinates returned by the image match algorithm) has gone away in between finding it and tapping on it? Because this would result in Appium tapping on coordinates that no longer represent your element, Appium attempts to verify the match again whenever a tap is requested. If the match is once again successful, and returns the same coordinates as before, then the tap will take place. If the match is not successful at all, or the coordinates differ, then a StaleElementReferenceException will be thrown instead.
If you'd prefer to get a bit of a performance improvement and do away with this safety check, you can always turn it off:
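Assuming the Java client's `Setting` enum, that would be roughly `driver.setSetting(Setting.CHECK_IMAGE_ELEMENT_STALENESS, false)`. The sketch below models the check described above and what skipping it means: with the check off, no second match is run and the original coordinates are used as-is. `StalenessSketch`, its exception class, and the `rematch` supplier are hypothetical stand-ins for the driver's behavior.

```java
import java.util.Arrays;
import java.util.function.Supplier;

final class StalenessSketch {

    // Local stand-in for the stale element error a real driver would raise.
    static final class StaleImageElementException extends RuntimeException {
        StaleImageElementException(String message) {
            super(message);
        }
    }

    // foundAt: {x, y} from the original find.
    // rematch: runs the image match again and returns null when nothing matches.
    static int[] pointToTap(int[] foundAt, Supplier<int[]> rematch, boolean checkStaleness) {
        if (!checkStaleness) {
            return foundAt; // check disabled: no second match, tap the old coordinates
        }
        int[] now = rematch.get();
        if (now == null || !Arrays.equals(foundAt, now)) {
            throw new StaleImageElementException(
                    "image element no longer at " + Arrays.toString(foundAt));
        }
        return now;
    }

    public static void main(String[] args) {
        int[] foundAt = {10, 20};
        // The element has moved, but the check is off, so the old point is used.
        int[] point = pointToTap(foundAt, () -> new int[] {10, 40}, false);
        System.out.println(Arrays.toString(point));
    }
}
```

The saving comes from never invoking the second match. The cost is that a moved or vanished element goes unnoticed and the tap lands on whatever now occupies the old coordinates.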
Refresh elements automatically
By default, Appium will let you know with an exception if an image element is stale. But what if the element has not disappeared, only changed position? Maybe it's OK to simply tap on it at its new location without further ado. If you'd prefer to have Appium automatically determine a new match location for image elements when you request a tap, you can use the UPDATE_IMAGE_ELEMENT_POSITION setting:
This is set to false by default, to preserve the usual behavior where you're notified if something has changed in between a find and a tap.
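With the Java client, turning this on is presumably `driver.setSetting(Setting.UPDATE_IMAGE_ELEMENT_POSITION, true)`. The sketch below models the difference the flag makes once the match has been repeated at tap time. `AutoUpdateSketch` is a hypothetical name, and `IllegalStateException` stands in for the stale element error a real driver would raise.

```java
import java.util.Arrays;

final class AutoUpdateSketch {

    // foundAt: {x, y} from the original find.
    // rematched: {x, y} from the match repeated at tap time, or null if the
    // image is no longer on screen.
    static int[] pointToTap(int[] foundAt, int[] rematched, boolean autoUpdatePosition) {
        if (rematched == null) {
            throw new IllegalStateException("image element is gone");
        }
        if (Arrays.equals(foundAt, rematched)) {
            return foundAt; // nothing changed, tap where the element was found
        }
        if (autoUpdatePosition) {
            return rematched; // element moved: follow it to the new location
        }
        throw new IllegalStateException(
                "image element moved to " + Arrays.toString(rematched));
    }

    public static void main(String[] args) {
        int[] foundAt = {10, 20};
        int[] movedTo = {10, 40};
        System.out.println(Arrays.toString(pointToTap(foundAt, movedTo, true)));
    }
}
```

With the flag on, a moved element is tapped at its new position. With it off, the same situation raises an error. An element that has disappeared entirely is an error either way.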