Appium Pro is normally all about... well, Appium! And other mobile testing related topics. However, in this post we're going to discuss an exciting development in AI in the world of Selenium, Appium's web-based forebear. Read on--I think you'll get something out of this even if you're focused purely on mobile testing.
For some time Appium has been experimenting with AI/ML approaches to augmenting mobile test automation. In addition to its visual testing capabilities, there is a special plugin for finding elements using ML models (even when all you have is a screenshot). Part of what makes these features possible is Appium's support for plugins that integrate with these other projects.
I've often wondered how we can do the same thing with Selenium. Unfortunately, Selenium's architecture is not quite set up for third parties to write plugins that take advantage of behavior in the Selenium server itself. That doesn't stop us from writing client-side "plugins" that have access to the driver object, though!
AI-based element finding for Selenium
How do we create a client-side plugin for Selenium? Basically, by putting together a library which takes an existing Selenium session (a driver object) and uses it for its own purposes. In our case, this special library will have access to the Test.ai classification model that already exists as part of the Test.ai + Appium classifier plugin. This plugin was originally developed to give Appium users access to the classification model via the -custom locator strategy. The advantage of this approach was precisely that it was the Appium server being augmented--all the work could be done in one language (Node.js) and made available to every client library with minimal modifications.
In the case of Selenium, the equivalent work would have needed to be done as an extension to each client library. That was way too much work! So instead, we extended the capabilities of the existing Appium classifier plugin, so that it could also act as a classification server. This approach (very much akin to the client/server architecture of Selenium and Appium themselves) keeps the heavy lifting in one place and allows very thin clients to be written in every language. The only downside is that you have to make sure to have the classifier server up and running.
The Classifier server
If you already have the test-ai-classifier package installed via NPM, no extra install steps are necessary. Otherwise, run npm install -g test-ai-classifier. Then, starting the server is quite simple:
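As a sketch, assuming the globally installed package exposes a command-line binary named after the package itself (check the project's README if it differs):

```bash
# Start the classifier server with its defaults
test-ai-classifier
```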
With no arguments, the server will start up on localhost, port 50051 (the default for gRPC-based services). Of course, you can always pass in -h and -p flags with custom host and port information (using 0.0.0.0 for host if it's important to listen on all interfaces).
The Classifier client
Once you've got the server running, you need to decide which client to use. There are four available:
- Java client
- Python client
- Node client
- Ruby client
We'll use the Java client for our purposes. To get it included in your Gradle-based Java project, the easiest thing to do is use Jitpack and add a directive like the following, which pulls the client down from GitHub:
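Here's roughly what that looks like in build.gradle. The Jitpack repository URL is standard, but treat the group/artifact coordinates and version shown here as placeholders to be checked against the client's GitHub page:

```groovy
repositories {
    // Jitpack builds and serves artifacts directly from GitHub
    maven { url 'https://jitpack.io' }
}

dependencies {
    // Placeholder coordinates -- substitute the actual GitHub org/repo and release tag
    testImplementation 'com.github.testdotai:classifier-client-java:1.0.0'
}
```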
There are a few different ways to use the client, including the ability to pass image data to it directly, outside of the context of Appium, Selenium, or anything else. Whichever way you use it, the first thing we need to do is instantiate the client:
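A minimal sketch of that instantiation, pointing at a locally running classifier server on the default port (the class name here is an assumption; check the client library's README for the exact name):

```java
// Hypothetical class name for the Java classifier client
Classifier classifier = new Classifier("localhost", 50051);
```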
The only parameters are the expected host and port values. The method of most interest to us here is findElementsMatchingLabel, which takes two parameters: a driver object and a string representing the label for which we want to find matching elements. Have a look at this example:
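This sketch assumes the client was instantiated as classifier above; the "twitter" label value is an assumption based on the scenario described next:

```java
// Ask the classifier for any elements on the current page that look like a Twitter logo
List<WebElement> elements = classifier.findElementsMatchingLabel(driver, "twitter");
elements.get(0).click();
```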
In this case, we're looking for any elements that look like a Twitter logo. Notice that the return value of this method is exactly what you'd expect--a list of standard WebElement objects. You can click them, get their attributes, and anything else you'd be able to do with a regular element.
How does all this magic work? Well, the Classifier client runs a special XPath query to find every leaf-node element, and then directs the browser to take a screenshot of each of those elements individually. These screenshots give the client all the image data it needs to send over to the Classifier server, which sends back information about the strength of any matches. The client can then map those results to the elements it found via XPath, filter out any that don't match the requested label, and return the rest to you!
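To make that flow concrete, here's a rough illustration of the client-side portion in plain Selenium terms (assuming the usual Selenium imports). This is a sketch of the behavior just described, not the plugin's actual source, and the leaf-node XPath in particular is an assumption:

```java
// Rough illustration of the client-side flow described above (not the real implementation)
List<WebElement> leaves = driver.findElements(By.xpath("//body//*[not(child::*)]"));
Map<WebElement, byte[]> screenshots = new LinkedHashMap<>();
for (WebElement leaf : leaves) {
    try {
        // Requires the driver to support the "take element screenshot" command
        byte[] png = ((TakesScreenshot) leaf).getScreenshotAs(OutputType.BYTES);
        screenshots.put(leaf, png);
    } catch (WebDriverException e) {
        // Zero-size or otherwise non-capturable elements are simply skipped
    }
}
// The image data is then sent to the classifier server; elements whose classification
// matches the requested label come back as the final result list.
```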
What this does mean is that any browser driver you use will need to support the "take element screenshot" command. In my experimentation, only Chrome was reliable enough to not fail in weird ways when asked to take screenshots of so many elements. This API is relatively new, so I expect we'll see better reliability from Safari and Firefox (the only two other browsers I tried) soon enough. At any rate, take a look at the full code sample below, which demonstrates how we can load up a webpage, find an icon using only its semantic label, and then interact with it:
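Here's a sketch of what that full sample might look like. The target URL and the label string are illustrative assumptions, and Classifier is the hypothetical client class name used earlier, so check the client's README before copying this verbatim:

```java
import java.util.List;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class FindElementByLabel {

    public static void main(String[] args) {
        // Chrome was the most reliable driver for per-element screenshots in my testing
        WebDriver driver = new ChromeDriver();
        try {
            // Hypothetical client class name; host and port must match the running
            // classifier server (defaults shown)
            Classifier classifier = new Classifier("localhost", 50051);

            // Load a page that has a Twitter icon somewhere on it (illustrative URL)
            driver.get("https://example.com");

            // Find any elements that look like a Twitter logo, using only that semantic label
            List<WebElement> matches = classifier.findElementsMatchingLabel(driver, "twitter");

            if (matches.isEmpty()) {
                throw new RuntimeException("No elements matched the 'twitter' label");
            }

            // The results are ordinary WebElements, so we can interact with them directly
            matches.get(0).click();
        } finally {
            driver.quit();
        }
    }
}
```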