Much of automated UI testing relies on finding elements by text-based attributes and selectors, such as IDs and classes. However, sometimes these attributes aren’t enough to reliably and accurately interact with certain types of elements, such as canvas elements, PDFs, maps, or elements that lack unique attributes.
With visual find, you can leverage the power of AI to tap or click on a target element based on its visual characteristics instead of relying on text-based attributes in the DOM or page source.
Availability
Visual find is available for click steps in browser tests and tap steps in mobile tests:
- Click steps: visual find for click steps is now available in early access. To enable it for your workspace, visit the Labs page in the mabl app: Settings > Labs.
- Tap steps: visual find for mobile tap steps is already generally available.
This article explains how to train visual find steps that tap or click on a target area based on visual characteristics.
Add a new step
After launching your application in the mabl Trainer, get your application in the correct state and add a new step:
- Mobile tests: + (Add step) > Tap
- Browser tests: + (Add step) > Click
Select the target area
In the step configuration menu, “Visual area” is selected by default. Click and drag to draw a box over the area you want to target.
mabl automatically sends a screenshot of the target area to the model, which returns an AI-generated description of the selected area. mabl then performs the find using the GenAI description to confirm that it matches the selected area.
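As a rough mental model, the training flow can be sketched as follows. This is a conceptual sketch only, not mabl's internal API: the callbacks `describeImage` and `locateByDescription` are hypothetical names standing in for the GenAI calls that happen behind the scenes.

```typescript
// Conceptual sketch, not mabl's internal API. The two callbacks stand in for
// the GenAI model calls that happen behind the scenes during training.
interface Rect {
  x: number;      // left edge, in screenshot pixels
  y: number;      // top edge, in screenshot pixels
  width: number;
  height: number;
}

interface VisualFindStep {
  description: string;  // AI-generated description of the selected area
  trainingMatch: Rect;  // where the description matched during training
}

async function trainVisualFindStep(
  fullScreenshot: Uint8Array,    // screenshot of the current page or screen
  selectedAreaCrop: Uint8Array,  // crop of the box drawn in the Trainer
  // Assumed model calls, injected so the sketch stays self-contained:
  describeImage: (crop: Uint8Array) => Promise<string>,
  locateByDescription: (image: Uint8Array, description: string) => Promise<Rect | null>
): Promise<VisualFindStep> {
  // 1. The model describes the selected area.
  const description = await describeImage(selectedAreaCrop);

  // 2. The find is re-run with that description to confirm it still matches.
  const trainingMatch = await locateByDescription(fullScreenshot, description);
  if (trainingMatch === null) {
    throw new Error("Description did not match: reselect the area or edit the description.");
  }

  return { description, trainingMatch };
}
```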
For mobile tests, the image preview shows where mabl will perform the tap within the selected area on execution. If you need to target a precise location within the selected area, you can adjust the x-y coordinates.
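For illustration, the sketch below shows the arithmetic implied by adjustable x-y coordinates. It assumes, hypothetically, that the offsets are fractions of the selected area's width and height; the actual units in the Trainer may differ.

```typescript
// Conceptual sketch of the coordinate adjustment, not mabl's implementation.
// It assumes the x-y values are fractions of the selected area (0 to 1),
// with 0.5/0.5 meaning the center of the area.
interface Area { x: number; y: number; width: number; height: number; }
interface Point { x: number; y: number; }

function tapPointWithin(area: Area, offsetX = 0.5, offsetY = 0.5): Point {
  return {
    x: area.x + area.width * offsetX,
    y: area.y + area.height * offsetY,
  };
}

// Example: tap near the top-left of a 200x100 area anchored at (40, 60).
const point = tapPointWithin({ x: 40, y: 60, width: 200, height: 100 }, 0.1, 0.2);
// point is { x: 60, y: 80 }
```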
Review the description
If the description fails to highlight the correct element, reselect the area or manually edit the description and test the find again.
Even if the description highlights the correct element, make sure that it also captures your intent. Depending on what you’re trying to achieve, the model might generate a description that accurately identifies the element during training but doesn’t capture what you ultimately want to target.
To illustrate, consider an app with many grid elements.
The model generated the description “A turquoise rounded square tile with the text ‘Car 10’ in the center”, which highlights the correct element. However, if the intent is to target a tile at a specific position on the page, manually update the description to reflect that intent, for example, “The second square from the left, third row from the top”, and test it again.
Save the step
When you are satisfied with the step, click Save to add it to your test.
Running tests with visual find
On execution, mabl sends the GenAI description and a screenshot captured at run time to the model. The model returns matching bounding boxes, and mabl performs the tap or click action on the matched area.
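The run-time behavior can be sketched conceptually like this. Again, this is not mabl's runtime code: `locateByDescription` and `performAction` are hypothetical callbacks standing in for the model call and the device or browser action.

```typescript
// Conceptual sketch of the run-time flow, not mabl's runtime. The callbacks
// stand in for the model call and the tap or click that mabl performs.
interface Rect { x: number; y: number; width: number; height: number; }

async function runVisualFindStep(
  description: string,            // the saved GenAI description
  runtimeScreenshot: Uint8Array,  // screenshot captured at run time
  // Assumed collaborators, injected to keep the sketch self-contained:
  locateByDescription: (image: Uint8Array, description: string) => Promise<Rect[]>,
  performAction: (x: number, y: number) => Promise<void>  // tap or click
): Promise<void> {
  // 1. The model returns bounding boxes that match the description.
  const matches = await locateByDescription(runtimeScreenshot, description);
  if (matches.length === 0) {
    throw new Error(`No visual match found for: "${description}"`);
  }

  // 2. Act on the best match, here simply the first box, at its center.
  const target = matches[0];
  await performAction(target.x + target.width / 2, target.y + target.height / 2);
}
```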