GenAI Assertions are a great option for validating more complex aspects of your application, including text and visual details. However, since generative AI is non-deterministic in nature, it is important to write an effective assertion description that yields valid and reliable results.
This article offers best practices that you can apply when creating and testing out your GenAI Assertions:
- Write a clear and concise assertion description
- Include examples
- Test a few passing and failing scenarios
Mabl’s generative AI capabilities are built on top of Google Cloud’s enterprise AI tools. Neither mabl nor our service partner, Google Cloud, use customer data for training these models. If you have any concerns about the use of generative AI in mabl, please reach out to your customer success manager.
Write a clear and concise assertion description
To increase the likelihood that a GenAI Assertion behaves as you expect it to, make sure your assertion description is clear and concise. A good assertion description is sufficiently strict without being overly specific.
If your assertion description is too vague, the test can pass even when a regression or failure has occurred. For example, if you assert that "the page has loaded", the assertion could pass because the page loads without any significant errors, even if the page is actually missing components that are critical to its purpose, such as a submit button on a form.
On the other hand, too many specific criteria or complex combinations can be difficult to evaluate correctly. If necessary, break down a complex assertion with many different cases or criteria into multiple GenAI Assertions.
If you assert that a table is sorted, the models used for GenAI Assertions tend toward "strict" ascending or descending order. If the table contains duplicate values, the assertion will fail. In this situation, write a more explicit assertion description that specifies that the sort order can include duplicate values.
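As a sanity check on what you are asking the model to evaluate, the difference between strict ascending order and order that allows duplicates can be sketched in a few lines. The JavaScript below is purely illustrative and is not part of mabl:

```javascript
// A column with duplicate values: sorted, but not *strictly* ascending.
const values = [10, 20, 20, 30];

// Strict ascending order: every value must be greater than the previous one.
const strictlyAscending = values.every((v, i) => i === 0 || values[i - 1] < v);

// Non-decreasing order: equal neighbors (duplicates) are allowed.
const nonDecreasing = values.every((v, i) => i === 0 || values[i - 1] <= v);

console.log(strictlyAscending); // false - the duplicate 20s fail the strict check
console.log(nonDecreasing);     // true  - duplicates are acceptable here
```

An assertion description that means the second check ("the column is in ascending order, allowing duplicate values") avoids failures when the model applies the first.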
Include examples
Add examples to the end of the assertion description to ensure that the test focuses on the correct details:
- "For example, the assertion should fail in cases like …, …, or …. The assertion should pass in cases like …, …, or …"
- "like … or …"
- "such as … or …"
Test a few passing and failing scenarios
Since generative AI is non-deterministic in nature, the results of a GenAI Assertion can vary. To build a more reliable assertion, test a few passing and failing scenarios and iterate on the assertion description if the result isn't satisfactory.
Imagine you want to assert that a page is in Japanese. The following examples show how you can iterate on the assertion description based on the results:
| Assertion description | Result |
| --- | --- |
| "The page should be in Japanese." | "FAIL - While most of the page was in Japanese, some words, including 'New', 'FRESH', 'latest', and 'webhook', were in English." |
The assertion description didn't take exceptions into account, so you revise the assertion description to exclude the English words returned in the result:
| Assertion description | Result |
| --- | --- |
| "Verify that the page is in Japanese except for 'NEW', 'FRESH', 'latest', and 'webhook'." | "FAIL - While most of the page was in Japanese, there are other English acronyms on the page, such as 'URL'." |
With this feedback, you decide to generalize the exceptions to categories that are allowed to appear in English:
| Assertion description | Result |
| --- | --- |
| "Verify that the page is in Japanese and does not have text in English other than words that are brands, names, acronyms, and words with no common Japanese translation, and some small badges with short, common English words like 'NEW'." | "PASS - Excluding the exceptions, the page is in Japanese." |
Success! You decide to test the assertion on a page which has an untranslated "submit" button. The assertion should fail, but it doesn't. Based on the result from this failing scenario, you update the assertion description again with examples:
| Assertion description | Result |
| --- | --- |
| "Verify that the page is in Japanese and does not have any buttons, links, or significant chunks of text in English other than words that are brands, names, acronyms, and words with no common Japanese translation, and some small badges with short, common English words like 'NEW'. Some examples of allowed English that should pass the assertion: 'latest' with a different background color than the surrounding text would be considered a badge; 'Mt. Everest', 'John Doe', 'JavaScript', and 'Pepsi' would be names and brands that do not need translation; 'URL', 'API', 'E2E', and 'webhook' would be acronyms and words that do not need translation. Some examples of English that should fail the assertion: 'Violation history', 'Submit', 'Cancel', 'OK', 'name', and 'result' are all English words or phrases that should have been translated to Japanese because they do not meet the allowed English exception criteria." | "FAIL - Most of the page is in Japanese, and most English words on the page fit the assertion criteria. However, the page includes a button with the English word 'submit', which does not fit the assertion criteria." |
Success! The assertion fails, which was the result you wanted. Iterating on the assertion description and testing it in both passing and failing scenarios helps ensure that the assertion accomplishes your testing goals.
In browser tests, to simulate a failing scenario, you can use Chrome DevTools to manually update the page.
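For example, to exercise the failing scenario from the Japanese-language walkthrough above, you could paste a short snippet into the DevTools console that forces an untranslated English label onto the page's buttons. The snippet below is a minimal sketch; the 'Submit' label and the `button` selector are assumptions to adapt to your page:

```javascript
// Minimal sketch for the Chrome DevTools console: overwrite every button
// label with untranslated English text so that a "page is in Japanese"
// GenAI Assertion should now fail. The label "Submit" is an assumption.
function simulateUntranslatedButtons(doc) {
  let changed = 0;
  for (const btn of doc.querySelectorAll('button')) {
    btn.textContent = 'Submit'; // force an untranslated English label
    changed += 1;
  }
  return changed;
}

// In the DevTools console, run it against the live page:
if (typeof document !== 'undefined') {
  console.log(`Updated ${simulateUntranslatedButtons(document)} button(s)`);
}
```

Note that these DevTools edits only affect the current page in your own browser; reloading the page restores the original content.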