AI is more than just a buzzword now - it's becoming an integral part of various processes, including software testing. But how effective is it really, especially when applied to the dynamic nature of Exploratory Testing?
In this webinar, Sérgio Freire shared how he recently experimented with bringing AI into his own Exploratory Testing sessions and discovered both promising applications and significant limitations. Here's a deep dive into his journey, along with practical insights on using AI to enhance (or not) your testing efforts.
The experiment: integrating AI into exploratory testing
The goal was to see how AI could support and streamline exploratory testing - a testing approach where testers simultaneously design and execute tests. Exploratory testing emphasizes freedom and creativity, which are important for uncovering unexpected issues.
The setup: combining AI tools
Sérgio decided to test an AI-assisted workflow using a combination of tools:
- PDF AI Assistant: to summarize the test session reports stored in PDF format;
- OpenAI Whisper speech recognition: to convert audio notes taken during testing into text;
- Custom Python scripts: these scripts integrated various data sources (like notes, screenshots, and PDFs) to create a comprehensive summary of each testing session using OpenAI’s GPT-4 API.
The plan was to process data from multiple sources:
- Audio notes were transcribed into text;
- Screenshots were analyzed for context;
- PDF reports were parsed using regular expressions to extract session metadata.
All of this data was fed into a large language model (LLM) to generate high-level summaries of the testing session, offering a snapshot of what happened, the problems identified, and key takeaways.
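As an illustration, here is a minimal Python sketch of what such a pipeline could look like, assuming the OpenAI Python SDK for the Whisper transcription and the GPT-4 summary, and pypdf for reading the session report. The file names, regular expressions, and prompt wording are hypothetical, and the screenshot analysis step is omitted for brevity - this is not Sérgio's actual script.

```python
# Hypothetical pipeline: transcribe audio notes, parse the PDF session report,
# and ask GPT-4 for a high-level summary. Paths, regexes, and prompts are examples.
import re
from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def transcribe_audio(path: str) -> str:
    """Convert an audio note into text with the Whisper API."""
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
    return result.text


def extract_session_metadata(pdf_path: str) -> dict:
    """Pull session metadata out of the exported PDF report with simple regexes."""
    text = "".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    # Illustrative patterns; the real report layout would dictate these.
    charter = re.search(r"Charter:\s*(.+)", text)
    tester = re.search(r"Tester:\s*(.+)", text)
    return {
        "charter": charter.group(1).strip() if charter else "unknown",
        "tester": tester.group(1).strip() if tester else "unknown",
        "raw_text": text,
    }


def summarize_session(transcript: str, metadata: dict) -> str:
    """Ask GPT-4 for a high-level summary of the session."""
    prompt = (
        f"Summarize this exploratory testing session.\n"
        f"Charter: {metadata['charter']}\nTester: {metadata['tester']}\n"
        f"Audio notes transcript:\n{transcript}\n"
        f"Session report text:\n{metadata['raw_text'][:4000]}\n"  # stay within the context window
        "Return: what happened, problems found, and key takeaways."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    notes = transcribe_audio("session_notes.m4a")
    meta = extract_session_metadata("session_report.pdf")
    print(summarize_session(notes, meta))
```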
But what are LLMs?
Artificial Intelligence is a broad field encompassing various technologies designed to simulate human intelligence. Within this domain, Large Language Models (LLMs) like GPT (Generative Pre-trained Transformer) are specialized AI models focused on understanding and generating human-like text.
Key capabilities of LLMs include:
- Content creation: generating articles, code snippets, or even test cases;
- Summarization: extracting key information from documents;
- Translation and rephrasing: converting content across languages and formats;
- Multimodal interaction: integrating text, audio, images, and videos to respond to complex queries.
LLMs can be incredibly powerful, but they also have limitations, such as the potential for incorrect or biased outputs, inconsistencies in responses, limited context size, and retraining needs.
They operate on probabilities, predicting the next word based on previous inputs. This also means they can sometimes "hallucinate": produce plausible but incorrect information. That is why careful human oversight is needed whenever LLMs are used.
During this phase, Sérgio also introduced the concept of the Learning Lollipop, a model used during testing to cycle through questioning, test design, execution, and analysis. AI can support this model (see the sketch after the list) by:
- Questioning: Generating questions to explore ambiguous requirements;
- Designing: Providing potential test cases based on identified risks;
- Analyzing: Highlighting unexpected results or anomalies during test execution.
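As a rough illustration, the sketch below shows how each phase of that loop might be prompted through the GPT-4 API. The ask() helper, the user story, and the session notes are made-up placeholders; only the idea of phase-specific prompts comes from the talk.

```python
# Hypothetical prompts for the questioning / designing / analyzing phases.
# The ask() helper, the user story, and the notes are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(prompt: str) -> str:
    """Send a single prompt to GPT-4 and return its reply."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


user_story = "As a user, I want to reset my password so that I can regain access."
session_notes = "Login accepted a 300-character password; the page froze after the third attempt."

# Questioning: surface ambiguities before testing starts.
questions = ask(f"List clarifying questions about ambiguities in this user story:\n{user_story}")
# Designing: turn the identified risks into exploratory test charters.
charters = ask(f"Based on these open questions, propose exploratory test charters:\n{questions}")
# Analyzing: flag anything unexpected in the notes taken during execution.
analysis = ask(f"Highlight unexpected results or anomalies in these session notes:\n{session_notes}")

print(questions, charters, analysis, sep="\n\n")
```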
The findings: what worked, what didn't
The output was mixed: while AI showed potential in some areas, there were inconsistencies that underscored the need for human oversight.
1. Summarizing test sessions
Sérgio used AI to generate summaries of his testing sessions, hoping it would provide a quick overview of what transpired. Interestingly, the summaries varied significantly, even when the input data was similar. For instance:
- Three different testers approached the same application using distinct styles - some relied more on text notes, others on screenshots, and some on audio recordings;
- The AI-generated summaries differed in quality, depending on the format and style of the input. A minor tweak in the input (e.g., changing a few words in the prompt) could drastically alter the summary, leading to inconsistencies;
- Lesson learned: While the AI-generated summaries were useful for a high-level overview, they weren't reliable enough for making critical decisions without a thorough review.
2. Drafting bug reports
The idea was to turn the AI summaries into structured bug reports, which could potentially save time on documentation (a sketch of this step follows below):
- AI struggled with consistency in generating bug reports. Some outputs were detailed and accurate, while others missed critical context or produced irrelevant information;
- The quality of the bug reports varied greatly depending on the input style and detail level of the session notes;
- Conclusion: AI can assist in drafting bug reports, but it still requires human validation and refinement to ensure accuracy.
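A minimal sketch of such a drafting step, assuming a plain-text session summary as input and a simple JSON structure for the draft; the field names and prompt wording are illustrative assumptions, not the format used in the experiment.

```python
# Hypothetical drafting step: turn a session summary into a structured bug report draft.
# The JSON fields and prompt wording are assumptions; the output still needs human review.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def draft_bug_report(session_summary: str) -> dict:
    """Ask the model for a bug report draft and parse it as JSON."""
    prompt = (
        "From the following exploratory testing session summary, draft one bug report as "
        "plain JSON with the keys: title, steps_to_reproduce, expected_result, actual_result. "
        "Return only the JSON object.\n\n" + session_summary
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model returned valid JSON; real code would need error handling here.
    return json.loads(response.choices[0].message.content)


draft = draft_bug_report("Checkout froze when an expired discount code was applied twice.")
print(json.dumps(draft, indent=2))  # review and refine the draft before filing the bug
```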
3. Assessing the value of AI in Exploratory Testing
The biggest takeaway from this experiment is that while AI can enhance some aspects of Exploratory Testing, it’s not a silver bullet:
- Ambiguity: AI struggles with ambiguous descriptions - a common challenge in Exploratory Testing. This is particularly evident when processing user stories or vague requirements. For instance, AI might interpret the term "box" differently depending on the context, leading to misaligned expectations;
- Generative AI for test cases: the output often needs refining to fit the specific context of your application. The generated steps can be inaccurate or irrelevant for highly detailed test cases because the AI lacks deep knowledge of the system under test;
- Bug report automation: AI-generated bug reports may not always align with the actual findings from the testing session, necessitating a thorough review to ensure their validity.
You can use the Xray Exploratory App for session-based testing:
- Capturing screenshots, notes, audio, and videos during a session.
- Exporting a session summary (PDF format) with rich details, including annotated screenshots and timestamped notes.
Real-world use cases for AI in testing
Use case 1: enhancing requirements analysis
One of the biggest challenges in software development is dealing with ambiguous or incomplete requirements. AI can help (see the sketch after this list) by:
- Generating clarifying questions using heuristics like "What", "Where", "Why", "When", "Who", and "How";
- Example prompt: "Analyze this user story and identify potential ambiguities around security and accessibility."
This technique can surface hidden assumptions, reduce misunderstandings, and ultimately lead to higher-quality software.
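Here is a minimal sketch of that kind of prompt, built around the What/Where/Why/When/Who/How heuristic. The user story, focus areas, and wording are illustrative assumptions.

```python
# Hypothetical requirements-analysis prompt built on the What/Where/Why/When/Who/How heuristic.
# The user story, focus areas, and wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

HEURISTIC = ["What", "Where", "Why", "When", "Who", "How"]


def clarify_requirement(user_story: str, focus: str = "security and accessibility") -> str:
    """Generate clarifying questions that expose ambiguities in a user story."""
    prompt = (
        f"Analyze this user story and identify potential ambiguities around {focus}. "
        f"Phrase each ambiguity as a question starting with one of: {', '.join(HEURISTIC)}.\n\n"
        f"{user_story}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


story = "As a customer, I want to log in with my email so that I can see my orders."
print(clarify_requirement(story))
```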
Use case 2: generating test ideas and charters
New team members or testers may struggle with generating diverse test ideas, especially under tight deadlines. AI can be a valuable partner here:
- Prompt: "List test Charters focusing on security and accessibility for a login feature."
- Output: AI provides actionable Charters like exploring case sensitivity, handling special characters, and ensuring accessibility compliance.
Use case 3: automating documentation
Documentation can be tedious, but AI simplifies this by drafting:
- Bug reports with clear steps to reproduce;
- User stories or scenarios for behavior-driven development;
- Summaries of Exploratory Testing sessions for stakeholder reviews.
Key takeaways: AI in Exploratory Testing
From Sérgio’s experience, here are some key considerations if you’re thinking about using AI to support your Exploratory Testing efforts:
- Be critical and cautious: AI tools can be great for brainstorming or generating initial drafts, but they should not be blindly trusted. Always review the AI outputs critically;
- Human oversight is essential: AI can assist in summarizing sessions or drafting reports, but human expertise remains crucial to validate and refine the results;
- Tailor AI usage to specific needs: rather than applying AI broadly, focus on specific use cases where it adds value. For example, using AI to generate test case ideas can be a good starting point, but detailed, context-specific test cases still require a human touch.
The future of AI in testing
The integration of AI into Exploratory Testing isn't about replacing human testers but rather augmenting their capabilities. By leveraging AI for routine tasks, testers can focus on the critical thinking and creativity that drive quality software.
As AI technologies continue to evolve, we can expect even more sophisticated tools that empower testers to shift left, improving quality from the earliest stages of development.