

Adapting AI Models to Detect Visual UI Issues in Mobile Apps


Mobile is the preferred channel to interact with a brand for 78% of users, and 94% of that interaction happens through mobile apps over mobile browsers. This makes optimizing the mobile app experience critical to business survival. Yet maintaining visual consistency across different devices, operating systems, and screen sizes remains a massive challenge for mobile app development teams.

After nine months of development and countless test iterations, we’ve developed an approach that makes AI visual testing reliable. In this article, we’ll walk through three critical challenges in AI-powered visual testing and how we’ve overcome them at Instabug.

Why Traditional UI Testing Approaches Fail

Mobile teams are intimately familiar with the frustration of trying to maintain visual consistency across the fragmented device landscape. The traditional approaches to UI testing simply weren’t built for today’s mobile reality:

  • Manual inspection is too time-consuming and labor-intensive to be feasible for any app of decent size.
  • Snapshot-based regression testing is too flaky and error-prone, requiring significant and continuous maintenance effort.
  • User reports of UI issues are rare, arrive only after the damage is done, and seldom include enough information to debug.

UI inconsistencies like misaligned layouts, incorrect fonts, and missing elements might seem trivial and low-priority, but these papercut issues can quickly build up and cause enough frustration to drive away users. In fact, 73% of mobile users frequently or occasionally stop using apps due to visual and navigational issues alone.

Fortunately, the AI revolution has changed the rules of the game and is currently transforming mobile app testing. AI-powered visual testing promises to enable mobile teams to maintain a flawless UI with minimal manual intervention, promoting a seamless user experience.

The Challenges of Visual UI Testing With AI

AI models have become highly capable, but building an AI system that can reliably detect UI issues across the infinite variations of mobile apps wasn’t straightforward. Here are the three major challenges we had to overcome:

The fragmentation problem

Not only do different applications have unique design patterns and structures, but variations also exist within the same application across device sizes, layouts, and OS-specific behaviors. For example, an e-commerce app might display a cropped tile on one device but not on another, leading to a potential false positive: an intentional design choice flagged as an issue. Expected OS behaviors, such as system alerts, can likewise be mistakenly flagged as visual issues.

The false positive problem

Teaching the AI model to differentiate between real UI issues and intentional design decisions requires continuous annotation and training. Feedback loops, in which developers confirm or dismiss detected issues, let the model learn what to ignore and steadily refine its accuracy.

Given the vast number of possible UI inconsistencies, the AI model must be both structured and flexible in how it detects and communicates them. It needs to classify issues into pre-defined categories (such as color discrepancies) while retaining the flexibility to recognize and report unexpected issues that don't fit neatly into those labels.
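One way to model this "structured but flexible" taxonomy is a fixed set of known categories plus a catch-all that preserves the model's free-text description. This is a minimal sketch, not Instabug's actual schema; the category names and `classify` helper are illustrative:

```python
from dataclasses import dataclass
from enum import Enum

class IssueCategory(Enum):
    # Pre-defined categories the model is trained to recognize
    COLOR_DISCREPANCY = "color_discrepancy"
    LAYOUT_MISALIGNMENT = "layout_misalignment"
    FONT_INCONSISTENCY = "font_inconsistency"
    MISSING_ELEMENT = "missing_element"
    OTHER = "other"  # catch-all for issues outside the taxonomy

@dataclass
class VisualIssue:
    category: IssueCategory
    description: str  # free-text explanation from the model
    confidence: float

def classify(raw_label: str, description: str, confidence: float) -> VisualIssue:
    """Map a raw model label onto the taxonomy, falling back to OTHER
    so unexpected findings are still reported, not silently dropped."""
    try:
        category = IssueCategory(raw_label)
    except ValueError:
        category = IssueCategory.OTHER
    return VisualIssue(category, description, confidence)
```

The fallback is the important part: an unknown label degrades gracefully into `OTHER` with its description intact, rather than being discarded.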

The infinite variations problem

AI-detected visual issues, such as layout misalignment, font inconsistencies, color discrepancies, and broken UI elements, can appear in countless variations. Even when broadly categorized, these problems manifest in endless ways across different applications, devices, and operating systems. The same UI issue might be described differently depending on context, which adds complexity to how the AI communicates its findings. Natural language processing proved useful here, allowing the system to deliver meaningful, human-readable explanations.

Solutions and Lessons Learned

To fine-tune AI models, we needed to capture high-quality data, ensure responsible AI development, and balance the trade-offs between privacy and performance.

Data quality is critical

The reliability of AI-powered solutions in eliminating design inconsistencies in mobile apps depends heavily on the quality and classification of their data. Because of this, off-the-shelf large language models (LLMs) with broad training data were not sufficient. Instead, we needed to fine-tune purpose-built AI models for our specific use case, curating a dataset of screenshots from real-world apps and enriching it with both manual and synthetic labeling. This made a huge difference in the accuracy of the results.
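To illustrate the synthetic-labeling idea, one can generate defective variants of a known-good layout and label them automatically. This sketch works on element bounding boxes rather than pixels, and all names are hypothetical; a real pipeline would perturb actual screenshots:

```python
import random
from dataclasses import dataclass

@dataclass
class Element:
    name: str
    x: int
    y: int
    w: int
    h: int

@dataclass
class LabeledScreen:
    elements: list
    label: str  # "clean" or a defect label

def synthesize_misalignment(clean: list, max_shift: int = 24, seed: int = 0) -> LabeledScreen:
    """Create a synthetic training sample by nudging one element
    horizontally off its grid, labeling the result automatically."""
    rng = random.Random(seed)  # seeded for reproducible samples
    shifted = [Element(e.name, e.x, e.y, e.w, e.h) for e in clean]
    victim = rng.choice(shifted)
    victim.x += rng.randint(1, max_shift)  # horizontal nudge = misalignment
    return LabeledScreen(shifted, "layout_misalignment")

clean = [Element("title", 16, 40, 200, 24), Element("cta", 16, 400, 120, 44)]
sample = synthesize_misalignment(clean)
```

Because the defect is injected programmatically, the label comes for free, which is what makes synthetic data cheap to produce at scale alongside manually annotated screenshots.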

Balancing functionality and user privacy

Developers must balance the AI model’s need for substantial data with user privacy concerns. Therefore, ensuring user data is obfuscated and protected is a priority, even if it slightly impacts model performance.

For example, private information in screenshots captured from the end-user application must be masked, which appears as a black box. During initial development, the black box itself was flagged as a visual inconsistency, a false positive. Developers must anticipate and solve issues caused by obfuscation before deploying AI systems at scale.
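A minimal sketch of one way to handle this (helper names are illustrative, and real systems operate on image data): pass the masked rectangles alongside the screenshot so any detection that falls inside a masked region is filtered out rather than reported:

```python
def region_overlaps(box, mask):
    """Axis-aligned overlap test; boxes are (x, y, w, h) tuples."""
    bx, by, bw, bh = box
    mx, my, mw, mh = mask
    return bx < mx + mw and mx < bx + bw and by < my + mh and my < by + bh

def filter_masked_findings(findings, masks):
    """Drop detections that fall inside privacy-masked regions,
    so obfuscation black boxes are never reported as UI issues."""
    return [f for f in findings if not any(region_overlaps(f["box"], m) for m in masks)]

findings = [
    {"issue": "color_discrepancy", "box": (10, 10, 50, 20)},   # inside the mask
    {"issue": "missing_element", "box": (200, 300, 80, 40)},   # real finding
]
masks = [(0, 0, 100, 50)]  # e.g. a masked email field
visible = filter_masked_findings(findings, masks)
```

Shipping the mask coordinates with the screenshot keeps the privacy guarantee intact while removing an entire class of false positives.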

Ensuring a flawless mobile UI experience

Training AI systems on both iOS and Android UI patterns is essential, as they need to handle large-scale, multi-platform applications with varying UI designs and layouts. Once operational, a workflow is triggered automatically whenever a screenshot of an issue is sent to our backend analysis infrastructure. When an inconsistency such as misaligned text or a color mismatch is found, the logged information must be properly detailed and spot-checked for accuracy, so that the AI system consistently receives credible data and can continue to operate autonomously. Each screenshot should also be tagged with session metadata, including timestamps and the affected UI components, which allows for quick and convenient issue review in session replays when necessary.
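The metadata attached to each screenshot might look like the following. Field names here are illustrative, not Instabug's actual payload format:

```python
import json
import time

def tag_screenshot(screenshot_id: str, components: list, session_id: str) -> str:
    """Bundle a screenshot reference with the session metadata that
    makes later review in a session replay possible."""
    payload = {
        "screenshot_id": screenshot_id,
        "session_id": session_id,
        "timestamp": int(time.time()),
        "affected_components": components,
    }
    return json.dumps(payload)

record = tag_screenshot("shot-042", ["checkout_button", "price_label"], "sess-9f3")
```

The session ID and component list are what tie an AI finding back to a specific moment in a replay, turning an isolated screenshot into a debuggable event.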

Device and resolution agnosticism is also key, given the wide range of devices in users' hands. The analysis should adapt to whatever screenshot dimensions it receives, evaluating relative positioning and sizing rather than pixel-level differences. Aspect ratio awareness further helps normalize layouts by device type and orientation, strengthening the system's ability to avoid false positives.
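The core idea can be sketched as normalizing element boxes to relative coordinates before comparing them. This is a simplified illustration assuming `(x, y, w, h)` pixel boxes, not the production algorithm:

```python
def normalize(box, screen_w, screen_h):
    """Convert a pixel box to resolution-independent [0, 1] coordinates."""
    x, y, w, h = box
    return (x / screen_w, y / screen_h, w / screen_w, h / screen_h)

def positions_match(box_a, screen_a, box_b, screen_b, tol=0.01):
    """Compare element placement across devices by relative position
    and size, so different resolutions don't trigger false positives."""
    na = normalize(box_a, *screen_a)
    nb = normalize(box_b, *screen_b)
    return all(abs(a - b) <= tol for a, b in zip(na, nb))

# The same button, proportionally placed on a 1080x1920 and a 720x1280 screen:
same = positions_match((108, 192, 216, 96), (1080, 1920),
                       (72, 128, 144, 64), (720, 1280))
```

Both boxes normalize to (0.1, 0.1, 0.2, 0.05), so the check passes even though every pixel value differs, which is exactly the property a fragmented device landscape demands.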

What’s Next for AI Visual Issues?

Moving forward, we're exploring the feasibility of running AI models directly on the client device rather than on our servers. On-device analysis promises faster, more efficient screenshot processing and better handling of obfuscated and masked screenshots, further protecting user privacy.

Automated UI testing, AI-powered design validation, and real-time anomaly detection will soon become standard, and we’re confident that Instabug’s AI Visual Issues is a step in the right direction.

As you build your mobile testing strategy, remember that succeeding in the mobile app battleground is no longer about growth; it’s a matter of survival. Every pixel matters in the cut-throat world of mobile apps.

Want to try our latest AI features?
Sign up to the closed beta now

Learn more:

Instabug empowers mobile teams to maintain industry-leading apps with mobile-focused, user-centric stability and performance monitoring.

Visit our sandbox or book a demo to see how Instabug can help your app

Seeing is Believing, Start Your 14-Day Free Trial

Integrate the Instabug SDK seamlessly into your iOS, Android, React Native, or Flutter app in under a minute