Image and text recognition make up the backbone of automating virtual desktop applications. This article covers some of the basics of working with image and text recognition. We'll also present best practices and solutions for tackling some of the challenges inherent to image and text recognition.
Leapwork comes with two categories of building blocks for automating with image and text recognition: one for image recognition and one for text recognition.
How does Leapwork image recognition work?
Image recognition is the "art" of finding one image within another image. Typically, there is one image defined at design time (captured into our Leapwork automation flows) and one image that is a screenshot of the actual application, taken while the automation flow is running. At runtime, Leapwork looks for the captured image in the screenshot and acts according to the defined flow.
Technically, image recognition compares a matrix of numbers with another matrix of numbers and returns whether the first matrix is contained in the second. One of the challenges is that the two matrices change if the screen resolution changes. For example, if the automation flow is executed on another machine or the resolution has changed, the accuracy of finding the captured image in the screenshot can decrease, leading to less robust automation flows.
Image recognition looks for an image within an image, or a matrix within a matrix.
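To make the matrix idea concrete, here is a minimal sketch of template matching using Python and OpenCV. This illustrates the general technique, not Leapwork's internal implementation; the file names and the 0.95 threshold are placeholder assumptions.

```python
# A minimal sketch of the "matrix within a matrix" idea, using OpenCV
# template matching. Not Leapwork's internal implementation; file names
# and the threshold are placeholders.
import cv2

screenshot = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("captured_button.png", cv2.IMREAD_GRAYSCALE)

# Slide the captured image over the screenshot and score every position.
scores = cv2.matchTemplate(screenshot, template, cv2.TM_CCOEFF_NORMED)
_, best_score, _, best_pos = cv2.minMaxLoc(scores)

# A score near 1.0 is a near-perfect match. A different screen
# resolution rescales the pixels, which pushes the score down.
if best_score > 0.95:
    print(f"Image found at {best_pos} with score {best_score:.2f}")
else:
    print("Image not found")
```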
How does Leapwork text recognition work?
Text recognition is based on pattern recognition, which means that Leapwork searches an area of the screen for patterns that match letters. Letters come in different fonts, colors, and sizes, and a text's background can be an image or a gradient, which makes it harder to recognize the actual letters and numbers on the screen.
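As an illustration of the general idea (not Leapwork's internal engine), the sketch below runs the open-source Tesseract engine over a screenshot via pytesseract and searches the result for an expected string; the file name and the word "Submit" are placeholder assumptions.

```python
# A sketch of OCR-based text recognition using the open-source
# Tesseract engine via pytesseract, as a stand-in for Leapwork's
# built-in engines. File name and expected string are placeholders.
from PIL import Image
import pytesseract

screen = Image.open("screenshot.png")
recognized = pytesseract.image_to_string(screen)

# Fonts, colors, and busy backgrounds all affect how reliably the
# individual characters are recognized.
if "Submit" in recognized:
    print("Expected text found on screen")
```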
In the following, we'll present best practices for handling common challenges related to working with text and image recognition. The best practices address:
- How to capture icons when backgrounds change
- How to find images that move
- Defining part of the screen as an area of focus to speed up the recognition process
- Using image collections for more robust recognition
- Remote design and execution
- Adjusting the precision of image recognition
- Configuring the OCR engine for better text recognition
Following these tips will significantly improve the quality of our automation flows that rely on image and text recognition.
Capturing icons when backgrounds change
The background color behind an icon can change, so don’t include parts of the background when capturing an icon.
A “hover” effect can change how an icon looks when the mouse pointer hovers over it, for instance showing a brighter or darker version. This can usually be handled by closing all open windows as part of the test run: set the 'Action' property on the Start building block to "Close all windows".
A “selected” or “opened” effect can change how the icon looks when selected. For instance, a Chrome icon in the Windows taskbar looks different before Chrome is opened compared to when browser instances are already open.
This can typically be solved by using the Image collection feature (described later) to cover both scenarios:
- No browsers open
- At least one browser open
Finding images that move
One situation that can occur in all types of applications is that an image is shown first in one place and is then moved to another. For example, on some websites, all resources are first loaded into the page and are then "bootstrapped" into position. Another example could be a dialog box in a desktop application that is shown and then centered on the screen.
In both cases, Leapwork can find the image in its initial position and continue the test flow. However, if the image then changes position as part of the application's behavior, the automation flow will fail.
Checking the “Await no movement” property on the Click image building block solves this problem. This will tell the image recognition engine to wait until the screen has not changed for a period of time before starting to search for the image.
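The sketch below illustrates the underlying idea of waiting for a quiet screen, using pyautogui and Pillow; the timings are illustrative assumptions, not Leapwork's actual values.

```python
# A rough sketch of the "Await no movement" idea: keep taking
# screenshots until the screen has been unchanged for a quiet period,
# then let the image search begin. Timings are illustrative.
import time
import pyautogui
from PIL import ImageChops

def await_no_movement(quiet_seconds=1.0, poll=0.2, timeout=30.0):
    previous = pyautogui.screenshot()
    stable_since = time.time()
    deadline = time.time() + timeout
    while time.time() < deadline:
        time.sleep(poll)
        current = pyautogui.screenshot()
        if ImageChops.difference(previous, current).getbbox() is not None:
            stable_since = time.time()  # the screen changed; reset the clock
        elif time.time() - stable_since >= quiet_seconds:
            return True  # the screen has been still long enough
        previous = current
    return False  # gave up waiting for the screen to settle
```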
Defining part of the screen as an area of focus
For image recognition - and especially for text recognition - it is best practice and highly recommended to use "Areas". An "Area" is a sub-section of the entire screen; it tells the image/text recognition engine to limit its search for the captured image or a specific text/text pattern to the specified area. Typically, we define an area covering the part of the screen where we expect the image or text to appear, including some margin (a minimal sketch of the principle follows below).
Specifying an area has two main purposes:
- We ensure that we are looking for the right instance of the captured image/text. If a word appears multiple times on the screen, we could otherwise get a list of occurrences instead of the one "right" match.
- The speed of execution is considerably higher if the Leapwork image/text recognition engine only has to search a fraction of the screen instead of the whole screen.
More information about using and defining areas can be found in the documentation.
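The following sketch shows the principle behind an area: crop the screenshot to a region of interest before matching, then map any hit back to full-screen coordinates. The region values and threshold are example assumptions, and the matching reuses the OpenCV approach shown earlier.

```python
# A sketch of searching only a defined area: crop the screenshot to the
# region of interest before matching, then map the hit back to
# full-screen coordinates. Region values and threshold are examples.
import cv2

screenshot = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("captured_icon.png", cv2.IMREAD_GRAYSCALE)

# The area: (left, top, width, height) on the full screen, with margin.
left, top, width, height = 100, 50, 400, 200
area = screenshot[top:top + height, left:left + width]

# Matching a small area is much faster than matching the whole screen.
scores = cv2.matchTemplate(area, template, cv2.TM_CCOEFF_NORMED)
_, score, _, (x, y) = cv2.minMaxLoc(scores)
if score > 0.95:
    # Translate the area-relative hit back to screen coordinates.
    print(f"Found at screen position ({left + x}, {top + y})")
```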
Image collections
The Image Collection feature allows us to capture two or more images into a collection and then use the collection when searching for an image. This means, for example, that we can capture the same button in different states (no focus, in focus, hovered, pressed, etc.), add all the captured images into one collection, and then simply have the automation flow click or find the button regardless of its state. This increases the robustness of the flow and its tolerance for changes.
In the example below, we have captured the search button - "Go" - from a Windows desktop application.
The button can have four different looks depending on the focus and hover effect:
All four states have been captured, and the images are now located as resources under the flow in the asset menu:
In the example above, the images have been renamed to make them easier to identify. Hovering over an image in the asset menu will pop up a thumbnail view of the image.
To create an Image Collection:
Click "New" + "Capture" + "Image collection".
This will create a new, empty Image collection in the Asset Menu. It is also possible to simply right-click the folder where the Image Collection should be located and select "Capture" + "Image Collection".
Image Collections can be identified by this logo in the asset menu:
Once added, it is best practice to rename the image collection to something meaningful to make it easier to maintain and reuse the image collection across multiple flows.
Adding images to an Image Collection is straightforward: just drag and drop images from anywhere in the asset menu onto the image collection. To view the images in the collection, double-click the collection to open the "Edit image collection" dialog.
In the dialog, it is possible to edit and change the images individually if needed.
We can now use the collection in a building block by dragging the collection onto the image field in the building block:
When a building block using the collection is executed, it searches the screen for the images in the collection one by one. If it finds one of the images, it acts on it (for example, clicks it), stops the search, and hands execution over to the next building block in the flow.
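Conceptually, the search works like the sketch below, which tries each captured state in turn and stops at the first match; the file names and threshold are placeholders, and the matching reuses the OpenCV approach shown earlier.

```python
# A sketch of searching an image collection: try each captured state in
# turn and stop at the first match. File names are placeholders.
import cv2

def find_any(screenshot_path, collection, threshold=0.95):
    screenshot = cv2.imread(screenshot_path, cv2.IMREAD_GRAYSCALE)
    for path in collection:  # search the images one by one
        template = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        scores = cv2.matchTemplate(screenshot, template, cv2.TM_CCOEFF_NORMED)
        _, score, _, location = cv2.minMaxLoc(scores)
        if score >= threshold:
            return path, location  # first hit wins; stop searching
    return None  # none of the states were found

# The "Go" button captured in four states: no focus, focus, hover, pressed.
hit = find_any("screenshot.png", ["go_nofocus.png", "go_focus.png",
                                  "go_hover.png", "go_pressed.png"])
```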
The image collection can also be used to handle different resolutions of the same icon/image if we know we will run the same automation flow in different resolutions, as well as different states of icons.
The image resources are shared within a project, so collections can be used in multiple automation flows. This means we can create, for example, a "Chrome icon" collection that contains all relevant states of the Chrome icon in the Windows taskbar, and then use this collection across all automation flows that operate with Chrome. This comes with the bonus that we only have to maintain the image collection in one place instead of in every automation flow.
Remote design and execution
A typical Leapwork setup consists of a number of workstations with Leapwork Studio installed, a Controller installed on a common/shared server to make sharing easy, and one or more machines dedicated to executing the automation flows. When automation flows using image and text recognition run, they interact with the actual screen. This means that if we run image and text recognition on our local machine, we can't work on it at the same time. This is the reason for using "remote machines" to run automation flows.
To make our automation flows independent of differences in screen resolution between the machines where the flows can be executed, we can define an Environment pointing to a "remote machine" and use that machine, instead of our local workstation, to capture images. This way we end up capturing images directly on the machine where the automation flow will execute, ensuring that the screen resolution is always the same.
To create a "remote machine", we need to install the Leapwork Agent on a dedicated workstation that is accessible from both Leapwork Studio and the Leapwork Controller. Once the remote machine is up and running, we can define an Environment in Studio pointing to this machine. More information can be found here.
When the environment is created, we can select it as the 'Preview environment' on the design canvas. In the example below, "Amazon Cloud Remote" is an environment pointing to a cloud-hosted (Amazon) server where the Leapwork Agent is installed.
When the 'Preview Environment' points to a remote machine, a "terminal" window will pop up when we capture new images, allowing us to capture directly on the remote machine instead of on our local machine.
Adjusting the precision of image recognition
The building blocks using Image recognition have a property named Precision. This configuration is accessible by expanding the building blocks. The Precision property has two sub-properties:
- Pixels: The level of accuracy tolerated in the pixel-by-pixel match.
- Color: The sensitivity to changes in color density. The color density of the same set of pixels can change due to the hardware used.
Here we can set the accepted level of accuracy for image recognition. The default is "Pixel perfect", which means there has to be a perfect match, pixel by pixel, before the captured image is considered found on the screen. In some cases, a higher level of tolerance is needed. The advice is to start with "Pixel perfect" for both properties and then relax them one level at a time until the image recognition works as intended.
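The sketch below illustrates what the two tolerance dials mean in principle: a pixel tolerance (how many pixels must agree) and a color tolerance (how far each pixel's intensity may drift). The numbers are illustrative assumptions, not Leapwork's internal thresholds.

```python
# An illustrative sketch of the two tolerance dials: accept a match if
# enough pixels agree (Pixels) and each pixel's intensity stays within
# a band (Color). Thresholds are examples, not Leapwork's values.
import numpy as np

def matches(patch, template, pixel_tolerance=0.98, color_tolerance=10):
    # "Pixel perfect" corresponds to pixel_tolerance=1.0, color_tolerance=0.
    close = np.abs(patch.astype(int) - template.astype(int)) <= color_tolerance
    return close.mean() >= pixel_tolerance  # fraction of agreeing pixels
```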
Configuring the OCR engine
For the building blocks using OCR (text recognition), we can change the settings for the OCR engine to optimize how the characters are recognized.
Choosing an engine
We can choose between the following OCR engines in the building block configurations:
- "OCR 1.0": Based on Tesseract version 3.5, a widely used open-source OCR engine.
- "OCR 2.0": Based on Tesseract version 4.0, which uses a neural network architecture (LSTM) to improve recognition. This architecture is considered the future within all types of recognition software (images, speech, video, text, etc.).
- "ABBYY": Based on ABBYY version 12, a world-leading commercial OCR engine (requires a separate license; see below).
Both "OCR 1.0" and "OCR 2.0" are working engines, but because of the different technologies, one engine might be a better fit for some applications. In case the OCR building blocks are not behaving as expected, one option is to try to change to the other engine.
In case the built-in OCR engine in Leapwork is not matching our requirements, it is possible to change the engine to ABBYY.
ABBYY is the world-leading OCR engine, but this requires a separate ABBYY license. Also, be aware that ABBYY itself requires some infrastructure work to be set up, so in most cases, the built-in engines are the best option.
Contact our Priority Support to get started with ABBYY.
Choosing an OCR mode
We can choose between two different OCR modes. In short, it's a choice between speed and quality; a sketch of the dual-run idea follows the list below.
- Fast speed: The OCR engine performs two recognition runs in parallel: one in a normal color scheme (black text on white background) and one using inverted colors. This mode is faster than the "High quality" setting, so if the characters are found correctly, simply keep using this setting.
- High quality: The OCR engine performs four recognition runs in parallel: two in a normal color scheme and two in inverted colors. This setting is slower than the "Fast speed" setting but might be required if the OCR engine is not returning the characters correctly.
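As an illustration of the dual-run idea (again using Tesseract as a stand-in for the built-in engines), the sketch below recognizes both the normal and the color-inverted image and keeps the better result; the "longer text wins" heuristic is a simplifying assumption used as a crude quality proxy.

```python
# A sketch of the dual-run idea behind "Fast speed": recognize both the
# normal and the color-inverted image and keep the better result.
# Tesseract stands in for the built-in engines; file name is a placeholder.
from PIL import Image, ImageOps
import pytesseract

screen = Image.open("text_area.png").convert("L")
normal = pytesseract.image_to_string(screen)
inverted = pytesseract.image_to_string(ImageOps.invert(screen))

# Keep whichever run recognized more text (a crude quality proxy).
text = normal if len(normal.strip()) >= len(inverted.strip()) else inverted
```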
Adjusting OCR precision levels
OCR precision sets the accuracy of the OCR results on a character level: a higher OCR precision level requires higher confidence from the OCR engine before a character is matched.
With high precision, we can be very confident that the characters found are the correct characters.
On the other hand, high precision can result in some characters not being found. A lower precision means that, in general, more characters are found, but the assurance that they are the right characters is lower. The right setting is a balance between finding all the right characters and not including too much noise that pollutes the results, and it will depend on the font, colors, background, and size of the text.
The precision can be set on a scale from 0 to 100: 0 returns everything recognized by the OCR engine, and 100 returns only the best possible recognized result (a sketch of this confidence filtering follows the list of levels below).
The default Precision Levels are:
- High: For characters that are large and clear enough (not hazy or compacted) to be recognized reliably by the OCR engine. The predefined value is 70.
- Medium: For characters that may or may not be recognized reliably; the engine searches the defined area for possible characters. The predefined value is 50.
- Low: For characters that are less likely to be recognized; the engine also considers less likely character matches, in and outside the dictionary, within the defined area. The predefined value is 30.
- Very Low: For characters that are least likely to be recognized; the engine considers virtually all possible character matches, in and outside the dictionary. The predefined value is 20.
- Custom: Sets a custom precision value/confidence factor in the range 0-100.
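The sketch below shows the analogous idea with Tesseract via pytesseract, which reports a confidence per recognized word that can be filtered against a 0-100 threshold; the threshold of 70 mirrors the "High" level above, and the file name is a placeholder.

```python
# A sketch of confidence filtering: pytesseract's image_to_data reports
# a confidence per recognized word, which we filter against a threshold.
# The threshold mirrors the "High" level; file name is a placeholder.
from PIL import Image
import pytesseract

PRECISION = 70  # 0 returns everything; 100 keeps only the surest results

data = pytesseract.image_to_data(Image.open("text_area.png"),
                                 output_type=pytesseract.Output.DICT)
words = [word for word, conf in zip(data["text"], data["conf"])
         if word.strip() and float(conf) >= PRECISION]
print(" ".join(words))
```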
For any clarification, please contact our Priority Support.