Tutorial: Extracting data from a web page and saving it as an excel file

(v6 of the guide - updated script according to Robin v 0.9.2)

GitHub: Robin examples: Web Data Extraction and Storing in Excel file


By utilizing UISpy we can extract various information not only from our desktop/apps but also from the web.
All (if not most) of web related actions are part of the WebAutomation module.
I am going to use the URL burque505 tried extracting data from.

Let’s begin…

Go to this website.
Then launch UISpy.
Click on Add Control.
Position your mouse over an element of the United States table.
Hold Ctrl + Shift and scroll down until you have selected the table element.

Then hold Ctrl and left click in order to capture the window element. Hit done.


Note that I have renamed the .appmask file, the application name, the window name and the selector for convenience.

Click on Edit Selectors option.


Note: You can always modify your selectors through the selector builder.
The default selector will be sufficient in most cases, such as this one.
The best approach is to first experiment with the default(initial selector generated by UISpy) and then modify it according to your needs.


Just to showcase the mods, I have included the whole path (from the body and downwards), and excluded all the class attributes except the one belonging to the table element at the bottom.
If you make any changes you can save them by clicking Update.
To follow the guide step-by-step leave the selector as it was generated.


Open Robin’s editor.
When using .appmask files we have to import them.
It is mandatory to use the import command at the top of your Robin script.
image

Now we have access to the selectors we have created.

To reference a selector the syntax goes as follows:
[name of the .appmask file].[name of the application you are extracting data from].[screen/window name].[control name]

In case you want to reference it multiple times in your script it is a good practice to store it in a variable.
image

Create a variable to hold the URL and then launch a browser instance.
image

Use the WebAutomation.LaunchChrome action and pass the Website variable as an input:
image

After that we can use the .ExtractHtmlTableInExcel action to extract the table we want and write the retrieved data to an excel instance.
image

Finally we save the excel file, terminate the excel application and close the browser.

By navigating to the folder you declared, you will see that a new .xlsx file has been created.
image

Open it and you will see the extracted data.

Here is a screenshot of the whole Robin script:


James Papadimitriou
Technical Evangelist

3 Likes

James, thank you very, very much! I was able to exactly duplicate your results, both on Windows 10 at 96dpi and on Win7 at 120dpi (although the selector outlining is skewed at 125% zoom, it works fine, both for UI automation and web automation). The tip about Ctrl-Shift + scroll for selecting is priceless - exactly what I needed.

I got the same duplication of headers shown in your example. Do you know if that might be avoided by tweaking the selector, possibly? As long as it’s consistent, it doesn’t really matter as the rows can be deleted either manually or (as I will attempt) by the robot, I suppose.

EDIT: This seems a little kludge-y, but definitely removes the offending rows :grinning:.
I put a delay in to make sure I could see it working, works fine without it. I’d be inclined to leave a small delay in there to make sure Excel keeps up.

Best regards,
burque505

Yes burque505.
The header is written in the excel multiple times due to the configuration I made to the selector.
It is not mandatory to modify the selector every time.
It depends on the case. Many times will be sufficient by default, just the way it was generated by UISpy.
This particular case is one of them.
So the best approach would be to first go with the default extractor and then modify it based on the result/need.

Generally speaking, based on the situation and the result you need, you might need to modify the extracted data (by meddling with the excel file, the list, whatever you might have stored the data into).

I am updating the guide.
Thanks for the feedback! :slight_smile:

Best regards,
J.


James Papadimitriou
Technical Evangelist

1 Like

Added link to GitHub repo.
There you can find the source code as well as the utilized .appmask file.
Note: You should modify the directory of the .appmask file for the example to work properly.

Best regards,
J.


James Papadimitriou
Technical Evangelist

1 Like

Updated script according to Robin v 0.9.2 changes.

Best regards,
J.


James Papadimitriou
Technical Evangelist

1 Like