Extract data (in tabular form) from a web page to Excel

Hi,

I followed this link (Tutorial: Extracting data from a web page and saving it as an excel file), and it works properly. There the data is present in a <table> element, but I need to extract the data from the link below, where the data is present only in <div> elements (https://www.ia.org.hk/en/supervision/reg_insurers_lloyd/register_of_authorized_insurers.html).

Can anyone suggest how to edit the selectors to extract the data? Please find the attached screenshot.


Hello @Meghanaraj,

Try this logic:

Script:

Excel.LaunchAndOpen Path: "path to the excel file" \
Instance=> ExcelInstance
# Get the first free column/row so the range is dynamic
# and you don't need to know the exact number of rows/columns each time
Excel.GetFirstFreeColumnRow Instance: ExcelInstance \
FirstFreeColumn=> FirstFreeColumn \
FirstFreeRow=> FirstFreeRow
# Read the cells from column 1 only, from the first row to the last row filled with data
# (Excel rows are 1-based, so the range starts at row 1)
Excel.ReadCells Instance: ExcelInstance \
StartColumn: 1 \
StartRow: 1 \
EndColumn: 1 \
EndRow: FirstFreeRow - 1 \
ReadAsText:False \
FirstLineIsHeader:False \
Value=> Value

Then you can store the data in an Excel file.

Best regards,
J.


Hello @Meghanaraj, I have had a difficult time scraping this page. I believe there may be anti-scraping protection, because I can only run the code below once a day with success. Needless to say, I won’t be doing much scraping on this site. :grinning: Proxies may help - I don’t have any available.

The code below only selects all 162 File Numbers, using JavaScript, and writes them to a result object. At the end of the class name in the JavaScript code you’ll see ‘div-col-0’. If you examine the page code with developer tools, you’ll see that each data pseudo-row also contains ‘div-col-1’, etc., up to ‘div-col-9’ (i.e. 10 columns, one for each header cell). If you clone this code and increase the ‘div-col’ number for each header cell, you’ll have 10 result objects. You can then write them into Excel with a loop.

(Not shown is code that writes the result to a file - in practice I would probably not do that.)

Code:

set myURL to 'https://www.ia.org.hk/en/supervision/reg_insurers_lloyd/register_of_authorized_insurers.html'

WebAutomation.LaunchChrome \
        Url: myURL \
        WindowState:WebAutomation.BrowserWindowState.Maximized \
        ClearCache: False \
        ClearCookies:False \
        BrowserInstance=> Browser

wait 5

MouseAndKeyboard.ClickAt \
        ClickType:MouseAndKeyboard.MouseClickType.LeftClick \
        MillisecondsDelay:0 \
        X: 400 Y: 750 \
        RelativeTo:MouseAndKeyboard.PositionRelativeTo.Screen \
        MovementStyle:MouseAndKeyboard.MovementStyle.Instant

wait 5

WebAutomation.ExecuteJavascript \
        BrowserInstance: Browser \
        Javascript:"""
    function ExecuteScript()
    {
        // Collect the text of every cell in the first column (File No.)
        var array = [];
        var fileNo = document.getElementsByClassName("box-table-content-col box-style-wrapper div-col-0");
        for (var i = 0; i < fileNo.length; i++) {
            array.push(fileNo.item(i).innerText);
        }

        return array;
    }
    """ \
        Result=> FileNoResult

Console.Write Message: FileNoResult
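
Instead of cloning the action once per column, the same idea can probably be folded into one JavaScript function that loops over ‘div-col-0’ through ‘div-col-9’ and regroups the values into rows. This is only a rough sketch, not something I have run against the site: the function name `collectRows` and its parameters are made up for illustration, and it assumes every column has the same number of cells.

```javascript
// Sketch: gather every "div-col-N" column and regroup the values into rows.
// `doc` is any object exposing getElementsByClassName (the page's `document`
// in a browser); `columnCount` is how many div-col-N columns to read.
function collectRows(doc, columnCount) {
  var columns = [];
  for (var c = 0; c < columnCount; c++) {
    var cells = doc.getElementsByClassName(
      "box-table-content-col box-style-wrapper div-col-" + c
    );
    var values = [];
    for (var i = 0; i < cells.length; i++) {
      values.push(cells[i].innerText.trim());
    }
    columns.push(values);
  }

  // Transpose: row r takes the r-th value from each column.
  // Missing cells (if a column is shorter) become empty strings.
  var rowCount = columns.length > 0 ? columns[0].length : 0;
  var rows = [];
  for (var r = 0; r < rowCount; r++) {
    var row = [];
    for (var k = 0; k < columns.length; k++) {
      row.push(columns[k][r] !== undefined ? columns[k][r] : "");
    }
    rows.push(row);
  }
  return rows;
}
```

Inside ExecuteScript you could then try something like `return JSON.stringify(collectRows(document, 10));` and parse the text on the automation side. That assumes the action hands the result back as text, which I have not verified.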

I hope this may be of some use to you.
Regards,
burque505
