Intro guide: Extracting web data with JS - redux

Here is an expanded version of the earlier data extraction code. It queries the eBay search results page for looper pedals and grabs each item's description, price, image URL and individual page link.

Script console output:

[Screenshot: script console output]

As a teaser, here is a shot of the resulting Excel spreadsheet (formatted after the script ran; doing that in the code will have to wait for another day).

The code is too long for a screenshot, but it’s reproduced below.

N.B.: This is ALPHA code and is basically just a proof of concept (POC). The idea is that we can do complicated web scraping with Robin, even on sites that have plenty of capability built in to make that task difficult.

# Please see this page: 
# https://support.softomotive.com/support/solutions/articles/35000143406-return-an-array-list-variable-with-javascript

set myUrl to 'https://www.ebay.com/sch/i.html?_from=R40&_nkw=looper%20pedal&_sacat=0&LH_TitleDesc=0&_udlo=1&_udhi=40&rt=nc'

WebAutomation.LaunchChrome                  Url:  myUrl\
                                            WindowState:WebAutomation.BrowserWindowState.Maximized \
                                            ClearCache:False \
                                            ClearCookies:False \
                                            BrowserInstance=> Browser

wait 2

# The following few blocks may need to go in a function, but this is still in development.
# Code is duplicated, but it is easier to see what is going on (at least for me)
# BLOCK ONE: Image list

WebAutomation.ExecuteJavascript             BrowserInstance:  Browser\
                                            Javascript:"""
                                            function ExecuteScript() 
                                            { 
                                                var array = [];
                                                var images = document.querySelectorAll('img[class="s-item__image-img"]');
                                                images.forEach(function(element){
                                                    array.push(element.src);
                                                });

                                                return array;
                                            }
                                            """ \
                                            Result=> ImageResult

wait 2


Text.SplitWithDelimiter                     Text:  ImageResult \
                                            CustomDelimiter: ',' \
                                            IsRegEx:False \
                                            Result=> ImageTextList


Console.Write                               Message: "6th image URL: " + ImageTextList[5]
Console.Write                               Message: "Number of images: " + ImageTextList.Count

#BLOCK TWO: Links

WebAutomation.ExecuteJavascript             BrowserInstance:  Browser\
                                            Javascript:"""
                                            function ExecuteScript() 
                                            { 
                                                var array = [];
                                                var items = document.querySelectorAll('a[class="s-item__link"]');
                                                items.forEach(function(element){
                                                    array.push(element.href);
                                                });

                                                return array;
                                            }
                                            """ \
                                            Result=> LinkResult

wait 2


Text.SplitWithDelimiter                     Text:  LinkResult \
                                            CustomDelimiter: ',' \
                                            IsRegEx:False \
                                            Result=> LinkTextList

Console.Write                               Message: "6th link result: " + LinkTextList[5]
Console.Write                               Message: "Number of links: " + LinkTextList.Count


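#BLOCK THREE: Titles
# Note: a title that contains a comma may be split into extra entries by the ',' delimiter used below.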

WebAutomation.ExecuteJavascript             BrowserInstance:  Browser\
                                            Javascript:"""
                                            function ExecuteScript() 
                                            { 
                                                var array = [];
                                                var items = document.querySelectorAll('h3[class^="s-item__title"]');
                                                items.forEach(function(element){
                                                    array.push(element.innerText);
                                                });

                                                return array;
                                            }
                                            """ \
                                            Result=> TitleResult

wait 2


Text.SplitWithDelimiter                     Text:  TitleResult \
                                            CustomDelimiter: ',' \
                                            IsRegEx:False \
                                            Result=> TitleTextList

Console.Write                               Message: "Last title: " + TitleTextList[TitleTextList.Count-1]
Console.Write                               Message: "Number of titles: " + TitleTextList.Count


#BLOCK FOUR: Prices

WebAutomation.ExecuteJavascript             BrowserInstance:  Browser\
                                            Javascript:"""
                                            function ExecuteScript() 
                                            { 
                                                var array = [];
                                                var items = document.querySelectorAll('span[class^="s-item__price"]');
                                                items.forEach(function(element){
                                                    array.push(element.innerText);
                                                });

                                                return array;
                                            }
                                            """ \
                                            Result=> PriceResult

wait 2


Text.SplitWithDelimiter                     Text:  PriceResult \
                                            CustomDelimiter: ',' \
                                            IsRegEx:False \
                                            Result=> PriceTextList

Console.Write                               Message: "Last price: " + PriceTextList[PriceTextList.Count-1]
Console.Write                               Message: "Number of prices: " + PriceTextList.Count


# Close browser
WebAutomation.CloseWebBrowser               BrowserInstance: Browser





#***************************
# Post processing
#***************************

Excel.Launch                                Visible:True \
                                            LoadAddInsAndMacros:False \
                                            Instance=> ExcelInstance

# Write Headers

Excel.WriteCell                             Instance:  ExcelInstance \
                                            Value:  "Looper description"\
                                            Column:  'A'\
                                            Row: 1

Excel.WriteCell                             Instance:  ExcelInstance \
                                            Value:  "Price"\
                                            Column:  'B'\
                                            Row: 1

Excel.WriteCell                             Instance:  ExcelInstance \
                                            Value:  "Image source URL"\
                                            Column:  'C'\
                                            Row: 1
Excel.WriteCell                             Instance:  ExcelInstance \
                                            Value:  "Link to individual page"\
                                            Column:  'D'\
                                            Row: 1

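# We only keep items that have images, so the loop count is taken from ImageTextList.
# Row i receives list element i-1, so element 0 of each list is not written to the sheet.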

loop i from 2 to ImageTextList.Count

# Write the titles to Column A
Excel.WriteCell                             Instance:  ExcelInstance \
                                            Value:  TitleTextList[i-1]\
                                            Column:  'A'\
                                            Row: i

# Write the prices to Column B
Excel.WriteCell                             Instance:  ExcelInstance \
                                            Value:  PriceTextList[i-1]\
                                            Column:  'B'\
                                            Row: i

# Write the image source URL to Column C
Excel.WriteCell                             Instance:  ExcelInstance \
                                            Value:  ImageTextList[i-1]\
                                            Column:  'C'\
                                            Row: i

# Write the individual item links to Column D
Excel.WriteCell                             Instance:  ExcelInstance \
                                            Value:  LinkTextList[i-1]\
                                            Column:  'D'\
                                            Row: i

end
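
A note on the ',' delimiter used in the script above: the split steps treat each JavaScript result as comma-separated text, so a title that itself contains a comma would presumably be split into extra entries and throw the lists out of step. Below is a minimal sketch of one way around that, assuming Robin passes a string returned from ExecuteScript through unchanged (I have not verified this). It joins the values with a delimiter that is unlikely to appear in the page text. The names packList and unpackList and the '|~|' delimiter are made up for illustration.

```javascript
// Hypothetical helpers; the delimiter is an arbitrary choice, not a Robin convention.
const DELIMITER = '|~|';

// Join values into one string. Any accidental occurrence of the delimiter
// inside a value is replaced with a space so the split stays unambiguous.
function packList(values, delimiter = DELIMITER) {
  return values
    .map(function (value) {
      return String(value).split(delimiter).join(' ');
    })
    .join(delimiter);
}

// Inverse of packList, only here to check the round trip outside Robin.
function unpackList(packed, delimiter = DELIMITER) {
  return packed === '' ? [] : packed.split(delimiter);
}

const sampleTitles = ['Looper pedal, used', 'Mini looper'];
console.log(unpackList(packList(sampleTitles)).length); // 2 entries survive
console.log(sampleTitles.join(',').split(',').length);  // 3 entries with a plain comma
```

In the Robin script this would mean defining packList inside the Javascript block, returning packList(array) from ExecuteScript, and setting CustomDelimiter to '|~|' in the matching Text.SplitWithDelimiter step.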

Possible next steps: format the Excel sheet from within the Robin script and embed the images in the sheet.
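
Another possible refinement: the four blocks each build their own list, and the lists only line up if every listing has all four elements. Here is a sketch of reading all four fields from one container per listing, so each row stays together. The field selectors are the ones from the script above. The container selector 'li.s-item' and the names collectRows, readField and FIELD_SELECTORS are assumptions, and eBay's markup may well differ.

```javascript
// Field selectors copied from the script above.
const FIELD_SELECTORS = {
  title: 'h3[class^="s-item__title"]',
  price: 'span[class^="s-item__price"]',
  image: 'img[class="s-item__image-img"]',
  link: 'a[class="s-item__link"]'
};

// Return one property of the first matching child, or '' if there is none.
function readField(container, selector, property) {
  const element = container.querySelector(selector);
  return element ? element[property] : '';
}

// Build one row per listing container, keeping only listings that have an image.
function collectRows(containers) {
  const rows = [];
  Array.from(containers).forEach(function (container) {
    const row = {
      title: readField(container, FIELD_SELECTORS.title, 'innerText'),
      price: readField(container, FIELD_SELECTORS.price, 'innerText'),
      image: readField(container, FIELD_SELECTORS.image, 'src'),
      link: readField(container, FIELD_SELECTORS.link, 'href')
    };
    if (row.image !== '') {
      rows.push(row);
    }
  });
  return rows;
}

// In the browser this might be called as (container selector is an assumption):
// var rows = collectRows(document.querySelectorAll('li.s-item'));
```

Each row could then be flattened to text before being returned to Robin, since the script above handles the result as delimited text.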

Regards,
burque505


Awesome !! Excellent job burque505!!! Congrats!


Thank you for the encouragement, @nldavila! Greatly appreciated.
Regards,
burque505
