RPA Challenge 'Invoice Extraction'

Thanks to community I am getting grasp of Robin.
Tried solving this RPA challenge without using any JS, Python etc to explore Robin functionalities in deep.
Only Tesseract OCR is the external requirement. Took ~30sec to beat.

#Import AppMask
import "C:\\Robin\RPAChallenge.appmask"
#StartTime to check at the end how much time script took.
DateTime.Local DateTimeFormat:DateTime.DateTimeFormat.DateAndTime CurrentDateTime=> StartTime

WebAutomation.LaunchAutomationBrowser Url:"https://rpachallengeocr.azurewebsites.net/"  WindowState:WebAutomation.BrowserWindowState.Maximized ClearCache:False ClearCookies:False BrowserInstance=> Browser
#set OCRPath to "C:\Users\Sahil.Adlakha\AppData\Local\Tesseract-OCR\tesseract.exe"
set OCRPath to "C:\Tesseract-OCR\tesseract.exe"
set Path to "C:\Captcha\Invoice"
set dVar to {["ID","DueDate","InvoiceNo","InvoiceDate","CompanyName","TotalDue"]}

#This file will be uploaded
File.WriteCSV VariableToWrite:dVar  CSVFile:Path+"\FileToUpload.csv" \
                                    CsvFileEncoding:File.CSVEncoding.UTF8 \
                                    IncludeColumnNames:false IfFileExists:File.IfFileExists.Append \
                                    ColumnsSeparator:File.CSVColumnsSeparator.SystemDefault

#Current date to match DueDate in Table
DateTime.Local DateTimeFormat:DateTime.DateTimeFormat.DateOnly CurrentDateTime=> CurrentDate
WebAutomation.Click BrowserInstance:Browser Control:RPAChallenge.InvoiceChallenge.Start
#Counter for 3 pages
loop LoopCounter from 1 to 3
    #Click Page Number. Can go either for PageNumber of Next. LoopCounter is variable being passed to Appmask file
   WebAutomation.Click BrowserInstance:Browser Control:RPAChallenge.InvoiceChallenge.Page(Number:'"'+LoopCounter+'"')  
        #Table extracted
   WebAutomation.DataExtraction.ExtractHtmlTable BrowserInstance:Browser  Control:RPAChallenge.InvoiceChallenge.Table  ExtractedData=> DataTable
        #For All rows in Table
    loop i from 0 to DataTable.rows.count-1
        
        DateTime.Subtract FromDate:CurrentDate  SubstractDate:DataTable[i][2]  TimeUnit:DateTime.DifferenceTimeUnit.Days TimeDifference=> TimeDifference
            if TimeDifference>=0 then
                   
                    WebAutomation.DataExtraction.GetDetailsOfElement BrowserInstance:Browser  \
                                                                     Control:RPAChallenge.InvoiceChallenge.Link(Number:i+1) \
                                                                     AttributeName:"href"  \
                                                                     AttributeValue=> AttributeValue
                    web.DownloadToFile Url:AttributeValue  \
                                        FilePath:"C:\Users\Sahil.Adlakha\Desktop\Captcha\Invoice\\"+DataTable[i][1]+".jpg" \
                                        ConnectionTimeout:30 \
                                        FollowRedirection:True \
                                        ClearCookies:False \
                                        Encoding:''\
                                        DownloadedFile=> DownloadedFile
                    #Convert to Text using OCR
                    set Arguments to ' ' + DownloadedFile + ' ' + "Invoice"
                    system.RunApplicationAndWaitToComplete           ApplicationPath:  OCRPath \
                                                                    CommandLineArguments: Arguments\
                                                                    WorkingDirectory: Path\
                                                                    WindowStyle:System.ProcessWindowStyle.Minimized \
                                                                    Timeout:0 \
                                                                    ProcessId=> \
                                                                    ProcessId ExitCode=> ExitCode
                    File.ReadText File:Path+"\Invoice.txt"  Encoding:File.TextFileEncoding.UTF8 Content=> Content
                    #Check which kind of invoice is extracted
                    #Data from text file is extracted using REGEX
                    if "Aenean" in Content then
                        #Getting InvoiceDate
                        Text.RegexParse Text:Content  TextToFind:'([12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))'  \
                                        StartingPosition:1  IgnoreCase:False OccurrencePositions=> OccurrencePositions Matches=> InvoiceDate
                        #Getting Total
                        Text.RegexParse Text:Content  TextToFind:'(Total (\d){4}\.\d{2})'  StartingPosition:1  IgnoreCase:False OccurrencePositions=> OccurrencePositions Matches=> Total
                        #Getting InvoiceNumber
                        Text.RegexParse Text:Content  TextToFind:'#\d{6}'  StartingPosition:1  IgnoreCase:False OccurrencePositions=> OccurrencePositions Matches=> InvoiceNumber
                        Text.Replace Text:InvoiceNumber  TextToFind:'\r\n|#'  ReplaceWith:''  IsRegEx:true IgnoreCase:False ActivateEscapeSequences:False Result=> InvoiceNumber
                        Text.Trim Text:Total TrimOption:Text.TrimOption.Both TrimmedText=> Total
                        Text.Replace Text:Total  TextToFind:'Total '  ReplaceWith:''  IsRegEx:false IgnoreCase:False ActivateEscapeSequences:False Result=> Total
                        Text.Trim Text:InvoiceDate TrimOption:Text.TrimOption.Both TrimmedText=> InvoiceDate
                        text.ToDateTimeCustomFormat Text:InvoiceDate  CustomFormat:"yyyy-MM-d"  DateTime=> TextAsDateTime
                        text.FromCustomDateTime DateTime:TextAsDateTime  CustomFormat:"dd-MM-yyyy"  Result=> InvoiceDate
                        Console.Write Message:  InvoiceNumber+","+Total+","+InvoiceDate+","+DataTable[i][1]+","+DataTable[i][2]
                        set dVar to { [DataTable[i][1], DataTable[i][2],InvoiceNumber,InvoiceDate,"Aenean LLC",Total] }
                        #Writes to CSV
                        File.WriteCSV VariableToWrite:dVar  CSVFile:Path+"\FileToUpload.csv"  \
                                        CsvFileEncoding:File.CSVEncoding.UTF8 IncludeColumnNames:false \
                                      IfFileExists:File.IfFileExists.Overwrite ColumnsSeparator:File.CSVColumnsSeparator.SystemDefault
                    else if "Amet" in Content then
                        #Getting InvoiceDate
                        Text.RegexParse Text:Content  TextToFind:'(([A-z]* (\d{2}|\d{1}), ([12]\d{3})))'  StartingPosition:1  IgnoreCase:False OccurrencePositions=> OccurrencePositions Matches=> InvoiceDate
                        #Getting Total
                        Text.RegexParse Text:Content  TextToFind:'Total \$\d*,\d*\.\d{2}'  StartingPosition:1  IgnoreCase:False OccurrencePositions=> OccurrencePositions Matches=> Total
                        #Getting InvoiceNumber
                        Text.RegexParse Text:Content  TextToFind:'(#\s?\d{5})'  StartingPosition:1  IgnoreCase:False OccurrencePositions=> OccurrencePositions Matches=> InvoiceNumber
                        Text.Replace Text:InvoiceNumber  TextToFind:'\r\n|#| '  ReplaceWith:''  IsRegEx:true IgnoreCase:False ActivateEscapeSequences:False Result=> InvoiceNumber
                        Text.Replace Text:Total  TextToFind:'\r\n|\$|\,'  ReplaceWith:''  IsRegEx:true IgnoreCase:False ActivateEscapeSequences:False Result=> Total
                        Text.Replace Text:Total  TextToFind:'Total '  ReplaceWith:''  IsRegEx:false IgnoreCase:False ActivateEscapeSequences:False Result=> Total
                        Text.Replace Text:InvoiceDate  TextToFind:'\r\n'  ReplaceWith:''  IsRegEx:true IgnoreCase:False ActivateEscapeSequences:False Result=> InvoiceDate
                        text.ToDateTimeCustomFormat Text:InvoiceDate  CustomFormat:"MMM d, yyyy"  DateTime=> TextAsDateTime
                        text.FromCustomDateTime DateTime:TextAsDateTime  CustomFormat:"dd-MM-yyyy"  Result=> InvoiceDate
                        Console.Write Message:  InvoiceNumber+","+Total+","+InvoiceDate+","+DataTable[i][1]+","+ DataTable[i][2]
                        set dVar to { [DataTable[i][1], DataTable[i][2],InvoiceNumber,InvoiceDate,"Sit Amet Corp.",Total] }
                        #Writes to CSV
                        File.WriteCSV VariableToWrite:dVar  CSVFile:Path+"\FileToUpload.csv" \
                                        CsvFileEncoding:File.CSVEncoding.UTF8 IncludeColumnNames:false \
                                        IfFileExists:File.IfFileExists.Overwrite ColumnsSeparator:File.CSVColumnsSeparator.SystemDefault
                    end
                    
            end
    end
    
end


#Clicking without wait. With wait doesn't work for File upload button.
WebAutomation.ClickNoWait BrowserInstance:Browser  Control:RPAChallenge.Submit.Submit

#Gives file name to be uploaded
UIAutomation.FormFilling.FocusTextField TextField:RPAChallenge.Browser.Filename
UIAutomation.FormFilling.PopulateTextField TextField:RPAChallenge.Browser.Filename  Text:Path+"\FileToUpload.csv"

#Clicks open
UIAutomation.Click Element:RPAChallenge.Browser.Open  ClickType:UIAutomation.ClickType.LeftClick MousePositionRelativeToElement:UIAutomation.RectangleEdgePoint.MiddleCenter OffsetX:0 OffsetY:0

DateTime.Local DateTimeFormat:DateTime.DateTimeFormat.DateAndTime CurrentDateTime=> EndTime
#Calculates execution time
DateTime.Subtract FromDate:EndTime  SubstractDate:StartTime  TimeUnit:DateTime.DifferenceTimeUnit.Seconds TimeDifference=> TimeDifference
WebAutomation.CloseWebBrowser BrowserInstance: Browser
Console.Write Message: TimeDifference

image

Feel free to edit/enhance and proceed with other approaches. Looking forward for the ideas.

Regards
Sahil

2 Likes

@Sahil Please explain the this line:
WebAutomation.Click BrowserInstance:Browser Control:RPAChallenge.InvoiceChallenge.Page(Number:’"’+LoopCounter+’"’)

How to configure it in UiSpy?

Thank you,
Ranjith