RPA Challenge 'Invoice Extraction'

Thanks to community I am getting grasp of Robin.
Tried solving this RPA challenge without using any JS, Python etc to explore Robin functionalities in deep.
Only Tesseract OCR is the external requirement. Took ~30sec to beat.

#Import AppMask
import "C:\\Robin\RPAChallenge.appmask"
#StartTime to check at the end how much time script took.
DateTime.Local DateTimeFormat:DateTime.DateTimeFormat.DateAndTime CurrentDateTime=> StartTime

WebAutomation.LaunchAutomationBrowser Url:"https://rpachallengeocr.azurewebsites.net/"  WindowState:WebAutomation.BrowserWindowState.Maximized ClearCache:False ClearCookies:False BrowserInstance=> Browser
#set OCRPath to "C:\Users\Sahil.Adlakha\AppData\Local\Tesseract-OCR\tesseract.exe"
set OCRPath to "C:\Tesseract-OCR\tesseract.exe"
set Path to "C:\Captcha\Invoice"
set dVar to {["ID","DueDate","InvoiceNo","InvoiceDate","CompanyName","TotalDue"]}

#This file will be uploaded
File.WriteCSV VariableToWrite:dVar  CSVFile:Path+"\FileToUpload.csv" \
                                    CsvFileEncoding:File.CSVEncoding.UTF8 \
                                    IncludeColumnNames:false IfFileExists:File.IfFileExists.Append \
                                    ColumnsSeparator:File.CSVColumnsSeparator.SystemDefault

#Current date to match DueDate in Table
DateTime.Local DateTimeFormat:DateTime.DateTimeFormat.DateOnly CurrentDateTime=> CurrentDate
WebAutomation.Click BrowserInstance:Browser Control:RPAChallenge.InvoiceChallenge.Start
#Counter for 3 pages
loop LoopCounter from 1 to 3
    #Click Page Number. Can go either for PageNumber of Next. LoopCounter is variable being passed to Appmask file
   WebAutomation.Click BrowserInstance:Browser Control:RPAChallenge.InvoiceChallenge.Page(Number:'"'+LoopCounter+'"')  
        #Table extracted
   WebAutomation.DataExtraction.ExtractHtmlTable BrowserInstance:Browser  Control:RPAChallenge.InvoiceChallenge.Table  ExtractedData=> DataTable
        #For All rows in Table
    loop i from 0 to DataTable.rows.count-1
        
        DateTime.Subtract FromDate:CurrentDate  SubstractDate:DataTable[i][2]  TimeUnit:DateTime.DifferenceTimeUnit.Days TimeDifference=> TimeDifference
            if TimeDifference>=0 then
                   
                    WebAutomation.DataExtraction.GetDetailsOfElement BrowserInstance:Browser  \
                                                                     Control:RPAChallenge.InvoiceChallenge.Link(Number:i+1) \
                                                                     AttributeName:"href"  \
                                                                     AttributeValue=> AttributeValue
                    web.DownloadToFile Url:AttributeValue  \
                                        FilePath:"C:\Users\Sahil.Adlakha\Desktop\Captcha\Invoice\\"+DataTable[i][1]+".jpg" \
                                        ConnectionTimeout:30 \
                                        FollowRedirection:True \
                                        ClearCookies:False \
                                        Encoding:''\
                                        DownloadedFile=> DownloadedFile
                    #Convert to Text using OCR
                    set Arguments to ' ' + DownloadedFile + ' ' + "Invoice"
                    system.RunApplicationAndWaitToComplete           ApplicationPath:  OCRPath \
                                                                    CommandLineArguments: Arguments\
                                                                    WorkingDirectory: Path\
                                                                    WindowStyle:System.ProcessWindowStyle.Minimized \
                                                                    Timeout:0 \
                                                                    ProcessId=> \
                                                                    ProcessId ExitCode=> ExitCode
                    File.ReadText File:Path+"\Invoice.txt"  Encoding:File.TextFileEncoding.UTF8 Content=> Content
                    #Check which kind of invoice is extracted
                    #Data from text file is extracted using REGEX
                    if "Aenean" in Content then
                        #Getting InvoiceDate
                        Text.RegexParse Text:Content  TextToFind:'([12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))'  \
                                        StartingPosition:1  IgnoreCase:False OccurrencePositions=> OccurrencePositions Matches=> InvoiceDate
                        #Getting Total
                        Text.RegexParse Text:Content  TextToFind:'(Total (\d){4}\.\d{2})'  StartingPosition:1  IgnoreCase:False OccurrencePositions=> OccurrencePositions Matches=> Total
                        #Getting InvoiceNumber
                        Text.RegexParse Text:Content  TextToFind:'#\d{6}'  StartingPosition:1  IgnoreCase:False OccurrencePositions=> OccurrencePositions Matches=> InvoiceNumber
                        Text.Replace Text:InvoiceNumber  TextToFind:'\r\n|#'  ReplaceWith:''  IsRegEx:true IgnoreCase:False ActivateEscapeSequences:False Result=> InvoiceNumber
                        Text.Trim Text:Total TrimOption:Text.TrimOption.Both TrimmedText=> Total
                        Text.Replace Text:Total  TextToFind:'Total '  ReplaceWith:''  IsRegEx:false IgnoreCase:False ActivateEscapeSequences:False Result=> Total
                        Text.Trim Text:InvoiceDate TrimOption:Text.TrimOption.Both TrimmedText=> InvoiceDate
                        text.ToDateTimeCustomFormat Text:InvoiceDate  CustomFormat:"yyyy-MM-d"  DateTime=> TextAsDateTime
                        text.FromCustomDateTime DateTime:TextAsDateTime  CustomFormat:"dd-MM-yyyy"  Result=> InvoiceDate
                        Console.Write Message:  InvoiceNumber+","+Total+","+InvoiceDate+","+DataTable[i][1]+","+DataTable[i][2]
                        set dVar to { [DataTable[i][1], DataTable[i][2],InvoiceNumber,InvoiceDate,"Aenean LLC",Total] }
                        #Writes to CSV
                        File.WriteCSV VariableToWrite:dVar  CSVFile:Path+"\FileToUpload.csv"  \
                                        CsvFileEncoding:File.CSVEncoding.UTF8 IncludeColumnNames:false \
                                      IfFileExists:File.IfFileExists.Overwrite ColumnsSeparator:File.CSVColumnsSeparator.SystemDefault
                    else if "Amet" in Content then
                        #Getting InvoiceDate
                        Text.RegexParse Text:Content  TextToFind:'(([A-z]* (\d{2}|\d{1}), ([12]\d{3})))'  StartingPosition:1  IgnoreCase:False OccurrencePositions=> OccurrencePositions Matches=> InvoiceDate
                        #Getting Total
                        Text.RegexParse Text:Content  TextToFind:'Total \$\d*,\d*\.\d{2}'  StartingPosition:1  IgnoreCase:False OccurrencePositions=> OccurrencePositions Matches=> Total
                        #Getting InvoiceNumber
                        Text.RegexParse Text:Content  TextToFind:'(#\s?\d{5})'  StartingPosition:1  IgnoreCase:False OccurrencePositions=> OccurrencePositions Matches=> InvoiceNumber
                        Text.Replace Text:InvoiceNumber  TextToFind:'\r\n|#| '  ReplaceWith:''  IsRegEx:true IgnoreCase:False ActivateEscapeSequences:False Result=> InvoiceNumber
                        Text.Replace Text:Total  TextToFind:'\r\n|\$|\,'  ReplaceWith:''  IsRegEx:true IgnoreCase:False ActivateEscapeSequences:False Result=> Total
                        Text.Replace Text:Total  TextToFind:'Total '  ReplaceWith:''  IsRegEx:false IgnoreCase:False ActivateEscapeSequences:False Result=> Total
                        Text.Replace Text:InvoiceDate  TextToFind:'\r\n'  ReplaceWith:''  IsRegEx:true IgnoreCase:False ActivateEscapeSequences:False Result=> InvoiceDate
                        text.ToDateTimeCustomFormat Text:InvoiceDate  CustomFormat:"MMM d, yyyy"  DateTime=> TextAsDateTime
                        text.FromCustomDateTime DateTime:TextAsDateTime  CustomFormat:"dd-MM-yyyy"  Result=> InvoiceDate
                        Console.Write Message:  InvoiceNumber+","+Total+","+InvoiceDate+","+DataTable[i][1]+","+ DataTable[i][2]
                        set dVar to { [DataTable[i][1], DataTable[i][2],InvoiceNumber,InvoiceDate,"Sit Amet Corp.",Total] }
                        #Writes to CSV
                        File.WriteCSV VariableToWrite:dVar  CSVFile:Path+"\FileToUpload.csv" \
                                        CsvFileEncoding:File.CSVEncoding.UTF8 IncludeColumnNames:false \
                                        IfFileExists:File.IfFileExists.Overwrite ColumnsSeparator:File.CSVColumnsSeparator.SystemDefault
                    end
                    
            end
    end
    
end


#Clicking without wait. With wait doesn't work for File upload button.
WebAutomation.ClickNoWait BrowserInstance:Browser  Control:RPAChallenge.Submit.Submit

#Gives file name to be uploaded
UIAutomation.FormFilling.FocusTextField TextField:RPAChallenge.Browser.Filename
UIAutomation.FormFilling.PopulateTextField TextField:RPAChallenge.Browser.Filename  Text:Path+"\FileToUpload.csv"

#Clicks open
UIAutomation.Click Element:RPAChallenge.Browser.Open  ClickType:UIAutomation.ClickType.LeftClick MousePositionRelativeToElement:UIAutomation.RectangleEdgePoint.MiddleCenter OffsetX:0 OffsetY:0

DateTime.Local DateTimeFormat:DateTime.DateTimeFormat.DateAndTime CurrentDateTime=> EndTime
#Calculates execution time
DateTime.Subtract FromDate:EndTime  SubstractDate:StartTime  TimeUnit:DateTime.DifferenceTimeUnit.Seconds TimeDifference=> TimeDifference
WebAutomation.CloseWebBrowser BrowserInstance: Browser
Console.Write Message: TimeDifference

image

Feel free to edit/enhance and proceed with other approaches. Looking forward for the ideas.

Regards
Sahil

2 Likes

@Sahil Please explain the this line:
WebAutomation.Click BrowserInstance:Browser Control:RPAChallenge.InvoiceChallenge.Page(Number:ā€™"ā€™+LoopCounter+ā€™"ā€™)

How to configure it in UiSpy?

Thank you,
Ranjith

@Ranjith
Sorry for late reply.
below is selector in UiSpy. Variable/Parameter will go in angular brackets. In this case LoopCounter is going in Number.
a[class*="paginate_button "][data-dt-idx=<Number>]

2 Likes

Thank you @Sahil, How much time it takes to complete the challenge?

I completed the challenge in 41 seconds with a different approach.

@Ranjith It took around 30-35secs.

1 Like

can you please explain me what you have written in The TextToFind field.
I have gone through your code but Iā€™m not getting these part of regex.
please explain ā€¦

Thanks and Regards
Snehal Bhole

Hi Snehal

TextToFind is used in many places. I will try to explain one of those to give you the grasp of it.

Here data in Content will searched for date using this regex expression.
In TextToFind we are putting our required regex expression to find the required.

Text.RegexParse Text:Content TextToFind:'(([A-z]* (\d{2}|\d{1}), ([12]\d{3})))' StartingPosition:1 IgnoreCase:False OccurrencePositions=> OccurrencePositions Matches=> InvoiceDate

( - Grouping starts. Groups multiple tokens and creates a capture group.
[A-z] - Matching set. Matches any character in the set. Here it looks for any of the characters from A to Z(including both upper and lower case)

    • Matches 0 or more preceding token
      ( - Another groups start.
      \d - Matches any digit characted.
      {2} - Quantifier to match preceding token.
      | - Acts as Boolean OR
      So - (\d{2} | \d{1} ) - this will look for either 2digit number or 1digit number.
      , - Looks for Comma
    • Whiteline characted space will look for space in the content
      ( [12],\d{3}) - this express will find year starting with either 1 or 2 followed by 3digit number.
      ) - Grouping ends

So whole expression will look for something line
March 21, 1992 but will not find March 21, 3999 or March 222, 2000 or March or Hello etc

Hope I have clarified your doubt to some level. Let know if you are not able to understand something specific.

2 Likes

Thank You @Sahil . I understand the concept for Regex.