(Email) Email table extraction into excel

How can we Extract table from email content and save it into an excel file?
can anyone help please?
thanks in advance!

1 Like

First you need to retrieve the email and after that you can use the following logic to work on RetrivedEmails[0].Body variable.

You need convert this content into text by using Text module after you can write it into excel file.

In text module you can find many useful actions which will help you to build the proper text.

Plus you need to be hands on Regular Expression to work on such kind of use cases.

First of all you need to remove html tags from the body. To Remove the all html tags from table and convert it into text, please refer the attached code.

image

3 Likes

@abhi1100 we dont have any direct option availabe in Robin to extract table from email body. But we can build custom module to extract table from email body.

1 Like

@Ranjith @abhi1100 kindly refer the above approach, you can extract table data from email body with this approach. It is an direct approach.

Let me know if you need any clarifications.

2 Likes

Admin note: Always post in the appropriate category using a descriptive title.
Please respect the rules.

2 Likes

Thanks, @Suniel for your suggestion. When I try to print the mail body I’m getting a string like this:

The text replace code won’t work in this case. Correct me if I’m wrong

BlockquoteEmail.Retrieve
IMAPServer: “imap.gmail.com
Port:993
EnableSSL:True
Username: "@gmail.com"
Password: “”
MailsToRetrieve:Email.RetrieveMessagesMode.All
MarkAsRead:False
MailFolder : “inbox”
SubjectContains:‘Table Test’
RetrievedEmails=> RetrievedEmails

set mailString to RetrievedEmails[0].body

Text.Replace Text: RetrievedEmails[0].body TextToFind:’(<[^>]*>)’ ReplaceWith:"" IsRegEx:True IgnoreCase:False ActivateEscapeSequences:False Result=> ReplacedText

Console.Write Message: ReplacedText

OnConsole:

<div dir="ltr"><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div>&lt;!DOCTYPE html&gt;<br>&lt;html&gt;<br>&lt;head&gt;<br>&lt;style&gt;<br>table {<br>  font-family: arial, sans-serif;<br>  border-collapse: collapse;<br>  width: 100%;<br>}<br><br>td, th {<br>  border: 1px solid #dddddd;<br>  text-align: left;<br>  padding: 8px;<br>}<br><br>tr:nth-child(even) {<br>  background-color: #dddddd;<br>}<br>&lt;/style&gt;<br>&lt;/head&gt;<br>&lt;body&gt;<br><br>&lt;h2&gt;HTML Table&lt;/h2&gt;<br><br>&lt;table&gt;<br>  &lt;tr&gt;<br>    &lt;th&gt;Company&lt;/th&gt;<br>    &lt;th&gt;Contact&lt;/th&gt;<br>    &lt;th&gt;Country&lt;/th&gt;<br>  &lt;/tr&gt;<br>  &lt;tr&gt;<br>    &lt;td&gt;Alfreds Futterkiste&lt;/td&gt;<br>    &lt;td&gt;Maria Anders&lt;/td&gt;<br>    &lt;td&gt;Germany&lt;/td&gt;<br>  &lt;/tr&gt;<br>  &lt;tr&gt;<br>    &lt;td&gt;Centro comercial Moctezuma&lt;/td&gt;<br>    &lt;td&gt;Francisco Chang&lt;/td&gt;<br>    &lt;td&gt;Mexico&lt;/td&gt;<br>  &lt;/tr&gt;<br>  &lt;tr&gt;<br>    &lt;td&gt;Ernst Handel&lt;/td&gt;<br>    &lt;td&gt;Roland Mendel&lt;/td&gt;<br>    &lt;td&gt;Austria&lt;/td&gt;<br>  &lt;/tr&gt;<br>  &lt;tr&gt;<br>    &lt;td&gt;Island Trading&lt;/td&gt;<br>    &lt;td&gt;Helen Bennett&lt;/td&gt;<br>    &lt;td&gt;UK&lt;/td&gt;<br>  &lt;/tr&gt;<br>  &lt;tr&gt;<br>    &lt;td&gt;Laughing Bacchus Winecellars&lt;/td&gt;<br>    &lt;td&gt;Yoshi Tannamuri&lt;/td&gt;<br>    &lt;td&gt;Canada&lt;/td&gt;<br>  &lt;/tr&gt;<br>  &lt;tr&gt;<br>    &lt;td&gt;Magazzini Alimentari Riuniti&lt;/td&gt;<br>    &lt;td&gt;Giovanni Rovelli&lt;/td&gt;<br>    &lt;td&gt;Italy&lt;/td&gt;<br>  &lt;/tr&gt;<br>&lt;/table&gt;<br><br>&lt;/body&gt;<br>&lt;/html&gt;<br></div></div></div></div>

Can you share me screenshot with output from the replace text action?

1 Like

Hi all, here’s a stab it at. I’m using a variable loaded with HTML snagged from w3schools.com instead of pulling in an HTML table from an email message, just for demo purposes.

set guineapig to """
<div class="w3-example">
<h3>Example</h3>
<div class="w3-white w3-padding notranslate w3-padding-16">
<table id="customers">
  <tr>
    <th>Company</th>
    <th>Contact</th>
    <th>Country</th>
  </tr>
  <tr>
    <td>Alfreds Futterkiste</td>
    <td>Maria Anders</td>
    <td>Germany</td>
  </tr>
  <tr>
    <td>Centro comercial Moctezuma</td>
    <td>Francisco Chang</td>
    <td>Mexico</td>
  </tr>
  <tr>
    <td>Ernst Handel</td>
    <td>Roland Mendel</td>
    <td>Austria</td>
  </tr>
  <tr>
    <td>Island Trading</td>
    <td>Helen Bennett</td>
    <td>UK</td>
  </tr>
  <tr>
    <td>Laughing Bacchus Winecellars</td>
    <td>Yoshi Tannamuri</td>
    <td>Canada</td>
  </tr>
  <tr>
    <td>Magazzini Alimentari Riuniti</td>
    <td>Giovanni Rovelli</td>
    <td>Italy</td>
  </tr>
</table>
</div>
<a class="w3-btn w3-margin-top w3-margin-bottom" href="tryit.asp?filename=tryhtml_table_intro" target="_blank">Try it Yourself</a>
</div>
"""

text.Replace \
    Text: guineapig \
    TextToFind: '<[a-zA-Z\/][^>]*>' \
    ReplaceWith:  ""\
    IsRegEx:True \
    IgnoreCase:False \
    ActivateEscapeSequences:False \
    Result=> ReplacedText
    
Console.Write Message: ReplacedText

Output:

Example


  
    Company
    Contact
    Country
  
  
    Alfreds Futterkiste
    Maria Anders
    Germany
  
  
    Centro comercial Moctezuma
    Francisco Chang
    Mexico
  
  
    Ernst Handel
    Roland Mendel
    Austria
  
  
    Island Trading
    Helen Bennett
    UK
  
  
    Laughing Bacchus Winecellars
    Yoshi Tannamuri
    Canada
  
  
    Magazzini Alimentari Riuniti
    Giovanni Rovelli
    Italy
  


Try it Yourself


Execution completed successfully.

I hope this is helpful (by the way, I filched the regex from here, I didn’t write it myself).

HOWEVER: This doesn’t really answer the OP’s question - how to get it into an Excel file. It will be easy enough if you know in advance what the output will look like, but if the intent is to grab ANY HTML table from ANY message, this is really just the first part of a solution.

I would like to see native Robin functionality like that of combining Beautiful Soup with pandas (Python) for extracting table data that isn’t really structured as a table but appears as one, and there is a .NET library I just discovered that may at some point help: Pandas.NET, but all the docs and internal messages are in Mandarin at the moment. Microsoft is making a stab at dealing with dataframes also: see this intro to DataFrame.

I bet I’m not the only frustrated sometimes by trying to extract (as a table) something that looks like a table, walks like a table, quacks like a table, but isn’t really an HTML table. :grinning:

Regards,
burque505

1 Like

Can you try with the same code from my above screenshot with all indentation as it is?