Ideas for:HTML Tables to spreadsheet. (Fun Side Project)

 
  

dpathan
Member
 

Ideas for:HTML Tables to spreadsheet. (Fun Side Project)

Post Posted: Oct 02, 18 23:05

I came across some data during my work, and the analysis has been completed.

For a side project, I am trying to combine an HTML export script and Excel to put the data in a timeline. The data consists of the messages from all conversations within a date range.

Link to sample html file: drive.google.com/file/...sp=sharing

As you can see, each message that was sent or received is formatted in an HTML <table> tag. There are thousands of messages sent and received, and each one is in its own <table> tag. Further, the messages between participants (see the linked example) are grouped under <table> conversation </table>.

So far I have been able to copy this data directly into Excel and run a VBA macro (from Stack Exchange) to transpose the data into rows and columns. This method helped me accomplish the goal and finish the case.

However, I was thinking of automating this further and making it more dynamic. The VBA script is set to take the data in the first three rows, place it in columns (transpose), and continue until the last selected cell.

Link to VBA macro code: drive.google.com/file/...p=sharing.


If we can find a way to export the nested tables from the HTML directly into a spreadsheet, this would be simpler. The current method only works if there is a constant number of headings for each message, such as Author, Time and Body. It will fail if some messages in the conversation have more than three headings, e.g. Author, Time, Body, Attachments.

Now, we can set the macro to read 3 headings, or 4, or 5, and so on, but that is not dynamic.
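One way to make this dynamic, sketched in Python rather than VBA: start a new message whenever the first heading reappears, and build the column list from whatever headings actually occur. This is only a minimal sketch. It assumes every message begins with "Author" (as in the sample), the function names are made up for illustration, and the "Attachments" value is a hypothetical example.

```python
def group_records(pairs, first_heading="Author"):
    """Group a flat list of (heading, value) pairs into one dict per message."""
    records = []
    for heading, value in pairs:
        # A new message starts each time the first heading reappears.
        if heading == first_heading or not records:
            records.append({})
        records[-1][heading] = value
    return records


def to_rows(records):
    """Return the column list plus one row per message, blank where a heading is missing."""
    columns = []
    for record in records:
        for heading in record:
            if heading not in columns:
                columns.append(heading)
    rows = [[record.get(col, "") for col in columns] for record in records]
    return columns, rows


pairs = [
    ("Author", "Person A (10001)"),
    ("Sent", "2018-04-07 21:42:39 UTC"),
    ("Body", "hello"),
    ("Author", "Person B (20002)"),
    ("Sent", "2018-04-07 18:05:33 UTC"),
    ("Body", "great work"),
    ("Attachments", "photo.jpg"),  # hypothetical fourth heading
]
columns, rows = to_rows(group_records(pairs))
```

Messages with three headings get a blank in the fourth column instead of shifting the data, so the number of headings no longer has to be fixed in advance.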

So far, I can see two ways to go forward with this:
1) Copy the data from the HTML and improve the VBA macro, or
2) Write, find or adapt a JavaScript or Python script to parse the HTML into Excel.

I have also played around with the TableExport JavaScript library, but it is probably built to export simple tables and not nested ones such as those in the HTML example.

Having the conversation in a spreadsheet would help with creating a timeline and analyzing multiple conversations. Once I have this spreadsheet, I can experiment with Gephi to create a social graph based on each person's unique ID, which is in the message.

Let me know if there are any other ideas or theories.  
 
  

jaclaz
Senior Member
 

Re: Ideas for:HTML Tables to spreadsheet. (Fun Side Project)

Post Posted: Oct 03, 18 21:03

From the example you posted, you have a set of tables that can be summed up as a .ini file.

<html>
<head>
</head><body>

<table><tr><th>Author</th><td>Person A (10001)<br /></td></tr></table>
<table><tr><th>Sent</th><td>2018-04-07 21:42:39 UTC<br /></td></tr></table>
<table><tr><th>Body</th><td>hello<br /></td></tr></table><br />
<table><tr><th>Author</th><td>Person A (10001)<br /></td></tr></table>
<table><tr><th>Sent</th><td>2018-04-07 18:08:12 UTC<br /></td></tr></table>
<table><tr><th>Body</th><td>hi there<br /></td></tr></table><br />
<table><tr><th>Author</th><td>Person B (20002)<br /></td></tr></table>
<table><tr><th>Sent</th><td>2018-04-07 18:05:33 UTC<br /></td></tr></table>
<table><tr><th>Body</th><td> great work <br /></td></tr></table>
</body>
</html>

i.e.:
Author=Person A (10001)
Sent=2018-04-07 21:42:39 UTC
Body=hello

etc.
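For anyone who prefers a script to a separate tool, the same th/td to key=value conversion can be sketched with Python's standard html.parser module. This is a rough sketch, not a tested converter: it assumes each <th> is followed by exactly one <td> as in the reduced sample above, and the class name CellParser is made up for illustration.

```python
from html.parser import HTMLParser


class CellParser(HTMLParser):
    """Collect (heading, value) pairs from <th>/<td> cells."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._tag = None       # "th" or "td" while inside such a cell
        self._text = []        # text fragments of the current cell
        self._heading = None   # last <th> text waiting for its <td>

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._tag = tag
            self._text = []

    def handle_data(self, data):
        if self._tag:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Ignore everything except the close of the cell we are in
        # (this also skips the <br /> tags inside the cells).
        if tag != self._tag:
            return
        text = "".join(self._text).strip()
        if tag == "th":
            self._heading = text
        elif self._heading is not None:
            self.pairs.append((self._heading, text))
            self._heading = None
        self._tag = None


sample = """
<table><tr><th>Author</th><td>Person A (10001)<br /></td></tr></table>
<table><tr><th>Sent</th><td>2018-04-07 21:42:39 UTC<br /></td></tr></table>
<table><tr><th>Body</th><td>hello<br /></td></tr></table><br />
"""
parser = CellParser()
parser.feed(sample)
lines = [f"{heading}={value}" for heading, value in parser.pairs]
print("\n".join(lines))
```

Swapping the "=" in the f-string for a TAB or a comma gives the other separators mentioned below.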

So, you could use Nirsoft HTMLAsText to obtain the second format (or use a TAB or comma as a separator).

Then, you could use a plain set of formulas (without any VBA) to index the fields and perform a VLookUp on the contents, given that the first field of each set is always the same (i.e. "Author").
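The index-and-lookup idea can be illustrated in Python as well (the spreadsheet itself would use formulas, this is only an analogy): a running counter goes up by one at every "Author" line, and a lookup on (record number, field name) then returns the value, or a blank when the field is missing. The lookup helper and the variable names are made up for illustration, and the timestamp format is assumed to match the sample.

```python
from datetime import datetime

lines = [
    "Author=Person A (10001)",
    "Sent=2018-04-07 21:42:39 UTC",
    "Body=hello",
    "Author=Person A (10001)",
    "Sent=2018-04-07 18:08:12 UTC",
    "Body=hi there",
    "Author=Person B (20002)",
    "Sent=2018-04-07 18:05:33 UTC",
    "Body=great work",
]

index = 0
table = {}
for line in lines:
    field, _, value = line.partition("=")
    # The first field of each set is always "Author", so it marks a new record.
    if field == "Author":
        index += 1
    table[(index, field)] = value


def lookup(record, field):
    """Return the value for (record, field), or "" if that field is absent."""
    return table.get((record, field), "")


# Order the record numbers by their Sent timestamp to get a timeline.
timeline = sorted(
    range(1, index + 1),
    key=lambda r: datetime.strptime(lookup(r, "Sent"), "%Y-%m-%d %H:%M:%S UTC"),
)
```

With the sample data the earliest message is the third record, so the timeline comes out in the reverse of the order in the export.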

I'll see if I can put together such a spreadsheet.

jaclaz
_________________
- In theory there is no difference between theory and practice, but in practice there is. - 
 
  

jaclaz
Senior Member
 

Re: Ideas for:HTML Tables to spreadsheet. (Fun Side Project)

Post Posted: Oct 04, 18 08:50

Try this:
s000.tinyupload.com/in...3535332111

jaclaz
 
  

dpathan
Member
 

Re: Ideas for:HTML Tables to spreadsheet. (Fun Side Project)

Post Posted: Oct 04, 18 15:41

Thanks. That works perfectly. This will be a great help in analyzing Gmail and FB data, as they are formatted this way in HTML.

Now I have to update it according to the data that comes in from the HTML. Also, in future, as a procedure, I will have to verify the format of the extracted data before I start any analysis based on it.
 
