Extract indexed websites

 
  

LeGioN
Member
 

Extract indexed websites

Posted: Mar 25, 19 08:25

Hi,

This might be a really dumb question...
But here is the scenario:

Somebody creates a webpage.
It gets indexed by Google.
It then gets deleted.

The webpage is no longer accessible, but you can still see bits of it through good old-fashioned Googling, as it has been indexed.

Is there a way to extract everything that Google has indexed?

If this even makes sense :)


/LeGioN  
 
  

LeGioN
Member
 

Re: Extract indexed websites

Posted: Mar 25, 19 08:47

Additional info:
I have tried the Wayback Machine website, unsuccessfully, as the page I need was not captured.
 
  

tootypeg
Senior Member
 

Re: Extract indexed websites

Posted: Mar 25, 19 09:46

Not sure I fully understand the scenario. Maybe it's still in the browser cache of a suspect? For example, make Chrome work offline and rebuild the page from the cache?
 
  

jaclaz
Senior Member
 

Re: Extract indexed websites

Posted: Mar 25, 19 10:01

As I see it, a page that no longer exists has EITHER been archived (on the Wayback Machine or on other services) or not.
If not, and if it has been crawled by Google (usually it has, since the Google crawler is damn efficient), it may be in the cache.
The Google cache is only temporary, so you might (or might not) be "on time" to still get it.
Also, unlike archive.org/Wayback Machine, the Google cache holds only the version from the "last" time Google visited the page, so if the page has been replaced by another page, even briefly, you will find the latter in the Google cache.

To access the Google cache easily, you may want to try:
cachedview.com/

There are other archiving/caching resources. Even if they are "tiny" when compared to Google or archive.org, it costs nothing to check whether, by sheer luck, something of interest has been cached/archived by them. Example:
www.waybackmachinedown...chive-org/

A "complete" list is here:
en.wikipedia.org/wiki/...nitiatives
(though most are dedicated to "institutional" websites)

jaclaz
_________________
- In theory there is no difference between theory and practice, but in practice there is. - 
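
The "archived or not" check against the Wayback Machine can also be scripted. What follows is only a minimal sketch, assuming the Internet Archive's public availability endpoint (archive.org/wayback/available) and its documented JSON shape; the function names are made up for illustration, and it only covers archive.org, not the Google cache or the smaller archives mentioned above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint (assumed unchanged).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL for the availability API."""
    params = {"url": page_url}
    if timestamp:
        # Optional YYYYMMDDhhmmss (or a prefix of it); the API answers
        # with the capture closest to this point in time.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Return (timestamp, snapshot_url) from a decoded response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def lookup(page_url, timestamp=None):
    """Query the API over the network; None means no capture was reported."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Something like `lookup("example.com/deleted-page", "20190325")` (a placeholder URL) would return the capture nearest to that date, or `None` if archive.org reports nothing for that exact URL.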
 
  

LeGioN
Member
 

Re: Extract indexed websites

Posted: Mar 25, 19 10:42

This was the sort of stuff I was hoping you'd show up with!
I tried both cachedview and Wayback without much success, but I am going to give Wayback another go.

I had some success with Google Index Retriever by elevenpaths, but it did not quite get me all the good stuff I was hoping for.

And my bad, tootypeg, I did not specify that there are no physical devices involved. Just a deleted URL. :)

/LeGioN  
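
When giving the Wayback Machine another go, one option is to list every capture under the site's URL prefix rather than looking up the single deleted page, in case a neighbouring page, an older URL variant or a listing page was captured. The sketch below assumes the Wayback CDX server endpoint (web.archive.org/cdx/search/cdx) and its `matchType`, `output`, `fl`, `collapse` and `limit` parameters; the helper names and the example prefix are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback CDX server endpoint (assumed unchanged).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(site_prefix, limit=50):
    """Build a CDX query listing captures whose URL starts with site_prefix."""
    params = {
        "url": site_prefix,
        "matchType": "prefix",   # everything under this prefix
        "output": "json",        # header row followed by data rows
        "fl": "timestamp,original,statuscode",
        "collapse": "urlkey",    # keep one row per distinct URL
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(rows):
    """Turn CDX JSON output (header row first) into a list of dicts."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def replay_url(capture):
    """Build the Wayback replay URL for one parsed capture."""
    return "https://web.archive.org/web/{timestamp}/{original}".format(**capture)


def list_captures(site_prefix, limit=50):
    """Fetch and parse the capture list over the network."""
    with urlopen(cdx_query_url(site_prefix, limit), timeout=60) as resp:
        body = resp.read().decode("utf-8").strip()
    return parse_cdx_rows(json.loads(body) if body else [])
```

Calling `list_captures("example.com/blog/")` (a placeholder prefix) and passing each result to `replay_url` gives links that can be opened by hand. An empty list only means archive.org holds nothing under that prefix, not that no other archive does.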
 
