JPEG carving/identifying/recovering

Posted: Sat Sep 27, 2014 5:56 am

For context this post originates from this thread:
www.forensicfocus.com/...c/t=12127/

- EvaMendis

Actually, It works with identifying JPGE file when Block begins :
- 0xff, 0xd8, 0xff, 0xe0
- 0xff, 0xd8, 0xff, 0xo1
- 0r 0xff, 0xd8, 0xff, 0xfe



This is incorrect. Please check the JPEG format specification: a JPEG should start with 0xff, 0xd8 (according to its spec); the bytes that follow are common, but other values are possible.

Last edited by joachimm on Thu Oct 02, 2014 4:38 am; edited 1 time in total

joachimm

Re: manually deleted images

Posted: Sat Sep 27, 2014 9:41 am

- joachimm

This is incorrect. Please check the JPEG format specification: a JPEG should start with 0xff, 0xd8 (according to its spec); the bytes that follow are common, but other values are possible.


Just for the record that info is stated in the "introductory" page of photorec:
www.cgsecurity.org/wik...oRec_works
and on the "developers" page:
www.cgsecurity.org/wiki/Developers
and has been posted verbatim (but with a typo) by EvaMendis.

The pattern used in Photorec is definitely that one:
git.cgsecurity.org/cgi...file_jpg.c
most probably it derives from "observation of wild files in nature".

CNWrecovery:
www.cnwrecovery.com/ma...rving.html
seemingly uses the same approach (but limited to FFD8FFE0 and FFD8FFE1)

Possibly to avoid false positives?

The generic pattern like FFD8 might provide too many results:
www.ocf.berkeley.edu/~...pegrescue/

Trid's XML definition:
mark0.net/soft-tridscan-e.html
uses FFD8FF instead (which is possibly a "good compromise").

jaclaz
_________________
- In theory there is no difference between theory and practice, but in practice there is. - 

  

Re: manually deleted images

Posted: Sat Sep 27, 2014 11:00 am

- jaclaz

Just for the record that info is stated in the "introductory" page of photorec:


Taken out of context the documentation might give you the idea that you are correct but if you read on:


If PhotoRec has already started to recover a file, it stops its recovery, checks the consistency of the file when possible and starts to save the new file (which it determined from the signature it found).


Also if you look at the source code you see that photorec does much more to determine if it's dealing with a JPEG than the documentation indicates.

- jaclaz

The generic pattern like FFD8 might provide too many results:


Yes, if the byte pattern is your only criterion, the signal-to-noise ratio is poor.
But photorec also uses block alignment and format validation, which makes it produce higher quality results. Alas, this technique is not suitable for every file system.

- jaclaz

uses FFD8FF instead (which is possibly a "good compromise")


This is indeed the longest unique byte signature of the start of a JPEG that conforms to the specification.
This does not mean you cannot use a longer signature if you know what you're looking for. For context, it is not uncommon to see JPEG files that start with: 0xff, 0xd8, 0xff, 0xe[2-9]
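
To make the trade-off concrete, here is a minimal sketch that counts how many offsets in a buffer match signatures of increasing length. The helper name `count_hits` and the sample buffer are made up for illustration; no real carver is being modeled.

```python
def count_hits(data: bytes, signature: bytes) -> int:
    """Count every offset in data at which signature occurs."""
    hits = 0
    pos = data.find(signature)
    while pos != -1:
        hits += 1
        pos = data.find(signature, pos + 1)
    return hits


# Hypothetical buffer: one plausible JPEG start and one stray FF D8 pair.
sample = (
    b"\x00" * 8
    + b"\xff\xd8\xff\xe2"   # SOI followed by an APP2 marker
    + b"\x10" * 8
    + b"\xff\xd8\x41"       # FF D8 that is not followed by FF
    + b"\x00" * 4
)

for sig in (b"\xff\xd8", b"\xff\xd8\xff", b"\xff\xd8\xff\xe0"):
    print(sig.hex(), count_hits(sample, sig))
```

On this buffer the two-byte signature also hits the stray pair, while the four-byte FFD8FFE0 signature misses the APP2 start entirely.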

joachimm

Re: manually deleted images

Posted: Sat Sep 27, 2014 1:21 pm

- joachimm

Also if you look at the source code you see that photorec does much more to determine if it's dealing with a JPEG than the documentation indicates.


Well, I was trying to be more accurate than the previous poster, and unless there are further "overrides" in other parts of the source code, this still seems to me pretty much accurate:

- jaclaz

The pattern used in Photorec is definitely that one:
git.cgsecurity.org/cgi...file_jpg.c


It seems to me like the patterns used are:
static const unsigned char jpg_header_app0[4]= { 0xff,0xd8,0xff,0xe0};
static const unsigned char jpg_header_app1[4]= { 0xff,0xd8,0xff,0xe1};
static const unsigned char jpg_header_app12[4]= { 0xff,0xd8,0xff,0xec};
static const unsigned char jpg_header_com[4]= { 0xff,0xd8,0xff,0xfe};


The rest are (seemingly) "further checks", ONCE the file header has been recognized as per above.

To make sure, I ran photorec on a FAT12 floppy image to which I had written (and deleted) a .jpg image several times, hex-editing the fourth byte each time.
Photorec found it when the fourth byte was E0, E1, EC and FE, BUT it failed to recover it with fourth byte E2, E3 and E9 (I did not test other values).
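
The observed behaviour is consistent with a plain comparison against the four constants quoted above. A minimal sketch of just that first comparison follows; `header_recognized` is a hypothetical name, the 512-byte block is an assumption, and none of PhotoRec's later checks are modeled.

```python
# The four header constants quoted from file_jpg.c in this post.
PHOTOREC_JPG_HEADERS = (
    bytes([0xFF, 0xD8, 0xFF, 0xE0]),  # jpg_header_app0
    bytes([0xFF, 0xD8, 0xFF, 0xE1]),  # jpg_header_app1
    bytes([0xFF, 0xD8, 0xFF, 0xEC]),  # jpg_header_app12
    bytes([0xFF, 0xD8, 0xFF, 0xFE]),  # jpg_header_com
)


def header_recognized(block: bytes) -> bool:
    """True if the block starts with one of the four 4-byte patterns."""
    return block.startswith(PHOTOREC_JPG_HEADERS)


# Fourth-byte values tried in the floppy image test.
for fourth in (0xE0, 0xE1, 0xEC, 0xFE, 0xE2, 0xE3, 0xE9):
    block = bytes([0xFF, 0xD8, 0xFF, fourth]) + b"\x00" * 508
    print(f"{fourth:02X}", header_recognized(block))
```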

And while I don't doubt in the least that the "proper" way is the three bytes FFD8FF (as TriD BTW uses), I was merely stating the fact that Testdisk does check for 4 bytes and that the fourth byte must be any of E0, E1, EC or FE in order for the file to be recognized and recovered, which is consistent with the provided quotes.

As such the documentation (in or out of context) seems like reflecting accurately what the tool actually does (which does not mean that the approach used is the "right" one, I was ONLY reporting what patterns were used in a few tools).

You should contact Christophe Grenier about the "missing" patterns or about the approach photorec actually uses being incorrect.


jaclaz
  

Re: manually deleted images

Posted: Sat Sep 27, 2014 4:10 pm

- jaclaz

Photorec found it when the fourth byte was E0, E1, EC and FE, BUT it failed to recover it with fourth byte E2, E3 and E9 (I did not test other values).


Thx for testing. This is a very good, objective approach to validating tooling and how it works.

No idea why the author strayed from the spec here. I looked up my notes on which segments are allowed first after the start of image (ff d8); the signatures are represented as binary string expressions:

application segment: "\xff[\xe3-\xef]"

Table segments:
"\xff\xc4" # Define Huffman table (DHT)
"\xff\xcc" # Arithmetic coding condition table (DAC)
"\xff\xdb" # Define quantization table (DQT)

Reserved segments:
"\xff\xc8" # Start of Frame (JPG) (Reserved for JPEG extensions)
"\xff[\xf0-\xfd]" # Reserved for JPEG extensions
"\xff\xfe" # Comment (COM)
"\xff[\x02-\xbf]" # Reserved

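These notes translate fairly directly into a lookup on the byte that follows ff d8 ff. The sketch below is one possible encoding; `classify_first_marker` is a hypothetical helper, and it assumes the full APP0 to APP15 range (e0 to ef) for application segments, which is wider than the e3 to ef range written in the notes.

```python
def classify_first_marker(data: bytes) -> str:
    """Classify the first marker after SOI using the categories in the notes."""
    if len(data) < 4 or data[:3] != b"\xff\xd8\xff":
        return "no SOI followed by a marker"
    code = data[3]
    if 0xE0 <= code <= 0xEF:        # assumption: full APPn range, not only e3-ef
        return "application segment"
    if code in (0xC4, 0xCC, 0xDB):  # DHT, DAC, DQT
        return "table segment"
    if code == 0xFE:                # COM
        return "comment"
    if code == 0xC8 or 0xF0 <= code <= 0xFD or 0x02 <= code <= 0xBF:
        return "reserved"
    return "not listed in the notes"


for fourth in (0xE0, 0xE9, 0xDB, 0xFE, 0xC0):
    print(f"{fourth:02x}", classify_first_marker(bytes([0xFF, 0xD8, 0xFF, fourth])))
```
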
- jaclaz

And while I don't doubt in the least that the "proper" way is the three bytes FFD8FF (as TriD BTW uses), I was merely stating the fact that Testdisk does check for 4 bytes and that the fourth byte must be any of E0, E1, EC or FE in order for the file to be recognized and recovered, which is consistent with the provided quotes.


I assume you mean photorec here instead of testdisk. As indicated, there is more to it.
To repeat: the signature must be block aligned as well, and photorec also does format validation, which matters when files are fragmented, e.g. by the file system itself. This has implications for when to use the tool and when not to. So photorec might not find carvable files if the situation is not favorable.

In the revit proof of concept carver the sequence "ff d8" was sufficient, since file format validation is done; this should suffice for photorec as well. No idea why it was implemented in this manner in photorec.

- jaclaz

As such the documentation (in or out of context) seems like reflecting accurately what the tool actually does (which does not mean that the approach used is the "right" one, I was ONLY reporting what patterns were used in a few tools).


No worries, the remark regarding the documentation is largely to point out the missing important line that follows the highlighted section.

- jaclaz

You should contact Christophe Grenier about the "missing" patterns or about the approach photorec actually uses being incorrect.


I can drop him a mail asking him to improve the JPEG format support and to follow the spec.

I think this is a nice illustration of assumptions about tools. To be explicit, I'm NOT of the opinion that the technique photorec uses is incorrect (in the strict sense of the word). IMO the cons described are a side effect of the technique used. I agree that the JPEG format support can be improved.

There are pros and cons to the techniques tools use, and you (in general) as the user will need to know the favorable and unfavorable circumstances. Carving and recovery are particularly tricky matters because sometimes recall matters, sometimes precision.

My remark that the EvaMendis post is incorrect concerns several aspects:

The photorec wiki gives it as an example:

For example, PhotoRec identifies a JPEG file when a block begins with:


The EvaMendis post:

Actually, It works with identifying JPGE file when Block begins :


The typos aside, I hope you can see the semantic difference. I'm missing the reasoning by which "for example" became "actually". And as you and I have pointed out, there is significantly more to it than the poster indicates. Which IMO is a nice example of why it is important to look under the hood and do cross-checking.

joachimm

Re: manually deleted images

Posted: Sat Sep 27, 2014 6:19 pm

- joachimm

I assume you mean photorec here instead of testdisk.


Yep, my bad, I meant Photorec and not Testdisk, of course.

My personal opinion is, as said, that the right way is to check for the three bytes FFD8FF (and be more "flexible" about the fourth byte). Considering the additional checks that Photorec has (as you pointed out), i.e. block alignment and, I may add, the "footer" check, that should be enough to avoid most "false positives".

We have to take into account, however, that different tools may have a different use (even if they perform the same "function").

Photorec is essentially a Photo Recovery tool and not properly a "forensic" (or however "pure") carver, so it makes a lot of sense that it has "beginning of block check" (which independently from the three or four bytes header patterns will exclude a number of "embedded into other files images", including most "preview images" or "thumbnails" inserted in the EXIF data ).
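
A rough sketch of what a "beginning of block" check implies, assuming a 512-byte block size and the three-byte FFD8FF signature (both are assumptions for illustration, and `aligned_hits` is a hypothetical helper, not PhotoRec code). A signature that starts in the middle of a block, as an embedded thumbnail usually would, is never even looked at.

```python
def aligned_hits(image: bytes, signature: bytes = b"\xff\xd8\xff",
                 block_size: int = 512) -> list:
    """Offsets that are multiples of block_size and start with signature."""
    return [
        offset
        for offset in range(0, len(image), block_size)
        if image.startswith(signature, offset)
    ]


# Hypothetical image: a JPEG starting at block 1 with a thumbnail 188 bytes in.
image = bytearray(4 * 512)
image[512:516] = b"\xff\xd8\xff\xe1"   # block-aligned start of the main image
image[700:704] = b"\xff\xd8\xff\xdb"   # embedded thumbnail, not block-aligned

print(aligned_hits(bytes(image)))
```

Only the offset of the block-aligned start is reported; the thumbnail at offset 700 is skipped because 700 is not a multiple of 512.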

TriD, being a "file identifier", has the "advantage" that it does not need such a check (since what you feed it is an actual file and not a "random address on a disk image") and, more than that, its output is a "probable" file type.

For the record the "file" *nix utility has seemingly the much more "generic" two bytes pattern recognition of FFD8 :
darwinsys.com/file/
github.com/file/file/b...agdir/jpeg

About the semantics, to be picky, as I am, the "proper" description should probably have been something *like*:
As an example, for JPEG images, Photorec first checks if the four bytes at the beginning of a block are any of FFD8FFE0, FFD8FFE1, FFD8FFEC or FFD8FFFE, and IF any of these conditions is met, it tentatively identifies the block as the beginning of a JPEG image and then makes a number of further checks (that the block belongs to a valid JPEG image, the size of the image, etc.) in order to actually recover the file.
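
The two stages in that description (a cheap header match, then structural checks) can be sketched as below. This is not PhotoRec's validation code: the walker only follows the standard marker and length layout of JPEG header segments up to the start of scan (FFDA), it does not handle standalone markers without a length field, and the names `tentative_match` and `walk_header_segments` are hypothetical.

```python
def tentative_match(block: bytes) -> bool:
    """Stage 1: cheap check on the first three bytes."""
    return block[:3] == b"\xff\xd8\xff"


def walk_header_segments(data: bytes) -> tuple:
    """Stage 2: follow marker/length pairs after SOI.

    Returns (markers, ok) where ok is True only if a start of scan
    marker (0xDA) was reached through consistent segment lengths.
    """
    markers = []
    pos = 2  # skip SOI
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return markers, False
        marker = data[pos + 1]
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        if length < 2:
            return markers, False
        markers.append(marker)
        if marker == 0xDA:
            return markers, True
        pos += 2 + length  # the length field counts its own two bytes
    return markers, False


# Hypothetical minimal header: SOI, APP0 with a 14-byte payload, SOS.
app0 = b"\xff\xe0" + (16).to_bytes(2, "big") + b"JFIF\x00" + b"\x00" * 9
candidate = b"\xff\xd8" + app0 + b"\xff\xda\x00\x02"

if tentative_match(candidate):
    print(walk_header_segments(candidate))
```

Because the walker skips each segment by its length field, it does not care whether the first marker is E0, E9 or FE, which is why a three-byte signature plus validation can be enough.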


And we have to note how here:
www.cgsecurity.org/wik...oRec_works
the text is:
For example, PhotoRec identifies a JPEG file when a block begins with:

While here:
www.cgsecurity.org/wiki/Developers
it is:
If the file format specifications aren't available, compare several samples to identify constant fields. In example, PhotoRec identifies a JPEG file when a block begins with:

possibly the distinction/misunderstanding is between "identifying" as in "tentatively identify" and "identify" as "identify and recover without further checks".

But yes, we are both on the same side when it comes to "assumptions" and how frequent they are.

jaclaz
  

Re: manually deleted images

Posted: Mon Sep 29, 2014 2:47 pm

Probably it's just me, and it is quite possible that I am particularly unlucky, but it is strange how every single time I touch a can, it pops open and a zillion worms get loose out of it.

While still pertaining to the carving approach, this is slightly bent towards data recovery, but still IMNHO intriguing.


Test.

Taking a small JPEG (a "normal" JFIF one with header FFD8FFE0) named Base_hexE0.jpg, I made 256 copies of it, named from modded_0x00.jpg to modded_0xFF.jpg, hex-editing the 4th byte of each to the value in its name.

Then I ran jpegsnoop on the whole set of 256 images, from modded_0x00.jpg to modded_0xFF.jpg:
www.impulseadventure.c...snoop.html
in batch mode.

A large number of these modded images were considered "non-valid" JPEGs by the tool, which stopped scanning as soon as, past the SOI "FFD8", it found an invalid 3rd and 4th byte pair.

A few images crashed the tool.

Results:
Values that produced a "valid" log (i.e. that continued the parsing after the header):
01, C4, C8, CC, DE, E0-FE

Values that produced an "invalid" log (i.e. that stopped the parsing after the header):
00, 02-BF, D0-DD, DF, FF


Values that crashed jpegsnoop:
C0-C3, C5-C7, C9-CB, CD-CF

ALL the images that crashed jpegsnoop are (I would say obviously) NOT viewable.

Now the "interesting part".

Among the "valid" values, ONLY these were seen "normally" in an Explorer window in "preview mode" AND could be double clicked and displayed correctly with Microsoft Photo Editor (on XP SP2):
01, CC, E0-EF, FE
Whilst these were NOT viewable:
C4, C8, DE, F0-FD
(but jpegsnoop did display them fine)

BUT among the "invalid" values (that jpegsnoop could NOT display), these were viewable in Explorer/Photo Editor as above:
00, D0-D7, DC, FF
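
As a sanity check on the lists above, the ranges can be expanded and compared. The sketch below only re-encodes the values reported in this post (the helper `expand` is hypothetical) and checks how the three jpegsnoop groups and the viewable values relate.

```python
def expand(spec: str) -> set:
    """Expand a list like '01, C4, E0-FE' into a set of integers."""
    values = set()
    for part in spec.replace(" ", "").split(","):
        if "-" in part:
            low, high = part.split("-")
            values.update(range(int(low, 16), int(high, 16) + 1))
        else:
            values.add(int(part, 16))
    return values


# jpegsnoop results as reported above.
valid = expand("01, C4, C8, CC, DE, E0-FE")
invalid = expand("00, 02-BF, D0-DD, DF, FF")
crash = expand("C0-C3, C5-C7, C9-CB, CD-CF")

# Values viewable in Explorer / Photo Editor, from both groups.
viewable = expand("01, CC, E0-EF, FE") | expand("00, D0-D7, DC, FF")

print(len(valid), len(invalid), len(crash))
print("all 256 values covered:", valid | invalid | crash == set(range(256)))
print("valid but not viewable:", sorted(f"{v:02X}" for v in valid - viewable))
print("invalid but viewable:", sorted(f"{v:02X}" for v in invalid & viewable))
```

The three groups hold 36, 207 and 13 values and together cover every value from 00 to FF exactly once.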

I made a small batch file to replicate this.
In the same directory where you put and run it you need a "base image" (I suggest a small one) called Base_hexE0.jpg, and HexAlter:
kuwanger.net/misc/hexalter.shtml

If you invoke the batch with the /ALL parameter it will create the 256 JPEGs in the same directory, whilst if you invoke it without parameters it will create the images already divided into three subdirectories: Valid, Invalid and Crash.

jaclaz

Code:
@ECHO OFF
SETLOCAL ENABLEEXTENSIONS
SET Base=Base_hexE0.jpg

IF NOT %1.==/ALL. GOTO :makesets


FOR /L %%? IN (0,1,255) DO (
call :changeHex %%?
) 
GOTO :EOF

:makesets

::Valid
SET TargetDir=Valid
IF EXIST .\%TargetDir% RD /S /Q .\%TargetDir%
MD .\%TargetDir%
FOR %%? in (
   01
            C4          C8          CC 
                                          DE 
E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE
) DO CALL :make_jpgs %%?

::Invalid
SET TargetDir=Invalid
IF EXIST .\%TargetDir% RD /S /Q .\%TargetDir%
MD .\%TargetDir%
FOR %%? in (
00    02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF

D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD    DF

                                             FF
) DO CALL :make_jpgs %%?

::Crash
SET TargetDir=Crash
IF EXIST .\%TargetDir% RD /S /Q .\%TargetDir%
MD .\%TargetDir%
FOR %%? in (
C0 C1 C2 C3    C5 C6 C7    C9 CA CB    CD CE CF
) DO CALL :make_jpgs %%?

GOTO :EOF

:make_jpgs
ECHO %1
copy %base% .\%TargetDir%\modded0x%1.jpg>nul
hexalter .\%TargetDir%\modded0x%1.jpg 3=0x%1
GOTO :EOF

:changehex
CMD /C EXIT /B %1
SET "Line=%=ExitCode%"
SET "Line_hex=0x%Line:~-2%"
ECHO copy %base% modded%Line_hex%.jpg
copy %base% modded%Line_hex%.jpg>nul
hexalter modded%Line_hex%.jpg 3=%Line_hex%
GOTO :EOF
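
For readers without HexAlter or a Windows command prompt, the /ALL branch of the batch can be approximated in Python. This is a sketch under assumptions: the function name `make_variants` is made up, and the output names follow the modded_0xNN.jpg pattern from the description rather than the exact names the batch writes.

```python
from pathlib import Path


def make_variants(base: Path, out_dir: Path, offset: int = 3) -> list:
    """Write 256 copies of base, each with one byte at offset replaced."""
    original = base.read_bytes()
    if len(original) <= offset:
        raise ValueError("base file is too short to patch")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for value in range(256):
        patched = bytearray(original)
        patched[offset] = value            # offset 3 is the 4th byte
        target = out_dir / f"modded_0x{value:02X}.jpg"
        target.write_bytes(patched)
        written.append(target)
    return written
```

A call such as `make_variants(Path("Base_hexE0.jpg"), Path("all"))` would write the 256 files into a subdirectory named all; sorting them into Valid, Invalid and Crash is left to the lists given earlier in this post.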
