
Let's talk about MD5

26 Posts
12 Users
0 Likes
2,870 Views
Chris_Ed
(@chris_ed)
Posts: 314
Reputable Member
Topic starter
 

Good morning,

I'd like to talk about MD5.

MD5 is a large part of digital forensic life. The 3 most popular commercial imaging tools (EnCase, FTK, XWF) use MD5 by default to verify the results of the forensic imaging process. Hashsets used to identify file groups are commonly MD5 (in fact I'm not sure EnCase v6 even lets you use a different algorithm to hash individual files).
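
Verification in this sense just means hashing the acquired data, recording the digest, and hashing again later to compare. Below is a minimal Python sketch for a raw (dd-style) image file. The function names are hypothetical. Container formats such as E01 record and verify the hash of the acquired media rather than of the container file's own bytes, so treat this as an illustration of the idea only.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in fixed-size chunks so that a
    multi-gigabyte raw image never has to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_image(path, acquisition_md5):
    """True if the image still hashes to the value recorded at acquisition.
    Case and surrounding whitespace in the recorded value are ignored."""
    return md5_of_file(path) == acquisition_md5.strip().lower()
```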

Now, it is well known that MD5 is "broken". In 2008, in fact, the "US-CERT" group specifically asked software developers to "avoid using the MD5 algorithm in any capacity" (they even put "Do not use the MD5 algorithm" in bold). It is relatively straightforward to produce an MD5 collision using tools readily available on the internet.

Does this mean we, as responsible Digital Forensicators, should throw MD5 out the window? Is it even possible, given some software reliance on it?

Personally, I'm not sure it matters. I have yet to see data manipulated in a significant way and yet still produce the same MD5 - for example, taking an image and completely altering it. Or changing a text file from reading "hey, i love that guy!" to "OH MAN I WISH I COULD MURDER HIM". Or injecting an incriminating JPEG into an E01 file and having it still verify correctly.

I'm not an arrogant person, and I'm more than happy to change my mind as long as the reasons are valid. So I ask you, FF, should we dispose of MD5 forever, or is it still a valid way of verifying file integrity?

(This post was inspired by Jon Stewart's excellent blog post, by the way - read it!)

 
Posted : 05/09/2012 2:37 pm
(@alexc)
Posts: 301
Reputable Member
 

Thinking outside the box for a moment - the fact that you can craft two executables with different functionality but the same MD5 is more worrying to me (and more impressive and more useful).

http://www.mscs.dal.ca/~selinger/md5collision/

And even with signed code: http://blog.didierstevens.com/2009/01/17/playing-with-authenticode-and-md5-collisions/

Hiding malicious activity in executables marked in a hash set as being benign… that's neat.

 
Posted : 05/09/2012 3:02 pm
(@mscotgrove)
Posts: 938
Prominent Member
 

Most use of MD5 is as a digital signature to indicate that a file has not been changed. For this I would argue it is fine.

The problem is when someone very able creates files with benign MD5 values (e.g., to match a distributed Microsoft file). This would be a very deliberate act and not a chance collision. If discovered, it would raise questions very quickly.

If an investigator is looking for files based on just MD5 values then they must be aware of possible problems and should currently use SHA-1 or better. If it is just to detect corruption in a disk image, then MD5 should be fine.

 
Posted : 05/09/2012 3:29 pm
Chris_Ed
(@chris_ed)
Posts: 314
Reputable Member
Topic starter
 

It is neat! And I can see how it compromises software which relies on MD5s for security. But even then, you can't generate a "targeted" MD5 collision; in the comments he specifically mentions this.

I am asking because I have seen talk that recently, a defence attorney successfully argued that the digital evidence could not be relied upon because the MD5 algorithm is compromised. My feeling is that this is wrong in a digital forensics context: once you have acquired your data (and produced an MD5 checksum), even with what we know about MD5 collisions, you cannot significantly change this data and produce the same checksum.
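
The everyday case the post relies on can be illustrated with the opening post's own example strings: any edit, even a single character, yields an unrelated-looking digest. Published MD5 collision attacks let someone build two *new* inputs that share a digest. Matching a digest already recorded at acquisition would instead need a second-preimage attack, and as far as I'm aware no practical one against MD5 has been published. The `differing_bits` helper is hypothetical.

```python
import hashlib

original = b"hey, i love that guy!"
rewritten = b"OH MAN I WISH I COULD MURDER HIM"
one_char = b"hey, i love that guy?"  # a single character changed


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 128 MD5 output bits differ between two inputs."""
    da, db = hashlib.md5(a).digest(), hashlib.md5(b).digest()
    return sum(bin(x ^ y).count("1") for x, y in zip(da, db))


for label, data in [("original", original), ("rewritten", rewritten), ("one char", one_char)]:
    print(f"{label:10} {hashlib.md5(data).hexdigest()}")
print("bits changed by one character:", differing_bits(original, one_char))
```

Typically around half of the 128 output bits flip, but the exact count depends on the input.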

However, I recognise that I may be totally wrong on this.

 
Posted : 05/09/2012 3:30 pm
(@alexc)
Posts: 301
Reputable Member
 

Most use of MD5 is as a digital signature to indicate that a file has not been changed. For this I would argue it is fine.

I agree. I guess my question would be (call me Mr. D. Advocate): "What if, during the course of your investigation, you had to deal with one of the situations where the 'broken-ness' of MD5 mattered? Why not use a hash which isn't substantially broken in any regard anyway?"

The answer to which I suspect would run along the lines of "All of our hash sets use MD5, so that's totally impractical"

In which case "Well that's fair enough…we better start making SHA-2 hash sets then…"

Another interesting angle is one of storage and efficiency: SHA-512 hashes take up more space than MD5s, and for particularly large hash sets this could conceivably become an issue. I also wonder if there would be a meaningful increase (or decrease) in processing time if you had to hash a well-populated file system using SHA-512 vs. MD5, because, as we are all aware, time = money.
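
The storage side is simple arithmetic. An MD5 digest is 16 bytes and a SHA-512 digest is 64 bytes, so a raw binary hash set grows fourfold: 10 million entries take 160 MB versus 640 MB, before indexes or hex encoding. Speed is better measured than guessed, because it depends on the CPU, the library build, and whether disk I/O is the real bottleneck. The rough sketch below uses hypothetical helper names, with an in-memory buffer standing in for real evidence.

```python
import hashlib
import time

ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def digest_sizes(names=ALGORITHMS):
    """Binary digest length in bytes per algorithm (hex text is twice this)."""
    return {name: hashlib.new(name).digest_size for name in names}


def hash_set_storage_bytes(algorithm, entries):
    """Raw bytes for `entries` binary digests, ignoring indexes and metadata."""
    return hashlib.new(algorithm).digest_size * entries


def throughput_mb_s(algorithm, data, repeats=3):
    """Rough single-threaded hashing speed in MB/s for an in-memory buffer."""
    start = time.perf_counter()
    for _ in range(repeats):
        hashlib.new(algorithm, data).digest()
    elapsed = time.perf_counter() - start
    return len(data) * repeats / elapsed / 1e6


if __name__ == "__main__":
    buf = bytes(8 * 1024 * 1024)  # 8 MiB of zeros as a stand-in workload
    for name, size in digest_sizes().items():
        print(f"{name:7} {size:3} bytes/digest  "
              f"{hash_set_storage_bytes(name, 10_000_000) / 1e6:6.0f} MB per 10M entries  "
              f"~{throughput_mb_s(name, buf):.0f} MB/s")
```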

 
Posted : 05/09/2012 4:14 pm
(@alexc)
Posts: 301
Reputable Member
 

I am asking because I have seen talk that recently, a defence attorney successfully argued that the digital evidence could not be relied upon because the MD5 algorithm is compromised.

Reminds me of this case: http://www.thenewspaper.com/news/10/1033.asp

Where, of course, the defence argument is totally insane, but worked.

 
Posted : 05/09/2012 4:17 pm
(@jonathan)
Posts: 878
Prominent Member
 

I have seen talk that recently, a defence attorney successfully argued that the digital evidence could not be relied upon because the MD5 algorithm is compromised.

That sounds unlikely. Can you provide a link to the case? If not, I'm putting it down as forensic folklore! ;)

Back to the main discussion. Most examinations benefit from hashing every file in an image and then checking the hashes against a list of known 'irrelevant' hashes, in order to exclude those files from further analysis.

If there is a straightforward/automated method to alter the MD5 hashes of a group of 'illegal' files to the MD5 hashes of known 'irrelevant' files then this would fool the vast majority of forensic examiners who filter for known 'irrelevant' files. If you think that this is a reasonable possibility then do not use MD5.
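
Here is a sketch of the filtering step described above. It shows that a file leaves the review pile purely because its digest appears in the list, so anything that can fool the digest lookup also fools the examiner. The helper names and the known-hash set are hypothetical. Switching `algorithm` to `"sha256"` only helps if the known list is also in SHA-256.

```python
import hashlib
from pathlib import Path


def digest_file(path, algorithm="md5", chunk_size=1 << 20):
    """Hex digest of one file, read in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def filter_known(root, known_hashes, algorithm="md5"):
    """Split files under `root` into (known, unknown) by digest lookup alone.

    `known_hashes` is a set of lowercase hex digests in the SAME algorithm,
    e.g. a hypothetical 'known irrelevant' list loaded one digest per line.
    """
    known, unknown = [], []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # The digest is the only thing checked: content is never looked at.
            bucket = known if digest_file(path, algorithm) in known_hashes else unknown
            bucket.append(path)
    return known, unknown
```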

 
Posted : 05/09/2012 4:36 pm
(@athulin)
Posts: 1156
Noble Member
 

Does this mean we, as responsible Digital Forensicators, should throw MD5 out the window? Is it even possible, given some software reliance on it?

Not without very careful argumentation, preferably well quantified.

Me, I can't do that kind of math and probability analysis. I just look at those I consider authorities and follow their advice. In 2003, Bruce Schneier and Niels Ferguson (Practical Cryptography) said 'Do not use MD5'; their recommendation was SHA-256/512. A string of successful attacks followed from 2005 to 2007. In 2009, Xie and Feng practically shot MD5 out of the water by showing that the one in 2^64 probability for collision is more like one in 2^21 (which for a cryptographic hash function is a catastrophe). In 2011, RFC 6151 said, essentially, 'Do not use MD5 where collision resistance is required'.

Add to that the fact that most calculations of collision resistance assume random message content, and we already know that computer files are very rarely random. Yet the correction term for that still seems to be waiting for an estimate. Anyone who comes up with a well-derived correction term will automatically end up on my list of 'MD5 authorities'.

To me, it seems pretty clear that MD5 (as a cryptographic hash algorithm) is dead. Hence my vote.

I have yet to see data manipulated in a significant way and yet still produce the same MD5 - for example, taking an image and completely altering it. Or changing a text file from reading "hey, i love that guy!" to "OH MAN I WISH I COULD MURDER HIM". Or injecting an incriminating JPEG into an E01 file and having it still verify correctly.

That's a new set of criteria. Do they apply instead of or in combination with the criteria an algorithm should fulfill to be classed as a cryptographic hash function? Also, are they 'real', can they be expected to be achieved? I mean, it seems you could equally well say 'I have yet to see data manipulated in a significant way, and yet produce the same CRC32'.

… is it still a valid way of verifying file integrity?

Well, 'Do not use MD5 where collision resistance is required': is the threshold of acceptance lower than a collision probability of one in 2^21? That for CRC32 is at 2^16 … And here's where the well-quantified argument comes in: what is the acceptable threshold? As far as I know, no one in the computer forensic area has suggested one …
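
The 2^64 (MD5) and 2^16 (CRC32) figures are the generic birthday bounds, 2^(n/2) for an n-bit output. This is the number of random inputs at which an accidental collision becomes likely, about 39% at exactly 2^(n/2). The sketch below works through that arithmetic under the assumption of an ideal random hash. As I understand the research, the 2^21 figure quoted for MD5 is the work an attacker needs to *construct* a colliding pair. That is a different quantity from the chance of two unrelated files colliding by accident, and this model says nothing about deliberate attacks. The helper names are hypothetical.

```python
import math


def collision_probability(k, bits):
    """Approximate chance that at least two of k uniformly random `bits`-bit
    values coincide (generic birthday bound): 1 - exp(-k(k-1) / 2^(bits+1))."""
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny.
    return -math.expm1(-k * (k - 1) / 2 ** (bits + 1))


def expected_colliding_pairs(k, bits):
    """Expected number of colliding pairs among k random `bits`-bit values."""
    return k * (k - 1) / 2 ** (bits + 1)


print(collision_probability(2 ** 16, 32))        # CRC32-sized values, 2^16 inputs
print(collision_probability(2 ** 64, 128))       # MD5-sized values, 2^64 inputs
print(expected_colliding_pairs(3_000_000, 128))  # 3 million files, ideal 128-bit hash
```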

Also, why spend time on brinkmanship? I erase hard drives, not because I think it is necessary, but because it is almost always faster than arguing against someone claiming a risk of contamination with previous content. So why waste time on this? I just do a SHA-512 in addition to whatever else I get from the acquisition tools, and that's it.
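
Computing an extra SHA-512 alongside whatever the acquisition tool produces doesn't require a second read of the evidence. A single pass can feed every chunk to several hash objects. This is a minimal sketch with a hypothetical `multi_hash` helper and an assumed default algorithm list.

```python
import hashlib


def multi_hash(path, algorithms=("md5", "sha1", "sha512"), chunk_size=1 << 20):
    """Read `path` once, feeding each chunk to every requested hash object.

    Returns {algorithm name: hex digest}. Names must be ones hashlib.new() accepts.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```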

 
Posted : 05/09/2012 5:46 pm
azrael
(@azrael)
Posts: 656
Honorable Member
 

One would assume that somewhere there is a set of files that these hashes have been created from; re-processing them with a new hash would be time-consuming, but not impossible. I suspect, though, that the exclusion usage is largely irrelevant in comparison to the "alteration of evidence" argument. I, too, have my doubts about what I suspect is a forensic urban myth. Whilst it is theoretically possible to craft executables that match MD5 sums, doing so would substantially alter an image or other file, and unless it was a very specific case, such a defence would be limited.

I suspect that in a majority of cases an MD5 hash is still sufficient, but, given the ease of generating SHA hashes, why not do both? Manipulating _two_ hashes is incredibly difficult, and doing both will also allow you to create a hash set to replace/translate your MD5s and be unassailable in court. MD5 still has a place, but its use should be weighed against the risks.
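
Translating an MD5 hash set assumes the original files are still available, because a digest cannot be converted into another algorithm's digest on its own. Under that assumption, a sketch with hypothetical helper names might look like the one below. It also reports any reference files that share an MD5 but differ in SHA-256, which would indicate a genuine MD5 collision in the reference set.

```python
import hashlib
from pathlib import Path


def build_translation(reference_root, chunk_size=1 << 20):
    """Hash every reference file once with MD5 and SHA-256.

    Returns (table, conflicts): table maps MD5 -> SHA-256; conflicts lists
    files whose MD5 was already in the table with a different SHA-256,
    i.e. genuinely different content sharing an MD5.
    """
    table, conflicts = {}, []
    for path in sorted(Path(reference_root).rglob("*")):
        if not path.is_file():
            continue
        md5, sha256 = hashlib.md5(), hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                md5.update(chunk)
                sha256.update(chunk)
        m, s = md5.hexdigest(), sha256.hexdigest()
        if m in table and table[m] != s:
            conflicts.append(path)  # keep the first mapping, report the clash
        else:
            table[m] = s
    return table, conflicts


def translate_hash_set(md5_set, table):
    """Convert an MD5 hash set; returns (sha256_set, md5s_with_no_reference_file)."""
    translated = {table[m] for m in md5_set if m in table}
    missing = {m for m in md5_set if m not in table}
    return translated, missing
```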

From a security perspective, in a high-impact environment with skilled attackers, I would be concerned (and indeed, don't use it!), but in a majority of commercial applications it is acceptable.

 
Posted : 05/09/2012 6:47 pm
PaulSanderson
(@paulsanderson)
Posts: 651
Honorable Member
 

Well, 'Do not use MD5 where collision resistance is required': is the threshold of acceptance lower than a collision probability of one in 2^21? That for CRC32 is at 2^16 … And here's where the well-quantified argument comes in: what is the acceptable threshold? As far as I know, no one in the computer forensic area has suggested one …

Lies, damn lies and statistics

I have not read up on any of the background of MD5 collisions for a while, actually not since it came up in court last. But the above caught my attention.

Based on the quote above that the probability of a collision is one in 2^21, this, taken at face value, means that in any set of slightly over 2 million files we should expect to get a collision. In the real world, how often does that happen? Taking a case that I am currently working on with approx. 3 million files and sorting by hash, I can see that the only duplicate hashes are files where I would expect to see the same content. So where is the expected collision? OK, a very quick and unscientific test, but one that is borne out by real-world experience: a) I have never seen (noticed) a collision in the wild, and b) I can only remember hearing about one case (and that may have been a myth, and so long ago I can't remember where).
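
Sorting by hash finds files with equal MD5s, but equal digests almost always turn out to be ordinary duplicates. Telling a duplicate from a true collision takes one more step: comparing the files' bytes. Below is a sketch with hypothetical helper names that uses the standard library's `filecmp.cmp`.

```python
import filecmp
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_groups(root, chunk_size=1 << 20):
    """Group files under `root` by MD5; keep only digests shared by 2+ files."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        h = hashlib.md5()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        groups[h.hexdigest()].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def genuine_collisions(groups):
    """Same-MD5 groups whose members differ byte-for-byte (true collisions)."""
    return {
        digest: paths
        for digest, paths in groups.items()
        # shallow=False forces a content comparison rather than an os.stat() check
        if any(not filecmp.cmp(paths[0], other, shallow=False) for other in paths[1:])
    }
```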

So, as we all (should) know, we can make anything we want of statistics, and we need to treat any statistic with a bit of common sense.

So, I am looking for illegal images based on a huge table of MD5s of known illegal images; should I trust/use MD5? Hell, yes. I am not going to just say that there are 15,000 illegal images based on hash; I am going to actually look at them and make sure that they are illegal. MD5 in this case is the means to an end, not the end itself.

So, I am using MD5 signatures to exclude a load of "known" executables/files/whatever so that I can reduce the number of files I need to review in a hacking/malware/whatever case; should I use MD5? Hell, yes. It's a means to an end, and if a review of what remains finds nothing, this approach won't be the only string to my bow.

So, I am using MD5 to authenticate an EnCase image; is it safe? Hell, yes. The CRCs and MD5 tell me that the file hasn't changed, but nothing here tells me that it hasn't been tampered with (I could easily write a program to plant evidence in an EnCase image and adjust the affected CRCs and MD5 to match, but there is no point, as there are easier ways to do the same thing and they would equally defeat SHA-1 etc.).
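
The point about changed versus tampered data can be illustrated with a toy. Checksums show the data has not changed since they were computed. They cannot show that nobody rewrote both the data and the checksums. The chunk size, the use of CRC32, and the helper names below are all assumptions for illustration. This is not the real E01/EWF layout or checksum scheme.

```python
import hashlib
import zlib

CHUNK = 32 * 1024  # hypothetical chunk size for this illustration


def chunk_checksums(data, chunk_size=CHUNK):
    """Per-chunk CRC32 values plus one MD5 over the whole stream.

    Only loosely modelled on the idea of per-chunk checks in image
    containers; this is NOT the real E01/EWF layout or checksum scheme.
    """
    crcs = [zlib.crc32(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]
    return crcs, hashlib.md5(data).hexdigest()


def first_bad_chunk(data, stored_crcs, chunk_size=CHUNK):
    """Index of the first chunk that no longer matches, or None if all match."""
    current, _ = chunk_checksums(data, chunk_size)
    for i, (now, then) in enumerate(zip(current, stored_crcs)):
        if now != then:
            return i
    if len(current) != len(stored_crcs):
        return min(len(current), len(stored_crcs))  # data grew or shrank
    return None


image = bytes(100_000)
stored_crcs, stored_md5 = chunk_checksums(image)

tampered = bytearray(image)
tampered[40_000] = 0xFF  # plant a change inside the second chunk
tampered = bytes(tampered)
print(first_bad_chunk(tampered, stored_crcs))  # detected: chunk 1

# Whoever can rewrite the data can simply recompute and store new values,
# and the same would be true with SHA-1 or SHA-256 in place of MD5.
forged_crcs, forged_md5 = chunk_checksums(tampered)
print(first_bad_chunk(tampered, forged_crcs), forged_md5 != stored_md5)
```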

Not an exhaustive list but my point is for most (if not all) forensic work I would continue to trust MD5 but as always keep an open mind.

 
Posted : 05/09/2012 7:30 pm
Page 1 / 3