Reverse engineering CRC?

G

Gregory Ewing

Given some known data/crc pairs, how feasible is it to
figure out the polynomial being used to generate the crc?

In the case I'm looking at, it appears that the crc
size may be at least 24 bits, so just trying all possible
polynomials probably isn't doable.

An article I found hints at the possibility of using
GCDs to make the search more efficient, but doesn't go
into any details. Anyone know of any literature about
this?

If it helps, I have the ability to generate test cases
with known message contents to some extent, although
I don't have complete control over the contents. Also
it's a manual process, so generating large numbers of
them automatically isn't an option.
 
S

Steven D'Aprano

Given some known data/crc pairs, how feasible is it to figure out the
polynomial being used to generate the crc?

Can you just ask the application developer what CRC is being used? Or
look at the source code? Disassemble the binary?

In the case I'm looking at, it appears that the crc size may be at least
24 bits, so just trying all possible polynomials probably isn't doable.

"At least"? Can't you tell by looking at them?

A good place to start is here:

http://en.wikipedia.org/wiki/Cyclic_redundancy_check
http://en.wikipedia.org/wiki/List_of_checksum_algorithms

You can at least eliminate known CRCs. There doesn't appear to be any 24-
bit CRCs in common use, and very few other 24-bit checksums either, so
you're probably looking at a custom CRC.

An article I found hints at the possibility of using GCDs to make the
search more efficient, but doesn't go into any details. Anyone know of
any literature about this?

If you're reduced to trying to determine the polynomial from a sample of
checksums, that's a curve fitting problem. There are various ways to fit
polynomials through a set of known points, but as far as I know none of
them are designed for reverse-engineering CRCs.

You may be able to find something about curve-fitting using discrete
maths (i.e. where all values are limited to integers) or with constraints.

You should probably take this to somewhere like sci.crypt. Here's a
thread from a few years back which might give you some hints:

http://www.derkeiler.com/Newsgroups/sci.crypt/2008-07/msg00035.html

Or here:

http://stackoverflow.com/questions/401231/determining-crc-algorithm-from-
data-crc-embedded-application
 
D

Dave Angel

Steven said:
That page was interesting to read, especially since I've implemented the
three algorithms - CRC16, CRC32, and the reversed version of CRC16, all
in the long-distant past.

However, if there's anything in there about how to derive the polynomial
algorithm from (a few) samples I missed it entirely. Instead, what it
calls reverse engineering is figuring out how to modify a message to
force it to have a desired CRC value (when the CRC polynomial is already
known).

DaveA
 
G

Gregory Ewing

Steven said:
Can you just ask the application developer what CRC is being used? Or
look at the source code? Disassemble the binary?

There's no source, and the binary is enormous. I could ask,
but I wouldn't hold out much hope of them being willing to
tell me.
"At least"? Can't you tell by looking at them?

It's not entirely clear exactly which bytes are part of the
CRC. There are 3 adjacent bytes in the header of the file
that change when I modify the contents, which led me to
think it was a 24-bit CRC. But I now believe that one of
them is not part of the CRC, and it's actually 16 bits.

Using pycrc, I've now tried all possible 16-bit polynomials,
with various combinations of bit and byte reversal, but I
haven't found one that works consistently, so I'm wondering
whether it's using some non-standard algorithm.
 
D

Dave Angel

Gregory said:
There's no source, and the binary is enormous. I could ask,
but I wouldn't hold out much hope of them being willing to
tell me.


It's not entirely clear exactly which bytes are part of the
CRC. There are 3 adjacent bytes in the header of the file
that change when I modify the contents, which led me to
think it was a 24-bit CRC. But I now believe that one of
them is not part of the CRC, and it's actually 16 bits.

Using pycrc, I've now tried all possible 16-bit polynomials,
with various combinations of bit and byte reversal, but I
haven't found one that works consistently, so I'm wondering
whether it's using some non-standard algorithm.
Or even some other standard algorithm. If you know so little about the
value, how do you even know it's a CRC ? Could it be a ones-complement
sum, such as used in Ethernet?

Is the problem really worth it? The possibilities are practically
unbounded. And if the developer is really determined to make it
difficult, they could be doing multiple passes over the data, in which
case probably disassembly (or subtle debug tracing) may be your best bet.

DaveA

Dave
 
G

Gregory Ewing

Dave said:
If you know so little about the
value, how do you even know it's a CRC ? Could it be a ones-complement
sum, such as used in Ethernet?

I'm going by the fact that the application reports a
"CRC mismatch" when it's wrong. I can't be sure that what
it calls a "CRC" is really a true CRC, but it's more than
a simple sum, because changing one bit in the file results
in a completely different value.
Is the problem really worth it?

Probably not -- it looks like fixing the problem I'm trying
to fix by hand will be faster in the long run. I just thought
it might turn out to be a solved problem and someone could
point me to an algorithm for it, but it seems not.
 
D

Dave Angel

Gregory said:
I'm going by the fact that the application reports a
"CRC mismatch" when it's wrong. I can't be sure that what
it calls a "CRC" is really a true CRC, but it's more than
a simple sum, because changing one bit in the file results
in a completely different value.


Probably not -- it looks like fixing the problem I'm trying
to fix by hand will be faster in the long run. I just thought
it might turn out to be a solved problem and someone could
point me to an algorithm for it, but it seems not.
If you assume it's done in a single pass, and you know which byte is the
end of the buffer, I'd think you could learn a lot by just tweaking that
last byte. But I still think you'd want to automate your testing,
presumably by some keystroke-stuffer.

DaveA
 
G

Gregory Ewing

Dave said:
If you assume it's done in a single pass, and you know which byte is the
end of the buffer, I'd think you could learn a lot by just tweaking that
last byte.

I'm sure I would, but unfortunately I can't control the
last byte. The bytes that I can influence are some distance
back from the end of the data.
 
S

Steve Howell

Given some known data/crc pairs, how feasible is it to
figure out the polynomial being used to generate the crc?

In the case I'm looking at, it appears that the crc
size may be at least 24 bits, so just trying all possible
polynomials probably isn't doable.

An article I found hints at the possibility of using
GCDs to make the search more efficient, but doesn't go
into any details. Anyone know of any literature about
this?

If it helps, I have the ability to generate test cases
with known message contents to some extent, although
I don't have complete control over the contents. Also
it's a manual process, so generating large numbers of
them automatically isn't an option.


Hi Greg. I would at least flip one bit at a time on the first byte of
your data to see if the transformation is bitwise.
 
G

Gregory Ewing

Steve said:
Hi Greg. I would at least flip one bit at a time on the first byte of
your data to see if the transformation is bitwise.

I'm actually making good progress on this -- it turns out
there *is* a way of deducing the polynomial by looking at
the effect of single-bit flips. It's actually quite simple,
with no brute-force searching needed at all.

Things get a bit tricky when you don't quite know all
of the data that goes into the CRC, though, which seems
to be the case here...

I'm writing up an essay on my experiences. I'll post a
link when it's finished.
 
L

Lawrence D'Oliveiro

Dave Angel said:
However, if there's anything in there about how to derive the polynomial
algorithm from (a few) samples I missed it entirely.

Given that CRC is all just a sequence of xor operations, what happens if you
xor various pairs of CRCs together, wouldn’t that cancel out at least parts
of the operations?
 
L

Lawrence D'Oliveiro

I'm going by the fact that the application reports a
"CRC mismatch" when it's wrong. I can't be sure that what
it calls a "CRC" is really a true CRC, but it's more than
a simple sum, because changing one bit in the file results
in a completely different value.

They could be using a strong cryptographic hash and truncating it to 16 bits
or something.

In which case you’ve got your work cut out for you...
 
G

Gregory Ewing

Lawrence said:
They could be using a strong cryptographic hash and truncating it to 16 bits
or something.

In which case you’ve got your work cut out for you...

Nope, I've determined that it's actually a pretty standard
CRC, and it's even using one of the standard polynomials,
0x8005. I'll explain the details of how I figured that
out in my essay.

What confused me initially is that it seems to be adding
a few extra bytes to the checked data that aren't present
in the file. Figuring out what they're supposed to contain
is proving to be quite a headache...
 
E

Emile van Sebille

On 3/12/2010 3:24 AM Gregory Ewing said...
What confused me initially is that it seems to be adding
a few extra bytes to the checked data that aren't present
in the file. Figuring out what they're supposed to contain
is proving to be quite a headache...

Length?

Emile
 
A

Albert van der Horst

Given some known data/crc pairs, how feasible is it to
figure out the polynomial being used to generate the crc?

In the case I'm looking at, it appears that the crc
size may be at least 24 bits, so just trying all possible
polynomials probably isn't doable.

An article I found hints at the possibility of using
GCDs to make the search more efficient, but doesn't go
into any details. Anyone know of any literature about
this?

If it helps, I have the ability to generate test cases
with known message contents to some extent, although
I don't have complete control over the contents. Also
it's a manual process, so generating large numbers of
them automatically isn't an option.

If it is really a CRC, it is doable.

You can have an indication, if the intention is to detect
machine errors (transmission or disk errors) or they want
you to prevent tampering with the file.
In the latter case it may be a one-way hash. Then it is near
impossible, as this is the design criterion for a one-way hash.

Groetjes Albert
 
G

Gregory Ewing

I've solved the problem now.

It turned out to be a very standard CRC algorithm, complicated
by the presence of a few extra bytes of data being checked that
didn't appear explicitly in the file anywhere.

In the process I developed some very general techniques for
solving this kind of problem, which I've written about here
if anyone's interested:

http://www.cosc.canterbury.ac.nz/greg.ewing/essays/CRC-Reverse-Engineering.html

Thanks for everyone's help,
Greg
 
J

jkn

Hi Greg
Just to say thanks for taking the time to write up your work on
this interesting topic.

Cheers
J^n
 
G

geremy condra

I've solved the problem now.

It turned out to be a very standard CRC algorithm, complicated
by the presence of a few extra bytes of data being checked that
didn't appear explicitly in the file anywhere.

In the process I developed some very general techniques for
solving this kind of problem, which I've written about here
if anyone's interested:

http://www.cosc.canterbury.ac.nz/greg.ewing/essays/CRC-Reverse-Engineering.html

Thanks for everyone's help,
Greg

Nice writeup, thanks.

Geremy Condra
 
G

Gabriel Genellina

En Mon, 15 Mar 2010 07:29:51 -0300, Gregory Ewing
I've solved the problem now.

It turned out to be a very standard CRC algorithm, complicated
by the presence of a few extra bytes of data being checked that
didn't appear explicitly in the file anywhere.

In the process I developed some very general techniques for
solving this kind of problem, which I've written about here
if anyone's interested:

http://www.cosc.canterbury.ac.nz/greg.ewing/essays/CRC-Reverse-Engineering.html

A good solution to an interesting problem - and very nicely explained too!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,997
Messages
2,570,240
Members
46,828
Latest member
LauraCastr

Latest Threads

Top