Pattern Search...?

cybit friendly · Feb 5, 2004

Hi

I'm stuck due to lack of knowledge.

Problem: I'm scanning through Array1[1000][1000] to find the closest match
to Array2[50][50] within Array1. At the moment I'm using a method that can
best be described as "brute force"..I'm literally scanning through the
entire Array1 using the entire Array2. The routine quits if a match conforms
to a threshhold cut-off. The problem is: *it's slow!*

Please could someone point me in the right direction? A name of a method or
algorythm that I could implement?

Appreciated!

Claudio Puviani · Feb 5, 2004

cybit friendly said:
Hi

I'm stuck due to lack of knowledge.

Problem: I'm scanning through Array1[1000][1000] to find the closest match
to Array2[50][50] within Array1. At the moment I'm using a method that can
best be described as "brute force"..I'm literally scanning through the
entire Array1 using the entire Array2. The routine quits if a match conforms
to a threshhold cut-off. The problem is: *it's slow!*

Please could someone point me in the right direction? A name of a method or
algorythm that I could implement?

Since you don't define "closest match", my advice is likely to be off base, but
bits are cheap, so...

Try to create a "digest" of Array2 such that if the digest of another array of
the same size is numerically close, the two arrays are probably close matches.
You can then compute digests of the subarrays of Array1 and compare them to the
pre-computed digest of Array2. If you design your digest such that you can
recompute it by removing elements and adding new ones, you can then quickly scan
Array1 by doing S-shaped sweeps, thereby minimizing the changes to the current
digest.

Claudio Puviani

cybit friendly · Feb 5, 2004

Hi Claudio

The problem is that Array1[1000][1000] is for all intents and purposes an
array of random numbers, I have neither preconception of nor control over
the contents of Array1. On the other hand, Array2[50][50] is also an array
of random numbers, the only difference is that I know the value and location
of each number in that matrix. My goal is to determine whether the pattern
of numbers in Array2 occurs somewhere in Array1, and if it does not, which
50x50 region in Array1 will contain an array which is most similar to
Array2. That is to say, of all the possible 50x50 arrays in the original
1000x1000 array, which is the most similar to Array2. It can be safely
assumed that a perfect 100% match will never occur.

The most obvious way to determine this is to physically break Array1 up into
50x50 arrays. Starting at (0,0)(50,50), then to (1,0)(51,50)...all the way
to (950,950)(1000,1000), I compare each one od these 50x50 'subarrays' with
Array2. The one with the least difference is considered the 'closest match'.
If, however, the difference is more than a certain threshhold (say more than
25%), it is not considered similar enough and then no match is defined.

I am looking for a better, more optimized method of performing this search.
One way (which is best avoided) is to do what I do, but using assembly.

ciao

cybit

Claudio Puviani · Feb 5, 2004

cybit friendly said:
Hi Claudio

The problem is that Array1[1000][1000] is for all intents and purposes an
array of random numbers, I have neither preconception of nor control over
the contents of Array1. On the other hand, Array2[50][50] is also an array
of random numbers, the only difference is that I know the value and location
of each number in that matrix. My goal is to determine whether the pattern
of numbers in Array2 occurs somewhere in Array1, and if it does not, which
50x50 region in Array1 will contain an array which is most similar to
Array2. That is to say, of all the possible 50x50 arrays in the original
1000x1000 array, which is the most similar to Array2. It can be safely
assumed that a perfect 100% match will never occur.

The most obvious way to determine this is to physically break Array1 up into
50x50 arrays. Starting at (0,0)(50,50), then to (1,0)(51,50)...all the way
to (950,950)(1000,1000), I compare each one od these 50x50 'subarrays' with
Array2. The one with the least difference is considered the 'closest match'.
If, however, the difference is more than a certain threshhold (say more than
25%), it is not considered similar enough and then no match is defined.

I am looking for a better, more optimized method of performing this search.
One way (which is best avoided) is to do what I do, but using assembly.

ciao

cybit

Actually reading my previous answer might help.

Claudio Puviani

cybit friendly · Feb 5, 2004

Firstly, what do you mean by "digest"?

If I understand you correctly, you're suggesting that I create a unique
numeric "ID" for Array2 (by using its contents), and then creating similar
IDs for similarly sized arrays in Array1, and then comparing IDs...?
However, unless I scan every 50x50 subarray in Array1, I might miss a
possible match.

Please elaborate!

Christoph Rabel · Feb 5, 2004

cybit said:
The most obvious way to determine this is to physically break Array1 up into
50x50 arrays. Starting at (0,0)(50,50), then to (1,0)(51,50)...all the way
to (950,950)(1000,1000), I compare each one od these 50x50 'subarrays' with
Array2. The one with the least difference is considered the 'closest match'.
If, however, the difference is more than a certain threshhold (say more than
25%), it is not considered similar enough and then no match is defined.

I am looking for a better, more optimized method of performing this search.
One way (which is best avoided) is to do what I do, but using assembly.

Hi!

Maybe a stupid idea, but I mention it anyway:

Calculate sums of the rows and columns in the (50,50) Matrix.

Then compare these sums with sums over the 50 elements of
the big matrix. Since the sum over each 50 element range
differ only in the the values of two elements it should be
rather cheap to calculate.

If the sums are nearly equal the matrix will be "equal" with
a high probability. So you can maybe do a presearch and get
several small areas that you can search afterwards.

In case your values are truly random this wont get you far.

And probably sum isn't the right thing to do. But, even
applying a more complex metric should be much cheaper than
comparing all values...

hth

Christoph

P.S.: Please dont toppost.

David Fisher · Feb 6, 2004

cybit friendly said:
Problem: I'm scanning through Array1[1000][1000] to find the closest match
to Array2[50][50] within Array1. At the moment I'm using a method that can
best be described as "brute force"..I'm literally scanning through the
entire Array1 using the entire Array2. The routine quits if a match conforms
to a threshhold cut-off. The problem is: *it's slow!*

I am not sure what the definition of "closest match" is; do you mean
something like this:

Find (x, y) such that the sum of the absolute values of (Array2[a] -
Array1[x+a][y+b]) for all 0 <= a < 50, 0 <= b < 50 is minimised.

What is the application ? If this is for image recognition, there may be
better ways ...

David F

Pyautogui, cv2 and cannot find image	0	Feb 7, 2023
Loop in a loop?	24	Jan 17, 2008
[SUMMARY] Word Search (#107)	0	Jan 4, 2007
pre-PEP: Standard Microthreading Pattern	4	May 1, 2007
prefix search on a large file	8	Oct 12, 2006
Regexp discovery - using ^ with /m is a time sink	5	Feb 14, 2009
Search DOM elements with XPath (getElementById too slow)	4	Mar 29, 2007
Sun Grid Engine / NFS and Python shell execution question	1	Jul 22, 2010

Pattern Search...?

cybit friendly

Claudio Puviani

cybit friendly

Claudio Puviani

cybit friendly

Christoph Rabel

David Fisher

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads