Finding a bit sequence in another

J

jacob navia

Hi

I am learning C, and to improve my skills, I am rewriting the STL in C.
One of the functions I need to write is "Contains", that finds a sequence
in another container, for instance, an easy one is

strstr(source, sequence);

That finds a substring ("sequence") in "source".

I am writing bit strings now, and the problem is to find if a
sequence of bits is inside a larger sequence of bits.
Let's say the signature is

bool bit_strstr(char *source, size_t source_len,
char *sequence, size_t sequence_len);

The length is given in BITS.

The more I think about it, the more complicated this thing looks.

Of course I suppose that in this forum of C specialists many people
could give me good hints.

Thanks in advance.

A newbie
 
C

Chris McDonald

jacob navia said:
I am learning C, and to improve my skills, I am rewriting the STL in C.
One of the functions I need to write is "Contains", that finds a sequence
in another container, for instance, an easy one is

.....

Of course I suppose that in this forum of C specialists many people
could give me good hints.

Not a direct answer, but there are many interesting and helpful bit-based
functions here:
http://www.jjj.de/bitwizardry/
 
B

bartc

jacob navia said:
Hi

I am learning C, and to improve my skills, I am rewriting the STL in C.
One of the functions I need to write is "Contains", that finds a sequence
in another container, for instance, an easy one is

strstr(source, sequence);

That finds a substring ("sequence") in "source".

I am writing bit strings now, and the problem is to find if a
sequence of bits is inside a larger sequence of bits.
Let's say the signature is

bool bit_strstr(char *source, size_t source_len,
char *sequence, size_t sequence_len);

The length is given in BITS.

The more I think about it, the more complicated this thing looks.

Of course I suppose that in this forum of C specialists many people
could give me good hints.

If you have the means to do bit-indexing on the source and sequence, then
this shouldn't be any different from finding a ordinary list or string
inside a longer one. Unless you want it fast...

What have you got so far?
 
U

user923005

Hi

I am learning C, and to improve my skills, I am rewriting the STL in C.
One of the functions I need to write is "Contains", that finds a sequence
in another container, for instance, an easy one is

strstr(source, sequence);

That finds a substring ("sequence") in "source".

I am writing bit strings now, and the problem is to find if a
sequence of bits is inside a larger sequence of bits.
Let's say the signature is

bool bit_strstr(char *source, size_t source_len,
                char *sequence, size_t sequence_len);

The length is given in BITS.

The more I think about it, the more complicated this thing looks.

Of course I suppose that in this forum of C specialists many people
could give me good hints.

Probably, the question is better matched on http://www-igm.univ-mlv.fr/~lecroq/articles/PosterSofsem09.pdf
http://www.chu-rouen.fr/l@stics/fichiers/Faro2009.pdf
http://user.it.uu.se/~kostis/Papers/esop04.pdf
http://goanna.cs.rmit.edu.au/~ldesilva/nextj.pdf
http://www.movsd.com/bm.htm
 
A

Alan Curry

Hi

I am writing bit strings now, and the problem is to find if a
sequence of bits is inside a larger sequence of bits.
Let's say the signature is

bool bit_strstr(char *source, size_t source_len,
char *sequence, size_t sequence_len);

When you write the documentation for this function, please clarify the bit
ordering. Bit endianness is not obvious enough to leave unstated. Some people
think the least-significant bit naturally comes first, ordering the bits from
(1<<0) to (1<<7). Others think the most-significant bit naturally comes
first, since that matches the order in which multi-digit numbers are usually
written. And when a byte is partially filled, it's also not obvious which end
will be unused.

Is the 4-bit sequence 1101 encoded as:
"\x0d" (high bits unused, sequence ends at low bit)
"\x0b" (high bits unused, sequence starts at low bit)
"\xd0" (low bits unused, sequence starts at high bit)
"\xb0" (low bits unused, sequence ends at high bit)
 
B

bartc

Alan Curry said:
When you write the documentation for this function, please clarify the bit
ordering. Bit endianness is not obvious enough to leave unstated. Some
people
think the least-significant bit naturally comes first, ordering the bits
from
(1<<0) to (1<<7). Others think the most-significant bit naturally comes
first, since that matches the order in which multi-digit numbers are
usually
written. And when a byte is partially filled, it's also not obvious which
end
will be unused.

Is the 4-bit sequence 1101 encoded as:
"\x0d" (high bits unused, sequence ends at low bit)
"\x0b" (high bits unused, sequence starts at low bit)
"\xd0" (low bits unused, sequence starts at high bit)
"\xb0" (low bits unused, sequence ends at high bit)

Wouldn't it be better if this information was not disclosed? You seem to
have explained why it isn't a good idea to directly access the bit data
rather than going through the interface.

If you imagine these bitstrings can be manipulated like arrays, and indexed
from 0 to N-1 bits, then it may not be necessary to know the bit order
within a char value. (And a slice of such an array could even start part-way
through a char value.)

Whatever the order, the chances are that a 32-bit slice of a such a
bitstring, even if it was aligned to a char, would not have the bits laid
out in memory the same way as an equivalent 32-bit int value, assuming you
even knew which end was the least significant bit.
 
A

Alan Curry

Alan Curry said:
bool bit_strstr(char *source, size_t source_len,
char *sequence, size_t sequence_len);
[...]
Is the 4-bit sequence 1101 encoded as:
"\x0d" (high bits unused, sequence ends at low bit)
"\x0b" (high bits unused, sequence starts at low bit)
"\xd0" (low bits unused, sequence starts at high bit)
"\xb0" (low bits unused, sequence ends at high bit)

Wouldn't it be better if this information was not disclosed? You seem to
have explained why it isn't a good idea to directly access the bit data
rather than going through the interface.

I assumed the bitstring representation wasn't meant to be opaque since the
pointer and length were passed separately, instead of being wrapped up in a
struct which would discourage direct access.
 
J

jacob navia

Alan Curry a écrit :
Alan Curry said:
bool bit_strstr(char *source, size_t source_len,
char *sequence, size_t sequence_len); [...]
Is the 4-bit sequence 1101 encoded as:
"\x0d" (high bits unused, sequence ends at low bit)
"\x0b" (high bits unused, sequence starts at low bit)
"\xd0" (low bits unused, sequence starts at high bit)
"\xb0" (low bits unused, sequence ends at high bit)
Wouldn't it be better if this information was not disclosed? You seem to
have explained why it isn't a good idea to directly access the bit data
rather than going through the interface.

I assumed the bitstring representation wasn't meant to be opaque since the
pointer and length were passed separately, instead of being wrapped up in a
struct which would discourage direct access.

They are (of course!) part of the bitstring container. I did not want to
clutter the question with extra definitions so I did not mention that.
 
N

Nick Keighley

Alan Curry said:
Wouldn't it be better if this information was not disclosed? You seem to
have explained why it isn't a good idea to directly access the bit data
rather than going through the interface.

If you imagine these bitstrings can be manipulated like arrays, and indexed
from 0 to N-1 bits, then it may not be necessary to know the bit order
within a char value. (And a slice of such an array could even start part-way
through a char value.)

While I'm all for hiding stuff that shouldn't be seen, this seesm to
be going too far. Consider a bit oriented protocol. I read a bunch of
bits off a channel and then want to find a pattern in that bit
sequence (frame marker or some such). The order the bits appear in is
kind of important!
Whatever the order, the chances are that a 32-bit slice of a such a
bitstring, even if it was aligned to a char, would not have the bits laid
out in memory the same way as an equivalent 32-bit int value, assuming you
even knew which end was the least significant bit.

I'm not sure if this matters
 
B

bartc

Nick said:
While I'm all for hiding stuff that shouldn't be seen, this seesm to
be going too far. Consider a bit oriented protocol. I read a bunch of
bits off a channel and then want to find a pattern in that bit
sequence (frame marker or some such). The order the bits appear in is
kind of important!

If they're read one bit at a time and stored in the bitstring type one bit
at a time, then the order is maintained.

If they are instead delivered from the device a byte or word at a time, then
the order is significant, but I don't know that knowing which of a dozen
different ways the bitstring type orders it's bits (or even whether it
stores a full 8 bits per byte) is going to be helpful, especially as the
device may have it's own bit order.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,982
Messages
2,570,190
Members
46,736
Latest member
zacharyharris

Latest Threads

Top