substring finding problem!


blmblm

(e-mail address removed) wrote:

"Whatever." [*]

[*] Normally I would write <shrug> here, but that usage now has
unfortunate associations.

HUH? I don't think I got that email. Could you explain further?

Well, I'd rather not give complete details, but I *thought* that
this usage had been employed by someone I would not want to be
perceived as emulating. But in reviewing the record, I think
I'm probably mistaken about that.

<shrug>
 

blmblm

Exactly. He's plonked, because that way I see only the funny parts.

Ah, but do you? At least one of the people still interacting with
him (Richard Heathfield) snips non-technical content. I won't say
you're missing anything you'd enjoy, but there's plenty that a person
inclined to perceive and react to insult would -- react to.
Yeah. I still remember:
i << 3 + i + i
but that doesn't mean I'd ever use it.

Yipes. Where did you encounter this one? Multiplication by ten,
right?
It appears that he meant, not matrix multiplication, but multiplication of
every element in a matrix by a fixed value. For some reason, he seems to
think that multiplying everything in a matrix by a compile-time constant is
a likely operation. For some other reason, probably a less plausible one,
he seems to think that the shift-as-multiply trick actually buys you
something.

Well, to be fair, aren't there circumstances in which it might?
in which the programmer knows that one of the operands of the
multiply operator is a power of 2, but the compiler wouldn't be
able to detect that? It does seem like the kind of microoptimization
that one would hesitate to do without a compelling reason, though.
Interesting. I've never heard the term used that way.

I hadn't either. Is this where I brandish my educational
credentials (undergrad degree in math)? Nah. I think that's
rather gauche. Besides, that undergrad degree is pretty well-aged
by now, and it sometimes surprises me how much of what I presumably
learned is no longer retrievable from memory.
And I have to say, your code certainly did a great job of undermining the
idea that strings were predictably null-terminated. At least in your code.


...

Wait, does this imply that he thinks the << thing is some kind of secret
that not everyone knows? As an idle curiosity, I've asked my coworkers
to see whether any of them *haven't* seen that.

Well, I interpreted the "messing with [his] head" remark to indicate
that he was trying to taunt you or provoke you in some way. But
I could be mistaken about that.
 

blmblm

Sure. I just thought it might be fun to try to come up with a
semi-formal specification that *doesn't* involve narratives about
what the program is doing. I like that sort of thing, but I guess
not everyone does. "Whatever." [*]

Actually, that's a good point. It's clearer in that it tells you the
answer without requiring you to think it through.

I'm not sure I understand what you mean by this -- I think you do still
have to think through how the function's output relates to its input,
but you can do this without thinking about how you would implement it.
I think of this as a sort of static perspective on specification, as
opposed to a dynamic one that involves thinking in terms of "first
the code does this, then it does that", and viewing things from that
perspective -- it's something I was taught to do in graduate school,
and once I caught on I found it amazingly powerful. But I'm a former
math major, which I think may bias me in favor of formal approaches.
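
In case it helps, here's a rough sketch of the kind of static specification
I have in mind, assuming leftmost, non-overlapping, left-to-right matching
and a non-empty search string. The recursive C below is just an executable
restatement of the defining equations, with a made-up name (replace_spec),
not an efficient implementation:

#include <stdlib.h>
#include <string.h>

/* The spec, stated as equations:
 *   replace(s, find, repl) = s          if find does not occur in s
 *   replace(s, find, repl) = pre ++ repl ++ replace(rest, find, repl)
 *       where s = pre ++ find ++ rest and find does not occur in pre
 */
char *replace_spec(const char *s, const char *find, const char *repl)
{
    const char *hit = strstr(s, find);
    char *out, *rest;
    size_t pre_len, repl_len, rest_len;

    if (hit == NULL) {                    /* base case: no occurrence */
        out = malloc(strlen(s) + 1);
        if (out != NULL)
            strcpy(out, s);
        return out;
    }
    pre_len = (size_t)(hit - s);          /* the part before the first match */
    repl_len = strlen(repl);
    rest = replace_spec(hit + strlen(find), find, repl);
    if (rest == NULL)
        return NULL;
    rest_len = strlen(rest);
    out = malloc(pre_len + repl_len + rest_len + 1);
    if (out != NULL) {
        memcpy(out, s, pre_len);
        memcpy(out + pre_len, repl, repl_len);
        memcpy(out + pre_len + repl_len, rest, rest_len + 1);  /* includes '\0' */
    }
    free(rest);
    return out;
}

Nothing clever is going on there; the point is only that it pins down what
the result has to be without telling a story about how a production version
would scan or allocate.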
Thanks to spinoza1111,
we are now aware that at least some people can implement a solution to
the problem without ever thinking about how it works enough to realize
that there is no question of overlapping strings. Clarifying it explicitly
is probably beneficial.

Agreed -- except I'd say there's no "probably" about it. :)
 

blmblm

In that message above there is one error:
.e: push edi
call _free
.e0: xor eax, eax

it has to be written like this:

.e: push edi
call _free
add esp, 4
.e0: xor eax, eax

Huh -- the copy I have of your code actually has that fix ....
I forgot to clear the stack;
but why not use this one:
news:[email protected]

Apparently that's the one I actually worked from. Sorry about
putting in the wrong message ID -- I think I only decided *after*
retrieving code that it might be nice to also have matching message
IDs, and obviously in the process of looking up people's code a
second time .... Oops.
Yes, there is a little improvement, and the replace routine has different arguments:
char* __stdcall Replacer_m(unsigned* len,
char* origin, char* whatSost, char* sost);

but it is possible to write

char* replace_iox(char* origin, char* whatSost, char* sost)
{
    unsigned len;
    return Replacer_m(&len, origin, whatSost, sost);
}

it should always be about 3 times slower, at least in comparison with yours.

Why should it be slower? I don't know x86 assembler and so am not
going to try to figure out what it does, but why should it be slower?

(I'm also curious about why you chose assembler.)
But for this time, it is enough for here.

Yeah. :)

[ snip ]
 

blmblm

Idle curiosity: How's mine do? I haven't checked to see what the official
interface is, but I'm pretty sure this is adequately obvious. It presumably
suffers from double-scanning, but I don't know how much that matters. It
doesn't do a lot of mallocs.

char *
rep(const char *in, const char *out, const char *target) {

[ snip ]

It looks like Ben has already run a more-complete set of benchmarks,
but I ran yours through my testing/benchmarking code as well.

For the record, it passes all of the spinoza1111 tests (well,
it did after I realized that you were using a different parameter
order from everyone else and fixed that).

Times .... I'll put them all in again, but this time only for
compiling with -O2.

About my six versions .... v1 is a naive implementation that scans
the input once to count matches, to allow computing the right
length for the output string, and then again to actually do the
replacement. v2 scans once and builds a list of matches, which is
then used to do the replacement, but it makes no attempt to avoid
calling strlen repeatedly on the to_replace/replacement strings.
v3 avoids that by passing around string lengths too, where needed.
"lib" versus "user" is which implementation of the string.h
functions I used.

The two versions that build a list call malloc for each list
element. I thought that would slow things down, but apparently
not -- the versions that are really slow are the ones that make
lots of calls to my implementation of strlen. The library version
of that appears to be *much* faster.
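
(For concreteness, the difference between the v2-style and v3-style loops
amounts to the pattern below -- a sketch, not my actual code, and it assumes
a non-empty search string:)

#include <stddef.h>
#include <string.h>

/* v2-style: strlen(find) gets re-evaluated on every trip around the
   loop, so a slow strlen is called once per match. */
size_t count_matches_slow(const char *s, const char *find)
{
    size_t n = 0;
    const char *p = s;
    while ((p = strstr(p, find)) != NULL) {
        n++;
        p += strlen(find);
    }
    return n;
}

/* v3-style: compute the length once and pass it around. */
size_t count_matches_fast(const char *s, const char *find)
{
    size_t n = 0;
    size_t find_len = strlen(find);
    const char *p = s;
    while ((p = strstr(p, find)) != NULL) {
        n++;
        p += find_len;
    }
    return n;
}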

On the old/slow system:

bacarisse 4.47 seconds
blmblm-v1-lib 10.78 seconds
blmblm-v1-user 35.97 seconds
blmblm-v2-lib 9.60 seconds
blmblm-v2-user 35.10 seconds
blmblm-v3-lib 7.86 seconds
blmblm-v3-user 8.58 seconds
harter-1 5.67 seconds
harter-2 5.94 seconds
io_x 18.05 seconds
nilges 7.72 seconds
seebach 8.75 seconds
thomasson 4.08 seconds
willem 7.18 seconds

On the newer/faster system:

bacarisse 1.74 seconds
blmblm-v1-lib 3.33 seconds
blmblm-v1-user 12.73 seconds
blmblm-v2-lib 2.73 seconds
blmblm-v2-user 11.20 seconds
blmblm-v3-lib 2.52 seconds
blmblm-v3-user 3.33 seconds
harter-1 2.58 seconds
harter-2 2.27 seconds
io_x 9.82 seconds
nilges 2.36 seconds
seebach 3.16 seconds
thomasson 1.69 seconds
willem 2.77 seconds
 

blmblm

No, you're too cowardly to talk to me. You haven't plonked jackshit.

What makes you think that? The complete refusal to respond to any
questions or taunts -- to me that's a good sign that your posts are
not even being read. I suppose cowardice is an alternate explanation,
but so is "I refuse to dignify these insults by responding".

I'm tempted to quote the rest of your post as a sample in case I'm
right about the explanation. But -- nah, "let's you and him fight"
is really not very attractive behavior, is it?

[ snip ]
 

blmblm

spinoza1111 said:
On Feb 23, 10:38 pm, (e-mail address removed) <[email protected]>
wrote:
And note that "using strstr" has its own dangers. IT FINDS OVERLAPPING
STRINGS. If you use it to construct a table of replace points you're
gonna have an interesting bug-o-rama:
replace("banana", "ana", "ono")
IF you restart one position after the find point, and not at its end.
Why would you do that, though? only if you *wanted* to detect
Search me. But that's what the code I was discussing actually did.

What code is that? I've traced back through predecessor posts, and
the only one that comes close to including code is the one in which
Chris Thomasson references his code in

http://clc.pastebin.com/f62504e4c

which on a quick skim doesn't seem to me to be looking for
overlapping strings.

My code handled string overlap after the bug was pointed out to me,
BEFORE any other code.

So when you said "the code I was discussing", you meant *your*
code? Oh! I understood you to be saying that using strstr() is
dangerous because it finds (or doesn't find?) overlapping strings,
and interpreted "the code I was discussing" to be someone else's
code, someone who was using strstr(). Faulty communication.
I'm too sick of the shit that goes on here to
make a collection of all solutions and find what probably are many
failures, but one of my contributions was to pass on the test case.

There's a lot of claims and counterclaims here and at least two
discussants are complete shitheads. However, we KNOW that other
posters used the test suite I created AFTER my code worked with that
test data.

Is this some kind of race to find out who can post a solution first?
If so, um, haven't you expressed disapproval of boasting about speed
of coding? Or do you suspect others of cribbing from your solution?
I can tell you that I didn't -- reading code is not one of my best
things anyway, and I thought it would be more fun to write my own
code before looking at others'.
overlapping strings, and -- if you did detect them, what would
you do with them? I can't think of any sensible definition of
"replace" that does anything with overlapping strings [*], so
replace(banana, ana, ono) could equal
bonona going left to right without overlap
banono going right to left without overlap
bonono going both ways with overlap

There's a semi-sane answer here in the last case, but isn't

HOW.DARE.YOU. How DARE you start talking about sanity? It isn't
collegial, and it is libel and completely insensitive. It's talking
like those thugs and shitheads here, Seebach and Heathfield.

The word "sane" was meant to apply to the answer, not to a person.
I don't have enough information to form an opinion I'd want to share
publicly about your sanity.

I could offer to substitute "sensible" for "sane" in what I wrote,
but that might not be any better received.

Whether the apparent lack of communication here is due to poor
writing on my part or something else -- I don't know. At least
one other person appears to have interpreted my words in the
intended way (and replied to that effect).

[ snip ]
The fact that there is a group of answers does not make the question a
question of a crazy man! In fact, it makes it a good scientific
question, albeit over the heads of the creeps here.

My point was that I don't think that there's an obvious most-sensible
choice here. How about if you just answer my question -- should
replace(banana, ana, xno) be bxnono or bxnxno? If you aren't sure,
how do you decide what your code should do?

[ snip ]
I gave it to you: a hypothetical but possible natural language in
which adjacent lexemes must be split and modified.

You've posited a scenario in which attempting to replace
overlapping strings would be useful or meaningful. What I'm
not getting is an exact specification of how you think it should
work. What should replace(banana, ana, xno) be? Or are there
restrictions on input that would exclude it from consideration?
And what's this "whatever"?

It means I couldn't think of a graceful way to express my intended
meaning and decided to just bail out of the sentence. Trying here:

Without a clear specification of what should be done about
overlapping matches, I don't think it makes sense to try to come up
with code or even an algorithm.

[ snip ]
Well, my greater experience with object oriented development in C
Sharp and VB has taught me that given either an adequate OO language,
or sufficient intelligence and patience, concat can work either way
without much drama. In C, the direction has to be a crummy parameter
that is easy to get wrong.

My usage of "concat" here was meant to indicate a mathematical/formal
operation on strings, not a call to a function in some programming
language, real or imagined. How can *that* imply a direction? As
I said, it seems to me that considered as a mathematical operation
string concatenation is associative. Maybe there's something I'm
not getting, though.
Don't patronize me.

That was not my intent. (And really, I don't think you're in the
strongest position to talk about not patronizing other posters.)

[ snip ]
In my opinion the functions declared in string.h include some
that are very useful in writing a replace() function as specified
here -- I think of them as useful abstractions for the problem
domain. Could one define similar functions if strings were *not*
represented as null-terminated contiguous sequences of characters ....

Well, certainly one could, but whether they'd be unacceptably
inefficient might depend on how strings were represented.
C's approach to representing strings allows one to have multiple
strings in a single character array and to easily regard a suffix
of a string as a string in its own right, both of which strike me
as useful in context. One could (I think) get a similar effect
by defining strings to be objects consisting of a length and a
pointer into an array of characters. If strings were represented
as a length immediately followed by a sequence of characters --
not so much.
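
A trivial illustration of both points, for whatever it's worth -- the
offsets are just hand-counted for this particular literal:

#include <stdio.h>

int main(void)
{
    char buf[] = "find\0replace";    /* two strings in one char array        */
    const char *first = buf;         /* "find"                               */
    const char *second = buf + 5;    /* "replace", right after the '\0'      */
    const char *suffix = second + 2; /* "place": a suffix is a string in its */
                                     /* own right, with no copying           */
    printf("%s / %s / %s\n", first, second, suffix);
    return 0;
}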

I'm not sure I'm explaining this very well or thinking it through
carefully, but perhaps it will advance the discussion a bit anyway.
[*] My objection to this constraint is that any minimally competent
programmer should be able to write functions that implement the
same API, so just avoiding use of the library functions doesn't
seem to me to make the problem more interesting.
No. The API locks us into bad thoughts.

I'd say "how so?" but I'm not optimistic about getting an answer
I'd find useful.

I've already told you. Strings terminated ON THE RIGHT with a Nul is a
bad thought for two reasons:

* It prevents Nul from occurring in a string
* It mandates Eurocentric left to right processing

What I'm still not getting is how either of these things .... Let
me try to explain:

To me what makes the functions in string.h useful in dealing
with strings is the operations they perform, not the string
representation they operate on. I can easily imagine rewriting
most if not all of them to operate on strings that are represented
in some other way (as an object containing or pointing to a
sequence of characters and a length, say, where "characters" might
be elements of the ASCII character set or elements of some other
set). I can also imagine adding something that allows specifying
whether processing should be left to right or right to left.

So, what exactly is in string.h .... A partial list of functionality
provided:

* copy characters (memcpy, memmove, etc.)

* copy a string (strcpy)

* compare strings (strcmp)

* concatenate strings (strcat)

* duplicate a string (strdup)

* search for a character in a string (strchr)

* search for a string in a string (strstr)

* get a string's length (strlen)

Sounds like a reasonable assortment to me -- perhaps not including
everything anyone would want, but these all sound useful to me.
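
To make the "easily imagine rewriting" claim a bit more concrete, here's a
bare-bones sketch of the sort of thing I had in mind -- a counted-string
view plus rough analogues of a couple of the operations in the list above.
All the names are invented for the example; it's a sketch, not a proposal:

#include <stddef.h>

/* A string as an explicit (pointer, length) pair: no terminator needed,
   and embedded '\0' characters are allowed. */
struct strview {
    const char *ptr;
    size_t len;
};

/* get a string's length -- the analogue of strlen */
size_t sv_len(struct strview s)
{
    return s.len;
}

/* compare strings -- a rough analogue of strcmp */
int sv_cmp(struct strview a, struct strview b)
{
    size_t i, n = a.len < b.len ? a.len : b.len;
    for (i = 0; i < n; i++) {
        if (a.ptr[i] != b.ptr[i])
            return (unsigned char)a.ptr[i] < (unsigned char)b.ptr[i] ? -1 : 1;
    }
    return a.len == b.len ? 0 : (a.len < b.len ? -1 : 1);
}

/* search for a string in a string -- a naive analogue of strstr,
   returning the index of the first match, or hay.len if there is none */
size_t sv_find(struct strview hay, struct strview needle)
{
    size_t i;
    for (i = 0; i + needle.len <= hay.len; i++) {
        struct strview window = { hay.ptr + i, needle.len };
        if (sv_cmp(window, needle) == 0)
            return i;
    }
    return hay.len;
}

Taking a suffix is still cheap with this representation (bump ptr, shrink
len), which is the property I was getting at above; it's the contiguous
length-then-characters layout that loses it.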
My respect-o-meter in your case is diminishing.

Probably good; I think it was miscalibrated at some point.
As to a length code limiting string length, do the math. 2^63 - 1 or
2^64 - 1 is a big number, and run length codes can be used, especially
in OO programming. Furthermore, even if the string is longer, you can
still process it with an unknown length, which OO programming handles
quite nicely.

That's not the point I was making -- what I meant was that if the
representation requires that the length and the actual characters
be contiguous, you can't get a substring simply by pointing into
an existing string, as you can in C, and I think that has some
semi-obvious disadvantages. There might be other reasons not
to use such a representation, and it might not be one you were
considering in any case. But I have some vague recollection of
hearing about *some* implementation of strings that works that way.
I could be mistaken.
 

Ben Bacarisse

Seebs <[email protected]> wrote:

Yipes. Where did you encounter this one? Multiplication by ten,
right?

No, though it is a detail. I thought Seebs was making a point ("if
you code like this you'll make mistakes like this one I remember") but
I could be wrong about that.

You could add it to your examples for your students to have to debug
with and without -Wall: + binds more tightly than << and >> in C.
Aside: this is a hard one to remember unless you know C++:

cout << 3 + i + i;
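
Spelled out, for anyone who wants the gotcha in isolation (assuming the
usual 32-bit int):

#include <stdio.h>

int main(void)
{
    int i = 7;
    /* + binds more tightly than <<, so the first expression is
       i << (3 + i + i), not (i << 3) + i + i; gcc -Wall, for one,
       will suggest the parentheses. */
    printf("%d\n", i << 3 + i + i);    /* 7 << 17 = 917504 */
    printf("%d\n", (i << 3) + i + i);  /* 7*8 + 7 + 7 = 70, i.e. 10*i */
    return 0;
}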

Well, I interpreted the "messing with [his] head" remark to indicate
that he was trying to taunt you or provoke you in some way.

There's no doubt in my mind that he is. Given the abuse and invective
hurled at him, it is to Seebs's great credit that he has been able to
sit on his hands.
 

Willem

Ben Bacarisse wrote:
)>< snip >
)> willem (O2) 2.77 seconds
)> willem (O3) 4.16 seconds [ can this be right?! ]
)
) Looks wacky to me! Is it repeatable?

Bwahahaha! I love it!

) Here are my times (also gcc 4.4.1 and libc 2.10.1). I seem to have a
) faster machine. The first number are your times (for reference) and
) the second are mine (in seconds). The third column is the ratio of
) the two. You can see that there is more going on than just the speed
) of the machine.
)
) <snip>
) willem (O2) 2.77 0.813 3.41
) willem (O3) 4.16 0.885 4.70
)
) If we are now measuring the same things, it seems that some code is
) favoured by my system (yours for example) and some does not do so
) well. I suspect interactions with the various caches but that is a
) huge guess.

This is quite interesting! I would really like to see the generated
assembly for -O2 and -O3 for my code. I guess I can retrieve my code
from the usenet archive and compile it, but I don't know which of the
two solutions I posted was tested here. (The iterative or the recursive
one ?)

PS: For testing you would also need different match patterns, including
some that contain repeated strings or stuff like that, especially
if you're comparing 'smart' against 'dumb' algorithms.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 

Seebs

Ah, but do you? At least one of the people still interacting with
him (Richard Heathfield) snips non-technical content. I won't say
you're missing anything you'd enjoy, but there's plenty that a person
inclined to perceive and react to insult would -- react to.

You have a point. Maybe I should unplonk him so I can delight in the
madness. I am an aficionado of Usenet kookery. Honestly, I was a little
sad to come back here and find Scott Nudds gone. :(
Yipes. Where did you encounter this one? Multiplication by ten,
right?

In the C library for Aztec C for the Amiga. Which leads me to assume either
that they knew that their compiler didn't always make that optimization (or
possibly that it shouldn't always), or that they were too clever for their
own good.
Well, to be fair, aren't there circumstances in which it might?
in which the programmer knows that one of the operands of the
multiply operator is a power of 2, but the compiler wouldn't be
able to detect that? It does seem like the kind of microoptimization
that one would hesitate to do without a compelling reason, though.

Hmm.

Here's the thing. If it's a constant, the compiler can obviously tell whether
it's a power of two. If it's not a constant, but I know it's a power of two,
figuring out which power of two it is will almost certainly cost more than
multiplication. Furthermore, it's not at all obvious to me that I should
assume that a given modern CPU will shift that much faster than it multiplies.
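
An easy way to convince yourself is to compile something like this with
optimisation turned on and read the assembly; in my experience the constant
multiplies come out as shifts or lea instructions anyway, though that's
obviously compiler- and target-specific:

/* e.g. "gcc -O2 -S mul.c" and look at the output */
unsigned mul_by_8(unsigned x)
{
    return x * 8;              /* typically compiled to x << 3 */
}

unsigned mul_by_10(unsigned x)
{
    return x * 10;             /* typically lea/shift combinations on x86 */
}

unsigned shift_by_hand(unsigned x)
{
    return (x << 3) + x + x;   /* the hand-rolled version; usually no faster */
}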
Well, I interpreted the "messing with [his] head" remark to indicate
that he was trying to taunt you or provoke you in some way. But
I could be mistaken about that.

I have no idea. He's really challenged my assumption that other things which
use language are generally volitional actors which engage in goal-directed
behavior, certainly.

-s
 

Seebs

No, though it is a detail. I thought Seebs was making a point ("if
you code like this you'll make mistakes like this one I remember") but
I could be wrong about that.

Whoops. No, I just made the mistake. It was probably right in the original.
There's no doubt in my mind that he is. Given the abuse and invective
hurled at him, it is to Seebs's great credit that he has been able to
sit on his hands.

Not really:
1. He's plonked, so I only see a few of them.
2. I'm autistic. Insults only communicate data to me in most cases. In his
case, they communicate that he's angry but deeply incompetent.

Mostly I just figure he's cheap entertainment. My usenet feed is cheaper
than cable, and spinoza1111's funnier than most comedy shows.

-s
 

Seebs

The two versions that build a list call malloc for each list
element. I thought that would slow things down, but apparently
not -- the versions that are really slow are the ones that make
lots of calls to my implementation of strlen. The library version
of that appears to be *much* faster.
Interesting!

nilges 7.72 seconds
seebach 8.75 seconds

That, I must admit, totally surprises me. I would have thought that the
cost of strstr() would be trivial compared to the cost of malloc(). I guess
not for some input data!

Which suggests that, if this would be run often enough, by enough people, who
would be waiting on the output, it is conceivable that it could be worth
spending the extra 8-10 hours of programming effort, plus the lifetime
maintenance effort, for the more complicated code.

Or, alternatively, that it would at least make sense to consider one of the
"don't rescan" options.

-s
 

Ben Bacarisse

Willem said:
Ben Bacarisse wrote:
)>< snip >
)> willem (O2) 2.77 seconds
)> willem (O3) 4.16 seconds [ can this be right?! ]
)
) Looks wacky to me! Is it repeatable?

Bwahahaha! I love it!

) Here are my times (also gcc 4.4.1 and libc 2.10.1). I seem to have a
) faster machine. The first number are your times (for reference) and
) the second are mine (in seconds). The third column is the ratio of
) the two. You can see that there is more going on than just the speed
) of the machine.
)
) <snip>
) willem (O2) 2.77 0.813 3.41
) willem (O3) 4.16 0.885 4.70
)
) If we are now measuring the same things, it seems that some code is
) favoured by my system (yours for example) and some does not do so
) well. I suspect interactions with the various caches but that is a
) huge guess.

This is quite interesting! I would really like to see the generated
assembly for -O2 and -O3 for my code. I guess I can retrieve my code
from the usenet archive and compile it, but I don't know which of the
two solutions I posted was tested here. (The iterative or the recursive
one ?)

I can't help because, for whatever reason, I don't see the difference
that B L Massingill sees.
PS: For testing you would also need different match patterns, including
some that contain repeated strings or stuff like that, especially
if you're comparing 'smart' against 'dumb' algorithms.

Sure. I've used a wide variety of test strings but I've seen no point
in posting the results because they are, by and large, rather
predictable but also rather hard to summarise.
 

spinoza1111

spinoza1111  said:
On Feb 23, 10:38 pm, (e-mail address removed) <[email protected]>
wrote:
[ snip ]
And note that "using strstr" has its own dangers. IT FINDS OVERLAPPING
STRINGS. If you use it to construct a table of replace points you're
gonna have an interesting bug-o-rama:
replace("banana", "ana", "ono")
IF you restart one position after the find point, and not at its end.
Why would you do that, though?  only if you *wanted* to detect
Search me. But that's what the code I was discussing actually did.
What code is that?  I've traced back through predecessor posts, and
the only one that comes close to including code is the one in which
Chris Thomasson references his code in
http://clc.pastebin.com/f62504e4c
which on a quick skim doesn't seem to me to be looking for
overlapping strings.
My code handled string overlap after the bug was pointed out to me,
BEFORE any other code.

So when you said "the code I was discussing", you meant *your*
code?  Oh!  I understood you to be saying that using strstr() is
dangerous because it finds (or doesn't find?) overlapping strings,
and interpreted "the code I was discussing" to be someone else's
code, someone who was using strstr().  Faulty communication.
I'm too sick of the shit that goes on here to
make a collection of all solutions and find what probably are many
failures, but one of my contributions was to pass on the test case.
There's a lot of claims and counterclaims here and at least two
discussants are complete shitheads. However, we KNOW that other
posters used the test suite I created AFTER my code worked with that
test data.

Is this some kind of race to find out who can post a solution first?
If so, um, haven't you expressed disapproval of boasting about speed
of coding?  Or do you suspect others of cribbing from your solution?
I can tell you that I didn't -- reading code is not one of my best
things anyway, and I thought it would be more fun to write my own
code before looking at others'.

I believe I express mostly disapproval about speed of coding as a way
of dodging issues, not speed of coding per se. For example, I didn't
like it at all when Brian Kernighan, in the recent O'Reilly collection
Beautiful Code, praised Rob Pike for only taking an hour to write a
"regular expression processor" because:

* Pike's code wasn't a full or true regular expression processor
* The fact that it took an hour doesn't change the above

Seebie bragged about taking "ten minutes before breakfast" in response
to questions about whether he was solving a problem correctly or in
depth, because in fact, in corporations, programmers, in contrast to
real engineers, seem to believe that in all cases speedy coding makes
up for almost any failing.
overlapping strings, and -- if you did detect them, what would
you do with them?  I can't think of any sensible definition of
"replace" that does anything with overlapping strings [*], so
replace(banana, ana, ono) could equal
bonona going left to right without overlap
banono going right to left without overlap
bonono going both ways with overlap
There's a semi-sane answer here in the last case, but isn't
HOW.DARE.YOU. How DARE you start talking about sanity? It isn't
collegial, and it is libel and completely insensitive. It's talking
like those thugs and shitheads here, Seebach and Heathfield.

The word "sane" was meant to apply to the answer, not to a person.
I don't have enough information to form an opinion I'd want to share
publicly about your sanity.

That manages to be rather snide, in my view.
I could offer to substitute "sensible" for "sane" in what I wrote,
but that might not be any better received.

It would have been an improvement. But there are plenty of words such
as "correct" which have zero personal connotation.  
Whether the apparent lack of communication here is due to poor
writing on my part or something else -- I don't know.  At least
one other person appears to have interpreted my words in the
intended way (and replied to that effect).

[ snip ]
The fact that there is a group of answers does not make the question a
question of a crazy man! In fact, it makes it a good scientific
question, albeit over the heads of the creeps here.

My point was that I don't think that there's an obvious most-sensible
choice here.  How about if you just answer my question -- should
replace(banana, ana, xno) be bxnono or bxnxno?  If you aren't sure,
how do you decide what your code should do?

Whoa. I'm not sure. "Science" is about possibility as well as fact.

However, I do think that for the same reason your notion of "concat"
is cool since it is independent of direction, I think that a "flat" or
one-time application of "replace" is one of those phony notions that
only seem useful. The basic notion is not replace once, it is replace
until no change, as in macro replacement. I think we can prove that
there's no instance of a replace that always changes the string when
applied.

Let us call an implementation of replace(master, target, replacement)
"kewl" when and only when it is "independent of left to right or right
to left order". I claim that the only form of replace that is "kewl"
is nondeterministic. To simulate it you'd have to apply the
replacement rule randomly. It would sometimes return bonona, and other
times it would return banono.

(Chorus of you say tomayto I say tomahto).

This is an interesting NEGATIVE point. It means that there are
probably bugs out there.

It's a CORRECT result without being, of course, a reasonable
SPECIFICATION for real code. But that don't mean it's not useful.
Turing's Halting Problem is True, and created software, but it's not a
spec.

The truth is something which can be applied immanently as a critical
tool to some spec, but in corporate life, the central idea is that the
employee is indemnified if she works to rule or spec. Requirements
definition is depressing because it excludes critical thought in favor
of applied Positivism.

I mean, ask a kid.

"Why a four year old could do this! Get me a four year old!"

One thing I find terribly amusing and at the same time rather sad in
this intellectual slum is the contrast between the hackneyed,
conventional, authoritarian and "mature" thinking of grownups here,
and my elementary students in my real job (I teach a range of students
from primary to uni).

I think a child would have a great deal of difficulty learning how to
manually do a replace, and would ask if clever about "banana". I think
a real mathematician thinks like a child and would not be satisfied
with a replace() applied once, deterministically and left to right. I
don't know, however, if there is any "real" work on this.

And despite the arcane flavor of this material, the failure to even
consider cases thought "imprecise" because they are non-deterministic
creates real bugs, as when the user says, "oh no, in THAT case you
need to change banana to banono." "Oh, really? Why?" "Because our
customer in Antigua wants it that way."

Children, in making "mistakes", make discoveries, and that's just the
sort of thing programmers miss. For example, Peter didn't ask himself
what would be the case if %s was in the substitution string and would
probably consider the question so quirky as to make it safe to gravely
infer that the asker of the question is a nutbar, and to Call
Security.
[ snip ]




I gave it to you: a hypothetical but possible natural language in
which adjacent lexemes must be split and modified.

You've posited a scenario in which attempting to replace
overlapping strings would be useful or meaningful.  What I'm
not getting is an exact specification of how you think it should
work.  What should replace(banana, ana, xno) be?  Or are there
restrictions on input that would exclude it from consideration?

No, I am not trying to come up with an exact specification, only a
general approach.
It means I couldn't think of a graceful way to express my intended
meaning and decided to just bail out of the sentence.  Trying here:

Without a clear specification of what should be done about
overlapping matches, I don't think it makes sense to try to come up
with code or even an algorithm.

Why is it that in the corporation
The so-called clear specification
Is so often very dear
Costing loads of megabucks,
And never, almost never, ever clear?

The Germans had a Schlieffen Plan
A sort of military requirements definition:
But Tuchman the histori-an
Said that the plan was typically Teuton:
Everything was she said perfectly laid on
Only to fail at the critical point:
The plan violated the neutrality of Belgium.
To "focus", to be obedient, administered and precise
Is to be inhuman, and not very nice:
Yet we fear, even in something so rigid
As computer programming, so precise and so frigid,
Our queer humanity.

And why oh why is actual clarity
So seldom treated with charity
Euclid was as clear as day,
But seems to some a fussbudget and gay
Because he was precise in what he had to say.

But as it happens...as it turns out,
In what Adorno called the administered world,
Merely extending common sense,
Is unrewarded, unsung, and without recompense.
It transgresses in the name of truth
And so it's treated without ruth.

Requirements are ersatz and miss the point
Requirements, I say, please just go away:
We don't need no steenking requirements
Let's do a daily build and use extended common sense
That is, in fact, nothing more or less than science.
[ snip ]




Well, my greater experience with object oriented development in C
Sharp and VB has taught me that given either an adequate OO language,
or sufficient intelligence and patience, concat can work either way
without much drama. In C, the direction has to be a crummy parameter
that is easy to get wrong.

My usage of "concat" here was meant to indicate a mathematical/formal
operation on strings, not a call to a function in some programming
language, real or imagined.  How can *that* imply a direction?  As
I said, it seems to me that considered as a mathematical operation
string concatenation is associative.  Maybe there's something I'm
not getting, though.

I agree. Concat is independent of direction in the abstract. But as
Dijkstra saw, there's a problem when the theory becomes reality. The
fact is that to real developers of the corporate-slave class, the
connotation of concatenation is left to right, they being uneducated
in linguistics.
That was not my intent.  (And really, I don't think you're in the
strongest position to talk about not patronizing other posters.)

It's not patronizing when there is a genuine difference in abilities
that is belied by the patronizing behavior. So don't speak to me of
what is sensible until you have established enough credibility. I
appreciate your collegiality but as yet I don't see any vast
difference in ability that would make "patronizing" inapplicable.

Real "patronizing" is only crudely inferable from a tone of voice. It
is relative to whether the patronizer is without merit adopting an *ex
cathedra* style. I think in my case it is sometimes appropriate to do
so. I would it were not so, for I would prefer to meet better
programmers here.
 

spinoza1111

Whoops.  No, I just made the mistake.  It was probably right in the original.


Not really:
1.  He's plonked, so I only see a few of them.
2.  I'm autistic.  Insults only communicate data to me in most cases.  In his
case, they communicate that he's angry but deeply incompetent.

Mostly I just figure he's cheap entertainment.

Well, seebie: only in America, my experience as an expat tells me, are
people so proud of watching TV and instead of finding sermons in
stones, or books in running brooks, find pleasure without instruction
in laughing at supposed inferiors.

If you think what you say above, and if you are accessing these
discussions at work, I hope it's a termination offense to waste your
company's bandwidth on your entertainment. Most of us are here for
bonafide reasons.

When you say you plonk, you lie.
 

spinoza1111

Exactly.  He's plonked, because that way I see only the funny parts.

Ah, but do you?  At least one of the people still interacting with
him (Richard Heathfield) snips non-technical content.  I won't say
you're missing anything you'd enjoy, but there's plenty that a person
inclined to perceive and react to insult would -- react to.
Yeah.  I still remember:
   i << 3 + i + i
but that doesn't mean I'd ever use it.

Yipes.  Where did you encounter this one?  Multiplication by ten,
right?
It appears that he meant, not matrix multiplication, but multiplication of
every element in a matrix by a fixed value.  For some reason, he seems to
think that multiplying everything in a matrix by a compile-time constant is
a likely operation.  For some other reason, probably a less plausible one,
he seems to think that the shift-as-multiply trick actually buys you
something.

Well, to be fair, aren't there circumstances in which it might?
in which the programmer knows that one of the operands of the
multiply operator is a power of 2, but the compiler wouldn't be
able to detect that?  It does seem like the kind of microoptimization
that one would hesitate to do without a compelling reason, though.
Interesting.  I've never heard the term used that way.

I hadn't either.  Is this where I brandish my educational
credentials (undergrad degree in math)?  Nah.  I think that's
rather gauche.  Besides, that undergrad degree is pretty well-aged
by now, and it sometimes surprises me how much of what I presumably
learned is no longer retrievable from memory.




And I have to say, your code certainly did a great job of undermining the
idea that strings were predictably null-terminated.  At least in your code.

Wait, does this imply that he thinks the << thing is some kind of secret
that not everyone knows?  As an idle curiosity, I've asked my coworkers
to see whether any of them *haven't* seen that.

Well, I interpreted the "messing with [his] head" remark to indicate
that he was trying to taunt you or provoke you in some way.  But
I could be mistaken about that.

Not without cause. Although accusations of "trolling" made against me
are false because I'm truth based, I'm not above teasing people by
asking questions they probably can't answer, especially when those
people have created careers by way of the politics of personal
destruction.

However, nobody but me seems to notice that part of the general
culture of these newsgroups is amnesia about traditional ethics owing
to its replacement by an ironic hacker ethos. In that ethos, it seems
to be OK to destroy individual standing, because in what Jared Lanier
calls "digital Maoism" all that matters is group "consensus".

Components of the hacker pseudo-ethos:

* Running code (and the "rough" consensus of the Lynch law) as
opposed to correct software
* Treatment of artifacts such as computers and abstractions as more
important than human beings
* Autism
* A fundamental lack of decency
* A mythos in which the hacker fantasizes himself as uniquely
valuable to his company when he's in fact a dime a dozen
* Majoritarian tyranny
* Disrespect for midlevel authorities that make safe targets
 

Nick

Seebs said:
That, I must admit, totally surprises me. I would have thought that the
cost of strstr() would be trivial compared to the cost of malloc(). I guess
not for some input data!

Which suggests that, if this would be run often enough, by enough people, who
would be waiting on the output, it is conceivable that it could be worth
spending the extra 8-10 hours of programming effort, plus the lifetime
maintenance effort, for the more complicated code.

Or, alternatively, that it would at least make sense to consider one of the
"don't rescan" options.

I've just come up with another way to do this that might be interesting
and - IIRC - fits the specification. I don't think anyone has proposed
this one.

Calculate the maximum size the output string could possibly be. As
there are no overlaps then this is the length of the input string,
divided by the length of the match string, multiplied by the length of
the replacement (unless the replacement string is shorter than the
match, in which case it is just the length of the input). A minor tweak
for rounding will be needed here (probably just by adding one to the
result of the divide).

Allocate a temporary buffer that size (VLA anyone?). Do a single
scan-and-replace pass, collecting the actual length as you go along.
Allocate a buffer the right size, copy the results into it.

If I didn't have real coding to do today, I might try to implement it.
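
Since I probably won't, here's roughly what I have in mind, as an untested
sketch rather than working code. It uses a slightly more direct worst-case
bound than the divide-then-multiply described above, and it assumes a
non-empty match string and the usual leftmost, non-overlapping matching:

#include <stdlib.h>
#include <string.h>

char *replace_onepass(const char *in, const char *match, const char *repl)
{
    size_t in_len = strlen(in);
    size_t m_len = strlen(match);
    size_t r_len = strlen(repl);
    size_t max_len, out_len = 0;
    char *tmp, *out;
    const char *p = in, *hit;

    /* Worst case: every possible non-overlapping match gets replaced. */
    max_len = in_len;
    if (r_len > m_len)
        max_len += (in_len / m_len) * (r_len - m_len);

    tmp = malloc(max_len + 1);                /* the scratch buffer */
    if (tmp == NULL)
        return NULL;

    while ((hit = strstr(p, match)) != NULL) {
        size_t pre = (size_t)(hit - p);
        memcpy(tmp + out_len, p, pre);        /* unmatched prefix   */
        out_len += pre;
        memcpy(tmp + out_len, repl, r_len);   /* the replacement    */
        out_len += r_len;
        p = hit + m_len;                      /* resume after match */
    }
    strcpy(tmp + out_len, p);                 /* tail, including '\0' */
    out_len += strlen(p);

    out = malloc(out_len + 1);                /* exactly the right size */
    if (out != NULL)
        memcpy(out, tmp, out_len + 1);
    free(tmp);
    return out;
}

(A VLA would do for the scratch buffer when the input is known to be small,
as suggested above.)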
 

Ben Bacarisse

Nick said:
I've just come up with another way to do this that might be interesting
and - IIRC - fits the specification. I don't think anyone has proposed
this one.

Calculate the maximum size the output string could possibly be. As
there are no overlaps then this is the length of the input string,
divided by the length of the match string, multiplied by the length of
the replacement (unless the replacement string is shorter than the
match, in which case it is just the length of the input). A minor tweak
for rounding will be needed here (probably just by adding one to the
result of the divide).

Allocate a temporary buffer that size (VLA anyone?). Do a single
scan-and-replace pass, collecting the actual length as you go along.
Allocate a buffer the right size, copy the results into it.

Yes, someone did propose this. In fact I recall seeing it (or
something very similar to it) implemented. I can't recall who it was,
for which I apologise. It's a neat idea though probably not practical
in some cases (replace(war-and-peace, "\n", "<paragraph>\n")).

<snip>
 

Chris M. Thomasson

[...]
However, nobody but me seems to notice that part of the general
culture of these newsgroups is amnesia about traditional ethics owing
to its replacement by an ironic hacker ethos. In that ethos, it seems
to be OK to destroy individual standing, because in what Jared Lanier
calls "digital Maoism" all that matters is group "consensus".

Components of the hacker pseudo-ethos:

* Running code (and the "rough" consensus of the Lynch law) as
opposed to correct software
* Treatment of artifacts such as computers and abstractions as more
important than human beings

* Autism
^^^^^^^^^^^^^^^^^


What exactly does autism have to do with any of this? I know some higher
functioning autistic people, and quite frankly, they are some of the most
brilliant individuals I have ever had the pleasure to be around. They can
tear a problem apart by visualizing every aspect of it in their mind. They
seem to be able to think in highly detailed pictures. It's extremely neat
and I wish I had a fraction of their abilities.
 

Nick Keighley

[...] The API locks us into bad thoughts.
I could [swear] you told me there were no bad books. And yet there are
bad thoughts? double plus ungood.

Sure. Freedom of speech is completely consistent with criticism of
published thought, but NOT by poseurs. Seebach, in my opinion, is a
poseur. Therefore, my freedom of speech enables me to call him a
poseur.

Your own thought seems to proceed in true regular guy mode:

have I been insulted? Is it good or bad to be a "regular guy"?
Chomsky 3 substitution of strings,
[...] Strings terminated ON THE RIGHT with a Nul is a
bad thought for two reasons:
*  It prevents Nul from occurring in a string
a count-preceded string is bad because it limits the maximum size of
the string. I suppose a chain of blocks doesn't have this limitation.

I suppose not. However, the length may be inexpressible if it exceeds
what in C is called "long long" precision. OO systems handle this
cleanly: it gives C the willies.

OO isn't magic. If there is a way to do this then C will be able to do
it as well. Yeah, I know I'm drifting into "all useful programming
languages are turing complete and therefore in a sense equivalent".
But I'll argue that this particular problem doesn't require the C
programmer to implement an interpreter.

The turing tarpit, where everything is possible but nothing is
tractable
nope. You are confusing representation and presentation. The nul isn't
at the right-hand end, it's at the largest address. If the display
device chooses to print r-to-l instead of l-to-r it makes not [the]
blindest bit of difference.

What on EARTH does the word "print" mean?

to make marks on a piece of paper.

[...]
I'm aware that left to right can be reversed by layering software,
something at which C programmers suck because C sucks at it.

Your solution forces the wogs to get wog devices that print backwards.

or a device that isn't biased. Like a laser printer. They basically
don't care what direction they go in. I suspect it wouldn't take much
to make a dot-matrix or ink-jet do similar stuff. I can print pictures
on my dot-matrix and pictures aren't euro-centric. Those of us that
stayed in the industry are aware that the chain printer is no more.
(not that those were direction biased either!)
However, it still forces developers to think eurocentrically with the
result that the non-Latin output is necessarily a second choice.

we're repeating ourselves. But I think my point about representation
and presentation (or display) is worth thinking about.
Boo hoo. Like I said, people are thinking in comic book terms:

Alice: But Dr Nilges, you fool! A long long takes eight bytes to
store!
Nilges: (Evil laugh) I care not! Storage is almost free! I vill use 8
bytes per string and rule zee verld! Nyahh ha ha! Nice jugs! Nyah ha
ha!
Timmy: Mommy I'm scared!
Ruff: Woof woof!

I have to confess I smiled the first time I saw one of your little
dialogues. But the joke doesn't really last.
 
