Joe Keane wrote:
So zero isn't a number? ;-)
It is; it's in octal.
*Like* Ethernet MTU constraints... or UDP, or whatever. I could
have worded that better.
Badly. I would avoid this if I were you. Google for "Fragmentation
considered harmful".
I am sure there are PHY layers for which a 64-kbyte MTU
isn't a problem. At that point, go for it.
But this is *also* problematic. If you use blocking sockets, then
you are subject to the whims of whatever it is that is
unreliable between you and the far end.
If you use nonblocking sockets, you get a piece at a time
and get to do your own reassembly.
And when you use *blocking* sockets, you may well *hang*
on a read() or ioctl() call at embarrassing times. This may well
require a full-on reboot.
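The reassembly loop for the nonblocking case is roughly this shape. A
sketch only -- the struct, the function name, and the 4-byte length
prefix are all invented for illustration:

#include <arpa/inet.h>   /* ntohl */
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>  /* recv */

struct reasm {
    uint8_t buf[65536];   /* message buffer */
    size_t  got;          /* bytes received so far */
};

/* fd is assumed to already be O_NONBLOCK. Returns 1 when a whole
   message is buffered, 0 if more data is needed, -1 on error or
   peer close. */
int pump(int fd, struct reasm *r)
{
    ssize_t n = recv(fd, r->buf + r->got, sizeof r->buf - r->got, 0);
    if (n < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
    if (n == 0)
        return -1;                    /* peer closed mid-message */
    r->got += (size_t)n;
    if (r->got < 4)
        return 0;                     /* length prefix incomplete */
    uint32_t len;
    memcpy(&len, r->buf, 4);
    len = ntohl(len);
    if (len > sizeof r->buf - 4)
        return -1;                    /* refuse oversized messages */
    return r->got >= 4 + (size_t)len; /* 1 = complete, 0 = partial */
}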
Hence "when it's a buffer"....
Not in actuality.
So you have two choices - either treat "unlimited stream size" as a
natural right, and then have to go fix it when that assumption fails,
or understand the lower layers and plan accordingly.
I know which one I do...
On Sun, 2013-12-15, Les Cargill wrote:
People want to say something like 'char buf[5000]' and get away with
it. That includes me -- I don't want to optimize for rare and silly
scenarios every time I read a string.
You have to be sure you're not opening a security hole for an exploit.
I don't think the HTTP RFC puts a limit on line lengths, or on the
total size of the request -- but in reality it would be foolish to
allow a client to sit for hours feeding in more and more data; the
only reason for a client to do so is a DoS attack.
So yes, I agree that it's usually silly to handle a multi-megabyte
string. But the lower layers are not the reason.
Hey all,
(My recent post of this question on Stack Overflow was put on hold as 'opinion-based': stackoverflow.com/questions/20556729/is-c-an-unsuitable-choice-as-a-string-parser)
I am considering C as a candidate for implementing a string parser.
+ specialized for English at first, but extensible to handle arbitrary character encodings
+ tokenizes strings which may then be used to search a data store
+ allows optional embedding of tools like the Natural Language Toolkit (Python) for more lexical analytic power
My task feels simple and limited -- not bad for a C program -- but I keep running across comments about C like 'prefer a language with first-class string support' (e.g., stackoverflow.com/a/8465083/3097472).
I could prefer something like Go, which seems to have good string processing, but C appeals because of the performance offered for the relatively limited complexity of the task. Moreover, it seems that a library like ICU may help...
Can readers suggest any prima facie reason(s) not to use C as a string parser?
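For concreteness, the tokenizing core I have in mind is roughly the
following sketch (ASCII-only, and the emit callback is just a
placeholder for whatever feeds the data store):

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Split a line into whitespace-separated tokens, lowercase them,
   pass each to a caller-supplied callback. */
void tokenize(char *line, void (*emit)(const char *tok))
{
    for (char *tok = strtok(line, " \t\r\n"); tok != NULL;
         tok = strtok(NULL, " \t\r\n")) {
        for (char *p = tok; *p; p++)
            *p = (char)tolower((unsigned char)*p);
        emit(tok);
    }
}

static void print_tok(const char *t) { puts(t); }

int main(void)
{
    char line[] = "The quick brown Fox";
    tokenize(line, print_tok);    /* prints: the quick brown fox */
    return 0;
}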
But I have little doubt that there will be cases in which
either will outperform the other. "I use 'C' because it's
faster" is pretty weak tea and is possibly a signal
of premature optimization. ...
On Sun, 2013-12-15, Les Cargill wrote:
People want to say something like 'char buf[5000]' and get away with
it. That includes me -- I don't want to optimize for rare and silly
scenarios every time I read a string.
You have to be sure you're not opening a security hole for an exploit.
In a lot of programming environments, it's not an issue. But where it is,
the consequences can be serious.
Uh.
Uploads. I have used more than one page which allows file uploads,
and those are implemented as HTTP requests. Pretty sure that can, in
at least some cases, mean an HTTP request which really is going
to be feeding in data for a long time; and on a slow link,
that could certainly be minutes.
On Sun, 2013-12-15, Les Cargill wrote:
People want to say something like 'char buf[5000]' and get away with
it. That includes me -- I don't want to optimize for rare and silly
scenarios every time I read a string.
You have to be sure you're not opening a security hole for an exploit.
In a lot of programming environments, it's not an issue. But where it is,
the consequences can be serious.
Very.
On Sun, 2013-12-15, Les Cargill wrote:
People want to say something like 'char buf[5000]' and get away with
it. That includes me -- I don't want to optimize for rare and silly
scenarios every time I read a string.
You have to be sure you're not opening a security hole for an exploit.
In a lot of programming environments, it's not an issue. But where it is,
the consequences can be serious.
Yes; I like that the GNU compiler will warn you about some unsafe
practices. Buffer overflow is insidious.
Case in point: ....
If I'm reviewing someone else's code, and I see something like
"char buf[5000]", alarm bells go off.
Buffer overflows. Not even once.
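For anyone who hasn't been bitten yet, the classic shape of the
problem, in a contrived sketch (not anybody's real code):

#include <stdio.h>
#include <string.h>

void bad(const char *input)
{
    char buf[5000];
    strcpy(buf, input);   /* silently overflows if input > 4999 chars */
}

void better(const char *input)
{
    char buf[5000];
    /* snprintf never writes past buf and always NUL-terminates;
       its return value tells you whether truncation happened. */
    if (snprintf(buf, sizeof buf, "%s", input) >= (int)sizeof buf)
        fprintf(stderr, "input truncated\n");
}

The second version is no harder to write, and it even tells you when
the input didn't fit.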
If I'm reviewing someone else's code, and I see something like
"char buf[5000]", alarm bells go off.
That's certainly a place in the code you need to examine, but what I'm
arguing is that it doesn't have to be a bug. If e.g. you document "input
lines may not be longer than 4999 characters or the program will abort
with an error message", that's fair and sane and no one will complain.
(Assuming, of course, that you don't introduce an overflow.)
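Something along these lines, say -- a sketch, with the exact limit
and wording being whatever you document:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line into buf; abort with the documented error message
   if the line (including its newline) doesn't fit. */
void read_line_or_die(char *buf, size_t size, FILE *in)
{
    if (fgets(buf, (int)size, in) == NULL) {
        fprintf(stderr, "unexpected end of input\n");
        exit(EXIT_FAILURE);
    }
    /* No newline and not at EOF means the line didn't fit:
       refuse it rather than silently mis-parse it. */
    if (strchr(buf, '\n') == NULL && !feof(in)) {
        fprintf(stderr, "input line too long\n");
        exit(EXIT_FAILURE);
    }
}

/* usage: char buf[5000]; read_line_or_die(buf, sizeof buf, stdin); */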
Buffer overflows. Not even once.
Yes, but not accepting infinite inputs and buffer overflows are
separate issues.
C allows you to manipulate chars easily. But once you look at it closer
(the Unicode standard is a good start), a "letter" is quite a different matter.
UTF-8 is just the start (where code points consist of 1 to 4 chars). But then
you have letters that are made up of multiple code points.
Sure. In some situations letters with accents are the same letter as letters
without accents.
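To make the "1 to 4 chars" point concrete: even just counting code
points (never mind user-perceived letters) already takes some UTF-8
awareness. A rough sketch, assuming well-formed input:

#include <stddef.h>

/* Count code points in well-formed UTF-8 by skipping continuation
   bytes (the ones of the form 10xxxxxx). No validation, and no
   grouping of combining marks into user-perceived letters --
   that's where a library like ICU comes in. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* "e\xCC\x81" (e + combining acute accent) is 3 bytes and 2 code
   points, but displays as the single letter. */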