Regular expression 0 - number range.


Mike Duffy

... 2062-02-29 is not a date, since 2062 isn't a leap year.

Yes, but you said "considered as a date". When the javascript engine
"considers" the invalid date string you supplied "as a date", it
corrects it if possible.


alert(new Date('Feb 29 2001'));

Will pop up the value:

Thu Mar 01 2001 00:00:00
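
The same rollover can be seen with the numeric form of the constructor, which avoids relying on how a particular engine parses a string such as 'Feb 29 2001'. A minimal sketch; `isRealDate` is a hypothetical helper name, not something posted in this thread, and it assumes years of 100 or later:

```javascript
// Day 29 of February 2001 does not exist, so the Date constructor
// rolls it over to 1 March 2001 (months are 0-based: 1 = February).
var rolled = new Date(2001, 1, 29);

// Hypothetical helper: a y-m-d triple is a real date only if the
// constructor did not have to "correct" it. Here m is 1..12.
// Assumes y >= 100 (smaller year arguments are read as 1900 + y).
function isRealDate(y, m, d) {
  var dt = new Date(y, m - 1, d);
  return dt.getFullYear() === y &&
         dt.getMonth() === m - 1 &&
         dt.getDate() === d;
}
```

So "considered as a date" and "is a valid date" are different questions: the first silently corrects, the second has to compare the result with the input.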
 

Thomas 'PointedEars' Lahn

Dr said:
Thomas 'PointedEars' Lahn posted:

I am commenting on what you wrote and I quoted.

You have ripped my statement out of its specific context (in which it was
correct), and then stated that my statement was in general, and utterly
wrong. I will leave it to you to look up the name of that fallacy.
Note also that, for testing a user-supplied YYYY-MM-DD string for value
being between two given YYYY-MM-DD strings, it is not sufficient to do
only a straight string comparison. The string 2062-02-29 undoubtedly
lies between the strings 2040-11-21 and 2069-12-31, but, considered as a
date, it does not.

In my book it does. Proof: If progression of linear time in our universe is
the ordering relation to check against, and if we assume today is
2062-02-29, then 2040-11-21 is most certainly a date in the past, and
2069-12-31 is most certainly a date in the future. And IIRC, it was one of
the intentions behind ISO 8601 (and regional standards based upon it) to
make sure that string comparison of date representations worked, for example
to be able to easily sort a file listing by date of modification.

But that does not change the fact that Regular Expressions are not the
suitable means to validate date *ranges* in general.
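
The string-ordering property mentioned above can be sketched as follows. This assumes every string already has the exact fixed-width shape YYYY-MM-DD; `inIsoRange` is an illustrative name only. It checks ordering, not whether the date exists in the calendar.

```javascript
// For fixed-width YYYY-MM-DD strings, lexicographic order equals
// chronological order, so plain string comparison does a range check.
function inIsoRange(s, lo, hi) {
  return lo <= s && s <= hi;
}

// Sorting works the same way, using the default string sort.
var sorted = ["2069-12-31", "2040-11-21", "2062-02-29"].sort();
```

Here `inIsoRange("2062-02-29", "2040-11-21", "2069-12-31")` is true, which is exactly the case being argued over: the string lies in the range even though the calendar has no such day.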


PointedEars
 

Dr J R Stockton

Tue said:
The problem is that 2062-02-29 is not a date at all, since 2062 isn't
a leap year.

Being in the numeric/alphabetic range isn't sufficient validation.

Rem acu tetigisti.
While I'm sure you can make a RegExp that matches only valid dates too,
it's simply not the most effective way of doing that.

I seem to be on the way to disproving that, or nearly so. I have
acquired a single RegExp for 1900-2000 which seems about as fast in
Firefox 3 as is my best Date Object method; and I have written a RegExp
which will properly validate all 8-digit Gregorian Dates YYYY-MM-DD,
disregarding those which are not February 29th (that case could be
copied from the acquired code, but I intend to write anew).

The RegExp method is at least free of Date Object bugs; and the Safari
Date Object appears to have a 3 in its algorithm where a 4 might work
better.

I had thought that one could nowadays rely on the Danes for sanity.
This seems to be a disproof :
<http://copenhagensuborbitals.com/campaignjune2011.php> -- <G>

P.S. Yes, one RegExp, written in 5 lines with whitespace to be ignored;
Gregorian YYYY-MM-DD, years 0000-9999, months 00-13, days 00-32, test
takes 9 seconds in Chrome 11.0,
 

Lasse Reichstein Nielsen

Dr J R Stockton said:
I have acquired a single RegExp for 1900-2000 which seems about as
fast in Firefox 3 as is my best Date Object method; and I have
written a RegExp which will properly validate all 8-digit Gregorian
Dates YYYY-MM-DD, disregarding those which are not February 29th
(that case could be copied from the acquired code, but I intend to
write anew).

I have something like:
/^(?:\d{4}-(?:(?:0[469]|11)-(?:0[1-9]|[12]\d|30)|(?:0[13578]|1[02])-(?:0[1-9]|[12]\d|3[01])|02-(?:0[1-9]|1\d|2[0-8]))|(?:\d\d(?:0[48]|[2468][048]|[13579][26])|(?:[02468][048]|[13579][26])00)-02-29)$/

(should accept exactly valid dates in the range 0000-01-01 .. 9999-12-31)
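
A quick spot check of the expression above, as a sketch only; the sample strings are chosen here for illustration and are not from the original post:

```javascript
var isoDate = /^(?:\d{4}-(?:(?:0[469]|11)-(?:0[1-9]|[12]\d|30)|(?:0[13578]|1[02])-(?:0[1-9]|[12]\d|3[01])|02-(?:0[1-9]|1\d|2[0-8]))|(?:\d\d(?:0[48]|[2468][048]|[13579][26])|(?:[02468][048]|[13579][26])00)-02-29)$/;

// Leap day in a year divisible by 400, leap day in a century year that
// is not, leap day in an ordinary year, 31st of a 30-day month, and an
// ordinary valid date.
var samples = ["2000-02-29", "1900-02-29", "2062-02-29", "2011-04-31", "2011-12-31"];
var results = [];
for (var i = 0; i < samples.length; i++) {
  results.push(isoDate.test(samples[i]));
}
// results is [true, false, false, false, true]
```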

It could be smaller, but I tried to write it for speed rather than compactness.
I'd love to see your regexp :)


Is this only testing string inputs ... because that would penalize the
Date methods that need to parse the numbers back out - whereas having
the numbers but no string would favor the Date methods over regexps.
P.S. Yes, one RegExp, written in 5 lines with whitespace to be ignored;
Gregorian YYYY-MM-DD, years 0000-9999, months 00-13, days 00-32, test
takes 9 seconds in Chrome 11.0,

That's quick for ~4.5M tests. Just building the strings seems like it would
take some time.

/L
 

Thomas 'PointedEars' Lahn

Lasse said:
Dr said:
I have acquired a single RegExp for 1900-2000 which seems about as
fast in Firefox 3 as is my best Date Object method; and I have
written a RegExp which will properly validate all 8-digit Gregorian
Dates YYYY-MM-DD, disregarding those which are not February 29th
(that case could be copied from the acquired code, but I intend to
write anew).

I have something like:
/^(?:\d{4}-(?:(?:0[469]|11)-(?:0[1-9]|[12]\d|30)|(?:0[13578]|1[02])-(?:0[1-9]|[12]\d|3[01])|02-(?:0[1-9]|1\d|2[0-8]))|(?:\d\d(?:0[48]|[2468][048]|[13579][26])|(?:[02468][048]|[13579][26])00)-02-29)$/

(should accept exactly valid dates in the range 0000-01-01 .. 9999-12-31)

That is a bit pointless as there was no year 0 CE/AD (1 BCE/BC was followed
by 1 CE/AD), and the Gregorian Calendar (which introduced leap years) was
not established before 1581 years after what Dionysius Exiguus defined as
the start of the Christian calendar.
It could be smaller, but it's attempted written for speed over
compactness. I'd love to see your regexp :)

Is this only testing string inputs ... because that would penalize the
Date methods that need to parse the numbers back out - whereas having
the numbers but no string would favor the Date methods over regexps.

But isn't this comparing apples and oranges? It is to be expected that a
prebuilt Regular Expression can be faster than a dynamic numeric comparison,
just as a static numeric comparison is likely faster than a dynamic one.

It is the process of *building* the expression *automatically* that is the
expensive part of RegExp-based *range* validation (not general date
validation as you attempted), arguably more expensive (in terms of memory
and speed) than comparing numbers (which can be done implicitly, or
explicitly by use of the getTime() method, with Date instances).
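
The numeric alternative described above might look like the following sketch. The names `isoToTime` and `dateInRange` are hypothetical, and the code assumes four-digit years of 100 or later and inputs of the shape YYYY-MM-DD:

```javascript
// Parse "YYYY-MM-DD" into a UTC time value (ms since the epoch).
// Returns NaN if the string does not have that shape.
function isoToTime(s) {
  var m = /^(\d{4})-(\d\d)-(\d\d)$/.exec(s);
  if (!m) return NaN;
  return Date.UTC(+m[1], m[2] - 1, +m[3]);
}

// Range check by comparing time values. Any comparison involving NaN
// is false, so unparseable input is reported as out of range.
function dateInRange(s, lo, hi) {
  var t = isoToTime(s);
  return isoToTime(lo) <= t && t <= isoToTime(hi);
}
```

When the bounds are fixed, their time values can be computed once and stored, which is the "static" comparison in the sense used above. Note that this checks the range only: an impossible day such as 2062-02-29 is rolled over to 2062-03-01 rather than rejected.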


PointedEars
 

Dr J R Stockton

Thu said:
Dr J R Stockton said:
I have acquired a single RegExp for 1900-2000 which seems about as
fast in Firefox 3 as is my best Date Object method; and I have
written a RegExp which will properly validate all 8-digit Gregorian
Dates YYYY-MM-DD, disregarding those which are not February 29th
(that case could be copied from the acquired code, but I intend to
write anew).

I have something like:
/^(?:\d{4}-(?:(?:0[469]|11)-(?:0[1-9]|[12]\d|30)|(?:0[13578]|1[02])-(?:0[1-9]|[12]\d|3[01])|02-(?:0[1-9]|1\d|2[0-8]))|(?:\d\d(?:0[48]|[2468][048]|[13579][26])|(?:[02468][048]|[13579][26])00)-02-29)$/

(should accept exactly valid dates in the range 0000-01-01 .. 9999-12-31)

It could be smaller, but it's attempted written for speed over compactness.
I'd love to see your regexp :)

Well, I've already mentioned my page in this thread ...
<http://www.merlyn.demon.co.uk/$tmp.htm>. It is only necessary to go
to its Form, paste in to TextArea 3 what you wrote above between the
outer / characters, select TextArea 3 radiobutton, and press the Test
button.

Yours works in my Firefox 3.6.17, and Safari 5.0.5, and Chrome 11.0.

There are considerable resemblances between our RegExps, but ISTM that
you have gained speed by using the more advanced construct ?: .

Mine was written for being piece-wise obviously correct and for doing
its tests in the best order.

I can put yours in my page, either in the ordinary JavaScript manner as
for Pawelek's or in a PRE for copy'n'pasting into TextArea 3 (or
higher), in which case spaces, newlines, and <...> comment can be
included for readability.

Is this only testing string inputs ... because that would penalize the
Date methods that need to parse the numbers back out - whereas having
the numbers but no string would favor the Date methods over regexps.

Agreed. Purely validating an ISO 8601 string. If the only requirement
is to validate a Y-M-D string, then it will be better to write the year
in Base 20, the month in at least 12, and the day in at least 31. In
practice, it will often be the case that a Date Object is needed
subsequently, so the obvious thing to do is to create one then check its
Month.


In my first attempt with testing a RegExp (Pawelek's), the RegExp was
faster than my Date Object validation. So I upgraded :-

function GoodDate(y, m, d) { // m = 1..12 ; y, m, d are ints, y != -10000
  // return new Date(y, --m, d).getMonth() == m;  // much slower than:
  return new Date(Date.UTC(10000 + y, --m, d)).getUTCMonth() == m;
}

Without the 10000+, the page always rejects 0000-02-29, which does
usefully test the error-reporting code.
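
Why the offset matters can be seen by putting the two variants side by side. This is a sketch; `goodDateNaive` and `goodDateShifted` are names made up here for the comparison:

```javascript
// Date.UTC reads year arguments 0..99 as 1900..1999, so 0000-02-29 is
// checked as 1900-02-29 (not a leap year) and rolls over into March.
function goodDateNaive(y, m, d) {           // m = 1..12
  return new Date(Date.UTC(y, m - 1, d)).getUTCMonth() === m - 1;
}

// 10000 is a multiple of 400, so adding it keeps the leap-year pattern
// while moving every year 0000..9999 out of the 0..99 special case.
function goodDateShifted(y, m, d) {         // m = 1..12
  return new Date(Date.UTC(10000 + y, m - 1, d)).getUTCMonth() === m - 1;
}
```

With these, `goodDateNaive(0, 2, 29)` is false while `goodDateShifted(0, 2, 29)` is true, since year 0 is a leap year in the proleptic Gregorian calendar.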


The time comparison that my page could do, if Windows were more stable,
is between REOK = RE.test(Str2) and MyOK = GoodDate(Y, M, D)
and is therefore not very meaningful in practice.

That's quick for ~4.5M tests. Just building the strings seems like it would
take some time.

Agreed. And that was not fully optimised; e.g. it used my default
ZeroTo & LZ functions.



Those aforementioned Danes may be mad, but they have been reasonably
successful : <http://www.cphpost.dk/news/news-latest/51762-bornholm-rocket-flies-high.html>.
 

Lasse Reichstein Nielsen

Thomas 'PointedEars' Lahn said:
That is a bit pointless as there was no year 0 CE/AD (1 BCE/BC was followed
by 1 CE/AD), and the Gregorian Calendar (which introduced leap years) was
not established before 1581 years after what Dionysius Exiguus defined as
the start of the Christian calendar.

It validates ISO-8601 dates, so it's reasonable for it to apply to the
proleptic Gregorian calendar.
But isn't this comparing apples and oranges? It is to be expected that a
prebuilt Regular Expression can be faster than a dynamic numeric comparison,
just as a static numeric comparison is likely faster than a dynamic one.

I have no idea what a dynamic numeric comparison is, but numeric comparisons
are quite likely to be faster than string operations.
It is the process of *building* the expression *automatically* that is the
expensive part of RegExp-based *range* validation (not general date
validation as you attempted),

Why bring this up when I was only talking about RegExp based date
validation, which I did as a response to a message that mentioned date
validation?
arguably more expensive (in terms of memory
and speed) than comparing numbers (which can be done implicitly, or
explicitly by use of the getTime() method, with Date instances).

Yes, string operations are slower than number comparisons (but not
necessarily slower than parsing numbers out of a string, if that's
all you have).

/L
 

Lasse Reichstein Nielsen

Asen Bozhilov said:
I would like to have function which automatically generates ranges.
For example range 0 - 25 could be written as:

(?:2[0-5]|1[0-9]|[0-9]) ....
toRange(255); //(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[1-9]|0)

toRange(26); //(?:2[0-6]|1[0-9]|[1-9]|0)

Any suggestions or improvements are welcome.

I finally got around to writing the code I had an idea for.
The idea is to make the regexp as fast as possible by making all
alternatives mutually exclusive, so there won't need to be any
backtracking.
It also tries to generate non-stupid regexps (i.e., use '\d' instead
of '[0-9]' and '?' instead of '{0,1}'). That's not perfect yet :)


function toRange(limit) {
  function range(a,b) {
    if (a == b) return String(a);
    if (a == b - 1) return "["+a+b+"]";
    if (a == 0 && b == 9) return "\\d";
    return "["+a+"-"+b+"]";
  }
  function repeat(re, from, to) {
    if (to <= 0) return "";
    if (to == 1){
      if (from == 0) return re + "?";
      if (from == 1) return re;
    }
    return re + "{" + from + "," + to +"}";
  }

  function rangeExp(limit, digit, rest) {
    var d = (limit % 10);
    var last_digit = (d == limit);
    if (d == 9 && typeof rest == "number") {
      if (last_digit) {
        return "^(?:0|[1-9]" + repeat("\\d", 0, rest) + ")$";
      }
      return rangeExp((limit - d) / 10, digit + 1, rest + 1);
    }
    var res = [];
    var min_digit = (last_digit) ? 1 : 0;
    if (typeof rest == "number") {
      res.push(range(min_digit, d) + repeat("\\d", 0, rest));
    } else {
      if (d > min_digit) {
        res.push(range(min_digit, d - 1) + repeat("\\d", 0, digit));
      }
      res.push(d + rest);
    }
    if (d < 9 && digit > 0) {
      res.push(range(d + 1, 9) + repeat("\\d", 0, digit - 1));
    }
    if (last_digit) return "^(?:0|" + res.join("|") + ")$";
    res.push('');
    res = (res.length == 1) ? res[0] : "(?:" + res.join("|") + ")";
    return rangeExp((limit - d) / 10, digit + 1, res);
  }

  if (limit < 10) return "^"+range(0,limit)+"$";
  return rangeExp(limit, 0, 0);
}


I think it works correctly (testing up to 10000 works).
Obviously arguments should be non-negative integers.
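
Testing of that kind can be done by brute force against the plain numeric comparison. The helper below is a hypothetical sketch (the name `rangeMismatches` is not from the thread); it only tries canonical decimal strings without leading zeros, and it is shown with Asen's hand-written 0 - 255 pattern rather than generated output:

```javascript
// Returns the integers n in 0..upTo for which the pattern's verdict
// differs from the numeric check n <= limit.
function rangeMismatches(pattern, limit, upTo) {
  var re = new RegExp(pattern), bad = [];
  for (var n = 0; n <= upTo; n++) {
    if (re.test(String(n)) !== (n <= limit)) bad.push(n);
  }
  return bad;
}

// Asen's example for 0..255, anchored so the whole string must match.
var upTo255 = "^(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[1-9]|0)$";
var bad = rangeMismatches(upTo255, 255, 10000);   // expected to be empty
```

A pattern that is off by one shows up immediately: checking the 0 - 25 pattern against a limit of 26 reports 26 as the only mismatch.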
/L
 

Michael Haufe (\TNO\)

function toRange(limit) {
  function range(a,b) {
    if (a == b) return String(a);
    if (a == b - 1) return "["+a+b+"]";
    if (a == 0 && b == 9) return "\\d";
    return "["+a+"-"+b+"]";
  }
  function repeat(re, from, to) {
    if (to <= 0) return "";
    if (to == 1){
      if (from == 0) return re + "?";
      if (from == 1) return re;
    }
    return re + "{" + from + "," + to +"}";
  }

  function rangeExp(limit, digit, rest) {
    var d = (limit % 10);
    var last_digit = (d == limit);
    if (d == 9 && typeof rest == "number") {
      if (last_digit) {
        return "^(?:0|[1-9]" + repeat("\\d", 0, rest) + ")$";
      }
      return rangeExp((limit - d) / 10, digit + 1, rest + 1);
    }
    var res = [];
    var min_digit = (last_digit) ? 1 : 0
    if (typeof rest == "number") {
      res.push(range(min_digit, d) + repeat("\\d", 0, rest));
    } else {
      if (d > min_digit) {
        res.push(range(min_digit, d - 1) + repeat("\\d", 0, digit));
      }
      res.push(d + rest);
    }
    if (d < 9 && digit > 0) {
      res.push(range(d + 1, 9) + repeat("\\d", 0, digit - 1));
    }
    if (last_digit) return "^(?:0|" + res.join("|") + ")$";
    res.push('');
    res = (res.length == 1) ? res[0] : "(?:" + res.join("|") + ")";
    return rangeExp((limit - d) / 10, digit + 1, res);
  }

  if (limit < 10) return "^"+range(0,limit)+"$";
  return rangeExp(limit, 0, 0);

}

I think it works correctly (testing up to 10000 works).
Obviously arguments should be non-negative integers.

Perhaps this could be helpful:
http://utilitymill.com/utility/Regex_For_Range

If I can find the time, I'll try to see if I can come up with a
functional solution in Haskell/SML then try to port it over
 

Thomas 'PointedEars' Lahn

Lasse said:
It validates ISO-8601 dates, so it's reasonable for it to apply to the
proleptic Gregorian calendar.

Hm, OK.
I have no idea what a dynamic numeric comparison is, but numeric
comparisons are quite likely to be faster than string operations.

By "dynamic numeric comparison" I mean, for example, to have a Date instance
and call its getTime() method to retrieve the time value for comparison on
runtime. "Static numeric comparison" would then be to retrieve the time
value before runtime (like in the JavaScript console) and store it in a
variable used for comparison directly.
Why bringing this up when I was only talking about RegExp based date
validation, which I did as a response to a message that mentioned date
validation?

Because it was in this thread; but OK, you seem to have changed the topic in
the meantime (BTW, without changing the Subject header field value).
Yes, string operations are slower than number comparisons (but not
necessarily slower than parsing numbers out of a string, if that's
all you have).

If it was a simple string operation I would agree, but it is not. It
involves creating a RegExp instance with a number of nested alternations,
and matching that against a string value. I wonder how that could be faster
or better than using a much simpler RegExp for parsing, or than simply
passing on the string value verbatim to the Date constructor (which it must
accept and interpret properly according to ES5, section 15.9.3.2), and then
comparing the numeric time values (implicitly or by calling getTime()).

As for parsing, there is a catch indeed that I have come across recently:
the Date constructor is specified (ES5, 15.9.3.1) and implemented (tested in
V8 of Chromium 11, details available) to interpret a !NaN-year argument less
than 100 to be year + 1900. So you will have to call
dateInstance.setFullYear(1) to store a date-time in 1 CE.
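
A short sketch of that catch and the setFullYear() workaround; the variable names are illustrative, and a mid-year day is used only to stay clear of any time-zone edge effects:

```javascript
var d = new Date(1, 5, 15);       // year argument 1 is read as 1901
var before = d.getFullYear();     // 1901
d.setFullYear(1);                 // setFullYear applies no such offset
var after = d.getFullYear();      // 1
```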


PointedEars
 

Dr J R Stockton

In comp.lang.javascript message <[email protected]>,
Lasse said:
Dr said:
I have acquired a single RegExp for 1900-2000 which seems about as
fast in Firefox 3 as is my best Date Object method; and I have
written a RegExp which will properly validate all 8-digit Gregorian
Dates YYYY-MM-DD, disregarding those which are not February 29th
(that case could be copied from the acquired code, but I intend to
write anew).

I have something like:
/^(?:\d{4}-(?:(?:0[469]|11)-(?:0[1-9]|[12]\d|30)|(?:0[13578]|1[02])-(?:0[1-9]|[12]\d|3[01])|02-(?:0[1-9]|1\d|2[0-8]))|(?:\d\d(?:0[48]|[2468][048]|[13579][26])|(?:[02468][048]|[13579][26])00)-02-29)$/

(should accept exactly valid dates in the range 0000-01-01 .. 9999-12-31)

That is a bit pointless as there was no year 0 CE/AD (1 BCE/BC was followed
by 1 CE/AD), and the Gregorian Calendar (which introduced leap years) was
not established before 1581 years after what Dionysius Exiguus defined as
the start of the Christian calendar.

You should find out about the Proleptic Gregorian Calendar - after all,
it is what ECMAScript uses - and about Astronomers' Notation (ditto).
While you are there, find out also about the modern uses of Julian Year.
In date work, it is inconvenient to use a scale with a step in value and
in slope; it is better to use proleptic Gregorian (or the Julian Year,
or some form of JD) and to translate the inputs and results as required.

It was, by the way, not Gregory, Clavius et al who introduced our Leap
Years; it was Julius, Sosigenes the Alexandrian et al. Our present-day
Leap Years are all in accordance with the Julian Rule; Gregory merely
abolished three instances of February 29th per four hundred calendar
years. That is explained, authoritatively, in an Opera Mathematica
Tomus V.

It is true that 1 BC was followed by 1 AD (though no-one below Deity
grade knew that at the time); but it is equally true that the Year Zero
was followed by the Year One (ditto). However, the Year Minus One was
not followed by the Year One.

Sosigenes was not the first to invent the Leap Year. He, no doubt,
being learned, knew of the Ptolemaic decree of 238 BC. But Julius was
probably the first to make an effective introduction, though marred
after his death by administrative incompetence.

But isn't this comparing apples and oranges? It is to be expected that a
prebuilt Regular Expression can be faster than a dynamic numeric comparison,
just as a static numeric comparison is likely faster than a dynamic one.

That will depend on the complexity of the RegExp.

Given a known-good RegExp, it could be of interest to consider it as the
only definition of the required algorithm, and to translate it to use
arithmetic, and to compare that with other arithmetical implementations
based on the rules stated in ISO 8601:2004.

Merlyn's $.htm was updated at 17:19 GMT June 4, and probably since, too.
 

Lasse Reichstein Nielsen

Thomas 'PointedEars' Lahn said:
Lasse Reichstein Nielsen wrote:

If it was a simple string operation I would agree, but it is not. It
involves creating a RegExp instance with a number of nested alternations,
and matching that against a string value. I wonder how that could be faster
or better than using a much simpler RegExp for parsing, or than simply
passing on the string value verbatim to the Date constructor (which it must
accept and interpret properly according to ES5, section 15.9.3.2), and then
comparing the numeric time values (implicitly or by calling getTime()).

I wouldn't create a RegExp for a single test. However, if I were to do the
same test repeatedly on string inputs, the overhead of compiling the RegExp
would likely be worth it. A compiled RegExp can be quite quick.

But then, so can a properly written custom parser.

If the range isn't static, it's probably faster to parse the date (if
speed is of the essence, I would probably use charCodeAt and some
arithmetic rather than parseInt/Number/+/etc. [1]) and do numeric
comparisons.
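
One possible shape for such a charCodeAt-based parser, as a sketch only; this is not the code behind the jsperf link, and `isoToNumber` and `numberInRange` are hypothetical names. It turns the string into the integer YYYYMMDD, which orders the same way as the dates do:

```javascript
// Convert "YYYY-MM-DD" to the integer YYYYMMDD using char codes only.
// Returns -1 if the string does not have that shape.
function isoToNumber(s) {
  if (s.length !== 10 || s.charAt(4) !== "-" || s.charAt(7) !== "-") return -1;
  var n = 0;
  for (var i = 0; i < 10; i++) {
    if (i === 4 || i === 7) continue;   // skip the two separators
    var c = s.charCodeAt(i) - 48;       // 48 is the char code of "0"
    if (c < 0 || c > 9) return -1;
    n = n * 10 + c;
  }
  return n;
}

// Range check against numeric bounds such as 20401121 and 20691231.
function numberInRange(s, lo, hi) {
  var n = isoToNumber(s);
  return n >= 0 && lo <= n && n <= hi;
}
```

As with the string comparison, this tests the range and the shape, not whether the day exists in that month.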
As for parsing, there is a catch indeed that I have come across recently:
the Date constructor is specified (ES5, 15.9.3.1) and implemented (tested in
V8 of Chromium 11, details available) to interpret a !NaN-year argument less
than 100 to be year + 1900. So you will have to call
dateInstance.setFullYear(1) to store a date-time in 1 CE.

Yes, bloody annoying Netscape legacy behavior.

/L
[1] http://jsperf.com/parsing-dates
 

Dr J R Stockton

Sat said:
I finally got around to writing the code I had an idea for.
The idea is to make the regexp as fast as possible by making all
alternatives mutually exclusive, so there won't need to be any
backtracking.
It also tries to generate non-stupid regexps (i.e., use '\d' instead
of '[0-9]' and '?' instead of '{0,1}'). That's not perfect yet :)

Let your imperfect string be RE.

Such optimisation should be easy to do by repeatedly using
RE = RE.replace(/RegExp/g, String)
to change each of those and similar examples. '[.]' -> '.' ?
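
A sketch of that post-processing, restricted to three substitutions that should be safe on the digit-only output of the generators in this thread; `tidy` is a made-up name, and the purely textual replacement assumes the input contains no escaped brackets or braces:

```javascript
// Textual clean-up of a generated range expression.
function tidy(re) {
  return re
    .replace(/\[0-9\]/g, "\\d")     // [0-9]  becomes \d
    .replace(/\{0,1\}/g, "?")       // {0,1}  becomes ?
    .replace(/\[(\d)\]/g, "$1");    // [7]    becomes 7
}

var tidied = tidy("(?:2[0-5]|1[0-9]|[0-9])");   // "(?:2[0-5]|1\d|\d)"
```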
 

Dr J R Stockton

If it was a simple string operation I would agree, but it is not. It
involves creating a RegExp instance with a number of nested alternations,
and matching that against a string value. I wonder how that could be faster
or better than using a much simpler RegExp for parsing, or than simply
passing on the string value verbatim to the Date constructor (which it must
accept and interpret properly according to ES5, section 15.9.3.2), and then
comparing the numeric time values (implicitly or by calling getTime()).

There is also a dependence on whether the actual job requires testing
one number to be within one range or testing a lot of numbers to be
within one range. In the latter case the overhead of creating a RegExp
for the range becomes unimportant.

As for parsing, there is a catch indeed that I have come across recently:
the Date constructor is specified (ES5, 15.9.3.1) and implemented (tested in
V8 of Chromium 11, details available) to interpret a !NaN-year argument less
than 100 to be year + 1900.

Many of us here have known that for a rather long time. It is by no
means a new behaviour, though ES5 might be the first standard to specify
it. However (a) NaN is not less than 100, so did not need a mention -
but, OTOH, it is not >= 100, but NaN+1900 would be treated just like
NaN; (b) negative year numbers, though less than 100, are handled
correctly.
So you will have to call
dateInstance.setFullYear(1) to store a date-time in 1 CE.

/Non sequitur/ : "have to". In numeric Gregorian date validation, it is
sufficient to add a multiple of 400 to the year argument of new Date(,,)
[if in a hurry, of new Date(Date.UTC(,,))]. For string validation with
new Date(), one can just prepend 1 to YYYY.

Moreover, if one knows from use of a RegExp that the year matches \d{4}
(and therefore is not in -400 .. -301), then instead of new Date(Y, M-1,
D) one can use new Date(Y+400, M-4801, D).
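
The Y+400, M-4801 form can be sketched like this, using the Date.UTC variant so the result does not depend on the local time zone. `goodDate400` is a hypothetical name, and the year is assumed to be in 0 .. 9999:

```javascript
// The constructor sees year y + 400, which is never in 0..99, and a
// month index that is 4800 months (400 years) too small, so the date
// actually built is still y-m-d in the proleptic Gregorian calendar.
function goodDate400(y, m, d) {             // m = 1..12
  return new Date(Date.UTC(y + 400, m - 4801, d)).getUTCMonth() === m - 1;
}
```

For example `goodDate400(1, 2, 29)` is false and `goodDate400(4, 2, 29)` is true, with no year being misread as 19xx.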

You could have discovered such things by paying sufficient attention to
earlier versions of the newsgroup FAQ - for example, "8.1 - 2005-11-05"
but not "9.91 - 2008-01-19".

And, of course, our stringy date validations can be extended from Year
9999 indefinitely upwards by prepending \d* to the year parts.
 

Thomas 'PointedEars' Lahn

Dr said:
Thomas 'PointedEars' Lahn posted:

There is also a dependence on whether the actual job requires testing
one number to be within one range or testing a lot of numbers to be
within one range. In the latter case the overhead of creating a RegExp
for the range becomes unimportant.

How often either method is used makes no difference; optimization
considerations apply.
Many of us here have known that for a rather long time.

Not all of us specialize in date and numeric processing only as you do.
It is by no means a new behaviour,

I did not say it was.
though ES5 might be the first standard to specify it.

Further research shows that it is not (ES had it already), however that it
is specified is the reason why I mentioned it.
However (a) NaN is not less than 100, so did not need a mention - but,
OTOH, it is not >= 100, but NaN+1900 would be treated just like NaN; (b)
negative year numbers, though less than 100, are handled correctly.
So you will have to call dateInstance.setFullYear(1) to store a date-time
in 1 CE.

/Non sequitur/ : "have to".
Wrong.

In numeric Gregorian date validation, it is sufficient to add a multiple
of 400 to the year argument of new Date(,,)
[…]

Ignoratio elenchi. I was not talking about date validation, but about
storing a date-time in a year before 100 CE.
And, of course, our stringy date validations can be extended from Year
9999 indefinitely upwards by prepending \d* to the year parts.

Great, so you can still validate when the computer systems that support it
and the calendar that is used by it fell out of common use millenia ago :->


PointedEars
 

Dr J R Stockton

Thu said:
I have something like:
/^(?:\d{4}-(?:(?:0[469]|11)-(?:0[1-9]|[12]\d|30)|(?:0[13578]|1[02])-(?:0[1-9]|[12]\d|3[01])|02-(?:0[1-9]|1\d|2[0-8]))|(?:\d\d(?:0[48]|[2468][048]|[13579][26])|(?:[02468][048]|[13579][26])00)-02-29)$/

(should accept exactly valid dates in the range 0000-01-01 .. 9999-12-31)

It does. And it still does after removing all ?: pairs.

The code which I found tested February 29th first, which must slow it
down.

It could be smaller, but it's attempted written for speed over compactness.
I'd love to see your regexp :)


Its first complete form follows. This is the most readable version;
I've not yet factored out the multiple \d{4}. Ignore whitespace. The
lines respectively handle
                                          Count per 400 years
  Days 01 to 30, except February                132000
  February, days 01 to 28                        11200
  The 31st of any month that has one              2800
  February 29th, years xx01 to xx99                 96
  February 29th, years xx00                          1
  Total                                         146097 = 400*365.2425

^
( \d{4} - (0[13-9]|1[012]) - (0[1-9]|[12]\d|30) ) |
( \d{4} - 02 - (0[1-9]|1\d|2[0-8]) ) |
( \d{4} - (01|03|05|07|08|10|12) - 31 ) |
( \d\d (0[48]|[2468][048]|[13579][26]) - 02 - 29 ) |
( ([02468][048]|[13579][26]) 00 - 02 - 29 )
$

For speed, one wants to minimise the sum of the counts, each weighted by
the complexity of the code preceding that count. Since the lines are of
similar complexity, the lines are in the right order.

One might instead start by doing all days 01-28, then days 29 & 30 of
eleven months - 133400 + 8800 = 143200 = 132000 + 11200 - so that might
be a little faster. OTOH, those two RegExps would apparently be longer.
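
The five-line form and its per-line counts can be checked with a short script. This is a sketch under one stated assumption: after stripping the whitespace, the five alternatives are wrapped in a single group, so that ^ and $ apply to every line rather than only to the first and last. The helper `pad` and the other names are made up here.

```javascript
var lines = [
  "( \\d{4} - (0[13-9]|1[012]) - (0[1-9]|[12]\\d|30) )",
  "( \\d{4} - 02 - (0[1-9]|1\\d|2[0-8]) )",
  "( \\d{4} - (01|03|05|07|08|10|12) - 31 )",
  "( \\d\\d (0[48]|[2468][048]|[13579][26]) - 02 - 29 )",
  "( ([02468][048]|[13579][26]) 00 - 02 - 29 )"
];

// Join the alternatives, drop all whitespace, anchor the whole group.
var re = new RegExp("^(?:" + lines.join("|").replace(/\s+/g, "") + ")$");

// Left-pad a non-negative integer with zeros to the given width.
function pad(n, width) {
  var s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Count the accepted strings over one 400-year cycle, trying months
// 00-13 and days 00-32 as in the test described earlier in the thread.
var count = 0;
for (var y = 0; y < 400; y++) {
  for (var m = 0; m <= 13; m++) {
    for (var d = 0; d <= 32; d++) {
      if (re.test(pad(y, 4) + "-" + pad(m, 2) + "-" + pad(d, 2))) count++;
    }
  }
}
// count should equal 146097, the number of days in 400 Gregorian years
```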
 

Thomas 'PointedEars' Lahn

Thomas said:
Great, so you can still validate when the computer systems that support it
and the calendar that is used by it fell out of common use millenia ago
:->

Sorry, that's _millennia_. I did not mean a thousand buttocks ;-)


PointedEars
 

Denis McMahon

I would like to have function which automatically generates ranges. For
example range 0 - 25 could be written as:

(?:2[0-5]|1[0-9]|[0-9])

It might be easier to extract the digits, parseInt and check that the
numeric result is within your limits.

function getNumFromStr (s) {
  var m, n;
  m = s.match(/\d+/);
  if (m===null) return null;
  if (m.length < 1) return null;
  n = parseInt(m[0]);
  if (isNaN(n)) return null;
  return n;
}

You may need to expand the regex to handle sign characters, decimal
fractions and exponents, especially if you're parsing user input.

/[-+]?\d+([.,]\d*)?([eE][-+]?\d+)?/
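
Putting the extraction and the limit check together might look like the sketch below. `firstNumberInRange` is a hypothetical name; the explicit radix is an addition here, so that a leading zero is never read as octal by older engines:

```javascript
// True if the first run of digits in s is an integer in lo..hi.
function firstNumberInRange(s, lo, hi) {
  var m = /\d+/.exec(s);
  if (m === null) return false;       // no digits at all
  var n = parseInt(m[0], 10);         // explicit radix 10
  return lo <= n && n <= hi;
}
```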

Rgds

Denis McMahon
 

Dr J R Stockton

I would like to have function which automatically generates ranges. For
example range 0 - 25 could be written as:

(?:2[0-5]|1[0-9]|[0-9])

It might be easier to extract the digits, parseInt and check that the
numeric result is within your limits.

function getNumFromStr (s) {
  var m, n;
  m = s.match(/\d+/);
  if (m===null) return null;
  if (m.length < 1) return null;
  n = parseInt(m[0]);
  if (isNaN(n)) return null;
  return n;
}

I don't immediately see how your n can be NaN.
You may need to expand the regex to handle sign characters, decimal
fractions and exponents, especially if you're parsing user input.

/[-+]?\d+([.,]\d*)?([eE][-+]?\d+)?/


That matches "1." which the best authorities dislike.
It does not match ".1" which ditto, but ECMA and Firefox accept.
The comma (with parseFloat, no doubt) will, I think, make it accept
French numbers but return only their integer parts.

It's people like you that stop me kill-ruling all of @gmail.com. :-(
 

Scott Sauyet

Lasse said:
Asen said:
I would like to have function which automatically generates ranges.
For example range 0 - 25 could be written as:
(?:2[0-5]|1[0-9]|[0-9])
...
toRange(255); //(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[1-9]|0)
toRange(26); //(?:2[0-6]|1[0-9]|[1-9]|0)
Any suggestions or improvements are welcome.

I finally got around to writing the code I had an idea for.
The idea is to make the regexp as fast as possible by making all
alternatives mutually exclusive, so there won't need to be any
backtracking.
It also tries to generate non-stupid regexps (i.e., use '\d' instead
of '[0-9]' and '?' instead of '{0,1}'). That's not perfect yet :)
[ code elided ]

And I finally got around to writing the one I had in mind. It's
different enough from the previous two solutions to be worth
mentioning.

The idea is to work recursively with ranges that have both a start and
an end, with a tree structure for, say 267 - 4301, of

267-4301
+- 267-999
|  +- 267-299
|  |  +- 267-269: 26[7-9]
|  |  `- 270-299: 2[7-9]\d
|  `- 300-999: [3-9]\d\d
`- 1000-4301
   +- 1000-3999: [1-3]\d\d\d
   `- 4000-4301
      +- 4000-4299: 4[0-2]\d\d
      `- 4300-4301: 430[0-1]

to create a partial regex that looks like:
"(?:26[7-9]|2[7-9]\d|[3-9]\d\d|[1-3]\d\d\d|4[0-2]\d\d|430[0-1])"
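
Because the result is a partial regex, it needs anchoring before it can be used to test a whole input string. A sketch, with the example pattern written out by hand and the variable names chosen here for illustration:

```javascript
var partial = "(?:26[7-9]|2[7-9]\\d|[3-9]\\d\\d|[1-3]\\d\\d\\d|4[0-2]\\d\\d|430[0-1])";

var loose  = new RegExp(partial);               // matches anywhere in the input
var strict = new RegExp("^" + partial + "$");   // must match the whole input

var looseHit  = loose.test("99999");    // true: "999" inside it matches [3-9]\d\d
var strictHit = strict.test("99999");   // false: no alternative has five digits
```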

Mine has very minimal testing, and I would not be surprised to hear
that it's buggy. You can see an example of it at
<http://scott.sauyet.com/Javascript/Test/Range/2011-06-13c/>. The relevant
code is

var toRange = (function() {
  function range(vals) {
    var parts = vals.split("-"),
        start = parts[0], end = parts[1], match;
    if (start === end) {
      return start;
    } else if (start.length < end.length) {
      // Different lengths: split at the largest number of start's length.
      return range(start + "-" + start.replace(/\d/g, "9")) + "|" +
             range("1" + start.replace(/\d/g, "0") + "-" + end);
    } else if (match = /^(\d*)(\d)(0*)\-\1(\d)9*$/.exec(vals)) {
      // Form Px00-Py99: one digit class, then \d for each trailing
      // zero. The trailing \d's apply to both the \d and [x-y] cases.
      return match[1] + ((match[2] === "0" && match[4] === "9") ?
             "\\d" : "[" + match[2] + "-" + match[4] + "]") +
             match[3].replace(/0/g, "\\d");
    } else if (match = /^(\d*)(\d)\d*\-\1(\d)9(9*)$/.exec(vals)) {
      // End is all nines after the prefix: split off start..Px99.
      return range(start + "-" + match[1] + match[2] + "9" + match[4]) + "|" +
             range(match[1] + (1 + Number(match[2])) + "0" +
                   match[4].replace(/9/g, "0") + "-" + end);
    } else if (match = /^(\d*)(\d+)\-\1(\d)(\d*)$/.exec(vals)) {
      // General case: split just below the end's leading differing digit.
      return range(start + "-" + match[1] + (Number(match[3]) - 1) +
                   match[4].replace(/\d/g, "9")) + "|" +
             range(match[1] + match[3] + match[4].replace(/\d/g, "0") +
                   "-" + end);
    } else if (match = /^(\d*)(\d)\-\1(\d)$/.exec(vals)) {
      return match[1] + "[" + match[2] + "-" + match[3] + "]";
    }
    return ""; // throw error?
  }

  return function(start, end) {
    if (arguments.length < 1) {return "";} // throw error?
    if (arguments.length == 1) {
      return "(?:" + range("0-" + start) + ")";
    }
    if (start > end) {return "";} // throw error?
    return "(?:" + range("" + start + "-" + end) + ")";
  };
}());

(This is a little ugly here, as I try to keep the line length down.
The online version is somewhat clearer.)

Here again, the arguments must be non-negative integers, and if two
are supplied, the second must be no smaller than the first.

-- Scott
 
