Regular expression 0 - number range.

Asen Bozhilov

I would like to have a function which automatically generates ranges.
For example, the range 0 - 25 could be written as:

(?:2[0-5]|1[0-9]|[0-9])

I don't want to do that by hand every time, so I wrote a function which
generates that sequence.

function toRange(num) {
    var n, size,
        digit, i,
        j, reg,
        arr = [];

    while (num > 0) {
        n = num;
        size = Math.floor(Math.log(num) / Math.LN10);
        j = size;
        i = 10;
        reg = '';

        while ((digit = n % 10) == 9) {
            if (i < num) {
                reg += '[0-9]';
            }
            else {
                reg = '[1-9]' + reg;
            }

            n = Math.floor(n / 10);
            i *= 10;
            j--;
        }

        if (j > 0) {
            reg = ('' + num).slice(0, j) + (digit ? '[0-' + digit + ']' : '0') + reg;
        }
        else if (digit) {
            reg = (digit > 1 ? '[1-' + digit + ']' : '1') + reg;
        }

        if (i >= num) {
            num = Math.pow(10, size) - 1;
        }
        else {
            num -= (num % i) + 1;
        }

        arr.push(reg);
    }

    arr.push(0);
    return '(?:' + arr.join('|') + ')';
}

toRange(255); //(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[1-9]|0)

toRange(26); //(?:2[0-6]|1[0-9]|[1-9]|0)

Any suggestions or improvements are welcome.
 
Gregor Kofler

On 2011-05-26 13:10, Asen Bozhilov wrote:
I would like to have function which automatically generates ranges.
For example range 0 - 25 could be written as:

(?:2[0-5]|1[0-9]|[0-9])

I don't want to do that every time and I wrote a function which
generate that sequence.
[snip]

toRange(255); //(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[1-9]|0)

toRange(26); //(?:2[0-6]|1[0-9]|[1-9]|0)

Any suggestions or improvements are welcome.

What's the point of this function (or rather this regexp)? If it is just
for checking whether some value is within a certain range, something like
if (foo >= 0 && foo <= limit) {..} would do the job just as well.

Gregor
 
Asen Bozhilov

Gregor said:
What's the point of this function (or rather this regexp)? If it is just
for checking whether some value is within a certain range something like
if(foo >= 0 && foo <= limit) {..} would do the job just as well.

It depends. While I prefer that approach, you cannot use it
every time. If you have to match, for example, "Gregor" followed by a
number in the range 0-20, and you want to get all matches in the text, how
would you implement it?

A RegExp approach would be:

var str = 'Gregor1 Gregor23 Gregor10 Gregor9 Gregor99';
str.match(new RegExp('Gregor' + toRange(20) + '\\b', 'g'));
// Gregor1, Gregor10, Gregor9

Regards.
 
Lasse Reichstein Nielsen

Asen Bozhilov said:
I would like to have function which automatically generates ranges.
For example range 0 - 25 could be written as:

(?:2[0-5]|1[0-9]|[0-9])

I don't want to do that every time and I wrote a function which
generate that sequence.

Is the range always starting from 0?
That does make it a lot easier :)

It doesn't seem to allow prefixed zeros, which is also good.

I think my first go would be a recursive version that recognizes
simple ranges and splits more complex ranges into simple ranges and
smaller complex ranges.
However, for 0...n ranges it's probably overkill.
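For the 0...n case, a non-recursive variant is also possible. The following is only a sketch, not the function from the original post: `rangeUpTo` is a hypothetical name, and it assumes `max` is a non-negative integer small enough that `String(max)` has no exponent. It works on the decimal string, so no logarithm is needed.

```javascript
// Hypothetical helper (not from the thread): pattern for the integers 0..max
// without leading zeros, built from the decimal string of max.
function rangeUpTo(max) {
    var s = String(max),
        len = s.length,
        alts = [s],          // max itself
        k, d, lo, cls, rest;

    // Same length as max, but smaller: keep the first k digits of max,
    // lower digit k, and let the remaining digits be anything.
    for (k = 0; k < len; k++) {
        d = +s.charAt(k);
        lo = (k === 0 && len > 1) ? 1 : 0;   // no leading zero
        if (d - 1 >= lo) {
            cls = (d - 1 === lo) ? String(lo) : '[' + lo + '-' + (d - 1) + ']';
            rest = len - k - 1;
            alts.push(s.slice(0, k) + cls +
                (rest > 1 ? '\\d{' + rest + '}' : rest === 1 ? '\\d' : ''));
        }
    }

    // Shorter numbers: 1 .. 10^(len-1) - 1, then 0 on its own.
    if (len > 1) {
        alts.push(len > 2 ? '[1-9]\\d{0,' + (len - 2) + '}' : '[1-9]');
        alts.push('0');
    }

    return '(?:' + alts.join('|') + ')';
}

rangeUpTo(255); // (?:255|1\d{2}|2[0-4]\d|25[0-4]|[1-9]\d{0,1}|0)
```

The alternatives are not ordered longest-first, so the pattern should be used with anchors or a trailing `\b`, as in the "Gregor" example.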

function toRange(num) {
var n, size,
digit, i,
j, reg,
arr = [];

while (num > 0) {
n = num;
size = Math.floor(Math.log(num) / Math.LN10);

You get rounding errors at, e.g., num = 1000.

Asen Bozhilov said:
Any suggestions or improvements are welcome.

If you get to the point where you generate a match for something of the
form 100 .. 99999, you can just do one expression: [1-9]\d{2,4}
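A quick check of that single expression, as a sketch. The `^` and `$` anchors are added here only so that whole strings are tested; they are not part of the suggestion itself.

```javascript
// Matches 100 .. 99999: one non-zero digit followed by two to four digits.
var re = /^[1-9]\d{2,4}$/;

re.test('100');     // true
re.test('99999');   // true
re.test('99');      // false (too short)
re.test('100000');  // false (too long)
re.test('0100');    // false (leading zero)
```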

/L
 
Thomas 'PointedEars' Lahn

Stefan said:
Asen said:
I would like to have function which automatically generates ranges.
For example range 0 - 25 could be written as:

(?:2[0-5]|1[0-9]|[0-9])

I don't want to do that every time and I wrote a function which
generate that sequence.

function toRange(num) { ...
}

Stefan said:
Thank you, that's going to be very useful. From a first look, I can't
think of any major improvements.

I can't think of any possible use-cases. If you want to compare ranges, use
arithmetic, not RegExps. It is far more efficient.

This might be a good learning exercise, nothing more.

Stefan said:
I did a quick Perl port of your function, because that's the language
where I'm most likely going to use it:
http://foo.at/paste/2011/toRange-perl.txt

Perl has integer arithmetic, too.

Stefan said:
Out of curiosity, were you validating IP addresses when you wrote that?

If so, the IP*v4* address should have been split at the dot, and its
individual parts be subject to integer tests (isn't this particularly easy
to do in Perl?). AISB, far more efficient than this.

BTW, IPv6 is to come soon, and then you can effectively dump that code.
I for one prefer not to write code for the recycle bin. How about you?


PointedEars
 
Michael Haufe (\TNO\)

It depends. While I prefer that approach you are not able to use it
every time. If you have to match for example "Gregor" followed by
number in range 0-20 and you want to get all matches in the text how
would you implement it?

RegExp approach would be:

var str = 'Gregor1 Gregor23 Gregor10 Gregor9 Gregor99';
str.match(new RegExp('Gregor' + toRange(20) + '\\b', 'g')); //
Gregor1, Gregor10, Gregor9

Regards.

var str = 'Gregor1 Gregor23 Gregor10 Gregor9 Gregor99',
matches = [];
str.replace(/Gregor(\d+)/g,function($0,$1){
if(+$1 >= 0 && +$1 <= 20)
matches.push($0);
return $0;
});
 
Thomas 'PointedEars' Lahn

Stefan said:
Thomas said:
Stefan said:
Asen Bozhilov wrote:
I would like to have function which automatically generates ranges.
For example range 0 - 25 could be written as:

(?:2[0-5]|1[0-9]|[0-9])

I don't want to do that every time and I wrote a function which
generate that sequence.
Stefan said:
Thank you, that's going to be very useful. From a first look, I can't
think of any major improvements.

Thomas said:
I can't think of any possible use-cases. If you want to compare ranges,
use arithmetic, not RegExps. It is far more efficient.

Stefan said:
There are many interfaces which allow you to specify a pattern, but not
a callback - for example, log analyzers

If the log analyzer cannot deal with this without a complicated RegExp, it
is not suited to the task.

Where you have grep, you usually also have awk, or even perl. No need for
building expressions.

Stefan said:
Date/time strings can also be validated with these ranges.

No, they cannot. Date validation (and determining whether a date is within
a range of other dates) requires far more effort than just matching strings.
(Haven't we been over this ad nauseam?)

Stefan said:
More generally, if you want to check a string which can contain an integer
number without leading zeros, these ranges allow you to do that with a
single pattern.

No doubt about that. But in total 20 years of software development I have
never had the need for such an expression, and I doubt I ever will.

Stefan said:
Your alternative would involve capturing groups and additional comparisons
of the numeric substring. I'm not convinced that's really "far more
efficient".

Because, problems with IEEE-754 precision aside, you are not considering the
computational and storage cost of making the exact expression. The
capturing RegExp (if you want to use one, you don't have to, there is
split() everywhere) can be quite simple. Let's say I want to find out
whether something is a syntactically valid IPv4 address, and I want to use
RegExp for that, I would not build an exact expression, but instead:

var addr = "192.168.0.1";
var rx = /^\s*(\d+)\.(\d+)\.(\d+)\.(\d+)\s*$/;
var matches = addr.match(rx);
matches.shift();
if (matches.filter(function(x) { return x >= 0 && x <= 255; }).length < 4)
{
console.log(addr + " is not a valid IPv4 address!");
}

It is even faster with split():

var parts = addr.split(".");
if (matches.filter(function(x) { return x >= 0 && x <= 255; }).length
!= 4)
{
/* syntactically invalid */
}

(And now imagine this in Perl!)

Isn't this just *slightly* better by several measures than building a
Regular Expression? There is good use for RegExp; validating numeric values
is not it.


HTH

PointedEars
 
Thomas 'PointedEars' Lahn

Thomas said:
var addr = "192.168.0.1";
var rx = /^\s*(\d+)\.(\d+)\.(\d+)\.(\d+)\s*$/;
var matches = addr.match(rx);
matches.shift();
if (matches.filter(function(x) { return x >= 0 && x <= 255; }).length <
4)
{
console.log(addr + " is not a valid IPv4 address!");
}

It is even faster with split():

var parts = addr.split(".");
if (matches.filter(function(x) { return x >= 0 && x <= 255; }).length
!= 4)
{
/* syntactically invalid */
}

Should be

var parts = addr.split(".");
if (parts.filter(function(x) {
x = parseInt(x, 10);
return x >= 0 && x <= 255;
}).length != 4)
{
console.log(addr + " is not a valid IPv4 address!");
}

Explicit conversion to Number is required for the comparison to make sense
in both examples.


PointedEars
 
RobG

It depends. While I prefer that approach you are not able to use it
every time. If you have to match for example "Gregor" followed by
number in range 0-20 and you want to get all matches in the text how
would you implement it?
RegExp approach would be:
var str = 'Gregor1 Gregor23 Gregor10 Gregor9 Gregor99';
str.match(new RegExp('Gregor' + toRange(20) + '\\b', 'g')); //
Gregor1, Gregor10, Gregor9

var str = 'Gregor1 Gregor23 Gregor10 Gregor9 Gregor99',
    matches = [];
str.replace(/Gregor(\d+)/g,function($0,$1){
        if(+$1 >= 0 && +$1 <= 20)
                matches.push($0);
        return $0;
});

Unfortunately the OP wanted a function to generate the range-testing
RegExp expression, not a function to do the test. Otherwise, neat
solution.
 
Ry Nohryb

Should be

  var parts = addr.split(".");
  if (parts.filter(function(x) {
        x = parseInt(x, 10);
        return x >= 0 && x <= 255;
      }).length != 4)
  {
    console.log(addr + " is not a valid IPv4 address!");
  }

Explicit conversion to Number is required for the comparison to make sense
in both examples.

You forgot to test parts.length before the .filter(), so your test
would pass "999.192.168.0.999.1.-127" as a valid ip.

var parts = addr.split(".");

function isaByte (x) {
x = parseInt(x, 10);
return x === x & 0xff; // nicer 'byte' test, imo.
}

if (parts.length !== 4 || parts.filter(isaByte).length !== 4) {
console.log(addr + " is not a valid IPv4 address!");
}
 
Ry Nohryb

You forgot to test parts.length before the .filter(), so your test
would pass "999.192.168.0.999.1.-127" as a valid ip.

var parts = addr.split(".");

function isaByte (x) {
  x = parseInt(x, 10);
  return x === x & 0xff; // nicer 'byte' test, imo.

}

if (parts.length !== 4 || parts.filter(isaByte).length !== 4) {
  console.log(addr + " is not a valid IPv4 address!");

}

Better with parens :))

return x === (x & 0xff);
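
Even with the parentheses, parseInt() still accepts strings such as "1abc" or " 12". A stricter variant might look like the sketch below. The names `isByteString` and `isIPv4` are hypothetical, and rejecting leading zeros and whitespace is an assumption about what "valid" should mean here, not something settled in this thread.

```javascript
// Hypothetical helper: true only for "0".."255" written as plain decimal
// digits, with no sign, no whitespace and no leading zeros (assumption).
function isByteString(s) {
    if (!/^(?:0|[1-9]\d{0,2})$/.test(s)) {
        return false;
    }
    var n = parseInt(s, 10);
    return n === (n & 0xff);
}

// Hypothetical helper: exactly four dot-separated byte strings.
function isIPv4(addr) {
    var parts = addr.split('.');
    return parts.length === 4 && parts.filter(isByteString).length === 4;
}

isIPv4('192.168.0.1');               // true
isIPv4('999.192.168.0.999.1.-127');  // false
isIPv4('1.2.3.4abc');                // false
```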
 
Dr J R Stockton

In comp.lang.javascript message <96200c24-e586-4e63-ab0f-fe6afcf50213@m10g2000yqd.googlegroups.com>,
Thu, 26 May 2011 04:10:21, Asen Bozhilov posted:
size = Math.floor(Math.log(num) / Math.LN10);

Math.log, Math.LN10, and non-integer division are inexact, and should
not be trusted in conjunction with Math.floor when the argument of
Math.floor should be integer (or possibly very near integer).

The following is (Firefox 3.6.17, Chrome 11.0, Win XP, P4/3G)
self-descriptive:

function Thousand() {
    var X = 1, K = 0;
    do {
        if (K != Math.floor(Math.log(X) / Math.LN10)) break;
        K++;
    } while (isFinite(X *= 10));
    return X;
}

and

function Errs() {
    var X = 1, K = 0, A = [];
    do {
        if (K != Math.floor(Math.log(X) / Math.LN10)) A.push(K);
        K++;
    } while (isFinite(X *= 10));
    return A.length;
}

returns 167.

I could be off-by-one; but Errs certainly should not push for only some
values.

That portion of the task can be done, less elegantly but correctly,
using integer arithmetic.
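
One way to do that, as a minimal sketch: `decimalExponent` is a hypothetical name, and the function assumes `num` is a positive integer below 2^53. It counts how many times the number can be divided by 10, so no logarithm is involved.

```javascript
// Hypothetical replacement for Math.floor(Math.log(num) / Math.LN10),
// assuming num is a positive integer below 2^53.
function decimalExponent(num) {
    var size = 0;
    while (num >= 10) {
        num = Math.floor(num / 10);
        size++;
    }
    return size;
}

decimalExponent(999);   // 2
decimalExponent(1000);  // 3
```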
 
Dr J R Stockton

On Fri, 27 May 2011 00:56:40, in comp.lang.javascript,
Thomas 'PointedEars' Lahn posted:
No, they cannot. Date validation (and determining whether a date is within
a range of other dates) requires far more effort than just matching strings.
(Haven't we been over this ad nauseam?)

Manifestly incorrect. To validate an ISO 8601 Gregorian Date String,
first decapitate the year to four digits (the calendar repeats every 400
years, hence every 10000 years). That leaves just 3652425 valid
possibilities, each of ten characters; so a RegExp of 11*3652425
characters will certainly do it. End of existence proof.

GMT date/time strings require little extra work. UTC ones take more
work. Local ones are considerably worse, especially for Israel.

Numerous optimisations are evident. Another is to code the year, in the
string, in Base 20 and the month and day in Base 32. Another, of
course, is to validate the Ordinal Date YYYY-DDD.

And I'm reasonably sure that I've seen a full RegExp Gregorian date
checker using only several lines of RegExp, probably in this group.

Of course, it's inelegant by RegExp; I don't recall whether its speed
was compared with an optimal one using new Date() (meaning new
Date(Date.UTC())) or an optimal non-Object method.
 
Thomas 'PointedEars' Lahn

Dr said:
Thomas 'PointedEars' Lahn posted:

Manifestly incorrect. To validate an ISO 8601 Gregorian Date String,
first decapitate the year to four digits (the calendar repeats every 400
years, hence every 10000 years). That leaves just 3652425 valid
possibilities, each of ten characters; so a RegExp of 11*3652425
characters will certainly do it. End of existence proof.

While trying to be funny, you miss the point. The problem to solve was
_not_ to validate a date string, but to determine whether a date is between
two other dates (corresponding to the original problem of determining
whether an number is within a numeric range, or an IPv4 address is within an
IPv4 address range).


PointedEars
 
Dr J R Stockton

On Sat, 28 May 2011 19:01:42, in comp.lang.javascript,
Dr J R Stockton posted:
And I'm reasonably sure that I've seen a full RegExp Gregorian date
checker using only several lines of RegExp, probably in this group.


FYI 1 : I have now found on the Web a PHP RegExp about 550 characters
long which is claimed to fully validate a date/time.


FYI 2 : I have now found in this group a JavaScript RegExp about 220
characters long which I have shown validates all "dates" from 18991232
to 20009999 inclusive, when presented as DD-MM-YYYY. That takes under
25 seconds on Opera 11.11 on WinXP on P4/3G, including using my Date
Object code to validate YYYY-MM-DD. There is not time tonight to make
it presentable.
 
Dr J R Stockton

On Sat, 28 May 2011 19:01:42, in comp.lang.javascript,
Dr J R Stockton posted:

FYI 2 : I have now found in this group a JavaScript RegExp about 220
characters long which I have shown validates all "dates" from 18991232
to 20009999 inclusive, when presented as DD-MM-YYYY. That takes under
25 seconds on Opera 11.11 on WinXP on P4/3G, including using my Date
Object code to validate YYYY-MM-DD. There is not time tonight to make
it presentable.

Now moderately presentable at temporary page
<http://www.merlyn.demon.co.uk/$tmp.htm>.

 
Dr J R Stockton

On Sun, 29 May 2011 02:24:21, in comp.lang.javascript,
Thomas 'PointedEars' Lahn posted:
While trying to be funny, you miss the point. The problem to solve was
_not_ to validate a date string, but to determine whether a date is between
two other dates (corresponding to the original problem of determining
whether an number is within a numeric range, or an IPv4 address is within an
IPv4 address range).

I am commenting on what you wrote and I quoted.

Note also that, for testing a user-supplied YYYY-MM-DD string for value
being between two given YYYY-MM-DD strings, it is not sufficient to do
only a straight string comparison. The string 2062-02-29 undoubtedly
lies between the strings 2040-11-21 and 2069-12-31, but, considered as a
date, it does not.
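
A sketch of that two-step check: first confirm that the string is a real Gregorian date, then compare the strings. The names `isValidIsoDate` and `isDateBetween` are hypothetical, and the bounds are assumed to be valid YYYY-MM-DD strings with four-digit years already.

```javascript
// Hypothetical helper: true if s is YYYY-MM-DD and names a real calendar day.
// The fields are round-tripped through a Date object in UTC; if the Date
// normalised anything (e.g. Feb 29 in a non-leap year), the fields differ.
function isValidIsoDate(s) {
    var m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(s);
    if (!m) {
        return false;
    }
    var y = +m[1], mo = +m[2], d = +m[3];
    var t = new Date(0);
    t.setUTCFullYear(y, mo - 1, d);
    return t.getUTCFullYear() === y &&
           t.getUTCMonth() === mo - 1 &&
           t.getUTCDate() === d;
}

// Hypothetical helper: lo and hi are assumed to be valid YYYY-MM-DD strings,
// so plain string comparison orders them correctly.
function isDateBetween(s, lo, hi) {
    return isValidIsoDate(s) && lo <= s && s <= hi;
}

isDateBetween('2062-02-29', '2040-11-21', '2069-12-31'); // false
isDateBetween('2064-02-29', '2040-11-21', '2069-12-31'); // true
```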

The Samoans should have some fun with JavaScript dates, since ECMA 3/5
do not allow for the omission of a day from the local calendar, and they
will have no Friday this year after Christmas Eve (probably).

The RegExp method that I was testing for validating dates in 1900-2000
is much faster than I had expected; I needed to speed up the validator I
was testing it against.
 
Jukka K. Korpela

On 30.5.2011 22:11, Dr J R Stockton said:
Note also that, for testing a user-supplied YYYY-MM-DD string for value
being between two given YYYY-MM-DD strings, it is not sufficient to do
only a straight string comparison.

Why not?

Dr J R Stockton said:
The string 2062-02-29 undoubtedly
lies between the strings 2040-11-21 and 2069-12-31, but, considered as a
date, it does not.

Under which ordering relations of dates?

Maybe you are trying to say that if you wish to compare dates ignoring
the year, then you need to ignore the year. That would be true, somewhat
trivially. But that's not what comparison of dates normally means.
 
Lasse Reichstein Nielsen

Jukka K. Korpela said:
Why not?


Under which ordering relations of dates?

The problem is that 2062-02-29 is not a date at all, since 2062 isn't
a leap year.

Being in the numeric/alphabetic range isn't sufficient validation.
While I'm sure you can make a RegExp that matches only valid dates too,
it's simply not the most effective way of doing that.

/L
 
