Date::Parse mysteriously lowercases text when an unrelated variable is used.

Mumia W. (reading news)

To reduce my boredom I took grocery_stocker's data from "At a loss how
to sort this file," and I wrote a program to sort it:

1 #!/usr/bin/perl
2
3 use strict;
4 use warnings;
5 use Date::Parse;
6 my $reverser = -1;
7
8 my $sortfn = sub {
9     my ($c, $d) = map str2time(substr($_, 43, 16)), ($a, $b);
10     $reverser * ($c <=> $d);
11 };
12
13 @ARGV = 'sort-long-dates.txt';
14 my @data = map uc $_, grep /./, <>;
15 my @sorted = sort $sortfn @data;
16 print join("",@sorted), "\n";

The data is sorted as expected, but something strange happens:
Date::Parse::str2time lowercases the weekday names. I deliberately
capitalize everything on the line to show that only the dates are affected.

I sort of know why this is happening. The value returned by substr() is
tied to the portion of the string it came from, and str2time() probably
lowercases its input string before doing any other processing.
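The aliasing described above can be shown in isolation. This is only a minimal sketch of the general mechanism, not of the bug itself: lower_in_place() is a hypothetical stand-in that writes to its argument, and it is not taken from Date::Parse's source.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a function that lowercases its argument in place.
# Elements of @_ are aliases to the caller's arguments.
sub lower_in_place {
    $_[0] = lc $_[0];
    return;
}

my $line = "LAR TTYP2 FRI NOV 17 17:12";

# substr() passed as a sub argument yields an lvalue tied to part of $line,
# so the assignment inside lower_in_place() writes back into $line.
lower_in_place(substr($line, 10, 3));
print "$line\n";

# Copying the substring first breaks the link; $line is not touched again.
my $copy = substr($line, 14, 3);
lower_in_place($copy);
print "$line\n";
```

Both print statements should show the same line with only "fri" lowercased, since the second call operates on a copy.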

My question is, "why are the dates lowercased only when line 10 uses
the $reverser variable?"

I wanted to be able to change my program's sort order (ascending,
descending) by only changing one variable, so I created $reverser;
however, when I use the $reverser, the dates are converted to lowercase,
but when I do not use the $reverser, and line 10 reads "$d <=> $c", the
dates are not modified.

I can't see the relationship between the str2time() on line 9 and the
use of $reverser on line 10.

BTW, the dates are not modified if the $reverser variable is defined
within the $sortfn subroutine, e.g.

7 my $sortfn = sub {
8 my $reverser = -1;
9 my ($c, $d) = map str2time(substr($_, 43, 16)), ($a, $b);
10 $reverser * ($c <=> $d);
11 };

What's going on?

-------------------------------
Perl 5.8.4
Debian GNU/Linux 3.1
Date::Parse 2.27



lar ttyp2 216.106.179.129 Fri Nov 17 17:12 - 17:14 (00:01)
lar ttypa 216.106.179.129 Fri Nov 17 15:53 - 15:55 (00:01)
lar ttypp 216.106.179.129 Thu Nov 16 17:11 - 17:21 (00:09)
lar ttypk 216.106.179.129 Thu Nov 16 14:20 - 14:21 (00:01)
lar ttypn 216.106.179.129 Thu Nov 16 13:23 - 13:37 (00:13)
irongeek ttypi 216.106.179.129 Wed Nov 15 17:27 - 17:32 (00:04)
sabre ttyp5 216.106.179.129 Wed Nov 15 13:59 - 14:03 (00:04)
lar ttyp5 216.106.179.129 Wed Nov 15 13:57 - 13:59 (00:01)
sabre ttyp5 216.106.179.129 Wed Nov 15 13:28 - 13:57 (00:28)
sabre ttypc 216.106.179.129 Wed Nov 15 12:10 - 12:10 (00:00)

lar ttypd 71.57.146.22 Fri Nov 17 07:27 - 07:43 (00:16)
irongeek ttyp2 71.57.146.22 Thu Nov 16 07:49 - 07:56 (00:07)
sabre ttypg 71.57.146.22 Sat Nov 11 15:56 - 16:09 (00:12)
 
Peter Scott

Good report. I suspect you are running into the bug reported at
http://groups-beta.google.com/group...hread/eefb0a6227a31891/7b14af5525b9d861?hl=en
Maybe you could try 5.9.3 on your program and let us know. If it's not
fixed you can dust off perlbug.

str2time does sort() and lc() internally. Apparently the bug has
something to do with sort subs that are closures (moving the definition of
$reverser out of the sub does that).
 
anno4000

Peter Scott said:
Good report. I suspect you are running into the bug reported at
http://groups-beta.google.com/group...hread/eefb0a6227a31891/7b14af5525b9d861?hl=en
Maybe you could try 5.9.3 on your program and let us know. If it's not
fixed you can dust off perlbug.

It's still in 5.9.4.

Peter Scott said:
str2time does sort() and lc() internally. Apparently the bug has
something to do with sort subs that are closures (moving the definition of
$reverser out of the sub does that).

It's worse than that. The use of substr() to extract the date is
also necessary to bring the bug out. Reducing the input file to
contain only the dates and writing

my ($c, $d) = map str2time( $_), ($a, $b);

shows the original uppercase dates. Changing it to

my ($c, $d) = map str2time( substr( $_, 0)), ($a, $b);

brings the bug back. So it's not only a sort routine that is a closure,
it's also the lvalue property of substr() that is involved. I don't
envy whoever is going to tackle this. It's not going to be fixed
tomorrow.

Anno
 
Mumia W. (reading news)

Peter Scott said:
Good report. I suspect you are running into the bug reported at
http://groups-beta.google.com/group...hread/eefb0a6227a31891/7b14af5525b9d861?hl=en
Maybe you could try 5.9.3 on your program and let us know. If it's not
fixed you can dust off perlbug.

str2time does sort() and lc() internally. Apparently the bug has
something to do with sort subs that are closures (moving the definition of
$reverser out of the sub does that).

Indeed, that looks like the bug, and it looks like it's still in Perl
5.9.4:

--------PROGRAM FOLLOWS---------

#!/usr/local/bin/perl5.9.4

use strict;
use warnings;
use Date::Parse;
my $reverser = -1;

printf("Perl version: %vd\n", $^V);
print "Date::Parse version: $Date::Parse::VERSION\n";

# Use a sorting function that is NOT a closure.
my $sortfn1 = sub {
    my ($c, $d) = map str2time(substr($_, 27)), ($a, $b);
    -1 * ($c <=> $d);
};

# Use a sorting function that IS a closure (it captures $reverser).
my $sortfn2 = sub {
    my ($c, $d) = map str2time(substr($_, 27)), ($a, $b);
    $reverser * ($c <=> $d);
};

# Load the data while making everything upper case.
my @data = map uc $_, <DATA>;

# If the result of using $sortfn2 is different from
# the result of using $sortfn1, this perl interpreter
# is buggy.
my $usingfn1 = join '', sort $sortfn1 @data;
my $usingfn2 = join '', sort $sortfn2 @data;

if ($usingfn1 ne $usingfn2) {
    print "We have a buggy perl here.\n";
}


__DATA__
irongeek ttypi Wed Nov 15 17:27
lar ttyp2 Fri Nov 17 17:12
lar ttypa Fri Nov 17 15:53
lar ttypp Thu Nov 16 17:11
lar ttypk Thu Nov 16 14:20
sabre ttyp5 Wed Nov 15 13:59
lar ttyp5 Wed Nov 15 13:57
sabre ttyp5 Wed Nov 15 13:28
sabre ttypc Wed Nov 15 12:10
lar ttypn Thu Nov 16 13:23
lar ttypd Fri Nov 17 07:27
irongeek ttyp2 Thu Nov 16 07:49
sabre ttypg Sat Nov 11 15:56
-------------OUTPUT FOLLOWS--------------
Perl version: 5.9.4
Date::Parse version: 2.27
We have a buggy perl here.
-------------OUTPUT ENDS-----------------

That program demonstrates that when the sorting subroutine is not a
closure, the lowercasing effect of Date::Parse::str2time is NOT
permitted to modify the original data, and when the sorting subroutine
IS a closure, the lowercasing effect of Date::Parse::str2time IS
permitted to modify the original data.

This is probably a bug in the perl interpreter.

I'm using Debian GNU/Linux 3.1 i386.

---
Flags:
category=
severity=
---
Site configuration information for perl 5.9.4:

Configured by (user) at Mon Nov 20 03:07:07 CST 2006.

Summary of my perl5 (revision 5 version 9 subversion 4) configuration:
Platform:
osname=linux, osvers=2.4.27-3-386,
archname=i686-linux-thread-multi-64int
uname='linux dike 2.4.27-3-386 #1 thu sep 14 08:44:58 utc 2006 i686
gnulinux '
config_args='-Dusethreads -Dprefix=/usr/local/share/perl-5.9
-Duse64bitint -Dcc=gcc -Dusedevel -Duseperlio -Dman1ext=1perl
-Dman3ext=3perl -de'
hint=recommended, useposix=true, d_sigaction=define
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=define, use64bitall=undef, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64',
optimize='-O2',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS
-fno-strict-aliasing -pipe -I/usr/local/include'
ccversion='', gccversion='3.3.5 (Debian 1:3.3.5-13)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long long', ivsize=8, nvtype='double', nvsize=8,
Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='gcc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.3.2'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:


---
@INC for perl 5.9.4:
/usr/local/lib/CPAN/share/perl
/usr/local/lib/CPAN/lib/perl
/usr/local/lib/CPAN/lib/site_perl/5.9.4/i686-linux-thread-multi-64int
/usr/local/lib/CPAN/lib/site_perl/5.9.4
/usr/local/share/perl-5.9/lib/5.9.4/i686-linux-thread-multi-64int
/usr/local/share/perl-5.9/lib/5.9.4

/usr/local/share/perl-5.9/lib/site_perl/5.9.4/i686-linux-thread-multi-64int
/usr/local/share/perl-5.9/lib/site_perl/5.9.4
.

---
Environment for perl 5.9.4:
HOME=/home/(user)
LANG=en_US
LANGUAGE=en_US:en_GB:en
LC_CTYPE=en_US.ISO-8859-1
LD_LIBRARY_PATH (unset)
LOGDIR (unset)

PATH=/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/sbin:/usr/sbin:.:/home/(user)/bin:/usr/local/share/perl-5.9/bin


PERL5LIB=/usr/local/lib/CPAN/share/perl:/usr/local/lib/CPAN/lib/perl:/usr/local/lib/CPAN/lib/site_perl/5.9.4

PERL_BADLANG (unset)
SHELL=/bin/bash

There are a few workarounds for this bug:

1) Copy the value returned by substr() using "my":
my ($c, $d) = map str2time(my $vv = substr($_, 27)), ($a, $b);

2) Create copies of $a and $b before giving them to map:
my @copies = ($a, $b);
my ($c, $d) = map str2time(substr($_, 27)), @copies;
... or do it in one line ...
my ($c, $d) = map str2time(substr($_, 27)), @{[ $a, $b ]};
... or, for the truly exotic ...
my ($c, $d) = map str2time(substr($_, 27)), sub { @_ }->($a, $b);


3) Sidestep the whole issue of substr() by extracting the data using a
match (it's almost cheating :p ):
my ($c, $d) = map str2time((/^.{25}(.*)$/)[0]), ($a, $b);

4) Make $reverser a package variable (declared with our) and access it
through the symbol table:

our $reverser = -1;

my $sortfn = sub {
my ($c, $d) = map str2time(substr($_, 27)), ($a, $b);
${*reverser} * ($c <=> $d);
};

Note that the bug still occurs if $reverser is accessed directly--even
as a package variable.

5) Use the environment hash %ENV:

$ENV{REVERSER} = '-1';
my $sortfn = sub {
my ($c, $d) = map str2time(substr($_, 27)), ($a, $b);
$ENV{REVERSER} * ($c <=> $d);
};

Note that using normal hashes--declared with either "my" or "our"--allows
the bug to happen.

6) Use a named (non-anonymous) comparison subroutine.
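Workaround 6 is the only one listed without code, so here is a rough sketch of its shape. It assumes a hypothetical key_of() helper standing in for str2time(substr(...)) so that it runs without Date::Parse, and the names by_key_asc, by_key_desc and $descending are made up for illustration. It shows only how the sort direction can still be switched with one variable when named comparison subs are used; it does not by itself demonstrate that the bug is avoided.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical key extractor: takes the last whitespace-separated field.
# In the real program this would be str2time() on the date portion.
sub key_of {
    my ($line) = @_;
    return (split ' ', $line)[-1];
}

# Named (non-anonymous) comparison subs; neither one is a closure.
sub by_key_asc  { key_of($a) <=> key_of($b) }
sub by_key_desc { key_of($b) <=> key_of($a) }

# The single switch that controls the sort order.
my $descending = 1;

my @data = ("X 3", "Y 1", "Z 2");

my @sorted;
if ($descending) {
    @sorted = sort by_key_desc @data;
} else {
    @sorted = sort by_key_asc @data;
}

print "$_\n" for @sorted;
```

With $descending set to 1 the lines come out with keys 3, 2, 1; setting it to 0 reverses the order.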
 
