(beginner question) downsample data points using chained maps

Oscar Stiffelman · Dec 3, 2004

I have a vector of numeric points specified as 2 cols in a text file.
I want to break the points into windows and compute averages for each
window.

And (this is important), I am looking for a compact perlish way to do
it, ideally using the more "functional programming" parts of the
language, as opposed to the direct, nested for loops.

I am new to Perl, so I would appreciate feedback on the approach I
have taken below.

#!/usr/bin/perl
$w = shift; # this is the block size
die "must specify window size" if !$w;
@x = sort {@$a[0] <=> @$b[0]} map {[split]} <>;
print "@$_\n" for map {
    ($s0,$s1)=(0,0);
    for(@$_) {
        $s0 += @$_[0];
        $s1 += @$_[1]
    };
    [$s0/$w, $s1/$w]
} map {
    [@x[$_*$w .. ($_+1)*$w-1]]
} (0..$#x/$w);

# In pseudocode, this is what it does:
for(i = 0; i < n; i+= width) {
    (x,y) = (0,0);
    for(int j = 0; j < width; j++) {
        x += vec[i+j][0];
        y += vec[i+j][1];
    }
    push results, [x/width, y/width];
}

Can anyone suggest an even more compact or idiomatically-correct way
to do this?

P.S.,
I am evaluating using Perl to do some of the data manipulation and
analysis that I currently perform in Mathematica. In Mathematica,
lisp-ish mapping of anonymous functions is really common (and
convenient), so I am trying to decide if similar transformations can
be expressed compactly in perl.

Anno Siegel · Dec 3, 2004

Oscar Stiffelman said:
> I have a vector of numeric points specified as 2 cols in a text file.
> I want to break the points into windows and compute averages for each
> window.
>
> And (this is important), I am looking for a compact perlish way to do

Why is this *important*? A prototype that does what you want, even
in a less than elegant way, would be worth more at this point.

> it, ideally using the more "functional programming" parts of the
> language, as opposed to the direct, nested for loops.
>
> I am new to Perl, so I would appreciate feedback on the approach I
> have taken below.

You're going about this the wrong way. Start out with something
pedestrian and simple. When it *works*, you can transform it into
tight code, if you want.

> #!/usr/bin/perl

No warnings, no strict. Add them, and the variable declarations that
are then necessary.

> $w = shift; # this is the block size
> die "must specify window size" if !$w;
> @x = sort {@$a[0] <=> @$b[0]} map {[split]} <>;

What is this sort for? Your pseudocode has no such step. Also, @$a[0]
is a one-element array slice; it does evaluate to the first field here,
but $a->[0] is the clear way to write an element access.
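
A small sketch of how the comparator in that sort behaves, using made-up sample rows (the data and variable names are assumptions, not from the thread). In the scalar context that <=> imposes, the one-element slice @$a[0] yields that element, so both spellings order rows by the first field; the number of fields per row is what scalar(@$row) would give.

```perl
use strict;
use warnings;

# Hypothetical sample rows: [x, y] pairs, deliberately out of order.
my @points = ( [ 3, 30 ], [ 1, 10 ], [ 2, 20 ] );

# Element access through the reference: orders rows by the first field.
my @by_element = sort { $a->[0] <=> $b->[0] } @points;

# One-element slice, as in the original script. In the scalar context that
# <=> imposes, a slice yields its last (here: only) element.
my @by_slice = do {
    no warnings 'syntax';    # the slice spelling may draw a style warning
    sort { @$a[0] <=> @$b[0] } @points;
};

# The number of fields per row is what scalar(@$row) gives: 2 for every row.
my @field_counts = map { scalar @$_ } @points;

print "by element: ", join( ', ', map { "@$_" } @by_element ), "\n";
print "by slice:   ", join( ', ', map { "@$_" } @by_slice ),   "\n";
print "fields:     @field_counts\n";
```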

> print "@$_\n" for map {
>     ($s0,$s1)=(0,0);
>     for(@$_) {
>         $s0 += @$_[0];
>         $s1 += @$_[1]
>     };
>     [$s0/$w, $s1/$w]

If $w doesn't divide the number of lines, you may end up with a
final window of fewer than $w points. It would be better to
divide by the actual number of points instead of $w.

> } map {
>     [@x[$_*$w .. ($_+1)*$w-1]]

This is fragile if $w doesn't divide the number of records. splice()
would be a better tool.
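
To illustrate the difference, here is a minimal sketch with made-up data (five points and a window of 2, both assumptions chosen so that the last window is short). An index slice that runs past the end of the array pads the window with undef, whereas splice just returns fewer elements, so the chunk's own size is the right divisor.

```perl
use strict;
use warnings;

# Hypothetical data: 5 points, so a window of 2 leaves a short final chunk.
my @points = ( [ 1, 10 ], [ 2, 20 ], [ 3, 30 ], [ 4, 40 ], [ 5, 50 ] );
my $w = 2;

# An index slice past the end of the array pads the last window with undef.
my @padded = @points[ 4 .. 5 ];
print "slice: ", scalar(@padded), " entries, last is ",
    ( defined $padded[-1] ? "defined" : "undef" ), "\n";

# splice removes at most $w elements and simply returns fewer at the end.
my @work = @points;    # copy, because splice empties the array it works on
my @sizes;
while (@work) {
    my @chunk = splice @work, 0, $w;
    push @sizes, scalar @chunk;
}
print "chunk sizes: @sizes\n";
```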

> } (0..$#x/$w);

When I run this it prints out a series of pairs of zeroes for me
(the right number of pairs, but zero). I'm not going to debug it.

> # In pseudocode, this is what it does:
> for(i = 0; i < n; i+= width) {
>     (x,y) = (0,0);
>     for(int j = 0; j < width; j++) {
>         x += vec[i+j][0];
>         y += vec[i+j][1];
>     }
>     push results, [x/width, y/width];
> }

It would have been better to make this a working Perl solution first.

> Can anyone suggest an even more compact or idiomatically-correct way
> to do this?

Here is how I would do it:

my @data = <DATA>;
my @res;
while ( @data ) {
    my @chunk = splice @data, 0, $w;
    my ( $x_mean, $y_mean);
    for ( @chunk ) {
        my ( $x, $y) = split;
        $x_mean += $x;
        $y_mean += $y;
    }
    push @res, [ $x_mean/@chunk, $y_mean/@chunk];
}
print "@$_\n" for @res;

Now, if you want, you can rework some of the loops as map()s:

my @data = <DATA>;
print "@$_\n" for map {
    my ( $x, $y);
    for ( @$_ ) {
        $x += $_->[ 0];
        $y += $_->[ 1];
    }
    [ $x/@$_, $y/@$_];
} map [ map [ split], splice( @data, 0, $w)], 0 .. $#data/$w;
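
The same windowed average can also be packaged as a reusable function. What follows is only a sketch under assumptions: window_means is a hypothetical helper name, the points are taken as already-parsed [x, y] array refs, and sum comes from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# window_means is a hypothetical helper, not code from the thread.
# It takes a window size and a list of [x, y] array refs and returns
# one [mean_x, mean_y] array ref per window.
sub window_means {
    my ( $w, @points ) = @_;
    my @means;
    while ( my @chunk = splice @points, 0, $w ) {
        my $n = @chunk;    # actual size, so a short last window averages correctly
        push @means, [
            sum( map { $_->[0] } @chunk ) / $n,
            sum( map { $_->[1] } @chunk ) / $n,
        ];
    }
    return @means;
}

# Made-up input: (1,10) .. (5,50), averaged in windows of 2.
my @points = map { [ $_, 10 * $_ ] } 1 .. 5;
print "@$_\n" for window_means( 2, @points );
```

Because the function works on its own copy of the list, the caller's array is left intact, unlike the splice on @data above.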

Anno

Oscar Stiffelman · Dec 4, 2004

> Why is this *important*? A prototype that does what you want, even
> in a less than elegant way, would be worth more at this point.

Because this is just a toy problem I chose to understand the power of
the language for working with this kind of data (lists of
n-dimensional numbers). I am especially interested in Perl's support
for anonymous functions because I have found mapping of anonymous
functions to be a really convenient way to interact with this kind of
data in other languages.

Ultimately, what I am interested in is the type of transformations
that I can compactly express on the command line (using perl -e)
instead of from within a specific math environment.

I hate the context shifts that come from
1. C++ program produces data
2. start math software, load data, transform, plot, transform, plot, ...
3. modify C++ program and goto 1

I am hoping to move to something more like:
c++ program | perl -e 'some transformation' | plot
for at least some fraction of my work.
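
As a rough sketch of what the middle stage of such a pipeline might look like: the filter below averages "x y" lines in windows as they stream past, without holding the whole input in memory. The name downsample and the in-memory demo data are assumptions made for illustration.

```perl
use strict;
use warnings;

# downsample is a hypothetical helper: it reads "x y" lines from a
# filehandle and returns one "mean_x mean_y" string per window of $w lines.
sub downsample {
    my ( $w, $fh ) = @_;
    my @out;
    my ( $n, $sx, $sy ) = ( 0, 0, 0 );
    my $flush = sub {
        push @out, join ' ', $sx / $n, $sy / $n;
        ( $n, $sx, $sy ) = ( 0, 0, 0 );
    };
    while ( my $line = <$fh> ) {
        my ( $x, $y ) = split ' ', $line;
        next unless defined $y;    # skip blank or short lines
        $sx += $x;
        $sy += $y;
        $flush->() if ++$n == $w;  # window is full
    }
    $flush->() if $n;              # short final window
    return @out;
}

# Demo on an in-memory filehandle; in a pipeline this would be \*STDIN.
my $text = "1 10\n2 20\n3 30\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
print "$_\n" for downsample( 2, $fh );
```

Saved under a hypothetical name such as downsample.pl, with the demo replaced by a call like downsample( shift, \*STDIN ), it would sit in the pipeline as: c++ program | perl downsample.pl 10 | plot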

Thanks, this is great.

-- Oscar
