(beginner question) downsample data points using chained maps

  • Thread starter Oscar Stiffelman

Oscar Stiffelman

I have a vector of numeric points specified as 2 cols in a text file.
I want to break the points into windows and compute averages for each
window.

And (this is important), I am looking for a compact perlish way to do
it, ideally using the more "functional programming" parts of the
language, as opposed to the direct, nested for loops.

I am new to Perl, so I would appreciate feedback on the approach I
have taken below.

#!/usr/bin/perl
$w = shift; # this is the block size
die "must specify window size" if !$w;
@x = sort {@$a[0] <=> @$b[0]} map {[split]} <>;
print "@$_\n" for map {
    ($s0,$s1)=(0,0);
    for(@$_) {
        $s0 += @$_[0];
        $s1 += @$_[1]
    };
    [$s0/$w, $s1/$w]
} map {
    [@x[$_*$w .. ($_+1)*$w-1]]
} (0..$#x/$w);


# In pseudocode, this is what it does:
for(i = 0; i < n; i+= width) {
    (x,y) = (0,0);
    for(int j = 0; j < width; j++) {
        x += vec[i+j][0];
        y += vec[i+j][1];
    }
    push results, [x/width, y/width];
}


Can anyone suggest an even more compact or idiomatically-correct way
to do this?

P.S.
I am evaluating Perl for some of the data manipulation and
analysis that I currently perform in Mathematica. In Mathematica,
Lisp-style mapping of anonymous functions is really common (and
convenient), so I am trying to decide whether similar transformations
can be expressed compactly in Perl.
 

Anno Siegel

Oscar Stiffelman said:
> I have a vector of numeric points specified as 2 cols in a text file.
> I want to break the points into windows and compute averages for each
> window.
>
> And (this is important), I am looking for a compact perlish way to do

Why is this *important*? A prototype that does what you want, even
in a less than elegant way, would be worth more at this point.
> it, ideally using the more "functional programming" parts of the
> language, as opposed to the direct, nested for loops.
>
> I am new to Perl, so I would appreciate feedback on the approach I
> have taken below.

You're going about this the wrong way. Start out with something
pedestrian and simple. When it *works*, you can transform it into
tight code, if you want.
> #!/usr/bin/perl

No warnings, no strict. Add them, and the variable declarations that
are then necessary.
> $w = shift; # this is the block size
> die "must specify window size" if !$w;
> @x = sort {@$a[0] <=> @$b[0]} map {[split]} <>;

What is this sort for? Your description doesn't say the points need
sorting. Also, @$a[0] is a one-element array slice; if you mean the
first column, write $a->[0] <=> $b->[0].
> print "@$_\n" for map {
>     ($s0,$s1)=(0,0);
>     for(@$_) {
>         $s0 += @$_[0];
>         $s1 += @$_[1]
>     };
>     [$s0/$w, $s1/$w]

If $w doesn't divide the number of lines, you may end up with a
final window of fewer than $w points. It would be better to
divide by the actual number of points instead of $w.
> } map {
>     [@x[$_*$w .. ($_+1)*$w-1]]

This is fragile if $w doesn't divide the number of records. splice()
would be a better tool.
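To illustrate the difference: a slice that runs past the end of the array pads the result with undef, whereas splice() returns only the elements that exist. A small sketch (the array @pts and the window size are made up for the example):

```perl
use strict;
use warnings;

# Hypothetical sample data: five records, window size 2.
my @pts = ( 1 .. 5 );
my $w   = 2;

# Third window taken with a slice: indices 4 and 5, but index 5 does not exist,
# so the slice yields (5, undef).
my @by_slice = @pts[ 2 * $w .. 3 * $w - 1 ];

# Same window taken with splice on a copy: only existing elements come back.
my @copy      = @pts;
my @by_splice = splice @copy, 2 * $w, $w;

printf "slice:  %d elements, %d defined\n",
    scalar(@by_slice), scalar( grep { defined $_ } @by_slice );
printf "splice: %d elements\n", scalar(@by_splice);
```

Summing the slice would count the missing record as 0, which is why the short final window needs separate care.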
> } (0..$#x/$w);

When I run this it prints out a series of pairs of zeroes for me
(the right number of pairs, but zero). I'm not going to debug it.
> # In pseudocode, this is what it does:
> for(i = 0; i < n; i+= width) {
>     (x,y) = (0,0);
>     for(int j = 0; j < width; j++) {
>         x += vec[i+j][0];
>         y += vec[i+j][1];
>     }
>     push results, [x/width, y/width];
> }

It would have been better to make this a working Perl solution first.
> Can anyone suggest an even more compact or idiomatically-correct way
> to do this?

Here is how I would do it:

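my @data = <DATA>;
my @res;
while ( @data ) {
    # splice() removes up to $w lines, so the last chunk may be shorter
    my @chunk = splice @data, 0, $w;
    my ( $x_mean, $y_mean);
    for ( @chunk ) {
        my ( $x, $y) = split;
        $x_mean += $x;
        $y_mean += $y;
    }
    # @chunk in numeric context is its length: divide by the actual count
    push @res, [ $x_mean/@chunk, $y_mean/@chunk];
}
print "@$_\n" for @res;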

Now, if you want, you can rework some of the loops as map()s:

my @data = <DATA>;
print "@$_\n" for map {
    my ( $x, $y);
    for ( @$_ ) {
        $x += $_->[ 0];
        $y += $_->[ 1];
    }
    [ $x/@$_, $y/@$_];
# the index list 0 .. $#data/$w is built before splice() starts shortening @data
} map [ map [ split], splice( @data, 0, $w)], 0 .. $#data/$w;
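For anyone who wants to run this without a DATA section, here is a self-contained sketch of the same idea wrapped in a subroutine. The name window_means and the sample points are made up for illustration; sum comes from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# window_means is a hypothetical helper: it takes a window size and a list of
# [x, y] array refs, and returns one [mean_x, mean_y] pair per window.
sub window_means {
    my ( $w, @points ) = @_;
    my @means;
    # splice() hands back at most $w points; the loop ends when none are left.
    while ( my @chunk = splice @points, 0, $w ) {
        push @means, [
            sum( map { $_->[0] } @chunk ) / @chunk,
            sum( map { $_->[1] } @chunk ) / @chunk,
        ];
    }
    return @means;
}

# Made-up sample: five points, window size 2, so the last window has one point.
my @points = map { [ $_, 10 * $_ ] } 1 .. 5;
print "@$_\n" for window_means( 2, @points );
```

Because each mean is divided by the size of its own chunk, the short final window is averaged correctly rather than being divided by the nominal window size.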

Anno
 

Oscar Stiffelman

> Why is this *important*? A prototype that does what you want, even
> in a less than elegant way, would be worth more at this point.
Because this is just a toy problem I chose to understand the power of
the language for working with this kind of data (lists of
n-dimensional numbers). I am especially interested in Perl's support
for anonymous functions because I have found mapping of anonymous
functions to be a really convenient way to interact with this kind of
data in other languages.

Ultimately, what I am interested in is the type of transformations
that I can compactly express on the command line (using perl -e)
instead of from within a specific math environment.

I hate the context shifts that come from
1. c++ program produces data
2. start math software, load data, transform,plot,transform,plot,...
3. modify c++ program and goto 1

I am hoping to move to something more like:
c++ program | perl -e 'some transformation' | plot
for at least some fraction of my work.
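
As a rough sketch of what such a filter stage could look like, assuming whitespace-separated "x y" lines on standard input (the subroutine name downsample is hypothetical, not something from the thread):

```perl
use strict;
use warnings;

# downsample is a hypothetical helper: read whitespace-separated "x y" lines
# from a filehandle and return one "mean_x mean_y" string per window of $w lines.
sub downsample {
    my ( $fh, $w ) = @_;
    my ( @out, @sum );
    my $n = 0;
    while ( my $line = <$fh> ) {
        my @f = split ' ', $line;
        next unless @f >= 2;    # skip blank or malformed lines
        $sum[$_] += $f[$_] for 0, 1;
        if ( ++$n == $w ) {
            push @out, join ' ', map { $_ / $n } @sum;
            ( $n, @sum ) = (0);    # reset the counter and both running sums
        }
    }
    # Flush a final window that holds fewer than $w points.
    push @out, join ' ', map { $_ / $n } @sum if $n;
    return @out;
}

# When given a window size on the command line, act as a filter on STDIN.
if ( my $w = shift @ARGV ) {
    print "$_\n" for downsample( \*STDIN, $w );
}
```

Saved under a made-up name such as downsample.pl, it would slot into the pipeline as: producer | perl downsample.pl 3 | plot. It keeps only running sums, so it does not need to hold the whole input in memory.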


Thanks, this is great.

-- Oscar
 
