Arrays instead of files into hashes

F

Francois Massion

I have code which compares 2 lists of words and churns out the
difference. So far I have done this by reading the content of files
into a hash, but I would like to use 2 arrays, says @array1 and
@array2 instead of files a.txt and b.txt.

I haven't managed to do this successfuly. Any suggestions? Her the
functioning code with the files:

#!/usr/bin/perl -w
use warnings; use strict;
use utf8; #
binmode STDIN, ":utf8"; # input
binmode STDOUT, ":utf8"; # output

my %list2;
open(WORDLIST2,'C:\temp\b.txt') or
die("cannot open 'C:\temp\b.txt' because $! !\n");
while (<WORDLIST2>) {
$list2{$_}=1; #or any other value
}
open(WORDLIST1,'C:\temp\a.txt') or
die("cannot open file1 because $! !\n");
while (<WORDLIST1>){
print unless exists($list2{$_});
}
 
T

Tim Greer

Francois said:
I have code which compares 2 lists of words and churns out the
difference. So far I have done this by reading the content of files
into a hash, but I would like to use 2 arrays, says @array1 and
@array2 instead of files a.txt and b.txt.

I haven't managed to do this successfuly. Any suggestions? Her the
functioning code with the files:


The idea of using hashes instead of arrays, is better, why do you want
to change it to arrays? You mean arrays to hold the data that is
usually located in the {a,b}.txt files? Where is that information
being generated from where you could have the data saved into an array,
or are you asking if you can take the data from the files and put them
into arrays? Either way, you shou;dn't want to do that. If you are
using data generated from somewhere, then add that new value to a hash
and see if it exists are you receive the data.

There's no reason to use an array first. If you want to take the data
from the files, just do what you originally were wanting to do, and use
a hash, too. Or do you mean that you want to save the final, unique
values into an array, after you've compared using a hash solution?
Also, you mentioned you wanted to use arrays, but it's not working, so
can you show the code you're trying to use with arrays and what you
expect and what's not working? Sorry, I'm a little confused by what
you're trying to accomplish, so please elaborate.
 
J

J. Gleixner

Francois said:
I have code which compares 2 lists of words and churns out the
difference. So far I have done this by reading the content of files
into a hash, but I would like to use 2 arrays, says @array1 and
@array2 instead of files a.txt and b.txt.

I haven't managed to do this successfuly. Any suggestions? Her the
functioning code with the files:

#!/usr/bin/perl -w
use warnings; use strict;
use utf8; #
binmode STDIN, ":utf8"; # input
binmode STDOUT, ":utf8"; # output

my %list2;
open(WORDLIST2,'C:\temp\b.txt') or
die("cannot open 'C:\temp\b.txt' because $! !\n");
while (<WORDLIST2>) {
$list2{$_}=1; #or any other value
}
open(WORDLIST1,'C:\temp\a.txt') or
die("cannot open file1 because $! !\n");
while (<WORDLIST1>){
print unless exists($list2{$_});
}

If you already have @array1, you could do the following to
create a hash that has the unique values from @array1.

my %list2 = map { $_ => 1 } @array1;

See also: perldoc -q "How do I compute the difference of two arrays"
 
S

sln

I have code which compares 2 lists of words and churns out the
difference. So far I have done this by reading the content of files
into a hash, but I would like to use 2 arrays, says @array1 and
@array2 instead of files a.txt and b.txt.

I haven't managed to do this successfuly. Any suggestions? Her the
functioning code with the files:

#!/usr/bin/perl -w
use warnings; use strict;
use utf8; #
binmode STDIN, ":utf8"; # input
binmode STDOUT, ":utf8"; # output

my %list2;
open(WORDLIST2,'C:\temp\b.txt') or
die("cannot open 'C:\temp\b.txt' because $! !\n");
while (<WORDLIST2>) {
$list2{$_}=1; #or any other value
}
open(WORDLIST1,'C:\temp\a.txt') or
die("cannot open file1 because $! !\n");
while (<WORDLIST1>){
print unless exists($list2{$_});
}

This some real old code. True beginner code.
It resolves into the common elements of two arrays.
I know your looking for the difference but you could
extrapolate difference from common. It works for what I
use it for, merging directories, renaming dups.

If you want the whole program for usage exampl just say so.
Good luck!

sln

#######################################################
# Get Common Elements (from two N-dimensioned Array's)
# IN - Refs to the NxN arrays to compare,
# sort flag and the compare field.
# OUT - Ndx's into Right_Array of matching elements
# ---------------------------------------------------
# Notes -
# 1. Elements are assumed textual and case insensitive
# 2. Ignores in-array duplicates
# 3. Sort will be done if sort flag > 0
#
sub GetCommonElementsNxM($$$$)
{
my ($A_Left,$A_Right,$Srtflg,$Fld) = @_;
$Srtflg = 0 unless defined $Srtflg;
$Fld = 0 unless defined $Fld;
# my @Dup = ();
my @Ndx = ();

if ($Srtflg > 0) {
@{$A_Left} = sort {uc($a->[$Fld]) cmp uc($b->[$Fld])} @{$A_Left};
@{$A_Right} = sort {uc($a->[$Fld]) cmp uc($b->[$Fld])} @{$A_Right};
} else {print "==> Common Elements : Not sorting arrays\n";}

my $rpos = 0;
my $rend = @{$A_Right};
my $cnt = 0;
my $llast = undef;
my $rlast = undef;
foreach my $left_element (@{$A_Left})
{
next if (uc($left_element->[$Fld]) eq uc($llast->[$Fld]));

$rpos += $cnt;
$cnt = 0;
foreach my $right_element (@{$A_Right}[$rpos..($rend-1)])
{
last if (uc($left_element->[$Fld]) lt uc($right_element->[$Fld]));
$cnt++;
next if (uc($right_element->[$Fld]) eq uc($rlast->[$Fld]));
if (uc($left_element->[$Fld]) eq uc($right_element->[$Fld]))
{
# push (@Dup, $right_element->[$Fld]); # the string
push (@Ndx, $rpos+$cnt-1); # the index into R_Array
last;
}
$rlast = $right_element;
}
$llast = $left_element;
last if ($rpos >= $rend);
}
# return (@Dup);
return (@Ndx);
}


#######################################################
# Get Common Elements from single Array's
# IN - Refs to the Nx1 arrays to compare, sort flag
# OUT - Ndx's into Right_Array of matching elements
# ---------------------------------------------------
# Notes -
# 1. Elements are assumed textual and case insensitive
# 2. Ignores in-array duplicates
# 3. Sort will be done if sort flag > 0
#######################################################
sub GetCommonElements($$$)
{
my ($A_Left,$A_Right,$Srtflg) = @_;
$Srtflg = 0 unless defined $Srtflg;
# my @Dup = ();
my @Ndx = ();

if ($Srtflg > 0) {
@{$A_Left} = sort {uc($a) cmp uc($b)} @{$A_Left};
@{$A_Right} = sort {uc($a) cmp uc($b)} @{$A_Right};
} else {print "==> Common Elements : Not sorting arrays\n";}

my $rpos = 0;
my $rend = @{$A_Right};
my $cnt = 0;
my $llast = '';
my $rlast = '';
foreach my $left_element (@{$A_Left})
{
next if (uc($left_element) eq uc($llast));

$rpos += $cnt;
$cnt = 0;
foreach my $right_element (@{$A_Right}[$rpos..($rend-1)])
{
last if (uc($left_element) lt uc($right_element));
$cnt++;
next if (uc($right_element) eq uc($rlast));
if (uc($left_element) eq uc($right_element))
{
# push (@Dup, $right_element); # the string
push (@Ndx, $rpos+$cnt-1); # the index into R_Array
last;
}
$rlast = $right_element;
}
$llast = $left_element;
last if ($rpos >= $rend);
}
# return (@Dup);
return (@Ndx);
}
 
T

Tad J McClellan

Francois Massion said:
I would like to use 2 arrays, says @array1 and
@array2 instead of files a.txt and b.txt.

while (<WORDLIST2>) {
$list2{$_}=1; #or any other value
}


my @b_txt = <WORDLIST2>;
close WORDLIST2;
foreach ( @b_txt ) {
$list2{$_} = 1;
}
 
T

Tim Greer

my @b_txt = <WORDLIST2>;
Isin't slurp a bit rich?
[snip]

sln

Probably, but that's what the OP asked for. I can't imagine why, but
perhaps they just need to elaborate on their reasons to get a good
answer. Personally, I didn't answer them, because I had to ask what
they were wanting to do, since it didn't seem clear to me (or didn't
seem to have a purpose and would just waste good processing).
 
S

sln

I would like to use 2 arrays, says @array1 and
@array2 instead of files a.txt and b.txt.


while (<WORDLIST2>) {
$list2{$_}=1; #or any other value
}


my @b_txt = <WORDLIST2>;
Isin't slurp a bit rich?
[snip]

sln

Probably, but that's what the OP asked for. I can't imagine why, but
perhaps they just need to elaborate on their reasons to get a good
answer. Personally, I didn't answer them, because I had to ask what
they were wanting to do, since it didn't seem clear to me (or didn't
seem to have a purpose and would just waste good processing).

There's a thin line between yes and no, but if its not yes....

sln
 
T

Tim Greer

On Mon, 12 Jan 2009 17:55:36 -0600, Tad J McClellan


I would like to use 2 arrays, says @array1 and
@array2 instead of files a.txt and b.txt.


while (<WORDLIST2>) {
$list2{$_}=1; #or any other value
}


my @b_txt = <WORDLIST2>;
Isin't slurp a bit rich?
[snip]

sln

Probably, but that's what the OP asked for. I can't imagine why, but
perhaps they just need to elaborate on their reasons to get a good
answer. Personally, I didn't answer them, because I had to ask what
they were wanting to do, since it didn't seem clear to me (or didn't
seem to have a purpose and would just waste good processing).

There's a thin line between yes and no, but if its not yes....

sln

Its (its) or it's (it is) not yes, and what isn't? I'd say it is yes
(being it's "probably" going to be more resource intensive to grab all
of the contents of WORLDLIST2 into an array), only to compare then,
comparing it to a while loop to see if a hash value exists before
adding it (if the OP wants to add it). I absolutely don't understand
your cryptic response. I'm certain you were being sarcastic, but I
don't know if you mean to have a question in there.
 
S

sln

(e-mail address removed) wrote:

On Mon, 12 Jan 2009 17:55:36 -0600, Tad J McClellan


I would like to use 2 arrays, says @array1 and
@array2 instead of files a.txt and b.txt.


while (<WORDLIST2>) {
$list2{$_}=1; #or any other value
}


my @b_txt = <WORDLIST2>;
Isin't slurp a bit rich?
[snip]

sln

Probably, but that's what the OP asked for. I can't imagine why, but
perhaps they just need to elaborate on their reasons to get a good
answer. Personally, I didn't answer them, because I had to ask what
they were wanting to do, since it didn't seem clear to me (or didn't
seem to have a purpose and would just waste good processing).

There's a thin line between yes and no, but if its not yes....

sln

Its (its) or it's (it is) not yes, and what isn't? I'd say it is yes
(being it's "probably" going to be more resource intensive to grab all
of the contents of WORLDLIST2 into an array), only to compare then,
comparing it to a while loop to see if a hash value exists before
adding it (if the OP wants to add it). I absolutely don't understand
your cryptic response. I'm certain you were being sarcastic, but I
don't know if you mean to have a question in there.

This no joke! I don't write anything unles I need to!
I said before this is beginner code. No sarcasm inteneded.
I do not want to go through this. The prototypes are wrong along
with alot of other stuff. Amazingly, this heards all the dup porn out
there into safely stored data. Go figure... it works like a champ!!

sln

mrg_test1.pl
----------

use strict;
use warnings;
require SMG;

use File::stat;
use File::Spec;
use File::path;
use File::Copy;
use File::Find;
use sort 'stable';

my $VERSION = 1.00;

my $current = sort::current();
#print "\n==> sort : $current\n\n";

# Test smerge alone
#-----------------------
if (1)
{
#
my $ToDir = 'D:\agent\t1';
my $FromDir = 'D:\agent\t2';
my $PrefixName = time()."_";
my $PrefixLevel = 5;
my @Exclude = ('\.jbf$', '\.alb$'); # if all goes ok, these will be rmdir'd with the from, but not moved
my $Duponly = 0;
my $RmvFrom = 0;

if (SMG::SafeMerge ($ToDir,$FromDir,$PrefixName,$PrefixLevel,\@Exclude,$Duponly,$RmvFrom)) {
print "\nSafe Merge exited ok.\n";
} else { print "\nSafe Merge had a headache.\n"; }

}

---------------------
smg.pm
----------
package SMG;
use strict;
use warnings;
use File::stat;
use File::path;
use File::Copy;
use File::Find;
use File::Spec;
use sort 'stable';


require Exporter;
our @ISA = qw(Exporter);
our @EXPORT = qw(SafeMerge GetCommonElementsNxM GetCommonElements);
my $VERSION = 1.00;


#my $current = sort::current();



####################################################################################
# SAFE MERGE
# The two major options -
# 1. Merge 2 directories into 1, preserving/renaming duplicate named files.
# 2. Compare 2 directories and just rename the duplicates of 1 directory.
# Also -
# - The prefix level will be on a per-file monitored basis.
# - File/Dir names are cached and renamed in the "From" array,
# - Move is done via os-rename function into the "ToDir".
# - Remove "From" directory option (after rename).
# - Duplicate's-renaming-only is done in-place in the "FromDir".
# - Report output is created in the current directory (for now).
# - Itteration scheme are within-directory, then cross-directory until no more dups.
#-----------------------------------------------------------------------------------
sub SafeMerge ($$$$$$$)
{
return 0 if (@_ < 2 || @_ > 7);
my ($ToDir,$FromDir,$PrefixName,$PrefixLevel,$Exclude,$Duponly,$RmvFrom) = @_;

$PrefixName = '' unless defined $PrefixName; # Renaming Prefix
$PrefixLevel = 0 unless defined $PrefixLevel; # Number of times to += prefix

$Exclude = [] unless defined $Exclude; # Exclude list (must be re)
$Duponly = 0 unless defined $Duponly; # Flag to just rename dups in "from"
$RmvFrom = 0 unless defined $RmvFrom; # Flag to remove "from" after operation

my @To = ();
my @From = ();
my @ndx = ();
my @Xfound = ();
my $dupsize = @ndx;
my $passes = 0;

print "\n\n *******************************\n";
print "********[ Safe Merge ]*********\n";
print " *******************************\n";
print "\nPrefix\t$PrefixName\nLevel\t$PrefixLevel\nChecks ... ";

$ToDir = File::Spec->canonpath( "$ToDir" );
$FromDir = File::Spec->canonpath( "$FromDir" );
#print "$ToDir\n$FromDir";
if ($ToDir eq $FromDir) { print "\rFrom/To are identical: $ToDir\n"; return 0; }
if (!-e $ToDir) { print "\rTo directory does not exist: $ToDir\n"; return 0; }
if (!-e $FromDir) { print "\rFrom directory does not exist: $FromDir\n"; return 0;}

# Glob seems to be bothered by spaces in dir path names
my $Sep = "\\".File::Spec->canonpath( "/" );
{
my $td = File::Spec->canonpath( "$ToDir/*" );
my $fd = File::Spec->canonpath( "$FromDir/*" );
if ($td =~ / /) {
$td = "'$td'";
}
if ($fd =~ / /) {
$fd = "'$fd'";
}
foreach (glob ("$td")) {
/.+$Sep(.+)$/;
# /.+\/(.+)$/;
push (@To, [$1, 0]);
}
foreach (glob ("$fd")) {
/.+$Sep(.+)$/;
# /.+\/(.+)$/;
my $ftmp = $1;
for (@{$Exclude}) {
if ($ftmp =~ /$_/) {
# save in @Xfound if $RmvFrom is set (see below)
print "\rExclude - $ftmp\n";
push (@Xfound, $ftmp) if ($RmvFrom);
$ftmp = undef;
last;
}
}
push (@From, [$ftmp, 0, '']) if defined ($ftmp);
}
}

# Will return if "From" empty. Technically not an error.
# If it is empty through exclusions (regexp filter), "From" could be rmdir'd below.
if (@From == 0) { print "\rEmpty dir: $FromDir\n"; return 0; }
print "OK\n";


###**************************************************************************
## Iterate until there are no more dups across directories after (D2) rename
#
do {
@ndx = GetCommonElementsNxM(\@To,\@From, 1, 0);
$dupsize = @ndx;
$passes++;

print "\nP A S S \# $passes\n---------------------\n";
print "Found $dupsize dups:\n";

#**********************************
# Check for cross-directory dups
# -------------
if ($dupsize)
{
my $errflg = 0;
print "\n<DIR> $FromDir\n\n";
for (my $i = 0; $i < $dupsize; $i++)
{
my $Fromref = $From[$ndx[$i]];
my $pname = $PrefixName;

print "\t$Fromref->[0] ... $PrefixName";

#*****************************************
# Rename and check for in-directory dups
# ----------
while (1)
{
# Level check
if ($Fromref->[1] >= $PrefixLevel)
{
print "$Fromref->[0] <<level $Fromref->[1] exceeded>>\n";
# Clear the name in From so this file doesn't get moved
$Fromref->[0] = "";
$errflg = 1;
last;
}
my $stmp = uc($pname.$Fromref->[0]);
my $found = 0;
for (@From) { if (uc($_->[0]) eq $stmp) { $found = 1; last; } }

if (!$found) {
# OK to rename! Dup not found in "From"
# Change the name in "From" to this new one
# so it gets moved later. Increment its level of prefix
# ---------------------
print "$Fromref->[0]";
print "\n";
if ($Fromref->[1] == 0) { $Fromref->[2] = $Fromref->[0]; }
$Fromref->[0] = $pname.$Fromref->[0];
$Fromref->[1] += 1;
last;
}
else {
# File exist with that name, add another level of prefix.
# ---------------------
print "$PrefixName";
$pname = $pname.$PrefixName;
if ($Fromref->[1] == 0) { $Fromref->[2] = $Fromref->[0]; }
$Fromref->[1] += 1;
}
}
}
if ($errflg) {
print "!!! Can't rename some files, raise the level or change the prefix.\n";
#return 0;
}
}
} while ($dupsize > 0);
#
## End cross-dir iteration
###

#*************************************************************
# Check for rename of duplicates in D2 (only) without moving
# ------------
if ($Duponly)
{
my $rentried = 0;
my $renok = 0;
print "\nRename only ... ";
for (@From) {
my $res = 0;
if ($_->[1] > 0 && $_->[1] < $PrefixLevel) {
$rentried++;
my $fr = File::Spec->canonpath( "$FromDir/$_->[2]" );
my $fr2 = File::Spec->canonpath( "$FromDir/$_->[0]" );
if (!($res = rename ("$fr", "$fr2"))) {
print "Rename error:\t$_->[2] ... $_->[0]\n";
} else {
$renok++; }
}
}
if ($rentried) { print "$renok out of $rentried files OK\n"; }
else { print "No duplicates found\n"; }
return 1;
}

#**********************
# Move "From" into "To"
# -------------------
#print @From."@From\n";
my $sep = File::Spec->canonpath( "/" );
print "\nMoving:\t$FromDir$sep* to $ToDir$sep\n";
my $movcnt = 0;

for (@From) {
my $res = 0;
if ($_->[1] > 0 && $_->[1] < $PrefixLevel) {
my $fr = File::Spec->canonpath( "$FromDir/$_->[2]" );
my $to = File::Spec->canonpath( "$ToDir/$_->[0]" );
$res = File::Copy::move ("$fr", "$to");
if ($res == 0) { print "Move error:\t$_->[2] ... $to\n"; }
}
if ($_->[1] == 0 && $_->[0] gt '') {
my $fr = File::Spec->canonpath( "$FromDir/$_->[0]" );
my $to = File::Spec->canonpath( "$ToDir/$_->[0]" );
$res = File::Copy::move ("$fr", "$to");
if ($res == 0) { print "Move error:\t$fr\n" }
}
$movcnt++ if ($res);
}
my $notmoved = @From - $movcnt;
print "Moved:\t$movcnt out of ".@From." files.\n";

#***************************************
# Check if "FromDir" is to be deleted
# --------------
if ($RmvFrom) {
print "Remove:\t$FromDir ... ";
if ($notmoved > 0) {
print "not deleted, contains $notmoved file(s) that couldn't be moved!\n";
}
else {
# check if @Xfound has values (and their not directories)
# if so, delete these first before trying to remove the From directory
# for now, don't want to "unlink" a directory, and we're not doing a tree here
# ----------------
for (@Xfound) {
my $fr = File::Spec->canonpath( "$FromDir/$_" );
unlink ("$fr") if (!-d $fr) ; }
my $fr = File::Spec->canonpath( "$FromDir" );
if (!rmdir "$fr") {print "$!\n" } else {print "OK\n"; }
}
}
return 1;
}


#######################################################
# Get Common Elements (from two N-dimensioned Array's)
# IN - Refs to the NxN arrays to compare,
# sort flag and the compare field.
# OUT - Ndx's into Right_Array of matching elements
# ---------------------------------------------------
# Notes -
# 1. Elements are assumed textual and case insensitive
# 2. Ignores in-array duplicates
# 3. Sort will be done if sort flag > 0
#
sub GetCommonElementsNxM($$$$)
{
my ($A_Left,$A_Right,$Srtflg,$Fld) = @_;
$Srtflg = 0 unless defined $Srtflg;
$Fld = 0 unless defined $Fld;
# my @Dup = ();
my @Ndx = ();

if ($Srtflg > 0) {
@{$A_Left} = sort {uc($a->[$Fld]) cmp uc($b->[$Fld])} @{$A_Left};
@{$A_Right} = sort {uc($a->[$Fld]) cmp uc($b->[$Fld])} @{$A_Right};
} else {print "==> Common Elements : Not sorting arrays\n";}

my $rpos = 0;
my $rend = @{$A_Right};
my $cnt = 0;
my $llast = undef;
my $rlast = undef;
foreach my $left_element (@{$A_Left})
{
next if (uc($left_element->[$Fld]) eq uc($llast->[$Fld]));

$rpos += $cnt;
$cnt = 0;
foreach my $right_element (@{$A_Right}[$rpos..($rend-1)])
{
last if (uc($left_element->[$Fld]) lt uc($right_element->[$Fld]));
$cnt++;
next if (uc($right_element->[$Fld]) eq uc($rlast->[$Fld]));
if (uc($left_element->[$Fld]) eq uc($right_element->[$Fld]))
{
# push (@Dup, $right_element->[$Fld]); # the string
push (@Ndx, $rpos+$cnt-1); # the index into R_Array
last;
}
$rlast = $right_element;
}
$llast = $left_element;
last if ($rpos >= $rend);
}
# return (@Dup);
return (@Ndx);
}


#######################################################
# Get Common Elements from single Array's
# IN - Refs to the Nx1 arrays to compare, sort flag
# OUT - Ndx's into Right_Array of matching elements
# ---------------------------------------------------
# Notes -
# 1. Elements are assumed textual and case insensitive
# 2. Ignores in-array duplicates
# 3. Sort will be done if sort flag > 0
#######################################################
sub GetCommonElements($$$)
{
my ($A_Left,$A_Right,$Srtflg) = @_;
$Srtflg = 0 unless defined $Srtflg;
# my @Dup = ();
my @Ndx = ();

if ($Srtflg > 0) {
@{$A_Left} = sort {uc($a) cmp uc($b)} @{$A_Left};
@{$A_Right} = sort {uc($a) cmp uc($b)} @{$A_Right};
} else {print "==> Common Elements : Not sorting arrays\n";}

my $rpos = 0;
my $rend = @{$A_Right};
my $cnt = 0;
my $llast = '';
my $rlast = '';
foreach my $left_element (@{$A_Left})
{
next if (uc($left_element) eq uc($llast));

$rpos += $cnt;
$cnt = 0;
foreach my $right_element (@{$A_Right}[$rpos..($rend-1)])
{
last if (uc($left_element) lt uc($right_element));
$cnt++;
next if (uc($right_element) eq uc($rlast));
if (uc($left_element) eq uc($right_element))
{
# push (@Dup, $right_element); # the string
push (@Ndx, $rpos+$cnt-1); # the index into R_Array
last;
}
$rlast = $right_element;
}
$llast = $left_element;
last if ($rpos >= $rend);
}
# return (@Dup);
return (@Ndx);
}


1;
 
T

Tim Greer

(e-mail address removed) wrote:

On Mon, 12 Jan 2009 17:55:36 -0600, Tad J McClellan


I would like to use 2 arrays, says @array1 and
@array2 instead of files a.txt and b.txt.


while (<WORDLIST2>) {
$list2{$_}=1; #or any other value
}


my @b_txt = <WORDLIST2>;
Isin't slurp a bit rich?
[snip]

sln

Probably, but that's what the OP asked for. I can't imagine why,
but perhaps they just need to elaborate on their reasons to get a
good
answer. Personally, I didn't answer them, because I had to ask what
they were wanting to do, since it didn't seem clear to me (or didn't
seem to have a purpose and would just waste good processing).

There's a thin line between yes and no, but if its not yes....

sln

Its (its) or it's (it is) not yes, and what isn't? I'd say it is yes
(being it's "probably" going to be more resource intensive to grab all
of the contents of WORLDLIST2 into an array), only to compare then,
comparing it to a while loop to see if a hash value exists before
adding it (if the OP wants to add it). I absolutely don't understand
your cryptic response. I'm certain you were being sarcastic, but I
don't know if you mean to have a question in there.

This no joke! I don't write anything unles I need to!
I said before this is beginner code. No sarcasm inteneded.
I do not want to go through this. The prototypes are wrong along
with alot of other stuff. Amazingly, this heards all the dup porn out
there into safely stored data. Go figure... it works like a champ!!

sln

mrg_test1.pl
<snip irrelevant code>

What are you yammering on about? Why did you paste a long portion of
irrelevant code to this OP's question? Regardless, that code could be
summarized and be much shorter, but since it's not relevant to the OPs
problem, there's no reason to discuss it. You were doing okay for a
couple for days, and now you're posting crazy nonsense again. And...
talk about bloated code, geez!
 
F

Francois Massion

On Mon, 12 Jan 2009 17:55:36 -0600, Tad J McClellan
I would like to use 2 arrays, says @array1 and
@array2 instead of files a.txt and b.txt.
while (<WORDLIST2>) {
    $list2{$_}=1; #or any other value
}
   my @b_txt = <WORDLIST2>;
Isin't slurp a bit rich?
[snip]

Probably, but that's what the OP asked for.  I can't imagine why, but
perhaps they just need to elaborate on their reasons to get a good
answer.  Personally, I didn't answer them, because I had to ask what
they were wanting to do, since it didn't seem clear to me (or didn't
seem to have a purpose and would just waste good processing).
--
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!- ZitiertenText ausblenden -

- Zitierten Text anzeigen -

Thanks to all for your contributions. I'll have to try them out. The
background for my asking (as a non-pro) is the following. I am doing
terminology extraction for linguistic purposes. Thus I take a text,
split it up in words or expressions and perform various "refining"
operations in order to get only the clean interesting terms as they
appear in a dictionary. Each operation is currently a small amateurish
little script and the output is an array which I can display in a file
or on screen.

Now I want to automate all these single steps into one operation which
means that instead of reading word lists from text files I would like
to use the arrays generated by the previous step. This is the reason
for the question above. I have 2 different files as the result of 2
previous steps and the difference are the words which are interesting
for my terminology work. I hope this helps.
 
F

Francois Massion

On Mon, 12 Jan 2009 17:55:36 -0600, Tad J McClellan
I would like to use 2 arrays, says @array1 and
@array2 instead of files a.txt and b.txt.
while (<WORDLIST2>) {
    $list2{$_}=1; #or any other value
}
   my @b_txt = <WORDLIST2>;
Isin't slurp a bit rich?
[snip]
sln
Probably, but that's what the OP asked for.  I can't imagine why, but
perhaps they just need to elaborate on their reasons to get a good
answer.  Personally, I didn't answer them, because I had to ask what
they were wanting to do, since it didn't seem clear to me (or didn't
seem to have a purpose and would just waste good processing).
--
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!- Zitierten Text ausblenden -
- Zitierten Text anzeigen -

Thanks to all for your contributions. I'll have to try them out. The
background for my asking (as a non-pro) is the following. I am doing
terminology extraction for linguistic purposes. Thus I take a text,
split it up in words or expressions and perform various "refining"
operations in order to get only the clean interesting terms as they
appear in a dictionary. Each operation is currently a small amateurish
little script and the output is anarraywhich I can display in a file
or on screen.

Now I want to automate all these single steps into one operation which
means that instead of reading word lists from text files I would like
to use the arrays generated by the previous step. This is the reason
for the question above. I have 2 different files as the result of 2
previous steps and the difference are the words which are interesting
for my terminology work. I hope this helps.- Zitierten Text ausblenden -

- Zitierten Text anzeigen -

Thanks to all. Basically the solution to my problem seems to be:

my %in_array2 = map { $_ => 1 } @array2;
my @array3 = grep { !$in_array2{$_} } @array1;

It works as expected. I just need to solve 2 problems not directly
related to my question:
i) When I output @array3 I get a space before each entry
ii) no matter how I encode the files (ascii, unicode, "use utf8" etc)
special characters (like the German Umlaut) are corrupt and the output
file is in UTF8).

I'll find a solution but maybe someone knows already the answer...
 
T

Tad J McClellan

i) When I output @array3 I get a space before each entry


If you are having trouble with output, it is easier to help if
we can actually see the statement that generates the output...

Are you maybe getting a space before each entry *except the first one*?

If so, then remove the double quotes.


perldoc -q space

Why do I get weird spaces when I print an array of lines?
 
S

sln

(e-mail address removed) wrote:
@seesig.invalid> wrote:
Francois Massion <[email protected]> wrote:
I would like to use 2 arrays, says @array1 and
@array2 instead of files a.txt and b.txt.
while (<WORDLIST2>) {
    $list2{$_}=1; #or any other value
}
   my @b_txt = <WORDLIST2>;
Isin't slurp a bit rich?
[snip]

Probably, but that's what the OP asked for.  I can't imagine why, but
perhaps they just need to elaborate on their reasons to get a good
answer.  Personally, I didn't answer them, because I had to ask what
they were wanting to do, since it didn't seem clear to me (or didn't
seem to have a purpose and would just waste good processing).
--
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!- Zitierten Text ausblenden -
- Zitierten Text anzeigen -

Thanks to all for your contributions. I'll have to try them out. The
background for my asking (as a non-pro) is the following. I am doing
terminology extraction for linguistic purposes. Thus I take a text,
split it up in words or expressions and perform various "refining"
operations in order to get only the clean interesting terms as they
appear in a dictionary. Each operation is currently a small amateurish
little script and the output is anarraywhich I can display in a file
or on screen.

Now I want to automate all these single steps into one operation which
means that instead of reading word lists from text files I would like
to use the arrays generated by the previous step. This is the reason
for the question above. I have 2 different files as the result of 2
previous steps and the difference are the words which are interesting
for my terminology work. I hope this helps.- Zitierten Text ausblenden -

- Zitierten Text anzeigen -

Thanks to all. Basically the solution to my problem seems to be:

my %in_array2 = map { $_ => 1 } @array2;
my @array3 = grep { !$in_array2{$_} } @array1;
[snip]

There are many solutions. The one you picked will take the longest.
I just don't like this. I don't line grep to begin with, there is not a functional
way to terminate it, let alone the hash lookup and map overhead.

If speed isin't a concern, then any solution will do.

sln
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,968
Messages
2,570,153
Members
46,701
Latest member
XavierQ83

Latest Threads

Top