Matching words and letters


Sandman

I just can't seem to wrap my mind around how to attack this problem in the best
way, so I thought I'd just post here and see if anyone here has done something
similar and/or has a logical solution to it.

I have a series of letters, say "kljetilamnop", then I have a big textfile with
legit words, but let's use a short one here:

mall
bill
kill
lot
pile
antilope
hello

Now, I want to cycle through this word list and match it to the letters, and
return true if the word can be formed by the letters. So, the above would
return:

mall
kill
lot
antilope

The result is to be used in a Scrabble-like fashion, which leads me to the
other part of my problem. When matching these words, I need to be able to
specify that some of the characters are "locked" in their position. So, let's
say that the letter "k" in the letter string is a fixed letter, then the only
result would be "kill" of course. I imagine that this would be easiest if I fed
the function the string and also the locked positions in an array (1,6,8 for
example).

Anyway, this could easily become really complex, and if I just get the basic
scrambled-letters to match to words, I think I can manage my way from there.
So, any help appreciated.
 

A. Sinan Unur

Sandman wrote:

> I have a series of letters, say "kljetilamnop", then I have a big
> textfile with legit words, but let's use a short one here:
>
> mall
> bill
> kill
> lot
> pile
> antilope
> hello
>
> Now, I want to cycle through this word list and match it to the
> letters, and return true if the word can be formed by the letters. So,
> the above would return:
>
> mall
> kill
> lot
> antilope

pile should be in that list as well.

Now, undoubtedly, there is a better way to do this, but here is something
that works for the data you posted:

#!/usr/bin/perl

use strict;
use warnings;

# Letter counts for the available letters
my $source = normalize('kljetilamnop');

LINE: while (my $word = <DATA>) {
    chomp $word;
    my $target = normalize($word);
    # Skip the word if it needs any letter more often than we have it
    for my $c (keys %{ $target }) {
        next LINE unless (
            defined($source->{$c}) && ($source->{$c} >= $target->{$c})
        );
    }
    print "$word\n";
}

# Return a hash reference mapping each letter to its number of occurrences
sub normalize {
    my ($string) = @_;
    my @letters = split //, $string;
    my %count;
    $count{$_}++ for (@letters);
    return \%count;
}

__DATA__
mall
bill
kill
lot
pile
antilope
hello

D:\Home\Dload>perl rs.pl
mall
kill
lot
pile
antilope

> which leads me to the other part of my problem.

I am going to ignore that part.

> So, any help appreciated.

Hope this helps.

Sinan.
 

Anno Siegel

Sandman said:
[snip other requirements]

One way of looking at this is to consider a string (word, or list of
available letters) as a multiset of its letters. A multiset is like
a set, but each element can appear with a multiplicity. The bag of
available letters, for instance, would contain each letter of
"kjetiamnop" once and the letter "l" twice.

A word can be built from the given letters when its bag is contained
(in the obvious sense of multisets) in the bag of given letters.
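
As a rough sketch of that containment test in plain Perl, with no module involved: the helper names `bag_of` and `bag_contained` are made up for this illustration, and a bag is simply a hash from letter to count.

```perl
use strict;
use warnings;

# Build the bag of a string: a hash reference mapping letter => count.
sub bag_of {
    my ($string) = @_;
    my %count;
    $count{$_}++ for split //, $string;
    return \%count;
}

# True if every letter of $small occurs in $large at least as often.
sub bag_contained {
    my ($small, $large) = @_;
    for my $letter (keys %$small) {
        return 0 if $small->{$letter} > ($large->{$letter} || 0);
    }
    return 1;
}

my $letters = bag_of('kljetilamnop');
for my $word (qw(mall bill kill lot pile antilope hello)) {
    print "$word\n" if bag_contained(bag_of($word), $letters);
}
```

With the sample list this should print mall, kill, lot, pile and antilope, since "bill" and "hello" each need a letter that is not available.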

Fortunately, there is a CPAN module that implements bags. Unfortunately,
the implementation doesn't have a "contained" method, so we'll have to
extend the class. Fortunately, the class is well-written, so inheritance
makes this easy:


#!/usr/local/bin/perl
use strict; use warnings; $| = 1;

my $letters = CBag->new->insert( map { $_ => 1 } split //, 'kljetilamnop');

while ( <DATA> ) {
    chomp;
    my $word = CBag->new->insert( map { $_ => 1 } split //);
    print "$_\n" if $word <= $letters;
}

# Extend Set::Bag to support comparison
{
    package CBag;
    use base 'Set::Bag';

    sub contained {
        my ( $bag, $other) = @_;
        for ( $bag->elements ) {
            return 0 if $bag->grab( $_) > ( $other->grab( $_) || 0);
        }
        return 1;
    }

    use overload(
        '<=' => 'contained',
    );
}

__DATA__
mall
bill
kill
lot
pile
antilope
hello
 

Josef Moellers

Sandman wrote:

[ elaborate problem description deleted ]

A very simple/simplistic solution:

my @words = qw(mall bill kill lot pile antilope hello);
my $fixed = 'k';
my $set = 'ljetilamnop' . $fixed;

foreach (@words) {
    print "$_\n" if m/[$fixed]/ && m/^[$set]*$/;
}
 

A. Sinan Unur

Josef Moellers wrote:

> A very simple/simplistic solution:
>
> my @words = qw(mall bill kill lot pile antilope hello);
> my $fixed = 'k';
> my $set = 'ljetilamnop' . $fixed;
>
> foreach (@words) {
>     print "$_\n" if m/[$fixed]/ && m/^[$set]*$/;
> }

How is this a solution?

D:\Home>perl tttt.pl
kill
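
Apart from printing only the word containing "k", a character class tests set membership and nothing more, so it cannot know how many copies of a letter are available. A small sketch to illustrate (the helper name `class_accepts` is invented for this example):

```perl
use strict;
use warnings;

# True if $word consists only of characters that appear in $set.
# This is the same kind of test as m/^[$set]*$/ in the quoted code.
sub class_accepts {
    my ($set, $word) = @_;
    return $word =~ /^[$set]*$/ ? 1 : 0;
}

my $set = 'kljetilamnop';
for my $word (qw(pop kill hello)) {
    printf "%-6s %s\n", $word,
        class_accepts($set, $word) ? 'accepted' : 'rejected';
}
```

Here "pop" is accepted even though the letters contain only one "p", while "hello" is rejected because "h" is not in the set at all.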

Sinan
 

Sandman

A. Sinan Unur said:

> pile should be in that list as well.

Yes, sorry. :)

> Now, undoubtedly, there is a better way to do this, but here is something
> that works for the data you posted:

Yes, and I've adapted it for my uses - it works great!

##########################

#!/usr/bin/perl
use strict;
use warnings;

# Usage: alfapet LETTERS [FIXED_POSITIONS], e.g. alfapet kamrslsi 1,2
exit unless $ARGV[0];
$| = 1;    # Don't buffer output

my %words;

# File with legal words, tab-separated
my $file = "/home/sandman/conf/rim.db";
open(my $fh, '<', $file) or die "Cannot open $file: $!";
while (<$fh>) {
    chomp;
    foreach (split /\t/) {
        # No need to include words that are too long.
        next if length($_) > (length($ARGV[0]) + 1);
        # And we don't care about words that are too short.
        next if length($_) < 3;
        $words{ lc($_) }++;
    }
}
close $fh;

my $source = normalize($ARGV[0]);
my @input  = split //, $ARGV[0];
my $prnr   = 0;

LINE: foreach my $word (keys %words) {
    my $target = normalize($word);
    for my $c (keys %{ $target }) {
        next LINE unless (
            defined($source->{$c}) && ($source->{$c} >= $target->{$c})
        );
    }
    if ($ARGV[1]) {
        my @match = split //, $word;
        foreach (split /,/, $ARGV[1]) {
            # For every fixed (1-based) position the letters must agree;
            # a word too short to reach the position cannot match.
            next LINE unless defined $match[$_ - 1]
                && $input[$_ - 1] eq $match[$_ - 1];
        }
    }
    printf "%-12s ", $word;
    $prnr++;
    print "\n" if ($prnr % 6) == 0;
}
print "\n";

# Return a hash reference mapping each letter to its number of occurrences
sub normalize {
    my ($string) = @_;
    my @letters = split //, $string;
    my %count;
    $count{$_}++ for (@letters);
    return \%count;
}

##########################
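
The fixed-position test inside the loop could also be pulled out into a helper of its own, which makes it easier to try on its own. This is only a sketch: the name `locked_ok` and its argument order are invented here, and positions are 1-based as on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word has the same letter as $input at
# every 1-based position listed in $positions (an array reference).
sub locked_ok {
    my ($input, $word, $positions) = @_;
    for my $pos (@$positions) {
        # A position outside either string can never be satisfied.
        return 0 if $pos < 1 || $pos > length($word) || $pos > length($input);
        return 0 if substr($input, $pos - 1, 1) ne substr($word, $pos - 1, 1);
    }
    return 1;
}

for my $word (qw(kali mask)) {
    print "$word ", (locked_ok('kamrslsi', $word, [1, 2]) ? "fits" : "does not fit"), "\n";
}
```

With positions 1 and 2 locked, "kali" fits because it starts with "ka", and "mask" does not.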

sandman~> alfapet kamrslsi 1,2
kali         kams         kari         karl         karm         kass
kal          kam          kar          kas

##########################

These are Swedish words, so it's quite correct.

Thank you!
 

Anno Siegel

Yet another way:

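while ( <DATA> ) {
    chomp;
    # Delete every available letter from a copy of the word; note that
    # this checks which letters occur, not how often each is available.
    ( my $x = $_) =~ tr/kljetilamnop//d;
    print "$_\n" unless length $x;
}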
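
For comparison, here is a sketch that puts the tr/// test next to a variant that uses up one available letter per letter of the word. The second sub is an assumption added for illustration, not part of the snippet above, and both sub names are made up.

```perl
use strict;
use warnings;

# Set-style test: delete every allowed letter, see whether anything is left.
sub only_allowed_letters {
    my ($word) = @_;
    (my $rest = $word) =~ tr/kljetilamnop//d;
    return length($rest) == 0 ? 1 : 0;
}

# Counting variant: remove one letter from the pool for each letter of
# the word, so a repeated letter needs a repeated letter in the pool.
sub within_available_counts {
    my ($word) = @_;
    my $pool = 'kljetilamnop';
    for my $letter (split //, $word) {
        # s/// without /g removes only the first occurrence.
        return 0 unless $pool =~ s/\Q$letter\E//;
    }
    return 1;
}

for my $word (qw(kill tattle)) {
    printf "%-7s set: %-3s counts: %s\n", $word,
        (only_allowed_letters($word)    ? 'yes' : 'no'),
        (within_available_counts($word) ? 'yes' : 'no');
}
```

"kill" passes both tests, while "tattle" passes the set test but fails the counting one, because it needs three "t" and only one is available.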

Anno
 
