Howto get array.agrep (NOT array.grep)

Phil Rhoades

People,

Is there some way to get agrep working with Ruby arrays? - agrep has
some nice, useful features that grep doesn't . .

Thanks,

Phil.
--
Philip Rhoades

Pricom Pty Limited (ACN 003 252 275 ABN 91 003 252 275)
GPO Box 3411
Sydney NSW 2001
Australia
Fax: +61:(0)2-8221-9599
E-mail: (e-mail address removed)
 
Phrogz

Phil said:
Is there some way to get agrep working with Ruby arrays? - agrep has
some nice, useful features that grep doesn't . .

Perhaps if you explained what this mysterious 'agrep' was, we might
help.
Something from another language? A unix utility?

Give us a sample array, and what you'd like the result to be after
calling this method on that array.
 
Phil Rhoades


NAME
    agrep - print lines approximately matching a pattern

SYNOPSIS
    agrep [OPTION]... PATTERN [FILE]...

DESCRIPTION
    Searches for approximate matches of PATTERN in each FILE or standard
    input. Example: 'agrep -2 optimize foo.txt' outputs all lines in file
    'foo.txt' that match "optimize" within two errors. E.g. lines which
    contain "optimise", "optmise", and "opitmize" all match.


 
Simon Krahnke

* Phil Rhoades said:
NAME
agrep - print lines approximately matching a pattern

Enumerable#grep can do that, if you pass it the right block. When you pass
a block to grep it's the block's job to match the elements.

Now the interesting question is: What would that block look like?

mfg, simon .... l
 
Ryan Davis

Simon Krahnke said:
Enumerable#grep can do that, if you pass it the right block. When you pass
a block to grep it's the block's job to match the elements.

no.

enum.grep(pattern) => array
enum.grep(pattern) {| obj | block } => array
------------------------------------------------------------------------
Returns an array of every element in _enum_ for which +Pattern ===
element+. If the optional _block_ is supplied, each matching
element is passed to it, and the block's result is stored in the
output array.

The block just morphs the result, it doesn't morph the match.
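
Since grep tests each element with pattern === element, one way to get agrep-like behaviour on an array is to put the fuzzy logic into the pattern object rather than the block. The following is only a sketch under assumptions: the class name Approx is hypothetical, and the matching uses the textbook edit-distance table with a free starting position, not agrep's own algorithm. It takes a literal string as the pattern, so it does not handle regular expressions.

```ruby
# Hypothetical matcher: grep only needs an object that responds to ===.
class Approx
  def initialize(pattern, max_errors)
    @pattern = pattern
    @max_errors = max_errors
  end

  # True if some substring of line is within @max_errors edits
  # (insert, delete, substitute) of @pattern. Row 0 of the table is
  # all zeros, so a match may begin anywhere in the line.
  def ===(line)
    prev = Array.new(line.length + 1, 0)
    @pattern.each_char.with_index(1) do |pc, i|
      curr = [i]
      line.each_char.with_index(1) do |lc, j|
        cost = pc == lc ? 0 : 1
        curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
      end
      prev = curr
    end
    prev.min <= @max_errors
  end
end

lines = ['we should optimise this loop',
         'please opitmize it',
         'nothing to see here']

p lines.grep(Approx.new('optimize', 2))
# => ["we should optimise this loop", "please opitmize it"]

# The optional block still only transforms the elements that matched.
p lines.grep(Approx.new('optimize', 2)) { |l| l.upcase }
```

The cost is roughly pattern length times line length per element, so this is likely to be slow on large lists compared with a C implementation.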
 
Jens Wille

hi phil!

if all you want is getting all the strings within a certain edit
distance of your pattern, have a look at [1]. it doesn't support
regular expressions in the pattern because i don't know how to achieve
that easily without re-implementing agrep's algorithm ;-) it's
really just a quick hack that might get you started, hopefully.

[1]
<http://prometheus.rubyforge.org/ruby-nuggets/classes/Enumerable.html#M000091>

cheers
jens

--
Jens Wille, Dipl.-Bibl. (FH)
prometheus - Das verteilte digitale Bildarchiv für Forschung & Lehre
Kunsthistorisches Institut der Universität zu Köln
Albertus-Magnus-Platz, D-50923 Köln
Tel.: +49 (0)221 470-6668, E-Mail: (e-mail address removed)
http://www.prometheus-bildarchiv.de/
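
For readers without the linked code at hand, selecting all strings within a certain edit distance of a pattern can be sketched in plain Ruby as below. This is not the ruby-nuggets implementation; the method names levenshtein and within_distance are made up for illustration, and the distance is computed against the whole string rather than against a substring of a line as agrep does.

```ruby
# Classic dynamic-programming Levenshtein distance: insertions,
# deletions and substitutions each cost 1. Pure Ruby, no gems.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: keep the strings within max_errors of pattern.
def within_distance(strings, pattern, max_errors)
  strings.select { |s| levenshtein(s, pattern) <= max_errors }
end

words = %w[optimise optmise opitmize optional]
p within_distance(words, 'optimize', 2)
# => ["optimise", "optmise", "opitmize"]
```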
 
Phil Rhoades

jens,


This might work, but it would be more difficult without regexes. The
current application does a system call to agrep, but of course that is
very slow for large numbers of calls. A typical call is something like:

agrep -2 "Smith\|J.*12345" list1.txt list2.txt list3.txt

This allows two differences on a minimum amount of information
consisting of last name, first initial and zip code. If I use the
Enumerable version, I would have to use the whole, delimited, name &
address string and increase the differences/distance number?

Did you just do that hack now? - how do I get/install it? (Fedora 8).

Thanks,

Phil.
 
Jens Wille

Phil Rhoades [2008-04-26 19:13]:
This might work but it would be more difficult without regexes -
the current application does a system call to agrep but of course
it is very slow for large numbers of calls. A typical call is
something like:

agrep -2 "Smith\|J.*12345" list1.txt list2.txt list3.txt

This allows two differences on a minimum amount of information
consisting of last name, first initial and zip code. If I use
the Enumerable version, I would have to use the whole, delimited,
name & address string and increase the differences/distance
number?

i think something like that could work in your case (requires the
Text gem):

  require 'text' # Text gem, provides Text::Levenshtein

  File.open('list1.txt').select { |line|
    # extract name and zip code from line; skip lines that don't match
    next false unless line =~ /\A(.*?\|.).*\b(\d{5})\b/ # adjust appropriately!

    # name may have two errors, zip only one -- or whatever...
    Text::Levenshtein.distance($1, 'Smith|J') <= 2 &&
      Text::Levenshtein.distance($2, '12345') <= 1
  }

Phil Rhoades [2008-04-26 19:13]:
Did you just do that hack now?

that's right. but i just read a bit on agrep's algorithm and it
might be fun to implement it in ruby (though a bit slow, probably).
as an alternative, it might be even worth writing ruby bindings to
agrep. who knows, if time permits... ;-)

Phil Rhoades [2008-04-26 19:13]:
- how do I get/install it? (Fedora 8).

well, i don't think that particular implementation suits your needs
and is obviously easily adapted (after all, it's just a select with
an appropriate block utilizing Text::Levenshtein.distance). but you
can get ruby-nuggets from rubyforge (gem install ruby-nuggets), or,
if the new version hasn't found its way onto the mirrors yet, from
our own gem server at http://prometheus.khi.uni-koeln.de/rubygems/.

cheers
jens
 
Phil Rhoades

jens,


I see what you are doing but this would have to be repeated for the
three different lists (list1.txt, list2.txt, list3.txt) - I guess that
should still be faster than a single system call . .

Jens Wille said:
that's right. but i just read a bit on agrep's algorithm and it
might be fun to implement it in ruby (though a bit slow, probably).


I don't know if it helps but there is this:

http://www.koders.com/ruby/fidCEAEDCAA28D4A59A76ADF20A0DA2A3858438834D.aspx

Jens Wille said:
as an alternative, it might be even worth writing ruby bindings to
agrep. who knows, if time permits... ;-)


I was wondering about something like that but I have never created a
Ruby binding before . .

Jens Wille said:
well, i don't think that particular implementation suits your needs
and is obviously easily adapted (after all, it's just a select with
an appropriate block utilizing Text::Levenshtein.distance). but you
can get ruby-nuggets from rubyforge (gem install ruby-nuggets), or,
if the new version hasn't found its way onto the mirrors yet, from
our own gem server at http://prometheus.khi.uni-koeln.de/rubygems/.


Thanks!

Phil.
 
Jens Wille

Phil Rhoades [2008-04-26 22:26]:
I see what you are doing but this would have to be repeated for
the three different lists (list1.txt, list2.txt, list3.txt)

well, yeah. but that's not really a problem, is it?

  %w[list1.txt list2.txt list3.txt].inject([]) { |matches, file|
    matches + File.open(file).select { |line|
      # ...same as before...
    }
  }

=> http://amatch.rubyforge.org

silly me!! totally forgot about that one ;-) thanks for the reminder!

maybe i'll be able to come up with something that wraps flori's
Amatch into (Enumerable|File)#agrep.

Phil Rhoades [2008-04-26 22:26]:
I was wondering about something like that but I have never
created a Ruby binding before . .

neither have i. but that shouldn't stop us, right? ;-)

cheers
jens
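
The multi-file loop above can also be written so that each file is closed once it has been read. This is a sketch only: select_lines is a hypothetical helper name, the sample records are invented for the demonstration, and the exact regexp in the usage stands in for whatever approximate test is actually wanted.

```ruby
require 'tmpdir'

# Hypothetical helper: collect matching lines from several files.
# The block decides what counts as a match (exact, regexp, approximate).
def select_lines(paths, &matcher)
  paths.flat_map do |path|
    # File.foreach iterates line by line and closes the file afterwards.
    File.foreach(path).select(&matcher)
  end
end

# Demonstration with throwaway files holding made-up records.
Dir.mktmpdir do |dir|
  samples = { 'list1.txt' => "Smith|John|12345\nJones|Ann|54321\n",
              'list2.txt' => "Smyth|Jane|12345\n" }
  paths = samples.map do |name, body|
    path = File.join(dir, name)
    File.write(path, body)
    path
  end

  p select_lines(paths) { |line| line =~ /\ASm[iy]th\|J/ }
  # => ["Smith|John|12345\n", "Smyth|Jane|12345\n"]
end
```

Compared with inject and array concatenation, flat_map avoids building a new intermediate array for every file, though for three lists the difference is unlikely to matter.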
 
Jens Wille

Jens Wille [2008-04-26 22:45]:
maybe i'll be able to come up with something that wraps flori's
Amatch into (Enumerable|File)#agrep.

that was actually pretty easy and is definitely an improvement (see
ruby-nuggets v0.1.9), but it still won't give us support for regular
expression patterns :-(

i also added IO::agrep, so you would now be able to do:

  %w[list1.txt list2.txt list3.txt].inject([]) { |matches, file|
    matches + File.agrep(file, /Smith\|J.*12345/, 2)
  }

-- if only you had regular expressions at your disposal!

cheers
jens
 
Phil Rhoades

jens,


Yes, that would be nice! . . I guess it will be there sometime.

Thanks for looking at this!

Regards,

Phil.
 
