minimal indeces for object access

C

Csaba Gabor

This is sort of cute. If you have an object with all sorts of (text)
properties, sometimes you might not want to access a given property by
specifying the entire property name - maybe you'd only like to specify
part of it.

Some scenarios:
1. document.getElementById ... ugh! too much typing.
2. send keys has all sorts of non-single-letter possibilities, such as
escape (or is that esc?), home, up, page up, return (or is that ret? or
enter?). ... Who can keep track of all the possibilities, not to
mention all the abbreviations?
3. eBay has all sorts of fields you might want to query (if you
happened to have an auction object, with appropriate field properties).
... This is about like scenario 2, only the possibilities are much
less standard, and they are likely to be user entered (ex.
AuctionPrice, TimeLeft).

So the general problem is how to specify a minimal abbreviation and get
access to the property you're after. For example, suppose that you
have an object (like a dispatch table), oBject, with properties:
pageDown, Down, Insert, Return

Let's say that idx is an abbreviation of a string s if the letters of
idx appear, in order, in s (and we'll then write idx <= s). So in the
example, 'i' would be an abbreviation for Insert. Since Insert is the
only word that i abbreviates, it's pretty clear that a function
getIdx(oBject, "i") should return "Insert".

But what about 'w', which abbreviates both pageDown and Down? For
that, we'll say that if idx<=s and s<=t, then even though idx<=t, we
don't accept it as a selecting abbreviation for t. To say this in
words, you need to include non-s letters from the longer word, t. In
our example, this means that w is a selecting abbreviation for Down but
not pageDown. For pageDown, you must start with one of the letters
from 'page', for example 'p' or 'ed', but not 'en' (since the latter
also abbreviates Return per the next paragraph).

There is one other situation, illustrated by "et", which abbreviates
both Insert and Return. However, neither of them is a substring of the
other, so "et" is not a selecting abbreviation. In short a string,
idx, is a selecting abbreviation of s if idx<=s and s<=t for every
other string t where idx<=t.

This lets us code a simple function to access an object's properties by
means of a minimal set of letters, should one desire. In the example
below, "gnbid" will select for document.getElementById in Firefox (this
scenario 1 example does not work for IE, since IE does not expose
functions on the document object. However, this approach is valid for
custom objects/functions in IE). If you supply a name that is not
unique, however, you'll get an alert telling you what the other
possibilities are so you can change your proposed abbreviation.

Some comments about efficiency follow the code.
Csaba Gabor from Vienna


<div id=testId>Hi mom!</div>
<script type='text/javascript'>
function getIdx(oBject, partialIdx) {
// this determines all indeces, idx2, of oBject such that:
// partialIdx <= idx2 AND there's NO idx3 shorter than idx2 where
// partialIdx <= idx3 <= idx2 with '<=' meaning: is a substring of
// if this set has size 1, the singleton element is returned.
// else the comma separated set is returned (with an alert),
var idx2, idx2L, oActive={}, ctr=0;
var idx = partialIdx.toLowerCase();
for (idx2 in oBject)
if ((idx2L=idx2.toLowerCase()).substringP(idx)) {
for (var idx3 in oActive) { // check new candidate idx2
if (idx3.length<idx2.length &&
idx2L.substringP(idx3.toLowerCase())) break;
else if (idx2.length<idx3.length &&
idx3.toLowerCase().substringP(idx2L)) {
delete (oActive[idx3]); --ctr; } }
ctr++;
oActive[idx2] = 1; }
if (!ctr) { alert ("No match found for '" + idx +
"' in getIdx"); return null; }
var sOut = "";
for (idx2 in oActive) sOut += (sOut ? ", " : "") + idx2;
if (ctr>1) { alert ("Match against '" + idx +
"' was not unique, producing\n" + sOut); return null; }
return sOut;
}

String.prototype.substringP = function(sub) { // isSubstring
// returns true iff the chars of sub are found, in order, in string
for (var i=0, p=-1;i<sub.length;++i)
if ((p=this.indexOf(sub.charAt(i),p+1))<0) return false;
return true;
}

function ex(obj, partialIdx) { // along with any args
// executes obj[getIdx(obj, partialIdx)](arg2, arg3, ...)
var idx = getIdx(obj, partialIdx);
if (idx && obj[idx] && // need a better function test
typeof(obj[idx])=="function") {
var aTmp = [];
for (var i=2;i<arguments.length;aTmp.push(arguments[i++]));
return obj[idx].apply(obj, aTmp); }
}

var res=ex (document, 'gnbid', 'testId');
if (res) alert (res.innerHTML); else alert('Not found');
</script>


If you experiment with the above example, you'll find that document has
a boatload of properties. You could specify a more minimal
abbreviation if you filter the allowed properties based on type (such
as function, in our example) just within the first loop of getIdx(...).

For most user input situations and moderately sized objects, the
looping in getIdx should not be too significant. That means that
although worst case behaviour could be quadratic in the number of
properties of the object, typical behaviour should be linear in the
number of properties (ie. each property must be examined once, and
will be a few collisions). In cases where the looping is significant,
you should first be questioning your entire reason for the
abbreviations, and only afterwards consider the below efficiency
improvement.

It is possible to improve significantly on the speed of the lookup, by
pre caching all the possible substring paths. At first blush, this
sounds horrendous - there are zillions of possible substrings for each
property name - the problem seems to be getting worse, not better. For
example, with LongSubstringA vs. LongSubstringB, all possible
substrings would have to be cached because only the final letter is
different between the two. And what to do about ALongSubstring vs.
BLongSubstring?

There is a better way. Instead of cacheing all the possible
substrings, we'll only cache all the possible paths to all the possible
substrings. It's better. Really. Because the number of edges leaving
from any node is no more than the number of letters in a given word.
Here's how it works. Start with oCache = {} and oCache.oWords =
{restOfWord:[originalWord]}. For each letter, x, of each property name
remainder (restOfWord), pN, in oCache.oWords, make oCache[x] =
cacheForTheRestOfpNAfterTheFirstLetterx. Same letter from multiple
property name remainders? No worries, just union the caches. After
all this is done, go through the caches and where .oWords has a single
entry, snip the cache there and have it point to the single element of
the array (or for multiple entries, the array itself, indicating that
the abbreviation does not uniquely resolve).

The lookup is then done as follows: start with oCache and for each
letter, x, of the proposed abbreviation, examine (and replace) oCache
with oCache[x]. If oCache[x] resolves to a string, that is the index
being abbreviated. If it resolves to an array, the proposed
abbreviation does not resolve uniquely (with candidates being elements
of the array). Otherwise, keep going with the smaller cache. Umm, "No
worries, just union" may be just a teensy bit more difficult than the
phrase implies, but the nifty reward is that the lookup is now linear
in the length of the abbreviation.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,819
Latest member
masterdaster

Latest Threads

Top