Question About Method

Michael Lawson · Oct 22, 2003

I'm not quite sure if this the right forum for this question, so if
I'm wrong in my assumption, if you could point me in the right
direction, I'd appreciate it.

I'm in the process of learning C++, coming from a brief spell with C.
I'm completely self-taught, and the best way for me to learn a
language is by coming up with an idea for a project, and working on it
until I understand the methods used to finish it. Not the most ideal
way to learn, I'm sure, but it's the only option I have at the moment.

Currently, I'm working on a program that unscrambles words using a
600k dictionary file. I've made one in C before, but unfortunately, it
was horribly slow.

To speed up the routine I've decided to make an integer array
consisting of the line number where each letter starts in the
alphabetized dictionary file. So in case the scrambled letters were
abc (to make things simple), it would only scan through the a, b, and
c sections of the file. Also, I've been pondering the idea of putting
all of the data in the dictionary file into a linked list so as to make
the data processing quite a bit faster. Unfortunately, this would use
quite a bit of memory and I'm pretty sure it's not the most efficient
way. The reasoning behind my madness is this: scanning through the same
area of a file over and over again is redundant. Accessing the word
directly from a variable is a lot faster than having to grab it from a
file, and *then* allocating it to a variable. But like I said, memory is
an issue. Or is it?

My idea consists of scanning through the whole file when the program is
originally opened, storing the values of the words in a linked list, and
using an array to point to where the various letter groups starts. Is there
an easier/more practical way of doing this? And if so, what? If my method
*is* feasible, what sort of problems would I be likely to encounter?

Thanks much,
Mike

Sam Holden · Oct 22, 2003

To speed up the routine I've decided to make an integer array
consisting of the line number where each letter starts in the
alphabetized dictionary file. So in case the scrambled letters were
abc (to make things simple), it would only scan through the a, b, and
c sections of the file. Also, I've been pondering the idea of putting
all of the data in the dictionary file into a linked list so as to make
the data processing quite a bit faster. Unfortunately, this would use
quite a bit of memory and I'm pretty sure it's not the most efficient
way. The reasoning behind my madness is this: scanning through the same
area of a file over and over again is redundant. Accessing the word
directly from a variable is a lot faster than having to grab it from a
file, and *then* allocating it to a variable. But like I said, memory is
an issue. Or is it?

A better algorithm will speed things up.

There's no need to search through the words, you can preprocess them
once and then do a fast lookup.

Convert each dictionary word into a word with the same letters but in
sorted order. For example, "cat" -> "act" and "aardvark"=>"aaadkrrv".
Store the mappings of "sorted words" to words in a multimap<string,string>
or map<string, vector<string> > or whatever.

Finding the unscrambled word is then as simple as converting the
scrambled word to a "sorted word" and then looking up the words it
could be in the map.

Memory will be gobbled up, but memory isn't usually a problem on
PCs these days. Memory can be optimised by using a hashing function
instead of "sorted words" as the index to an array of words. And file
offsets for the words.

But that's algorithms and not C++.

David White · Oct 22, 2003

Sam Holden said:
A better algorithm will speed things up.

There's no need to search through the words, you can preprocess them
once and then do a fast lookup.

Convert each dictionary word into a word with the same letters but in
sorted order. For example, "cat" -> "act" and "aardvark"=>"aaadkrrv".
Store the mappings of "sorted words" to words in a multimap<string,string>
or map<string, vector<string> > or whatever.

Click to expand...

Yes, I know of a Scrabble algorithm that found the highest-scoring possible
word from a dictionary of 90,000 words in a fraction of a second on a 33MHz
486. It stored words as sorted anagrams also, and then in a tree. Each
letter of the sorted anagram was a node in the tree.

DW

David Rubin · Oct 22, 2003

Sam Holden wrote:

[snip - unscramble words]

Convert each dictionary word into a word with the same letters but in
sorted order. For example, "cat" -> "act" and "aardvark"=>"aaadkrrv".
Store the mappings of "sorted words" to words in a multimap<string,string>

What do you do about anagrams?

or map<string, vector<string> > or whatever.

Using a map makes the question moot, assuming you store (a hash of)
sorted words. But consider "edra" which could unscramble to "read",
"dear", or "dare". I'm not sure you can solve this problem without
knowledge of the context. Perhaps you can use some sort of suffix
chaining algorithm? Clearly, there is no 1-1 mapping.

/david

Sam Holden · Oct 22, 2003

Sam Holden wrote:

[snip - unscramble words]

Convert each dictionary word into a word with the same letters but in
sorted order. For example, "cat" -> "act" and "aardvark"=>"aaadkrrv".
Store the mappings of "sorted words" to words in a multimap<string,string>

Click to expand...

What do you do about anagrams?

It's a multimap so they all get put in with the same key.

Using a map makes the question moot, assuming you store (a hash of)
sorted words. But consider "edra" which could unscramble to "read",
"dear", or "dare". I'm not sure you can solve this problem without
knowledge of the context. Perhaps you can use some sort of suffix
chaining algorithm? Clearly, there is no 1-1 mapping.

If a "scrambled" word just has a random arrangement of letters then clearly
you can't. You get a number of possible answers. But that's in the OPs
problem statement as well.

David Rubin · Oct 24, 2003

Sam said:
Sam Holden wrote:

[snip - unscramble words]

Convert each dictionary word into a word with the same letters but in
sorted order. For example, "cat" -> "act" and "aardvark"=>"aaadkrrv".
Store the mappings of "sorted words" to words in a multimap<string,string>

Click to expand...

What do you do about anagrams?

Click to expand...

It's a multimap so they all get put in with the same key.

That's my point. There could be more than one unscrambled word per
scrambled word.

[...anagrams...] Clearly, there is no 1-1 mapping.

If a "scrambled" word just has a random arrangement of letters then clearly
you can't. You get a number of possible answers. But that's in the OPs
problem statement as well.

This is all I saw:

Currently, I'm working on a program that unscrambles words using a
600k dictionary file.

OP doesn't mention anagrams at all. But it seems like an obvious
concern.

/david

Jerry Coffin · Oct 24, 2003

[ ... ]

That's my point. There could be more than one unscrambled word per
scrambled word.

I would use equal_range to find all the words corresponding to a set of
letters. Except under rather unusual circumstances, the computer
probably can't pick between them, at least without a LOT of extra work.

Question about my projects	3	Jul 23, 2021
Best Method for Microsoft Outlook PST To MBOX Conversion - 2025	6	Dec 9, 2024
Question about multiple metadata files to one file	0	Feb 14, 2022
Best Method to Import PST Files to Gmail Without Outlook	5	Dec 26, 2024
Euler Cromer method	1	Sep 28, 2024
Noob question about mathematical addition vs. "string addition" in C#	1	Mar 6, 2022
Trying to creade method .between()	3	Sep 24, 2023
Question about WEKA, Python and Python-WEKA-Wrapper3	0	Mar 31, 2022

Question About Method

Michael Lawson

Sam Holden

David White

David Rubin

Sam Holden

David Rubin

Jerry Coffin

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads