What are the most frequently used functions?

Xah Lee

I had an idea today.

I wanted to know which functions are used most frequently in the
Emacs Lisp language. I thought I could write a quick script that goes through
all the elisp library locations and gives me the word-frequency report I want.

I started with a simple program:
http://xahlee.org/p/titus/count_word_frequency.py

and applied it to a Shakespeare text. Here's a sample result:
http://xahlee.org/p/titus/word_frequency.html
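
The linked script is not reproduced in this thread. As a rough sketch of what such a word-frequency count might look like in Python (the function name `word_frequency` and the sample sentence are illustrative assumptions, not taken from the linked program):

```python
import re
from collections import Counter

def word_frequency(text):
    """Return a Counter mapping each lowercased word to how often it occurs."""
    # Treat runs of letters and apostrophes as words; everything else separates them.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

sample = "To be, or not to be, that is the question."
freq = word_frequency(sample)

# Print the most frequent words first.
for word, count in freq.most_common(3):
    print(count, word)
```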

Then I wrote a more elaborate one that recurses through directories to
work on the elisp code treasury.

The code is here:
http://xahlee.org/x/count_word_frequency.py

and I got a strange result. The word “the” appeared at the top,
along with many other English words. I quickly realized that these
come from the doc strings of Lisp functions (not from comments).
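
The recursive version is likewise only linked, not shown. A minimal sketch of such a directory walk, assuming elisp sources end in `.el` and using a hypothetical function name `count_words_in_tree`, could look like this. Because it reads every file as plain text, words inside doc strings and comments are counted along with the code, which is how English words such as “the” can end up at the top:

```python
import os
import re
from collections import Counter

# Rough approximation of a Lisp symbol: a letter followed by letters, digits,
# or a few punctuation characters that commonly appear in symbol names.
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9*+/<=>?!-]*")

def count_words_in_tree(root, extension=".el"):
    """Walk ROOT recursively and count tokens in every file ending in EXTENSION."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extension):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the walk going on files with odd encodings.
            with open(path, encoding="utf-8", errors="replace") as f:
                totals.update(TOKEN.findall(f.read()))
    return totals
```

Calling `count_words_in_tree("/path/to/lisp").most_common(20)` would list the twenty most frequent tokens; the path here is a placeholder.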

At this point, it dawned on me that there's no easy way to work around
this unless I write the script in elisp, which has functions that
read Lisp code and can easily filter out doc strings.

Originally, I planned to use the word-frequency script on Perl, Python,
and Java as well as elisp. However, it now seems to me that this
task is nigh impossible. Each of these languages has its own doc string
syntax. It's going to be a heavy undertaking if the word-frequency script
is to work with all of them, since that amounts to writing a parser
for each language.

Alternatively, one can write multiple word-frequency scripts, one in each
language in question, since most languages have facilities to deal with their own
syntax. However, this is still not trivial, and amounts to several
programming efforts.
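
As one illustration of a language dealing with its own syntax, Python's standard `tokenize` module separates names from string literals and comments, so doc strings never enter the count. This is only a sketch for Python source; the function name `identifier_frequency` and the sample are made up for the example:

```python
import io
import tokenize
from collections import Counter

def identifier_frequency(source):
    """Count NAME tokens (identifiers and keywords) in Python source text."""
    counts = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Strings (including doc strings) and comments have other token types.
        if tok.type == tokenize.NAME:
            counts[tok.string] += 1
    return counts

sample = '''
def greet(name):
    """Return the greeting for the given name."""
    # the comment is skipped too
    return "hello " + name
'''
freq = identifier_frequency(sample)
print(freq.most_common())
```

Note that keywords such as `def` and `return` are NAME tokens as well, so a real report would probably filter them out.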

Would anyone be interested in this problem?

PS: bpalmer on #emacs (irc.freenode) wrote an elisp quickie to deal with
Lisp, but that program is currently not fully working... see the bottom of
http://paste.lisp.org/display/28840

Xah
(e-mail address removed)
∑ http://xahlee.org/
 
robert

Xah said:
and I got a strange result. The word “the” appeared at the top,
along with many other English words. I quickly realized that these
come from the doc strings of Lisp functions (not from comments).

It would be interesting to see whether the type-checking "the" in Lisp is still frequent. I doubt it.

Originally, I planned to use the word-frequency script on Perl, Python,
and Java as well as elisp. However, it now seems to me that this
task is nigh impossible. Each of these languages has its own doc string
syntax. It's going to be a heavy undertaking if the word-frequency script
is to work with all of them, since that amounts to writing a parser
for each language.

Alternatively, one can write multiple word-frequency scripts, one in each
language in question, since most languages have facilities to deal with their own
syntax. However, this is still not trivial, and amounts to several
programming efforts.

Editor code (the best is maybe Scintilla/Sc1; check also Emacs itself, ...) has libraries for colorizing comments in all kinds of programming languages...

Would anyone be interested in this problem?

I have a theory that "bad source code" has more if/else/elif/case/switch dispatching statements per number of code words (or lines) than "good code", independent of the language.

If you can count this ratio and correlate it with, say, an sf-ranking and with languages, that would be highly interesting to me... (if you do, drop a pointer in this thread or under a repeated subject)
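
A ratio of this kind could be approximated as below. This is only a sketch for Python source, assuming "code words" means NAME tokens as seen by the standard `tokenize` module; the name `branch_ratio` and the keyword set are choices made for the example, and other languages would need their own tokenizer and keyword list:

```python
import io
import tokenize

# Python's hard branching keywords; match/case are left out because they are
# soft keywords and can also be ordinary identifiers.
BRANCH_KEYWORDS = {"if", "elif", "else"}

def branch_ratio(source):
    """Return branching keywords divided by all NAME tokens in Python source."""
    branch = total = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            total += 1
            if tok.string in BRANCH_KEYWORDS:
                branch += 1
    return branch / total if total else 0.0

sample = "if x:\n    y = 1\nelse:\n    y = 2\n"
ratio = branch_ratio(sample)
print(round(ratio, 3))
```

Conditional expressions (`a if c else b`) are counted too, which may or may not be what the theory intends.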



-robert
 
Jürgen Exner

Oh, really? You should mark your calendar and celebrate the day annually!!!

And the relationship with Perl, Python, Java is exactly what?

jue
 
robert

Jürgen Exner said:
Oh, really? You should mark your calendar and celebrate the day annually!!!


And the relationship with Perl, Python, Java is exactly what?

Read more of the context and reply to the OP.
 
Barry Margolin

Xah Lee said:
At this point, it dawned on me that there's no easy way to work around
this unless I write the script in elisp, which has functions that
read Lisp code and can easily filter out doc strings.

For Lisp, just look for symbols that are immediately preceded by ( or
#'. The tokens after ( are not always functions, since this is also
used for constructing literal lists and for subforms of special
operators (e.g. the variable names in LET bindings) but I think the ones
that aren't functions will have low enough frequency that they won't
impact the results.
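
The heuristic described above can be sketched in Python as follows. The regular expression is a simplification of Lisp symbol syntax, and the name `head_symbol_frequency` and the sample code are assumptions made for the example. It does not skip strings or comments, so parenthesized text inside a doc string would still be counted:

```python
import re
from collections import Counter

# A symbol that directly follows "(" or "#'". Symbol characters are taken to be
# anything except whitespace, parentheses, quotes, and a few other delimiters.
HEAD_SYMBOL = re.compile(r"""(?:\(|#')([^\s()'"`,;#]+)""")

def head_symbol_frequency(source):
    """Count symbols that appear immediately after '(' or #' in Lisp source."""
    return Counter(HEAD_SYMBOL.findall(source))

sample = """
(defun my-second (lst)
  (car (cdr lst)))
(mapcar #'my-second my-lists)
(car my-lists)
"""
freq = head_symbol_frequency(sample)
for symbol, count in freq.most_common():
    print(count, symbol)
```

In this sample `lst` is counted once because it opens the argument list, which is the kind of low-frequency false positive mentioned above.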

Perl would be harder, I think. For ordinary function calls you can look
for a word followed by (, but built-in functions allow use without
parentheses around the parameters.
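
To illustrate the Perl case, here is the "word followed by (" heuristic written as a Python regular expression and applied to a small Perl snippet held in a string. The pattern, the exclusion list, and the name `paren_call_frequency` are assumptions for this sketch, not a Perl parser:

```python
import re
from collections import Counter

# A bare word (not preceded by a sigil or another word character) followed by "(".
CALL_WITH_PARENS = re.compile(r"(?<![$@%&\w])([A-Za-z_]\w*)\s*\(")

# Words that can precede "(" without being function calls (not exhaustive).
NOT_FUNCTIONS = {"if", "elsif", "unless", "while", "until", "for", "foreach",
                 "my", "our", "local"}

def paren_call_frequency(source):
    """Count words directly followed by '(' in Perl-like source text."""
    names = CALL_WITH_PARENS.findall(source)
    return Counter(n for n in names if n not in NOT_FUNCTIONS)

sample = r'''
my @parts = split(/,/, $line);
print join("-", @parts);
if (length($line) > 0) { print "ok\n"; }
'''
freq = paren_call_frequency(sample)
print(sorted(freq.items()))
```

Both uses of `print` are missed here because they are written without parentheses, which is exactly the difficulty described above.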
 
Xah Lee

Barry Margolin wrote:
« For Lisp, just look for symbols that are immediately preceded by (
....»

Thanks a lot! Great thought.

I've done accordingly, and it counts satisfactorily.
http://xahlee.org/emacs/function-frequency.html

I will take a break and think about Perl, Python, and Java later... For
Python and Java, I think the report will also have to count method
calls, since that is what these languages deal with... slightly more
complex than just functional languages...
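
For the Python side, one possible approach is the standard `ast` module, which exposes both plain function calls and method calls as `ast.Call` nodes. The sketch below is an assumption about how such a report could be built, not the method used for the elisp report; the name `call_frequency` and the leading "." used to mark method names are conventions invented for the example:

```python
import ast
from collections import Counter

def call_frequency(source):
    """Count function calls and method calls in Python source text."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                counts[func.id] += 1          # plain call: f(...)
            elif isinstance(func, ast.Attribute):
                counts["." + func.attr] += 1  # method or attribute call: x.f(...)
    return counts

sample = '''
names = [n.strip() for n in raw.split(",")]
print(len(names), ", ".join(names).upper())
print(len(raw))
'''
freq = call_frequency(sample)
for name, count in freq.most_common():
    print(count, name)
```

The source is only parsed, never executed, so the sample does not need `raw` to be defined.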

Xah
(e-mail address removed)
∑ http://xahlee.org/
 
