NLTK and package structure

S

Steven Bird

The Natural Language Toolkit (NLTK) is a suite of open source Python
packages for natural language processing, available at
http://nltk.org/, together with an O'Reilly book which is available
online for free. Development is now hosted at http://github.com/nltk
-- get it here: (e-mail address removed):nltk/nltk.git

I am seeking advice on how to speed up our import process. The
contents of several sub-packages are made available at the top level,
for the convenience of programmers and so that the examples published
in the book are more concise. This has been done by having lots of
"from subpackage import *" in the top-level __init__.py. Some of
these imports are lazy. Unfortunately, any import of nltk leads to
cascading imports which pull in most of the library, unacceptably
slowing down the load time.

https://github.com/nltk/nltk/blob/master/nltk/__init__.py

I am looking for a solution that meets the following requirements:
1) import nltk is as fast as possible
2) published code examples are not broken
(or are easily fixed by calling nltk.load_subpackages() before the
rest of the code)
3) popular subpackage names are available at the top level
(e.g. nltk.probability.ConditionalFreqDist as nltk.ConditionalFreqDist)

The existing discussion of this issue amongst our developers is posted here:
http://code.google.com/p/nltk/issues/detail?id=378

Our practice in structuring subpackages is described here:
http://code.google.com/p/nltk/wiki/PackageStructure

Thanks for any advice.

-Steven Bird (NLTK Project coordinator)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,969
Messages
2,570,161
Members
46,705
Latest member
Stefkari24

Latest Threads

Top