The method of insert doesn't work with nltk texts: AttributeError: 'ConcatenatedCorpusView' object

Token Type

I wrote code to add 'like' after every third word in an NLTK text, as follows:
def hedge(text):
    for i in range(3, len(text), 4):
        new_text = text.insert(i, 'like')
    return new_text[:50]

Traceback (most recent call last):
  File "<pyshell#77>", line 1, in <module>
    hedge(text)
  File "<pyshell#76>", line 3, in hedge
    new_text = text.insert(i, 'like')
AttributeError: 'ConcatenatedCorpusView' object has no attribute 'insert'

Isn't the text from the Brown corpus above a list? Why doesn't it have an 'insert' attribute?

Thanks much for your hints.
 
Dave Angel

Token Type said:
I wrote code to add 'like' after every third word in an NLTK text, as follows:
def hedge(text):
    for i in range(3, len(text), 4):
        new_text = text.insert(i, 'like')
    return new_text[:50]

Traceback (most recent call last):
  File "<pyshell#77>", line 1, in <module>
    hedge(text)
  File "<pyshell#76>", line 3, in hedge
    new_text = text.insert(i, 'like')
AttributeError: 'ConcatenatedCorpusView' object has no attribute 'insert'

Isn't the text from the Brown corpus above a list? Why doesn't it have an 'insert' attribute?

I tried to find online documentation for nltk, and although I found a
mention of a free online book, I couldn't find the book itself. So, some
generic comments.

The error message is telling you that the object 'text' is not a list,
but a "ConcatenatedCorpusView". Perhaps you can look that up in your
docs for nltk. But there's quite a bit you can do just with the
interpreter.

Try print(type(text)) to see the type of text.

Try dir(text) to see what attributes it has.

Try help(text) to see what docstrings might be built in.

Incidentally, if you really think it's a list of words (or that it acts
like a list), then 'text' might not be the best name for it. Any reason
you didn't just call it words?
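To illustrate the three checks above without needing NLTK installed, here is a minimal sketch that uses a plain tuple as a stand-in (an assumption for illustration; it is not the NLTK object). Like the corpus view, a tuple prints much like a list and can be indexed and sliced, yet it has no insert() method.

```python
# Stand-in object: a tuple, NOT an NLTK corpus view (assumption for illustration).
words = ("The", "Fulton", "County", "Grand")

# 1) What is it really?
print(type(words))

# 2) Is it a list, and does it offer insert()?
print(isinstance(words, list))
print(hasattr(words, "insert"))

# 3) Which public methods does it provide?
public = [name for name in dir(words) if not name.startswith("_")]
print(public)
```

The same three checks, run against the object returned by the corpus reader, should show which methods it really offers.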
 
Dave Angel

Token Type said:
First, thanks very much for your kind help.

1) Furthermore, I tested insert() on a plain list, and it worked:
>>> text = ['The', 'Fulton', 'County', 'Grand']
>>> text.insert(3, 'like')
>>> text
['The', 'Fulton', 'County', 'like', 'Grand']
2) I also tested the text from NLTK. It actually is a list; see the following:
>>> import nltk
>>> text = nltk.corpus.brown.words(categories='news')
>>> text[:10]
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday', 'an',
'investigation', 'of']

How come Python tells me that it is not a list, raising "AttributeError:
'ConcatenatedCorpusView' object has no attribute 'insert'"? I am confused.

Since we suspect that text is not a list, I added one more line of code,
as follows. Now it seems to work:
def hedge(text):
    text = list(text)
    for i in range(3, len(text), 4):
        text.insert(i, 'like')
    return text[:50]
['The', 'Fulton', 'County', 'like', 'Grand', 'Jury', 'said', 'like',
'Friday', 'an', 'investigation', 'like', 'of', "Atlanta's", 'recent',
'like', 'primary', 'election', 'produced', 'like', '``', 'no', 'evidence',
'like', "''", 'that', 'any', 'like', 'irregularities', 'took', 'place',
'like', '.', 'The', 'jury', 'like', 'further', 'said', 'in', 'like',
'term-end', 'presentments', 'that', 'like', 'the', 'City', 'Executive',
'like', 'Committee', ',']

Isn't it odd?

Without reading the documentation, or at least the help(), I can't say
whether it's odd. If a class wants to support slicing semantics, all
it has to do is implement special methods like __getitem__ (or, in older
Python 2 code, __getslice__ and __setslice__). If it doesn't document
.insert(), then you shouldn't try to call it. Duck-typing.

What did you get when you tried type(), dir() and help()? Did they help?
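A small sketch of that point: the class below, WordView, is hypothetical and is not how NLTK implements its corpus views. It only shows that an object can support indexing and slicing through __getitem__ while offering no insert() at all, and that list() still gives a real, modifiable list.

```python
class WordView:
    """Hypothetical read-only, list-like view over some words."""

    def __init__(self, words):
        self._words = tuple(words)

    def __len__(self):
        return len(self._words)

    def __getitem__(self, index):
        # Handles both view[2] and view[:2]; a slice arrives as a slice object.
        return self._words[index]

    def __repr__(self):
        # Prints like a list, even though it is not one.
        return repr(list(self._words))


view = WordView(["The", "Fulton", "County", "Grand"])
print(view)                     # looks like a list when printed
print(view[:2])                 # slicing works
print(hasattr(view, "insert"))  # but insert() was never defined

words = list(view)              # copy into a real list
words.insert(3, "like")
print(words)
```

Printing and slicing are therefore not evidence that an object is a list; only type() or isinstance() can settle that.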
 
Peter Otten

Token said:
I wrote code to add 'like' after every third word in an NLTK text, as follows:
def hedge(text):
    for i in range(3, len(text), 4):
        new_text = text.insert(i, 'like')
    return new_text[:50]

Traceback (most recent call last):
  File "<pyshell#77>", line 1, in <module>
    hedge(text)
  File "<pyshell#76>", line 3, in hedge
    new_text = text.insert(i, 'like')
AttributeError: 'ConcatenatedCorpusView' object has no attribute 'insert'

Isn't the text from the Brown corpus above a list? Why doesn't it have an 'insert' attribute?

Thanks much for your hints.

The error message shows that text is not a list. It looks like a list,
>>> text
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]

but it is actually a nltk.corpus.reader.util.ConcatenatedCorpusView:
>>> type(text)
<class 'nltk.corpus.reader.util.ConcatenatedCorpusView'>

The implementer of a class is free to decide what methods he wants to
implement. You can get a first impression of the available ones with dir():
>>> dir(text)
['_MAX_REPR_SIZE', '__add__', '__class__', '__cmp__', '__contains__',
'__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__',
'__getitem__', '__hash__', '__init__', '__iter__', '__len__', '__module__',
'__mul__', '__new__', '__radd__', '__reduce__', '__reduce_ex__', '__repr__',
'__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'__weakref__', '_offsets', '_open_piece', '_pieces', 'close', 'count',
'index', 'iterate_from']

As you can see, insert() is not among these methods. However, __iter__() is a
hint that you can convert the ConcatenatedCorpusView to a list, and that
does provide an insert() method. Let's try:
>>> text = list(text)
>>> type(text)
<type 'list'>
>>> text.insert(0, "yadda")
>>> text[:5]
['yadda', 'The', 'Fulton', 'County', 'Grand']

Note that your hedge() function may still not work as you expect:
>>> text = ["-"] * 20
>>> text
['-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-',
'-', '-', '-', '-', '-']
>>> for i in range(0, len(text), 3):
...     text.insert(i, "X")
...
>>> text
['X', '-', '-', 'X', '-', '-', 'X', '-', '-', 'X', '-', '-', 'X', '-', '-',
'X', '-', '-', 'X', '-', '-', '-', '-', '-', '-', '-', '-']

That is because the list is growing with every insert() call. One workaround
is to start inserting items at the end of the list:
>>> text = ["-"] * 20
>>> for i in reversed(range(0, len(text), 3)):
...     text.insert(i, "X")
...
>>> text
['X', '-', '-', '-', 'X', '-', '-', '-', 'X', '-', '-', '-', 'X', '-', '-',
'-', 'X', '-', '-', '-', 'X', '-', '-', '-', 'X', '-', '-']
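Another option, sketched here as a suggestion rather than something from the posts above, is to avoid insert() entirely and build a new list in one pass. That sidesteps the shifting indexes, and it also sidesteps the fact that list.insert() modifies the list in place and returns None, so assigning its result to new_text would not give you a list. The parameter names every and filler are made up for this example.

```python
def hedge(words, every=3, filler="like"):
    """Return a new list with `filler` added after every `every` words."""
    result = []
    for count, word in enumerate(words, start=1):
        result.append(word)
        if count % every == 0:
            result.append(filler)
    return result


sample = ["The", "Fulton", "County", "Grand", "Jury", "said", "Friday"]
print(hedge(sample))
# ['The', 'Fulton', 'County', 'like', 'Grand', 'Jury', 'said', 'like', 'Friday']
```

Because the function only iterates over its argument, it should accept anything iterable, so no list() conversion is needed first. This has not been tested against the NLTK corpus view itself.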
 
