The method of insert doesn't work with nltk texts: AttributeError: 'ConcatenatedCorpusView' object has no attribute 'insert'

 
 
Token Type
 
      09-02-2012
I wrote code to add 'like' after every third word in an NLTK text, as follows:

>>> import nltk
>>> text = nltk.corpus.brown.words(categories='news')
>>> def hedge(text):
...     for i in range(3, len(text), 4):
...         new_text = text.insert(i, 'like')
...     return new_text[:50]
...
>>> hedge(text)


Traceback (most recent call last):
  File "<pyshell#77>", line 1, in <module>
    hedge(text)
  File "<pyshell#76>", line 3, in hedge
    new_text = text.insert(i, 'like')
AttributeError: 'ConcatenatedCorpusView' object has no attribute 'insert'

Isn't the text from the Brown corpus above a list? Why doesn't it have the attribute 'insert'?

Thanks much for your hints.
 
 
 
 
 
Dave Angel
 
      09-02-2012
On 09/02/2012 05:39 AM, Token Type wrote:
> I wrote code to add 'like' after every third word in an NLTK text, as follows:
>
> >>> text = nltk.corpus.brown.words(categories='news')
> >>> def hedge(text):
> ...     for i in range(3, len(text), 4):
> ...         new_text = text.insert(i, 'like')
> ...     return new_text[:50]
> ...
> >>> hedge(text)
>
> Traceback (most recent call last):
>   File "<pyshell#77>", line 1, in <module>
>     hedge(text)
>   File "<pyshell#76>", line 3, in hedge
>     new_text = text.insert(i, 'like')
> AttributeError: 'ConcatenatedCorpusView' object has no attribute 'insert'
>
> Isn't the text from the Brown corpus above a list? Why doesn't it have the attribute 'insert'?
>

I tried to find online documentation for NLTK, and although I found a
mention of a free online book, I couldn't locate the book itself. So here
are some generic comments.

The error message is telling you that the object 'text' is not a list,
but a "ConcatenatedCorpusView". Perhaps you can look that up in your
docs for nltk. But there's quite a bit you can do just with the
interpreter.

Try print(type(text)) to see the type of text.

Try dir(text) to see what attributes it has.

Try help(text) to see what docstrings might be built in.
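
As a minimal sketch of that kind of inspection, the snippet below applies type(), dir() and hasattr() to a list and to a tuple. The tuple is only a stand-in (an assumption for illustration) for any object that looks like a list but is not one, and the helper name describe is hypothetical.

```python
def describe(obj):
    """Return the type name and the public attribute names of obj."""
    public = [name for name in dir(obj) if not name.startswith("_")]
    return type(obj).__name__, public


words_list = ["The", "Fulton", "County", "Grand"]
words_tuple = ("The", "Fulton", "County", "Grand")  # list-like stand-in, not a list

list_type, list_attrs = describe(words_list)
tuple_type, tuple_attrs = describe(words_tuple)

# A list offers insert(); the tuple does not, even though both print similarly.
print(list_type, "insert" in list_attrs)
print(tuple_type, "insert" in tuple_attrs)
print(hasattr(words_tuple, "insert"))
```

The same three calls work on any object in the interpreter, including the one returned by nltk.corpus.brown.words().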

Incidentally, if you really think it's a list of words (or that it acts
like a list), then 'text' might not be the best name for it. Any reason
you didn't just call it words?

--

DaveA


 
 
 
 
 
Dave Angel
 
      09-02-2012
On 09/02/2012 09:06 AM, John H. Li wrote:
> First, thanks very much for your kind help.
>
> 1) Furthermore, I tested insert() on a plain list. It worked, as follows:
>
> >>> text = ['The', 'Fulton', 'County', 'Grand']
> >>> text.insert(3, 'like')
> >>> text
> ['The', 'Fulton', 'County', 'like', 'Grand']
>
> 2) I tested the text from NLTK. It is actually a list. See the following:
>
> >>> text = nltk.corpus.brown.words(categories='news')
> >>> text[:10]
> ['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday', 'an',
> 'investigation', 'of']
>
> How come Python tells me that it is not a list by raising "AttributeError:
> 'ConcatenatedCorpusView' object has no attribute 'insert'"? I am confused.
>
> Since we doubt that text is a list, I had to add one more line of code,
> as follows. Then it seems to work.
>
> >>> text = nltk.corpus.brown.words(categories='news')
> >>> def hedge(text):
> ...     text = list(text)
> ...     for i in range(3, len(text), 4):
> ...         text.insert(i, 'like')
> ...     return text[:50]
> ...
> >>> hedge(text)
> ['The', 'Fulton', 'County', 'like', 'Grand', 'Jury', 'said', 'like',
> 'Friday', 'an', 'investigation', 'like', 'of', "Atlanta's", 'recent',
> 'like', 'primary', 'election', 'produced', 'like', '``', 'no', 'evidence',
> 'like', "''", 'that', 'any', 'like', 'irregularities', 'took', 'place',
> 'like', '.', 'The', 'jury', 'like', 'further', 'said', 'in', 'like',
> 'term-end', 'presentments', 'that', 'like', 'the', 'City', 'Executive',
> 'like', 'Committee', ',']
>
> Isn't it odd?
>


Without reading the documentation, or at least the help(), I don't see
anything odd about it. If a class wants to support slicing semantics, all
it has to do is implement special methods such as __getitem__ and
__setitem__ (or, in older Python 2 code, __getslice__ and __setslice__).
If it doesn't document .insert(), then you shouldn't try to call it.
That's duck typing.
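
To illustrate the point, here is a minimal sketch of a read-only sequence that supports len(), indexing and slicing through __getitem__ yet has no insert(). The class name ReadOnlyWords is hypothetical, and this is not how NLTK implements its corpus views.

```python
class ReadOnlyWords:
    """Hypothetical list-like wrapper: indexable and sliceable, but not mutable."""

    def __init__(self, words):
        self._words = tuple(words)

    def __len__(self):
        return len(self._words)

    def __getitem__(self, key):
        # key is an int for words[2] and a slice object for words[:2]
        result = self._words[key]
        return list(result) if isinstance(key, slice) else result


words = ReadOnlyWords(["The", "Fulton", "County", "Grand"])
print(words[:2])                  # slicing works and prints like a list
print(hasattr(words, "insert"))   # but there is no insert() method

try:
    words.insert(3, "like")
except AttributeError as exc:
    print("AttributeError:", exc)
```

Slicing and insertion are separate capabilities, so an object can offer one without the other.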

What did you get when you tried type(), dir() and help()? Did they help?

--

DaveA

 
 
Peter Otten
 
      09-02-2012
Token Type wrote:

> I wrote code to add 'like' after every third word in an NLTK text, as follows:
>
> >>> text = nltk.corpus.brown.words(categories='news')
> >>> def hedge(text):
> ...     for i in range(3, len(text), 4):
> ...         new_text = text.insert(i, 'like')
> ...     return new_text[:50]
> ...
> >>> hedge(text)
>
> Traceback (most recent call last):
>   File "<pyshell#77>", line 1, in <module>
>     hedge(text)
>   File "<pyshell#76>", line 3, in hedge
>     new_text = text.insert(i, 'like')
> AttributeError: 'ConcatenatedCorpusView' object has no attribute 'insert'
>
> Isn't the text from the Brown corpus above a list? Why doesn't it have the attribute 'insert'?
>
> Thanks much for your hints.


The error message shows that text is not a list. It looks like a list,

>>> text = nltk.corpus.brown.words(categories="news")
>>> text
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]

but it is actually an nltk.corpus.reader.util.ConcatenatedCorpusView:

>>> type(text)
<class 'nltk.corpus.reader.util.ConcatenatedCorpusView'>

The implementer of a class is free to decide which methods to implement.
You can get a first impression of the available ones with dir():

>>> dir(text)
['_MAX_REPR_SIZE', '__add__', '__class__', '__cmp__', '__contains__',
'__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__',
'__getitem__', '__hash__', '__init__', '__iter__', '__len__', '__module__',
'__mul__', '__new__', '__radd__', '__reduce__', '__reduce_ex__', '__repr__',
'__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'__weakref__', '_offsets', '_open_piece', '_pieces', 'close', 'count',
'index', 'iterate_from']

As you can see, insert() is not among these methods. However, __iter__() is a
hint that you can convert the ConcatenatedCorpusView to a list, and a list
does provide an insert() method. Let's try:

>>> text = list(text)
>>> type(text)
<type 'list'>
>>> text.insert(0, "yadda")
>>> text[:5]
['yadda', 'The', 'Fulton', 'County', 'Grand']
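
The same conversion can be sketched without NLTK installed. The WordView class below is a hypothetical stand-in that implements only __iter__; list() consumes it and returns an ordinary, independent list, so inserting into the copy leaves the original view untouched.

```python
class WordView:
    """Hypothetical read-only view that only knows how to iterate."""

    def __init__(self, words):
        self._words = tuple(words)

    def __iter__(self):
        return iter(self._words)


view = WordView(["The", "Fulton", "County", "Grand"])

words = list(view)          # list() accepts any iterable
words.insert(0, "yadda")    # now a real list, so insert() exists

print(words)
print(list(view))           # the view itself is unchanged
```

Keep in mind that list() materializes every item in memory, which may matter for a large corpus.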

Note that your hedge() function may still not work as you expect:

>>> text = ["-"] * 20
>>> text
['-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-',
'-', '-', '-', '-', '-']
>>> for i in range(0, len(text), 3):
...     text.insert(i, "X")
...
>>> text
['X', '-', '-', 'X', '-', '-', 'X', '-', '-', 'X', '-', '-', 'X', '-', '-',
'X', '-', '-', 'X', '-', '-', '-', '-', '-', '-', '-', '-']

That is because the list is growing with every insert() call. One workaround
is to start inserting items at the end of the list:

>>> text = ["-"] * 20
>>> for i in reversed(range(0, len(text), 3)):
...     text.insert(i, "X")
...
>>> text
['X', '-', '-', '-', 'X', '-', '-', '-', 'X', '-', '-', '-', 'X', '-', '-',
'-', 'X', '-', '-', '-', 'X', '-', '-', '-', 'X', '-', '-']
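
Another option, sketched here as an assumption rather than as anything taken from NLTK, is to avoid insert() altogether and build a new list in a single pass. The function name hedge comes from the original question; the parameters every and filler are hypothetical.

```python
def hedge(words, every=3, filler="like"):
    """Return a new list with filler added after each group of `every` words."""
    result = []
    for position, word in enumerate(words, start=1):
        result.append(word)
        # After every `every`-th original word, append the filler.
        if position % every == 0:
            result.append(filler)
    return result


sample = ["The", "Fulton", "County", "Grand", "Jury", "said", "Friday"]
print(hedge(sample))
```

Because the function only iterates over its argument, it should accept any iterable without a prior list() call, and the positions never shift because the input is not modified.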


 