Python3.0 has more duplication in source code than Python2.5

 
 
Terry
      02-07-2009
I used a CPD (copy/paste detector) in PMD to analyze the code
duplication in Python source code. I found that Python3.0 contains
more duplicated code than the previous versions. The CPD tool is far
from perfect, but I still feel the analysis makes some sense.

Source Code      |  NLOC | Dup60 | Dup30 | Rate60 | Rate30
-----------------+-------+-------+-------+--------+-------
Python1.5(Core)  | 19418 |  1072 |  3023 |     6% |    16%
Python2.5(Core)  | 35797 |  1656 |  6441 |     5% |    18%
Python3.0(Core)  | 40737 |  3460 |  9076 |     8% |    22%
Apache(server)   | 18693 |  1114 |  2553 |     6% |    14%

NLOC: net lines of code
Dup60: lines of code containing a run of at least 60 consecutive tokens
that is duplicated elsewhere in the code (counted twice or more, once
per copy)
Dup30: the same, with a threshold of 30 tokens
Rate60: Dup60/NLOC
Rate30: Dup30/NLOC
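
For anyone who wants to re-check the two rate columns, here is a minimal sketch that recomputes them from the NLOC, Dup60 and Dup30 figures in the table. The names MEASUREMENTS and duplication_rates are made up for this illustration and are not part of CPD.

```python
# NLOC, Dup60 and Dup30 copied from the table above.
MEASUREMENTS = {
    "Python1.5(Core)": (19418, 1072, 3023),
    "Python2.5(Core)": (35797, 1656, 6441),
    "Python3.0(Core)": (40737, 3460, 9076),
    "Apache(server)": (18693, 1114, 2553),
}


def duplication_rates(nloc, dup60, dup30):
    """Return (Rate60, Rate30) as percentages of NLOC."""
    return 100.0 * dup60 / nloc, 100.0 * dup30 / nloc


for name, (nloc, dup60, dup30) in MEASUREMENTS.items():
    rate60, rate30 = duplication_rates(nloc, dup60, dup30)
    print(f"{name:16} Rate60 = {rate60:4.1f}%  Rate30 = {rate30:4.1f}%")
```

Rounded to whole percentages, the results agree with the table.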

We can see that the duplication rate tends to be stable across these
code bases, but Python3.0's is somewhat higher. Considering the small
increase in NLOC, the duplication rate of Python3.0 might be too high.

Does that say something about the code quality of Python3.0?
 
Martin v. Löwis
      02-07-2009
> Does that say something about the code quality of Python3.0?

Not necessarily. IIUC, copying a single file with 2000 lines
completely could already account for that increase.
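
A rough back-of-the-envelope check of this estimate, using only the NLOC and Dup60 figures from the table in the first post. It assumes (the thread does not say so) that when a file is copied whole, both the original and the copy are counted as duplicated lines. The function name is made up for the illustration.

```python
# NLOC and Dup60 for Python 2.5 and 3.0, copied from the table in the first post.
NLOC_25, DUP60_25 = 35797, 1656
NLOC_30, DUP60_30 = 40737, 3460


def excess_duplicated_lines(nloc_old, dup_old, nloc_new, dup_new):
    """Duplicated lines in the new release beyond what the old rate predicts."""
    expected = dup_old / nloc_old * nloc_new
    return dup_new - expected


excess = excess_duplicated_lines(NLOC_25, DUP60_25, NLOC_30, DUP60_30)
# Assumption: a whole-file copy marks both the original and the copy as
# duplicated, so each copied line contributes two duplicated lines.
copied_lines = excess / 2
print(f"excess Dup60 lines: {excess:.0f}")
print(f"equivalent size of one copied file: {copied_lines:.0f} lines")
```

Under these assumptions the excess is on the order of 1,600 duplicated lines, so a single copied file of somewhere between 800 and 1,600 lines would be enough to explain it. That is the same order of magnitude as the 2000 lines mentioned above.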

It would be interesting to see what specific files have gained
large numbers of additional files, compared to 2.5.

Regards,
Martin
 
Terry
      02-07-2009
On Feb 7, 3:36 pm, "Martin v. Löwis" wrote:
> > Does that say something about the code quality of Python3.0?
>
> Not necessarily. IIUC, copying a single file with 2000 lines
> completely could already account for that increase.
>
> It would be interesting to see what specific files have gained
> large numbers of additional files, compared to 2.5.
>
> Regards,
> Martin


But the duplications are never very big: they range from about 100
lines (rare) down to fewer than 5 lines. As you can see, Rate30 is much
bigger than Rate60, which means there are a lot of small duplications.
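
To put a number on "a lot of small duplications", here is a small sketch. It assumes that every line counted in Dup60 is also counted in Dup30 (a 60-token duplicate always contains a 30-token one), so Dup30 - Dup60 is the number of lines that only show up in clones of 30 to 59 tokens. The function name is made up for the illustration.

```python
def short_clone_share(dup60, dup30):
    """Fraction of Dup30 lines that are not part of any 60-token clone."""
    return (dup30 - dup60) / dup30


# Dup60 and Dup30 copied from the table in the first post.
for name, dup60, dup30 in [
    ("Python2.5(Core)", 1656, 6441),
    ("Python3.0(Core)", 3460, 9076),
]:
    print(f"{name}: {short_clone_share(dup60, dup30):.0%} of Dup30 lines "
          "are in short clones only")
```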
 
Diez B. Roggisch
      02-07-2009
Terry wrote:
> On Feb 7, 3:36 pm, "Martin v. Löwis" wrote:
>>> Does that say something about the code quality of Python3.0?
>>
>> Not necessarily. IIUC, copying a single file with 2000 lines
>> completely could already account for that increase.
>>
>> It would be interesting to see what specific files have gained
>> large numbers of additional files, compared to 2.5.
>>
>> Regards,
>> Martin
>
> But the duplications are never very big: they range from about 100
> lines (rare) down to fewer than 5 lines. As you can see, Rate30 is much
> bigger than Rate60, which means there are a lot of small duplications.


Do you by any chance have a few examples of these? There is a lot of
idiomatic code in python to e.g. acquire and release the GIL or doing
refcount-stuff. If that happens to be done with rather generic names as
arguments, I can well imagine that as being the cause.
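
To illustrate how idiomatic code can look duplicated to a token-based detector, here is a minimal sketch of a sliding-window clone finder. It is not CPD's actual algorithm, and whether CPD's C tokenizer ignores identifier names is not stated in this thread; the tokenizer, the keyword list and the two snippets below are simplified assumptions made up for the example.

```python
import re
from collections import defaultdict

# Deliberately tiny keyword list; a real C tokenizer would know many more.
C_KEYWORDS = {"if", "else", "goto", "return", "for", "while", "NULL"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source, normalize_identifiers=False):
    """Split C-like source into tokens, optionally replacing names by 'ID'."""
    tokens = TOKEN_RE.findall(source)
    if normalize_identifiers:
        tokens = [
            "ID" if re.match(r"[A-Za-z_]", t) and t not in C_KEYWORDS else t
            for t in tokens
        ]
    return tokens


def duplicated_windows(tokens, window):
    """Map each token window that occurs more than once to its start offsets."""
    seen = defaultdict(list)
    for start in range(len(tokens) - window + 1):
        seen[tuple(tokens[start:start + window])].append(start)
    return {w: starts for w, starts in seen.items() if len(starts) > 1}


# Two hypothetical error-handling idioms that differ only in their names.
SNIPPET_A = "if (res != 0) goto failed; Py_XDECREF(tmp); tmp = NULL;"
SNIPPET_B = "if (err != 0) goto error; Py_XDECREF(obj); obj = NULL;"
CODE = SNIPPET_A + " " + SNIPPET_B

exact = duplicated_windows(tokenize(CODE), window=6)
normalized = duplicated_windows(tokenize(CODE, normalize_identifiers=True), window=6)
print("exact matches:", len(exact))
print("matches with normalized names:", len(normalized))
```

With exact token comparison the two snippets share no 6-token run, but once names are normalized the whole idiom matches. If the names already happen to be the same generic ones (tmp, value, result), even an exact comparison will report the idiom as a duplicate.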

Diez
 
Terry
      02-07-2009
On Feb 7, 7:10 pm, "Diez B. Roggisch" wrote:
> Do you by any chance have a few examples of these? There is a lot of
> idiomatic code in python to e.g. acquire and release the GIL or doing
> refcount-stuff. If that happens to be done with rather generic names as
> arguments, I can well imagine that as being the cause.


Example 1:
Found a 64 line (153 tokens) duplication in the following files:
Starting at line 73 of D:\DOWNLOADS\Python-3.0\Python\thread_pth.h
Starting at line 222 of D:\DOWNLOADS\Python-3.0\Python\thread_pthread.h

    return (long) threadid;
#else
    return (long) *(long *) &threadid;
#endif
}

static void
do_PyThread_exit_thread(int no_cleanup)
{
    dprintf(("PyThread_exit_thread called\n"));
    if (!initialized) {
        if (no_cleanup)
            _exit(0);
        else
            exit(0);
    }
}

void
PyThread_exit_thread(void)
{
    do_PyThread_exit_thread(0);
}

void
PyThread__exit_thread(void)
{
    do_PyThread_exit_thread(1);
}

#ifndef NO_EXIT_PROG
static void
do_PyThread_exit_prog(int status, int no_cleanup)
{
    dprintf(("PyThread_exit_prog(%d) called\n", status));
    if (!initialized)
        if (no_cleanup)
            _exit(status);
        else
            exit(status);
}

void
PyThread_exit_prog(int status)
{
    do_PyThread_exit_prog(status, 0);
}

void
PyThread__exit_prog(int status)
{
    do_PyThread_exit_prog(status, 1);
}
#endif /* NO_EXIT_PROG */

#ifdef USE_SEMAPHORES

/*
 * Lock support.
 */

PyThread_type_lock
PyThread_allocate_lock(void)
{

 
Terry
      02-07-2009
On Feb 7, 7:10 pm, "Diez B. Roggisch" wrote:
> Do you by any chance have a few examples of these? There is a lot of
> idiomatic code in python to e.g. acquire and release the GIL or doing
> refcount-stuff. If that happens to be done with rather generic names as
> arguments, I can well imagine that as being the cause.


Example 2:
Found a 16 line (106 tokens) duplication in the following files:
Starting at line 4970 of D:\DOWNLOADS\Python-3.0\Python\Python-ast.c
Starting at line 5015 of D:\DOWNLOADS\Python-3.0\Python\Python-ast.c
Starting at line 5073 of D:\DOWNLOADS\Python-3.0\Python\Python-ast.c
Starting at line 5119 of D:\DOWNLOADS\Python-3.0\Python\Python-ast.c

        PyErr_Format(PyExc_TypeError,
            "GeneratorExp field \"generators\" must be a list, not a %.200s", tmp->ob_type->tp_name);
        goto failed;
    }
    len = PyList_GET_SIZE(tmp);
    generators = asdl_seq_new(len, arena);
    if (generators == NULL) goto failed;
    for (i = 0; i < len; i++) {
        comprehension_ty value;
        res = obj2ast_comprehension(PyList_GET_ITEM(tmp, i), &value, arena);
        if (res != 0) goto failed;
        asdl_seq_SET(generators, i, value);
    }
    Py_XDECREF(tmp);
    tmp = NULL;
} else {
    PyErr_SetString(PyExc_TypeError, "required field \"generators\" missing from GeneratorExp");

 
Terry
      02-07-2009
On Feb 7, 7:10 pm, "Diez B. Roggisch" wrote:
> Do you by any chance have a few examples of these? There is a lot of
> idiomatic code in python to e.g. acquire and release the GIL or doing
> refcount-stuff. If that happens to be done with rather generic names as
> arguments, I can well imagine that as being the cause.


Example of a small one (61 tokens duplicated):
Found a 19 line (61 tokens) duplication in the following files:
Starting at line 132 of D:\DOWNLOADS\Python-3.0\Python\modsupport.c
Starting at line 179 of D:\DOWNLOADS\Python-3.0\Python\modsupport.c

        PyTuple_SET_ITEM(v, i, w);
    }
    if (itemfailed) {
        /* do_mkvalue() should have already set an error */
        Py_DECREF(v);
        return NULL;
    }
    if (**p_format != endchar) {
        Py_DECREF(v);
        PyErr_SetString(PyExc_SystemError,
                        "Unmatched paren in format");
        return NULL;
    }
    if (endchar)
        ++*p_format;
    return v;
}

static PyObject *

 
Terry
      02-07-2009
On Feb 7, 7:10 pm, "Diez B. Roggisch" wrote:
> Do you by any chance have a few examples of these? There is a lot of
> idiomatic code in python to e.g. acquire and release the GIL or doing
> refcount-stuff. If that happens to be done with rather generic names as
> arguments, I can well imagine that as being the cause.


Example of an even smaller one (30 tokens duplicated):
Found a 11 line (30 tokens) duplication in the following files:
Starting at line 2551 of D:\DOWNLOADS\Python-3.0\Python\Python-ast.c
Starting at line 3173 of D:\DOWNLOADS\Python-3.0\Python\Python-ast.c

    if (PyObject_SetAttrString(result, "ifs", value) == -1)
        goto failed;
    Py_DECREF(value);
    return result;
failed:
    Py_XDECREF(value);
    Py_XDECREF(result);
    return NULL;
}

PyObject*

 
Terry
      02-07-2009
On Feb 7, 7:10 pm, "Diez B. Roggisch" wrote:
> Do you by any chance have a few examples of these? There is a lot of
> idiomatic code in python to e.g. acquire and release the GIL or doing
> refcount-stuff. If that happens to be done with rather generic names as
> arguments, I can well imagine that as being the cause.


And I'm not saying that code can never contain duplication. But it
seems that stable, successful software releases tend to have a
relatively stable duplication rate.
 
Benjamin Peterson
      02-07-2009
Terry <terry.yinzhe <at> gmail.com> writes:
> On Feb 7, 7:10 pm, "Diez B. Roggisch" wrote:
> > Do you by any chance have a few examples of these? There is a lot of
> > idiomatic code in python to e.g. acquire and release the GIL or doing
> > refcount-stuff. If that happens to be done with rather generic names as
> > arguments, I can well imagine that as being the cause.
>
> Starting at line 5119 of D:\DOWNLOADS\Python-3.0\Python\Python-ast.c


This isn't really fair because Python-ast.c is auto-generated.
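
One way to make the comparison fairer would be to leave generated files out before running the detector. A minimal sketch, assuming the list of generated files is maintained by hand; Python-ast.c is the only one named in this thread, and the helper name and sample paths are made up for the illustration.

```python
from pathlib import PurePosixPath

# Only Python-ast.c is named in the thread; extend this set by hand as needed.
GENERATED_FILES = {"Python-ast.c"}


def hand_written(paths, generated=GENERATED_FILES):
    """Return the paths whose file name is not in the generated set."""
    return [p for p in paths if PurePosixPath(p).name not in generated]


sources = ["Python/Python-ast.c", "Python/modsupport.c", "Python/thread_pthread.h"]
print(hand_written(sources))
```

The remaining files can then be passed to the duplication tool and the rates recomputed against their NLOC only.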




 