pythonXX.dll size: please split CJK codecs out

 
 
Giovanni Bajo
 
      08-20-2005
Hello,

python24.dll is much bigger than python23.dll. This was discussed already on
the newsgroup, see the thread starting here:
http://mail.python.org/pipermail/pyt...ly/229096.html

I don't think I fully understand the reason why additional .pyd modules were
built into the .dll. OTOH, this does not help anyone, since:

- Normal users don't care about the size of the pythonXX.dll, or the number of
dependencies, nor if a given module is shipped as .py or .pyd. They just import
modules of the standard library, ignoring where each module resides. So,
putting more modules (or fewer modules) within pythonXX.dll makes absolutely no
difference for them.
- Users who freeze applications are instead *worse* served by this, because
they end up with larger programs. For them, it is better to have the highest
granularity wrt external modules, so that the resulting frozen application is
as small as possible.

A post in the previous thread (specifically
http://mail.python.org/pipermail/pyt...ly/229157.html) suggests
that py2exe users might get a small benefit from the fact that in some cases
they would be able to ship the program with only 3 files (app.exe,
python24.dll, and library.zip). But:

1) I reckon this is a *very* rare case. You need to write an application that
does not use Tk, socket, zlib, expat, nor any external library like numarray or
PIL.
2) Even if you fit the above case, you still end up with 3 files, which means
you still have to package your app somehow, etc. Also, the resulting package
will be *bigger* for no reason, as python24.dll might include modules which the
user doesn't need.

I don't think that merging things into python24.dll is a good way to serve
users of freezing programs, not even py2exe users. Personally, I use McMillan's
PyInstaller[1] which always builds a single executable, no matter what. So I do
not like the idea that things are getting worse because of py2exe: py2exe
should be fixed instead, if its users request to have fewer files to ship (in
my case, for instance, this missing feature is a showstopper for adopting
py2exe).

Can we at least undo this unfortunate move in time for 2.5? I would be grateful
if *at least* the CJK codecs (which are about 1 MB in size) were split out of
python25.dll. IMHO, I would prefer having *more* granularity, rather than
*less*.

+1 on splitting out the CJK codecs.

Thanks,
Giovanni Bajo


[1] See also my page on PyInstaller: http://www.develer.com/oss/PyInstaller


 
 
 
 
 
Martin v. Löwis
 
      08-21-2005
Giovanni Bajo wrote:
> I don't think I fully understand the reason why additional .pyd modules were
> built into the .dll. OTOH, this does not help anyone, since:


The reason is simple: a single DLL is easier to maintain. You only need
to add the new files to the VC project, edit config.c, and be done. No
new project to create for N different configurations, no messing with
the MSI builder.

In addition, having everything in a single DLL speeds up Python startup
a little, since less file searching is necessary.

> Can we at least undo this unfortunate move in time for 2.5? I would be grateful
> if *at least* the CJK codecs (which are about 1 MB in size) were split out of
> python25.dll. IMHO, I would prefer having *more* granularity, rather than
> *less*.


If somebody would formulate a policy (i.e. conditions under which
modules go into python2x.dll, vs. going into separate files), I'm
willing to implement it. This policy should best be formulated in
a PEP.

The policy should be flexible wrt. future changes. I.e. it should
*not* say "do everything as in Python 2.3", because this means I
would have to rip out the modules added after 2.3 entirely (i.e.
not ship them at all). Instead, the policy should give clear guidance
even for modules that are not yet developed.

It should be a PEP, so that people can comment. For example,
I think I would be -1 on a policy "make python2x.dll as minimal
as possible, containing only modules that are absolutely
needed for startup".

Regards,
Martin
 
 
 
 
 
Giovanni Bajo
 
      08-21-2005
Martin v. Löwis wrote:

>> I don't think I fully understand the reason why additional .pyd
>> modules were built into the .dll. OTOH, this does not help anyone,
>> since:

> The reason is simple: a single DLL is easier to maintain. You only
> need to add the new files to the VC project, edit config.c, and be
> done. No new project to create for N different configurations, no
> messing with the MSI builder.


FWIW, this just highlights how inefficient your build system is. Everything you
currently do by hand could be automated, including MSI generation. Also, you
describe the Windows procedure, which I suppose does not take into account
what needs to be done for other OSes. But I'm sure that revamping the Python
build system is not a piece of cake.

I'll take the point though: it's easier to maintain for developers, and most
Python users don't care.

> In addition, having everything in a single DLL speeds up Python
> startup a little, since less file searching is necessary.


I highly doubt this can be noticed in an actual benchmark, but I could be
wrong. I can produce numbers though, if this can help people decide.
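
For anyone who wants to produce such numbers, here is a minimal sketch in current Python. It only times a whole interpreter start-and-exit, so it would have to be run against two builds (one with everything in the DLL, one with modules split out) to say anything about the question here. The helper name `time_startup` and the default of five runs are assumptions made for illustration.

```python
import subprocess
import sys
import time


def time_startup(runs=5):
    """Return the best wall-clock time (seconds) to start and exit the interpreter."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # "-c pass" starts the interpreter, runs an empty statement, and exits.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        timings.append(time.perf_counter() - start)
    # The minimum is the least disturbed by other activity on the machine.
    return min(timings)


if __name__ == "__main__":
    print(f"best of 5 startups: {time_startup() * 1000:.1f} ms")
```

Process creation overhead is included in the figure, so only the difference between two builds measured on the same machine is meaningful.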

>> Can we at least undo this unfortunate move in time for 2.5? I would
>> be grateful if *at least* the CJK codecs (which are about 1 MB in size)
>> were split out of python25.dll. IMHO, I would prefer having *more*
>> granularity, rather than *less*.

>
> If somebody would formulate a policy (i.e. conditions under which
> modules go into python2x.dll, vs. going into separate files), I'm
> willing to implement it. This policy should best be formulated in
> a PEP.
>
> The policy should be flexible wrt. future changes. I.e. it should
> *not* say "do everything as in Python 2.3", because this means I
> would have to rip out the modules added after 2.3 entirely (i.e.
> not ship them at all). Instead, the policy should give clear guidance
> even for modules that are not yet developed.
>
> It should be a PEP, so that people can comment. For example,
> I think I would be -1 on a policy "make python2x.dll as minimal
> as possible, containing only modules that are absolutely
> needed for startup".


I'm willing to write up such a PEP, but it's hard to devise a universal
policy. Basically, the only element we can play with is the size of the
resulting binary for the module. Would you like a policy like "split out every
module whose binary on Windows is > X kbytes"?
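
To make the size-based idea concrete, here is a rough sketch of how such a rule could be expressed. Everything in it is hypothetical: the function name `split_by_size`, the module names, and the sizes are placeholders for illustration, not measurements of any real build.

```python
def split_by_size(module_sizes_kb, threshold_kb):
    """Partition modules into (built_in, external) by binary size.

    module_sizes_kb maps a module name to the size of its compiled binary in
    kilobytes; modules strictly larger than threshold_kb are split out.
    """
    built_in, external = [], []
    for name, size_kb in sorted(module_sizes_kb.items()):
        (external if size_kb > threshold_kb else built_in).append(name)
    return built_in, external


if __name__ == "__main__":
    # Placeholder sizes, made up for illustration; not measurements.
    example = {"small_module": 20, "medium_module": 150, "large_codec": 1000}
    built_in, external = split_by_size(example, threshold_kb=500)
    print("keep in DLL:", built_in)
    print("split out:  ", external)
```

A rule like this gives an answer for modules that do not exist yet, but the value of X would still have to be agreed upon.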

My personal preference would go to something like "make python2x.dll include
only the modules which are really core, like sys and os". This would also
provide guidance to future modules, as they would simply go in external modules
(I don't think really core stuff is being added right now).

At this point, my main goal is getting CJK out of the DLL, so everything that
lets me achieve this goal is good for me.

Thanks,
--
Giovanni Bajo


 
 
Michael Hoffman
 
      08-21-2005
Giovanni Bajo wrote:
>
> FWIW, this just highlights how inefficient your build system is. Everything you
> currently do by hand could be automated, including MSI generation.


I'm sure Martin would be happy to consider a patch to make the build
system more efficient.

> I'm willing to write up such a PEP, but it's hard to devise a universal
> policy.


This is the reason that a PEP is needed before there are changes.
--
Michael Hoffman
 
 
Giovanni Bajo
 
      08-21-2005
Michael Hoffman wrote:

>> FWIW, this just highlights how inefficient your build system is.
>> Everything you currently do by hand could be automated, including
>> MSI generation.

>
> I'm sure Martin would be happy to consider a patch to make the build
> system more efficient.



Out of curiosity, was this ever discussed among Python developers? Would
something like scons qualify for this? OTOH, scons opens nasty
self-bootstrapping issues (being written itself in Python).

Before considering a patch (or even a PEP) for this, the basic requirements
should be made clear. I know portability among several UNIX flavours is one,
for instance. What are the others?
--
Giovanni Bajo


 
 
Martin v. Löwis
 
      08-21-2005
Giovanni Bajo wrote:
> FWIW, this just highlights how inefficient your build system is. Everything you
> currently do by hand could be automated, including MSI generation. Also, you
> describe the Windows procedure, which I suppose does not take into account
> what needs to be done for other OSes. But I'm sure that revamping the Python
> build system is not a piece of cake.


You are wrong. It is not true that everything I do by hand could be
automated. At least after automation, I still would have to do things
by hand, namely invoke the automation.

You probably haven't looked at the MSI generation at all: it *is*
automatic. However, every time something changes in the structure,
the code generating the MSI must be adjusted to the new structure.

> I'll take the point though: it's easier to maintain for developers, and most
> Python users don't care.


See, this I find surprising. If there really is such a big need for
python24.dll being split into many more modules - why doesn't anybody
just do this, and offer it as a separate installation for use
with py2exe?

The fact that this hasn't happened indicates that users don't need
it badly enough. I personally rarely need to create a standalone
Python application, but when I did, I just used freeze, and static
linking. That way, I got a single binary, with no magic packaging,
and a minimal one, too.

>>In addition, having everything in a single DLL speeds up Python
>>startup a little, since less file searching is necessary.

>
> I highly doubt this can be noticed in an actual benchmark, but I could be
> wrong. I can produce numbers though, if this can help people decide.


No, this is a minor issue. If you do write a PEP, and you find it
relatively easy to compare the maximum modularization to the minimal
one, it would be useful to underline your point, of course.

> I'm willing to write up such a PEP, but it's hard to devise a universal
> policy.


Indeed. For Python 2.4, I made up a policy for myself: everything that
does not depend on a separate (non-system) library goes into
pythonxy.dll. That way, everybody will be able to compile Python
from sources without downloading anything else, yet it causes minimum
maintenance overhead. That's how the current python24.dll came about.

> Basically, the only element we can play with is the size of the
> resulting binary for the module. Would you like a policy like "split out every
> module whose binary on Windows is > X kbytes"?


It's less important what I like - I think I would ask for a poll on
the proposed PEP, and I would be -1 on anything that means more work
for contributors. But that would be only one voice, and, if a majority
of the Windows Python users preferred your policy, it would be
implemented (of course, somebody contributing the resulting project
files or some automation for them would also help).

> My personal preference would go to something like "make python2x.dll include
> only the modules which are really core, like sys and os". This would also
> provide guidance to future modules, as they would simply go in external modules
> (I don't think really core stuff is being added right now).


Ok, then write that into the PEP. You would have to provide a definition
for "core", e.g. "everything that is needed for startup".

As a guideline, the Unix build process currently includes only the
following modules by default:

- marshal, imp, __main__, __builtin__, sys, exceptions: Modules
living in Python/*.c
- gc, signal: invoked directly from the interpreter
- thread: not sure
- posix, errno, _sre, _codecs, so that setup.py can run
- zipimport, to avoid bootstrapping problems for importing python24.zip
- _symtable, because setup.py cannot get the dependencies right
- xxsubtype, for an undocumented reason I forgot
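
The equivalent list for any given build can be checked from Python itself: `sys.builtin_module_names` holds the names of the modules compiled into the interpreter (on Windows, that means into pythonXX.dll). A minimal sketch, written for current Python; the helper name `is_builtin` is made up for this example.

```python
import sys


def is_builtin(module_name):
    """True if the module is compiled into the interpreter itself."""
    return module_name in sys.builtin_module_names


if __name__ == "__main__":
    print(len(sys.builtin_module_names), "modules are built in:")
    for name in sorted(sys.builtin_module_names):
        print(" ", name)
```

Running this on a Windows build and on a Unix build shows directly how far the two defaults differ.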

Regards,
Martin
 
 
Martin v. Löwis
 
      08-21-2005
Giovanni Bajo wrote:
>>I'm sure Martin would be happy to consider a patch to make the build
>>system more efficient.

>
> Out of curiosity, was this ever discussed among Python developers? Would
> something like scons qualify for this? OTOH, scons opens nasty
> self-bootstrapping issues (being written itself in Python).


No. The Windows build system must be integrated with Visual Studio.
(Perhaps this is rather, "dunno: is it integrated with VS.NET 2003?")

When developing on Windows, you really want all the support you can
get from VS, e.g. when debugging, performing name completion, etc.
To me, this makes it likely that only VS project files will work.

> Before considering a patch (or even a PEP) for this, the basic requirements
> should be made clear. I know portability among several UNIX flavours is one,
> for instance. What are the others?


Clearly, the starting requirement would be that you look at the build
process *at all*. The Windows build process and the Unix build process
are completely different. Portability is desirable only for the Unix
build process; however, you might find that it already meets your needs
quite well.

Regards,
Martin
 
 
Giovanni Bajo
 
      08-21-2005
Martin v. Löwis wrote:

>> Out of curiosity, was this ever discussed among Python developers?
>> Would something like scons qualify for this? OTOH, scons opens nasty
>> self-bootstrapping issues (being written itself in Python).

>
> No. The Windows build system must be integrated with Visual Studio.
> (Perhaps this is rather, "dunno: is it integrated with VS.NET 2003?")
> When developing on Windows, you really want all the support you can
> get from VS, e.g. when debugging, performing name completion, etc.
> To me, this makes it likely that only VS project files will work.


You seem to ignore the fact that scons can easily generate VS.NET projects. And
it does that by parsing the same file it could use to build the project
directly (by invoking your Visual Studio); and that very same file would be the
same under both Windows and UNIX.

And even if we disabled this feature and built the project directly from the
command line, you could still edit your files with the Visual Studio
environment and debug them in there (since you are still compiling them with
Visual C, it's just scons invoking the compiler). You could even set up the
environment so that when you press CTRL+SHIFT+B (or F7, if you have the old
keybinding), it invokes scons and builds the project.

So, if the requirement is "integration with Visual Studio", that is no
obstacle to switching to a different build process.

>> Before considering a patch (or even a PEP) for this, the basic
>> requirements should be made clear. I know portability among several
>> UNIX flavours is one, for instance. What are the others?

>
> Clearly, the starting requirement would be that you look at the build
> process *at all*.


I compiled Python several times under Windows (both 2.2.x and 2.3.x) using
Visual Studio 6, and one time under Linux. But I never investigated it in
detail.

> The Windows build process and the Unix build process
> are completely different.


But there is no technical reason why it has to be so. I work on several
portable projects, and they use the same build process under both Windows and
Unix, while retaining full Visual Studio integration (I myself am a Visual
Studio user).

> Portability is desirable only for the Unix
> build process; however, you might find that it already meets your
> needs quite well.


Well, you came up with a maintenance problem: you told me that building more
external modules needs more effort. In a well-configured and fully-automated
build system, when you add a file you have to write its name only once in a
project description file; if you want to build a dynamic library, you have to
add a single line. This would take care of both Windows and UNIX, and of
compilation, packaging and installation alike.
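
As a rough illustration of the "write the name once" idea, here is a plain Python sketch. It is not an actual scons file and does not use any scons API; the module names, source paths, and helper names (`EXTENSIONS`, `shared_lib_name`, `build_plan`) are all hypothetical placeholders.

```python
import sys

# Hypothetical project description: each extension module is named once.
# The module names and source paths are placeholders, not Python's real layout.
EXTENSIONS = {
    "example_codec": ["Modules/example_codec.c"],
    "example_net": ["Modules/example_net.c"],
}


def shared_lib_name(module, platform=sys.platform):
    """Return the file name of the dynamic library for one extension module."""
    suffix = ".pyd" if platform.startswith("win") else ".so"
    return module + suffix


def build_plan(extensions, platform=sys.platform):
    """Map each output file to its sources; one entry per declared module."""
    return {shared_lib_name(name, platform): sources
            for name, sources in extensions.items()}


if __name__ == "__main__":
    for target, sources in build_plan(EXTENSIONS).items():
        print(target, "<-", ", ".join(sources))
```

Adding a module is then one new line in `EXTENSIONS`, and the per-platform details are derived from it rather than maintained by hand.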
--
Giovanni Bajo


 
 
Ron Adam
 
      08-21-2005
Martin v. Löwis wrote:

>>Can we at least undo this unfortunate move in time for 2.5? I would be grateful
>>if *at least* the CJK codecs (which are about 1 MB in size) were split out of
>>python25.dll. IMHO, I would prefer having *more* granularity, rather than
>>*less*.

>
> If somebody would formulate a policy (i.e. conditions under which
> modules go into python2x.dll, vs. going into separate files), I'm
> willing to implement it. This policy should best be formulated in
> a PEP.


+1 Yes, I think this needs to be addressed.

> The policy should be flexible wrt. future changes. I.e. it should
> *not* say "do everything as in Python 2.3", because this means I
> would have to rip out the modules added after 2.3 entirely (i.e.
> not ship them at all). Instead, the policy should give clear guidance
> even for modules that are not yet developed.


Agree.

> It should be a PEP, so that people can comment. For example,
> I think I would be -1 on a policy "make python2x.dll as minimal
> as possible, containing only modules that are absolutely
> needed for startup".


Also agree. Both the minimal and the maximal possible DLL size are ideals,
and neither is the best choice in practice.

I would put the starting minimum boundary as:

1. "The minimum required to start the python interpreter with no
additional required files."

Currently Python 2.4 (on Windows) does not yet meet that guideline, so
it seems some modules still need to be added, while other modules (I
haven't checked which) are probably not needed to meet that guideline.

This could be extended to:

2. "The minimum required to run an agreed upon set of simple Python
programs."

I expect there may be a lot of differing opinions on just what those
minimum Python programs should be. But that is where the PEP process
comes in.
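
One way to get a starting point for guideline 1 is to ask a fresh interpreter which modules it has already imported by the time it runs user code. A minimal sketch in current Python; the helper name `modules_loaded_at_startup` is an assumption made for this example, and the result includes modules loaded from files as well as those compiled in.

```python
import subprocess
import sys


def modules_loaded_at_startup():
    """Return the sorted names of modules a fresh interpreter has imported."""
    # The child prints one module name per line from its own sys.modules.
    code = "import sys; print('\\n'.join(sorted(sys.modules)))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    for name in modules_loaded_at_startup():
        print(name)
```

Replacing the one-line child program with each of the agreed-upon simple programs would give the corresponding list for guideline 2.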


Regards,
Ron


> Regards,
> Martin


 
 
Martin v. Löwis
 
      08-21-2005
Giovanni Bajo wrote:
> You seem to ignore the fact that scons can easily generate VS.NET projects.


I'm not ignoring it - I'm not aware of it. And also, I don't quite
believe it until I see it.

> But there is no technical reason why it has to be so. I work on several
> portable projects, and they use the same build process under both Windows and
> Unix, while retaining full Visual Studio integration (I myself am a Visual
> Studio user).


Well, as long as "F6" works...

> Well, you came up with a maintenance problem: you told me that building more
> external modules needs more effort. In a well-configured and fully-automated
> build system, when you add a file you have to write its name only once in a
> project description file; if you want to build a dynamic library, you have to
> add a single line. This would take care of both Windows and UNIX, and of
> compilation, packaging and installation alike.


I very much doubt this is possible. For some modules, you also need to
create autoconf fragments on Unix, for example, and you might need
to specify different libraries on different systems.

Regards,
Martin
 