Velocity Reviews - Computer Hardware Reviews

Methods for understanding complex, real world, C++ code?

 
 
mathog
Guest
Posts: n/a
 
      04-10-2012
Mostly when I program it is maintenance work - making little tweaks and
corrections to other people's code. The C++ books and tutorials are all
clear and wonderful about objects and methods but they only ever give
toy examples. In real world programs it seems that the code is always a
morass of object types and methods, often with similar names.
(Resulting in: "yes, 'print', but which 'print'?"). I find it extremely
difficult to find in such code where actions actually occur, or in many
cases what earlier events led to an issue later in the program.

Presumably there are tools to help with this that I should be using but
am not. What are these tools? If somebody knows of a tutorial or
reference on how to deal with complex code like this, please share it.

For instance, lately I have had to make some changes to Inkscape. The
development environment is Mingw. This program uses Cairo, GDK, GTK,
Pango, and heaven knows what else, in addition to all of its own code.
The closest thing I have to a class browser is the doxygen web
interface, example:

http://fossies.org/dox/inkscape-0.48...pp_source.html

One of the tasks on my "to do" list is to add import/export of images to
the EMF extension. I found the relevant sections in the EMF extension
(switch() cases provided, but no code in them) by a grep through all of
the source code for the EMF keyword function associated with images.
Looking around for an example of how to handle images in Inkscape
located the link above by more grep's, since I knew one could paste an
image into the program from the clipboard. From this link, on line 937
we see that rather than converting directly from GDK::Pixbuf to an
Inkscape object type they punted. First they save from the Pixbuf to a
PNG file, and then they use a preexisting file_import function to read
that object back in. The program pastes at the cursor (or something like
"at the cursor") but how it knows where that is lies in a code segment
far far away. Anyway, this illustrates the point I'm trying to make.
One would imagine that somewhere in this heap of code there is a
"create/import image of size W,H at position X,Y", but good luck finding it.

Figuring out the logic in real C++ programs is also challenging. In
Inkscape, for instance, much of the program is event driven, so a
backtrace will often not tell you what happened before a given execution
point. Example:

A ->
B ->
C (configure: event X will cause E) ->
D (configure: something else relevant when E runs) ->
C ->
B (event X) ->
E (stops at preset breakpoint)

So the back trace shows

A->B->E

and there is no clue that C and D are important, or even that they ever
ran. This A->E example is grossly simplified, since in the real program
there were thousands of function calls before event X. Short of tracing
every function call how would one ever know that C,D need to be looked at?

Thank you,

David Mathog

 
K. Frank
Guest
Posts: n/a
 
      04-10-2012
Hi David!

On Apr 10, 11:54 am, mathog <(E-Mail Removed)> wrote:
> Mostly when I program it is maintenance work - making little tweaks and
> corrections to other people's code. The C++ books and tutorials are all
> clear and wonderful about objects and methods but they only ever give
> toy examples. In real world programs it seems that the code is always a
> morass of object types and methods, often with similar names.
> (Resulting in: "yes, 'print', but which 'print'?"). I find it extremely
> difficult to find in such code where actions actually occur, or in many
> cases what earlier events led to an issue later in the program.


First off, "We Share Your Pain!"

By the way, you should be aware of the Microsoft WSYP program
for improving user experience:

http://www.youtube.com/watch?v=3dF-POFE30E


> Presumably there are tools to help with this that I should be using but
> am not. What are these tools? If somebody knows of a tutorial or
> reference on how to deal with complex code like this, please share it.


A couple of comments:

Much real-world code isn't that good. It may be that the
programmers who wrote it weren't that good. It may be that
the programmers were great, but the code grew and evolved
over time, obscuring what was originally a good design, and
that the (possibly rational) decision was made not to refactor
the code base and clean up the current design.

Also, good real-world code is often (very) complicated. Many
real-world problems are inherently complicated. So the art
of good programming is not to eliminate complexity (you can't),
but to master that complexity in as organized a way as you can.

Further, as you point out, many programming books aren't realistic.
It's legitimate to use toy examples -- otherwise the books would
become unreadably long -- but authors often leave out (purposely?)
real-world complexity that their favorite design methodology
doesn't handle well.

So ... Welcome to the real world.

How to deal with this? I don't have a good or simple answer.

What I do is try to get an overview of the code -- or, better,
the part of the code that is relevant to what I am doing. I
try to get a feel for its "shape," for lack of a better word.
Then I rely on intuition to zero in on where the action (relevant
to my problem) is.

(I know, I know ... This is hardly useful advice.)

Only at this point is it practical to look at the details.
(One can "look at details" by reading the code, running a
debugger, putting in print statements, or otherwise adding
instrumentation, according to one's taste. My preference
is to use a combination of reading the code and adding print
statements, but the exact technique doesn't really matter.)

It's hard, it's challenging, and I can't give you a detailed
recipe for how to do it, because for me, there's a lot of
intuition involved.

In my experience it's a rare talent for a developer to be able
to work effectively with a large, unfamiliar code base.

I liken it to the Radar O'Reilly character in the M.A.S.H.
story (movie, TV, etc.). The joke with him is that he'd
look up and say "Choppers." and then only after a few minutes
had passed would the other characters hear the choppers
flying in to deliver the wounded.

In my experience there is a minority of developers who, when
looking for a bug in "Other People's Code," have -- like
Radar O'Reilly -- some sort of sixth sense for where the
bodies are buried. And they are worth their weight in gold.

The only concrete advice I can give you is to gain (a lot
of) experience in working with large, unfamiliar code bases.
The more you see various chunks of code written by programmers
with differing styles and levels of talent, the more easily
you'll be able to recognize at a higher level what they're
trying to do, before drilling down into the details.

To give an overly simplistic "toy" example, when I look at
code I can say "Oh, this guy's using a bunch of nested if
statements." or "This guy's using a switch statement." or
"This guy's using virtual functions in a bunch of derived
classes." or "This guy's setting up a look-up table." all
to accomplish the same programming task. In this context
it doesn't matter whether a specific technique is "right"
or "wrong" or "better" or "worse," so it's not worth
arguing about. In practice you will see all manner of
code, and you need to be able to recognize what the guy
is trying to do whether or not he is doing it "right."
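
To make the toy example concrete, here is a minimal sketch of one small task written in the four styles mentioned above. The task (number of sides of a named shape) and every name in it are hypothetical, chosen only for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical task: return the number of sides of a shape (0 if unknown).

// 1. A chain of if statements.
int sides_if(const std::string& s) {
    if (s == "triangle") return 3;
    else if (s == "square") return 4;
    else if (s == "pentagon") return 5;
    return 0;
}

// 2. A switch statement over an enum.
enum class Shape { Triangle, Square, Pentagon };
int sides_switch(Shape s) {
    switch (s) {
        case Shape::Triangle: return 3;
        case Shape::Square:   return 4;
        case Shape::Pentagon: return 5;
    }
    return 0;
}

// 3. Virtual functions in a bunch of derived classes.
struct ShapeBase {
    virtual ~ShapeBase() {}
    virtual int sides() const = 0;
};
struct Triangle : ShapeBase { int sides() const override { return 3; } };
struct Square   : ShapeBase { int sides() const override { return 4; } };

// 4. A look-up table.
int sides_table(const std::string& s) {
    static const std::map<std::string, int> table = {
        {"triangle", 3}, {"square", 4}, {"pentagon", 5}};
    std::map<std::string, int>::const_iterator it = table.find(s);
    return it == table.end() ? 0 : it->second;
}
```

All four answer the same question; recognizing which one you are looking at usually tells you where the next case would have to be added.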

> For instance, lately I have had to make some changes to Inkscape...
> ...
> Figuring out the logic in real C++ programs is also challenging. In
> Inkscape, for instance, much of the program is event driven, so a
> backtrace will often not tell you what happened before a given execution
> point. Example:
>
> A ->
> B ->
> C (configure: event X will cause E) ->
> D (configure: something else relevant when E runs) ->
> C ->
> B (event X) ->
> E (stops at preset breakpoint)
>
> So the back trace shows
>
> A->B->E
>
> and there is no clue that C and D are important, or even that they ever
> ran.


Yes, you're absolutely right about this. Working with
event-driven (sometimes called "reactive") programming
is especially hard. Trying to understand "Other People's
Code" becomes even more difficult because, as you point
out, traditional procedural techniques such as reading
one line of code after another or stepping through
execution with a debugger don't map well to the actual
event-driven logic. At some point you have to hope for
the good fortune that the original programmer approached
his event-driven design in a thoughtful and well-organized
manner.

When I _write_ event-driven code, I tend to instrument it
with print statements that include a tag that indicates
which _logical_ process a particular step belongs to. So
in your example, although the backtrace would show:

A->B->E

my print-statement log file would show:

Process-Q: A
Process-Q: B
Process-Q: C: handle event X; schedule E
Process-Q: D: configure some property of E
Process-Q: E: (stops at some breakpoint)

Sometimes you can retroactively instrument existing code along
these lines, but often it's not practical.
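
A minimal sketch of this kind of tagged instrumentation, assuming a hand-rolled helper: TraceLog, the TRACE macro, and the functions C, D, and E are hypothetical names, not taken from any library or from Inkscape.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: collects trace lines tagged with the *logical*
// process a step belongs to, independent of the physical call stack.
class TraceLog {
public:
    void note(const std::string& process, const std::string& step) {
        lines_.push_back(process + ": " + step);
    }
    const std::vector<std::string>& lines() const { return lines_; }
    void dump(std::ostream& out) const {
        for (const std::string& l : lines_) out << l << '\n';
    }
private:
    std::vector<std::string> lines_;
};

// Records the name of the enclosing function along with the message.
#define TRACE(log, process, what) \
    (log).note((process), std::string(__func__) + ": " + (what))

// Stand-ins for the functions in the example above.
void C(TraceLog& log) { TRACE(log, "Process-Q", "handle event X; schedule E"); }
void D(TraceLog& log) { TRACE(log, "Process-Q", "configure some property of E"); }
void E(TraceLog& log) { TRACE(log, "Process-Q", "running"); }
```

Calling C, D, and E in turn and then log.dump(std::cerr) gives one line per step, each carrying the process tag, so a grep for "Process-Q" pulls one logical sequence out of an interleaved log.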

> This A->E example is grossly simplified, since in the real program
> there were thousands of function calls before event X. Short of tracing
> every function call how would one ever know that C,D need to be looked at?


Very hard. You need to develop that Radar O'Reilly sixth sense,
or you need to (partially) instrument the code along the lines
described above. And if the code isn't naturally organized
into logical sequences of event processing, it can get pretty
nasty.
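
One form of partial instrumentation is to record where each handler was registered, so that when E runs you can ask who set it up. The sketch below is a stand-alone model, not GTK's or Inkscape's signal API; EventTable, Registration, and CONNECT are hypothetical names.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: a handler plus the place it was connected.
struct Registration {
    std::function<void()> handler;
    std::string where;  // "file:line" of the connect call
};

class EventTable {
public:
    void connect(const std::string& event, std::function<void()> handler,
                 const char* file, int line) {
        Registration r;
        r.handler = std::move(handler);
        r.where = std::string(file) + ":" + std::to_string(line);
        table_[event].push_back(std::move(r));
    }

    // Runs every handler for `event` and returns the registration
    // sites in the order the handlers were called.
    std::vector<std::string> fire(const std::string& event) {
        std::vector<std::string> sites;
        std::map<std::string, std::vector<Registration> >::iterator it =
            table_.find(event);
        if (it == table_.end()) return sites;
        for (Registration& r : it->second) {
            sites.push_back(r.where);
            r.handler();
        }
        return sites;
    }

private:
    std::map<std::string, std::vector<Registration> > table_;
};

// Captures the call site of the registration automatically.
#define CONNECT(table, event, ...) \
    (table).connect((event), __VA_ARGS__, __FILE__, __LINE__)
```

In the A->B->E example, C would call CONNECT(table, "X", handler_for_E), and the list returned by fire("X") would point straight back at the line in C, which the backtrace cannot do.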

> Thank you,


I apologize that I haven't offered any particularly good
recipe for tackling these issues. I would love to hear
what other folks think, and what procedures and tools they
use for these kinds of challenges.

The problem is that when dealing with "Other People's Code,"
you have to work with the code as it is, rather than as you
would wish it to be.

> David Mathog



Good Luck ... And Happy (OPC) Hacking!


K. Frank
 
Ian Collins
Guest
Posts: n/a
 
      04-10-2012
On 04/11/12 03:54 AM, mathog wrote:
> Mostly when I program it is maintenance work - making little tweaks and
> corrections to other people's code. The C++ books and tutorials are all
> clear and wonderful about objects and methods but they only ever give
> toy examples. In real world programs it seems that the code is always a
> morass of object types and methods, often with similar names.
> (Resulting in: "yes, 'print', but which 'print'?"). I find it extremely
> difficult to find in such code where actions actually occur, or in many
> cases what earlier events led to an issue later in the program.
>
> Presumably there are tools to help with this that I should be using but
> am not. What are these tools? If somebody knows of a tutorial or
> reference on how to deal with complex code like this, please share it.


I use Oracle's version of NetBeans (Solaris Studio), which has pretty
good code browsing capabilities. Standard NetBeans or Eclipse should do
much the same.

--
Ian Collins
 
 
Jorgen Grahn
Guest
Posts: n/a
 
      04-10-2012
On Tue, 2012-04-10, mathog wrote:
> Mostly when I program it is maintenance work - making little tweaks and
> corrections to other people's code. The C++ books and tutorials are all
> clear and wonderful about objects and methods but they only ever give
> toy examples. In real world programs it seems that the code is always a
> morass of object types and methods, often with similar names.
> (Resulting in: "yes, 'print', but which 'print'?"). I find it extremely
> difficult to find in such code where actions actually occur, or in many
> cases what earlier events led to an issue later in the program.
>
> Presumably there are tools to help with this that I should be using but
> am not. What are these tools? If somebody knows of a tutorial or
> reference on how to deal with complex code like this, please share it.


I don't have specialized tools, but here are some I actually use. They
may be Unix-specific.

- Emacs with 'exuberant ctags' in C++ mode. Still won't work well
for looking up overloaded names though

- Doxygen with full graph generation enabled.

- A gprof profiling run of the code, compiled with inlining disabled.
Not for the profiling but for showing the main code flows.

- The nm(1) symbol lister to get a rough idea what one source file
contains and what it depends on.

- Pen and paper for reconstructing class diagrams, state machines etc.

- Valgrind for detecting obvious memory handling bugs.

- Changing the code (making things private or const; changing their
type and so on) just to see what stops compiling. Works best if your
Makefile isn't broken, so the right things are rebuilt automatically.
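
A minimal sketch of the last item, with one variation that is an assumption rather than something from the list: instead of removing or hiding the function, mark it "= delete" behind a probe macro. The class Document and the macro PROBE_PRINT_INT are hypothetical.

```cpp
#include <string>

struct Document {
#ifdef PROBE_PRINT_INT
    // Probe build: every call that resolves to this overload becomes
    // a compile error naming the caller's file and line.
    std::string print(int copies) const = delete;
#else
    std::string print(int copies) const {
        return "print x" + std::to_string(copies);
    }
#endif
    std::string print(const std::string& title) const {
        return "print " + title;
    }
};

// Two callers of "print", but of different overloads.
std::string caller_a(const Document& d) { return d.print(2); }
std::string caller_b(const Document& d) { return d.print("report"); }
```

Building normally changes nothing. Building with -DPROBE_PRINT_INT makes the compiler flag caller_a and leave caller_b alone, which answers "which 'print'?" for that overload. A deleted function still takes part in overload resolution, so calls cannot quietly rebind to another overload the way they might if the function were simply removed.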

BTW, I think most of these problems are not related to C++. Most are
there in C too. (Exception: inheritance. I really hate debugging messy
code where everything is badly designed run-time polymorphism and
nothing is documented.)

And, like K. Frank I share your pain. Remember that in maintenance
programming, at least you're creating stuff which people *really need*
and are asking for. New code on the other hand is often not used in
the end.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
 
 
michael.boehnisch@gmail.com
Guest
Posts: n/a
 
      04-11-2012
On Tuesday, April 10, 2012 5:54:39 PM UTC+2, mathog wrote:
> In real world programs it seems that the code is always a
> morass of object types and methods, often with similar names.


I consider real world code that does not come with sufficient documentation to have a value close to zero. Often it's less work to rewrite the stuff from scratch.

In cases where this is not an option and the pressure to deal with the code is high, I'd recommend reverse engineering a UML model for the code, i.e. start with a static class model depicting interfaces, inheritance and aggregation, and continue by filling in usage information (who calls what). It may also be possible to group classes into components.

Once this is done, you can analyze method by method and add diagrams for the code's dynamics (sequence, activity, and state diagrams). Whatever you find out you should also add as extra comments to the code.

If you're lucky, you end up with usable code documentation. Many times you won't be lucky, in spite of the systematic approach.

Doxygen in combination with GraphViz is a good start if you are looking for tools. It will spare you a lot of manual work in the analysis. For modelling UML I prefer Sparx Systems' Enterprise Architect. It is a commercial product but relatively cheap compared to other tools. For reengineering purposes you'd need the "Professional" edition or better; its code-to-model import features should come in handy.

best,

MiB.

 
 
K. Frank
Guest
Posts: n/a
 
      04-11-2012
Hello Jorgen!

On Apr 10, 6:26 pm, Jorgen Grahn <(E-Mail Removed)> wrote:
> On Tue, 2012-04-10, mathog wrote:
> > ...
> > I find it extremely
> > difficult to find in such code where actions actually occur, or in many
> > cases what earlier events led to an issue later in the program.

>
> > Presumably there are tools to help with this that I should be using but
> > am not. What are these tools? If somebody knows of a tutorial or
> > reference on how to deal with complex code like this, please share it.

>
> I don't have specialized tools, but here are some I actually use. They
> may be Unix-specific.
>
> - Emacs with 'exuberant ctags' in C++ mode. Still won't work well
>   for looking up overloaded names though


I'm a big fan of using ctags with emacs, although some people find
emacs to be an acquired taste.

> ...
> - Changing the code (making things private or const; changing their
>   type and so on) just to see what stops compiling. Works best if your
>   Makefile isn't broken, so the right things are rebuilt automatically.


I've only ever done this by accident, but I like this idea as a
systematic technique. Maybe I'll add it to my bag of tricks.

> ...
> Remember that in maintenance
> programming, at least you're creating stuff which people *really need*
> and are asking for. New code on the other hand is often not used in
> the end.


Hear, hear! Sometimes confusing code started out bad, but lots
of times it starts out good, and the bloat and convolution and
bit-rot you're struggling with came about because the code was
useful and being used and growing and gaining new features
because people wanted it. When you find yourself working on a
multi-man-decade code base like this, it's actually kind of
cool (if frustrating).

> /Jorgen



Thanks for your thoughts and suggestions.


K. Frank
 
 
nick_keighley_nospam@hotmail.com
Guest
Posts: n/a
 
      04-11-2012
On Wednesday, April 11, 2012 12:31:03 AM UTC+1, (unknown) wrote:
> 
> Re: "Methods for understanding complex, real world, C++"
>
> Rewrite it. That's what I do.


my boss tends not to give me the time to re-write 750 KLOC when a one line change is required. Get real.
 
 
Rui Maciel
Guest
Posts: n/a
 
      04-11-2012
(E-Mail Removed) wrote:

> On Tuesday, April 10, 2012 5:54:39 PM UTC+2, mathog wrote:
>> In real world programs it seems that the code is always a
>> morass of object types and methods, often with similar names.

>
> I consider real world code that does not come with sufficient
> documentation to have a value close to zero. Often its less work to
> rewrite the stuff from scratch.


This approach doesn't appear to be very reasonable. If a piece of code
which may not be documented is already mature then your suggestion to
reinvent the wheel may end up needlessly reintroducing bugs and other
issues. So, just because you can't access the documentation of a piece of
code you risk ending up needlessly wasting resources to make something worse
than it already is. And where's the added value in this?


Rui Maciel

 
 
Rui Maciel
Guest
Posts: n/a
 
      04-11-2012
(E-Mail Removed) wrote:

> my boss tends not to give me the time to re-write 750 KLOC when a one line
> change is required. Get real.


Indeed.

I wonder if these proponents of this type of scorched earth approach to
documentation also believe it is a good idea to demolish their house and
rebuild it if they can't find its blueprints.


Rui Maciel
 
 
nick_keighley_nospam@hotmail.com
Guest
Posts: n/a
 
      04-11-2012
On Wednesday, April 11, 2012 10:06:19 AM UTC+1, Rui Maciel wrote:
> (E-Mail Removed) wrote:


> > my boss tends not to give me the time to re-write 750 KLOC when a one line
> > change is required. Get real.

>
> Indeed.
>
> I wonder if these proponents of this type of scorched earth approach to
> documentation also believe it is a good idea to demolish their house and
> rebuild it if they can't find its blueprints.


I'm guessing they're thinking of something "really large" they've encountered which they could re-write from scratch in 48 hours if they drank enough Jolt.
 