Velocity Reviews > Java > Distance normalized TSP algorithm

# Distance normalized TSP algorithm

JSH
Guest
Posts: n/a

 07-25-2008
I came up with this idea for figuring out a path with the Traveling
Salesman Problem and posted about it, and an objection that came up
was the the algorithm required the distance between each of the nodes,
while the standard problem does not require this information.

In reaction to that objection I got upset, but now after taking some
time to think and ponder the problem I found I could handle the issue
easily so that a new distance normalized algorithm could handle any
TSP setup.

The simple idea is that if there are m nodes, and there is no distance
information between nodes given, then they are to be considered to be
embedded in an m-1 dimensional space equidistant from each other.

Then the algorithm proceeds as before, where you have two travelers
T_1 and T_2, where I'll talk through a single weight, but you can have
multiple weights. With a single weight between nodes, I'll use cost
to keep the idea closer to the standard way of stating the problem,
and also so that they complete a round trip circuit, I'll say they
begin from the same node N_1.

To use the algorithm and take them to the next nodes, T_1 considers
moving to another node, which I'll say is N_2.

Then T_2 considers moving to all other nodes available, not currently
under consideration by T_1, and not already visited, so T_2 might
cost*distance from each path BACK to N_1 because T_2 is moving
BACKWARDS, while T_1 is moving forwards.

Notice that with the distance normalized algorithm you just have cost
considered as distance=1. Also if there are multiple weights you
simply multiply by each weight, so if there were a cost and a time for
each path, you'd have cost*time*distance and so forth.

T_2 considers every available move back to N_1, and selects the one
with the least value, and returns that information.

T_1 stores that info and now considers moving to N_3, and T_2 now
calculates for all available nodes again, excluding the one under
loop through all available, and again T_2 picks the lowest cost move
BACK to N_1 and returns that to T_1.

In this way, all possible moves by T_1 and T_2 are considered, and
after T_1 has gotten values back from T_2 for each possible move, T_1
selects the lowest cost move, and then BOTH T_1 and T_2 make their
moves, and T_1 moves forward to the node that fits that least path,
while T_2 moves BACKWARD to the node that does, where those will be
different nodes and there is a rule that T_1 and T_2 can never move to
the same node.

A special case occurs if there are only 3 nodes left, meaning only one
is unvisited, and in this case the algorithm arbitrarily has T_1 move
forward to that node.

Notice that when the algorithm exits you have a path taken by T_1 and
a path taken by T_2, but you collapse that into a single journey by a
single traveler moving along the path of T_1, and then continuing
forward along the path of T_2, where to conceptualize it, T_2 is the
same traveler moving backwards in time to meet himself moving
forwards.

I am prepared to step through a simple example if that is necessary
but would not mind if someone else might help out in that regard, as
I'm curious to know if ANYONE else in the world can understand this
algorithm!!!

It seems so simple to me but often it's hard to explain something new
to people so I understand that it may take some time and patience for
others to know how to do the algorithm and I apologize for getting
frustrated and angry before, versus just coming up with the normalized
algorithm earlier.

Oh yeah, I'm still angling for coders if I can get some people
interested in this thing, as I have a Google Code project called
optimalpathengine.

James Harris

Patricia Shanahan
Guest
Posts: n/a

 07-25-2008
JSH wrote:
....
> The simple idea is that if there are m nodes, and there is no distance
> information between nodes given, then they are to be considered to be
> embedded in an m-1 dimensional space equidistant from each other.

I believe that my original example is well-formulated according to the
rules you are now using. To save everyone digging back through the
depths of the previous thread, I'll repeat it here:

"Cities A, B, C, D, E. I'm assuming symmetrical costs - getting from A
to B is the same cost as B to A.

C,A is 1000. C,B is 1000. C,D is 2000. C,E is 2000. All other costs are 1."

> Then the algorithm proceeds as before, where you have two travelers
> T_1 and T_2, where I'll talk through a single weight, but you can have
> multiple weights. With a single weight between nodes, I'll use cost
> to keep the idea closer to the standard way of stating the problem,
> and also so that they complete a round trip circuit, I'll say they
> begin from the same node N_1.

You don't specify any criteria for picking the first node, so I assume
the algorithm is supposed to work regardless of that choice. Suppose
they start, in my example, at A.

>
> To use the algorithm and take them to the next nodes, T_1 considers
> moving to another node, which I'll say is N_2.
>
> Then T_2 considers moving to all other nodes available, not currently
> under consideration by T_1, and not already visited, so T_2 might
> cost*distance from each path BACK to N_1 because T_2 is moving
> BACKWARDS, while T_1 is moving forwards.
>
> Notice that with the distance normalized algorithm you just have cost
> considered as distance=1. Also if there are multiple weights you
> simply multiply by each weight, so if there were a cost and a time for
> each path, you'd have cost*time*distance and so forth.
>
> T_2 considers every available move back to N_1, and selects the one
> with the least value, and returns that information.
>
> T_1 stores that info and now considers moving to N_3, and T_2 now
> calculates for all available nodes again, excluding the one under
> loop through all available, and again T_2 picks the lowest cost move
> BACK to N_1 and returns that to T_1.

As far as I can tell, this procedure would result in T_1 and T_2 moving
from A to two distinct nodes chosen from {B, D, E}. Each of those moves
has cost 1. However, any path that contains e.g. both AB and DA would
not be able to use AC. Since C has only two cost 1000 edges, not using
one of them forces use of at least one of the cost 2000 edges. Such a
path would have cost either 3003 or 4003.

other specification, including how you choose the starting node.

An optimal path, such as ACBDEA, has total cost 2003, but that can only
be achieved if T_1 or T_2, starting at A, choses to go to C, at a cost
of 1000, rather than to B, D, or E at a cost of 1. How would either T_1
or T_2 know that a cost 1000 move is better for the overall solution
than a cost 1 move?

The core difficulty in the NP-complete problems is that making a series
of locally optimal choices cannot be depended on to lead to a globally
optimal solution.

Patricia

Joshua Cranmer
Guest
Posts: n/a

 07-25-2008
JSH wrote:
> In this way, all possible moves by T_1 and T_2 are considered, and
> after T_1 has gotten values back from T_2 for each possible move, T_1
> selects the lowest cost move, and then BOTH T_1 and T_2 make their
> moves, and T_1 moves forward to the node that fits that least path,
> while T_2 moves BACKWARD to the node that does, where those will be
> different nodes and there is a rule that T_1 and T_2 can never move to
> the same node.

As Patricia points out, this breaks down in at least one case. The case
is this (graphically put):
¢
A --- B
\$ / \ \$ / \ \$
/ \\$/ \
* --- C --- *
\$\$\$ \$\$\$

In short, if C is a node with expensive connections to all but A and B
(which are of moderate expense), while A and B both have cheap
connections to everywhere but C, and even cheaper between the two of
them, then the traveler will see get to one of A or B, find the cheapest
one to be the other one, move to it, and then move back out into the
rest of the graph, cutting off the cheapest connections to C.

The example Patricia gave violated the Triangle Inequality, but looking
at the example I have, it is very likely that it fails even if you
assume the Triangle Inequality (I haven't done any analysis as extensive
of Patricia).

This, by the way, illustrates the danger of NP-hard problems: when you
come to a choice, the "best" solution at the onset can lead you into
pitfalls. Which means that you have to take into account not only the
cost of moving from one node to another but the loss of savings in not
moving to a third one (if that makes any sense).

> A special case occurs if there are only 3 nodes left, meaning only one
> is unvisited, and in this case the algorithm arbitrarily has T_1 move
> forward to that node.

Three unvisited nodes or three nodes including the ones T_1 and T_2 are on?

> I am prepared to step through a simple example if that is necessary
> but would not mind if someone else might help out in that regard, as
> I'm curious to know if ANYONE else in the world can understand this
> algorithm!!!

One exclamation point suffices. I understand it sufficiently to reason
on it, and I could probably code it up reasonably fast if I cared to.
It's still ill-formed: you don't mention what to do in case of ties.

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

JSH
Guest
Posts: n/a

 07-25-2008
On Jul 24, 9:05*pm, Patricia Shanahan <(E-Mail Removed)> wrote:
> JSH wrote:
>
> ...
>
> > The simple idea is that if there are m nodes, and there is no distance
> > information between nodes given, then they are to be considered to be
> > embedded in an m-1 dimensional space equidistant from each other.

>
> I believe that my original example is well-formulated according to the
> rules you are now using. To save everyone digging back through the
> depths of the previous thread, I'll repeat it here:

Thanks!

> "Cities A, B, C, D, E. I'm assuming symmetrical costs - getting from A
> to B is the same cost as B to A.
>
> C,A is 1000. C,B is 1000. C,D is 2000. C,E is 2000. All other costs are 1.."
>
> > Then the algorithm proceeds as before, where you have two travelers
> > T_1 and T_2, where I'll talk through a single weight, but you can have
> > multiple weights. *With a single weight between nodes, I'll use cost
> > to keep the idea closer to the standard way of stating the problem,
> > and also so that they complete a round trip circuit, I'll say they
> > begin from the same node N_1.

>
> You don't specify any criteria for picking the first node, so I assume
> the algorithm is supposed to work regardless of that choice. Suppose
> they start, in my example, at A.

Good catch. That is just a miss on my part, but it's easily fixable,
by saying that a path is calculated for each possible starting node in
turn. The algorithm is still polynomial time with that change.

As that's fairly easy I'd like to focus first on getting a full path
from an arbitrary start at just any node, say, chosen randomly.

>
> > To use the algorithm and take them to the next nodes, T_1 considers
> > moving to another node, which I'll say is N_2.

>
> > Then T_2 considers moving to all other nodes available, not currently
> > under consideration by T_1, and not already visited, so T_2 might
> > start with N_3 and proceed from there, and calculates the
> > cost*distance from each path BACK to N_1 because T_2 is moving
> > BACKWARDS, while T_1 is moving forwards.

I realized an omission in my primary algorithm, as I simply forget
about the cost for T_1 from N_1 to the node being considered, which is
kind of an odd thing to forget.

The solution and correction to the algorithm is that cost is
multiplied as well.

So if cost from N_1 to N_2 is cost_1 for T_1 and cost from N_1 to N_3
is cost_2 for T_2, then the calculation is cost_1*cost_2*distance,
where here distance is 1, so it is cost_1*cost_2.

>
> > Notice that with the distance normalized algorithm you just have cost
> > considered as distance=1. *Also if there are multiple weights you
> > simply multiply by each weight, so if there were a cost and a time for
> > each path, you'd have cost*time*distance and so forth.

With the correction you'd have two weight values for each, so

cost_1*cost_2*time_1*time_2*distance

> > T_2 considers every available move back to N_1, and selects the one
> > with the least value, and returns that information.

If more than one path has the least value then you randomly pick.

> > T_1 stores that info and now considers moving to N_3, and T_2 now
> > calculates for all available nodes again, excluding the one under
> > loop through all available, and again T_2 picks the lowest cost move
> > BACK to N_1 and returns that to T_1.

>
> As far as I can tell, this procedure would result in T_1 and T_2 moving
> from A to two distinct nodes chosen from {B, D, E}. Each of those moves
> has cost 1. However, any path that contains e.g. both AB and DA would
> not be able to use AC. Since C has only two cost 1000 edges, not using
> one of them forces use of at least one of the cost 2000 edges. Such a
> path would have cost either 3003 or 4003.

Oh, that's not good. I'll need to consider this carefully later.
Kind of in a rush this morning...

> other specification, including how you choose the starting node.
>
> An optimal path, such as ACBDEA, has total cost 2003, but that can only
> be achieved if T_1 or T_2, starting at A, choses to go to C, at a cost
> of 1000, rather than to B, D, or E at a cost of 1. How would either T_1
> or T_2 know that a cost 1000 move is better for the overall solution
> than a cost 1 move?
>
> The core difficulty in the NP-complete problems is that making a series
> of locally optimal choices cannot be depended on to lead to a globally
> optimal solution.
>
> Patricia

Thanks for the reply! I'll have to consider it in detail later as I'm
in a rush this morning, but wanted to add the correction.

If you are correct then that kills the usefulness of the idea for TSP,
but doesn't mean I won't still program the algorithm out of curiosity.

James Harris

Joshua Cranmer
Guest
Posts: n/a

 07-25-2008
JSH wrote:
> Good catch. That is just a miss on my part, but it's easily fixable,
> by saying that a path is calculated for each possible starting node in
> turn. The algorithm is still polynomial time with that change.

Keeping score, that change should make it O(N^4):
for each starting node
for each node in path (actually N/2, but ignoring constants)
for each possible next node for T_1
for each possible previous node for T_2

> So if cost from N_1 to N_2 is cost_1 for T_1 and cost from N_1 to N_3
> is cost_2 for T_2, then the calculation is cost_1*cost_2*distance,
> where here distance is 1, so it is cost_1*cost_2.

them together?

> If more than one path has the least value then you randomly pick.

since (as I explained again other there) you lose something by moving to
a node, unlike other working greedy algorithms where "moving" to a node
is only expanding a search tree.

Taking this time to expound on what I think the flaw in the algorithm
is: you're ignoring the opportunity cost of making a move.

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

Christian
Guest
Posts: n/a

 07-25-2008
JSH schrieb:
> I came up with this idea for figuring out a path with the Traveling
> Salesman Problem and posted about it, and an objection that came up
> was the the algorithm required the distance between each of the nodes,
> while the standard problem does not require this information.
>
> In reaction to that objection I got upset, but now after taking some
> time to think and ponder the problem I found I could handle the issue
> easily so that a new distance normalized algorithm could handle any
> TSP setup.
>
> The simple idea is that if there are m nodes, and there is no distance
> information between nodes given, then they are to be considered to be
> embedded in an m-1 dimensional space equidistant from each other.

Don't do this ... its just getting more complicated..
Just take Euklidean TSP ...
Embed you m nodes in a 2 dimensional plane ..

You get your distances as weights and everything is fine ...
The Euclidean TSP is still NP hard. So it will solve normal TSP if you
can solve it.

I know people said you would have to create this data yourself in the
previous thread... but that was only in general correct not in
particular for this problem... there is no need to as Euclidean TSP has
been prooven to be NP hard 30 years ago .. (also some good
approximations exist to solve it)

Christian

Patricia Shanahan
Guest
Posts: n/a

 07-25-2008
JSH wrote:
....
> As that's fairly easy I'd like to focus first on getting a full path
> from an arbitrary start at just any node, say, chosen randomly.

....

Unless you are aiming for a probabilistic algorithm, which may get
different results for the same problem on different runs, don't pick the
start node at random. Make up some rule, no matter how arbitrary.

The one I used was "First vertex mentioned in the problem description".

Patricia

JSH
Guest
Posts: n/a

 07-26-2008
On Jul 25, 8:21*am, Christian <(E-Mail Removed)> wrote:
> JSH schrieb:
>
> > I came up with this idea for figuring out a path with the Traveling
> > Salesman Problem and posted about it, and an objection that came up
> > was the the algorithm required the distance between each of the nodes,
> > while the standard problem does not require this information.

>
> > In reaction to that objection I got upset, but now after taking some
> > time to think and ponder the problem I found I could handle the issue
> > easily so that a new distance normalized algorithm could handle any
> > TSP setup.

>
> > The simple idea is that if there are m nodes, and there is no distance
> > information between nodes given, then they are to be considered to be
> > embedded in an m-1 dimensional space equidistant from each other.

>
> Don't do this ... its just getting more complicated..
> Just take Euklidean TSP ...
> Embed you m nodes in a 2 dimensional plane ..
>
> You get your distances as weights and everything is fine ...
> The Euclidean TSP is still NP hard. So it will solve normal TSP if you
> can solve it.

That's what I thought to do at first as well, which was last week, but
after pondering the problem for some days I decided that an equal
weight for distance along all nodes was preferable, if no distance
between nodes is given as then one can assume that distance is equally
weighted or wouldn't it be included?

And besides it's easy.

The m-1 dimensional space provides an easy solution to the issue, and
equal weights distance.

> I know people said you would have to create this data yourself in the
> previous thread... but that was only in general correct not in
> particular for this problem... there is no need to as Euclidean TSP has
> been prooven to be NP hard 30 years ago .. (also some good
> approximations exist to solve it)
>
> Christian

I think it's more fun now to go after the general problem, though the
concept I'm using is easier I think to understand with the Euclidean,
as distance is so key to it, so I'll babble on a bit here on that
issue though now I'll get more speculative.

I think it's provable to solve the Euclidean TSP by considering in a 2-
D space a ring going outward from a start of just two nodes with the
forward traveler on one and the backwards traveler on another, where
these nodes are well behaved in that you can expand a circle out and
get only 4 new nodes at a time, and the weight is the distance, and
then consider the decision points of picking the best path, where also
the best path tends to go in one direction--so I'm cheating a bit to
make it easier to conceptualize.

Then the idea I have is like chewing at the problem from both ends, as
each traveler gets two nodes at a time and there must be a best path
between those two, so you move the travelers out and expand your
circle to get 4 more nodes and do so until you have all nodes. Key
here is that one traveler is moving forward in time while the other is
moving backwards so at the end you collapse your solution to one
traveler moving forward from the starting point to the end.

AT each decision point you picked the shortest so the final full path
is the shortest one.

Lots of problems with the specifics of that thought experiment as it's
very well behaved, but maybe it can give an idea of why using two
travelers--one going forward in time while the other goes backwards in
time--is so key of an idea!

Without that, if you have one traveler in the middle of your graph on
one node and expand your circle to have just two more nodes, how do
you get to an end? How do you get to your beginning?

My idea allows you to start in the middle, and work your way out from
BOTH sides.

James Harris

JSH
Guest
Posts: n/a

 07-26-2008
On Jul 25, 12:40*pm, Patricia Shanahan <(E-Mail Removed)> wrote:
> JSH wrote:
>
> ...> As that's fairly easy I'd like to focus first on getting a full path
> > from an arbitrary start at just any node, say, chosen randomly.

>
> ...
>
> Unless you are aiming for a probabilistic algorithm, which may get
> different results for the same problem on different runs, don't pick the
> start node at random. Make up some rule, no matter how arbitrary.

Oh, the thing is you made an excellent point of where to start, and
gave a great example which forced me to consider how a given start
could break down with the basic algorithm, which caused me to consider
a solution to that problem.

That solution is to try EACH node in turn as a starting point, and
yeah, you could do it randomly.

I wasn't exactly to that solution this morning but realized it a
little bit later on in the day.

> The one I used was "First vertex mentioned in the problem description".
>
> Patricia

Your example revealed though that the problem with an approach from a
particular node was that there could be kind of a trap hidden far back
with a steep cost, so the path from THAT node wouldn't be so great
using my algorithm.

But what about starting at another node?

So I made the leap to just starting at each node in turn, using my
algorithm to get a full path, and then comparing between to pick the
best one, which would solve your simple example, but would it work on
a bigger one?

I think it would, as I'm conjecturing now that you cannot setup a case
where from EACH node the algorithm will fail to give the optimal path,
where for a graph where the costs match perfectly with distance it
would give the optimal path from every node and I've designated such a
graph, a perfect one.

So now I'm moving to inventing terminology.

I call a graph where costs match well with distance a well correlated
graph, and notice your example is not one, as distances are all units,
so I found a way to get my distance argument back in, with
correlation.

So in my terminology your example is a dis-correlated graph, where the
ratio of starting points that give the optimal solution to points that
do not gives the degree of correlation.

James Harris

Patricia Shanahan
Guest
Posts: n/a

 07-26-2008
JSH wrote:
....
> AT each decision point you picked the shortest so the final full path
> is the shortest one.

....

This is the key difference between the NP-complete problems and problems
with known polynomial time algorithms.

There are a lot of problems that do have the property that the globally
optimal solution must be a composition of a series of locally optimal
solutions. For example, if the shortest path from X to Y includes node
A, the path from X to A must be one of the shortest X to A paths.
Polynomial time algorithms for shortest path calculation can use that
fact to prune the search.

Similarly, any subset of elements in a sorted list appear in sorted
order. Sort algorithms such as quick sort and merge sort use that fact.

In my favorite TSP example, an optimal solution must go between A and B
via C, even though that route is far more expensive than the AB edge,
and uses the most expensive edge connected to each of A and B. It has
to be done because it is the least bad way of visiting C.

A globally optimal TSP solution often requires choices that are not
locally optimal.

> Lots of problems with the specifics of that thought experiment as it's
> very well behaved, but maybe it can give an idea of why using two
> travelers--one going forward in time while the other goes backwards in
> time--is so key of an idea!

Many NP-complete problems, including TSP, have easy, well behaved cases
that can be solved in polynomial time using locally optimal decisions. I
don't think you should spend much time on those cases when designing a
general algorithm. In general, an algorithm design that is not based on
thinking about difficult cases of the problem it is supposed to solve
probably won't work for them.

The problem I have with understanding the utility of the two travelers
is seeing how they influence each other in anything other than tiny
problems. If you have e.g. 10,000 nodes, for the first 50 iterations
each traveler bars less than 1% of the choices the other traveler could
least one globally suboptimal choice long before the time the number of
unvisited nodes is small enough that each traveler significantly
constrains the other.

At a more basic level, you seem to be running a simple greedy algorithm
for each traveler, and I don't see anything to backtrack, look ahead, or
otherwise ensure that locally suboptimal choices win if they are needed
for the globally optimal solution.

Patricia