In article <(E-Mail Removed)>, (E-Mail Removed) says...

> Gerald Rosenberg wrote:
>
> > Have not been able to Google very well for an answer, since I haven't a
> > usable name for the algorithm/type of problem.
> >
> > In sum, I need to determine the least common denominator for the spacing
> > of a one dimensional array of integers where the integers have a noise
> > component.
>
> Could you state the problem more clearly? Do you need a single LCD for a set
> of integers? Do you need to find the most frequently occurring values in a
> set?
>
> > In practical terms, I have the Y-axis pixel locations of lines of text
> > on a page (which are approximations) and need to determine whether any
> > two adjacent text lines are single spaced, 1.5 spaced, or multiple
> > spaced.
>
> Create a histogram of all the values and examine them yourself for patterns,
Interesting. Will look into that. Thanks.

> then decide on an appropriate strategy to achieve what you are trying to
> accomplish, which you don't bother to say.
Did "need to determine whether any two adjacent text lines are single
spaced, 1.5 spaced, or multiple spaced" not relate what I am trying to
accomplish?

>
> Another poster has recommended a Fourier transform, but I think this is
> overkill. A histogram approach will work for any case except many integers
> with little in common with each other. I don't think this is what you face.
>
> >
> > Seems like there should be an analytic solution, but auto-correlation
> > doesn't seem right. Some kind of quantized best-fit?
>
> Why not state the problem to be solved before hypothesizing about a
> solution?
Sure: In practical terms, I have the Y-axis pixel locations of lines of
text on a page (which are approximations) and need to determine whether
any two adjacent text lines are single spaced, 1.5 spaced, or multiple
spaced.

> >
> > Rather than continuing to guess, does anyone know the name of the
> > algorithm for solving this type of problem?
>
> What type of problem is that? You have only discussed one aspect of the data
> set, and you haven't stated a problem to be solved at all.
>
OK. World peace through analysis of existing imaged document
collections.

Documents are imaged, OCR'd, and PDF'd. The PDF is a given. Now I need
to figure out the document structure from an analysis of the PDF
command and data stream.

A big problem, much of it solved. Now I am just tackling a very
specific aspect where I "have the Y-axis pixel [baseline] locations of
lines of text on a page (which are approximations [i.e., contain a noise
component]) and need to determine whether any two adjacent text lines
are single spaced, 1.5 spaced, or multiple spaced."

No doubt in the realm of mathematics (at least I expect) people have
investigated this class of problem and have proposed generalized
algorithms to solve it. Could not guess the name or a functional
description well enough to find it by Google. Thought that the good
folk here at cljp, in their acknowledged wide-ranging knowledge of all
things algorithmic, might know a name for this class of problem, or
provide a pointer to suitable algorithms.
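
To make the question concrete, here is a rough Java sketch of the kind of
"quantized best-fit" I have in mind. It is only an illustration under
assumptions of my own, not a named or established algorithm: it assumes the
baselines are sorted top to bottom, that there are at least two of them, and
that at least one adjacent pair on the page is single spaced, so the cluster
of smallest gaps can stand in for the single-spacing unit. The class name
LineSpacingSketch, the method names, and the tolerance parameter are all
hypothetical.

```java
import java.util.Arrays;

class LineSpacingSketch {

    /** Gaps between adjacent baselines; input must be sorted ascending. */
    static double[] gaps(double[] baselines) {
        double[] g = new double[baselines.length - 1];
        for (int i = 0; i < g.length; i++) {
            g[i] = baselines[i + 1] - baselines[i];
        }
        return g;
    }

    /**
     * Estimate the single-spacing unit as the median of all gaps that lie
     * within `tolerance` (a fraction, e.g. 0.2) of the smallest gap.
     * Assumption: at least one adjacent pair of lines is single spaced.
     */
    static double estimateUnit(double[] gaps, double tolerance) {
        double min = Arrays.stream(gaps).min().getAsDouble();
        double[] near = Arrays.stream(gaps)
                .filter(g -> g <= min * (1.0 + tolerance))
                .sorted()
                .toArray();
        int mid = near.length / 2;
        return near.length % 2 == 1
                ? near[mid]
                : (near[mid - 1] + near[mid]) / 2.0;
    }

    /** Snap each gap to the nearest multiple of half a unit, minimum 1.0. */
    static double[] classify(double[] gaps, double unit) {
        double[] out = new double[gaps.length];
        for (int i = 0; i < gaps.length; i++) {
            double halfUnits = Math.round(2.0 * gaps[i] / unit);
            out[i] = Math.max(1.0, halfUnits / 2.0);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up noisy baselines, for illustration only.
        double[] baselines = {100, 112, 123.5, 136, 154, 166, 190};
        double[] g = gaps(baselines);
        double unit = estimateUnit(g, 0.2);
        System.out.println("unit = " + unit);
        System.out.println(Arrays.toString(classify(g, unit)));
    }
}
```

Using the median of the smallest cluster rather than the single smallest gap
is meant to blunt the noise a little. If a page has no single-spaced lines
at all, this guesses the wrong unit, which is exactly why I am hoping there
is a more principled method.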