
Is there a known algorithm for this?

 
 
Gerald Rosenberg
 
      09-19-2004
Have not been able to Google very well for an answer, since I haven't a
usable name for the algorithm/type of problem.

In sum, I need to determine the least common denominator for the spacing
of a one dimensional array of integers where the integers have a noise
component.

In practical terms, I have the Y-axis pixel locations of lines of text
on a page (which are approximations) and need to determine whether any
two adjacent text lines are single spaced, 1.5 spaced, or multiple
spaced.

Seems like there should be an analytic solution, but auto-correlation
doesn't seem right. Some kind of quantized best-fit?
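One possible reading of a "quantized best-fit", sketched in Java. This is a hypothetical illustration, not a known library routine. The class and method names (`PitchFit`, `gaps`, `misfit`, `estimatePitch`) are invented, and the sketch assumes every gap between adjacent baselines is close to a whole multiple of half the single-spacing pitch, so that 1.0, 1.5 and 2.0 spacings all fit the model. It scans candidate pitches and keeps the one whose half-pitch leaves the smallest rounding error across all gaps.

```java
import java.util.Arrays;

// Hypothetical sketch of a "quantized best-fit" for the single-spacing pitch.
// Assumption: each gap between adjacent baselines is roughly k * (pitch / 2)
// for some integer k, so 1.0, 1.5, 2.0, ... spacings all fit the model.
final class PitchFit {

    // Differences between adjacent baseline positions, after sorting them.
    static double[] gaps(double[] baselines) {
        if (baselines.length < 2) {
            throw new IllegalArgumentException("need at least two baselines");
        }
        double[] y = baselines.clone();
        Arrays.sort(y);
        double[] g = new double[y.length - 1];
        for (int i = 1; i < y.length; i++) {
            g[i - 1] = y[i] - y[i - 1];
        }
        return g;
    }

    // Mean squared distance of each gap from the nearest multiple of pitch / 2,
    // measured in units of pitch / 2 (so each residual lies in [-0.5, 0.5]).
    static double misfit(double[] gaps, double pitch) {
        double quantum = pitch / 2.0;
        double sum = 0.0;
        for (double g : gaps) {
            double units = g / quantum;
            double r = units - Math.rint(units);
            sum += r * r;
        }
        return sum / gaps.length;
    }

    // Brute-force scan of candidate pitches in [minPitch, maxPitch].
    // Keep maxPitch below 1.5 * minPitch so that 2/3 of the true pitch, which
    // fits purely single-spaced text equally well, cannot also be in range.
    static double estimatePitch(double[] baselines, double minPitch,
                                double maxPitch, double step) {
        double[] g = gaps(baselines);
        double bestPitch = minPitch;
        double bestScore = Double.POSITIVE_INFINITY;
        for (double p = minPitch; p <= maxPitch; p += step) {
            double score = misfit(g, p);
            if (score < bestScore) {
                bestScore = score;
                bestPitch = p;
            }
        }
        return bestPitch;
    }
}
```

The search range has to come from outside knowledge such as the font size: a page that is entirely 1.5-spaced looks exactly like a single-spaced page at 1.5 times the pitch, and no fit on the gaps alone can tell the two apart.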

Rather than continuing to guess, does anyone know the name of the
algorithm for solving this type of problem? Is there a Java package
that can solve this kind of problem? I have looked at Colt, but it does
not provide a direct solution.

Thanks,
Gerald
 
 
 
 
 
Niels Ull Harremoës
 
      09-19-2004

"Gerald Rosenberg" <(E-Mail Removed)> skrev i en meddelelse
news:(E-Mail Removed) t...
> Have not been able to Google very well for an answer, since I haven't a
> usable name for the algorithm/type of problem.
>
> In sum, I need to determine the least common denominator for the spacing
> of a one dimensional array of integers where the integers have a noise
> component.
>
> In practical terms, I have the Y-axis pixel locations of lines of text
> on a page (which are approximations) and need to determine whether any
> two adjacent text lines are single spaced, 1.5 spaced, or multiple
> spaced.
>
> Seems like there should be an analytic solution, but auto-correlation
> doesn't seem right. Some kind of quantized best-fit?


Try doing a one-dimensional Fourier transform and look for the
low-frequency components?
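A rough Java sketch of one way to apply a Fourier transform here. It is an interpretation of the suggestion, and the names `BaselineSpectrum`, `periodicityScore` and `bestPeriod` are hypothetical. The baselines are treated as a train of impulses, and its Fourier transform is evaluated at frequency 1 / period for a range of candidate periods. The normalised magnitude is close to 1 when nearly every baseline sits on a multiple of the period, so with mixed single and 1.5 spacing the best period should come out near half the line pitch, i.e. the common unit the original post asks for.

```java
// Hypothetical sketch: score how strongly the baselines repeat with a given
// period by evaluating the Fourier transform of the impulse train
// sum_k delta(y - y_k) at frequency 1 / period, normalised to [0, 1].
final class BaselineSpectrum {

    static double periodicityScore(double[] baselines, double period) {
        double re = 0.0;
        double im = 0.0;
        for (double y : baselines) {
            double phase = 2.0 * Math.PI * y / period;
            re += Math.cos(phase);
            im -= Math.sin(phase);
        }
        return Math.hypot(re, im) / baselines.length;
    }

    // Scans periods in [minPeriod, maxPeriod] and returns the highest scorer.
    // Every whole fraction of the true period (1/2, 1/3, ...) scores just as
    // high, so keep maxPeriod below 2 * minPeriod.
    static double bestPeriod(double[] baselines, double minPeriod,
                             double maxPeriod, double step) {
        double best = minPeriod;
        double bestScore = -1.0;
        for (double p = minPeriod; p <= maxPeriod; p += step) {
            double s = periodicityScore(baselines, p);
            if (s > bestScore) {
                bestScore = s;
                best = p;
            }
        }
        return best;
    }
}
```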


 
 
 
 
 
Thomas G. Marshall
 
      09-19-2004
Gerald Rosenberg coughed up:
> Have not been able to Google very well for an answer, since I haven't
> a usable name for the algorithm/type of problem.
>
> In sum, I need to determine the least common denominator for the
> spacing of a one dimensional array of integers where the integers
> have a noise component.
>
> In practical terms, I have the Y-axis pixel locations of lines of text
> on a page (which are approximations) and need to determine whether any
> two adjacent text lines are single spaced, 1.5 spaced, or multiple
> spaced.
>
> Seems like there should be an analytic solution, but auto-correlation
> doesn't seem right. Some kind of quantized best-fit?
>
> Rather than continuing to guess, does anyone know the name of the
> algorithm for solving this type of problem? Is there a Java package
> that can solve this kind of problem? I have looked at Colt, but it
> does not provide a direct solution.
>
> Thanks,
> Gerald


You should try this post in comp.programming if you want algorithmic
help without Java-specific experience. Many there are Java guys and many
are not, but they're there to help with algorithms.

--
Everythinginlifeisrealative.Apingpongballseemssmalluntilsomeoneramsitupyournose.


 
 
Paul Lutus
 
      09-19-2004
Gerald Rosenberg wrote:

> Have not been able to Google very well for an answer, since I haven't a
> usable name for the algorithm/type of problem.
>
> In sum, I need to determine the least common denominator for the spacing
> of a one dimensional array of integers where the integers have a noise
> component.


Could you state the problem more clearly? Do you need a single LCD for a set
of integers? Do you need to find the most frequently occurring values in a
set?

> In practical terms, I have the Y-axis pixel locations of lines of text
> on a page (which are approximations) and need to determine whether any
> two adjacent text lines are single spaced, 1.5 spaced, or multiple
> spaced.


Create a histogram of all the values and examine them yourself for patterns,
then decide on an appropriate strategy to achieve what you are trying to
accomplish, which you don't bother to say.

Another poster has recommended a Fourier transform, but I think this is
overkill. A histogram approach will work for any case except many integers
with little in common with each other. I don't think this is what you face.
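A minimal Java sketch of the histogram approach, assuming integer pixel baselines already sorted top to bottom and assuming single-spaced pairs are the most common gap on a page. `GapHistogram`, `histogram` and `dominantGap` are hypothetical names. Summing a small window of neighbouring bins, rather than reading off a single bin, lets gaps of 19, 20 and 21 px count as the same spacing.

```java
// Hypothetical sketch of the histogram approach: count the integer pixel
// gaps between adjacent baselines, then take the most populated small
// window of gap values as the single-spacing pitch.
// Assumes the baselines are sorted top to bottom.
final class GapHistogram {

    // counts[g] = number of adjacent pairs whose gap is exactly g pixels;
    // gaps larger than maxGap (or negative) are ignored.
    static int[] histogram(int[] baselines, int maxGap) {
        int[] counts = new int[maxGap + 1];
        for (int i = 1; i < baselines.length; i++) {
            int gap = baselines[i] - baselines[i - 1];
            if (gap >= 0 && gap <= maxGap) {
                counts[gap]++;
            }
        }
        return counts;
    }

    // Gap value whose window [g - halfWindow, g + halfWindow] holds the most
    // pairs; the window absorbs a pixel or two of OCR noise.
    static int dominantGap(int[] counts, int halfWindow) {
        int best = 0;
        int bestSum = -1;
        for (int g = 0; g < counts.length; g++) {
            int sum = 0;
            for (int d = -halfWindow; d <= halfWindow; d++) {
                int j = g + d;
                if (j >= 0 && j < counts.length) {
                    sum += counts[j];
                }
            }
            if (sum > bestSum) {
                bestSum = sum;
                best = g;
            }
        }
        return best;
    }
}
```

Printing the `counts` array for a few real pages is also a quick way to do the "examine them yourself for patterns" step before settling on a strategy.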

>
> Seems like there should be an analytic solution, but auto-correlation
> doesn't seem right. Some kind of quantized best-fit?


Why not state the problem to be solved before hypothesizing about a
solution?

>
> Rather than continuing to guess, does anyone know the name of the
> algorithm for solving this type of problem?


What type of problem is that? You have only discussed one aspect of the data
set, and you haven't stated a problem to be solved at all.

--
Paul Lutus
http://www.arachnoid.com

 
 
Gerald Rosenberg
 
      09-19-2004
In article <ZWh3d.5096$%42.1041@trndny08>,
(E-Mail Removed) om says...
> You should try this post in comp.programming if you want algorithmic
> help without Java-specific experience. Many there are Java guys and many
> are not, but they're there to help with algorithms.
>
>


Thanks, will repost there.

 
 
Gerald Rosenberg
 
      09-19-2004
In article <(E-Mail Removed)>, (E-Mail Removed)
says...
> Gerald Rosenberg wrote:
>
> > Have not been able to Google very well for an answer, since I haven't a
> > usable name for the algorithm/type of problem.
> >
> > In sum, I need to determine the least common denominator for the spacing
> > of a one dimensional array of integers where the integers have a noise
> > component.

>
> Could you state the problem more clearly? Do you need a single LCD for a set
> of integers? Do you need to find the most frequently occurring values in a
> set?
>
> > In practical terms, I have the Y-axis pixel locations of lines of text
> > on a page (which are approximations) and need to determine whether any
> > two adjacent text lines are single spaced, 1.5 spaced, or multiple
> > spaced.

>
> Create a histogram of all the values and examine them yourself for patterns,


Interesting. Will look into that. Thanks.

> then decide on an appropriate strategy to achieve what you are trying to
> accomplish, which you don't bother to say.


Did "need to determine whether any two adjacent text lines are single
spaced, 1.5 spaced, or multiple spaced" not relate what I am trying to
accomplish?

>
> Another poster has recommended a Fourier transform, but I think this is
> overkill. A histogram approach will work for any case except many integers
> with little in common with each other. I don't think this is what you face.
>
> >
> > Seems like there should be an analytic solution, but auto-correlation
> > doesn't seem right. Some kind of quantized best-fit?

>
> Why not state the problem to be solved before hypothesizing about a
> solution?


Sure: In practical terms, I have the Y-axis pixel locations of lines of
text on a page (which are approximations) and need to determine whether
any two adjacent text lines are single spaced, 1.5 spaced, or multiple
spaced.

> >
> > Rather than continuing to guess, does anyone know the name of the
> > algorithm for solving this type of problem?

>
> What type of problem is that? You have only discussed one aspect of the data
> set, and you haven't stated a problem to be solved at all.
>


OK. World peace through analysis of existing imaged document
collections. Documents are imaged, OCR'd, and PDF'd. The PDF is a
given. Now I need to figure out the document structure from an analysis
of the PDF command and data stream.

A big problem, much of it solved. Now I am just tackling a very
specific aspect where I "have the Y-axis pixel [baseline] locations of
lines of text on a page (which are approximations [I.e., contain a noise
component]) and need to determine whether any two adjacent text lines
are single spaced, 1.5 spaced, or multiple spaced."
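Once a single-spacing pitch is available (from a histogram peak or a fit such as the ones suggested in this thread), the classification itself can be as simple as rounding each gap-to-pitch ratio to the nearest half. A hypothetical Java sketch; `SpacingClassifier` and the `Spacing` labels are invented, and treating every ratio of 2.0 or more as "multiple" is an assumption.

```java
// Hypothetical sketch: given an estimated single-spacing pitch, label each
// adjacent pair of baselines by rounding gap / pitch to the nearest 0.5.
// Assumes the baselines are sorted top to bottom.
final class SpacingClassifier {

    enum Spacing { TIGHTER_THAN_SINGLE, SINGLE, ONE_AND_A_HALF, MULTIPLE }

    static Spacing[] classify(int[] baselines, double pitch) {
        Spacing[] out = new Spacing[Math.max(0, baselines.length - 1)];
        for (int i = 1; i < baselines.length; i++) {
            double ratio = (baselines[i] - baselines[i - 1]) / pitch;
            double halves = Math.rint(ratio * 2.0) / 2.0; // nearest 0.5 step
            Spacing s;
            if (halves < 1.0) {
                s = Spacing.TIGHTER_THAN_SINGLE;
            } else if (halves == 1.0) {
                s = Spacing.SINGLE;
            } else if (halves == 1.5) {
                s = Spacing.ONE_AND_A_HALF;
            } else {
                s = Spacing.MULTIPLE; // 2.0 or more
            }
            out[i - 1] = s;
        }
        return out;
    }
}
```

Rounding to the nearest half means a noisy 21 px or 19 px gap on a 20 px pitch still counts as single spaced, while 30 px becomes 1.5 and 40 px becomes multiple.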

No doubt in the realm of mathematics (at least I expect) people have
investigated this class of problem and have proposed generalized
algorithms to solve it. Could not guess the name or a functional
description well enough to find it by Google. Thought that the good
folk here at cljp, in their acknowledged wide ranging knowledge of all
things algorithmic, might know a name for this class of problem, or
provide a pointer to suitable algorithms.
 
 
Paul Lutus
 
      09-19-2004
Gerald Rosenberg wrote:

/ ...

> Did "need to determine whether any two adjacent text lines are single
> spaced, 1.5 spaced, or multiple spaced" not relate what I am trying to
> accomplish?


No, that is a statement of a bit of data you need to solve the problem you
don't state.

>
>>
>> Another poster has recommended a Fourier transform, but I think this is
>> overkill. A histogram approach will work for any case except many
>> integers with little in common with each other. I don't think this is
>> what you face.
>>
>> >
>> > Seems like there should be an analytic solution, but auto-correlation
>> > doesn't seem right. Some kind of quantized best-fit?

>>
>> Why not state the problem to be solved before hypothesizing about a
>> solution?

>
> Sure: In practical terms, I have the Y-axis pixel locations of lines of
> text on a page (which are approximations) and need to determine whether
> any two adjacent text lines are single spaced, 1.5 spaced, or multiple
> spaced.


What problem is this a part of? What good thing are you slowly working
toward by categorizing these line spacings?

--
Paul Lutus
http://www.arachnoid.com

 
 
Gerald Rosenberg
 
      09-19-2004
In article <(E-Mail Removed)>, (E-Mail Removed)
says...
> Gerald Rosenberg wrote:
>
> / ...
>
> > Did "need to determine whether any two adjacent text lines are single
> > spaced, 1.5 spaced, or multiple spaced" not relate what I am trying to
> > accomplish?

>
> > OK. World peace through analysis of existing imaged document
> > collections. Documents are imaged, OCR'd, and PDF'd. The PDF is a
> > given. Now I need to figure out the document structure from an analysis
> > of the PDF command and data stream.
> >


Your suggestion regarding histograms was helpful.

Thanks,
Gerald
 
 
 
 


Similar Threads
Thread Thread Starter Forum Replies Last Post
C++ Gurus - Is C++ a good choice for public API(s)??? Are there cleanways to solve known problems there? arijit79@gmail.com C++ 11 06-14-2009 02:54 PM
Are there any known issues for visiting asp.net pages on Macintosh? Jack ASP .Net 7 10-05-2005 10:00 AM
Is there a package known to render "forum posts" Tony Morris Java 0 03-12-2005 09:07 AM
Are there any known issues with Thunderbird? James Computer Support 1 01-08-2005 03:09 AM
is there any known issue about gcc3.2 handling exception? John Black C++ 1 08-28-2004 06:11 AM


