newbie question on sequencing

 
 
Mar Thomas
      08-26-2003
Here's my problem. I have an XML file that looks like this:
<myfile>
<parent num="1.00">
<child num="1.01"/>
<child num="1.02"/>
<child num="1.03"/>
</parent>
<parent num="1-a.00">
<child num="1-a.01"/>
<child num="1-a.02"/>
<child num="1-a.03"/>
</parent>
<parent num="A">
<child num="a"/>
<child num="b"/>
<child num="c"/>
</parent>
</myfile>

You will notice that the numbering scheme changes for every parent
element. How can I find out:

1. What the sequence is for each element
2. Whether any numbers are missing from each of the sequences

Can my XML parser help me get this info? I don't know where to start.

Thanks


 
 
 
 
 
Roedy Green
      08-26-2003
On Tue, 26 Aug 2003 14:31:23 -0400, "Mar Thomas" wrote:

><parent num="1.00">
> <child num="1.01"/>
> <child num="1.02"/>
> <child num="1.03"/>
></parent>
><parent num="1-a.00">
> <child num="1-a.01"/>
> <child num="1-a.02"/>
> <child num="1-a.03"/>
></parent>
><parent num="A">
> <child num="a"/>
> <child num="b"/>
> <child num="c"/>
></parent>


Let's break the problem in two. Problem one: extract the sequence you
want to analyse from the XML, e.g. "1.01", "1.02", "1.03" or "a", "b",
"c".
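
The extraction step could be sketched as follows. This is a minimal sketch that assumes the XML is well-formed (quoted attribute values, closed child elements) and uses the standard JAXP DOM parser; the class name SequenceExtractor and the method extract are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects the num attributes, one sequence per <parent>.
class SequenceExtractor {

    /** Maps each parent's num to the num values of its <child> descendants, in document order. */
    static Map<String, List<String>> extract(String xml) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Covers parser configuration, I/O and well-formedness errors alike.
            throw new IllegalArgumentException("could not parse XML", e);
        }
        Map<String, List<String>> sequences = new LinkedHashMap<>();
        NodeList parents = doc.getElementsByTagName("parent");
        for (int i = 0; i < parents.getLength(); i++) {
            Element parent = (Element) parents.item(i);
            List<String> nums = new ArrayList<>();
            NodeList children = parent.getElementsByTagName("child");
            for (int j = 0; j < children.getLength(); j++) {
                nums.add(((Element) children.item(j)).getAttribute("num"));
            }
            sequences.put(parent.getAttribute("num"), nums);
        }
        return sequences;
    }

    public static void main(String[] args) {
        String xml = "<myfile>"
                + "<parent num=\"1.00\"><child num=\"1.01\"/><child num=\"1.02\"/><child num=\"1.03\"/></parent>"
                + "<parent num=\"A\"><child num=\"a\"/><child num=\"b\"/><child num=\"c\"/></parent>"
                + "</myfile>";
        System.out.println(extract(xml)); // {1.00=[1.01, 1.02, 1.03], A=[a, b, c]}
    }
}
```

The parser only hands back the attribute strings; deciding what kind of sequence they form is the separate analysis step.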

Now for the analysis:

1. Use a regex to see whether a sequence follows a known pattern. Apply
the regex for each of your patterns to each value in turn. See
http://mindprod.com/jgloss/regex.html

2. Now that you have identified the pattern, you can write a generator
that produces the expected value from the previous value. If the actual
value doesn't match the expected one, you have a break.
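
The two steps above might look something like this in Java. It is only a sketch: the pattern names, the regexes and the class SequenceChecker are illustrative guesses, and the only patterns it knows are "fixed prefix ending in a dot, then a decimal counter" and "single lowercase letter".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: step 1 classifies a sequence, step 2 generates expected values.
class SequenceChecker {

    // Candidate patterns, tried in order. Group 1 is the static prefix, group 2 the counter.
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("prefixed-decimal", Pattern.compile("(.*\\.)(\\d+)")); // 1.01, 1-a.02
        PATTERNS.put("lowercase-letter", Pattern.compile("()([a-z])"));      // a, b, c
    }

    /** Step 1: name of the first pattern that every value matches, or null if none does. */
    static String classify(List<String> values) {
        for (Map.Entry<String, Pattern> e : PATTERNS.entrySet()) {
            boolean all = true;
            for (String v : values) all &= e.getValue().matcher(v).matches();
            if (all) return e.getKey();
        }
        return null;
    }

    /** Step 2: the value expected to follow prev under the named pattern. */
    static String next(String patternName, String prev) {
        Matcher m = PATTERNS.get(patternName).matcher(prev);
        if (!m.matches()) throw new IllegalArgumentException(prev);
        String prefix = m.group(1);
        String counter = m.group(2);
        if (patternName.equals("lowercase-letter")) {
            if (counter.equals("z")) throw new IllegalStateException("no rule for what follows z");
            return prefix + (char) (counter.charAt(0) + 1);
        }
        // Keep the zero padding of the counter: "01" -> "02", "09" -> "10".
        String format = "%0" + counter.length() + "d";
        return prefix + String.format(Locale.ROOT, format, Long.parseLong(counter) + 1);
    }

    /** Indexes at which a value differs from what the generator expected. */
    static List<Integer> breaks(List<String> values) {
        String name = classify(values);
        if (name == null) throw new IllegalArgumentException("no known pattern");
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i < values.size(); i++) {
            if (!next(name, values.get(i - 1)).equals(values.get(i))) result.add(i);
        }
        return result;
    }
}
```

For example, breaks on "1.01", "1.02", "1.04" reports index 2, because the generator expected "1.03" there.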
--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
 
 
 
 
 
Brad BARCLAY
      08-27-2003
Mar Thomas wrote:

> You will notice that the numbering scheme changes for every parent
> element. How can I find out:
>
> 1. What the sequence is for each element
> 2. Whether any numbers are missing from each of the sequences
>
> Can my XML parser help me get this info? I don't know where to start.


About all that an XML parser is going to be able to give you is the
data itself. The parser neither knows nor cares what the data actually
means, so long as it conforms to the relevant DTD.

I assume you're trying to determine the sequencing programmatically?

There are two things you really need to accomplish here -- the first is
a regular expression that encompasses the "language" of the values, and
the second is to create a dictionary ordering for the value elements so
that you can properly increment them.

For the first, you'll want to start by defining the relevant alphabet
for the language. To do this, you'll want to inspect the elements and
identify:

- The letter elements
- The numerical elements
- The symbol elements

Before you go much further, try to determine whether or not there are
going to be _any_ rules for the numbering -- i.e. are non-alphanumerics
considered static separators that never change, or can they too be
incremented? If the former, things are a bit easier -- if a
non-alphanumeric occurs in the numbering, it will keep the same
"position" throughout all members, which makes the regular expression
defining that numbering easier to construct. If they can be an
active element of the numbering, things are somewhat more difficult.

As well, you'll have to try to determine what is to happen when a
letter or number identifier reaches its maximum value for the given
number of digits. For example, if you have the following numbering in
your XML file:

'1'
'2'
'3'

...we can probably safely assume that '4' is next. But what comes
after '9'? Will it be '10' (adding another digit where one didn't exist
before), 'A' (staying at a single digit, but either switching to
letters _or_ assuming a hexadecimal representation), or will this not
be allowed?


The same goes for letters. What comes after 'z'? 'A'? 'aa'?
Undefined? Nothing?
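
To make one of those choices concrete, here is a sketch of the 'aa' policy only: carry leftwards like a decimal counter and grow by one letter once every position is 'z'. The class LetterSuccessor is hypothetical, and the policy is an assumption rather than anything the XML tells you.

```java
// Hypothetical helper implementing ONE possible rule for what follows 'z'.
class LetterSuccessor {

    /** Successor of a lowercase word: "a" -> "b", "az" -> "ba", "z" -> "aa". */
    static String next(String s) {
        if (!s.matches("[a-z]+")) throw new IllegalArgumentException("lowercase a-z only: " + s);
        char[] c = s.toCharArray();
        int i = c.length - 1;
        // Wrap trailing 'z' characters to 'a' and carry to the left.
        while (i >= 0 && c[i] == 'z') {
            c[i] = 'a';
            i--;
        }
        if (i < 0) return "a" + new String(c); // every letter was 'z', so add a position
        c[i]++;
        return new String(c);
    }
}
```

Switching to the 'A' or "not allowed" policies would only change what happens in the branch where every letter was 'z'.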

If you're working with numerical values, are you going to assume
they're decimal? If you only have the numbers 1 - 3 as input, as above,
you could be working with octal values, where there are no '8' or '9'
digits. If you know for certain that only decimal values will be
allowed, this makes such issues quite a bit easier.
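
A small sketch of why the radix matters, using only java.lang calls (Character.digit, Long.parseLong, Long.toString). The class RadixGuess and its method names are invented for illustration; the sample values can only rule out radixes that are too small, they can never tell you which larger radix was intended.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper showing how the assumed radix changes the successor.
class RadixGuess {

    /** Smallest radix (2 to 36) in which every character of every value is a valid digit. */
    static int minRadix(List<String> values) {
        int maxDigit = 1; // a radix below 2 is meaningless
        for (String v : values) {
            for (char ch : v.toCharArray()) {
                int d = Character.digit(ch, 36);
                if (d < 0) throw new IllegalArgumentException("not a digit: " + ch);
                maxDigit = Math.max(maxDigit, d);
            }
        }
        return maxDigit + 1;
    }

    /** Successor of value when read in the given radix; leading zeros are not preserved. */
    static String successor(String value, int radix) {
        return Long.toString(Long.parseLong(value, radix) + 1, radix);
    }

    public static void main(String[] args) {
        System.out.println(minRadix(Arrays.asList("1", "2", "3"))); // 4: any radix from 4 up fits
        System.out.println(successor("7", 8));  // 10
        System.out.println(successor("7", 10)); // 8
        System.out.println(successor("9", 16)); // a
    }
}
```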

All of these factors will determine how you construct your regular
expression which, if you don't have any rules, can be a difficult
thing to build algorithmically. To correctly achieve the ends you
desire, it's not enough to create an expression that accepts the values
present and the values presumed: ".*" will accept your values (and
everything else while you're at it). What you need to do is create an
expression which accepts _exactly_ your language -- i.e. it will accept
all the allowable elements of the language, but nothing that isn't part
of the language.
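
As a rough illustration of building an expression from the values themselves, the sketch below maps each character of a sample to a character class (digit, lowercase letter, uppercase letter) and treats everything else as a literal separator. The class PatternInference and its rules are assumptions made for this example. It is tighter than ".*", but still imprecise in the way described above: for "1-a.01" it would also accept "1-b.01", since it cannot know the 'a' was meant to be fixed.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: guesses a regex "shape" from sample values.
class PatternInference {

    /** Regex source for one sample: digit -> \d, a-z -> [a-z], A-Z -> [A-Z], anything else quoted literally. */
    static String shapeOf(String sample) {
        StringBuilder sb = new StringBuilder();
        for (char ch : sample.toCharArray()) {
            if (ch >= '0' && ch <= '9') sb.append("\\d");
            else if (ch >= 'a' && ch <= 'z') sb.append("[a-z]");
            else if (ch >= 'A' && ch <= 'Z') sb.append("[A-Z]");
            else sb.append(Pattern.quote(String.valueOf(ch)));
        }
        return sb.toString();
    }

    /** The shape shared by all samples, or null if they disagree or the list is empty. */
    static Pattern infer(List<String> samples) {
        String shape = null;
        for (String s : samples) {
            String current = shapeOf(s);
            if (shape == null) shape = current;
            else if (!shape.equals(current)) return null;
        }
        return shape == null ? null : Pattern.compile(shape);
    }
}
```

Given "1.01", "1.02", "1.03", the inferred pattern accepts "1.04" and rejects "banana", whereas ".*" accepts both.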

Once you have those in place, you can use them to ensure that the
elements are consistent with the language they appear to be part of.

The next step is to have some dictionary rules in place for
incrementing and comparison. Assuming the right-hand-digit
incrementing scheme that the common numerical systems use, this will be
easy -- you can use a straight ASCII increment for all elements that
aren't (static) separators, incrementing just as you would if you were
working with decimal numbers. To verify that the elements present do
indeed form a series, simply read the first value, increment it by one,
and check whether the result equals the next value. If it does, it's in
sequence. If not, it's not (or you've made an incorrect assumption as
to the values).
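
A minimal sketch of that ASCII increment, under several assumptions that the XML itself cannot confirm: digits are decimal, letters wrap within their own case, non-alphanumerics are static separators, a carry passes over separators to the next alphanumeric on the left, and the value never grows in length. The class AsciiIncrement is hypothetical.

```java
import java.util.List;

// Hypothetical helper: right-hand-digit increment that skips static separators.
class AsciiIncrement {

    static boolean isAsciiAlnum(char ch) {
        return (ch >= '0' && ch <= '9') || (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }

    /** "1.01" -> "1.02", "1.09" -> "1.10", "1-a.03" -> "1-a.04", "c" -> "d". */
    static String increment(String value) {
        char[] c = value.toCharArray();
        for (int i = c.length - 1; i >= 0; i--) {
            char ch = c[i];
            if (!isAsciiAlnum(ch)) continue;   // static separator: leave untouched
            if (ch == '9') c[i] = '0';         // wrap and carry to the left
            else if (ch == 'z') c[i] = 'a';
            else if (ch == 'Z') c[i] = 'A';
            else {
                c[i]++;                        // plain ASCII increment, no carry needed
                return new String(c);
            }
        }
        throw new IllegalStateException("no rule for growing " + value);
    }

    /** Index of the first value that is not the increment of its predecessor, or -1 if none. */
    static int firstBreak(List<String> values) {
        for (int i = 1; i < values.size(); i++) {
            if (!increment(values.get(i - 1)).equals(values.get(i))) return i;
        }
        return -1;
    }
}
```

A result of -1 from firstBreak only means the values are consistent with these particular assumptions.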

You've asked a very difficult set of questions -- ones which have no
specific answers (and no real "optimal" answer). For any "word" in a
language, there are an infinite number of grammars that can contain that
"word", most of which will also contain invalid values, and many of
which will reject valid values in the same language. You're trying to
divine a whole language based on a few elements. The only way you can
be precise in this instance is if you assume that those values are the
_only_ acceptable values in the language, and you construct a regular
expression that accepts exactly and only those values -- which doesn't
appear to be what you want.

The long and the short of it is that, unless you have some really
explicit rules, or create an XML element (or attribute) where the
developer can define the regular expression in use for their numbering
language, any solution you come up with is going to be imprecise, and
may be error-prone with certain types of numberings.

(It should also be noted here that there are a lot of languages which
regular expressions _cannot_ define. These include anything that
requires some form of "memory" between states -- something which would
need a more powerful formalism, such as a context-free grammar, instead
of a finite automaton.)

HTH!

Brad BARCLAY

--
=-=-=-=-=-=-=-=-=
From the OS/2 WARP v4.5 Desktop of Brad BARCLAY.
The jSyncManager Project: http://www.jsyncmanager.org


 