#1
Apologies for the basic XML question...
I have some sales transaction data that is generated from various sources. It is aggregated into a file every day and uploaded to a server. The data is then supposed to be fetched from the individual files and loaded into a database (into a single table). The data in the files is uniform for every record (i.e. date, time, price, item, code, etc.). Nothing fancy.

I have the following questions regarding the use of XML for these files:

1. This data could just as easily be stored in a delimited flat file (as it is now). So what is the advantage of using XML in this case? (I know the advantages of XML in general, but in this case, where the data never changes, I am not sure of the advantage.) In fact, the XML markup bloats the file and increases the transfer time...

2. Should a DTD be used? An XML Schema? Or is there no need for either? I am asking because, if some new information is added to the records in the future, I think the processing script would need to know which version of the XML file is being used and process it accordingly.

3. Should every record in the data be stored as a single node with the data as attributes? (See the example below.) I think this is an age-old dilemma in XML, but I am not sure of the answer...

4. Does the use of XML make the task of loading the records into the database easier? (I think that, using either existing classes or available utilities, there is no real effort in doing this?)

5. Should any type of XML transformation be considered? (Or when...)

Sample flat file data:

TNo, Date, Time, Price, Qty
100, 010107, 1020, 3.2, 7

Should the XML look like this?

<record TNo="100" Date="010107" Time="1020" Price="3.2" Qty="7"></record>

Or like this?

<record>
<TNo>100</TNo>
<Date>010107</Date>
<Time>1020</Time>
<Price>3.2</Price>
<Qty>7</Qty>
</record>

ElderUberGeek
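As a rough illustration of the two layouts asked about above, the sketch below converts the sample flat-file rows into an attribute-style and an element-style XML document using only Python's standard library, then prints the size of each next to the original text. The function names (`read_rows`, `to_attribute_xml`, `to_element_xml`) and the `<records>` root element are assumptions made for this example; they do not come from the original post.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample data from the question (header row plus one record).
FLAT = "TNo, Date, Time, Price, Qty\n100, 010107, 1020, 3.2, 7\n"


def read_rows(text):
    """Parse the delimited text into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [dict(row) for row in reader]


def to_attribute_xml(rows):
    """One <record> per row, with the fields stored as attributes."""
    root = ET.Element("records")  # hypothetical root element name
    for row in rows:
        ET.SubElement(root, "record", row)
    return ET.tostring(root, encoding="unicode")


def to_element_xml(rows):
    """One <record> per row, with the fields stored as child elements."""
    root = ET.Element("records")  # hypothetical root element name
    for row in rows:
        record = ET.SubElement(root, "record")
        for name, value in row.items():
            ET.SubElement(record, name).text = value
    return ET.tostring(root, encoding="unicode")


rows = read_rows(FLAT)
attr_xml = to_attribute_xml(rows)
elem_xml = to_element_xml(rows)

print(attr_xml)
print(elem_xml)
# Character counts: flat file, attribute layout, element layout.
print(len(FLAT), len(attr_xml), len(elem_xml))
```

Both XML forms carry the same information. The element layout repeats every field name twice per record, which is where most of the extra size comes from.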
#2
ElderUberGeek wrote:
> 1. Basically this data could as easily be stored in a delimited flat-
> file (as it is now). So what is the advantage in using XML in this
> case? (I know the advantages of XML in general, but in this case where
> the data never changes, I am not sure of the advantage). Actually, the
> XML markup bloats the file making transfer time more...

Yes, the markup is annoying if you are used to delimited flat files. Are there any Chinese characters in the data? Are there currency symbols like €? If so, XML has the advantage that the handling of Unicode characters is clearly defined.

> 2. Should a DTD be used? A XML Schema? Or no need for this. I am
> asking this because suppose in the future some new information is
> added to the records, I think the processing script would need to know
> what version of the XML file is being used and process accordingly?

DTD is more standard, but restricted in what it can specify. XML Schema is not quite as standard as DTD, but more powerful in what it can specify.

> 3. Should every record in the data be stored as a single node with
> data as attributes? (see example below). I think this is an age old
> dilemma in XML but not sure of the answer...

Yes, it is an old problem. Date, Time, Price and Qty are simple enough to be stored in attributes, but article descriptions may be better placed in the node's text.

> Should XML look like this?
> <record TNo="100" Date="010107" Time="1020" Price="3.2" Qty="7"></record>

Yes.

> Or like this?
> <record>
> <TNo>100</TNo>
> <Date>010107</Date>
> <Time>1020</Time>
> <Price>3.2</Price>
> <Qty>7</Qty>
> </record>

If the tool you are using can handle this easily, you shouldn't worry too much.

Jürgen Kahrs
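To make the point about tooling concrete, here is a minimal sketch that reads either layout with `xml.etree.ElementTree` and inserts the records into a single table. It assumes an in-memory SQLite database; the table name `sales`, its column names and types, the `<records>` root element and the helpers `record_fields` and `load` are all hypothetical choices for illustration. The element sample uses `TNo` for both the opening and closing tag, since XML names are case-sensitive.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Field names as they appear in the XML, in table column order.
FIELDS = ("TNo", "Date", "Time", "Price", "Qty")

ATTR_XML = (
    '<records><record TNo="100" Date="010107" Time="1020" '
    'Price="3.2" Qty="7"/></records>'
)
ELEM_XML = """<records>
  <record>
    <TNo>101</TNo><Date>010107</Date><Time>1025</Time>
    <Price>4.5</Price><Qty>2</Qty>
  </record>
</records>"""


def record_fields(record):
    """Return the field values of one <record>, whichever layout it uses."""
    if record.attrib:  # attribute layout
        return tuple(record.get(name) for name in FIELDS)
    # Element layout: one child element per field.
    return tuple(record.findtext(name) for name in FIELDS)


def load(conn, xml_text):
    """Insert every <record> in xml_text into the hypothetical sales table."""
    root = ET.fromstring(xml_text)
    rows = [record_fields(rec) for rec in root.iter("record")]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "tno INTEGER, sale_date TEXT, sale_time TEXT, price REAL, qty INTEGER)"
)
load(conn, ATTR_XML)
load(conn, ELEM_XML)

for row in conn.execute("SELECT * FROM sales ORDER BY tno"):
    print(row)
```

The date and time are kept as text here so that the leading zero in `010107` survives; converting them to a real date type would need the date format to be confirmed first.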
#3
On 1 Jul, 17:31, ElderUberGeek <aribl...@gmail.com> wrote:
> I have some sales transaction data that is being generated from
> various sources. This is aggregated in a file every day and uploaded
> to a server.

When I do this, I tend to use either RSS or Atom (I'd suggest Atom for new work). They're both XML; it's just that they've already defined much of the DTD/Schema I need.

With RSS / Atom + Dublin Core I find that I can solve many of my similar problems without needing to write any (or much) new code.

Andy Dingley