Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > XML > Parsing XML and storing attributes in MySQL

Reply
Thread Tools

Parsing XML and storing attributes in MySQL

 
 
jkugler
Guest
Posts: n/a
 
      06-12-2006
Hello,

I am trying to store huge amounts of data from xml files and put them
into a MySQL database. The xml files all are in this format:

<?xml version="1.0" encoding="UTF-8"?>
<snp_submission xsi:schemaLocation="http://www.hapmap.org
http://hapmap.cshl.org/xml-schema/2003-1
1-04/hapmap.xsd"
lsid="urn:LSID:ncbi.nlm.nih.gov:dbSNP/DCC_batch:34_chrom11_DELETE:005"
xmlns="http
://www.hapmap.org"
xmlnssi="http://www.w3.org/2001/XMLSchema-instance">
<batch_info>
<class>DELETE</class>
<group lsid="urn:lsid:dcc.hapmap.cshl.org:LabgroupBSNP: 1"/>
<date_created>2005-09-15</date_created>
<contact>
<name>Steve Sherry</name>
<email></email>
</contact>
</batch_info>
<snps>
<snp
lsid="urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs246 1544:111">
<snp_class>
<bac-overlap/>
</snp_class>
<sequence>

<flank_5>caaaggaatataaatcattctattataaagatacatgcacA GGgctgggtgcagtggctcacacctgtaatccc
agcactttgggaggccaaggcgggtggatcacctgaggacaggagtttga gaccagcctagccaacatggggaaactccatctctactaaaaatacaaa
aattagccaggtatagtggtgcacacctgtaataccagctactttggagg ctgaggcaggagaatcgctggaacccaggaggcagaggtcaaagtgagc
caagatcataccattgcactccagcctgggcaacaagagcaaaactccat cttaaaaaaatatatatatatacatatacatacatatatatacacatat
atatacatatatacagatattatatatgtaaatgtatatatatgtgtata tatatacacatatatatacatattataactacatatatatacacacaca
catacatatacatgcacacatatgtttattgcagcactatttacgataga aaatacatggaatcctcccaaatgcccatcaatgatatattggataaag
aaaatgtgatatatattcaccatggaatactatgcagccgttaaaataaa tgagatcatgttctttgcagggacatggatgaagctggaagccatcacc
ctcagcaaactaacacaggaaaagaaaaccaaacaccacatgttctcagt cgtaagagggagttgaacaatgagagcaaacacatggatacatggaggg
gaacaacacacaccagggcctctcagcgggacaggggtaggagaCCATCA GGACAaacacgtggatacatggaggggaacaacacacaccagggcctct
cagggggacagggggtaggagaccatcaagacaaacacgtggatacatgg aggggaacaacacacaccagggcctctcagggggacagggggtaggaga
ccatcaggacaaacacgtggatacatggaggggaa</flank_5>
<variation>
<allele base="C"/>
<allele base="T"/>
</variation>

<flank_3>aacacacaccagggcctctcagggggacagggggtaggaga ccatcaRgacaaacacgtggRtacatggagggg
aacaacacacaccagggcctctcagggggacggggggtagRagaccatca ggacaaatagctaatgcatgcagggcctcatacctaggtgatgggttga
tgggtgcagcaaaccaccatggcacacatttacctatgtatcaaacctaY actttctgcacgtgtatcccagaacataaaataaaatttaaaaaatata
taCACTGATTCATGATCTCCTttctctccttctgaaacactctttaaaac tttttagcatttccccctctgtcttccatgtctcctaactacatgtttc
ttattttccatgtctttattcctgtgttcattttggatagccccttctga cctatattacagtttactagttcactcttcaactgcttctaacatacta
atattctgttaaaaccattcatttgggtttaaatttcaattatgttattc tctatggacattctatttgttttcttttaatcttcttggccattctcta
gagtttcctgttccattatgatatttttaattttttgttttactttaaac atactaaatatagttattttattttattttctgtatctgatactttcaa
taactgcagtctttgctagtcttttttctgtgctcttgctcatagttttt ttcatttgttttCATGATTagaaaaacagagagagaagaaggagagtaa
agggaggaggcggaggaggagaaaagaagaaagcagagaagaagggacag agaaaaaaaggaagTTGGTTCTAACGTTTCTCTAACAACTGGCTTCAGT
GAAACACTCCCACCTTGTGGATTTTTAGGTTATTGAAATTAACCAGTCTT Ctgggtgcagcacaccaacatggcacatgtatacatatgtaacaaacct
gcactttgtgcacatgtaccctaaaacttaaagta</flank_3>
<length_class value="full-length"/>
</sequence>
<genomic_locations>
<genomic_location active="true" type="exact">
<assembly_version>ncbi_build34</assembly_version>
<coordinates chrom="11" start="60749" stop="60749"
strand="forward"/>
</genomic_location>
</genomic_locations>
<neighbour_snps>
<neighbour_snp
lsid="urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs246 1545:101"/>
<neighbour_snp
lsid="urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs246 1546:101"/>
<neighbour_snp
lsid="urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs246 1547:101"/>
<neighbour_snp
lsid="urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs268 6858:100"/>
</neighbour_snps>
</snp>


I am trying to extract the following information: lsid, allele base,
cooordinates chrom, start, and strand. I would ideally like to use a
perl script to pull this data out and put it in MySQL, but I have had
no luck with XML:arser since the information I am looking for is
embedded in the element's attributes.

If anyone has any ideas on how to get this data out of xml and into
MySQL, it would be much appreciated.

Thanks so much,

--James

 
Reply With Quote
 
 
 
 
=?ISO-8859-1?Q?J=FCrgen_Kahrs?=
Guest
Posts: n/a
 
      06-12-2006
jkugler wrote:

> I am trying to store huge amounts of data from xml files and put them
> into a MySQL database. The xml files all are in this format:


This is genomic data, so you probably have
GigaBytes of this stuff. Very few tools are
capable of handling such amounts in acceptable
time with acceptable resource usage.

> I am trying to extract the following information: lsid, allele base,
> cooordinates chrom, start, and strand. I would ideally like to use a
> perl script to pull this data out and put it in MySQL, but I have had
> no luck with XML:arser since the information I am looking for is
> embedded in the element's attributes.


Have a look at what Andrew Schorr did to store
GigaByte of XML data into PostgreSQL:

http://home.vrweb.de/~juergen.kahrs/...nto-PostgreSQL

There is currently no interface for connecting
to MySQL. But Andrew explained how such an interface
should be implemented, when he answered a similar
question today in comp.lang.awk:

> Please take a look at the PostgreSQL API that I implemented
> for xgawk: http://sourceforge.net/projects/xmlgawk. I think
> a Mysql extension next would be nice.
>
> I had thought of building a general xgawk database API
> over the libdbi library (http://sourceforge.net/projects/libdbi),
> but I found that the libdbi API did not offer all of the PostgreSQL
> API features that I needed to access. But as libdbi matures,
> that may be a good layer to build upon.
>
> But xgawk would certainly be a great platform for you to
> use in building a new database access mechanism. Contributions
> are welcome.
>

 
Reply With Quote
 
 
 
 
Joe Kesselman
Guest
Posts: n/a
 
      06-12-2006
If you're dealing with gigabytes of data, you may want to look at IBM's
new XML capabilities in DB2... or at "exploding" the XML into a standard
database representation and operating on that. (XML is a good
interchange format, but often is not the best back-end representation.)

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
 
Reply With Quote
 
=?ISO-8859-1?Q?J=FCrgen_Kahrs?=
Guest
Posts: n/a
 
      06-12-2006
jkugler wrote:

> I am trying to extract the following information: lsid, allele base,
> cooordinates chrom, start, and strand. I would ideally like to use a


I just had a look at the data. This is a structural
outline of your XML snippet:

snps
snp lsid='urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs246 1544:111'
snp_class
bac-overlap
sequence
flank_5
variation
allele base='C'
allele base='T'
flank_3
length_class value='full-length'
genomic_locations
genomic_location active='true' type='exact'
assembly_version
coordinates chrom='11' start='60749' stop='60749' strand='forward'
neighbour_snps
neighbour_snp lsid='urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs246 1545:101'
neighbour_snp lsid='urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs246 1546:101'
neighbour_snp lsid='urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs246 1547:101'
neighbour_snp lsid='urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs268 6858:100'

Why don't you convert it to CSV format and import
the CSV data into your data base ?

> perl script to pull this data out and put it in MySQL, but I have had
> no luck with XML:arser since the information I am looking for is
> embedded in the element's attributes.


You probably mean lines likes this one:

coordinates chrom='11' start='60749' stop='60749' strand='forward'

This is particularly easy to handle in XMLgawk.
I guess it would take about 10 to 20 lines of
XMLgawk to convert your data to CSV format.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
What libraries should I use for MIME parsing, XML parsing, and MySQL ? John Levine Ruby 0 02-02-2012 11:15 PM
web.xml / XML schema issue, why do some XML schema attributes disappear asciz@starmail.com Java 3 02-20-2007 09:56 AM
storing pointer vs storing object toton C++ 11 10-13-2006 11:08 AM
Parsing XML and storing attributes in MySQL using Perl jkugler Perl Misc 5 06-12-2006 11:32 PM
Different results parsing a XML file with XML::Simple (XML::Sax vs. XML::Parser) Erik Wasser Perl Misc 5 03-05-2006 10:09 PM



Advertisments