Velocity Reviews - Computer Hardware Reviews



XML, JDom and regular expressions ...

Pimousse
Hi everybody,

I'm helping a friend with a parsing problem using JDOM. Since we speak Latin
languages, our XML files contain characters like "" or "".
So far, no problem.

But we have to modify XML files that use encodings which, as far as we can
tell, don't support these characters, such as UTF-8 (but not only that one).
In fact, our company reused XML files previously developed by another
(non-Latin) company. Inserting data was not a problem, but modifying it
today isn't so easy: with this configuration JDOM throws an exception, even
if we add these lines:

Format format = Format.getPrettyFormat();

So we're not able to build a JDOM document, and therefore we can't modify
our files!
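If the actual encoding of the bytes is known, the declaration may not need to be rewritten at all: the SAX `InputSource` documentation states that when a character stream is supplied, the parser reads it directly and disregards any encoding declaration in it. A minimal sketch using the JDK's own DOM parser (JDOM's `SAXBuilder` has a comparable `build(Reader)` overload); the class name `ReaderParse` is hypothetical, and so is the assumption that the files really contain ISO-8859-1 bytes:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ReaderParse {

    /**
     * Parses XML bytes with an explicitly chosen charset. The parser receives
     * characters, not bytes, so the encoding="..." declaration is not used to decode them.
     */
    static Document parse(InputStream in, Charset actualCharset) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new InputStreamReader(in, actualCharset)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file content: declared as utf-8 but actually stored as ISO-8859-1.
        byte[] bytes = "<?xml version=\"1.0\" encoding=\"utf-8\"?><a>caf\u00e9</a>"
                .getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1);
        // The accented character is decoded correctly despite the wrong declaration.
        System.out.println("caf\u00e9".equals(doc.getDocumentElement().getTextContent())); // prints true
    }
}
```

Decoding these bytes as UTF-8 would fail on the lone 0xE9 byte, which is exactly the kind of exception described above.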

So we decided to replace the line:
<?xml version="1.0" encoding="utf-8"?> (for example)
with something like:
<?xml version="1.0" encoding="iso-8859-1"?>

But since we can't know the encoding before reading the file, we decided
to use a regular expression.
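The declared encoding can be read before the real one is known: the XML declaration contains only ASCII characters, and ISO-8859-1 maps every byte to exactly one character, so decoding the first bytes of the file as ISO-8859-1 cannot fail. A minimal sketch, assuming UTF-8 or another ASCII-compatible encoding (UTF-16 files would need separate handling); the class `DeclaredEncoding` and its method `of` are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DeclaredEncoding {

    // Matches encoding="..." or encoding='...' inside the <?xml ... ?> declaration.
    // The name follows the XML EncName rule: a letter, then letters, digits, '.', '_' or '-'.
    private static final Pattern ENCODING = Pattern.compile(
            "<\\?xml\\s[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Returns the encoding named in the XML declaration, if there is one. */
    static Optional<String> of(byte[] head) {
        // ISO-8859-1 decodes any byte, so the ASCII-only declaration survives intact.
        String prolog = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(prolog);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        byte[] head = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<root/>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(of(head).orElse("(none declared)")); // prints utf-8
    }
}
```

In practice `head` would be the first few hundred bytes of the file.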

As I'm more skilled in PHP than in Java, I developed this pattern in
PHP (tested and working):
which should be replaced by:

But I can't manage to translate it into Java.
Using this syntax:

String pattern = "(<\\?xml[^>]+encoding=\")([^>]+)(\"?[^>]+\\?>)";
String replace = "\\1iso-8859-1\\3";

Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(string);
string = m.replaceAll(replace);

does not work ...

Can someone help me translate my pattern from PHP syntax to Java
syntax?


PS: I have already read ....
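In the Java snippet above, the problem is the replacement string rather than the pattern: `Matcher.replaceAll` refers to captured groups as `$1`, `$2`, ..., and a backslash in the replacement only escapes the next character, so `\1` produces a literal `1`. A minimal sketch of the translation follows. As an assumption of this sketch (not part of the original pattern), the captured value is narrowed to `[^"']+`, so that a following `standalone="..."` attribute is not swallowed into the encoding group; the class `EncodingRewriter` and method `rewrite` are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodingRewriter {

    // Group 1: everything up to and including the opening quote of the encoding value.
    // Group 2: the encoding name itself.
    // Group 3: the closing quote (the rest of the declaration is left untouched).
    private static final Pattern DECL_ENCODING = Pattern.compile(
            "(<\\?xml\\s[^>]*?\\sencoding\\s*=\\s*[\"'])([^\"']+)([\"'])");

    /** Replaces the encoding named in the XML declaration with newEncoding. */
    static String rewrite(String xml, String newEncoding) {
        Matcher m = DECL_ENCODING.matcher(xml);
        // Java uses $n, not a backslash, for group references; quoteReplacement
        // protects any '$' or backslash that might appear in newEncoding.
        return m.replaceFirst("$1" + Matcher.quoteReplacement(newEncoding) + "$3");
    }

    public static void main(String[] args) {
        String line = "<?xml version=\"1.0\" encoding=\"utf-8\"?>";
        System.out.println(rewrite(line, "iso-8859-1"));
        // prints <?xml version="1.0" encoding="iso-8859-1"?>
    }
}
```

`replaceFirst` is used because a document has at most one XML declaration.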

Mike Lischke
Pimousse wrote

>Then we decided to modify the line :
><?xml version="1.0" encoding="utf-8"?> (for example)
>by something like :
><?xml version="1.0" encoding="iso-8859-1"?>

Why on earth would you want to switch from Unicode to ANSI when dealing with several languages? That is exactly the wrong direction unless you are forced to use ANSI (Latin-1 or whatever). I recommend using UTF-8 instead. Saving a file in UTF-8 with JDOM is a bit tricky, but it is possible and works like a charm once you know how.
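The trick is not spelled out here; one common pitfall (an assumption about what is meant) is writing through a `java.io.FileWriter`, which uses the platform default charset, so the bytes on disk disagree with the `encoding="UTF-8"` declaration. With JDOM the usual approach is to set the encoding on the `Format` (`Format.setEncoding("UTF-8")`) and pass an `OutputStream`, not a `Writer`, to `XMLOutputter.output`, so the outputter encodes the bytes itself. Since JDOM is not part of the JDK, the self-contained sketch below shows the same idea with the JDK's `javax.xml.transform` API; the class `Utf8Writer` and the sample content are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class Utf8Writer {

    /** Serializes doc to out as UTF-8; the transformer writes both the bytes and the declaration. */
    static void writeUtf8(Document doc, OutputStream out) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        // Passing an OutputStream (not a Writer) lets the transformer encode the bytes itself,
        // so they always match the encoding="UTF-8" declaration.
        t.transform(new DOMSource(doc), new StreamResult(out));
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("city");
        root.setTextContent("S\u00e3o Paulo"); // non-ASCII sample text
        doc.appendChild(root);

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeUtf8(doc, buffer);
        System.out.println(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In a real program `out` would be a `FileOutputStream`.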



Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
java.lang.NoSuchMethodError: org.jdom.Element: method getParent()Lorg/jdom/Element | Tinker | Java | 4 | 10-09-2005 03:12 PM
JDOM: java.lang.NoClassDefFoundError: org/jdom/Content | Bernd Oninger | Java | 4 | 06-21-2004 09:08 PM
JDOM: java.lang.NoClassDefFoundError: org/jdom/Content | Bernd Oninger | XML | 3 | 06-21-2004 09:08 PM
Add custom regular expressions to the validation list of available expressions | Jay Douglas | ASP .Net | 0 | 08-15-2003 10:19 PM
Help with JDOM, turn org.jdom.Document -> org.w3c.dom.Document? | Wendy S | Java | 1 | 08-04-2003 11:48 PM