XML serialization with JAXP - help ... :-)
From : Michael Berg


Date : 22-02-04 13:39

Hi all!

I have posted a message in comp.lang.java.programmer about serializing
DOM objects via JAXP, but so far I haven't received any replies. So I'm
hoping that somewhere in this group there is an xml guru who can help me
with it.

The post itself is in English - live with it ..

Regards, Michael

###

I'm trying to serialize an xml document with JAXP. The xml may or may not
contain international characters, and so I want any text elements to be
UTF-8 encoded. Consider the following (a brief summary is included below the
code):

---- code begin ----

org.w3c.dom.Document doc = javax.xml.parsers.DocumentBuilderFactory
    .newInstance().newDocumentBuilder().newDocument();

org.w3c.dom.Element el = doc.createElement("element");
el.setAttribute("attr1", "attr1value");
el.appendChild(doc.createTextNode("Danish < æøå > characters!"));
doc.appendChild(el);

javax.xml.transform.TransformerFactory transformerFactory =
    javax.xml.transform.TransformerFactory.newInstance();
javax.xml.transform.Transformer transformer =
    transformerFactory.newTransformer();

transformer.setOutputProperty(javax.xml.transform.OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

java.io.StringWriter xmlout = new java.io.StringWriter();
javax.xml.transform.stream.StreamResult result =
    new javax.xml.transform.stream.StreamResult(xmlout);
transformer.transform(new javax.xml.transform.dom.DOMSource(doc), result);

System.out.println(xmlout.getBuffer());

---- code end ----

So, I'm creating a document (DOM), setting an attribute and appending a text
node with international characters (and a couple of brackets just for fun).
Then I create a transformer instance, I ask it to indent the output nicely
and finally to actually serialize my DOM into xml.

When I run this code (in a jsp file on a tomcat 4.1.x server with the latest
xerces2-j version installed) I get this output:

<?xml version="1.0" encoding="UTF-8"?>
<element attr1="attr1value">Danish &lt; æøå &gt; characters!</element>

Okay. So I got the < and > escaped as I expected. However, the international
characters have not been encoded to UTF-8 or anything else for that matter.
In fact, the above isn't even a valid xml document, and several parsers I
tried (including Microsoft XML) reject it because of the illegal character
data.

Clearly there is a mismatch between what the xml encoding declaration
specifies (UTF-8) and what's actually appearing in the text nodes of the
document. It's very curious that JAXP will transform a DOM
into a result that isn't valid.
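
One possible reading of the "illegal character data" complaint, offered here as an assumption and not something the post confirms: the characters in the string are fine, but somewhere on the way out they get written as single-byte Latin-1 (ISO-8859-1) while the declaration promises UTF-8. The sketch below only shows that "æøå" encoded as ISO-8859-1 is not a well-formed UTF-8 byte sequence, which is the kind of thing a strict parser would reject. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Returns true if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Unicode escapes for æøå, so the source file's own encoding doesn't matter.
        String text = "\u00e6\u00f8\u00e5";
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.UTF_8)));      // true
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.ISO_8859_1))); // false
    }
}
```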

Interestingly, when I run the same code interactively inside my WebSphere
Studio Application Developer 5 (using what is known as a scrapbook page), I
get this:

<?xml version="1.0" encoding="UTF-8"?>
<element attr1="attr1value">Danish &lt; &#230;&#248;&#229; &gt; characters!</element>

Well. I'm not sure that #230 is a correct UTF-8 encoding of "æ" (in fact I'm
sure it isn't), but at least the document is now valid and even Microsoft
XML will parse it without complaints.
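
For reference, &#230; is an XML numeric character reference: it names the character by its Unicode code point (230 decimal, U+00E6) using plain ASCII, so it reads the same regardless of the document encoding. The UTF-8 encoding of "æ" is a different thing, namely a two-byte sequence. A small sketch that prints both; the class and helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class CharRefDemo {
    // Returns the UTF-8 bytes of a string as space-separated hex pairs.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ae = "\u00e6"; // the letter æ, written as an escape
        System.out.println((int) ae.charAt(0)); // 230, the number used in &#230;
        System.out.println(utf8Hex(ae));        // C3 A6, the actual UTF-8 bytes
    }
}
```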

I am hoping that someone out there can shed some light on this problem and
tell me what I am doing wrong. Exactly how do I instruct JAXP to encode the
text nodes in my DOM so that it doesn't break my XML parser?

Regards,
Michael Berg
www.hyperpal.com




 
 
Michael Berg (22-02-2004)
Comment
From : Michael Berg


Date : 22-02-04 19:09

Hi all,

The problem is caused by using a StringWriter to collect the XML output.
A StringWriter holds characters rather than encoded bytes, so nothing in it
is actually UTF-8 encoded; the encoding is only decided later, wherever the
string gets written out. Use an OutputStreamWriter instead - for example:

java.io.ByteArrayOutputStream baos = new java.io.ByteArrayOutputStream();
javax.xml.transform.stream.StreamResult result =
    new javax.xml.transform.stream.StreamResult(
        new java.io.OutputStreamWriter(baos, "UTF-8"));

// after transform(), baos.toString("UTF-8") contains the XML
// (a plain baos.toString() would decode with the platform default charset)

/Michael
www.hyperpal.com
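
A self-contained version of the same idea, as a sketch rather than a definitive fix: it rebuilds the small DOM from the question, serializes it through an OutputStreamWriter into a byte array, and parses the bytes back to check that the text survives. Setting OutputKeys.ENCODING and flushing the writer explicitly are additions that are not in the snippet above; the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Utf8SerializeDemo {
    // Escapes keep this source file independent of the compiler's file encoding.
    static final String TEXT = "Danish < \u00e6\u00f8\u00e5 > characters!";

    // Builds the DOM from the question and serializes it to UTF-8 bytes.
    static byte[] serialize() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element el = doc.createElement("element");
            el.setAttribute("attr1", "attr1value");
            el.appendChild(doc.createTextNode(TEXT));
            doc.appendChild(el);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Keep the declared encoding in line with the writer's charset.
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            Writer out = new OutputStreamWriter(baos, StandardCharsets.UTF_8);
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            out.flush(); // make sure buffered characters reach the byte stream
            return baos.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    // Parses the bytes back and returns the text of the root element.
    static String readBack(byte[] xml) {
        try {
            Document parsed = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(new ByteArrayInputStream(xml));
            return parsed.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = serialize();
        // Decode with the same charset that was used for encoding.
        System.out.println(new String(xml, StandardCharsets.UTF_8));
        System.out.println(TEXT.equals(readBack(xml)));
    }
}
```

If the XML ends up in an HTTP response or a file, writing the byte array as-is avoids a second, possibly different, character encoding step.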



