Post by outspaced on Jul 17, 2008 22:02:35 GMT
"Well, I suppose I could do more with this. I never intended it to be any real project; it was just supposed to be sample code for people to look at as an example of how to muck about with the XML. Let me think on this a bit."

Well, I meant that since you're looking at explaining how to decode the XML into a usable/readable format, going far enough to show how to decode the non-[a-zA-Z0-9] characters might also be helpful for other potential software developers. But you might have a better plan in mind, so that's only a suggestion.
Post by alderaine on Jul 18, 2008 15:33:27 GMT
When I was playing with trying to read the XML files, I wrote a text-based reader that processes all of the content available in Highway Holocaust (and hence a fair chunk of Lone Wolf). I dealt with the XMLised characters by pure text manipulation:

    Pos = InStr(TempText, "<ch.eacute/>")
    If Pos > 0 Then
        TempText = Left$(TempText, Pos - 1) & "é" & Right$(TempText, Len(TempText) - Pos - 11)
        found = True
    End If

I am absolutely certain there are better ways using XML manipulation / XSLTs / etc - and they will definitely need to be included in the "how to read the AON XML" document.
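
As a rough illustration of one such "better way", the per-character string surgery above can be collapsed into a single pattern-based pass. This is only a sketch in Python, not project code: the function name `replace_character_elements` is made up for the example, and it assumes that the part after `ch.` matches a standard HTML entity name (as `ch.eacute` suggests). Any element it does not recognise is left untouched.

```python
import re
from html.entities import name2codepoint

# Matches empty character elements such as <ch.eacute/> and captures the name.
CH_ELEMENT = re.compile(r"<ch\.([A-Za-z0-9]+)\s*/>")


def replace_character_elements(text):
    """Replace <ch.NAME/> with the character for the HTML entity NAME.

    Assumption: NAME is a standard HTML entity name. Elements whose name
    is not in Python's HTML entity table are left exactly as they were.
    """
    def substitute(match):
        codepoint = name2codepoint.get(match.group(1))
        if codepoint is None:
            return match.group(0)  # unknown name: keep the element as-is
        return chr(codepoint)

    return CH_ELEMENT.sub(substitute, text)


if __name__ == "__main__":
    print(replace_character_elements("caf<ch.eacute/> au lait"))
```

Unlike the one-tag-at-a-time approach, this handles every occurrence of every recognised name in one call, but it still treats the XML as plain text rather than parsing it.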
Ryan
Kai Lord
Posts: 69
Post by Ryan on Jul 19, 2008 8:29:18 GMT
Maybe it's a silly question, but can I ask why these special characters are handled in such a funny way? Can't you just write out the HTML codes for them and be done with it?
Post by Taryn on Jul 19, 2008 19:21:37 GMT
Maybe the program outputs only as unformatted text and not HTML?
Post by alderaine on Jul 21, 2008 9:07:27 GMT
exactly - it was a basic text reader written for a basic job. I have another program that processes the HTML files for a different purpose, simply displaying them as web pages. The point is that we need to provide examples of how to process the XML for a variety of purposes (including entirely text-based platforms)
Post by jsager on Jul 21, 2008 15:33:05 GMT
Can anyone provide me with a list of the "special character" tags that they use?
Post by Ryan on Jul 22, 2008 8:47:25 GMT
Sorry, perhaps I should clarify: Why are they handled in a funny way in the source XML itself? XML offers a way to represent these characters natively, without having to implement new ENTITYs, etc. There should be no need for any special implementation in the XML or the rendering, AFAIK...
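
To make the "natively" point concrete, here is a small Python sketch (the sample strings are invented for the example, not taken from the AON files). A numeric character reference or a literal character in a declared encoding parses without any DTD, whereas a named entity such as `&eacute;` is rejected unless something declares it.

```python
import xml.etree.ElementTree as ET

# A numeric character reference needs no DTD or ENTITY declaration.
by_reference = ET.fromstring("<p>caf&#233;</p>")

# A literal character also works when the document declares its encoding.
literal = ET.fromstring(
    "<?xml version='1.0' encoding='UTF-8'?><p>caf\u00e9</p>".encode("utf-8")
)

# A named entity is only valid if a DTD declares it; without one the
# parser reports an undefined entity.
try:
    ET.fromstring("<p>caf&eacute;</p>")
    named_entity_ok = True
except ET.ParseError:
    named_entity_ok = False

if __name__ == "__main__":
    print(by_reference.text, literal.text, named_entity_ok)
```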
Post by alderaine on Jul 22, 2008 8:55:01 GMT
One for Thomas or Outspaced? I can only imagine it is due to the publishing & translation aspects, but I do not know for certain. I've posted a message to the mailing list to ask for the list of characters.
Post by jsager on Jul 22, 2008 9:30:39 GMT
UTF-16 XML could certainly handle it, I agree it's a bit weird, but I've got bigger fish to fry as long as I have a list I can convert against.
Post by Thomas Wolmer on Jul 22, 2008 12:23:32 GMT
We used to have XML character entities, but we switched to elements. The reason as given in the version history of our DTD: Before this, we had to deal with several DTD includes with different versions of the entities.

Here is our current list of characters-as-elements:

    <!ENTITY % character.content "ch.apos | ch.nbsp | ch.iexcl | ch.cent | ch.pound | ch.curren | ch.yen | ch.brvbar |
      ch.sect | ch.uml | ch.copy | ch.ordf | ch.laquo | ch.not | ch.shy | ch.reg |
      ch.macr | ch.deg | ch.plusmn | ch.sup2 | ch.sup3 | ch.acute | ch.micro | ch.para |
      ch.middot | ch.cedil | ch.sup1 | ch.ordm | ch.raquo | ch.frac14 | ch.frac12 | ch.frac34 |
      ch.iquest | ch.Agrave | ch.Aacute | ch.Acirc | ch.Atilde | ch.Auml | ch.Aring | ch.AElig |
      ch.Ccedil | ch.Egrave | ch.Eacute | ch.Ecirc | ch.Euml | ch.Igrave | ch.Iacute | ch.Icirc |
      ch.Iuml | ch.ETH | ch.Ntilde | ch.Ograve | ch.Oacute | ch.Ocirc | ch.Otilde | ch.Ouml |
      ch.times | ch.Oslash | ch.Ugrave | ch.Uacute | ch.Ucirc | ch.Uuml | ch.Yacute | ch.THORN |
      ch.szlig | ch.agrave | ch.aacute | ch.acirc | ch.atilde | ch.auml | ch.aring | ch.aelig |
      ch.ccedil | ch.egrave | ch.eacute | ch.ecirc | ch.euml | ch.igrave | ch.iacute | ch.icirc |
      ch.iuml | ch.eth | ch.ntilde | ch.ograve | ch.oacute | ch.ocirc | ch.otilde | ch.ouml |
      ch.divide | ch.oslash | ch.ugrave | ch.uacute | ch.ucirc | ch.uuml | ch.yacute | ch.thorn |
      ch.yuml | ch.ampersand | ch.lsquot | ch.rsquot | ch.ldquot | ch.rdquot | ch.minus | ch.endash |
      ch.emdash | ch.ellips | ch.lellips | ch.blankline | ch.percent | ch.thinspace | ch.frac116 | ch.plus">
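
For anyone who wants to convert against this list with a real XML parser instead of string matching, the following Python sketch may be a starting point. It is not the project's own tooling. The function `element_text` and the `EXTRA` table are made up for the example, and every mapping in `EXTRA` is a guess from the element name alone. Names the sketch cannot resolve (for instance `ch.lellips`, `ch.blankline` and `ch.frac116`) are emitted as a visible placeholder rather than guessed.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Guessed mappings for names that have no identically named HTML entity.
# These are assumptions based on the element names, not taken from the DTD.
EXTRA = {
    "apos": "'",
    "ampersand": "&",
    "lsquot": "\u2018",
    "rsquot": "\u2019",
    "ldquot": "\u201c",
    "rdquot": "\u201d",
    "endash": "\u2013",
    "emdash": "\u2014",
    "ellips": "\u2026",
    "percent": "%",
    "plus": "+",
    "minus": "\u2212",
    "thinspace": "\u2009",
}


def element_text(element):
    """Flatten an element to plain text, resolving ch.* child elements."""
    parts = [element.text or ""]
    for child in element:
        if child.tag.startswith("ch."):
            name = child.tag[3:]
            if name in EXTRA:
                parts.append(EXTRA[name])
            elif name in name2codepoint:
                parts.append(chr(name2codepoint[name]))
            else:
                # Unknown character element: keep a visible placeholder.
                parts.append("[" + child.tag + "]")
        else:
            # Ordinary markup: recurse and keep only its text content.
            parts.append(element_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    sample = ET.fromstring("<p>Caf<ch.eacute/> <em>noir</em><ch.ellips/></p>")
    print(element_text(sample))
```

Because the characters are ordinary elements, no DTD has to be loaded to parse the file; the lookup table is the only thing a consumer needs to supply.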
Post by Desert Lynx on Jul 22, 2008 15:51:28 GMT
For example, we have generated LaTeX and PML in the past. Neither of these used Unicode (at the time?), so we needed a way to represent characters in an encoding-agnostic way.
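
A sketch of what "encoding-agnostic" buys in practice: the same character elements can be mapped to a pure-ASCII target such as LaTeX accent commands. This Python example is illustrative only and is not the conversion the project actually used. The `LATEX` table covers just a handful of names, and `to_latex` is a hypothetical helper.

```python
import re

# A few illustrative mappings from character-element names to LaTeX
# commands. The table is deliberately partial and is an assumption.
LATEX = {
    "eacute": r"\'{e}",
    "egrave": r"\`{e}",
    "agrave": r"\`{a}",
    "ccedil": r"\c{c}",
    "ouml": r'\"{o}',
    "ampersand": r"\&",
    "percent": r"\%",
}

CH_ELEMENT = re.compile(r"<ch\.([A-Za-z0-9]+)\s*/>")


def to_latex(text):
    """Replace known <ch.NAME/> elements with ASCII-only LaTeX commands."""
    def substitute(match):
        # Unknown names are left in place so they are easy to spot later.
        return LATEX.get(match.group(1), match.group(0))

    return CH_ELEMENT.sub(substitute, text)


if __name__ == "__main__":
    print(to_latex("d<ch.eacute/>j<ch.agrave/> vu"))
```

Swapping the table for a different one is all it takes to target another output format, which is the advantage of keeping the source free of any particular character encoding.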
Post by Ryan on Jul 23, 2008 10:55:58 GMT
I believe, though, that these have since been moved to support Unicode (correct me if I'm wrong). In fact, most any implementations (worth implementing) nowadays should support Unicode. In any case, though, I understand that it's extra work to convert it back, though it may save developers extra work on the implementation side. Up to you guys, though.
Post by alderaine on Jul 24, 2008 8:42:15 GMT
We can convert it back as part of the Common Navigation Files project if you like - we will have to start updating the XML eventually anyway.