In article <slrnaji5f4.18l.bjensen@ask.ask>, bjensen@nospam.dk says...
> That's actually an unfortunate thing to have understood, because Java
> is not based on Unicode, but on the proper subset of Unicode called
> the Basic Multilingual Plane (BMP), which can be represented with
> only 16 bits per character.
Java and Unicode (excerpt from Beginning Java 2 SDK 1.4, Ivor Horton/Wrox)
Programming to support languages that use anything other than the Latin
character set has always been a major problem. There are a variety of
8-bit character sets defined for many national languages, but if you
want to combine the Latin character set and Cyrillic in the same
context, for example, things can get difficult. If you want to handle
Japanese as well, it becomes impossible with an 8-bit character set,
because 8 bits only give you 256 different codes, so there just aren't
enough codes for the characters needed by almost all languages. Unicode
solves this problem. It uses a 16-bit code to represent a character (so
each character occupies two bytes), and with 16 bits up to 65,535
non-zero character codes can be distinguished. With so many character
codes available, there is enough to allocate each major national
character set its own set of codes, including character sets such as
Kanji, which is used for Japanese and requires thousands of character
codes. It doesn't end there, though. Unicode supports three encoding
forms that allow up to a million additional characters to be
represented.
As we shall see in Chapter 2, Java source code is in Unicode
characters. Comments, identifiers (names - see Chapter 2), and
character and string literals can all use any characters in the Unicode
set that represent letters. Java also supports Unicode internally to
represent characters and strings, so the framework is there for a
comprehensive international language capability in a program. The
normal ASCII set that you are probably familiar with corresponds to the
first 128 characters of the Unicode set. Apart from being aware that
each character occupies two bytes, you can ignore the fact that you are
handling Unicode characters in the main, unless of course you are
building an application that supports multiple languages from the
outset.
--------------------------------------------
(Don't bother telling me about spelling mistakes; it's just copied from
the book, straight from my eyes to my fingers.)
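A small example of my own (not from the book, so take it as a sketch)
showing the two points from the excerpt: a Java char is 16 bits wide,
and ASCII occupies the first 128 Unicode code points:

public class CharDemo {
    public static void main(String[] args) {
        char x = 'X';                  // 0x58 in ASCII and in Unicode alike
        System.out.println((int) x);   // prints 88
        char oe = '\u00F8';            // ø - outside 8-bit ASCII, but it
        System.out.println((int) oe);  // fits in a 16-bit char; prints 248
    }
}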
Furthermore, chapter 2, page 62, contains a section called:
--------------------------------------------
Character Escape Sequences
If you are using an ASCII text editor, you will only be able to enter
directly those characters that are defined within ASCII. You can define
Unicode characters by specifying the hexadecimal representation of the
character codes in an escape sequence. An escape sequence is simply an
alternative means of specifying a character, often by its code. A
backslash indicates the start of an escape sequence, and you create an
escape sequence for a Unicode character by preceding the four
hexadecimal digits of the character with \u. Since the Unicode coding
for the letter X is 0x0058 (the low-order byte is the same as the ASCII
code), you could also declare and define myCharacter with the
statement:
char myCharacter = '\u0058';
You can enter any Unicode character in this way, although it is not
exactly user-friendly for entering a lot of characters.
--------------------------------------------
It COULD also be that the man is using an ASCII-based editor together
with the command-line compiler, in which case he will run into trouble
with the character sets, since the editor writes ASCII while the
compiler assumes Unicode. If that is the case, try referring directly
to the hex codes for æ, ø and å, and see if that perhaps solves the
problem.
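For illustration (the codes below are from the Latin-1 block of
Unicode; double-check against a table if in doubt), the escapes would
look like this:

char ae = '\u00E6';   // æ
char oe = '\u00F8';   // ø
char aa = '\u00E5';   // å
// uppercase: Æ = '\u00C6', Ø = '\u00D8', Å = '\u00C5'
String s = "bl\u00E5b\u00E6rgr\u00F8d";   // "blåbærgrød"

The escapes work in string literals too, since the compiler substitutes
\u sequences before doing anything else with the source.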
Hope this helps!
--
Jacob Saaby Nielsen