## Theory on Character Encoding

Each character is represented in memory by a certain bit sequence. The terminal interprets that bit sequence to show the corresponding character. This means that the application sending the bits to the terminal must know how the terminal will interpret them, in order to display the correct characters.

ASCII is a very old standard that assigns 7-bit codes (which can be seen as numbers between 0 and 127) to the letters of the English alphabet and many other symbols. Thus, for example, the numerical code 97 (bit pattern 1100001) represents the letter a.

Being so old and widespread, this code is supported by all terminals, all editors, and software in general. That is why the text "Hola" comes out correctly: it is encoded in ASCII in memory as the byte sequence 72, 111, 108, 97, which is sent to the terminal, and the terminal, which also supports the ASCII standard, shows "H", "o", "l", and "a".

The problem appears as soon as you step outside the characters provided by ASCII, such as the symbol ¿, the ñ, or accented vowels, not to mention characters from other alphabets such as Cyrillic, Arabic, Chinese, etc. To encode any alphabet, the current standard is called Unicode, which is not limited to 7-bit codes but uses many more bits (because it can represent millions of possible characters).

## Explanation of your problem

In any case, your problem is not related to Unicode. What is happening here is that your program emits characters to the terminal using an encoding called cp1252, while the terminal interprets them using a different encoding called cp437.

The encoding cp1252 has long been the standard in Windows.
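The byte sequence mentioned above can be checked directly from Java. A minimal sketch (the class name `AsciiDemo` is just for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiDemo {
    public static void main(String[] args) {
        // "Hola" encoded as US-ASCII: one byte per character
        byte[] bytes = "Hola".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(bytes)); // [72, 111, 108, 97]
    }
}
```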
As you can see at https://en.wikipedia.org/wiki/Windows-1252, in this encoding the character ¿ corresponds to the code BF (in that table the codes are shown in hexadecimal; in binary it would be 10111111, and in decimal 191).

On the other hand, the encoding cp437 was the one used by MS-DOS, the first PC operating system, and the one that Windows still keeps for the console. As you can see at https://en.wikipedia.org/wiki/Code_page_437, in this encoding the code 191 represents the character ┐. This, therefore, explains what happens to you. And you can verify that if your messages include eñes or accents, they will also come out wrong (you can entertain yourself with the tables mentioned above to see how everything fits).

## Solution

The simplest solution is to change the code page of the Windows console so that it matches the one your Java program is using. Just type the following in the terminal:

    chcp 1252
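The mismatch itself can be reproduced in Java by decoding the same byte with both code pages. This sketch assumes the JDK ships both charsets, under their usual canonical names `windows-1252` and `IBM437` (the latter is part of the extended charsets included in standard JDK distributions):

```java
import java.nio.charset.Charset;

public class CodePageDemo {
    public static void main(String[] args) {
        byte[] data = { (byte) 0xBF }; // 191 in decimal
        // The same byte, interpreted with two different code pages:
        String asCp1252 = new String(data, Charset.forName("windows-1252")); // "¿"
        String asCp437  = new String(data, Charset.forName("IBM437"));       // "┐"
        System.out.println(asCp1252 + " vs " + asCp437);
    }
}
```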
The command chcp (change code page) lets you modify the way the terminal interprets the codes it receives. As you can imagine, passing 1252 makes it use the Windows encoding, so you will see the program's output correctly whenever the program emits its characters using that same encoding.

You may find other programs that emit Unicode (specifically UTF-8, which is a way of encoding Unicode as byte sequences). Today it is the standard and, as I said, it supports many more alphabets. To make the Windows console support UTF-8 you have to type chcp 65001 (and the terminal should also use a font that has glyphs for the alphabets involved, so you will probably need to change the default font in the terminal through its Properties menu).
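On the program side, you can make the encoding explicit instead of relying on the platform default. A minimal sketch that pairs with `chcp 65001`, using the `PrintStream` constructor that takes a `Charset` (available since Java 10):

```java
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Out {
    public static void main(String[] args) {
        // Wrap stdout in a PrintStream that encodes as UTF-8,
        // matching a console previously set with `chcp 65001`
        PrintStream out = new PrintStream(System.out, true, StandardCharsets.UTF_8);
        out.println("¿Hola, qué tal?");
    }
}
```

The same idea works with `chcp 1252` if you pass `Charset.forName("windows-1252")` instead; the point is that the encoding the program writes with and the one the console reads with must agree.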