Unicode has been developed as a standard way of representing the characters of all the world’s languages.
It was designed to make it easier to exchange files between computer users who write in different alphabets, and to make documents that do not use the Western, Latin alphabet readable all over the world, regardless of which program or operating system is used to view them.
Before Unicode was introduced, people used standard eight-bit (one byte) fonts, which could represent only 256 different characters. When creating complex documents which included, for example, special symbols and foreign-language characters, you would have to use several different fonts to get all the required characters.
The problems would start when you emailed such a document to somebody who didn’t have the same fonts installed on their system as you had used to create it. Parts of the document would be unreadable.
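To see why 256 slots run out so quickly, here is a small illustration in Python (Python and the Latin-1 encoding are simply convenient stand-ins for any single-byte character set, and are not part of the RISC OS setup described below):

# A single byte can hold only 256 different values, so a one-byte encoding
# such as Latin-1 can never cover more than 256 characters.
print("café".encode("latin-1"))      # b'caf\xe9' - one byte per character

try:
    "Привет".encode("latin-1")       # Cyrillic has no slot in this table...
except UnicodeEncodeError as err:
    print(err)                       # ...so a second font or encoding was needed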
A further problem arises when two different encodings are used for the same set of characters.
For instance, Cyrillic web pages are often encoded in either KOI8-R or Windows-1251, which both contain the same characters, but in different positions in the font table. If you sent a KOI8-R document, or email, to somebody whose system was set up for Windows-1251, they might not be able to read it.
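A quick Python sketch makes the mismatch visible (again, Python is used here purely for illustration and is not part of the instructions that follow):

# The same Russian word, stored as KOI8-R bytes...
raw = "Привет".encode("koi8_r")
print(raw)                    # b'\xf0\xd2\xc9\xd7\xc5\xd4'

# ...turns into nonsense if the receiving system assumes Windows-1251,
# because the two tables place the Cyrillic letters at different positions.
print(raw.decode("cp1251"))   # 'рТЙЧЕФ' - recognisable letters, wrong word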
Unicode was designed to get around these problems by using more than eight bits to represent a character. By using more than one byte, virtually every symbol or character that exists in any language can be allocated its own unique number (or code point), so that the character is represented by the same number whatever operating system or program you are using, anywhere in the world.
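What ‘its own unique number’ means in practice can be seen in a couple of lines of Python (again, only an illustration):

# Every character has one fixed code point; UTF-8 then stores it in one or more bytes.
for ch in "aбあ":
    print(ch, hex(ord(ch)), ch.encode("utf-8"))
# a   0x61    b'a'             - one byte
# б   0x431   b'\xd0\xb1'      - two bytes
# あ  0x3042  b'\xe3\x81\x82'  - three bytes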
If your system supports Unicode, a word processor or web browser that loads a Unicode document will check each code point in the document against a look-up table, which tells it whether that character is defined in any of its installed fonts. If so, it retrieves the character’s details and renders it to the screen.
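That look-up is essentially what the cmap table inside a font provides. The sketch below uses the Python fontTools library rather than the RISC OS Font Manager, and assumes a file called Cyberbit.ttf is to hand, so treat it only as an illustration of the idea:

# Illustration only: fontTools is a Python library, not the RISC OS Font
# Manager, but the principle is the same - the font carries a table (cmap)
# mapping code points to glyphs, and the renderer consults it.
from fontTools.ttLib import TTFont

font = TTFont("Cyberbit.ttf")           # file name assumed for this example
cmap = font["cmap"].getBestCmap()       # dictionary: code point -> glyph name

for ch in "Aあб":
    found = ord(ch) in cmap
    print(ch, hex(ord(ch)), "glyph found" if found else "not in this font")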
Unicode makes the sharing of documents and web pages across national boundaries much easier, and is a godsend to those who regularly communicate with people who don’t use our alphabet, or who are interested in studying foreign languages.
So how does RISC OS bear up to the challenge of supporting this international standard?
To Install Unicode on RISC OS 4
- First download a copy of the Cyberbit font from here.
- Now download a copy of John-Mark Bell’s font converter from here. Extract the program using !SparkPlug.
- Load TTF2f, and Select-click on its icon to open the dialogue box.
- Drag the Cyberbit file icon into the top writeable box, and change the RISC OS name to Cyberbit.
- Click on the drop-down menu, and select !Boot.Resources!Fonts as the destination for the converted font.
- Now click on the Convert button.
BE WARNED, however, it takes about ten minutes to complete on a StrongARM RiscPC, and, for the first two or three minutes, nothing appears to be happening!
Do NOT assume that the program has crashed, and hit the On/Off button. Just be patient, and wait for it to finish.
- Now download the Fonts (Unicode Font Manager) module from the Video section of the downloads area of the RISC OS Open website, and get ROMFonts as well.
- Extract Fonts and ROMFonts, using !SparkPlug.
- Find the !Boot application on your hard drive, hold the Shift key down, and double click on it.
Double click on Choices, Users, Single, Boot and then PreDesk.
Drag Fonts and ROMFonts into PreDesk.
- Create a little Obey file with the following two lines:

RMEnsure ROMFonts 0.75 RMLoad <Obey$Dir>.ROMFonts
RMEnsure FontManager 3.66 RMLoad <Obey$Dir>.Fonts

and save it in PreDesk. (The two RMEnsure lines make sure the new modules are loaded at every boot, unless versions at least that recent are already present.)
- Do Ctrl-Shift-F12 (or select Shutdown from the Task icon on the iconbar) to reboot your computer.
- Now, load NetSurf and head on over to the webpage of Nippon Television, to check you’ve got it working properly.
If so, you should see something like this:
Unicode and !PDF
In quite an important development, Chris Gransden has released a new version of his popular PDF reader (version 3.02), which can display Unicode characters in PDF files.
To do this, you have to click on the Render as Sprite icon on the PDF toolbar, and wait for the current page to open in Paint.
Previously, you would just have seen rows of ‘garbage’ where the Unicode characters were.
There is still the problem that the resulting sprite file is not nearly as useful for high-quality printing as a Draw file would be.
But it is an important step forward.
Getting Unicode Text into a Word Processor
Although at the moment NetSurf is the only major program on RISC OS that fully supports Unicode, there is one very useful trick that allows you to get Unicode text off the web and into a word processor like EasiWriter.
With the required page loaded in NetSurf, click Menu, and then go Page->Export->Draw.
Load the Draw version of the page, and click on it with the Selection tool.
At the moment, everything on the page is grouped into one big object. But, if you hold down the Control key and press U a few times, the page will begin to break up into separate objects, which allows you to select only the part of the screen that you want, with the Selection tool.
Now click Menu, and do File->Save->Selection, to save just the part of the screen you want to use.
If you want to change the size of the text, double click on the new file, select the text and use Transform->Magnify to do so.
Et voilà! The resulting Draw file can now be dropped into EasiWriter.
Unicode imported into EasiWriter, courtesy of Draw
At the moment we still cannot edit the text, but by using NetSurf’s Export Draw capability we can effectively cut and paste from the internet into EasiWriter. And because Draw is a vector format, we can not only resize the Unicode text, but should also get much better print quality than we would if we simply snapshotted part of the screen with Paint and used the resulting Sprite file in EasiWriter.
For The Future
Although viewing Unicode documents is becoming much easier for RISC OS users, thanks to the efforts of Chris Gransden and the NetSurf team, we still have a long way to go. We still have no easy way of actually creating Unicode content.
What we really need is a !Chars type application, which can be configured to present useful sets of all the Unicode characters, and which allows users to enter them into a document by clicking on a palette. (Luckily, since I started writing this article, Paul Sprangers in the Netherlands, has started work on such a program, called KeyMap).
What we also need is for at least one of the RISC OS word processors (TechWriter being the obvious candidate) to be upgraded so that it can handle Unicode characters internally, and save Unicode documents as well.
At the moment, we can extract blocks of Unicode text out of NetSurf, and drop them into TechWriter, but we cannot edit them.
However, if enough users request this as a feature they’d like to see, maybe this will change . . .
Finally, don’t take continued software development on RISC OS for granted. If you regularly download the latest version of NetSurf from their web site, and bits and pieces from RISC OS Open, make sure you make a donation to one of the bounties set up by the latter, to finance further development of the operating system. And remember to buy one of the £5 CDs from NetSurf, next time you see them at a RISC OS show!
If you want software development on RISC OS to continue, find a way of showing your support for it.
Many thanks to John-Mark Bell of the NetSurf team, and Steve Revill of RISC OS Open for help researching this article.