What to Do with Those Russians (Texts)


Intro

As usual, Russians can't do things in a straightforward way. Currently there are four widespread encodings in use (that I know of):
KOI-8
Mostly used on the Internet; widely accepted as the de facto e-mail standard
Alternative PC
MS-DOS users invented it in order to use stolen software and see pseudographics just as in ASCII
Some MS-Windows coding
Which I've little knowledge of
ISO-8859-5
Like the ISO network protocols, a VERY rare beast.
I believe a couple of others exist :)
Naturally, when dealing with Russian text, one can never be sure that the thing is readable. You must accept this strange situation, as you have to accept Russia.
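
To see why readability is never guaranteed, here is a minimal Python sketch that decodes the same bytes under each of the four encodings. The codec names are my assumption about which Python codecs correspond to the names above (koi8_r for KOI-8, cp866 for Alternative PC, cp1251 for the MS-Windows one); the sample word is only an illustration.

```python
# Assumed mapping from the encodings listed above to Python codec names.
CODECS = {
    "KOI-8": "koi8_r",
    "Alternative PC": "cp866",
    "MS-Windows": "cp1251",
    "ISO-8859-5": "iso8859_5",
}


def decode_all(raw):
    """Decode the same bytes under every candidate encoding."""
    return {
        name: raw.decode(codec, errors="replace")
        for name, codec in CODECS.items()
    }


if __name__ == "__main__":
    # Bytes written by someone who uses KOI-8.
    raw = "Дубна".encode("koi8_r")
    for name, text in decode_all(raw).items():
        print(f"{name:15} {text}")
```

Only the line decoded with the encoding the text was written in comes out readable; the other three show different letters or pseudographics.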

As for me, I prefer Unices and therefore KOI.
It's relatively easy to russify Xterm and the X11 Window System in general. Since the most popular Web browsers supporting forms are the various Mosaic flavours, I refer you to the Mosaic localization document, a useful though not exhaustive guide.

If You Are Lazy

There is a quick and dirty method to extract (and to a certain degree read and understand) Russian text with WORA. Apply a function to the character field you want to select from a database. You'll get KOI text with the 8th bit cut, bearing some resemblance to Russian spelling (this was one of the reasons for introducing this encoding: to cure the problems arising from dumb American 7-bit sendmails and their ASCII-centric owners). The obvious candidates are
INITCAP
Capitalizes the first letter of each word
LOWER
Transforms text to lower case
UPPER
Transforms text to upper case
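
What "KOI text with the 8th bit cut" looks like can be imitated outside the database. The following is only a Python sketch of the effect, not of what Oracle or WORA do internally; `strip_high_bit` is a hypothetical helper name, and I assume the KOI8-R variant of KOI-8. The string methods `title`, `lower` and `upper` stand in roughly for INITCAP, LOWER and UPPER.

```python
def strip_high_bit(text):
    """Encode text as KOI8-R (assumed variant) and clear the 8th bit of each byte."""
    raw = text.encode("koi8_r")
    return bytes(b & 0x7F for b in raw).decode("ascii")


if __name__ == "__main__":
    stripped = strip_high_bit("Иван Петров")
    print(stripped)          # letter case comes out inverted
    print(stripped.title())  # rough analogue of INITCAP
    print(stripped.lower())  # analogue of LOWER
    print(stripped.upper())  # analogue of UPPER
```

Cutting the bit swaps letter case (a Russian capital lands on a small Latin letter and vice versa), which is one practical reason to pass the field through a case function.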
Acting as described above, you can also define a condition for information retrieval (specify a "where" clause). Naturally, you should be able to translate a Russian word to 7-bit ASCII. For example, look at this WORA form fragment for the "DUBNA_PHONES" table:
WHERE
    LOWER
      last_name   LIKE   'iwan%'
That will return the records whose "last_name" field begins with "iwan", "IWAN", "IwAn" and similar strings. Note that such names ("iwan") are usually written as "Ivan". It seems complicated, but in most cases you need not think about it.

I'd like to add that search conditions for text inclusion should conform to Oracle's strange notion of regular expressions.
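
For those more used to ordinary regular expressions: in a LIKE pattern, "%" matches any run of characters (including none) and "_" matches exactly one character. The Python sketch below translates such a pattern into a regular expression and imitates a LOWER ... LIKE condition. The function names are hypothetical and the ESCAPE clause is ignored, so take it as an illustration of the matching rules rather than of Oracle itself.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern ('%' and '_' wildcards only) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters, possibly empty
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def lower_like(value, pattern):
    """Imitate the condition LOWER(value) LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value.lower()) is not None


if __name__ == "__main__":
    for name in ["iWAN", "IWANOW", "PETROW"]:
        print(name, lower_like(name, "iwan%"))
```

Note that the whole value must match the pattern, so a pattern without a trailing "%" will not find longer strings.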

See also the document on how to use WORA forms.

