Levels of language support on the web
- Standard representation for text
- Typically this is Unicode (UTF-8, UTF-16)
- It may be a defined standard, e.g. Shift-JIS, ISO-8859-1
- Font encodings (hacked fonts with replaced glyphs) are unlikely to be supported.
Example: Burmese text in the non-standard Zawgyi encoding.
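As a minimal sketch of why a defined standard encoding is workable, the Python below converts Shift-JIS and ISO-8859-1 bytes to UTF-8 using only the standard library codecs. The helper name `to_utf8` and the sample strings are illustrative assumptions, not part of the original outline. A font encoding such as Zawgyi cannot be handled this way, because it reuses Unicode Myanmar code points with different meanings and needs a dedicated converter.

```python
# Sketch: re-encode text from a defined legacy encoding to UTF-8.
# `to_utf8` is a hypothetical helper name used for illustration only.

def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a named standard encoding, then encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Sample inputs, created here so the example is self-contained.
japanese_bytes = "日本語".encode("shift_jis")
latin_bytes = "café".encode("iso-8859-1")

print(to_utf8(japanese_bytes, "shift_jis").decode("utf-8"))
print(to_utf8(latin_bytes, "iso-8859-1").decode("utf-8"))
```

Decoding with the wrong source encoding either raises `UnicodeDecodeError` or silently produces the wrong characters, so the source encoding has to be known or detected first.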
- Font support (usually Unicode)
- Downloadable fonts for desktop and laptop computers
- Web fonts for web sites
- Mobile devices need a font for the script in order to display text. Note that
such devices generally do not allow users to add fonts.
- Keyboard support: a way to enter text
- May be provided by the operating system or by an installed application
- Web-based keyboard in browser and web apps. Example: Google Input Tools
- Mobile device soft keyboard without text suggestions
- Soft keyboard + text suggestions: this requires a word list or dictionary, with appropriate licensing
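To illustrate why text suggestions depend on a word list, here is a rough Python sketch of prefix completion over a sorted list. The word list, the `suggest` function, and its `limit` parameter are hypothetical. Real keyboards typically also rank candidates by frequency and context, which this sketch does not attempt.

```python
from bisect import bisect_left

# Hypothetical toy word list; a real keyboard needs a licensed dictionary.
WORDS = sorted(["hello", "help", "helmet", "house", "housing"])


def suggest(prefix: str, words: list[str] = WORDS, limit: int = 3) -> list[str]:
    """Return up to `limit` words from the sorted list that start with `prefix`."""
    start = bisect_left(words, prefix)  # first position where prefix could appear
    matches = []
    for word in words[start:]:
        if not word.startswith(prefix) or len(matches) >= limit:
            break
        matches.append(word)
    return matches


print(suggest("hel"))
print(suggest("hou"))
```

Because the list is sorted, all words sharing a prefix are adjacent, so the lookup is a binary search followed by a short scan.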
- High quality search: requires text processing and analysis
- Language identification (from text)
- Segmentation: breaking text into words
- Stemming: reducing the inflected forms of a word to a common stem, e.g.
"housing" and "houses" -> "house"
- In some cases, conversion to Unicode encoding
- Handwritten character input
- Optical character recognition (OCR)
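The segmentation and stemming steps listed above can be sketched in Python as follows. This is a toy illustration under stated assumptions: the word list `KNOWN_WORDS`, the suffix list, and the function names are hypothetical, and the segmenter only works for scripts that separate words with spaces. Languages written without spaces, such as Thai or Burmese, need dictionary-based or statistical segmentation instead.

```python
import re

# Hypothetical toy vocabulary and suffix list, for illustration only.
KNOWN_WORDS = {"house", "walk", "search"}
SUFFIXES = ("ing", "es", "ed", "s")


def segment(text: str) -> list[str]:
    """Split text into lowercase words (letters only, space-delimited scripts)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def stem(word: str) -> str:
    """Strip a known suffix if the result (or result + 'e') is a known word."""
    if word in KNOWN_WORDS:
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            base = word[: -len(suffix)]
            for candidate in (base, base + "e"):
                if candidate in KNOWN_WORDS:
                    return candidate
    return word  # leave unknown words unchanged


def index_terms(text: str) -> list[str]:
    """Segment text, then map each word to its stem for search indexing."""
    return [stem(word) for word in segment(text)]


print(index_terms("Housing and houses"))
```

Mapping both the query and the indexed documents through the same `index_terms` step is what lets a search for "houses" match a page that only contains "housing".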
- Highest level: requires extensive data and processing.
- Full user interface translation in products
- Machine Translation
- Speech to text (STT) or voice understanding
- Text to speech (TTS)