String processing for Ada

STRINGS EDIT
version 3.8
by Dmitry A. Kazakov
(mailbox@dmitry-kazakov.de)

This library is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

As a special exception, if other files instantiate generics from this unit, or you link this unit with other files to produce an executable, this unit does not by itself cause the resulting executable to be covered by the GNU General Public License. This exception does not however invalidate any other reasons why the executable file might be covered by the GNU Public License.

The package Strings_Edit provides I/O facilities. The following I/O items are supported by the package:

Generic axis scales support;
Integer numbers (generic, package Integer_Edit);
Integer sub- and superscript numbers;
ISO 8601 representations of time and duration;
Floating-point numbers (generic, package Float_Edit);
Roman numbers (the type Roman);
Strings;
Ada-style quoted strings;
Base64 encoding;
Object identifiers and distinguished names;
RFC 8439 (ChaCha20 cipher, Poly1305 digest, AEAD);
UTF-8 encoded strings and conversions to older encoding standards;
Unicode maps and sets;
Wildcard pattern matching.

The major differences from the standard Image/Value attributes and Text_IO procedures are:

For numeric types, the base is neither written nor read. For instance, output of 23 as hexadecimal gives 17, not 16#17#.
Get procedures do not skip blank characters around input tokens, except the cases when the blank characters are required by the syntax.
Get procedures use the current string position pointer, so they can be called consecutively, advancing the pointer as the tokens are recognized.
Numeric get procedures allow the expected value range to be specified. When the actual value is out of range then, depending on the procedure parameters, either Constraint_Error is propagated or the value is forced to the nearest range boundary.
Put procedures also use the current string position pointer, which allows them to be called consecutively.
The format used for floating-point output is based on the number precision, rather than on the typographic approach of Text_IO. The precision can be specified either as the number of valid digits in the current base (i.e. relative) or as the position of the last valid digit (i.e. absolute). For instance, 12.345678 with relative precision 3 gives 12.3. With absolute precision -3, it gives 12.346.

The Strings Edit project is a part of the Simple Components for Ada project and can be obtained with it. Alternatively, the latest version can be downloaded separately:

Download Strings Edit (platforms: ARM 64- and 32-bit; Intel 64- and 32-bit):
Fedora packages: precompiled and packaged using RPM
CentOS packages: precompiled and packaged using RPM
Debian packages: precompiled and packaged for dpkg
Ubuntu packages: precompiled and packaged for dpkg
Source distribution (any platform): strings_3_8.tgz (tar + gzip; Windows users may use WinZip)

1. Input from String

1.1. Get procedures

Get procedures are used to scan strings. The first two parameters are always Source and Pointer. Source is the string to be scanned. Pointer indicates the current position. After successful completion it is advanced to the first string position following the recognized item. The value of Pointer shall be in the range Source'First..Source'Last+1. The Layout_Error exception is propagated when this check fails. The third parameter usually accepts the value. The following example shows how to use get procedures:

   package Edit_Float is new Float_Edit (Float);
   use Edit_Float;
   . . .
   Line        : String (1..512); -- A line to parse
   Pointer     : Integer;
   Value       : Float;
   TabAndSpace : Ada.Strings.Maps.Character_Set :=
                    To_Set (" " & Ada.Characters.Latin_1.HT);
begin
   . . .
   Pointer := Line'First;
   Get (Line, Pointer, TabAndSpace); -- Skip tabs and spaces
   Get (Line, Pointer, Value);       -- Get number
   Get (Line, Pointer, TabAndSpace); -- Skip tabs and spaces
   . . .

The numeric get procedures have additional parameters controlling the range of the input value. The parameters First and Last define the range of the expected value. The exception Constraint_Error is propagated when the value is not in the range. The exception can be suppressed using the parameters ToFirst and ToLast: when the parameter is True, an out-of-range input value is replaced by the corresponding range bound (First or Last, respectively).

The numeric get procedures may have the parameter Base of the subtype NumberBase. The parameter defines the base of the expected number (2..16). Note that a base specification must not appear in the input; only the digits are read.
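
The range and base parameters can be illustrated with a minimal sketch. It assumes that the library is visible to the compiler and uses the Strings_Edit.Integers instance described in the section on integer I/O; the procedure name Get_Range_Demo and the input string are made up for illustration.

```ada
with Ada.Text_IO;
with Strings_Edit;          use Strings_Edit;
with Strings_Edit.Integers; use Strings_Edit.Integers;

procedure Get_Range_Demo is   -- Hypothetical name
   Line    : constant String := "250 17";
   Pointer : Integer := Line'First;
   Percent : Integer;
   Code    : Integer;
begin
   -- Decimal input limited to 0..100; ToLast asks for values above
   -- Last to be replaced by Last instead of raising Constraint_Error
   Get (Line, Pointer, Percent, First => 0, Last => 100, ToLast => True);
   Get (Line, Pointer);                    -- Skip spaces
   -- Hexadecimal digits only, no 16#...# wrapper in the input
   Get (Line, Pointer, Code, Base => 16);
   Ada.Text_IO.Put_Line (Integer'Image (Percent) & Integer'Image (Code));
end Get_Range_Demo;
```

According to the description above, Percent should end up at the upper bound 100 and Code should hold 23.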

1.2. Value functions

Each get procedure that returns a value has a corresponding function Value. The function Value has the same parameter profile, except that the parameter Pointer is absent and the value is returned as the function result. Unlike Get, the function Value tolerates spaces and tabs around the converted value. The whole string should be matched; otherwise the exception Data_Error is propagated.
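
A short sketch of the difference in practice, using the Strings_Edit.Integers instance. It assumes that the library's Data_Error is the exception of Ada.IO_Exceptions (if the library declares its own exception, the handler would have to name that one instead); Value_Demo is a made-up name.

```ada
with Ada.IO_Exceptions;
with Ada.Text_IO;
with Strings_Edit.Integers; use Strings_Edit.Integers;

procedure Value_Demo is   -- Hypothetical name
   N : Integer;
begin
   N := Value ("  42  ");      -- Blanks around the number are tolerated
   Ada.Text_IO.Put_Line (Image (N));
   N := Value ("42 apples");   -- Trailing text is left unmatched
   Ada.Text_IO.Put_Line ("Not expected to be reached");
exception
   when Ada.IO_Exceptions.Data_Error =>   -- Assumed to be the library's Data_Error
      Ada.Text_IO.Put_Line ("Data_Error: the whole string must be matched");
end Value_Demo;
```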

2. Output into String

2.1. Put procedures

Put procedures place an item into the output string Destination. The string is written starting from Destination (Pointer). The parameter Field defines the output size. When it is zero, the output size is determined by the output item. Otherwise the output is justified within the field: the parameter Justify specifies the alignment and the parameter Fill gives the pad character. When Field is greater than Destination'Last - Pointer + 1, the latter is used instead. After successful completion Pointer is advanced to the first character following the output or to Destination'Last + 1.

The numeric put procedures may have the parameter Base of the subtype NumberBase. The parameter defines the base of the output (2..16). Note that the base specification will not appear in the output.
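
Because every put procedure advances Pointer, a line can be assembled from several calls. The following is a minimal sketch under the assumption that the Strings_Edit.Integers instance is available; the procedure name Put_Demo and the values are illustrative only.

```ada
with Ada.Text_IO;
with Strings_Edit;          use Strings_Edit;
with Strings_Edit.Integers; use Strings_Edit.Integers;

procedure Put_Demo is   -- Hypothetical name
   Text    : String (1..40);
   Pointer : Integer := Text'First;
begin
   Put (Text, Pointer, "Item");                            -- String output
   Put (Text, Pointer, 42, Field => 6, Justify => Right);  -- Right-aligned in 6 columns
   Put (Text, Pointer, '|');                               -- Single character
   -- Only Text (Text'First..Pointer - 1) has been written
   Ada.Text_IO.Put_Line (Text (Text'First..Pointer - 1));
end Put_Demo;
```

The slice Text (Text'First..Pointer - 1) is the idiomatic way to take what has been written so far, since the rest of the buffer is untouched.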

2.2. Image functions

Image functions convert a value into a string. Unlike the standard S'Image attribute, they do not add a leading space character.
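
A small comparison, sketched with the Strings_Edit.Integers instance (the procedure name Image_Demo is made up). The standard attribute Integer'Image yields a leading space for non-negative values, while the library function is documented not to.

```ada
with Ada.Text_IO;
with Strings_Edit.Integers;

procedure Image_Demo is   -- Hypothetical name
begin
   -- Standard attribute: the image of 23 is " 23"
   Ada.Text_IO.Put_Line ('[' & Integer'Image (23) & ']');
   -- Library function: no leading space
   Ada.Text_IO.Put_Line ('[' & Strings_Edit.Integers.Image (23) & ']');
   -- Base 16: documented above to give 17, without the 16#...# wrapper
   Ada.Text_IO.Put_Line ('[' & Strings_Edit.Integers.Image (23, Base => 16) & ']');
end Image_Demo;
```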

3. String I/O

The package Strings_Edit provides basic tools for string I/O.

procedure Get ( Source : String; Pointer : in out Integer; Blank : Character := ' ' );

This procedure skips the character Blank starting from Source (Pointer). Pointer is advanced to the first non-Blank character or to Source'Last + 1. The exception Layout_Error is propagated if the value of Pointer is not in the range Source'First..Source'Last + 1.

procedure Get ( Source : String; Pointer : in out Integer; Blanks : Character_Set );

This procedure skips all the characters of the set Blanks starting from Source (Pointer). Pointer is advanced to the first non-blank character or to Source'Last + 1. The exception Layout_Error is propagated if the value of Pointer is not in the range Source'First..Source'Last + 1. See also Strings_Edit.UTF8.Maps.Get, which is a UTF-8 equivalent of this subprogram.

procedure Put ( Destination : in out String; Pointer : in out Integer; Value : Character; Field : Natural := 0; Justify : Alignment := Left; Fill : Character := ' ' );

This procedure places the character specified by the parameter Value into the output string Destination. The string is written starting from Destination (Pointer). The exception Layout_Error is propagated if the value of Pointer is not in Destination'Range or there is no room for the output.

procedure Put ( Destination : in out String; Pointer : in out Integer; Value : String; Field : Natural := 0; Justify : Alignment := Left; Fill : Character := ' ' );

This procedure places the string specified by the parameter Value into the output string Destination. The string is written starting from Destination (Pointer). The exception Layout_Error is propagated if the value of Pointer is not in Destination'Range or there is no room for the output.

The package also provides the following string operations:

function Is_Prefix (Prefix, Source : String) return Boolean;

This function returns true if Prefix is a prefix of Source. An empty string is a prefix of any string.

function Is_Prefix (Prefix, Source : String; Pointer : Integer) return Boolean;

This function returns true if Prefix is a prefix of Source (Pointer..Source'Last). An empty string is a prefix of any substring. The result is false if Pointer is not in the range Source'First..Source'Last + 1.

function Is_Prefix ( Prefix, Source : String; Map : Character_Mapping ) return Boolean;

This function returns true if Prefix is a prefix of Source with respect to the mapping represented by Map. An empty string is a prefix of any string.

function Is_Prefix ( Prefix, Source : String; Pointer : Integer; Map : Character_Mapping ) return Boolean;

This function returns true if Prefix is a prefix of Source (Pointer..Source'Last) with respect to the mapping represented by Map. An empty string is a prefix of any substring. The result is false if Pointer is not in the range Source'First..Source'Last + 1.

function Trim ( Source : String; Blank : Character := ' ' ) return String;

This function returns the content of Source with the character Blank removed from both ends.

function Trim ( Source : String; Blanks : Character_Set ) return String;

This function returns the content of Source with the characters from the set Blanks removed from both ends. See also Strings_Edit.UTF8.Maps.Trim, which is a UTF-8 equivalent of this function.
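
The blank-skipping Get, Is_Prefix and Trim combine naturally when scanning a line by hand. A minimal sketch follows; the procedure name Prefix_Demo and the input line are made up for illustration.

```ada
with Ada.Text_IO;
with Strings_Edit; use Strings_Edit;

procedure Prefix_Demo is   -- Hypothetical name
   Line    : constant String := "   GET /index.html";
   Pointer : Integer := Line'First;
begin
   Get (Line, Pointer);                    -- Skip leading spaces
   if Is_Prefix ("GET", Line, Pointer) then
      Pointer := Pointer + 3;              -- Step over the keyword
      Get (Line, Pointer);                 -- Skip spaces after it
      Ada.Text_IO.Put_Line ("Path: " & Line (Pointer..Line'Last));
   end if;
   -- Trim removes the default Blank (space) from both ends
   Ada.Text_IO.Put_Line ('[' & Trim ("  padded  ") & ']');
end Prefix_Demo;
```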

3.1. Quoted strings

The child package Strings_Edit.Quoted provides functions for handling quoted strings. A quoted string is put in quotation marks, while each quotation mark within the string is doubled. This allows the original string to be restored unambiguously from its quoted form.

function Get_Quoted ( Source : String; Pointer : access Integer; Mark : Character := '"' ) return String;

This function gets a quoted string. Source (Pointer.all) is the first character of the string. Pointer is advanced to the first character following the input. Note that it is an access to Integer rather than a plain Integer, because functions in Ada cannot have in out parameters (a restriction lifted only in Ada 2012). The parameter Mark specifies the quotation mark to use. Within the body of a quoted text this character is doubled. The result is the original quoted text with the quotation marks around it removed. The quotation marks within the text are halved. The exception Data_Error is propagated when the string at Pointer.all does not contain a Mark character or else when no closing Mark character appears before the string end. The exception Layout_Error is propagated if the value of Pointer.all is not in the range Source'First..Source'Last + 1.

procedure Put_Quoted ( Destination : in out String; Pointer : in out Integer; Text : String; Mark : Character := '"'; Field : Natural := 0; Justify : Alignment := Left; Fill : Character := ' ' );

This procedure puts Text in Mark quotes and places the result into Destination starting from the position indicated by Pointer. Pointer is advanced to the first character following the output. Mark characters are doubled within the string body. The exception Layout_Error is propagated if there is no room for output or Pointer is not in Destination'First..Destination'Last + 1.

function Quote ( Text : String; Mark : Character := '"' ) return String;

This function returns Text quoted using the Mark character.
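
A round trip through Quote and Get_Quoted can be sketched as follows. The variable passed as Pointer must be declared aliased so that 'Access can be taken; the procedure name Quoted_Demo and the sample text are assumptions of this example.

```ada
with Ada.Text_IO;
with Strings_Edit.Quoted; use Strings_Edit.Quoted;

procedure Quoted_Demo is   -- Hypothetical name
   Original : constant String := "say ""hi""";        -- The text: say "hi"
   Encoded  : constant String := Quote (Original);    -- Marks added and doubled
   Pointer  : aliased Integer := Encoded'First;
begin
   Ada.Text_IO.Put_Line (Encoded);
   declare
      -- Pointer'Access because the function takes an access parameter
      Restored : constant String := Get_Quoted (Encoded, Pointer'Access);
   begin
      Ada.Text_IO.Put_Line (Restored);
      Ada.Text_IO.Put_Line (Boolean'Image (Restored = Original));
   end;
end Quoted_Demo;
```

After the call, Pointer is expected to be Encoded'Last + 1, since the whole string is one quoted item.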

4. Roman I/O

The child package Roman_Edit provides I/O routines for Roman numbers. The type Roman is defined there as follows:

type Roman is range 1..3999;

The following subroutines are declared for the type:

procedure Get ( Source : in String; Pointer : in out Integer; Value : out Roman; First : Roman := Roman'First; Last : Roman := Roman'Last; ToFirst : Boolean := False; ToLast : Boolean := False );

This procedure gets a Roman number from the string Source. The process starts from Source (Pointer). The exception Constraint_Error is propagated if the number is not in the range First..Last. Data_Error indicates a syntax error in the number. End_Error is raised when no number was detected. Layout_Error is propagated when Pointer is not in the range Source'First .. Source'Last + 1. See also the description of get procedures.

function Value ( Source : String; First : Roman := Roman'First; Last : Roman := Roman'Last; ToFirst : Boolean := False; ToLast : Boolean := False ) return Roman;

This function gets a Roman number from the string Source. The number can be surrounded by spaces and tabs. The whole string Source should be matched; otherwise the exception Data_Error is propagated. Data_Error also indicates a syntax error in the number. The exception Constraint_Error is propagated if the number is not in the range First..Last. End_Error is raised when no number was detected.

procedure Put ( Destination : in out String; Pointer : in out Integer; Value : Roman; LowerCase : Boolean := False; Field : Natural := 0; Justify : Alignment := Left; Fill : Character := ' ' );

This procedure places the number specified by the parameter Value into the output string Destination. The string is written starting from Destination (Pointer). The parameter LowerCase determines whether upper or lower case letters should be used. The exception Layout_Error is propagated when Pointer is not in Destination'Range or there is no room for the output.

function Image ( Value : Roman; LowerCase : Boolean := False ) return String;

This function converts Value to string. The parameter LowerCase indicates whether upper or lower case letters shall be used.
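
A brief sketch of the Roman routines, assuming that the full name of the child package is Strings_Edit.Roman_Edit (the text above gives only Roman_Edit); the procedure name Roman_Demo is made up.

```ada
with Ada.Text_IO;
with Strings_Edit.Roman_Edit; use Strings_Edit.Roman_Edit;  -- Assumed full package name

procedure Roman_Demo is   -- Hypothetical name
   Year : constant Roman  := 1999;
   Text : constant String := Image (Year);
begin
   Ada.Text_IO.Put_Line (Text);                             -- Upper case letters
   Ada.Text_IO.Put_Line (Image (Year, LowerCase => True));  -- Lower case letters
   -- Value is expected to invert Image
   Ada.Text_IO.Put_Line (Boolean'Image (Value (Text) = Year));
end Roman_Demo;
```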

5. Integer I/O

The package Strings_Edit has a generic child package Integer_Edit:

generic type Number is range <>; package Strings_Edit.Integer_Edit is ...

It is parameterized by an integer type. There is also the package Strings_Edit.Integers, which is an instance of Integer_Edit with the type Integer as the parameter. The generic package has the following subprograms:

procedure Get ( Source : in String; Pointer : in out Integer; Value : out Number'Base; Base : NumberBase := 10; First : Number'Base := Number'First; Last : Number'Base := Number'Last; ToFirst : Boolean := False; ToLast : Boolean := False );

This procedure gets an integer number from the string Source. The process starts from Source (Pointer). The parameter Base indicates the base of the expected number. The exception Constraint_Error is propagated if the number is not in the range First..Last. Data_Error indicates a syntax error in the number. End_Error is raised when no number was detected. Layout_Error is propagated when Pointer is not in the range Source'First .. Source'Last + 1. See also description of get procedures.

function Value ( Source : String; Base : NumberBase := 10; First : Number'Base := Number'First; Last : Number'Base := Number'Last; ToFirst : Boolean := False; ToLast : Boolean := False ) return Number'Base;

This function gets an integer number from the string Source. The number can be surrounded by spaces and tabs. The whole string Source should be matched. Otherwise the exception Data_Error is propagated. Also Data_Error indicates a syntax error in the number. The exception Constraint_Error is propagated if the number is not in the range First..Last. End_Error is raised when no number was detected.

procedure Put ( Destination : in out String; Pointer : in out Integer; Value : Number'Base; Base : NumberBase := 10; PutPlus : Boolean := False; Field : Natural := 0; Justify : Alignment := Left; Fill : Character := ' ' );

This procedure places the number specified by the parameter Value into the output string Destination. The string is written starting from Destination (Pointer). The parameter Base indicates the number base used for the output. The base itself does not appear in the output. The parameter PutPlus indicates whether the plus sign should be placed if the number is positive. The exception Layout_Error is propagated when Pointer is not in Destination'Range or there is no room for the output. For example the code:

   Text    : String (1..20) := (others =>'#');
   Pointer : Positive := Text'First;
   . . .
   Put (Text, Pointer, 5, 2, True, 10, Center, '@');

will set Pointer to 11 and overwrite the first 10 characters of the string Text:

   @@@+101@@@##########

function Image ( Value : Number'Base; Base : NumberBase := 10; PutPlus : Boolean := False ) return String;

This function converts Value to string. The parameter Base indicates the number base used for the output. The base itself does not appear in the output. The parameter PutPlus indicates whether the plus sign should be placed if the number is positive.

The package Strings_Edit.Integers is an instance of Strings_Edit.Integer_Edit with the type Integer as the parameter.
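
For integer types other than Integer the generic is instantiated directly. The sketch below does so for a made-up type Byte and repeats the Put call from the example above; the names Byte, Byte_Edit and Integer_Edit_Demo are assumptions of this example.

```ada
with Ada.Text_IO;
with Strings_Edit;              use Strings_Edit;
with Strings_Edit.Integer_Edit;

procedure Integer_Edit_Demo is   -- Hypothetical name
   type Byte is range 0..255;    -- Illustrative user-defined type
   package Byte_Edit is new Strings_Edit.Integer_Edit (Byte);
   use Byte_Edit;

   Text    : String (1..20) := (others => '#');
   Pointer : Integer := Text'First;
begin
   -- Value 5, base 2, plus sign, field of 10, centered, padded with '@'
   Put (Text, Pointer, 5, 2, True, 10, Center, '@');
   Ada.Text_IO.Put_Line (Text);
   Ada.Text_IO.Put_Line (Image (255, Base => 16));
   -- By default First..Last is Byte'First..Byte'Last
   Ada.Text_IO.Put_Line (Byte'Image (Value ("11111111", Base => 2)));
end Integer_Edit_Demo;
```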

6. Floating-point I/O

The package Strings_Edit has a generic child package Float_Edit:

generic type Number is digits <>; package Strings_Edit.Float_Edit is ...

The package is parameterized by a floating-point type. There is also the package Strings_Edit.Floats, which is an instance of Float_Edit with the type Float as the parameter. The package defines the following subprograms:

procedure Get ( Source : in String; Pointer : in out Integer; Value : out Number'Base; Base : NumberBase := 10; First : Number'Base := Number'First; Last : Number'Base := Number'Last; ToFirst : Boolean := False; ToLast : Boolean := False );

This procedure gets a number from the string Source. The process starts from Source (Pointer). The number in the string may be in either floating-point or fixed-point format. The point may be absent. The mantissa can have base 2..16 (defined by the parameter Base). The exponent part (if present) is introduced by 'e' or 'E'. The exponent is always written in decimal and denotes a power of Base. Space characters are allowed between the mantissa and the exponent part as well as in the exponent part around the exponent sign. If Base has the value 15 or 16, the exponent part shall be separated from the mantissa by at least one space character. The exception Constraint_Error is propagated if the number is not in the range First..Last. Data_Error indicates a syntax error in the number. End_Error is raised when no number was detected. Layout_Error is propagated when Pointer is not in the range Source'First .. Source'Last + 1. See also the description of get procedures.

function Value ( Source : String; Base : NumberBase := 10; First : Number'Base := Number'First; Last : Number'Base := Number'Last; ToFirst : Boolean := False; ToLast : Boolean := False ) return Number'Base;

This function gets a floating-point number from the string Source. The number can be surrounded by spaces and tabs. The whole string Source should be matched. Otherwise the exception Data_Error is propagated. Also Data_Error indicates a syntax error in the number. The exception Constraint_Error is propagated if the number is not in the range First..Last. End_Error is raised when no number was detected.

procedure Put ( Destination : in out String; Pointer : in out Integer; Value : Number'Base; Base : NumberBase := 10; PutPlus : Boolean := False; RelSmall : Positive := MaxSmall; AbsSmall : Integer := -MaxSmall; Field : Natural := 0; Justify : Alignment := Left; Fill : Character := ' ' );

This procedure places the number specified by the parameter Value into the output string Destination. The string is written starting from Destination (Pointer). The parameter Base indicates the number base used for the output. Base itself does not appear in the output. The exponent part (if used) is always decimal. PutPlus indicates whether the plus sign should be placed if the number is positive. There are two ways to specify the output precision:

The parameter RelSmall determines the number of correct digits of the mantissa, counted in the base Base. For instance, with RelSmall=3 and Base=10, the number 1.234567e+01 is represented as 12.3.
The parameter AbsSmall determines the rightmost correct digit of the value. For example, with AbsSmall=0, Base=10, the number 1.234567e+01 is represented as 12 (i.e. 12.34567 rounded to 10⁰).

Of the two parameters RelSmall and AbsSmall, the procedure chooses the one that specifies the smaller number of mantissa digits, but never more digits than the machine representation of the number allows. If the point would appear in the rightmost position, it is omitted. Pure zero is always represented as 0. If the desired number of digits can be provided in the fixed-point format, the exponent part is not used. For example, 1.234567e-04 gives 0.0001234567 because the fixed- and floating-point formats have the same length. But 1.234567e-05 will be shown in the floating-point format. For bases 15 and 16 the exponent part is separated from the mantissa by a space (to avoid ambiguity: is F.Ee+2 F.EE + 2 or F.E * 16**2?). The exception Layout_Error is propagated when Pointer is not in Destination'Range or there is no room for the output.

function Image ( Value : Number'Base; Base : NumberBase := 10; PutPlus : Boolean := False; RelSmall : Positive := MaxSmall; AbsSmall : Integer := -MaxSmall ) return String;

This function converts the parameter Value to String. The parameter Base indicates the number base used for the output. Base itself does not appear in the output. The exponent part (if used) is always decimal. PutPlus indicates whether the plus sign should be placed if the number is positive. For precision parameters see Put.

The package Strings_Edit.Floats is an instance of Strings_Edit.Float_Edit with Float as the parameter. The package Strings_Edit.Long_Floats is an instance of Strings_Edit.Float_Edit with Long_Float as the parameter.
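
The two precision parameters can be tried out with the Strings_Edit.Floats instance. This is a sketch only; the procedure name Float_Demo is made up, and the results quoted in the comments are the ones stated in this document rather than independently measured.

```ada
with Ada.Text_IO;
with Strings_Edit.Floats; use Strings_Edit.Floats;

procedure Float_Demo is   -- Hypothetical name
   X : constant Float := 12.345678;
begin
   -- Relative precision: three valid digits (documented result: 12.3)
   Ada.Text_IO.Put_Line (Image (X, RelSmall => 3));
   -- Absolute precision: last valid digit at 10**(-3) (documented result: 12.346)
   Ada.Text_IO.Put_Line (Image (X, AbsSmall => -3));
   -- Absolute precision: last valid digit at 10**0 (documented result: 12)
   Ada.Text_IO.Put_Line (Image (X, AbsSmall => 0));
   -- Input accepts an exponent part introduced by 'e' or 'E'
   Ada.Text_IO.Put_Line (Float'Image (Value ("1.5e2")));
end Float_Demo;
```

In each call the parameter left at its default (MaxSmall or -MaxSmall) asks for more digits than the one given explicitly, so the explicit one determines the output.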

7. UTF-8

The package Strings_Edit.UTF8 is the parent package for dealing with Unicode Transformation Format UTF-8 encoded strings. Ada 95 supports Latin-1 (type Character) and UCS-2 (Wide_Character) of ISO 10646 BMP. Ada 2005 introduces the UCS-4 encoding (Wide_Wide_Character). This variety of encodings, when used in one program, imposes certain difficulties. Furthermore, many applications and libraries prefer UTF-8, which has significant advantages over UCS. For these reasons UTF-8 support is provided here.

Since UTF-8 was designed for backward compatibility with 7-bit ASCII applications and is a multi-byte encoding format, I chose not to introduce a separate string type for UTF-8. Conventional Ada strings are used instead. It is important to note:

The set of Latin-1 code points (Ada Character) 0..FF₁₆ is a subset of UCS-2 code points (Wide_Character) 0..FFFF₁₆, which in turn is a subset of UTF-8 code points 0..10FFFF₁₆;
Not every sequence of bytes represents a valid UTF-8 encoded string;
Latin-1 characters with the code points 80₁₆..FF₁₆ have a two-byte representation in UTF-8;
UCS-2 characters may require up to 3 bytes in UTF-8.

The package defines the type UTF8_Code_Point that represents the Unicode code space:

type Code_Point is mod 2**32; subtype UTF8_Code_Point is Code_Point range 0..16#10FFFF#;

The following subroutines are provided by the package:

procedure Get ( Source : String; Pointer : in out Integer; Value : out UTF8_Code_Point );

This procedure decodes one UTF-8 code point from the string Source. It starts at Source (Pointer). After successful completion Pointer is advanced to the first character following the input. The result is returned through the parameter Value.

*Exceptions*
Data_Error	Illegal UTF-8 string Source
End_Error	Nothing found. Pointer = Source'Last + 1
Layout_Error	Pointer is not in Source'First..Source'Last + 1

procedure Get_Backwards ( Source : String; Pointer : in out Integer; Value : out UTF8_Code_Point );

This procedure decodes one UTF-8 code point from the string Source in reverse. It starts at Source (Pointer - 1), assuming that it is the last octet of a UTF-8 encoded character. After successful completion Pointer is moved to the first character of the input. The result is returned through the parameter Value.

*Exceptions*
Data_Error	Illegal UTF-8 string Source
End_Error	Nothing found. Pointer = Source'First
Layout_Error	Pointer is not in Source'First..Source'Last + 1

function Image (Value : UTF8_Code_Point) return String;

This function is a simplified version of the procedure Put. It returns UTF-8 encoded Value.

function Length (Source : String) return Natural;

This function evaluates the length of a UTF-8 encoded string in code points. Data_Error is propagated when Source is not a valid UTF-8 string.

procedure Put ( Destination : in out String; Pointer : in out Integer; Value : UTF8_Code_Point );

This procedure puts one UTF-8 code point into the string Destination starting from the position Destination (Pointer). Pointer is then advanced to the first character following the output. Layout_Error is propagated when Pointer is not in Destination'Range or there is no room for output. Note that the parameters Field, Justify and Fill, usual for other put procedures, would have no meaning here.

procedure Skip ( Source : String; Pointer : in out Integer; Count : Natural := 1 );

This procedure skips Count UTF-8 encoded code points in the string Source starting from Source (Pointer). After successful completion Pointer indicates the first character following the skipped UTF-8 encoded sequence.

*Exceptions*
Data_Error	Illegal UTF-8 string Source
End_Error	Less than Count characters detected before the string end
Layout_Error	Pointer is not in Source'First..Source'Last + 1

function Value (Source : String) return UTF8_Code_Point;

This function decodes one UTF-8 code point stored in Source. The whole string Source should be matched; otherwise the exception Data_Error is propagated. It is also propagated when Source is not a legal UTF-8 string.

type Code_Points_Range is record Low : UTF8_Code_Point; High : UTF8_Code_Point; end record;

This type represents a range of code points Low..High.

Full_Range : constant Code_Points_Range;

A range that contains all code points.

type Code_Points_Ranges is array (Positive range <>) of Code_Points_Range;

An array of code point ranges.
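
Decoding a UTF-8 string code point by code point can be sketched with Image, Length and Get from this package. The procedure name UTF8_Demo and the chosen code points are assumptions of the example; the octet counts in the comment follow from the UTF-8 encoding itself.

```ada
with Ada.Text_IO;
with Strings_Edit.UTF8; use Strings_Edit.UTF8;

procedure UTF8_Demo is   -- Hypothetical name
   Letter_A : constant UTF8_Code_Point := 16#61#;    -- 1 octet in UTF-8
   E_Acute  : constant UTF8_Code_Point := 16#E9#;    -- 2 octets in UTF-8
   Euro     : constant UTF8_Code_Point := 16#20AC#;  -- 3 octets in UTF-8
   Text     : constant String :=
                 Image (Letter_A) & Image (E_Acute) & Image (Euro);
   Pointer  : Integer := Text'First;
   Code     : UTF8_Code_Point;
begin
   Ada.Text_IO.Put_Line
   (  "Octets:" & Integer'Image (Text'Length)
   &  ", code points:" & Integer'Image (Length (Text))
   );
   while Pointer <= Text'Last loop
      Get (Text, Pointer, Code);   -- Decode one code point and advance Pointer
      Ada.Text_IO.Put_Line (UTF8_Code_Point'Image (Code));
   end loop;
end UTF8_Demo;
```

Text'Length counts octets, while Length counts code points; for non-ASCII text the two differ.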

7.1. Handling UTF-8 strings

The package Strings_Edit.UTF8.Handling provides the following conversion functions between UTF-8 encoded strings and Ada strings:

function To_String (Value : String) return String; function To_String ( Value : String; Substitute : Character ) return String;

These functions convert a UTF-8 encoded string to a Latin-1 character string (standard Ada string). The parameter Substitute specifies the character that substitutes non-Latin-1 code points in Value. If it is omitted, Constraint_Error is propagated when a non-Latin-1 code point appears in Value.

*Exceptions*
Constraint_Error	Non-Latin-1 code point detected
Data_Error	Illegal UTF-8 string Value

function To_UTF8 (Value : Character ) return String; function To_UTF8 (Value : String ) return String; function To_UTF8 (Value : Wide_Character) return String; function To_UTF8 (Value : Wide_String ) return String;

These functions convert the parameter Value to a UTF-8 encoded string. The parameter can be Character, String, Wide_Character or Wide_String. The result of a character conversion can be from 1 to 3 bytes long. Note that Ada's Character has Latin-1 encoding which differs from UTF-8 in the code positions greater than 127.

function To_Wide_String (Value : String) return Wide_String; function To_Wide_String ( Value : String; Substitute : Wide_Character ) return Wide_String;

These functions convert a UTF-8 encoded string to a UCS-2 character string (Ada's Wide_String). The parameter Substitute specifies the character that substitutes non-UCS-2 code points in Value. If it is omitted, Constraint_Error is propagated when a non-UCS-2 code point appears in Value.

*Exceptions*
Constraint_Error	Non-UCS-2 code point detected
Data_Error	Illegal UTF-8 string Value
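
The conversions can be exercised with a short round trip. In this sketch the arguments are typed objects rather than bare string literals, because a literal passed to To_UTF8 would be ambiguous between the String and Wide_String variants; the procedure name Handling_Demo is made up.

```ada
with Ada.Text_IO;
with Strings_Edit.UTF8.Handling; use Strings_Edit.UTF8.Handling;

procedure Handling_Demo is   -- Hypothetical name
   -- "caf" followed by Latin-1 code position E9 (e with acute accent)
   Latin_1 : constant String := "caf" & Character'Val (16#E9#);
   Encoded : constant String := To_UTF8 (Latin_1);
   -- U+20AC (euro sign) is outside Latin-1
   Euro    : constant String := To_UTF8 (Wide_Character'Val (16#20AC#));
begin
   -- The UTF-8 form is longer: code positions above 127 take two octets
   Ada.Text_IO.Put_Line
   (  Integer'Image (Latin_1'Length) & Integer'Image (Encoded'Length)
   );
   Ada.Text_IO.Put_Line (Boolean'Image (To_String (Encoded) = Latin_1));
   -- With Substitute given, no Constraint_Error is expected
   Ada.Text_IO.Put_Line (To_String (Euro, Substitute => '?'));
end Handling_Demo;
```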

7.2. Generic integer I/O of UTF-8 strings

The package Strings_Edit.UTF8.Integer_Edit provides integer I/O for special encodings of digits, such as subscript and superscript.

generic
   type Number is range <>;
   with procedure Get_Digit
        (  Source  : String;
           Pointer : in out Integer;
           Digit   : out Natural
        )  is <>;
   with procedure Get_Sign
        (  Source  : String;
           Pointer : in out Integer;
           Sign_Of : out Sign
        )  is <>;
   with procedure Put_Digit
        (  Destination : in out String;
           Pointer     : in out Integer;
           Digit       : Script_Digit
        )  is <>;
   with procedure Put_Sign
        (  Destination : in out String;
           Pointer     : in out Integer;
           Sign_Of     : Sign
        )  is <>;
package Strings_Edit.UTF8.Integer_Edit is ...

The generic parameters of the package are:

Number is the integer type to use;
Get_Digit is the procedure to get one UTF-8 encoded digit from the string Source. The result is greater than 9 if no digit was detected. Otherwise Pointer is advanced to the first character following the input in Source. Get_Digit may, but is not obliged to, check whether Pointer is in Source'Range.
Get_Sign gets a UTF-8 encoded sign from the string Source. Pointer is advanced to the first character following the input in Source. Like Get_Digit, it is not obliged to check whether Pointer is in Source'Range.
Put_Digit places one UTF-8 encoded digit into the string Destination. The character Destination (Pointer) is the last character of the output. Pointer is moved back to the character preceding the first character of the output. Layout_Error is propagated when Destination has no room for output.
Put_Sign places a UTF-8 encoded sign into Destination. It works similarly to Put_Digit.

The package provides the following procedures and functions:

procedure Get
          (  Source  : in String;
             Pointer : in out Integer;
             Value   : out Number'Base;
             Base    : Script_Base := 10;
             First   : Number'Base := Number'First;
             Last    : Number'Base := Number'Last;
             ToFirst : Boolean     := False;
             ToLast  : Boolean     := False
          );
function Value
         (  Source  : String;
            Base    : Script_Base := 10;
            First   : Number'Base := Number'First;
            Last    : Number'Base := Number'Last;
            ToFirst : Boolean     := False;
            ToLast  : Boolean     := False
         )  return Number'Base;
procedure Put
          (  Destination : in out String;
             Pointer     : in out Integer;
             Value       : Number'Base;
             Base        : Script_Base := 10;
             PutPlus     : Boolean     := False
          );
function Image
         (  Value   : Number'Base;
            Base    : Script_Base := 10;
            PutPlus : Boolean     := False
         )  return String;

These subroutines work exactly as those of Strings_Edit.Integer_Edit, with the difference that the number base is specified by a parameter of the type Script_Base, defined in Strings_Edit.UTF8 as an integer type with the range 2..10.

7.3. Subscript UTF-8 integer I/O

The generic package Strings_Edit.UTF8.Subscript.Integer_Edit is a specialization of Strings_Edit.UTF8.Integer_Edit for integer I/O of subscript numbers.

generic type Number is range <>; package Strings_Edit.UTF8.Subscript.Integer_Edit is ...

The package provides the subroutines described in Strings_Edit.UTF8.Integer_Edit.

A necessary note. If you plan to use sub- and superscripts under Microsoft Windows XP, you probably will have a problem with displaying the corresponding glyphs. The reason for this is that the standard Windows font Tahoma does not contain glyphs for Unicode sub- and superscripts. You can solve this problem by choosing a font which has them. A good candidate, very close to Tahoma is Arial Unicode MS. Go to the Control Panel → Display → Appearance → Advanced and there change the font by clicking on the corresponding sample texts.

This package has a non-generic instantiation with the type Integer: Strings_Edit.Integers.Subscript.

7.4. Superscript UTF-8 integer I/O

The generic package Strings_Edit.UTF8.Superscript.Integer_Edit is a specialization of Strings_Edit.UTF8.Integer_Edit for integer I/O of superscript numbers.

generic type Number is range <>; package Strings_Edit.UTF8.Superscript.Integer_Edit is ...

The package provides the subroutines described in Strings_Edit.UTF8.Integer_Edit.

This package has a non-generic instantiation with the type Integer: Strings_Edit.Integers.Superscript.
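The non-generic instantiations can be used to compose formula-like output. The following is a minimal sketch, assuming that Strings_Edit.Integers.Subscript and Strings_Edit.Integers.Superscript export Image with the profile shown for Strings_Edit.UTF8.Integer_Edit and that the terminal displays UTF-8. The procedure name Script_Demo is chosen for illustration only.

```ada
with Ada.Text_IO;
with Strings_Edit.Integers.Subscript;
with Strings_Edit.Integers.Superscript;

procedure Script_Demo is
   package Sub renames Strings_Edit.Integers.Subscript;
   package Sup renames Strings_Edit.Integers.Superscript;
begin
   -- "x" followed by the subscript digits of 12
   Ada.Text_IO.Put_Line ("x" & Sub.Image (12));
   -- "10" followed by the superscript digit of 3
   Ada.Text_IO.Put_Line ("10" & Sup.Image (3));
   -- Binary output of 5; as with Integer_Edit no base prefix is written
   Ada.Text_IO.Put_Line (Sub.Image (5, Base => 2));
end Script_Demo;
```

The results are ordinary UTF-8 encoded strings, so they can be concatenated with other text as shown.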

7.5. Wildcard-matching of UTF-8 strings

The package Strings_Edit.UTF8.Wildcards provides the following subprograms:

function Match ( Text : String; Pattern : String; Wide_Space : Boolean := False; Blanks : Character_Set := SpaceAndTab ) return Boolean;

The function matches the string Text against the wildcard pattern Pattern. Both Text and Pattern are UTF-8 encoded strings. Pattern may contain asterisk characters (*) treated as wildcards that match any (possibly empty) sequence of UTF-8 characters. The number of wildcards in Pattern is not limited. Additionally, if the parameter Wide_Space is true, space characters in Pattern match any non-empty sequence of characters from the set Blanks. Note that when Blanks contains non-ASCII characters (with the code points 128..255), those will match any UTF-8 character starting with this octet. When Wide_Space is false, Blanks is ignored and space matches as an ordinary character. The result of the function is true when Pattern matches the whole of Text. A typical use of this function is to filter file names using patterns like *.txt. The result is undefined (either true or false) when Text and/or Pattern are illegal UTF-8 strings.

function Match ( Text : String; Pattern : String; Map : Unicode_Mapping; Wide_Space : Boolean := False; Blanks : Character_Set := SpaceAndTab ) return Boolean;

This function has one additional parameter Map, which is the mapping applied to two non-blank code points before they are compared. When Wide_Space is true, blank code points are compared as-is. For example, to match ignoring case the function can be used as follows:

Match (Text, Pattern, Strings_Edit.UTF8.Maps.Constants.Lower_Case_Map);

The package Strings_Edit.UTF8.Wildcards.Case_Insensitive provides the function

function Match_Insensitive ( Text : String; Pattern : String; Wide_Space : Boolean := False; Blanks : Character_Set := SpaceAndTab ) return Boolean;

implemented in the way described above.
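A short usage sketch of the matching functions follows. It relies only on the profiles given above. The procedure name Wildcard_Demo is hypothetical, and the results noted in the comments are derived from the descriptions above rather than from a test run.

```ada
with Ada.Text_IO;
with Strings_Edit.UTF8.Wildcards;
with Strings_Edit.UTF8.Wildcards.Case_Insensitive;

procedure Wildcard_Demo is
   use Strings_Edit.UTF8.Wildcards;
   use Strings_Edit.UTF8.Wildcards.Case_Insensitive;

   procedure Show (Result : Boolean) is
   begin
      Ada.Text_IO.Put_Line (Boolean'Image (Result));
   end Show;
begin
   -- The wildcard consumes "readme", the rest matches literally
   Show (Match ("readme.txt", "*.txt"));
   -- The extensions differ, so no match is expected
   Show (Match ("readme.txt", "*.ada"));
   -- With Wide_Space one space in the pattern stands for a run of blanks
   Show (Match ("a   b", "a b", Wide_Space => True));
   -- Without Wide_Space the space is an ordinary character
   Show (Match ("a   b", "a b"));
   -- Case is ignored by the case-insensitive variant
   Show (Match_Insensitive ("README.TXT", "*.txt"));
end Wildcard_Demo;
```

Since Blanks has a default value, it needs to be specified only when characters other than space and tabulation should be treated as blanks.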

7.6. Case mapping

The package Strings_Edit.UTF8.Mapping provides Unicode mapping of code points and UTF-8 encoded strings. It provides the following subprograms:

function Has_Case (Value : UTF8_Code_Point) return Boolean;

The function returns true if the code point Value has upper or lower case equivalents different from Value. For all x, Has_Case (x) = (Is_Lowercase (x) or Is_Uppercase (x)). Note that not all Unicode code points have equivalents (simple case mapping). Also there exist code points different from both of their equivalents, i.e. x /= To_Lowercase (x) and x /= To_Uppercase (x). Refer to the Unicode standard for further information.

function Is_Lowercase (Value : UTF8_Code_Point) return Boolean;

The function returns true if the code point Value is lower case.

function Is_Uppercase (Value : UTF8_Code_Point) return Boolean;

The function returns true if the code point Value is upper case.

function To_Lowercase (Value : UTF8_Code_Point) return UTF8_Code_Point;

The function returns a lowercase equivalent of Value. The result is Value if no equivalent exists.

function To_Lowercase (Value : String) return String;

The function converts its argument to lower case. Constraint_Error is propagated when Value is an illegal UTF-8 string.

function To_Uppercase (Value : UTF8_Code_Point) return UTF8_Code_Point;

The function returns an uppercase equivalent of Value. The result is Value if no equivalent exists.

function To_Uppercase (Value : String) return String;

The function converts its argument to upper case. Constraint_Error is propagated when Value is an illegal UTF-8 string.
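The following sketch illustrates the code point and string variants of the mapping functions. It assumes that UTF8_Code_Point is an integer type declared in Strings_Edit.UTF8, so that integer literals can be used for code points. The procedure name Case_Demo is hypothetical.

```ada
with Ada.Text_IO;                use Ada.Text_IO;
with Strings_Edit.UTF8;          use Strings_Edit.UTF8;
with Strings_Edit.UTF8.Mapping;  use Strings_Edit.UTF8.Mapping;

procedure Case_Demo is
   Small_E_Acute   : constant UTF8_Code_Point := 16#E9#;  -- U+00E9
   Capital_E_Acute : constant UTF8_Code_Point := 16#C9#;  -- U+00C9
begin
   -- String variant, the argument must be valid UTF-8
   Put_Line (To_Uppercase ("hello"));
   -- Code point variants
   Put_Line (Boolean'Image (Is_Lowercase (Small_E_Acute)));
   Put_Line
   (  Boolean'Image (To_Uppercase (Small_E_Acute) = Capital_E_Acute)
   );
   -- The digit 1 (U+0031) has no case equivalents
   Put_Line (Boolean'Image (Has_Case (16#31#)));
end Case_Demo;
```

Note that the string variants propagate Constraint_Error on illegal UTF-8 input, so untrusted input should be handled within an exception handler.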

Implementation notes. The implementation is based on the upper and lower case mappings as defined by the Unicode standard. Presently these mappings are specified in the UnicodeData.txt file, which can be downloaded from the Unicode Consortium site as part of the Unicode Character Database. It contains about three thousand Unicode code points which have upper or lower case equivalents. The implementation has an internal sorted array of mappings which is searched by bisection, i.e. a lookup takes O(log₂ n) comparisons, about 12 for n ≈ 3·10³. In case of future changes and extensions of this file, the subdirectory test_strings_edit contains a utility program strings_edit-utf8-mapping_generator.adb which can be used to adjust the implementation of the package. In order to do this, the utility must be built. Using the GNAT Ada compiler, for instance:

>gnatmake -I ../ strings_edit-utf8-mapping_generator.adb

Then it is called as follows:

>strings_edit-utf8-mapping_generator ../strings_edit-utf8-mapping.adb UnicodeData.txt

This will replace the internal representation of Unicode case mappings in the source code of the package Strings_Edit.UTF8.Mapping.
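The lookup technique mentioned in the implementation notes can be sketched as follows. This is not the code of the package but a self-contained illustration of a binary search in a sorted array of mappings. The types, the table To_Upper_Table with its four entries, and the function To_Upper are hypothetical.

```ada
with Ada.Text_IO;  use Ada.Text_IO;

procedure Binary_Lookup_Demo is
   type Code_Point is range 0..16#10FFFF#;
   type Pair is record
      Source : Code_Point;
      Target : Code_Point;
   end record;
   type Pair_Array is array (Positive range <>) of Pair;

   -- A tiny illustrative excerpt, sorted by Source
   To_Upper_Table : constant Pair_Array :=
      (  (16#61#,  16#41#),    -- a -> A
         (16#62#,  16#42#),    -- b -> B
         (16#E9#,  16#C9#),    -- e with acute
         (16#3B1#, 16#391#)    -- Greek alpha
      );

   function To_Upper (Value : Code_Point) return Code_Point is
      Low  : Integer := To_Upper_Table'First;
      High : Integer := To_Upper_Table'Last;
      Mid  : Integer;
   begin
      while Low <= High loop
         Mid := Low + (High - Low) / 2;
         if To_Upper_Table (Mid).Source = Value then
            return To_Upper_Table (Mid).Target;
         elsif To_Upper_Table (Mid).Source < Value then
            Low := Mid + 1;
         else
            High := Mid - 1;
         end if;
      end loop;
      return Value;  -- No equivalent, the code point maps to itself
   end To_Upper;
begin
   Put_Line (Code_Point'Image (To_Upper (16#3B1#)));  -- prints 913
   Put_Line (Code_Point'Image (To_Upper (16#31#)));   -- prints 49
end Binary_Lookup_Demo;
```

Each iteration halves the search interval, which is where the logarithmic number of comparisons comes from.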

7.7. Unicode categorization

The package Strings_Edit.UTF8.Categorization provides code points categorization as defined by the Unicode standard. The enumeration type General_Category represents the categories:

type General_Category is (Lu, ...);

The type has the following values:

Value	Description	Value	Description
`Lu`	Uppercase letter	`Sm`	Math symbol
`Ll`	Lowercase letter	`Sc`	Currency symbol
`Lt`	Titlecase letter	`Sk`	Modifier symbol
`Lm`	Modifier letter	`So`	Other symbol
`Lo`	Other letter	`Zs`	Space separator
`Mn`	Non-spacing mark	`Zl`	Line separator
`Mc`	Spacing combining mark	`Zp`	Page separator
`Me`	Enclosing mark	`Cc`	Control
`Nd`	Decimal digit (number)	`Cf`	Format
`Nl`	Letter (number)	`Cs`	Surrogate
`No`	Other number	`Co`	Private use
`Pc`	Connector punctuation	`Cn`	Not assigned
`Pd`	Dash punctuation
`Ps`	Open punctuation
`Pe`	Close punctuation
`Pi`	Initial quote punctuation
`Pf`	Final quote punctuation
`Po`	Other punctuation

subtype Letter is General_Category range Lu..Lo;
subtype Mark is General_Category range Mn..Me;
subtype Number is General_Category range Nd..No;
subtype Punctuation is General_Category range Pc..Po;
subtype Symbol is General_Category range Sm..So;
subtype Separator is General_Category range Zs..Zp;
subtype Other is General_Category range Cc..Cn;

function Category (Value : UTF8_Code_Point) return General_Category;

The function returns the category of Value.

Additionally the package defines the following indicator functions for commonly used sets of code points:

function Is_Alphanumeric (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a letter (Lu..Lo) or else a decimal digit (Nd) code point.

function Is_Control (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a control (Cc) code point.

function Is_Digit (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a decimal digit (Nd) code point.

function Is_Identifier_Extend (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value represents a character valid in the body of an Ada 2005 identifier in addition to the characters valid at the beginning of an identifier (ARM 2.3(3.1/2)).

function Is_Identifier_Start (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value represents a character valid at the beginning of an Ada 2005 identifier (ARM 2.3(3/2)).

function Is_ISO_646 (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value represents an ASCII character (ISO 646, 7-bit).

function Is_Letter (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a letter (Lu..Lo) code point.

function Is_Lower (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a lowercase letter (Ll) code point. This function is equivalent to Is_Lowercase.

function Is_Other_Format (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a format (Cf) code point. Such code points are usually ignored when strings are compared as words. For example, soft hyphen (AD₁₆) has this category.

function Is_Space (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a space (Zs) code point.

function Is_Subscript_Digit (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a subscript decimal digit code point.

function Is_Superscript_Digit (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a superscript decimal digit code point.

function Is_Title (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a title case letter (Lt) code point.

function Is_Upper (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is an uppercase letter (Lu) code point. This function is equivalent to Is_Uppercase.
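The category functions can be tried out as in the following sketch. It uses only the declarations listed above and assumes that integer literals are accepted for UTF8_Code_Point parameters. The procedure name Category_Demo is hypothetical, and the categories named in the comments are those assigned by the Unicode standard.

```ada
with Ada.Text_IO;                       use Ada.Text_IO;
with Strings_Edit.UTF8.Categorization;  use Strings_Edit.UTF8.Categorization;

procedure Category_Demo is
begin
   -- U+0041, Latin capital letter A, category Lu
   Put_Line (General_Category'Image (Category (16#41#)));
   -- Membership tests against the subtypes of General_Category
   Put_Line (Boolean'Image (Category (16#41#) in Letter));
   Put_Line (Boolean'Image (Category (16#2B#) in Symbol));  -- plus sign, Sm
   -- Indicator functions
   Put_Line (Boolean'Image (Is_Digit (16#39#)));            -- digit 9, Nd
   Put_Line (Boolean'Image (Is_Other_Format (16#AD#)));     -- soft hyphen, Cf
   Put_Line (Boolean'Image (Is_Alphanumeric (16#5F#)));     -- underscore, Pc
end Category_Demo;
```

The indicator functions have the profile of Unicode_Indicator_Function, which makes them usable for construction of sets in Strings_Edit.UTF8.Maps.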

Implementation notes. The implementation is based on the general category values defined by the Unicode standard. Presently these values are specified in the UnicodeData.txt file, which can be downloaded from the Unicode Consortium site as part of the Unicode Character Database. In case of future changes and extensions of this file, the subdirectory test_strings_edit contains a utility program strings_edit-utf8-categorization_generator.adb which can be used to adjust the implementation of the package. In order to do this, the utility must be built. Using the GNAT Ada compiler, for instance:

>gnatmake -I ../ strings_edit-utf8-categorization_generator.adb

Then it is called as follows:

>strings_edit-utf8-categorization_generator ../strings_edit-utf8-categorization.adb UnicodeData.txt

This will replace the internal representation of Unicode general categories in the source code of the package Strings_Edit.UTF8.Categorization.

7.8. Blocks

The package Strings_Edit.UTF8.Blocks provides ranges of code points for the Unicode blocks. See the Blocks.txt file. The names of the ranges used in the package match the names used in Blocks.txt with spaces and hyphens replaced by underscores. For example, "Basic Latin" of Blocks.txt is named Basic_Latin in the package. The ranges of code points can be used for construction of code point sets (see To_Set and the set-theoretic operations declared in Strings_Edit.UTF8.Maps).
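A set can be built from block ranges as in the sketch below. It assumes that the blocks are declared as objects of the type Code_Points_Range, so that they can be passed to To_Set and to the operator "or" of Strings_Edit.UTF8.Maps. The procedure name Blocks_Demo and the set Alphabet are hypothetical.

```ada
with Ada.Text_IO;
with Strings_Edit.UTF8.Blocks;  use Strings_Edit.UTF8.Blocks;
with Strings_Edit.UTF8.Maps;    use Strings_Edit.UTF8.Maps;

procedure Blocks_Demo is
   -- Union of two Unicode blocks
   Alphabet : constant Unicode_Set := To_Set (Basic_Latin) or Cyrillic;
begin
   -- U+0416 belongs to the Cyrillic block
   Ada.Text_IO.Put_Line (Boolean'Image (Is_In (16#416#, Alphabet)));
   -- U+03B1 belongs to the Greek and Coptic block, which is not included
   Ada.Text_IO.Put_Line (Boolean'Image (Is_In (16#3B1#, Alphabet)));
end Blocks_Demo;
```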

7.9. Sets and maps

The package Strings_Edit.UTF8.Maps provides sets and maps of code points. The package mimics the standard library package Ada.Strings.Maps (ARM A.4.2) augmented for dealing with Unicode in UTF-8 encoding. The operations of Ada.Strings.Maps are extended onto the cases when sets and ranges are intermixed. Because Unicode sets can be potentially very large the implementation supports composition of an indicator function with a set of sorted ranges in order to reduce required space by conjunction (and). Similarly for maps two representations are supported. One by a sorted array and another by a function. Reference counting is used to provide efficient assignments of sets and maps.

7.9.1. Sets

The type Unicode_Set represents sets of code points.

type Unicode_Set is private;

The following type defines an access to an indicator function of a set of code points:

type Unicode_Indicator_Function is access function (Value : UTF8_Code_Point) return Boolean;

The type Unicode_Set has the following operations defined:

function "not" (Right : Unicode_Set) return Unicode_Set;
function "not" (Right : String) return Unicode_Set;
function "not" (Right : Code_Points_Range) return Unicode_Set;
function "and" (Left, Right : Unicode_Set) return Unicode_Set;
function "and" (Left : Unicode_Set; Right : Code_Points_Range) return Unicode_Set;
function "and" (Left : Code_Points_Range; Right : Unicode_Set) return Unicode_Set;
function "and" (Left : Unicode_Set; Right : String) return Unicode_Set;
function "and" (Left : String; Right : Unicode_Set) return Unicode_Set;
function "or" (Left, Right : Unicode_Set) return Unicode_Set;
function "or" (Left : Unicode_Set; Right : Code_Points_Range) return Unicode_Set;
function "or" (Left : Code_Points_Range; Right : Unicode_Set) return Unicode_Set;
function "or" (Left : Unicode_Set; Right : String) return Unicode_Set;
function "or" (Left : String; Right : Unicode_Set) return Unicode_Set;
function "xor" (Left, Right : Unicode_Set) return Unicode_Set;
function "xor" (Left : Unicode_Set; Right : Code_Points_Range) return Unicode_Set;
function "xor" (Left : Code_Points_Range; Right : Unicode_Set) return Unicode_Set;
function "xor" (Left : Unicode_Set; Right : String) return Unicode_Set;
function "xor" (Left : String; Right : Unicode_Set) return Unicode_Set;
function "-" (Left, Right : Unicode_Set) return Unicode_Set;
function "-" (Left : Unicode_Set; Right : Code_Points_Range) return Unicode_Set;
function "-" (Left : Code_Points_Range; Right : Unicode_Set) return Unicode_Set;
function "-" (Left : Unicode_Set; Right : String) return Unicode_Set;
function "-" (Left : String; Right : Unicode_Set) return Unicode_Set;

These functions provide set-theoretic operations on two sets, or on a set and a range of code points, or else on a set and a UTF-8 encoded string. When one of the arguments is a string, it is treated as a set consisting of the code points found in the string. Data_Error is propagated when a string parameter is not a valid UTF-8 string. A - B is defined as A and not B. A range of code points is considered empty if its lower bound is higher than the upper bound.

function "=" (Left, Right : Unicode_Set) return Boolean;
function "=" (Left : Unicode_Set; Right : Code_Points_Range) return Boolean;
function "=" (Left : Code_Points_Range; Right : Unicode_Set) return Boolean;
function "=" (Left : Unicode_Set; Right : String) return Boolean;
function "=" (Left : String; Right : Unicode_Set) return Boolean;
function "<" (Left, Right : Unicode_Set) return Boolean;
function "<" (Left : Unicode_Set; Right : Code_Points_Range) return Boolean;
function "<" (Left : Code_Points_Range; Right : Unicode_Set) return Boolean;
function "<" (Left : Unicode_Set; Right : String) return Boolean;
function "<" (Left : String; Right : Unicode_Set) return Boolean;
function "<=" (Left, Right : Unicode_Set) return Boolean;
function "<=" (Left : Unicode_Set; Right : Code_Points_Range) return Boolean;
function "<=" (Left : Code_Points_Range; Right : Unicode_Set) return Boolean;
function "<=" (Left : Unicode_Set; Right : String) return Boolean;
function "<=" (Left : String; Right : Unicode_Set) return Boolean;
function ">" (Left, Right : Unicode_Set) return Boolean;
function ">" (Left : Unicode_Set; Right : Code_Points_Range) return Boolean;
function ">" (Left : Code_Points_Range; Right : Unicode_Set) return Boolean;
function ">" (Left : Unicode_Set; Right : String) return Boolean;
function ">" (Left : String; Right : Unicode_Set) return Boolean;
function ">=" (Left, Right : Unicode_Set) return Boolean;
function ">=" (Left : Unicode_Set; Right : Code_Points_Range) return Boolean;
function ">=" (Left : Code_Points_Range; Right : Unicode_Set) return Boolean;
function ">=" (Left : Unicode_Set; Right : String) return Boolean;
function ">=" (Left : String; Right : Unicode_Set) return Boolean;

These functions provide relational operations on two sets, or on a set and a range of code points, or else on a set and a UTF-8 encoded string. When one of the arguments is a string, it is treated as a set consisting of the code points found in the string. Data_Error is propagated when a string parameter is not a valid UTF-8 string. The operations < and <= are defined in the sense of ⊂ and ⊆ respectively.

function Cardinality (Set : Unicode_Set) return Natural;

This function returns the number of elements in Set.

function Choose ( Set : Unicode_Set; Indicator : Unicode_Indicator_Function ) return Unicode_Set;

This function returns a set consisting of the elements of Set chosen by the function Indicator. When Indicator is null the result is Set. When this function creates a new set, its representation is based on a list of ranges and does not refer to Indicator. This should only be used for compact sets.

procedure Get ( Source : String; Pointer : in out Integer; Blanks : Unicode_Set );

This procedure skips all code points from the set Blanks starting from Source (Pointer). After completion Pointer is either Source'Last + 1, or the index of the first character of the first code point outside Blanks, or else the index of the first improperly encoded character. Layout_Error is propagated when Pointer is not in the range Source'First..Source'Last + 1.

function Is_Empty (Set : Unicode_Set) return Boolean; function Is_Range (Set : Unicode_Set) return Boolean; function Is_Singleton (Set : Unicode_Set) return Boolean; function Is_Universal (Set : Unicode_Set) return Boolean;

These functions test a set for being empty, a range, a singleton or a full set of code points.

function Is_In ( Element : Character; Set : Unicode_Set ) return Boolean; function Is_In ( Element : Wide_Character; Set : Unicode_Set ) return Boolean; function Is_In ( Element : UTF8_Code_Point; Set : Unicode_Set ) return Boolean;

These functions provide membership tests. The first parameter can be a code point, a Latin-1 character or a wide character.

function Is_Subset ( Elements : Code_Points_Range; Set : Unicode_Set ) return Boolean renames "<="; function Is_Subset ( Elements : Code_Points_Ranges; Set : Unicode_Set ) return Boolean; function Is_Subset ( Elements : String; Set : Unicode_Set ) return Boolean renames "<="; function Is_Subset ( Elements : Unicode_Set; Set : Unicode_Set ) return Boolean renames "<=";

These functions provide subset tests. The first parameter can be a code points range, an array of ranges, a set, or a UTF-8 encoded string. When the parameter is a string, the result is true if all code points of the string belong to the set. Data_Error is propagated when it is not a valid UTF-8 string.

function To_Ranges (Set : Unicode_Set) return Code_Points_Ranges;

This function returns an array of disjoint ascending ranges representing the set. The result is an empty array if the parameter is an empty set.

function To_Set (Singleton : UTF8_Code_Point) return Unicode_Set; function To_Set (Singleton : Character) return Unicode_Set; function To_Set (Singleton : Wide_Character) return Unicode_Set; function To_Set (Span : Code_Points_Range) return Unicode_Set; function To_Set (Ranges : Code_Points_Ranges) return Unicode_Set; function To_Set (Low, High : UTF8_Code_Point) return Unicode_Set; function To_Set (Sequence : String) return Unicode_Set; function To_Set (Indicator : Unicode_Indicator_Function) return Unicode_Set;

These functions convert

a code point;
a Latin-1 character;
a wide character;
a range of code points;
an array of code point ranges;
a range given by its lower and upper bounds;
a UTF-8 string;
an indicator function (an access to it)

to the corresponding set. When the parameter is a string, the set will contain all code points of the string and nothing else. Data_Error is propagated when the argument is not a properly encoded UTF-8 string. When the parameter is an array of ranges, the result is the union of them. When the parameter specifies an indicator function, the result is a set corresponding to the function. It is the universal set when Indicator is null. Unlike Choose, the result refers to Indicator.

function To_Sequence (Set : Unicode_Set) return String;

This function returns a UTF-8 encoded string corresponding to the code points of the set. Each code point of Set is represented in the string. The code points appear in ascending order. The following relation holds: To_Set (To_Sequence (x)) = x. Constraint_Error is propagated when the result is too large to be represented as a string.

function Trim (Source : String; Blanks : Unicode_Set) return String;

This function returns the content of Source with the characters representing UTF-8 code points from the set Blanks removed from both ends of it. Data_Error is propagated when Source is not a valid UTF-8 string.
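Skipping and trimming blanks can be illustrated by the following sketch. It uses Get and Trim as described above with a set consisting of space and tabulation. The procedure name Blanks_Demo is hypothetical. The values in the comments follow from the descriptions and were not obtained from a test run.

```ada
with Ada.Text_IO;
with Strings_Edit.UTF8.Maps;  use Strings_Edit.UTF8.Maps;

procedure Blanks_Demo is
   -- Space and horizontal tabulation
   Blanks  : constant Unicode_Set :=
                To_Set (" ") or To_Set (Character'Val (9));
   Text    : constant String := "   abc  ";
   Pointer : Integer := Text'First;
begin
   -- Advance Pointer to the first code point outside Blanks
   Get (Text, Pointer, Blanks);
   Ada.Text_IO.Put_Line (Integer'Image (Pointer));  -- index of 'a', i.e. 4
   -- Remove blanks from both ends
   Ada.Text_IO.Put_Line ("[" & Trim (Text, Blanks) & "]");  -- [abc]
end Blanks_Demo;
```

Get leaves the string intact and only moves Pointer, which makes it suitable for parsers working on a fixed buffer, whereas Trim returns a new string.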

generic with function Indicator (Value : UTF8_Code_Point) return Boolean; function Generic_Choose (Set : Unicode_Set) return Unicode_Set;

This is a generic variant of the function Choose.

Null_Set : constant Unicode_Set; Universal_Set : constant Unicode_Set;

The empty and the universal set constants.

To use Unicode_Set effectively one should consider its implementation. A set of code points is represented by a conjunction of a sorted array of ranges and an indicator function. A code point belongs to the set when both the indicator function returns true and it is in the array. Such a set can be constructed, for example, like this:

Cyrillic_Letters : constant Unicode_Set := To_Set (Is_Letter'Access) and Cyrillic;

Here Is_Letter is an indicator function, which selects only letters. Cyrillic is a range of code points defined in Strings_Edit.UTF8.Blocks. When this set is combined with other sets of ranges using only intersection, the implementation will keep this representation. When other operations like complement and disjunction get involved, the representation can be flattened by removing the indicator function from it. For large disjoint sets this might be very inefficient. The set is also flattened when the operation Choose is applied, which might be necessary if, for example, the indicator function is not declared at the library level. For the example above, the set of Cyrillic letters represented by an array of ranges could be obtained as follows:

Cyrillic_Letters : constant Unicode_Set := Choose (To_Set (Cyrillic), Is_Letter'Access);
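The set operations described in this section can be combined as in the sketch below. It relies on the operators and functions listed above. The names Set_Demo, Vowels, Small_Letters and Consonants are hypothetical, and the results in the comments are derived from the descriptions rather than from a test run.

```ada
with Ada.Text_IO;             use Ada.Text_IO;
with Strings_Edit.UTF8.Maps;  use Strings_Edit.UTF8.Maps;

procedure Set_Demo is
   Vowels        : constant Unicode_Set := To_Set ("aeiou");
   Small_Letters : constant Unicode_Set := To_Set (16#61#, 16#7A#);  -- a..z
   Consonants    : constant Unicode_Set := Small_Letters - Vowels;
begin
   -- 26 letters without 5 vowels
   Put_Line (Integer'Image (Cardinality (Consonants)));
   -- Membership test, the literal is qualified to select the Character variant
   Put_Line (Boolean'Image (Is_In (Character'('e'), Vowels)));
   -- A string is treated as the set of its code points
   Put_Line (Boolean'Image ("ae" <= Vowels));
   -- Proper subset
   Put_Line (Boolean'Image (Vowels < Small_Letters));
   -- Intersection with a string, only a is common
   Put_Line (To_Sequence (Vowels and "xyza"));
end Set_Demo;
```

All sets here are small and based on ranges, so no indicator function is involved and flattening is not an issue.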

7.9.2. Maps

The type Unicode_Mapping represents a mapping of the set of Unicode code points to itself.

type Unicode_Mapping is private;

The following type defines an access to a mapping function:

type Unicode_Mapping_Function is access function (Value : UTF8_Code_Point) return UTF8_Code_Point;

The following operations are defined on Unicode_Mapping:

function Is_Prefix ( Prefix : String; Source : String; Map : Unicode_Mapping ) return Boolean;

This function returns true if Prefix is a prefix of Source with respect to the mapping represented by Map. An empty string is a prefix of any string. Data_Error is propagated when Prefix or Source are not properly encoded UTF-8 strings.

function Is_Prefix ( Prefix : String; Source : String; Pointer : Integer; Map : Unicode_Mapping ) return Boolean;

function Value ( Map : Unicode_Mapping; Element : Character ) return UTF8_Code_Point; function Value ( Map : Unicode_Mapping; Element : Wide_Character ) return UTF8_Code_Point; function Value ( Map : Unicode_Mapping; Element : UTF8_Code_Point ) return UTF8_Code_Point;

These functions return the code point corresponding to the parameter Element in the mapping Map. The parameter can be a code point, Latin-1 or wide character.

function To_Domain (Map : Unicode_Mapping) return String;

This function returns a UTF-8 string of ascending code points x such that Value (Map, x) /= x. Constraint_Error is propagated when the result is too large to be represented as a string.

function To_Mapping (From, To : String) return Unicode_Mapping;

This function creates a new mapping. The parameters are UTF-8 encoded strings. For n^th code point of From the resulting mapping yields the n^th code point of To. For all other code points the mapping acts as an identity mapping. When From contains repeating code points or else the numbers of code points in From and To differ Translation_Error is propagated. Data_Error is propagated when From or To is an invalid UTF-8 string.

function To_Mapping (Map : Unicode_Mapping_Function) return Unicode_Mapping;

This function creates a new mapping from a function Map. The result is identity mapping when Map is null.

function To_Range (Map : Unicode_Mapping) return String;

The result is a UTF-8 string of code points x such that the original of x is not x, i.e. x such that Value (Map, y) = x and y /= x. The code points in the result are ordered by y, i.e. x₁ precedes x₂ iff y₁ < y₂ and Value (Map, y₁) = x₁, Value (Map, y₂) = x₂. Constraint_Error is propagated when the result is too large to be represented as a string.

Identity : constant Unicode_Mapping;

This mapping maps each code point to itself.
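A mapping can be created and inspected as in the following sketch, which uses To_Mapping, To_Domain, To_Range and Value as described above. It assumes that UTF8_Code_Point is an integer type declared in Strings_Edit.UTF8. The names Map_Demo and Map are hypothetical.

```ada
with Ada.Text_IO;             use Ada.Text_IO;
with Strings_Edit.UTF8;       use Strings_Edit.UTF8;
with Strings_Edit.UTF8.Maps;  use Strings_Edit.UTF8.Maps;

procedure Map_Demo is
   -- a -> x, b -> y, c -> z, identity elsewhere
   Map : constant Unicode_Mapping := To_Mapping ("abc", "xyz");
   X   : constant UTF8_Code_Point := Character'Pos ('x');
   Q   : constant UTF8_Code_Point := Character'Pos ('q');
begin
   Put_Line (To_Domain (Map));  -- the code points changed by Map: abc
   Put_Line (To_Range (Map));   -- their images: xyz
   -- The literals are qualified to select the Character variant of Value
   Put_Line (Boolean'Image (Value (Map, Character'('a')) = X));
   Put_Line (Boolean'Image (Value (Map, Character'('q')) = Q));
end Map_Demo;
```

The strings passed to To_Mapping must contain the same number of code points, otherwise Translation_Error is propagated as described above.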

7.9.3. Constants

The package Strings_Edit.UTF8.Maps.Constants defines some commonly used sets:

Alphanumeric_Set : constant Unicode_Set; Blanks_Set : constant Unicode_Set; Control_Set : constant Unicode_Set; Digit_Set : constant Unicode_Set; Identifier_Extend_Set : constant Unicode_Set; Identifier_Start_Set : constant Unicode_Set; ISO_646_Set : constant Unicode_Set; Letter_Set : constant Unicode_Set; Lower_Set : constant Unicode_Set; Other_Format_Set : constant Unicode_Set; Space_Set : constant Unicode_Set; Subscript_Digit_Set : constant Unicode_Set; Superscript_Digit_Set : constant Unicode_Set; Title_Set : constant Unicode_Set; Upper_Set : constant Unicode_Set;

See Strings_Edit.UTF8.Categorization for information about the code points contained by the sets. The package defines the following maps:

Lower_Case_Map : constant Unicode_Mapping; Upper_Case_Map : constant Unicode_Mapping;

7.10. 8-bit encodings

The following packages provide means to recode into and from UTF-8 the legacy encodings which use an 8-bit octet as the encoding element. Note that the encoding of the Ada type Character is also 8-bit, ISO/IEC 8859-1. For Ada Character and String the conversions to and from UTF-8 are provided in the package Strings_Edit.UTF8.Handling.

7.10.1. Windows-1250

The package Strings_Edit.UTF8.Windows_1250 provides subprograms for dealing with Windows-1250 encoding (Central Europe).

function From_Windows_1250 ( Value : Character; [ Substitute : Code_Point ] ) return Code_Point; function From_Windows_1250 ( Value : String; [ Substitute : Code_Point ] ) return String; function From_Windows_1250 ( Value : String; [ Substitute : Wide_Character ] ) return Wide_String;

These functions convert the parameter Value encoded in Windows-1250 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains an invalid code (129₁₀, 131₁₀, 136₁₀, 144₁₀, 152₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_Windows_1250 ( Value : Code_Point; [ Substitute : Character ] ) return Character; function To_Windows_1250 ( Value : String; [ Substitute : Character ] ) return String; function To_Windows_1250 ( Value : Wide_String; [ Substitute : Character ] ) return String;

These functions convert the parameter Value to Windows-1250 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with string Value uses UTF-8 encoding. Data_Error is propagated when Value is an invalid UTF-8 string.
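A round trip between Windows-1250 and UTF-8 can be sketched as follows. The example uses the code 16#8A#, which in Windows-1250 encodes Latin capital letter S with caron (U+0160). It assumes that the variants with and without Substitute are resolved by the usual overloading rules. The names Recode_Demo, Legacy, Text and Back are hypothetical.

```ada
with Ada.Text_IO;
with Strings_Edit.UTF8.Windows_1250;  use Strings_Edit.UTF8.Windows_1250;

procedure Recode_Demo is
   -- Five octets in Windows-1250, the first one is S with caron
   Legacy : constant String := Character'Val (16#8A#) & "koda";
   -- The same text in UTF-8, U+0160 takes two octets
   Text   : constant String := From_Windows_1250 (Legacy);
   -- Back to Windows-1250, unsupported code points become '?'
   Back   : constant String := To_Windows_1250 (Text, Substitute => '?');
begin
   Ada.Text_IO.Put_Line (Integer'Image (Text'Length));    -- 6 octets
   Ada.Text_IO.Put_Line (Boolean'Image (Back = Legacy));  -- round trip
end Recode_Demo;
```

The packages for the other Windows encodings described below have the same structure, so the sketch applies to them after renaming the package and the functions.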

7.10.2. Windows-1251

The package Strings_Edit.UTF8.Windows_1251 provides subprograms for dealing with Windows-1251 encoding. The encoding provides Cyrillic characters used in Eastern European languages.

function From_Windows_1251 ( Value : Character; [ Substitute : Code_Point ] ) return Code_Point; function From_Windows_1251 ( Value : String; [ Substitute : Code_Point ] ) return String; function From_Windows_1251 ( Value : String; [ Substitute : Wide_Character ] ) return Wide_String;

These functions convert the parameter Value encoded in Windows-1251 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains the invalid code (152₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_Windows_1251 ( Value : Code_Point; [ Substitute : Character ] ) return Character; function To_Windows_1251 ( Value : String; [ Substitute : Character ] ) return String; function To_Windows_1251 ( Value : Wide_String; [ Substitute : Character ] ) return String;

These functions convert the parameter Value to Windows-1251 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with string Value uses UTF-8 encoding. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.3. Windows-1252

The package Strings_Edit.UTF8.Windows_1252 provides subprograms for dealing with Windows-1252 encoding. The encoding is largely Latin-1 except for some code positions, so it differs from Ada's Character encoding.

function From_Windows_1252 ( Value : Character; [ Substitute : Code_Point ] ) return Code_Point; function From_Windows_1252 ( Value : String; [ Substitute : Code_Point ] ) return String; function From_Windows_1252 ( Value : String; [ Substitute : Wide_Character ] ) return Wide_String;

These functions convert the parameter Value encoded in Windows-1252 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains an invalid code (129₁₀, 141₁₀, 143₁₀, 144₁₀, 157₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_Windows_1252 ( Value : Code_Point; [ Substitute : Character ] ) return Character; function To_Windows_1252 ( Value : String; [ Substitute : Character ] ) return String; function To_Windows_1252 ( Value : Wide_String; [ Substitute : Character ] ) return String;

These functions convert the parameter Value to Windows-1252 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with string Value uses UTF-8 encoding. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.4. Windows-1253

The package Strings_Edit.UTF8.Windows_1253 provides subprograms for dealing with Windows-1253 encoding (Greek language).

function From_Windows_1253 ( Value : Character; [ Substitute : Code_Point ] ) return Code_Point; function From_Windows_1253 ( Value : String; [ Substitute : Code_Point ] ) return String; function From_Windows_1253 ( Value : String; [ Substitute : Wide_Character ] ) return Wide_String;

These functions convert the parameter Value encoded in Windows-1253 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains an invalid code (129₁₀, 136₁₀, 138₁₀, 140₁₀..144₁₀, 152₁₀, 154₁₀, 156₁₀..159₁₀, 170₁₀, 210₁₀, 255₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_Windows_1253 ( Value : Code_Point; [ Substitute : Character ] ) return Character; function To_Windows_1253 ( Value : String; [ Substitute : Character ] ) return String; function To_Windows_1253 ( Value : Wide_String; [ Substitute : Character ] ) return String;

These functions convert the parameter Value to Windows-1253 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with string Value uses UTF-8 encoding. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.5. Windows-1254

The package Strings_Edit.UTF8.Windows_1254 provides subprograms for dealing with Windows-1254 encoding (Turkish language).

function From_Windows_1254 ( Value : Character; [ Substitute : Code_Point ] ) return Code_Point; function From_Windows_1254 ( Value : String; [ Substitute : Code_Point ] ) return String; function From_Windows_1254 ( Value : String; [ Substitute : Wide_Character ] ) return Wide_String;

These functions convert the parameter Value encoded in Windows-1254 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains an invalid code (129₁₀, 141₁₀..146₁₀, 157₁₀, 158₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_Windows_1254 ( Value : Code_Point; [ Substitute : Character ] ) return Character; function To_Windows_1254 ( Value : String; [ Substitute : Character ] ) return String; function To_Windows_1254 ( Value : Wide_String; [ Substitute : Character ] ) return String;

These functions convert the parameter Value to Windows-1254 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Value is omitted Constraint_Error is propagated when an unsupported code point appears in Value. The variant with string Value uses UTF-8 encoding. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.6. Windows-1255

The package Strings_Edit.UTF8.Windows_1255 provides subprograms for dealing with Windows-1255 encoding (Hebrew language).

function From_Windows_1255 (Value : Character; [Substitute : Code_Point]) return Code_Point;
function From_Windows_1255 (Value : String; [Substitute : Code_Point]) return String;
function From_Windows_1255 (Value : String; [Substitute : Wide_Character]) return Wide_String;

These functions convert the parameter Value encoded in Windows-1255 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains an invalid code (129₁₀, 138₁₀, 140₁₀..144₁₀, 154₁₀, 156₁₀..159₁₀, 217₁₀..223₁₀, 251₁₀, 252₁₀, 255₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_Windows_1255 (Value : Code_Point; [Substitute : Character]) return Character;
function To_Windows_1255 (Value : String; [Substitute : Character]) return String;
function To_Windows_1255 (Value : Wide_String; [Substitute : Character]) return String;

These functions convert the parameter Value to Windows-1255 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with a String Value expects Value to be UTF-8 encoded. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.7. Windows-1256

The package Strings_Edit.UTF8.Windows_1256 provides subprograms for dealing with Windows-1256 encoding (Arabic script).

function From_Windows_1256 (Value : Character) return Code_Point;
function From_Windows_1256 (Value : String) return String;
function From_Windows_1256 (Value : String) return Wide_String;

These functions convert the parameter Value encoded in Windows-1256 to a code point, UTF-8 or Wide_String.

function To_Windows_1256 (Value : Code_Point; [Substitute : Character]) return Character;
function To_Windows_1256 (Value : String; [Substitute : Character]) return String;
function To_Windows_1256 (Value : Wide_String; [Substitute : Character]) return String;

These functions convert the parameter Value to Windows-1256 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with a String Value expects Value to be UTF-8 encoded. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.8. Windows-1257

The package Strings_Edit.UTF8.Windows_1257 provides subprograms for dealing with Windows-1257 encoding (Baltic languages).

function From_Windows_1257 (Value : Character; [Substitute : Code_Point]) return Code_Point;
function From_Windows_1257 (Value : String; [Substitute : Code_Point]) return String;
function From_Windows_1257 (Value : String; [Substitute : Wide_Character]) return Wide_String;

These functions convert the parameter Value encoded in Windows-1257 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains an invalid code (129₁₀, 131₁₀, 136₁₀, 138₁₀, 140₁₀, 144₁₀, 150₁₀, 152₁₀, 154₁₀, 160₁₀, 161₁₀, 165₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_Windows_1257 (Value : Code_Point; [Substitute : Character]) return Character;
function To_Windows_1257 (Value : String; [Substitute : Character]) return String;
function To_Windows_1257 (Value : Wide_String; [Substitute : Character]) return String;

These functions convert the parameter Value to Windows-1257 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with a String Value expects Value to be UTF-8 encoded. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.9. Windows-1258

The package Strings_Edit.UTF8.Windows_1258 provides subprograms for dealing with Windows-1258 encoding (Vietnamese script).

function From_Windows_1258 (Value : Character; [Substitute : Code_Point]) return Code_Point;
function From_Windows_1258 (Value : String; [Substitute : Code_Point]) return String;
function From_Windows_1258 (Value : String; [Substitute : Wide_Character]) return Wide_String;

These functions convert the parameter Value encoded in Windows-1258 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains an invalid code (129₁₀, 138₁₀, 141₁₀..144₁₀, 154₁₀, 157₁₀, 158₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_Windows_1258 (Value : Code_Point; [Substitute : Character]) return Character;
function To_Windows_1258 (Value : String; [Substitute : Character]) return String;
function To_Windows_1258 (Value : Wide_String; [Substitute : Character]) return String;

These functions convert the parameter Value to Windows-1258 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with a String Value expects Value to be UTF-8 encoded. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.10. KOI8

The package Strings_Edit.UTF8.KOI8 provides subprograms for dealing with KOI8 encoding (RFC 1489). KOI8 was the most widely used encoding for mixed Latin/Cyrillic text prior to Unicode.

function From_KOI8 (Value : Character) return Code_Point;
function From_KOI8 (Value : String) return String;
function From_KOI8 (Value : String) return Wide_String;

These functions convert the parameter Value encoded in KOI8 to a code point, UTF-8 or Wide_String.

function To_KOI8 (Value : Code_Point; [Substitute : Character]) return Character;
function To_KOI8 (Value : String; [Substitute : Character]) return String;
function To_KOI8 (Value : Wide_String; [Substitute : Character]) return String;

These functions convert the parameter Value to KOI8 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with a String Value expects Value to be UTF-8 encoded. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.11. ISO/IEC 8859-2

The package Strings_Edit.UTF8.ISO_8859_2 provides subprograms for dealing with ISO/IEC 8859-2 encoding (Central European languages).

function From_ISO_8859_2 (Value : Character; [Substitute : Code_Point]) return Code_Point;
function From_ISO_8859_2 (Value : String; [Substitute : Code_Point]) return String;
function From_ISO_8859_2 (Value : String; [Substitute : Wide_Character]) return Wide_String;

These functions convert the parameter Value encoded in ISO/IEC 8859-2 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains an invalid code (128₁₀..159₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_ISO_8859_2 (Value : Code_Point; [Substitute : Character]) return Character;
function To_ISO_8859_2 (Value : String; [Substitute : Character]) return String;
function To_ISO_8859_2 (Value : Wide_String; [Substitute : Character]) return String;

These functions convert the parameter Value to ISO/IEC 8859-2 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with a String Value expects Value to be UTF-8 encoded. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.12. ISO/IEC 8859-3

The package Strings_Edit.UTF8.ISO_8859_3 provides subprograms for dealing with ISO/IEC 8859-3 encoding (Southern European languages).

function From_ISO_8859_3 (Value : Character; [Substitute : Code_Point]) return Code_Point;
function From_ISO_8859_3 (Value : String; [Substitute : Code_Point]) return String;
function From_ISO_8859_3 (Value : String; [Substitute : Wide_Character]) return Wide_String;

These functions convert the parameter Value encoded in ISO/IEC 8859-3 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains an invalid code (128₁₀..159₁₀, 164₁₀, 174₁₀, 190₁₀, 195₁₀, 208₁₀, 225₁₀, 240₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_ISO_8859_3 (Value : Code_Point; [Substitute : Character]) return Character;
function To_ISO_8859_3 (Value : String; [Substitute : Character]) return String;
function To_ISO_8859_3 (Value : Wide_String; [Substitute : Character]) return String;

These functions convert the parameter Value to ISO/IEC 8859-3 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with a String Value expects Value to be UTF-8 encoded. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.13. ISO/IEC 8859-4

The package Strings_Edit.UTF8.ISO_8859_4 provides subprograms for dealing with ISO/IEC 8859-4 encoding (Northern European languages).

function From_ISO_8859_4 (Value : Character; [Substitute : Code_Point]) return Code_Point;
function From_ISO_8859_4 (Value : String; [Substitute : Code_Point]) return String;
function From_ISO_8859_4 (Value : String; [Substitute : Wide_Character]) return Wide_String;

These functions convert the parameter Value encoded in ISO/IEC 8859-4 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains an invalid code (128₁₀..159₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_ISO_8859_4 (Value : Code_Point; [Substitute : Character]) return Character;
function To_ISO_8859_4 (Value : String; [Substitute : Character]) return String;
function To_ISO_8859_4 (Value : Wide_String; [Substitute : Character]) return String;

These functions convert the parameter Value to ISO/IEC 8859-4 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with a String Value expects Value to be UTF-8 encoded. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.14. ISO/IEC 8859-5

The package Strings_Edit.UTF8.ISO_8859_5 provides subprograms for dealing with ISO/IEC 8859-5 encoding (languages using Cyrillic alphabets).

function From_ISO_8859_5 (Value : Character; [Substitute : Code_Point]) return Code_Point;
function From_ISO_8859_5 (Value : String; [Substitute : Code_Point]) return String;
function From_ISO_8859_5 (Value : String; [Substitute : Wide_Character]) return Wide_String;

These functions convert the parameter Value encoded in ISO/IEC 8859-5 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains an invalid code (128₁₀..159₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_ISO_8859_5 (Value : Code_Point; [Substitute : Character]) return Character;
function To_ISO_8859_5 (Value : String; [Substitute : Character]) return String;
function To_ISO_8859_5 (Value : Wide_String; [Substitute : Character]) return String;

These functions convert the parameter Value to ISO/IEC 8859-5 encoding. The parameter Substitute specifies the character that substitutes invalid code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with a String Value expects Value to be UTF-8 encoded. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.15. ISO/IEC 8859-6

The package Strings_Edit.UTF8.ISO_8859_6 provides subprograms for dealing with ISO/IEC 8859-6 encoding (Arabic language).

function From_ISO_8859_6 (Value : Character; [Substitute : Code_Point]) return Code_Point;
function From_ISO_8859_6 (Value : String; [Substitute : Code_Point]) return String;
function From_ISO_8859_6 (Value : String; [Substitute : Wide_Character]) return Wide_String;

These functions convert the parameter Value encoded in ISO/IEC 8859-6 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains an invalid code (128₁₀..159₁₀, 174₁₀..186₁₀, 188₁₀..190₁₀, 192₁₀, 219₁₀..223₁₀, 243₁₀..255₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_ISO_8859_6 (Value : Code_Point; [Substitute : Character]) return Character;
function To_ISO_8859_6 (Value : String; [Substitute : Character]) return String;
function To_ISO_8859_6 (Value : Wide_String; [Substitute : Character]) return String;

These functions convert the parameter Value to ISO/IEC 8859-6 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with a String Value expects Value to be UTF-8 encoded. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.16. ISO/IEC 8859-7

The package Strings_Edit.UTF8.ISO_8859_7 provides subprograms for dealing with ISO/IEC 8859-7 encoding (Greek language).

function From_ISO_8859_7 (Value : Character; [Substitute : Code_Point]) return Code_Point;
function From_ISO_8859_7 (Value : String; [Substitute : Code_Point]) return String;
function From_ISO_8859_7 (Value : String; [Substitute : Wide_Character]) return Wide_String;

These functions convert the parameter Value encoded in ISO/IEC 8859-7 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains an invalid code (128₁₀..159₁₀, 174₁₀, 210₁₀, 255₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_ISO_8859_7 (Value : Code_Point; [Substitute : Character]) return Character;
function To_ISO_8859_7 (Value : String; [Substitute : Character]) return String;
function To_ISO_8859_7 (Value : Wide_String; [Substitute : Character]) return String;

These functions convert the parameter Value to ISO/IEC 8859-7 encoding. The parameter Substitute specifies the character that substitutes invalid code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with a String Value expects Value to be UTF-8 encoded. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.17. ISO/IEC 8859-8

The package Strings_Edit.UTF8.ISO_8859_8 provides subprograms for dealing with ISO/IEC 8859-8 encoding (Hebrew language).

function From_ISO_8859_8 (Value : Character; [Substitute : Code_Point]) return Code_Point;
function From_ISO_8859_8 (Value : String; [Substitute : Code_Point]) return String;
function From_ISO_8859_8 (Value : String; [Substitute : Wide_Character]) return Wide_String;

These functions convert the parameter Value encoded in ISO/IEC 8859-8 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains an invalid code (128₁₀..159₁₀, 161₁₀, 191₁₀..222₁₀, 251₁₀, 252₁₀, 255₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_ISO_8859_8 (Value : Code_Point; [Substitute : Character]) return Character;
function To_ISO_8859_8 (Value : String; [Substitute : Character]) return String;
function To_ISO_8859_8 (Value : Wide_String; [Substitute : Character]) return String;

These functions convert the parameter Value to ISO/IEC 8859-8 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with a String Value expects Value to be UTF-8 encoded. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.18. ISO/IEC 8859-9

The package Strings_Edit.UTF8.ISO_8859_9 provides subprograms for dealing with ISO/IEC 8859-9 encoding (Turkish language).

function From_ISO_8859_9 (Value : Character; [Substitute : Code_Point]) return Code_Point;
function From_ISO_8859_9 (Value : String; [Substitute : Code_Point]) return String;
function From_ISO_8859_9 (Value : String; [Substitute : Wide_Character]) return Wide_String;

These functions convert the parameter Value encoded in ISO/IEC 8859-9 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains an invalid code (128₁₀..159₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_ISO_8859_9 (Value : Code_Point; [Substitute : Character]) return Character;
function To_ISO_8859_9 (Value : String; [Substitute : Character]) return String;
function To_ISO_8859_9 (Value : Wide_String; [Substitute : Character]) return String;

These functions convert the parameter Value to ISO/IEC 8859-9 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with a String Value expects Value to be UTF-8 encoded. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.19. ISO/IEC 8859-10

The package Strings_Edit.UTF8.ISO_8859_10 provides subprograms for dealing with ISO/IEC 8859-10 encoding (Nordic languages).

function From_ISO_8859_10 (Value : Character; [Substitute : Code_Point]) return Code_Point;
function From_ISO_8859_10 (Value : String; [Substitute : Code_Point]) return String;
function From_ISO_8859_10 (Value : String; [Substitute : Wide_Character]) return Wide_String;

These functions convert the parameter Value encoded in ISO/IEC 8859-10 to a code point, UTF-8 or Wide_String. Constraint_Error is propagated when Value contains an invalid code (128₁₀..159₁₀). The variants with the parameter Substitute do not raise an exception and use the code point specified by the parameter instead.

function To_ISO_8859_10 (Value : Code_Point; [Substitute : Character]) return Character;
function To_ISO_8859_10 (Value : String; [Substitute : Character]) return String;
function To_ISO_8859_10 (Value : Wide_String; [Substitute : Character]) return String;

These functions convert the parameter Value to ISO/IEC 8859-10 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with a String Value expects Value to be UTF-8 encoded. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.20. Mac OS Roman

The package Strings_Edit.UTF8.MacOS_Roman provides subprograms for dealing with Mac OS Roman encoding.

function From_MacOS_Roman (Value : Character) return Code_Point;
function From_MacOS_Roman (Value : String) return String;
function From_MacOS_Roman (Value : String) return Wide_String;

These functions convert the parameter Value encoded in Mac OS Roman to a code point, UTF-8 or Wide_String.

function To_MacOS_Roman (Value : Code_Point; [Substitute : Character]) return Character;
function To_MacOS_Roman (Value : String; [Substitute : Character]) return String;
function To_MacOS_Roman (Value : Wide_String; [Substitute : Character]) return String;

These functions convert the parameter Value to Mac OS Roman encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with a String Value expects Value to be UTF-8 encoded. Data_Error is propagated when Value is an invalid UTF-8 string.

7.10.21. ITU T.61

The package Strings_Edit.UTF8.ITU_T61 provides subprograms for dealing with ITU-T T.61 character set encoding.

function From_ITU_T61 (Value : Character) return Code_Point;
function From_ITU_T61 (Value : String) return String;
function From_ITU_T61 (Value : String) return Wide_String;

These functions convert the parameter Value encoded in T.61 to a code point, UTF-8 or Wide_String.

function To_ITU_T61 (Value : Code_Point; [Substitute : Character]) return Character;
function To_ITU_T61 (Value : String; [Substitute : Character]) return String;
function To_ITU_T61 (Value : Wide_String; [Substitute : Character]) return String;

These functions convert the parameter Value to T.61 encoding. The parameter Substitute specifies the character that substitutes unsupported code points in Value. If Substitute is omitted, Constraint_Error is propagated when an unsupported code point appears in Value. The variant with a String Value expects Value to be UTF-8 encoded. Data_Error is propagated when Value is an invalid UTF-8 string.

7.11. 16-bit encodings

7.11.1. RADIX-50

The package Strings_Edit.UTF8.RADIX50 provides subprograms for dealing with the DEC RADIX-50 encoding, used mainly in the FILES-11 file system. The valid code points of the encoding include capital letters A..Z, digits 0..9, space, dollar ($), point (.) and percent (%).

function From_RADIX50 (Value : Wide_Character) return String;

This function converts the parameter Value from RADIX-50. The source is a sequence of 16-bit elements (words) represented by Wide_Character. Each word (Wide_Character) contains 3 RADIX-50 characters.

function To_RADIX50 (Value : String; [Substitute : Character]) return Wide_String;

This function converts Value to a sequence of RADIX-50 words. The parameter Substitute specifies the character that substitutes invalid code points in Value. When the length of the source is not a multiple of 3, the missing 1 or 2 characters are padded with spaces.

Exceptions
Constraint_Error	Invalid code points appear in Value and no Substitute is given
Data_Error	Illegal UTF-8 string Value
Use_Error	Substitute is not one of 'A'..'Z', '0'..'9', '$', '.', '%'
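
The packing of three characters into one word can be illustrated by the following sketch. It is not the library's implementation: it assumes the usual DEC convention that a word holds (c1·40 + c2)·40 + c3 and the character ordering space, A..Z, $, ., %, 0..9, neither of which is stated above. The names RADIX50_Sketch, Table, Code, Pack and Unpack are hypothetical.

```ada
with Ada.Text_IO;

procedure RADIX50_Sketch is
   --  Assumed character table (common PDP-11 ordering); the table
   --  actually used by the library may differ.
   Table : constant String (1..40) :=
      " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789";

   subtype Word is Natural range 0..40**3 - 1;  -- fits into 16 bits

   --  Code of a character is its zero-based position in Table
   function Code (Item : Character) return Natural is
   begin
      for Index in Table'Range loop
         if Table (Index) = Item then
            return Index - Table'First;
         end if;
      end loop;
      raise Constraint_Error with "not a RADIX-50 character";
   end Code;

   --  Packs up to three characters into one word, padding with spaces
   function Pack (Text : String) return Word is
      Padded : String (1..3) := "   ";
   begin
      Padded (1..Text'Length) := Text;
      return (Code (Padded (1)) * 40 + Code (Padded (2))) * 40
           + Code (Padded (3));
   end Pack;

   --  Extracts the three characters of a word
   function Unpack (Value : Word) return String is
   begin
      return
      (  Table (Table'First + Value / 1600),
         Table (Table'First + (Value / 40) mod 40),
         Table (Table'First + Value mod 40)
      );
   end Unpack;
begin
   Ada.Text_IO.Put_Line (Integer'Image (Pack ("ABC")));
   Ada.Text_IO.Put_Line ('"' & Unpack (Pack ("AB")) & '"');
end RADIX50_Sketch;
```

Under these assumptions "ABC" packs into the word 1683, and "AB" is padded with one space before packing, mirroring the padding rule of To_RADIX50.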

8. Fields

The package Strings_Edit.Fields can be used to write new Put-procedures when the output size cannot be easily estimated. It contains two subprograms, Get_Output_Field and Adjust_Output_Field. Get_Output_Field is used to calculate the available space in the output string. It raises the Layout_Error exception as necessary. The program can then output into that space and call Adjust_Output_Field to move the output within the output field, fill the rest of the field and advance the string pointer. The following code fragment shows how this could be done:

procedure Put
          (  Destination : in out String;
             Pointer     : in out Integer;
             Value       : Something;
             Field       : Natural   := 0;
             Justify     : Alignment := Left;
             Fill        : Character := ' '
          )  is
   Out_Field : constant Natural :=
      Get_Output_Field (Destination, Pointer, Field);
   subtype Output is String (Pointer..Pointer + Out_Field - 1);
   Text  : Output renames Destination (Pointer..Pointer + Out_Field - 1);
   Index : Integer := Pointer;
begin
   --
   -- The output for Value is done in Text using Index as the pointer
   --
   Adjust_Output_Field
   (  Destination,
      Pointer,
      Index,
      Out_Field,
      Field,
      Justify,
      Fill
   );
end Put;

9. Generic axis scales

When an axis of a plotted curve needs to be annotated with values, it is desirable that the ticks supplied with values have "good" figures, like 0.5 or 0.1 etc. The generic package Strings_Edit.Generic_Scale can be used to ease the implementation of such plotters:

generic
   type Value is digits <>;
package Strings_Edit.Generic_Scale is ...

Its formal parameter is the type of the axis values. The package provides the type Scale:

type Scale is record
   Minor     : Value'Base;
   Low_Value : Value;
   Low_Tick  : Natural;
   Ticks     : Natural;
   Small     : Integer;
end record;

which describes the axis appearance:

Here the fields of Scale are:

Low_Value is the value of the first tick within the specified range [Low, High];
Low_Tick is the number of the first tick. It is in 0..Ticks. For the figure above Low_Tick is 4, the fourth and the last minor tick;
Minor is the length of the minor tick;
Ticks is the number of minor ticks between two major ones. On the figure above it is 4;
Small is the absolute precision of the major tick. This value can be passed as the AbsSmall parameter of a Put procedure to output a major tick value. Note that when Small > 0 it is better to use 0 instead, otherwise an exponential notation will appear. Advanced plotting would use Small to extract the multiplier. For instance, when Small is 3, i.e. thousands, the values of the ticks can be multiplied by 10³ before output and AbsSmall=0 used instead of 3. The axis label could then contain the 10³ factor.

The function Create evaluates Scale for the given range of values:

function Create (Low, High : Value; Count : Natural) return Scale;

The parameters Low and High determine the interval of values. When Low ≥ High, Constraint_Error is propagated. The parameter Count is the desired number of major ticks on the scale. Typically it is determined as the scale length in the plot units divided by the optimal major tick length in the same units. (Note that the function outcome is independent of whatever plot units are used.) The resulting number of major ticks is greater than or equal to Count. When Count is 0, it is treated as if it were 1. The major tick length is selected to be n·10^k where n=1, 2, 5. Here k determines the field Small of the result. Thus when, for example, n is 2 and k is -3, the values corresponding to the major ticks would be like 0.102, 0.104, 0.106 etc., i.e. the major tick step is 2·10^-3. The number m of minor ticks depends on n. It is m=1 when n=1 or n=2, and m=4 when n=5. The field Ticks of the result is m. The field Minor of the result is n·10^k/m.

A typical axis plot using Scale might look as follows:

   Ticks : Scale   := Create (Low, High, Size / Major_Tick_Size);
   Minor : Natural := Ticks.Low_Tick;
   Value : Number  := Ticks.Low_Value;
begin
   while position of Value in the plot range [in plot units] loop
      if Minor = 0 or else Minor > Ticks.Ticks then -- Major tick
         draw major tick at the position of Value [in plot units]
         draw its value Image (Value, AbsSmall => Ticks.Small);
         Minor := 1;
      else -- Minor tick
         draw minor tick at the position of Value [in plot units]
         Minor := Minor + 1;
      end if;
      Value := Value + Ticks.Minor;
   end loop;

Note that the field Small specifies the absolute precision. Therefore, very narrow ranges of large absolute values would probably require a shift to avoid tick values like 956.611, 956.612, 956.613,... (with Small=-3). Such cases can be detected as log₁₀(Low) >> Small. The difference between these two values indicates how many decimal places would appear before the last one, corresponding to the major tick "heartbeat."
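
How a step of the form n·10^k can be chosen is sketched below. This is only one plausible reading of the rule described for Create, not the code of the package: it takes the largest step with n in 1, 2, 5 that does not exceed (High - Low) / Count, so that at least Count steps fit into the interval. The names Tick_Step_Sketch, Step and Major_Step are hypothetical, and Count is Positive here for simplicity.

```ada
with Ada.Text_IO;
with Ada.Numerics.Long_Elementary_Functions;

procedure Tick_Step_Sketch is
   use Ada.Numerics.Long_Elementary_Functions;

   type Step is record
      N : Positive;  -- 1, 2 or 5
      K : Integer;   -- decimal exponent, the step is N * 10 ** K
   end record;

   --  Largest N * 10 ** K not exceeding (High - Low) / Count
   function Major_Step
            (  Low, High : Long_Float;
               Count     : Positive
            )  return Step is
      Raw  : constant Long_Float := (High - Low) / Long_Float (Count);
      K    : constant Integer :=
                Integer (Long_Float'Floor (Log (Raw, 10.0)));
      Frac : constant Long_Float := Raw / 10.0 ** K;  -- roughly 1..10
   begin
      if Frac >= 5.0 then
         return (5, K);
      elsif Frac >= 2.0 then
         return (2, K);
      else
         return (1, K);
      end if;
   end Major_Step;

   Result : constant Step := Major_Step (0.1, 0.11, 4);
begin
   Ada.Text_IO.Put_Line
   (  "n =" & Integer'Image (Result.N)
   &  ", k =" & Integer'Image (Result.K)
   );
end Tick_Step_Sketch;
```

For the interval 0.1..0.11 and Count = 4 the sketch selects n = 2 and k = -3, i.e. the step 2·10^-3 that yields tick values like 0.102, 0.104, 0.106.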

10. String streams

The package Strings_Edit.Streams provides an implementation of streams to read from and write to strings. The package declares the type String_Stream:

type String_Stream (Length : Natural) is new Root_Stream_Type with record
   Position : Positive := 1;
   Data     : String (1..Length);
end record;

The field Position is the position at which the string is to be read or written. The field Data is the string backing the stream. When written, stream elements are placed into Data starting from Position. When read, they are taken from Data at Position. In both cases Position is advanced. The implementation of Write propagates the End_Error exception when there is no room for output. Note that initially the position is set to 1, which means that the stream is ready to be written, but also that it is filled with garbage. The stream is used as follows:

Input stream scenario: First, the procedure Set is called to set the contents of the stream. Then T'Read and/or T'Input attributes are used to input from the stream. End_Error is propagated when reading is attempted beyond the end of the contents set using Set.
Output stream scenario: The stream is created or else Rewind is called on an existing object. Then T'Write and/or T'Output attributes are used to output into the stream. Once the contents are written, Get is called to extract them from the stream as a string. End_Error is propagated upon writing outside the stream buffer.
Input/output scenario: The stream is written. Then Rewind is called. After that the written data can be read back.

function Get (Stream : String_Stream) return String;

This function returns the written contents of Stream. It is used together with the attributes T'Write and T'Output. First the stream is written, then Get is called to obtain its contents.

function Get_Size (Stream : String_Stream) return Stream_Element_Count;

This function returns the number of stream elements available to write or to read.

procedure Rewind (Stream : in out String_Stream);

This procedure sets Stream Position to 1. This operation undoes read and write operations done before.

procedure Set (Stream : in out String_Stream; Content : String);

This procedure sets Stream to contain Content. The next read operation will yield the first character of Content. Set is the operation inverse to the attributes T'Read and T'Input, with which it should be used. First the buffer contents are set using this procedure. Then the stream is read out. Constraint_Error is propagated when Stream.Length < Content'Length.

Note, this implementation requires that Stream_Element'Size be a multiple of Character'Size and the latter be a multiple of Storage_Element'Size.

10.1. Signed integers stream I/O

The package Strings_Edit.Streams.Generic_Integer provides portable stream I/O for signed integers using chain codes. A chain code has variable length. Lesser absolute values require shorter sequences of stream elements to encode.

generic
   type Number is range <>;
package Strings_Edit.Streams.Generic_Integer is ...

The integer number is encoded as follows. The value is first converted to a sequence of bits. The first bit of the sequence is 0 when the value is positive or 1 when negative. The following bits are the little-endian sequence of the absolute value. The sequence ends with the last non-zero bit. Each seven bits of the sequence are packed into an octet. The most significant bit of the octet is 0 for the last octet and 1 otherwise. The following example illustrates the encoding principle:

7512₁₀ = 1_1101_0101_1000₂
-7512₁₀ -> 1011_0001₂, 0111_0101₂

The most significant bit of each octet indicates whether the octet ends the sequence. The package provides the following subprograms:

procedure Get ( Data : Stream_Element_Array; Pointer : in out Stream_Element_Offset; Value : out Number );

This procedure gets a value from Data starting at Data (Pointer). Pointer is advanced beyond the input value. Layout_Error is propagated when Pointer is out of the range Data'First..Data'Last+1. End_Error is propagated when there is not enough data. Data_Error is propagated when the encoded value is too large.

function Input ( Stream : access Root_Stream_Type'Class ) return Number;

This function can be used as an implementation of the Number'Input stream attribute.

procedure Output ( Stream : access Root_Stream_Type'Class; Value : Number );

This procedure can be used as an implementation of the Number'Output stream attribute.

procedure Put ( Data : in out Stream_Element_Array; Pointer : in out Stream_Element_Offset; Value : Number );

This procedure puts a value into Data starting at Data (Pointer). Pointer is advanced beyond the output value. Layout_Error is propagated when Pointer is out of the range Data'First..Data'Last+1. End_Error is propagated when there is no room for output.

The following instances of the package are provided:

Package	Number
Strings_Edit.Streams.Integers	Integer
Strings_Edit.Streams.Integers_32	Interfaces.Integer_32
Strings_Edit.Streams.Integers_64	Interfaces.Integer_64
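
As a minimal sketch of how these subprograms fit together, the fragment below stores a value into a stream element array with Put and reads it back with Get, using the instance Strings_Edit.Streams.Integers listed above. The procedure name and the buffer size are illustrative assumptions, not part of the library:

```ada
with Ada.Streams;                   use Ada.Streams;
with Ada.Text_IO;
with Strings_Edit.Streams.Integers;

procedure Chain_Code_Round_Trip is -- Hypothetical name
   use Strings_Edit.Streams.Integers;
   Buffer  : Stream_Element_Array (1..16) := (others => 0);
   Pointer : Stream_Element_Offset := Buffer'First;
   Result  : Integer;
begin
   Put (Buffer, Pointer, -7512); -- Pointer now follows the encoded value
   Ada.Text_IO.Put_Line
   (  "Octets used:"
   &  Stream_Element_Offset'Image (Pointer - Buffer'First)
   );
   Pointer := Buffer'First;      -- Rewind in order to decode
   Get (Buffer, Pointer, Result);
   Ada.Text_IO.Put_Line ("Decoded:" & Integer'Image (Result));
end Chain_Code_Round_Trip;
```

Because Pointer is advanced by both operations, several values can be packed into the same array one after another and decoded in the same order.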

10.2. Unsigned integers stream I/O

The package Strings_Edit.Streams.Generic_Unsigned provides a portable stream I/O for non-negative integers using chain codes. A chain code is variable length. Lesser absolute values require shorter sequences of stream elements to encode:

generic
   type Number is range <>;
package Strings_Edit.Streams.Generic_Unsigned is ...

The unsigned integer number is encoded as follows. The value is first converted to a little-endian sequence of bits. The sequence ends with the last non-zero bit. Each seven bits of the sequence are packed into an octet. The most significant bit of the octet is 0 for the last octet and 1 otherwise. The following example illustrates the encoding principle:

7512₁₀ = 1_1101_0101_1000₂
7512₁₀ -> 1101_1000₂, 0011_1010₂
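
The encoding rule can be sketched in plain Ada without the library. The helper Encode below is hypothetical and only mirrors the description given here (seven payload bits per octet, least significant group first, most significant bit set on every octet but the last). It assumes that Stream_Element is 8 bits wide:

```ada
with Ada.Streams; use Ada.Streams;
with Ada.Text_IO;

procedure Unsigned_Chain_Code_Sketch is
   --  Hypothetical helper, not part of Strings_Edit
   procedure Encode
             (  Value : Natural;
                Data  : out Stream_Element_Array;
                Last  : out Stream_Element_Offset
             )  is
      Rest : Natural := Value;
   begin
      Last := Data'First;
      loop
         Data (Last) := Stream_Element (Rest mod 128); -- Low seven bits
         Rest := Rest / 128;
         exit when Rest = 0;                -- This was the last octet
         Data (Last) := Data (Last) + 128;  -- Set the continuation bit
         Last := Last + 1;
      end loop;
   end Encode;

   Buffer : Stream_Element_Array (1..8);
   Last   : Stream_Element_Offset;
begin
   Encode (300, Buffer, Last);
   for Index in Buffer'First..Last loop
      Ada.Text_IO.Put (Stream_Element'Image (Buffer (Index)));
   end loop;
   Ada.Text_IO.New_Line; -- Prints 172 and 2
end Unsigned_Chain_Code_Sketch;
```

Here 300 = 2 * 128 + 44, so the first octet is 44 + 128 = 172 (continuation bit set) and the second octet is 2.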

The package provides the following subprograms:

procedure Get ( Data : Stream_Element_Array; Pointer : in out Stream_Element_Offset; Value : out Number );

function Input ( Stream : access Root_Stream_Type'Class ) return Number;

This function can be used as an implementation of the Number'Input stream attribute.

procedure Output ( Stream : access Root_Stream_Type'Class; Value : Number );

This procedure can be used as an implementation of the Number'Output stream attribute. Constraint_Error is propagated when Value is negative.

procedure Put ( Data : in out Stream_Element_Array; Pointer : in out Stream_Element_Offset; Value : Number );

This procedure puts a value into Data starting at Data (Pointer). Pointer is advanced beyond the output value. Constraint_Error is propagated when Value is negative. Layout_Error is propagated when Pointer is out of the range Data'First..Data'Last+1. End_Error is propagated when there is no room for output.

The following instances of the package are provided:

Package	Number
Strings_Edit.Streams.Naturals	Natural

10.3. Modular number stream I/O

The package Strings_Edit.Streams.Generic_Modular provides a portable stream I/O for modular numbers using chain codes. The format is the same as described for Strings_Edit.Streams.Generic_Unsigned. The package specification is:

generic
   type Number is mod <>;
package Strings_Edit.Streams.Generic_Modular is ...

The package provides the following subprograms:

procedure Get ( Data : Stream_Element_Array; Pointer : in out Stream_Element_Offset; Value : out Number );

function Input ( Stream : access Root_Stream_Type'Class ) return Number;

This function can be used as an implementation of the Number'Input stream attribute.

procedure Output ( Stream : access Root_Stream_Type'Class; Value : Number );

This procedure can be used as an implementation of the Number'Output stream attribute.

procedure Put ( Data : in out Stream_Element_Array; Pointer : in out Stream_Element_Offset; Value : Number );

The following instances of the package are provided:

Package	Number
Strings_Edit.Streams.Unsigneds_32	Interfaces.Unsigned_32
Strings_Edit.Streams.Unsigneds_64	Interfaces.Unsigned_64

10.4. Recoding UTF-8 streams

The package Strings_Edit.UTF8.Recoding_Streams provides a stream that can be used to recode the original stream into and from UTF-8:

type Encoding_Type is
     (  ISO_8859_1,   ISO_8859_2,   ISO_8859_3,   ISO_8859_4,   ISO_8859_5,
        ISO_8859_6,   ISO_8859_7,   ISO_8859_8,   ISO_8859_9,   ISO_8859_10,
        Windows_1250, Windows_1251, Windows_1252, Windows_1253, Windows_1254,
        Windows_1255, Windows_1256, Windows_1257, Windows_1258,
        KOI8,         MacOS_Roman
     );

This enumeration data type specifies the encoding of the original stream.

type Recoding_Stream
     (  Encoded          : access Root_Stream_Type'Class;
        Method           : Encoding_Type;
        Decoding_Default : Code_Point;
        Encoding_Default : Character
     )  is new Root_Stream_Type with private;

Reading. Reading from the stream causes reading from the encoded stream specified by the discriminant Encoded. The obtained octets are decoded according to the encoding specified by the discriminant Method and then recoded into a UTF-8 stream. The result is delivered to the reader of the stream.

Writing. Written stream elements are considered UTF-8 octets. The corresponding code points are recoded according to the encoding Method used by the stream Encoded and then are written into it. Data_Error is propagated if the written UTF-8 sequence is invalid.

The discriminant Decoding_Default specifies the code point to be used when an element in the Encoded stream is illegal according to Method. The discriminant Encoding_Default is the character used to represent code points which have no correspondence in the encoding Method.

11. Lexicographical comparisons

The package Strings_Edit.Lexicographical_Order provides comparisons of strings using lexicographical order. The package provides the following types:

type Precedence is (Less, Equal, Greater);

and the following operations:

function Compare_Textually (Left, Right : String) return Precedence;

This function compares two strings as texts. If the strings contain chains of digits, each chain is logically replaced by a single symbol considered lexicographically greater than any non-numeric character. Thus the strings ab123 and ab44 are considered the same, and the string abc precedes ab1. On the basis of this function Boolean-valued comparisons are defined:

function Textually_Equal (Left, Right : String) return Boolean;
function Textually_Less (Left, Right : String) return Boolean;

Another operation:

function Compare_Lexicographically (Left, Right : String) return Precedence;

This function compares two strings lexicographically. Chains of digits are compared numerically, as decimal numbers. Thus the string ab44 precedes ab0123. On the basis of this function Boolean-valued comparisons are defined:

function Lexicographically_Equal (Left, Right : String) return Boolean;
function Lexicographically_Less (Left, Right : String) return Boolean;
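
The difference between the two orders can be tried with a short program. This is only a sketch using the declarations listed in this section; the procedure name is an assumption. According to the descriptions above, the three Boolean lines are expected to print TRUE and the last line LESS:

```ada
with Ada.Text_IO;                        use Ada.Text_IO;
with Strings_Edit.Lexicographical_Order; use Strings_Edit.Lexicographical_Order;

procedure Compare_Demo is -- Hypothetical name
begin
   --  Digit chains collapse into one symbol, so the strings are the same
   Put_Line (Boolean'Image (Textually_Equal ("ab123", "ab44")));
   --  A digit chain is greater than any non-numeric character
   Put_Line (Boolean'Image (Textually_Less ("abc", "ab1")));
   --  Digit chains are compared as decimal numbers: 44 < 123
   Put_Line (Boolean'Image (Lexicographically_Less ("ab44", "ab0123")));
   Put_Line
   (  Precedence'Image (Compare_Lexicographically ("ab44", "ab0123"))
   );
end Compare_Demo;
```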

12. Standard encodings

This section describes packages implementing standard encodings.

12.1. Base64 encoding

The package Strings_Edit.Base64 provides an implementation of RFC 4648 Base64 encoding. The package declares the following operations:

function From_Base64 (Text : String) return String;

This function returns the decoded string corresponding to Text encoded in Base64. The text can be padded using = and ==. Data_Error is propagated on decoding errors.

function To_Base64 (Text : String) return String;

This function encodes Text in Base64.
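
A minimal round-trip sketch using only the two functions above (the procedure name is an assumption). The second line is expected to print TRUE, since decoding the encoded text should give back the original:

```ada
with Ada.Text_IO;         use Ada.Text_IO;
with Strings_Edit.Base64; use Strings_Edit.Base64;

procedure Base64_Round_Trip is -- Hypothetical name
   Text    : constant String := "Hello, world!";
   Encoded : constant String := To_Base64 (Text);
begin
   Put_Line (Encoded);
   Put_Line (Boolean'Image (From_Base64 (Encoded) = Text));
end Base64_Round_Trip;
```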

12.1.1. Encoding stream

type Base64_Encoder ( Size : Stream_Element_Count ) is new Root_Stream_Type with private;

The type provides an encoding stream. When written the input is encoded so that the content read from the stream is Base64 encoded. The discriminant Size determines the maximum number of stored encoded octets - 1. Upon writing Status_Error is propagated when the stream object has no room to store encoded characters. The encoding state remains intact and the stream can be read from in order to free space. After that the writing operation can be repeated. Use_Error is propagated when the stream object would have no room even if it were empty.

procedure Flush (Stream : in out Base64_Encoder);

This procedure is called at the end of encoding after the last stream element has been written. Status_Error is propagated when the stream presently has no space available. The operation can be repeated after reading from the stream. Use_Error is propagated when the stream object would have no room even if it were empty.

function Free (Stream : Base64_Encoder) return Stream_Element_Count;

This function returns the number of stream elements that can be safely written into the stream. Note that written elements are kept in encoded form and thus use more space than the original elements.

function Is_Empty (Stream : Base64_Encoder) return Boolean;

This function returns true if the stream is empty.

function Is_Full (Stream : Base64_Encoder) return Boolean;

This function returns true if the stream is full, i.e. there is no room to write a single stream element.

procedure Reset (Stream : in out Base64_Encoder);

This procedure is called to reset the stream into empty state.

function Used (Stream : Base64_Encoder) return Stream_Element_Count;

This function returns the number of ready stream elements that can be read from it.
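
The following sketch shows one possible way to drive the encoder: the text is written in portions, Flush completes the encoding, and the encoded octets are then read back. The buffer size 256 and the procedure name are arbitrary assumptions; error handling (Status_Error, Use_Error) is omitted:

```ada
with Ada.Streams;         use Ada.Streams;
with Ada.Text_IO;
with Strings_Edit.Base64; use Strings_Edit.Base64;

procedure Base64_Stream_Demo is -- Hypothetical name
   Encoder : aliased Base64_Encoder (256); -- Size chosen arbitrarily
begin
   String'Write (Encoder'Access, "Hello, ");
   String'Write (Encoder'Access, "world!");
   Flush (Encoder); -- No more input, complete the encoding
   declare
      Data : Stream_Element_Array (1..Used (Encoder));
   begin
      Stream_Element_Array'Read (Encoder'Access, Data);
      for Index in Data'Range loop
         Ada.Text_IO.Put (Character'Val (Data (Index)));
      end loop;
      Ada.Text_IO.New_Line;
   end;
end Base64_Stream_Demo;
```

For inputs that do not fit into the buffer, the writer would alternate between writing and reading, using Free and Used to decide how much can be transferred at each step.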

12.1.2. Decoding stream

type Base64_Decoder ( Size : Stream_Element_Count ) is new Root_Stream_Type with private;

The type provides a decoding stream. When written the input is decoded from Base64 so that the content read from the stream is decoded. The discriminant Size determines the maximum number of stored decoded octets - 1. Upon writing Status_Error is propagated when the stream object has no room to store decoded stream elements. The decoding state remains intact and the stream can be read from in order to free space. After that the writing operation can be repeated. Use_Error is propagated when the stream object would have no room even if it were empty.

function Free (Stream : Base64_Decoder) return Stream_Element_Count;

This function returns the number of stream elements that can be safely written into the stream. Note that written elements are kept in decoded form and thus use less space than the original elements.

function Is_Empty (Stream : Base64_Decoder) return Boolean;

This function returns true if the stream is empty.

function Is_Full (Stream : Base64_Decoder) return Boolean;

This function returns true if the stream is full, i.e. there is no room to write a single stream element.

procedure Reset (Stream : in out Base64_Decoder);

This procedure is called to reset the stream into empty state.

function Used (Stream : Base64_Decoder) return Stream_Element_Count;

This function returns the number of ready stream elements that can be read from it.

12.2. Object identifiers

The package Strings_Edit.Object_Identifiers provides an implementation of RFC 3061 object identifiers (OID). The package defines the object identifier type:

type Subindentifier_Type is new Natural;
type Object_Identifier is array (Positive range <>) of Subindentifier_Type;

The package provides the following subprograms:

function "<" (Left, Right : Object_Identifier) return Boolean;
function "<=" (Left, Right : Object_Identifier) return Boolean;
function ">" (Left, Right : Object_Identifier) return Boolean;
function ">=" (Left, Right : Object_Identifier) return Boolean;

The object identifiers are ordered component by component. E.g. 1.231.4 precedes 2.1.

function Compare (Left, Right : Object_Identifier) return Precedence;

This function returns comparison result.

procedure Get ( Source : String; Pointer : in out Integer; Value : in out Object_Identifier; Last : out Integer );

This procedure gets a value of object identifier from Source starting at Source (Pointer). Pointer is advanced beyond the input value. After successful completion Last points to the last component of the identifier stored in Value.

*Exceptions*
Constraint_Error	The object identifier is too large to store in Value
Data_Error	Syntax error, there is no number following a dot
End_Error	No object identifier found
Layout_Error	Pointer is not in Source'First..Source'Last + 1

function Get ( Source : String; Pointer : access Integer ) return Object_Identifier;

This function is a variant of procedure Get that returns object identifier as the result. Pointer is advanced beyond the input value.

*Exceptions*
Data_Error	Syntax error, there is no number following a dot
End_Error	No object identifier found
Layout_Error	Pointer is not in Source'First..Source'Last + 1

function Image (Value : Object_Identifier) return String;

This function converts the parameter Value to String.

procedure Put ( Destination : in out String; Pointer : in out Integer; Value : Object_Identifier; Field : Natural := 0; Justify : Alignment := Left; Fill : Character := ' ' );

This procedure places the object identifier specified by the parameter Value into the output string Destination. The string is written starting from Destination (Pointer). The exception Layout_Error is propagated when Pointer is not in Destination'Range or there is no room for the output.

function Value (Source : String) return Object_Identifier;

This function gets an object identifier number from the string Source. It can be surrounded by spaces and tabs. The whole string Source should be matched. Otherwise the exception Data_Error is propagated. Also Data_Error indicates a syntax error in the number.

*Exceptions*
Data_Error	Not all string matched
End_Error	No object identifier found
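
A short sketch combining Value, Image and the ordering operations (the procedure name is an assumption). Following the ordering rule stated above, the comparison is expected to yield TRUE:

```ada
with Ada.Text_IO;                     use Ada.Text_IO;
with Strings_Edit.Object_Identifiers; use Strings_Edit.Object_Identifiers;

procedure OID_Demo is -- Hypothetical name
   Left  : constant Object_Identifier := Value ("1.231.4");
   Right : constant Object_Identifier := (2, 1); -- Same as Value ("2.1")
begin
   Put_Line (Image (Left));
   Put_Line (Image (Right));
   Put_Line (Boolean'Image (Left < Right));
end OID_Demo;
```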

12.3. Distinguished names

The package Strings_Edit.Distinguished_Names provides an implementation of RFC 4514 distinguished names (DN). A distinguished name is a sequence of components. Each component is a set of attributes. Each attribute is a key-value pair. A key can be either a text or an object identifier (OID). The package defines the distinguished name type:

type Distinguished_Name (<>) is private;

The following additional types are defined:

type Attribute_Mode is (OID_Keyed, Text_Keyed);

The attribute's key can be either an object identifier or a name starting with a letter and containing letters, digits and minus (-):

type Attribute_Key
     (  Mode   : Attribute_Mode;
        Length : Natural
     )  is
record
   case Mode is
      when OID_Keyed =>
         Identifier : Object_Identifier (1..Length);
      when Text_Keyed =>
         Text : String (1..Length);
   end case;
end record;

The keys are compared case-insensitively:

function "=" (Left, Right : Attribute_Key) return Boolean;
function "<" (Left, Right : Attribute_Key) return Boolean;
function "<=" (Left, Right : Attribute_Key) return Boolean;
function ">" (Left, Right : Attribute_Key) return Boolean;
function ">=" (Left, Right : Attribute_Key) return Boolean;

Equivalence of textual names and object identifiers is not considered. E.g. the name cn and 2.5.4.3 are considered different. For correspondence of names and object identifiers refer to the OID repository.

type Name_Attribute
     (  Mode         : Attribute_Mode;
        Key_Length   : Natural;
        Value_Length : Natural
     )  is
record
   Key   : Attribute_Key (Mode, Key_Length);
   Value : String (1..Value_Length);
end record;

The pair key - value is represented by the type Name_Attribute.

function Find_Attribute ( Name : Distinguished_Name; Component : Positive; Attribute : Attribute_Key ) return Integer;

This function finds an attribute by its key. The parameter Name is the distinguished name object. Component is the component number 1..Get_Length. Attribute is the key of the attribute to search for within the specified component of the name. When Component is outside the range Constraint_Error is propagated. The result is positive when the attribute is found. It can be accessed using Get_Attribute.

function Get ( Source : String; Pointer : access Integer ) return Distinguished_Name;

This function gets a distinguished name from Source starting at Source (Pointer). Pointer is advanced beyond the input value. The sequence of components is delimited by commas (,). Attributes in a component are separated by plus (+). The key-value pairs are bound using equality (=). Comma, equality and plus can be surrounded by spaces. The key and value formats are:

A key can contain letters, digits, (-) and must start with a letter;
The value in the textual form is UTF-8 encoded. The characters minus (-), plus (+), equality (=), comma (,), semicolon (;), quotation marks ("), less (<), greater (>), sharp (#), backslash (\) must be escaped, e.g. using backslash (\). Space is allowed within the value but must be escaped if it appears at either end. Any octet can be escaped using the form with a hexadecimal pair \hh;
The value in the hexadecimal form begins with sharp (#) and contains two hexadecimal digits per octet.

Here is a list of distinguished name examples:

CN=#41636566
CN=Lu\C4\8Di\C4\87
OU=Sales+CN=J. Smith,DC=example,DC=net
CN=James \"Jim\" Smith\, III,DC=example,DC=net
UID=jsmith,DC=example,DC=net
1.2.234=jsmith,4.56.9=example,1.1.1.2.777.0=net

*Exceptions*
Data_Error	Syntax error
End_Error	No distinguished name found
Layout_Error	Pointer is not in Source'First..Source'Last + 1

function Get_Attribute ( Name : Distinguished_Name; Component : Positive; Attribute : Positive := 1 ) return Name_Attribute;

This function returns an attribute. The parameter Name is the distinguished name object. Component is the component number 1..Get_Length. Attribute is the attribute position within the component 1..Get_Component_Length. Attributes are positioned in ascending order. Constraint_Error is propagated when either Component or Attribute is out of range.

function Get_Component_Length ( Name : Distinguished_Name; Component : Positive ) return Positive;

This function returns the number of attributes in the component Component of the distinguished name object specified by the parameter Name.

function Get_Key ( Name : Distinguished_Name; Component : Positive; Attribute : Positive := 1 ) return Attribute_Key;

This function returns an attribute's key. The parameter Name is the distinguished name object. Component is the component number 1..Get_Length. Attribute is the attribute position within the component 1..Get_Component_Length. Attributes are positioned in ascending order. Constraint_Error is propagated when either Component or Attribute is out of range.

function Get_Length (Name : Distinguished_Name) return Natural;

This function returns the number of components of the distinguished name object specified by the parameter Name.

function Get_Value ( Name : Distinguished_Name; Component : Positive; Attribute : Positive := 1 ) return String;

This function returns an attribute's value. The parameter Name is the distinguished name object. Component is the component number 1..Get_Length. Attribute is the attribute position within the component 1..Get_Component_Length. Attributes are positioned in ascending order. Constraint_Error is propagated when either Component or Attribute is out of range.

function Image (Name : Distinguished_Name) return String;

procedure Put ( Destination : in out String; Pointer : in out Integer; Name : Distinguished_Name; Field : Natural := 0; Justify : Alignment := Left; Fill : Character := ' ' );

This procedure places the distinguished name specified by the parameter Name into the output string Destination. The string is written starting from the Destination (Pointer). The exception Layout_Error is propagated if the value of Pointer is not in Destination'Range or there is no room for the output.

procedure Skip ( Destination : in out String; Pointer : in out Integer );

This procedure is similar to Get except that it does not return the distinguished name and only skips it.

*Exceptions*
Data_Error	Syntax error
End_Error	No distinguished name found
Layout_Error	Pointer is not in Source'First..Source'Last + 1

function Subname ( Name : Distinguished_Name; From : Positive; To : Positive ) return Distinguished_Name;

This function returns relative distinguished name starting from the component From and ending with To. Constraint_Error is propagated when the range is empty or else is not in 1..Get_Length.

function Value (Source : String) return Distinguished_Name;

This function gets a distinguished name from the string Source. It can be surrounded by spaces and tabs. The whole string Source should be matched. Otherwise the exception Data_Error is propagated. Also Data_Error indicates a syntax error in the name.

*Exceptions*
Data_Error	Not all string matched
End_Error	No distinguished name found

Distinguished names are ordered in the order of components and then by attributes in the component, first by the key, second by the value:

function "=" (Left, Right : Distinguished_Name) return Boolean;
function "<" (Left, Right : Distinguished_Name) return Boolean;
function "<=" (Left, Right : Distinguished_Name) return Boolean;
function ">" (Left, Right : Distinguished_Name) return Boolean;
function ">=" (Left, Right : Distinguished_Name) return Boolean;

Equivalence of textual names and object identifiers is not considered. E.g. the name cn and 2.5.4.3 are seen as different.
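
As an illustration of the access functions, the sketch below parses one of the example names and walks over its components and attributes. The procedure name and the output layout are assumptions; only subprograms declared in this section are used:

```ada
with Ada.Text_IO;                      use Ada.Text_IO;
with Strings_Edit.Distinguished_Names; use Strings_Edit.Distinguished_Names;

procedure DN_Demo is -- Hypothetical name
   Name : constant Distinguished_Name :=
             Value ("OU=Sales+CN=J. Smith,DC=example,DC=net");
begin
   for Component in 1..Get_Length (Name) loop
      for Attribute in 1..Get_Component_Length (Name, Component) loop
         Put_Line
         (  "Component"
         &  Integer'Image (Component)
         &  ", attribute"
         &  Integer'Image (Attribute)
         &  ": "
         &  Get_Value (Name, Component, Attribute)
         );
      end loop;
   end loop;
   Put_Line (Image (Name)); -- The name converted back to text
end DN_Demo;
```

Note that attributes within a component are reported in ascending order of their keys, which need not be the order in which they appear in the source text.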

12.3.1. Construction of distinguished names

The following operations are used to construct a distinguished name:

function "=" ( Key : Object_Identifier / String; Value : String ) return Distinguished_Name;
function "=" ( Key : Object_Identifier / String; Value : String ) return Name_Attribute;

These functions create an attribute or a distinguished name of single component from a key and a value.

function "and" ( Left : Distinguished_Name; Right : Name_Attribute ) return Distinguished_Name;

This function adds a new component to the distinguished name.

function "or" ( Left : Distinguished_Name; Right : Name_Attribute ) return Distinguished_Name;

This function adds an attribute to the last component of the distinguished name.

*Exceptions*
Constraint_Error	The name is empty
Name_Error	The attribute's key is already in use

function "&" ( Left : Distinguished_Name; Right : Distinguished_Name ) return Distinguished_Name;

This function concatenates two distinguished names.

The following example illustrates creation of a name:

("OU"="Sales" or "CN"="J. Smith") and "DC"="example" and "DC"="net"

12.4. ISO 8601 time and duration

The package Strings_Edit.ISO_8601 provides ISO 8601 representations of time and duration:

ISO 8601 time

procedure Get ( Source : String; Pointer : in out Integer; Value : out Time );

This procedure gets time in ISO 8601 format from Source starting at Source (Pointer). Pointer is advanced beyond the input value. The result is stored in Value. The following ISO 8601 formats are supported:

Standard time notation YYYY-MM-DDThh:mm:ss, e.g. 2007-04-06T00:00;
Week number and day of week notation YYYY-Www-DThh:mm:ss, e.g. 2009-W01-1;
Day of year notation YYYY-DDDThh:mm:ss, e.g. 2019-010T10:30Z.

Fractions of second as well as time zone ±hh:mm are supported. Decimal point can be denoted as comma. Shortened forms like 20070406T0000 are recognized. The special case of time specification 24:00 is legal.

*Exceptions*
Constraint_Error	Some of the components are too large to be supported by Ada types
Data_Error	Syntax error
End_Error	No time found in the source
Layout_Error	Pointer is not in Source'First..Source'Last + 1
Time_Error	Time conversion error
Unknown_Zone_Error	Time zone error

function Image ( Value : Time; Fraction : Second_Fraction := 0; Local : Boolean := False ) return String;

This function returns Value in ISO 8601 format. Fraction 0..6 specifies how many digits after the decimal point to use for seconds. When Local is true, the local time format is used. Otherwise it is the UTC format.

*Exceptions*
Time_Error	Time conversion error

procedure Put ( Destination : in out String; Pointer : in out Integer; Value : Time; Fraction : Second_Fraction := 0; Local : Boolean := False; Field : Natural := 0; Justify : Alignment := Left; Fill : Character := ' ' );

This procedure places time Value into the output string Destination in ISO 8601 format. The string is written starting from Destination (Pointer). The exception Layout_Error is propagated if the value of Pointer is not in Destination'Range or there is no room for the output. Fraction 0..6 specifies how many digits after the decimal point to use for seconds. When Local is true, the local time format is used. Otherwise it is the UTC format.

*Exceptions*
Layout_Error	Pointer is not in Destination'Range or there is no room for output
Time_Error	Time conversion error

function Value (Source : String) return Time;

This function gets ISO 8601 time from the string Source. It can be surrounded by spaces and tabs. The whole string Source should be matched. Otherwise the exception Data_Error is propagated. Also Data_Error indicates a syntax error in the time.

*Exceptions*
Constraint_Error	Some of the components are too large to be supported by Ada types
Data_Error	Syntax error or not all string is matched
End_Error	No time found in the source
Time_Error	Time conversion error
Unknown_Zone_Error	Time zone error
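
A small sketch of parsing and formatting a time stamp (the procedure name is an assumption, and Time is assumed to be Ada.Calendar.Time). The exact output depends on the local time zone, so none is shown:

```ada
with Ada.Calendar;          use Ada.Calendar;
with Ada.Text_IO;           use Ada.Text_IO;
with Strings_Edit.ISO_8601; use Strings_Edit.ISO_8601;

procedure ISO_8601_Demo is -- Hypothetical name
   --  Day of year notation, UTC
   Stamp : constant Time := Value ("2019-010T10:30Z");
begin
   Put_Line (Image (Stamp));                               -- UTC format
   Put_Line (Image (Stamp, Fraction => 3, Local => True)); -- Local format
end ISO_8601_Demo;
```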

ISO 8601 duration

procedure Get ( Source : String; Pointer : in out Integer; Value : out Duration );

This procedure gets duration in ISO 8601 format from Source starting at Source (Pointer). Pointer is advanced beyond the input value. The result is stored in Value. The ISO 8601 duration format is PyearsYmonthsMweeksWdaysDThoursHminutesMsecondsS. Any field can be absent. For example, P1W denotes a duration of one week. Fractional values are allowed. Furthermore, the decimal point can be denoted as a comma. Note that T must appear before the first time-of-day field (hours, minutes or seconds). For instance, PT1.5S denotes a duration of one and a half seconds.

*Exceptions*
Constraint_Error	Some of the components are too large to be supported by Ada types
Data_Error	Syntax error
End_Error	No duration found in the source
Layout_Error	Pointer is not in Source'First..Source'Last + 1

function Image ( Value : Duration; Fraction : Second_Fraction := 0 ) return String;

This function returns Value in ISO 8601 format. Fraction 0..6 specifies how many digits after the decimal point to use for seconds.

procedure Put ( Destination : in out String; Pointer : in out Integer; Value : Duration; Fraction : Second_Fraction := 0; Field : Natural := 0; Justify : Alignment := Left; Fill : Character := ' ' );

*Exceptions*
Layout_Error	Pointer is not in Destination'Range or there is no room for output

13. ChaCha20 cipher

The package Strings_Edit.ChaCha20 provides an implementation of the symmetric stream cipher developed by D. J. Bernstein as described in RFC 8439. The following data types and constants are declared in the package:

ChaCha20_Block_Size : constant := 16 * 4;
subtype ChaCha20_Key   is Stream_Element_Array (1..32);
subtype ChaCha20_Nonce is Stream_Element_Array (1..12);

The cipher key is 256 bits long, the cipher nonce is 96 bits long.

type ChaCha20_Cipher is new Ada.Finalization.Controlled with private;

An instance of this type represents a ChaCha20 cipher with its internal state. The following primitive operations are defined:

function Decrypt | Encrypt ( Cipher : access ChaCha20_Cipher; Input : Stream_Element_Array | String ) return Stream_Element_Array | String;

These functions encrypt or decrypt Input. The parameter and result are either both Stream_Element_Array or both String. Since ChaCha20 is a stream cipher, Encrypt and Decrypt are identical. Note that the cipher object maintains its state, so it is safe to call these functions consecutively with input arrays of any size.

procedure Decrypt | Encrypt ( Cipher : in out ChaCha20_Cipher; Input : Stream_Element_Array | String; Output : out Stream_Element_Array | String );

These procedures encrypt or decrypt Input into Output. These parameters are any combination of Stream_Element_Array and String. Input and Output must have the same length, Constraint_Error is propagated otherwise. Since ChaCha20 is a stream cipher, Encrypt and Decrypt are identical.

procedure Decrypt | Encrypt ( Cipher : in out ChaCha20_Cipher; Data : in out Stream_Element_Array | String );

These procedures are in-place variants for encryption or decryption.

function Get_Count (Cipher : ChaCha20_Cipher) return Unsigned_32;

This function returns the current cipher block count. When the count nears 2**32 the key and nonce should be changed.

function Get_Key_Stream ( Cipher : access ChaCha20_Cipher; Full : Boolean := False ) return Stream_Element_Array;

This function returns a portion of the key stream. The elements of the key stream are xor-ed with the input in order to encrypt or decrypt it. The result length is in the range 1..64 elements depending on the cipher state. When Full is true, a new block is generated and the result is 64 elements long.

procedure Set_Key ( Cipher : in out ChaCha20_Cipher; Key : ChaCha20_Key; Nonce : ChaCha20_Nonce := (others => 0); Count : Unsigned_32 := 0 );

This procedure sets the key and nonce into the cipher. Count is the block count. The block count is incremented for each 512-bit block (ChaCha20_Block_Size octets) of encrypted or decrypted data.
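
The sketch below encrypts a string and decrypts it again. The key and nonce values are placeholders for demonstration only and the procedure name is an assumption. Because the cipher object keeps its state, the key and nonce are set again before decrypting so that the key stream restarts from the same position:

```ada
with Ada.Streams;           use Ada.Streams;
with Ada.Text_IO;
with Strings_Edit.ChaCha20; use Strings_Edit.ChaCha20;

procedure ChaCha20_Demo is -- Hypothetical name
   Key    : constant ChaCha20_Key   := (others => 16#42#); -- Demo only
   Nonce  : constant ChaCha20_Nonce := (others => 1);      -- Demo only
   Cipher : ChaCha20_Cipher;
   Plain  : constant String := "Attack at dawn";
   Secret : String (Plain'Range);
   Back   : String (Plain'Range);
begin
   Set_Key (Cipher, Key, Nonce);
   Encrypt (Cipher, Plain, Secret);
   Set_Key (Cipher, Key, Nonce);   -- Restart the key stream
   Decrypt (Cipher, Secret, Back);
   Ada.Text_IO.Put_Line (Boolean'Image (Back = Plain));
end ChaCha20_Demo;
```

In a real application the nonce must never be reused with the same key for different messages.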

13.1. ChaCha20 ciphering streams

The package Strings_Edit.ChaCha20 also provides a ciphering stream:

type ChaCha20_Stream ( Transport : access Root_Stream_Type'Class; Size : Stream_Element_Count ) is new Root_Stream_Type with private;

When the stream is read, it reads from the stream Transport and then decrypts the input using the ChaCha20 cipher. The result becomes the stream's input. When the stream is written, the output is encrypted by the ChaCha20 cipher and the result is written into the stream Transport. The discriminant Size determines the output block size: output larger than Size is encrypted in chunks of Size.

function Get_Count (Cipher : ChaCha20_Stream) return Unsigned_32;

This function returns the current cipher block count. When the count nears 2**32 the key and nonce should be changed.

function Get_Key_Stream ( Cipher : access ChaCha20_Stream; Full : Boolean := False ) return Stream_Element_Array;

procedure Set_Key ( Stream : in out ChaCha20_Stream; Key : ChaCha20_Key; Nonce : ChaCha20_Nonce := (others => 0); Count : Unsigned_32 := 0 );

This procedure sets the key and nonce into the stream's cipher. Count is the block count. The block count is incremented for each 512-bit block (ChaCha20_Block_Size octets) of encrypted or decrypted data.

13.2. Poly1305 digests

The package Strings_Edit.ChaCha20.Poly1305 provides an implementation of the Poly1305 digest as described in RFC 8439. The implementation does not use any big-number library and, following the recommendation, tries to use constant-time arithmetic operations where possible. The package declares the digest subtype as an array of 16 elements:

subtype ChaCha20_Tag is Stream_Element_Array (1..16);

The digest is calculated using the following functions:

function Digest ( Message : Stream_Element_Array | String; Key : ChaCha20_Key; [ Nonce : ChaCha20_Nonce ] ) return ChaCha20_Tag;

These functions calculate the digest of the parameter Message which can be a stream element array or a string using the key from the parameter Key. When both Key and Nonce are specified a one-time key is generated for the digest as described in the section 2.6 of RFC 8439. It is equivalent to the following code fragment:

   Cipher       : aliased ChaCha20_Cipher;
   One_Time_Key : ChaCha20_Key;
begin
   Set_Key (Cipher, Key, Nonce);
   declare
      Key : constant Stream_Element_Array :=
               Get_Key_Stream (Cipher'Access, True);
   begin -- First 32 elements of the key stream is the key
      One_Time_Key := Key (Key'First..Key'First + 31);
   end;

The digest can be calculated using a stream object:

type Poly1305_Stream is new Root_Stream_Type with private;

by writing portions of the message into it. The process starts when the following procedure is called:

procedure Start ( Stream : in out Poly1305_Stream; Key : ChaCha20_Key; [ Nonce : ChaCha20_Nonce ] );

This procedure sets the key and initiates evaluation of the digest. The variant with both Key and Nonce generates a one-time key as described in the section 2.6 of RFC 8439. The portions of the message are then written into the stream using stream operations or stream attributes. When the whole message has been written, the following procedure is called to get the digest:

procedure Stop ( Stream : in out Poly1305_Stream; Digest : out ChaCha20_Tag );
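
The following is a minimal sketch of the stream interface described above, not a definitive usage pattern. It writes a message in two portions using stream attributes and compares the result with the tag computed by the function Digest over the whole message. The procedure name, the key value and the message contents are placeholders chosen for illustration. The sketch assumes that ChaCha20_Key is declared in Strings_Edit.ChaCha20 as an array of stream elements, which the code fragment in this section suggests.

```ada
with Ada.Streams;                    use Ada.Streams;
with Ada.Text_IO;                    use Ada.Text_IO;
with Strings_Edit.ChaCha20;          use Strings_Edit.ChaCha20;
with Strings_Edit.ChaCha20.Poly1305; use Strings_Edit.ChaCha20.Poly1305;

procedure Poly1305_Stream_Sketch is
   --  Placeholder key, for illustration only. The aggregate assumes
   --  that ChaCha20_Key is an array of stream elements.
   Key     : constant ChaCha20_Key := (others => 16#2A#);
   Part_1  : constant Stream_Element_Array (1..4) := (1, 2, 3, 4);
   Part_2  : constant Stream_Element_Array (1..4) := (5, 6, 7, 8);
   Stream  : aliased Poly1305_Stream;
   Chunked : ChaCha20_Tag;
begin
   Start (Stream, Key);                                 -- Set the key
   Stream_Element_Array'Write (Stream'Access, Part_1);  -- First portion
   Stream_Element_Array'Write (Stream'Access, Part_2);  -- Second portion
   Stop (Stream, Chunked);                              -- Get the tag
   --  Compare with the tag of the whole message computed in one call
   if Chunked = Digest (Part_1 & Part_2, Key) then
      Put_Line ("The tags match");
   else
      Put_Line ("The tags differ");
   end if;
end Poly1305_Stream_Sketch;
```

Since Poly1305 is defined over the message as a whole, one would expect both ways of computing the tag to agree regardless of how the message is split into portions.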

13.3. AEAD

The package Strings_Edit.ChaCha20.AEAD provides an implementation of authenticated encryption with additional data (AEAD) described in RFC 8439. The method uses:

ChaCha20 cipher with a key and nonce set. These must be known to the sender and recipient;
The text to send;
Additional data known on both ends.

The text is encrypted and signed with the additional data. The recipient decrypts the text and verifies the signature. Note that the recipient must know the length of the text, and thus the length of the encrypted message, in advance. For this reason no stream operations are provided.

procedure Encrypt ( Cipher : in out ChaCha20_Cipher; Text : Stream_Element_Array | String; Data : Stream_Element_Array | String; Message : out Stream_Element_Array );

These procedures construct an encrypted authenticated message. Cipher is the cipher with the key and nonce set. Text is the text to send. It can be either a stream element array or a string. Data is the additional data. It has the same type as Text. Message is the result. The length of Message must be the length of Text plus 16; otherwise Use_Error is propagated. These procedures can be called consecutively without setting another key-nonce pair into the cipher.

procedure Decrypt ( Cipher : in out ChaCha20_Cipher; Message : Stream_Element_Array; Data : Stream_Element_Array | String; Text : out Stream_Element_Array | String );

These procedures decrypt and verify an encrypted authenticated message. Cipher is the cipher with the key and nonce set. Message is the encrypted message. Data is the additional data, the same as the data used during construction of the message. Upon successful completion Text is the decrypted text. It can be either a stream element array or a string, but must have the same type as Data. The length of Message must be the length of Text plus 16; otherwise Use_Error is propagated. When the authentication check fails, Data_Error is propagated. These procedures can be called consecutively without setting another key-nonce pair into the cipher.
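
A round trip through Encrypt and Decrypt might look like the sketch below. It is an illustration under stated assumptions rather than a reference implementation: the procedure name, the key and nonce values and the texts are placeholders, ChaCha20_Key and ChaCha20_Nonce are assumed to be arrays of stream elements declared in Strings_Edit.ChaCha20, and the profile of Set_Key is taken from the code fragment in section 13.2.

```ada
with Ada.Streams;                use Ada.Streams;
with Ada.Text_IO;                use Ada.Text_IO;
with Strings_Edit.ChaCha20;      use Strings_Edit.ChaCha20;
with Strings_Edit.ChaCha20.AEAD; use Strings_Edit.ChaCha20.AEAD;

procedure AEAD_Sketch is
   --  Placeholder key and nonce, for illustration only. Both aggregates
   --  assume that the types are arrays of stream elements.
   Key       : constant ChaCha20_Key   := (others => 16#2A#);
   Nonce     : constant ChaCha20_Nonce := (others => 16#07#);
   Text      : constant String := "Attack at dawn";
   Data      : constant String := "message header";
   --  The message is the encrypted text followed by 16 more elements
   Message   : Stream_Element_Array (1..Text'Length + 16);
   Decrypted : String (1..Text'Length);
   Sender    : ChaCha20_Cipher;
   Recipient : ChaCha20_Cipher;
begin
   --  The sender encrypts the text and signs it with the data
   Set_Key (Sender, Key, Nonce);
   Encrypt (Sender, Text, Data, Message);
   --  The recipient uses the same key, nonce and additional data.
   --  Data_Error propagates if the authentication check fails,
   --  Use_Error if the lengths do not fit.
   Set_Key (Recipient, Key, Nonce);
   Decrypt (Recipient, Message, Data, Decrypted);
   Put_Line (Decrypted);
end AEAD_Sketch;
```

Two cipher objects are used here so that the sender and the recipient each start from the same key-nonce state, as they would on two ends of a connection.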

14. Packages

Package						Provides
Strings_Edit						The basic string I/O
	Base64					RFC 4648 implementation
	ChaCha20					RFC 8439 implementation
		AEAD				RFC 8439 implementation of authenticated encryption with additional data
		Poly1305				RFC 8439 Poly1305 digests implementation
	Distinguished_Names					RFC 4514 distinguished names (DN)
	Fields					Tools for writing new Put-procedures
	Float_Edit					Generic I/O of floating-point numbers
	Floats					I/O of standard Float (instantiation of Float_Edit)
	Generic_Scale					Generic scales for I/O of plot axes
	Integer_Edit					Generic I/O of integer numbers
	Integers					I/O of standard Integer (instantiation of Integer_Edit)
		Subscript				I/O of standard Integer using UTF-8 subscript characters
		Superscript				I/O of standard Integer using UTF-8 superscript characters
	ISO_8601					ISO 8601 time and duration
	Lexicographical_Order					Lexicographical comparisons of strings
	Long_Floats					I/O of standard Long_Float (instantiation of Float_Edit)
	Object_Identifiers					RFC 3061 object identifiers (OID)
	Quoted					I/O of strings put in Ada-style quotes
	Roman_Edit					I/O of roman numbers
	Streams					Stream I/O to and from strings
		Generic_Integer				Stream I/O of signed integers using chained codes
		Generic_Modular				Stream I/O of modular numbers using chained codes
		Generic_Unsigned				Stream I/O of unsigned integers using chained codes
		Integers				Integer stream I/O (instantiation of Generic_Integer with Integer)
		Integers_32				Integer stream I/O (instantiation of Generic_Integer with Integer_32)
		Integers_64				Integer stream I/O (instantiation of Generic_Integer with Integer_64)
		Naturals				Natural stream I/O (instantiation of Generic_Unsigned with Natural)
		Unsigneds_32				Modular stream I/O (instantiation of Generic_Modular with Unsigned_32)
		Unsigneds_64				Modular stream I/O (instantiation of Generic_Modular with Unsigned_64)
	UTF8					The base UTF-8 package. UTF-8 string length, skipping UTF-8 encoded characters
		Blocks				Ranges of code points defined by the Unicode standard
		Categorization				Unicode categorization
		Handling				Conversions of UTF-8 encoded strings to and from standard Ada strings
		Integer_Edit				Generic I/O of integer numbers using UTF-8 characters different from standard ASCII digits
		ISO_8859_2				ISO/IEC 8859-2 encoding conversions
		ISO_8859_3				ISO/IEC 8859-3 encoding conversions
		ISO_8859_4				ISO/IEC 8859-4 encoding conversions
		ISO_8859_5				ISO/IEC 8859-5 encoding conversions
		ISO_8859_6				ISO/IEC 8859-6 encoding conversions
		ISO_8859_7				ISO/IEC 8859-7 encoding conversions
		ISO_8859_8				ISO/IEC 8859-8 encoding conversions
		ISO_8859_9				ISO/IEC 8859-9 encoding conversions
		ISO_8859_10				ISO/IEC 8859-10 encoding conversions
		ITU_T61				ITU-T T.61 encoding conversions
		KOI8				KOI8 encoding conversions
		MacOS_Roman				Mac OS Roman encoding conversions
		Maps				Maps and Sets of Unicode code points
			Constants			Set constants for some commonly used sets
		Mapping				Unicode case mapping
		RADIX50				DEC RADIX-50 encoding conversions
		Recoding_Streams				Recoding UTF-8 streams
		Subscript				Dealing with UTF-8 subscript characters
			Integer_Edit			Generic I/O of integer numbers using UTF-8 subscript characters
		Superscript				Dealing with UTF-8 superscript characters
			Integer_Edit			Generic I/O of integer numbers using UTF-8 superscript characters
		Wildcards				Wildcard matching of UTF-8 encoded strings
			Case_Insensitive			Case-insensitive wildcard matching
		Windows_1250				Windows-1250 encoding conversions
		Windows_1251				Windows-1251 encoding conversions
		Windows_1252				Windows-1252 encoding conversions
		Windows_1253				Windows-1253 encoding conversions
		Windows_1254				Windows-1254 encoding conversions
		Windows_1255				Windows-1255 encoding conversions
		Windows_1256				Windows-1256 encoding conversions
		Windows_1257				Windows-1257 encoding conversions
		Windows_1258				Windows-1258 encoding conversions

15. Installation

The software does not require special installation. The archive's content can be put in a directory and used as-is. For users of the GNAT compiler, the software provides gpr project files, which can be used in the GNAT Programming Studio (GPS).

For the CentOS, Debian, Fedora and Ubuntu Linux distributions there are pre-compiled packages; see the links at the top of the page.

Project files	Provides	Use in custom project
strings_edit	Strings edit for Ada	`with "strings_edit.gpr";`

The GNAT project files use the following scenario variables.

Arch_Type specifies the architecture. It can be used in combination with Object_Dir when compiled for multiple architectures:

x86_64
i686
armhf

Development controls debugging information and optimization level:

Debug
Release

Legacy controls Ada language version:

Ada95
Ada2005
Ada2012

Object_Dir controls the location of object files:

. (dot) causes object files to be created in the project's directory;
nested uses paths like ./obj/linux/i686/Debug.

Target_OS controls the choice of bindings to the low-level OS primitives (where applicable):

Windows or Windows_NT are used for MS-Windows;
Linux or UNIX are used for Linux;
OSX or FreeBSD are used for UNIX-like systems.

16. Changes log

The following versions were tested with the compilers:

GNAT Studio Community 2020 (20200427)
GNAT Community 2019 (20190517-83)
GNAT Community 2018 (20180523-73)
GNAT 8-12

Changes (5 August 2022) to the version 3.7:

Minor bug fixes in Strings_Edit.ISO_8601.

Changes (31 May 2020) to the version 3.6:

Code cleanup;
Adapted to GNAT Studio Community 2020.

The following versions were tested with the compilers:

GNAT Community 2018 (20180523-73)
GNAT 8
GNAT 9

Changes (17 September 2019) to the version 3.5:

Added a variant of the procedure Get of Tables.UTF8_Names that does not raise an exception.

Changes (4 August 2019) to the version 3.4:

Added the package Strings_Edit.Long_Floats, an instance of Strings_Edit.Float_Edit with Long_Float;
The package Strings_Edit.UTF8.ITU_T61 provides ITU T.61 encoding conversions;
The package Strings_Edit.Object_Identifiers provides implementation of RFC 3061 object identifiers (OID);
The package Strings_Edit.Distinguished_Names provides implementation of RFC 4514 distinguished names (DN);
The package Strings_Edit.ISO_8601 provides ISO 8601 representations of time and duration;
Encoding and decoding Base64 streams were added to the package Strings_Edit.Base64.

The following versions were tested with the compilers:

GNAT Community 2018 (20180523-73)
GNAT 8

Changes (6 Nov 2019) to the version 3.3:

The package Strings_Edit.UTF8.Windows_1250 provides Windows-1250 encoding conversions;
The package Strings_Edit.UTF8.Windows_1251 provides Windows-1251 encoding conversions;
The package Strings_Edit.UTF8.Windows_1252 provides Windows-1252 encoding conversions;
The package Strings_Edit.UTF8.Windows_1253 provides Windows-1253 encoding conversions;
The package Strings_Edit.UTF8.Windows_1254 provides Windows-1254 encoding conversions;
The package Strings_Edit.UTF8.Windows_1255 provides Windows-1255 encoding conversions;
The package Strings_Edit.UTF8.Windows_1256 provides Windows-1256 encoding conversions;
The package Strings_Edit.UTF8.Windows_1257 provides Windows-1257 encoding conversions;
The package Strings_Edit.UTF8.Windows_1258 provides Windows-1258 encoding conversions;
The package Strings_Edit.UTF8.ISO_8859_2 provides ISO/IEC 8859-2 encoding conversions;
The package Strings_Edit.UTF8.ISO_8859_3 provides ISO/IEC 8859-3 encoding conversions;
The package Strings_Edit.UTF8.ISO_8859_4 provides ISO/IEC 8859-4 encoding conversions;
The package Strings_Edit.UTF8.ISO_8859_5 provides ISO/IEC 8859-5 encoding conversions;
The package Strings_Edit.UTF8.ISO_8859_6 provides ISO/IEC 8859-6 encoding conversions;
The package Strings_Edit.UTF8.ISO_8859_7 provides ISO/IEC 8859-7 encoding conversions;
The package Strings_Edit.UTF8.ISO_8859_8 provides ISO/IEC 8859-8 encoding conversions;
The package Strings_Edit.UTF8.ISO_8859_9 provides ISO/IEC 8859-9 encoding conversions;
The package Strings_Edit.UTF8.ISO_8859_10 provides ISO/IEC 8859-10 encoding conversions;
The package Strings_Edit.UTF8.KOI8 provides KOI8 encoding conversions;
The package Strings_Edit.UTF8.MacOS_Roman provides Mac OS Roman encoding conversions;
The package Strings_Edit.UTF8.RADIX50 provides DEC RADIX-50 encoding conversions;
The package Strings_Edit.UTF8.Recoding_Streams provides streams recoding into/from UTF-8.

Changes (5 Aug 2018) to the version 3.2:

Switched to GNAT Community 2018;
Added implementations of RFC 8439 ChaCha20 cipher, Poly1305 digest, AEAD.

Changes (2 April 2015) to the version 3.1:

ARMv7 (AKA armhf) support.

Changes (22 November 2014) to the version 3.0:

Strings_Edit.Streams.Generic_Integer package was added for chain-encoded signed numbers;
Instances of Strings_Edit.Streams.Generic_Integer with Integer, Integer_32, Integer_64 were added;
Strings_Edit.Streams.Generic_Unsigned package was added for chain-encoded unsigned numbers;
Instance of Strings_Edit.Streams.Generic_Unsigned with Natural was added;
Strings_Edit.Streams.Generic_Modular package was added for chain-encoded modular numbers;
Instances of Strings_Edit.Streams.Generic_Modular with Unsigned_32, Unsigned_64 were added;
The package Strings_Edit.Base64 was added.

Changes (5 June 2014) to the version 2.9:

Minor bug fixes in generic declarations.

Changes (1 June 2014) to the version 2.8:

Added wildcard matching with character mapping equivalence;
The package Strings_Edit.UTF8.Wildcards.Case_Insensitive provides case-insensitive wildcard matching;
Compiled with GNAT 4.9.

Changes to the version 2.7:

Lexicographical comparisons of strings.

Changes to the version 2.6:

Bug fix in Strings_Edit.Streams.

Changes to the version 2.5:

Fedora and Debian packages are provided for both 32- and 64-bit architectures.

Changes to the version 2.4:

The library is packaged for Fedora and Debian.

Changes to the version 2.3:

Bug fix in Is_Prefix, which uses character maps.

Changes to the version 2.2:

String streams implementation was added.

Changes to the version 2.1:

Is_Prefix versions added that use mappings to compare characters in Latin-1 and UTF-8 modes;
Installation instructions added.

Changes to the version 2.0:

Strings_Edit.UTF8.Blocks defines the Unicode blocks of code points;
Strings_Edit.UTF8.Categorization was added to provide categorization of Unicode code points;
Strings_Edit.UTF8.Maps provides sets and maps of Unicode code points;
Strings_Edit.UTF8.Maps.Constants provides some commonly used sets and maps.

Changes to the version 1.9:

Strings_Edit.UTF8.Wildcards package added to match UTF-8 encoded strings with wildcards;
Strings_Edit.UTF8.Mapping package provides case conversions of UTF-8 encoded strings;
Get_Backwards was added to Strings_Edit.UTF8;
Is_Prefix was added to the Strings_Edit package.

Changes to the version 1.8:

For GNAT users GPS project files were added;
Strings_Edit.Generic_Scale bugfix.

Changes to the version 1.7:

Strings_Edit.Generic_Scale was added to support plotting of graph axes.

Changes to the version 1.6:

Handling of Ada-style quoted strings was added;
Bug fix in Trim (Trim raised Constraint_Error with an empty string).

Changes to the version 1.5:

UTF-8 support
Conversions between Ada and UTF-8 strings
Sub- and superscript integer I/O in UTF-8

Changes to the version 1.4:

Licensing wording was corrected to comply with GMGPL

Changes to the version 1.3:

Bug fix in Strings_Edit.Integer_Edit. Get hung while processing very large numbers.
Bug fix in Strings_Edit.Float_Edit. Get should correctly deal with big numbers on machines where Float'Machine_Overflows = false.

Changes to the version 1.2:

The child package Strings_Edit.Fields can be used to write Put subroutines.

Changes to the version 1.1:

When fraction is empty, dot is not matched. Thus Get of Float_Edit will match only "2" in "2.".

Changes to the version 1.0:

Non-generic versions of the packages Integer_Edit and Float_Edit were added.
The generic packages Integer_Edit and Float_Edit are made children of Strings_Edit. In the previous version they were nested. The reason was to reuse a preinstantiated version of Integer_Edit within Float_Edit.
I/O routines for roman numbers were moved into a separate child package Roman_Edit.

17. Table of Contents

1 Input from String
1.1. Get procedures
1.2. Value functions

2 Output into String
2.1. Put procedures
2.2. Image functions

3 String I/O
3.1. Quoted strings

4 Roman I/O

5 Integer I/O

6 Floating-point I/O

7 UTF-8
    7.1. Handling UTF-8 strings
    7.2. Generic integer I/O of UTF-8 strings
    7.3. Subscript UTF-8 integer I/O
    7.4. Superscript UTF-8 integer I/O
    7.5. Wildcard-matching of UTF-8 strings
    7.6. Case mapping
    7.7. Unicode categorization
    7.8. Blocks
    7.9. Sets and maps
       7.9.1. Sets
       7.9.2. Maps
       7.9.3. Constants
    7.10. 8-bit encodings
       7.10.1. Windows-1250
       7.10.2. Windows-1251
       7.10.3. Windows-1252
       7.10.4. Windows-1253
       7.10.5. Windows-1254
       7.10.6. Windows-1255
       7.10.7. Windows-1256
       7.10.8. Windows-1257
       7.10.9. Windows-1258
       7.10.10. KOI8
       7.10.11. ISO/IEC 8859-2
       7.10.12. ISO/IEC 8859-3
       7.10.13. ISO/IEC 8859-4
       7.10.14. ISO/IEC 8859-5
       7.10.15. ISO/IEC 8859-6
       7.10.16. ISO/IEC 8859-7
       7.10.17. ISO/IEC 8859-8
       7.10.18. ISO/IEC 8859-9
       7.10.19. ISO/IEC 8859-10
       7.10.20. Mac OS Roman
       7.10.21. ITU T.61
    7.11. 16-bit encodings
       7.11.1. RADIX-50

8 Fields

9 Generic axis scales

10 String streams
    10.1. Signed integers stream I/O
    10.2. Unsigned integers stream I/O
    10.3. Modular numbers stream I/O
    10.4. Recoding UTF-8 streams

11 Lexicographical comparisons

12 Standard encodings
    12.1. Base64 encoding
       12.1.1. Encoding stream
       12.1.2. Decoding stream
    12.2. Object identifiers
    12.3. Distinguished names
       12.3.1. Construction of distinguished names
    12.4. ISO 8601 time and duration

13 ChaCha20 cipher
    13.1. ChaCha20 ciphering streams
    13.2. Poly1305 digests
    13.3. AEAD

14 Packages

15 Installation

16 Changes log

17 Table of contents