STRINGS EDIT
version 2.8
by Dmitry A. Kazakov

(mailbox@dmitry-kazakov.de)

This library is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

As a special exception, if other files instantiate generics from this unit, or you link this unit with other files to produce an executable, this unit does not by itself cause the resulting executable to be covered by the GNU General Public License. This exception does not however invalidate any other reasons why the executable file might be covered by the GNU Public License.


The package Strings_Edit provides I/O facilities. The following I/O items are supported by the package:

The major differences to the standard Image/Value attributes and Text_IO procedures are:

The Strings Edit project is part of the Simple Components for Ada project and can be obtained with it. Alternatively, the latest version can be downloaded here.

Download Strings Edit

Platform                                               64-bit            32-bit
Fedora packages (precompiled and packaged using RPM)   [Download page]   [Download page]
Debian packages (precompiled and packaged for dpkg)    [Download page]   [Download page]
Source distribution (any platform): strings_2_8.tgz (tar + gzip, Windows users may use WinZip)   [Download]

See also the change log.


1. Input from String

1.1. Get procedures

Get procedures are used to scan strings. The first two parameters are always Source and Pointer. Source is the string to be scanned. Pointer indicates the current position. After successful completion it is advanced to the first string position following the recognized item. The value of Pointer shall be in the range Source'First..Source'Last+1. The Layout_Error exception is propagated when this check fails. The third parameter usually accepts the value. The following example shows how to use get procedures:

package Edit_Float is new Float_Edit (Float);
use Edit_Float;
   . . .
   Line        : String (1..512); -- A line to parse
   Pointer     : Integer;
   Value       : Float;
   TabAndSpace : Ada.Strings.Maps.Character_Set :=
       To_Set (" " & Ada.Characters.Latin_1.HT);
begin
   . . .
   Pointer := Line'First;
   Get (Line, Pointer, TabAndSpace); -- Skip tabs and spaces
   Get (Line, Pointer, Value);       -- Get number
   Get (Line, Pointer, TabAndSpace); -- Skip tabs and spaces
   . . .

The numeric Get procedures have additional parameters controlling the range of the input value. The parameters First and Last define the range of the expected value. The exception Constraint_Error is propagated when the value is not in the range. The exception can be suppressed using the parameters ToFirst and ToLast: when ToFirst is True, a value below First is replaced by First, and when ToLast is True, a value above Last is replaced by Last.

The numeric Get procedures may have the parameter Base of the subtype NumberBase. The parameter defines the base of the expected number (2..16). Note that the input must not contain a base specification; only the digits of the number are expected.

1.2. Value functions

Each Get procedure that returns a value has a corresponding function Value. The function Value has the same parameter profile, except that the parameter Pointer is absent and the value is returned as the function result. Unlike Get, the function Value tolerates spaces and tabs around the converted value. The whole string must be matched; otherwise the exception Data_Error is propagated.
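
The difference between Get and Value can be sketched as follows. This is only an illustration: it assumes that the instance Strings_Edit.Integers (see section 5) is on the project path and that the library raises the exceptions declared in Ada.IO_Exceptions. The procedure name and the sample text are made up for the example.

```ada
with Ada.IO_Exceptions;
with Ada.Text_IO;
with Strings_Edit;           use Strings_Edit;
with Strings_Edit.Integers;  use Strings_Edit.Integers;

procedure Get_Versus_Value is
   Line    : constant String := "  42 apples";   -- Hypothetical input
   Pointer : Integer := Line'First;
   Number  : Integer;
begin
   Get (Line, Pointer);          -- Skip leading spaces (Blank defaults to ' ')
   Get (Line, Pointer, Number);  -- Parse the number, Pointer stops right after it
   Ada.Text_IO.Put_Line ("Get parsed" & Integer'Image (Number));
   Ada.Text_IO.Put_Line ("Rest of line:" & Line (Pointer..Line'Last));

   -- Value tolerates surrounding blanks but must match the whole string
   Ada.Text_IO.Put_Line ("Value parsed" & Integer'Image (Value ("  42  ")));
   begin
      Number := Value (Line);    -- The trailing " apples" is not matched
   exception
      when Ada.IO_Exceptions.Data_Error =>
         Ada.Text_IO.Put_Line ("Value rejected the line");
   end;
end Get_Versus_Value;
```

Get suits parsing a line item by item, while Value is the shorter choice when a string is expected to hold exactly one item.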


2. Output into String

2.1. Put procedures

Put procedures place an item into the output string Destination. The string is written starting from Destination (Pointer). The parameter Field defines the output size. When it is zero, the output size is determined by the output item. Otherwise the output is justified within the field: the parameter Justify specifies the alignment and the parameter Fill gives the pad character. When Field is greater than Destination'Last - Pointer + 1, the latter is used instead. After successful completion Pointer is advanced to the first character following the output or to Destination'Last + 1.

The numeric put procedures may have the parameter Base of the subtype NumberBase. The parameter defines the base of the output (2..16). Note that the base specification will not appear in the output.

2.2. Image functions

Image functions convert a value into a string. Unlike the standard attribute S'Image, they do not place a leading space character before non-negative numbers.
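
A minimal comparison of the two, assuming the instance Strings_Edit.Integers (see section 5) is available; the procedure name is illustrative:

```ada
with Ada.Text_IO;            use Ada.Text_IO;
with Strings_Edit.Integers;  use Strings_Edit.Integers;

procedure Image_Demo is
begin
   Put_Line ("[" & Integer'Image (42) & "]");  -- Standard attribute: [ 42]
   Put_Line ("[" & Image (42) & "]");          -- Strings_Edit image:  [42]
end Image_Demo;
```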


3. String I/O

The package Strings_Edit provides basic tools for string I/O.

procedure Get
          (  Source  : String;
             Pointer : in out Integer;
             Blank   : Character := ' '
          );

This procedure skips the character Blank starting from Source (Pointer). Pointer is advanced to the first non-Blank character or to Source'Last + 1. The exception Layout_Error is propagated if the value of Pointer is not in the range Source'First..Source'Last + 1.

procedure Get
          (  Source  : String;
             Pointer : in out Integer;
             Blanks  : Character_Set
          );

This procedure skips all the characters of the set Blanks starting from Source (Pointer). Pointer is advanced to the first character that is not in Blanks or to Source'Last + 1. The exception Layout_Error is propagated if the value of Pointer is not in the range Source'First..Source'Last + 1. See also Strings_Edit.UTF8.Maps.Get, which is a UTF-8 equivalent of this subprogram.

procedure Put
          (  Destination : in out String;
             Pointer     : in out Integer;
             Value       : Character;
             Field       : Natural   := 0;
             Justify     : Alignment := Left;
             Fill        : Character := ' '
          );

This procedure places the character specified by the parameter Value into the output string Destination. The string is written starting from Destination (Pointer). The exception Layout_Error is propagated if the value of Pointer is not in Destination'Range or there is no room for the output.

procedure Put
          (  Destination : in out String;
             Pointer     : in out Integer;
             Value       : String;
             Field       : Natural   := 0;
             Justify     : Alignment := Left;
             Fill        : Character := ' '
          );

This procedure places the string specified by the parameter Value into the output string Destination. The string is written starting from Destination (Pointer). The exception Layout_Error is propagated if the value of Pointer is not in Destination'Range or there is no room for the output.
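
The following sketch chains several Put calls on one buffer. The buffer size, the texts and the procedure name are arbitrary choices made for the illustration.

```ada
with Ada.Text_IO;
with Strings_Edit;  use Strings_Edit;

procedure Put_Demo is
   Line    : String (1..20) := (others => '.');
   Pointer : Integer := Line'First;
begin
   Put (Line, Pointer, "ID");   -- Field = 0: takes exactly two characters
   Put (Line, Pointer, ':');    -- The Character variant
   Put (Line, Pointer, "abc", Field => 8, Justify => Right, Fill => '_');
   -- Pointer now indicates the first unused position of Line
   Ada.Text_IO.Put_Line (Line (Line'First..Pointer - 1));   -- ID:_____abc
end Put_Demo;
```

Because every call advances Pointer, the output can be assembled piece by piece without computing offsets by hand.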

The package also provides the following string operations:

function Is_Prefix (Prefix, Source : String) return Boolean;

This function returns true if Prefix is a prefix of Source. An empty string is a prefix of any string.

function Is_Prefix (Prefix, Source : String; Pointer : Integer) return Boolean;

This function returns true if Prefix is a prefix of Source (Pointer..Source'Last). An empty string is a prefix of any substring. The result is false if Pointer is not in the range Source'First..Source'Last + 1.

function Is_Prefix
         (  Prefix, Source : String;
            Map : Character_Mapping
         )  return Boolean;

This function returns true if Prefix is a prefix of Source with respect to the mapping represented by Map. An empty string is a prefix of any string.

function Is_Prefix
         (  Prefix, Source : String;
            Pointer : Integer;
            Map     : Character_Mapping
         )  return Boolean;

This function returns true if Prefix is a prefix of Source (Pointer..Source'Last) with respect to the mapping represented by Map. An empty string is a prefix of any substring. The result is false if Pointer is not in the range Source'First..Source'Last + 1.

function Trim
         (  Source : String;
            Blank  : Character := ' '
         )  return String;

This function returns the content of Source with the character Blank removed from both ends.

function Trim
         (  Source : String;
            Blanks : Character_Set
         )  return String;

This function returns the content of Source with the characters from the set Blanks removed from both ends. See also Strings_Edit.UTF8.Maps.Trim, which is a UTF-8 equivalent of this function.
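
A short sketch of Trim and Is_Prefix working together. The command text and the procedure name are hypothetical. Positions are computed relative to Text'First because the bounds of the string returned by Trim are not stated here.

```ada
with Ada.Text_IO;   use Ada.Text_IO;
with Strings_Edit;  use Strings_Edit;

procedure Prefix_And_Trim is
   Line : constant String := "  set speed  ";
   Text : constant String := Trim (Line);   -- Blank defaults to ' '
begin
   Put_Line ("[" & Text & "]");             -- [set speed]
   if Is_Prefix ("set", Text) then
      Put_Line ("Command: set");
   end if;
   -- The variant with Pointer tests the prefix at a given position
   if Is_Prefix ("speed", Text, Text'First + 4) then
      Put_Line ("Argument: speed");
   end if;
end Prefix_And_Trim;
```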

3.1. Quoted strings

The child package Strings_Edit.Quoted provides functions for handling quoted strings. A quoted string is put in quotation marks, while each quotation mark within the string is doubled. This allows the original string to be restored unambiguously from its quoted form.

function Get_Quoted
         (  Source  : String;
            Pointer : access Integer;
            Mark    : Character := '"'
         )  return String;

This function gets a quoted string. Source (Pointer.all) is the first character of the string. Pointer is advanced to the first character following the input. Note that it is an access to Integer rather than a plain Integer, because functions in Ada (before Ada 2012) cannot have in out parameters. The parameter Mark specifies the quotation mark to use. Within the body of a quoted text this character is doubled. The result is the original quoted text with the quotation marks around it removed. The quotation marks within the text are halved. The exception Data_Error is propagated when the string at Pointer.all does not contain a Mark character or else when no closing Mark character appears before the string end. The exception Layout_Error is propagated if the value of Pointer.all is not in the range Source'First..Source'Last + 1.

procedure Put_Quoted
          (  Destination : in out String;
             Pointer     : in out Integer;
             Text        : String;
             Mark        : Character := '"';
             Field       : Natural   := 0;
             Justify     : Alignment := Left;
             Fill        : Character := ' '
          );

This procedure puts Text in Mark quotes and places the result into Destination starting from the position indicated by Pointer. Pointer is advanced to the first character following the output. Mark characters are doubled within the string body. The exception Layout_Error is propagated if there is no room for the output or Pointer is not in Destination'First..Destination'Last + 1.

function Quote
         (  Text : String;
            Mark : Character := '"'
         )  return String;

This function returns Text quoted using the Mark character.
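
A round trip through Quote and Get_Quoted might look like this. The sample text and the procedure name are illustrative. Pointer is declared aliased so that Pointer'Access can be passed to Get_Quoted.

```ada
with Ada.Text_IO;          use Ada.Text_IO;
with Strings_Edit.Quoted;  use Strings_Edit.Quoted;

procedure Quoted_Demo is
   Original : constant String := "say ""hi""";   -- The text: say "hi"
   Quoted   : constant String := Quote (Original);
   Pointer  : aliased Integer := Quoted'First;
   Restored : constant String := Get_Quoted (Quoted, Pointer'Access);
begin
   Put_Line (Quoted);     -- Inner marks are doubled: "say ""hi"""
   Put_Line (Restored);   -- Marks are halved again:  say "hi"
   pragma Assert (Restored = Original);
   pragma Assert (Pointer = Quoted'Last + 1);   -- All input was consumed
end Quoted_Demo;
```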


4. Roman I/O

The child package Roman_Edit provides I/O routines for Roman numbers. The type Roman is defined there as follows:

type Roman is range 1..3999;

The following subroutines are declared for the type:

procedure Get
          (  Source  : in String;
             Pointer : in out Integer;
             Value   : out Roman;
             First   : Roman   := Roman'First;
             Last    : Roman   := Roman'Last;
             ToFirst : Boolean := False;
             ToLast  : Boolean := False
          );

This procedure gets a roman number from the string Source. The process starts from Source (Pointer). The exception Constraint_Error is propagated if the number is not in the range First..Last. Data_Error indicates a syntax error in the number. End_Error is raised when no number was detected. Layout_Error is propagated when Pointer is not in the range Source'First .. Source'Last + 1. See also description of get procedures.

function Value
         (  Source  : String;
            First   : Roman   := Roman'First;
            Last    : Roman   := Roman'Last;
            ToFirst : Boolean := False;
            ToLast  : Boolean := False
         )  return Roman;

This function gets the roman number from the string Source. The number can be surrounded by spaces and tabs. The whole string Source should be matched. Otherwise the exception Data_Error is propagated. Also Data_Error indicates a syntax error in the number. The exception Constraint_Error is propagated if the number is not in the range First..Last. End_Error is raised when no number was detected.

procedure Put
          (  Destination : in out String;
             Pointer     : in out Integer;
             Value       : Roman;
             LowerCase   : Boolean   := False;
             Field       : Natural   := 0;
             Justify     : Alignment := Left;
             Fill        : Character := ' '
          );

This procedure places the number specified by the parameter Value into the output string Destination. The string is written starting from Destination (Pointer). The parameter LowerCase determines whether upper or lower case letters should be used. The exception Layout_Error is propagated when Pointer is not in Destination'Range or there is no room for the output.

function Image
         (  Value     : Roman;
            LowerCase : Boolean := False
         )  return String;

This function converts Value to string. The parameter LowerCase indicates whether upper or lower case letters shall be used.
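
A brief usage sketch, assuming the package is reachable under the full name Strings_Edit.Roman_Edit (it is introduced above as a child package); the numeral and the procedure name are chosen for illustration:

```ada
with Ada.Text_IO;              use Ada.Text_IO;
with Strings_Edit.Roman_Edit;  use Strings_Edit.Roman_Edit;

procedure Roman_Demo is
   Year : constant Roman := Value ("MCMXCIV");
begin
   Put_Line (Roman'Image (Year));                -- The decimal value
   Put_Line (Image (Year));                      -- Back to upper case letters
   Put_Line (Image (Year, LowerCase => True));   -- The same in lower case
end Roman_Demo;
```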


5. Integer I/O

The package Strings_Edit has a generic child package Integer_Edit:

generic
   type Number is range <>;
package Strings_Edit.Integer_Edit is ...

It is parameterized by an integer type. There is also the package Strings_Edit.Integers, which is an instantiation of Integer_Edit with the type Integer as the parameter. The generic package has the following subprograms:

procedure Get
          (  Source  : in String;
             Pointer : in out Integer;
             Value   : out Number'Base;
             Base    : NumberBase  := 10;
             First   : Number'Base := Number'First;
             Last    : Number'Base := Number'Last;
             ToFirst : Boolean     := False;
             ToLast  : Boolean     := False
          );

This procedure gets an integer number from the string Source. The process starts from Source (Pointer). The parameter Base indicates the base of the expected number. The exception Constraint_Error is propagated if the number is not in the range First..Last. Data_Error indicates a syntax error in the number. End_Error is raised when no number was detected. Layout_Error is propagated when Pointer is not in the range Source'First .. Source'Last + 1. See also description of get procedures.

function Value
         (  Source  : String;
            Base    : NumberBase  := 10;
            First   : Number'Base := Number'First;
            Last    : Number'Base := Number'Last;
            ToFirst : Boolean     := False;
            ToLast  : Boolean     := False
         )  return Number'Base;

This function gets an integer number from the string Source. The number can be surrounded by spaces and tabs. The whole string Source should be matched. Otherwise the exception Data_Error is propagated. Also Data_Error indicates a syntax error in the number. The exception Constraint_Error is propagated if the number is not in the range First..Last. End_Error is raised when no number was detected.

procedure Put
          (  Destination : in out String;
             Pointer     : in out Integer;
             Value       : Number'Base;
             Base        : NumberBase := 10;
             PutPlus     : Boolean    := False;
             Field       : Natural    := 0;
             Justify     : Alignment  := Left;
             Fill        : Character  := ' '
          );

This procedure places the number specified by the parameter Value into the output string Destination. The string is written starting from Destination (Pointer). The parameter Base indicates the number base used for the output. The base itself does not appear in the output. The parameter PutPlus indicates whether the plus sign should be placed if the number is positive. The exception Layout_Error is propagated when Pointer is not in Destination'Range or there is no room for the output. For example the code:

Text    : String (1..20) := (others =>'#');
Pointer : Positive := Text'First;
. . .
Put (Text, Pointer, 5, 2, True, 10, Center, '@');

will set Pointer to 11 and overwrite the first 10 characters of the string Text:

@@@+101@@@##########

function Image
         (  Value   : Number'Base;
            Base    : NumberBase := 10;
            PutPlus : Boolean    := False
         )  return String;

This function converts Value to string. The parameter Base indicates the number base used for the output. The base itself does not appear in the output. The parameter PutPlus indicates whether the plus sign should be placed if the number is positive.

The package Strings_Edit.Integers is an instance of Strings_Edit.Integer_Edit with the type Integer as the parameter.
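
The Base, range and PutPlus parameters can be tried out with a few calls on this instance. The sketch below uses made-up input strings and an illustrative procedure name.

```ada
with Ada.Text_IO;            use Ada.Text_IO;
with Strings_Edit.Integers;  use Strings_Edit.Integers;

procedure Integer_Demo is
   Percent : Integer;
begin
   -- Hexadecimal input: only the digits, no base specification
   Put_Line (Integer'Image (Value ("FF", Base => 16)));

   -- 250 is above Last; ToLast replaces it by Last instead of
   -- propagating Constraint_Error
   Percent := Value ("250", First => 0, Last => 100, ToLast => True);
   Put_Line (Integer'Image (Percent));

   -- Binary output with an explicit plus sign, as in the Put example
   Put_Line (Image (5, Base => 2, PutPlus => True));
end Integer_Demo;
```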


6. Floating-point I/O

The package Strings_Edit has a generic child package Float_Edit:

generic
   type Number is digits <>;
package Strings_Edit.Float_Edit is ...

The package is parameterized by a floating-point type. There is also the package Strings_Edit.Floats, which is an instantiation of Float_Edit with the type Float as the parameter. The package defines the following subprograms:

procedure Get
          (  Source  : in String;
             Pointer : in out Integer;
             Value   : out Number'Base;
             Base    : NumberBase  := 10;
             First   : Number'Base := Number'First;
             Last    : Number'Base := Number'Last;
             ToFirst : Boolean     := False;
             ToLast  : Boolean     := False
          );

This procedure gets a number from the string Source. The process starts from Source (Pointer). The number in the string may be in either floating-point or fixed-point format. The point may be absent. The mantissa can have base 2..16 (defined by the parameter Base). The exponent part (if present) is introduced by 'e' or 'E'. The exponent is always written in decimal and denotes a power of Base. Space characters are allowed between the mantissa and the exponent part as well as in the exponent part around the exponent sign. If Base has the value 15 or 16, the exponent part shall be separated from the mantissa by at least one space character. The exception Constraint_Error is propagated if the number is not in the range First..Last. Data_Error indicates a syntax error in the number. End_Error is raised when no number was detected. Layout_Error is propagated when Pointer is not in the range Source'First .. Source'Last + 1. See also the description of Get procedures.

function Value
         (  Source  : String;
            Base    : NumberBase  := 10;
            First   : Number'Base := Number'First;
            Last    : Number'Base := Number'Last;
            ToFirst : Boolean     := False;
            ToLast  : Boolean     := False
         )  return Number'Base;

This function gets a floating-point number from the string Source. The number can be surrounded by spaces and tabs. The whole string Source should be matched. Otherwise the exception Data_Error is propagated. Also Data_Error indicates a syntax error in the number. The exception Constraint_Error is propagated if the number is not in the range First..Last. End_Error is raised when no number was detected.

procedure Put
          (  Destination : in out String;
             Pointer     : in out Integer;
             Value       : Number'Base;
             Base        : NumberBase := 10;
             PutPlus     : Boolean    := False;
             RelSmall    : Positive   := MaxSmall;
             AbsSmall    : Integer    := -MaxSmall;
             Field       : Natural    := 0;
             Justify     : Alignment  := Left;
             Fill        : Character  := ' '
          );

This procedure places the number specified by the parameter Value into the output string Destination. The string is written starting from Destination (Pointer). The parameter Base indicates the number base used for the output. Base itself does not appear in the output. The exponent part (if used) is always decimal. PutPlus indicates whether the plus sign should be placed if the number is positive. There are two ways to specify the output precision:

Of the two parameters RelSmall and AbsSmall, the procedure chooses the one that specifies the smaller number of mantissa digits, but never more digits than the machine representation of the number allows. If the point would appear in the rightmost position, it is omitted. Zero is always represented as 0. If the desired number of digits can be provided in the fixed-point format, the exponent part is not used. For example, 1.234567e-04 gives 0.0001234567 because the fixed- and floating-point formats have the same length, but 1.234567e-05 will be shown in the floating-point format. For bases 15 and 16 the exponent part is separated from the mantissa by a space to avoid ambiguity: otherwise F.Ee+2 could be read either as F.EE + 2 or as F.E * 16**2. The exception Layout_Error is propagated when Pointer is not in Destination'Range or there is no room for the output.

function Image
         (  Value    : Number'Base;
            Base     : NumberBase := 10;
            PutPlus  : Boolean    := False;
            RelSmall : Positive   := MaxSmall;
            AbsSmall : Integer    := -MaxSmall
         )  return String;

This function converts the parameter Value to String. The parameter Base indicates the number base used for the output. Base itself does not appear in the output. The exponent part (if used) is always decimal. PutPlus indicates whether the plus sign should be placed if the number is positive. For precision parameters see Put.

The package Strings_Edit.Floats is an instance of Strings_Edit.Float_Edit with Float as the parameter.
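
A cautious sketch of parsing and formatting with this instance. The input strings and the procedure name are illustrative. No exact output is shown for Image, since it depends on the precision defaults and on the machine representation of Float.

```ada
with Ada.Text_IO;          use Ada.Text_IO;
with Strings_Edit.Floats;  use Strings_Edit.Floats;

procedure Float_Demo is
   Decimal : constant Float := Value ("1.5e2");             -- 1.5 * 10**2
   Hex     : constant Float := Value ("1.8", Base => 16);   -- 1 + 8/16
begin
   Put_Line (Float'Image (Decimal));   -- Standard attribute, for comparison
   Put_Line (Float'Image (Hex));
   -- Image with the default precision parameters RelSmall and AbsSmall
   Put_Line (Image (Decimal));
   Put_Line (Image (Hex, Base => 16));
end Float_Demo;
```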


7. UTF-8

The package Strings_Edit.UTF8 is the parent package for dealing with strings encoded in the Unicode Transformation Format UTF-8. Ada 95 supports Latin-1 (type Character) and UCS-2 (Wide_Character) of ISO 10646 BMP. Ada 2005 introduces the UCS-4 encoding (Wide_Wide_Character). Using this variety of encodings in one program imposes certain difficulties. Furthermore, many applications and libraries use UTF-8 instead, which has significant advantages over UCS. For these reasons UTF-8 support is provided here.

Since UTF-8 was designed for backward compatibility with 7-bit ASCII applications and is a multi-byte encoding format, I chose not to introduce a separate string type for UTF-8. Conventional Ada strings are used instead. It is important to note:

The package defines the type UTF8_Code_Point that represents the Unicode code space:

type Code_Point is mod 2**32;
subtype UTF8_Code_Point is Code_Point range 0..16#10FFFF#;

The following subroutines are provided by the package:

procedure Get
          (  Source  : String;
             Pointer : in out Integer;
             Value   : out UTF8_Code_Point
          );

This procedure decodes one UTF-8 code point from the string Source. It starts at Source (Pointer). After successful completion Pointer is advanced to the first character following the input. The result is returned through the parameter Value.

Exceptions
Data_Error     Illegal UTF-8 string Source
End_Error      Nothing found. Pointer = Source'Last + 1
Layout_Error   Pointer is not in Source'First..Source'Last + 1

procedure Get_Backwards
          (  Source  : String;
             Pointer : in out Integer;
             Value   : out UTF8_Code_Point
          );

This procedure decodes one UTF-8 code point from the string Source in reverse. It starts at Source (Pointer - 1), assuming that it is the last octet of a UTF-8 encoded character. After successful completion Pointer is moved to the first character of the input. The result is returned through the parameter Value.

Exceptions
Data_Error     Illegal UTF-8 string Source
End_Error      Nothing found. Pointer = Source'First
Layout_Error   Pointer is not in Source'First..Source'Last + 1

function Image (Value : UTF8_Code_Point) return String;

This function is a simplified version of the procedure Put. It returns UTF-8 encoded Value.

function Length (Source : String) return Natural;

This function returns the length of a UTF-8 encoded string in code points. Data_Error is propagated when Source is not a valid UTF-8 string.

procedure Put
          (  Destination : in out String;
             Pointer     : in out Integer;
             Value       : UTF8_Code_Point
          );

This procedure puts one UTF-8 code point into the string Destination starting from the position Destination (Pointer). Pointer is then advanced to the first character following the output. Layout_Error is propagated when Pointer is not in Destination'Range or there is no room for the output. Note that the parameters Field, Justify and Fill, usual for other Put procedures, would have no meaning here.

procedure Skip
          (  Source  : String;
             Pointer : in out Integer;
             Count   : Natural := 1
          );

This procedure skips Count UTF-8 encoded code points in the string Source starting from Source (Pointer). After successful completion Pointer indicates the first character following the skipped UTF-8 encoded sequence.

Exceptions
Data_Error     Illegal UTF-8 string Source
End_Error      Fewer than Count code points detected before the string end
Layout_Error   Pointer is not in Source'First..Source'Last + 1

function Value (Source : String) return UTF8_Code_Point;

This function decodes one UTF-8 code point stored in Source. The whole string Source must be matched; otherwise the exception Data_Error is propagated. It is also propagated when Source is not a legal UTF-8 string.
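
The subprograms above can be combined to walk through a string code point by code point. In this sketch the test string is built with Image so that the source file itself stays pure ASCII; the chosen code points and the procedure name are illustrative.

```ada
with Ada.Text_IO;
with Strings_Edit.UTF8;  use Strings_Edit.UTF8;

procedure UTF8_Demo is
   -- 'a', U+00E9 and U+20AC take 1, 2 and 3 octets in UTF-8
   Text    : constant String := "a" & Image (16#E9#) & Image (16#20AC#);
   Pointer : Integer := Text'First;
   Code    : UTF8_Code_Point;
begin
   Ada.Text_IO.Put_Line ("Octets:     " & Integer'Image (Text'Length));
   Ada.Text_IO.Put_Line ("Code points:" & Natural'Image (Length (Text)));
   while Pointer <= Text'Last loop
      Get (Text, Pointer, Code);   -- Decodes one code point, advances Pointer
      Ada.Text_IO.Put_Line (UTF8_Code_Point'Image (Code));
   end loop;
end UTF8_Demo;
```

The loop condition relies on Pointer reaching Text'Last + 1 after the last code point, so End_Error is never raised here.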

type Code_Points_Range is record
   Low  : UTF8_Code_Point;
   High : UTF8_Code_Point;
end record;

This type represents a range of code points Low..High.

Full_Range : constant Code_Points_Range;

A range that contains all code points.

type Code_Points_Ranges is
   array (Positive range <>) of Code_Points_Range;

An array of code points ranges.

7.1. Handling UTF-8 strings

The package Strings_Edit.UTF8.Handling provides the following conversion functions between UTF-8 encoded strings and Ada strings:

function To_String (Value : String) return String;
function To_String
         (  Value      : String;
            Substitute : Character
         )  return String;

These functions convert a UTF-8 encoded string to a Latin-1 character string (standard Ada String). The parameter Substitute specifies the character that substitutes non-Latin-1 code points in Value. If it is omitted, Constraint_Error is propagated when a non-Latin-1 code point appears in Value.

Exceptions
Constraint_Error   Non-Latin-1 code point detected
Data_Error         Illegal UTF-8 string Value

function To_UTF8 (Value : Character     ) return String;
function To_UTF8 (Value : String        ) return String;
function To_UTF8 (Value : Wide_Character) return String;
function To_UTF8 (Value : Wide_String   ) return String;

These functions convert the parameter Value to a UTF-8 encoded string. The parameter can be Character, String, Wide_Character or Wide_String. The result of a character conversion can be from 1 to 3 bytes long. Note that Ada's Character has Latin-1 encoding which differs from UTF-8 in the code positions greater than 127.

function To_Wide_String (Value : String) return Wide_String;
function To_Wide_String
         (  Value      : String;
            Substitute : Wide_Character
         )  return Wide_String;

These functions convert a UTF-8 encoded string to a UCS-2 character string (Ada's Wide_String). The parameter Substitute specifies the character that substitutes non-UCS-2 code points in Value. If it is omitted, Constraint_Error is propagated when a non-UCS-2 code point appears in Value.

Exceptions
Constraint_Error   Non-UCS-2 code point detected
Data_Error         Illegal UTF-8 string Value
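
A hedged sketch of the conversions in both directions. The sample strings and the procedure name are illustrative. Typed expressions are passed to To_UTF8 instead of bare string literals, because a literal would fit both the String and the Wide_String variant.

```ada
with Ada.Text_IO;                 use Ada.Text_IO;
with Strings_Edit.UTF8.Handling;  use Strings_Edit.UTF8.Handling;

procedure Handling_Demo is
   -- Four Latin-1 characters, the last one is e with acute (16#E9#)
   Latin1 : constant String := "caf" & Character'Val (16#E9#);
   UTF8   : constant String := To_UTF8 (Latin1);   -- 5 octets in UTF-8
   -- The euro sign U+20AC is outside Latin-1
   Euro   : constant String := To_UTF8 (Wide_Character'Val (16#20AC#));
begin
   pragma Assert (UTF8'Length = 5);
   pragma Assert (To_String (UTF8) = Latin1);   -- Round trip
   Put_Line (To_String (Euro, '?'));            -- Substituted: ?
   begin
      Put_Line (To_String (Euro));              -- No Substitute given
   exception
      when Constraint_Error =>
         Put_Line ("Not representable in Latin-1");
   end;
end Handling_Demo;
```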

7.2. Generic integer I/O of UTF-8 strings

The package Strings_Edit.UTF8.Integer_Edit provides integer I/O for special encodings of digits, such as subscript and superscript.

generic
   type Number is range <>;
   with procedure Get_Digit
                  (  Source  : String;
                     Pointer : in out Integer;
                     Digit   : out Natural
                  )  is <>;
   with procedure Get_Sign
                  (  Source  : String;
                     Pointer : in out Integer;
                     Sign_Of : out Sign
                  )  is <>;
   with procedure Put_Digit
                  (  Destination : in out String;
                     Pointer     : in out Integer;
                     Digit       : Script_Digit
                  )  is <>;
   with procedure Put_Sign
                  (  Destination : in out String;
                     Pointer     : in out Integer;
                     Sign_Of     : Sign
                  )  is <>;
package Strings_Edit.UTF8.Integer_Edit is
   ...

The generic parameters of the package are:

The package provides the following procedures and functions:

procedure Get
          (  Source  : in String;
             Pointer : in out Integer;
             Value   : out Number'Base;
             Base    : Script_Base := 10;
             First   : Number'Base := Number'First;
             Last    : Number'Base := Number'Last;
             ToFirst : Boolean     := False;
             ToLast  : Boolean     := False
          );

function Value
         (  Source  : String;
            Base    : Script_Base := 10;
            First   : Number'Base := Number'First;
            Last    : Number'Base := Number'Last;
            ToFirst : Boolean     := False;
            ToLast  : Boolean     := False
         )  return Number'Base;

procedure Put
          (  Destination : in out String;
             Pointer     : in out Integer;
             Value       : Number'Base;
             Base        : Script_Base := 10;
             PutPlus     : Boolean     := False
          );

function Image
         (  Value   : Number'Base;
            Base    : Script_Base := 10;
            PutPlus : Boolean     := False
         )  return String;

These subroutines work exactly as the ones of Strings_Edit.Integer_Edit, with the difference that the number base is specified by a parameter of the type Script_Base, defined in Strings_Edit.UTF8 as an integer type with the range 2..10.

7.3. Subscript UTF-8 integer I/O

The generic package Strings_Edit.UTF8.Subscript.Integer_Edit is a specialization of Strings_Edit.UTF8.Integer_Edit for integer I/O of subscript numbers.

generic
   type Number is range <>;
package Strings_Edit.UTF8.Subscript.Integer_Edit is
   ...

The package provides the subroutines described in Strings_Edit.UTF8.Integer_Edit.

Note. If you plan to use sub- and superscripts under Microsoft Windows XP, you will probably have a problem displaying the corresponding glyphs. The reason is that the standard Windows font Tahoma does not contain glyphs for Unicode sub- and superscripts. You can solve this problem by choosing a font which has them. A good candidate, very close to Tahoma, is Arial Unicode MS. Go to Control Panel → Display → Appearance → Advanced and change the font there by clicking on the corresponding sample texts.

This package has a non-generic instantiation with the type Integer: Strings_Edit.Integers.Subscript.

7.4. Superscript UTF-8 integer I/O

The generic package Strings_Edit.UTF8.Superscript.Integer_Edit is a specialization of Strings_Edit.UTF8.Integer_Edit for integer I/O of superscript numbers.

generic
   type Number is range <>;
package Strings_Edit.UTF8.Superscript.Integer_Edit is
   ...

The package provides the subroutines described in Strings_Edit.UTF8.Integer_Edit.

This package has a non-generic instantiation with the type Integer: Strings_Edit.Integers.Superscript.

7.5. Wildcard-matching of UTF-8 strings

The package Strings_Edit.UTF8.Wildcards provides the following subprogram:

function Match
         (  Text       : String;
            Pattern    : String;
            Wide_Space : Boolean := False;
            Blanks     : Character_Set := SpaceAndTab
         )  return Boolean;

The function matches the string Text against the wildcard pattern Pattern. Both Text and Pattern are UTF-8 encoded strings. Pattern may contain asterisk characters (*), which are treated as wildcards matching any (possibly empty) sequence of UTF-8 characters. The number of wildcards in Pattern is not limited. Additionally, if the parameter Wide_Space is true, a space character in Pattern matches any non-empty sequence of characters from the set Blanks. Note that when Blanks contains non-ASCII characters (with the code points 128..255), those will match any UTF-8 character starting with this octet. When Wide_Space is false, Blanks is ignored and space is matched as an ordinary character. The result of the function is true when Pattern matches all of Text. A typical use of this function is to filter file names using patterns like *.txt. The result is undefined (either true or false) when Text and/or Pattern are illegal UTF-8 strings.
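The following is a minimal sketch of how Match might be called, assuming the library is on the compiler's source path. The procedure Wildcard_Demo and the helper Try are hypothetical names introduced for illustration only; Match and its parameter Wide_Space are as declared above.

```ada
with Ada.Text_IO;                  use Ada.Text_IO;
with Strings_Edit.UTF8.Wildcards;  use Strings_Edit.UTF8.Wildcards;

procedure Wildcard_Demo is
   --  Hypothetical helper: prints the outcome of one match attempt
   procedure Try
             (  Text    : String;
                Pattern : String;
                Wide    : Boolean := False
             )  is
   begin
      Put_Line
      (  Text & " ~ " & Pattern & " : "
      &  Boolean'Image (Match (Text, Pattern, Wide_Space => Wide))
      );
   end Try;
begin
   Try ("report.txt", "*.txt");          -- File name filter
   Try ("report.txt", "re*t.t*");        -- Several wildcards in one pattern
   Try ("a   b", "a b");                 -- Space taken as an ordinary character
   Try ("a   b", "a b", Wide => True);   -- Space matches a run of blanks
end Wildcard_Demo;
```

The last two calls differ only in Wide_Space, which shows the effect of that parameter on patterns containing spaces.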

7.6. Case mapping

The package Strings_Edit.UTF8.Mapping provides Unicode mapping of code points and UTF-8 encoded strings. It provides the following subprograms:

function Has_Case (Value : UTF8_Code_Point) return Boolean;

The function returns true if the code point Value has an upper or lower case equivalent different from Value. For all x, Has_Case (x) = Is_Lowercase (x) or Is_Uppercase (x). Note that not all Unicode code points have equivalents (simple case mapping). There also exist code points that differ from both of their equivalents, i.e. x /= To_Lowercase (x) and x /= To_Uppercase (x). Refer to the Unicode standard for further information.

function Is_Lowercase (Value : UTF8_Code_Point) return Boolean;

The function returns true if the code point Value is lower case.

function Is_Uppercase (Value : UTF8_Code_Point) return Boolean;

The function returns true if the code point Value is upper case.

function To_Lowercase (Value : UTF8_Code_Point)
   return UTF8_Code_Point;

The function returns a lowercase equivalent of Value. The result is Value if no equivalent exists.

function To_Lowercase (Value : String) return String;

The function converts its argument to lower case. Constraint_Error is propagated when Value is an illegal UTF-8 string.

function To_Uppercase (Value : UTF8_Code_Point)
   return UTF8_Code_Point;

The function returns an uppercase equivalent of Value. The result is Value if no equivalent exists.

function To_Uppercase (Value : String) return String;

The function converts its argument to upper case. Constraint_Error is propagated when Value is an illegal UTF-8 string.
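As an illustration, the subprograms above might be used as sketched below. The procedure name Case_Demo is hypothetical, and the sketch assumes that UTF8_Code_Point is declared in Strings_Edit.UTF8 and accepts integer literals. Plain ASCII text is used so that the example does not depend on the encoding of the source file.

```ada
with Ada.Text_IO;                use Ada.Text_IO;
with Strings_Edit.UTF8;          use Strings_Edit.UTF8;
with Strings_Edit.UTF8.Mapping;  use Strings_Edit.UTF8.Mapping;

procedure Case_Demo is
   Small_A : constant UTF8_Code_Point := 16#61#;  -- LATIN SMALL LETTER A
   Text    : constant String := "Strings Edit";   -- ASCII is valid UTF-8
begin
   --  String variants convert the whole UTF-8 encoded string
   Put_Line (To_Uppercase (Text));
   Put_Line (To_Lowercase (Text));
   --  Code point variants work on a single code point
   Put_Line (Boolean'Image (Has_Case (Small_A)));
   Put_Line (Boolean'Image (To_Uppercase (Small_A) = 16#41#));
exception
   when Constraint_Error =>
      --  Propagated by the string variants on illegal UTF-8 input
      Put_Line ("Input is not a legal UTF-8 string");
end Case_Demo;
```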

Implementation notes. The implementation is based on the upper and lower case mappings as defined by the Unicode standard. Presently these mappings are specified in the file UnicodeData.txt, which is part of the Unicode Character Database. It contains about three thousand Unicode code points which have upper or lower case equivalents. The implementation keeps an internal sorted array of mappings, which is searched by binary search, i.e. a lookup costs O(log₂ 3103) comparisons. In case of future changes and extensions of this file, the subdirectory test_strings_edit contains a utility program strings_edit-utf8-mapping_generator.adb, which can be used to adjust the implementation of the package. In order to do this, the utility must be built. Using the GNAT Ada compiler, for instance:

>gnatmake -I ../ strings_edit-utf8-mapping_generator.adb

Then it is called as follows:

>strings_edit-utf8-mapping_generator ../strings_edit-utf8-mapping.adb UnicodeData.txt

This will replace the internal representation of Unicode case mappings in the source code of the package Strings_Edit.UTF8.Mapping.

7.7. Unicode categorization

The package Strings_Edit.UTF8.Categorization provides code points categorization as defined by the Unicode standard. The enumeration type General_Category represents the categories:

type General_Category is (Lu, ...);

The type has the following values:

Value  Description                Value  Description
Lu     Uppercase letter           Sm     Math symbol
Ll     Lowercase letter           Sc     Currency symbol
Lt     Titlecase letter           Sk     Modifier symbol
Lm     Modifier letter            So     Other symbol
Lo     Other letter               Zs     Space separator
Mn     Non-spacing mark           Zl     Line separator
Mc     Spacing combining mark     Zp     Page separator
Me     Enclosing mark             Cc     Control
Nd     Decimal digit (number)     Cf     Format
Nl     Letter (number)            Cs     Surrogate
No     Other number               Co     Private use
Pc     Connector punctuation      Cn     Not assigned
Pd     Dash punctuation
Ps     Open punctuation
Pe     Close punctuation
Pi     Initial quote punctuation
Pf     Final quote punctuation
Po     Other punctuation

subtype Letter      is General_Category range Lu..Lo;
subtype Mark        is General_Category range Mn..Me;
subtype Number      is General_Category range Nd..No;
subtype Punctuation is General_Category range Pc..Po;
subtype Symbol      is General_Category range Sm..So;
subtype Separator   is General_Category range Zs..Zp;
subtype Other       is General_Category range Cc..Cn;

function Category (Value : UTF8_Code_Point) return General_Category;

The function returns the category of Value.

Additionally the package defines the following indicator functions for commonly used sets of code points:

function Is_Alphanumeric (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a letter (Lu..Lo) or else a decimal digit (Nd) code point.

function Is_Control (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a control (Cc) code point.

function Is_Digit (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a decimal digit (Nd) code point.

function Is_Identifier_Extend (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value represents a character valid in the body of an Ada 2005 identifier, in addition to the characters valid at the beginning of an identifier (ARM 2.3(3.1/2)).

function Is_Identifier_Start (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value represents a character valid at the beginning of an Ada 2005 identifier (ARM 2.3(3/2)).

function Is_ISO_646 (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value represents an ASCII character (ISO 646, 7-bit).

function Is_Letter (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a letter (Lu..Lo) code point.

function Is_Lower (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a lowercase letter (Ll) code point. This function is equivalent to Is_Lowercase.

function Is_Other_Format (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a format (Cf) code point. Such code points are usually ignored when strings are compared as words. For example, soft hyphen (16#AD#) has this category.

function Is_Space (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a space (Zs) code point.

function Is_Subscript_Digit (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a subscript decimal digit code point.

function Is_Superscript_Digit (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a superscript decimal digit code point.

function Is_Title (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is a title case letter (Lt) code point.

function Is_Upper (Value : UTF8_Code_Point) return Boolean;

The function returns true if Value is an uppercase letter (Lu) code point. This function is equivalent to Is_Uppercase.
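A short sketch of how Category and the subtypes of General_Category could be combined is given below. The names Category_Demo and Describe are hypothetical, and it is assumed that UTF8_Code_Point is declared in Strings_Edit.UTF8 and accepts integer literals.

```ada
with Ada.Text_IO;        use Ada.Text_IO;
with Strings_Edit.UTF8;  use Strings_Edit.UTF8;
with Strings_Edit.UTF8.Categorization;
use  Strings_Edit.UTF8.Categorization;

procedure Category_Demo is
   --  Hypothetical helper: prints the category of one code point
   procedure Describe (Code : UTF8_Code_Point) is
      Class : constant General_Category := Category (Code);
   begin
      Put
      (  UTF8_Code_Point'Image (Code)
      &  " has the category "
      &  General_Category'Image (Class)
      );
      --  Membership tests against the subtypes declared above
      if Class in Letter then
         Put (" (letter)");
      elsif Class in Punctuation then
         Put (" (punctuation)");
      end if;
      New_Line;
   end Describe;
begin
   Describe (16#41#);   -- LATIN CAPITAL LETTER A
   Describe (16#39#);   -- DIGIT NINE
   Describe (16#2C#);   -- COMMA
   Describe (16#AD#);   -- SOFT HYPHEN
end Category_Demo;
```

Testing the category against a subtype such as Letter covers a whole group of categories with one membership test, which is what the indicator functions like Is_Letter do for the most common groups.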

Implementation notes. The implementation is based on the general category values defined by the Unicode standard. Presently these categories are specified in the file UnicodeData.txt, which is part of the Unicode Character Database. In case of future changes and extensions of this file, the subdirectory test_strings_edit contains a utility program strings_edit-utf8-categorization_generator.adb, which can be used to adjust the implementation of the package. In order to do this, the utility must be built. Using the GNAT Ada compiler, for instance:

>gnatmake -I ../ strings_edit-utf8-categorization_generator.adb

Then it is called as follows:

>strings_edit-utf8-categorization_generator ../strings_edit-utf8-categorization.adb UnicodeData.txt

This will replace the internal representation of the Unicode categorization in the source code of the package Strings_Edit.UTF8.Categorization.

7.8. Blocks

The package Strings_Edit.UTF8.Blocks provides ranges of code points for the Unicode blocks. See the file Blocks.txt. The names of the ranges used in the package match the names used in Blocks.txt after replacing spaces and hyphens with underscores. For example, "Basic Latin" of Blocks.txt is named Basic_Latin in the package. The ranges of code points can be used to construct sets of code points (see To_Set and the set-theoretic operations declared in Strings_Edit.UTF8.Maps).

7.9. Sets and maps

The package Strings_Edit.UTF8.Maps provides sets and maps of code points. The package mimics the standard library package Ada.Strings.Maps (ARM A.4.2), augmented for dealing with Unicode in UTF-8 encoding. The operations of Ada.Strings.Maps are extended to the cases where sets and ranges are intermixed. Because Unicode sets can be very large, the implementation can represent a set as the conjunction (and) of an indicator function and a sorted list of ranges, which reduces the space required. Similarly, two representations are supported for maps: a sorted array and a function. Reference counting is used to provide efficient assignment of sets and maps.

7.9.1. Sets

The type Unicode_Set represents sets of code points.

type Unicode_Set is private;

The following type defines an access to an indicator function of a set of code points:

type Unicode_Indicator_Function is
   access function (Value : UTF8_Code_Point) return Boolean;

The type Unicode_Set has the following operations defined:

function "not" (Right : Unicode_Set) return Unicode_Set;
function "not" (Right : String) return Unicode_Set;
function "not" (Right : Code_Points_Range) return Unicode_Set;
function "and" (Left, Right : Unicode_Set) return Unicode_Set;
function "and" (Left : Unicode_Set; Right : Code_Points_Range) return Unicode_Set;
function "and" (Left : Code_Points_Range; Right : Unicode_Set) return Unicode_Set;
function "and" (Left : Unicode_Set; Right : String) return Unicode_Set;
function "and" (Left : String; Right : Unicode_Set) return Unicode_Set;
function "or" (Left, Right : Unicode_Set) return Unicode_Set;
function "or" (Left : Unicode_Set; Right : Code_Points_Range) return Unicode_Set;
function "or" (Left : Code_Points_Range; Right : Unicode_Set) return Unicode_Set;
function "or" (Left : Unicode_Set; Right : String) return Unicode_Set;
function "or" (Left : String; Right : Unicode_Set) return Unicode_Set;
function "xor" (Left, Right : Unicode_Set) return Unicode_Set;
function "xor" (Left : Unicode_Set; Right : Code_Points_Range) return Unicode_Set;
function "xor" (Left : Code_Points_Range; Right : Unicode_Set) return Unicode_Set;
function "xor" (Left : Unicode_Set; Right : String) return Unicode_Set;
function "xor" (Left : String; Right : Unicode_Set) return Unicode_Set;
function "-" (Left, Right : Unicode_Set) return Unicode_Set;
function "-" (Left : Unicode_Set; Right : Code_Points_Range) return Unicode_Set;
function "-" (Left : Code_Points_Range; Right : Unicode_Set) return Unicode_Set;
function "-" (Left : Unicode_Set; Right : String) return Unicode_Set;
function "-" (Left : String; Right : Unicode_Set) return Unicode_Set;

These functions provide set-theoretic operations on two sets, or on a set and either a range of code points or a UTF-8 encoded string. When one of the arguments is a string, it is treated as a set consisting of the code points found in the string. Data_Error is propagated when a string parameter is not a valid UTF-8 string. A - B is defined as A and not B. A range of code points is considered empty if its lower bound is greater than its upper bound.

function "=" (Left, Right : Unicode_Set) return Boolean;
function "=" (Left : Unicode_Set; Right : Code_Points_Range) return Boolean;
function "=" (Left : Code_Points_Range; Right : Unicode_Set) return Boolean;
function "=" (Left : Unicode_Set; Right : String) return Boolean;
function "=" (Left : String; Right : Unicode_Set) return Boolean;
function "<" (Left, Right : Unicode_Set) return Boolean;
function "<" (Left : Unicode_Set; Right : Code_Points_Range) return Boolean;
function "<" (Left : Code_Points_Range; Right : Unicode_Set) return Boolean;
function "<" (Left : Unicode_Set; Right : String) return Boolean;
function "<" (Left : String; Right : Unicode_Set) return Boolean;
function "<=" (Left, Right : Unicode_Set) return Boolean;
function "<=" (Left : Unicode_Set; Right : Code_Points_Range) return Boolean;
function "<=" (Left : Code_Points_Range; Right : Unicode_Set) return Boolean;
function "<=" (Left : Unicode_Set; Right : String) return Boolean;
function "<=" (Left : String; Right : Unicode_Set) return Boolean;
function ">" (Left, Right : Unicode_Set) return Boolean;
function ">" (Left : Unicode_Set; Right : Code_Points_Range) return Boolean;
function ">" (Left : Code_Points_Range; Right : Unicode_Set) return Boolean;
function ">" (Left : Unicode_Set; Right : String) return Boolean;
function ">" (Left : String; Right : Unicode_Set) return Boolean;
function ">=" (Left, Right : Unicode_Set) return Boolean;
function ">=" (Left : Unicode_Set; Right : Code_Points_Range) return Boolean;
function ">=" (Left : Code_Points_Range; Right : Unicode_Set) return Boolean;
function ">=" (Left : Unicode_Set; Right : String) return Boolean;
function ">=" (Left : String; Right : Unicode_Set) return Boolean;

These functions provide relational operations on two sets, or on a set and either a range of code points or a UTF-8 encoded string. When one of the arguments is a string, it is treated as a set consisting of the code points found in the string. Data_Error is propagated when a string parameter is not a valid UTF-8 string. The operations < and <= are defined in the sense of ⊂ and ⊆ respectively.

function Cardinality (Set : Unicode_Set) return Natural;

This function returns the number of elements in Set.

function Choose
         (  Set       : Unicode_Set;
            Indicator : Unicode_Indicator_Function
         )  return Unicode_Set;

This function returns a set consisting of the elements of Set chosen by the function Indicator. When Indicator is null the result is Set. When this function creates a new set, its representation is based on a list of ranges and does not refer to Indicator. The function should be used only for compact sets.

procedure Get
          (  Source  : String;
             Pointer : in out Integer;
             Blanks  : Unicode_Set
          );

This procedure skips all code points from the set Blanks starting from Source (Pointer). After completion Pointer is either Source'Last + 1, or it indicates the first character of the first code point outside Blanks, or else the first improperly encoded character. Layout_Error is propagated when Pointer is not in the range Source'First..Source'Last + 1.

function Is_Empty (Set : Unicode_Set) return Boolean;
function Is_Range (Set : Unicode_Set) return Boolean;
function Is_Singleton (Set : Unicode_Set) return Boolean;
function Is_Universal (Set : Unicode_Set) return Boolean;

These functions test a set for being empty, a range, a singleton or a full set of code points.

function Is_In
         (  Element : Character;
            Set     : Unicode_Set
         )  return Boolean;
function Is_In
         (  Element : Wide_Character;
            Set     : Unicode_Set
         )  return Boolean;
function Is_In
         (  Element : UTF8_Code_Point;
            Set     : Unicode_Set
         )  return Boolean;

These functions provide membership tests. The first parameter can be a code point, a Latin-1 character or a wide character.

function Is_Subset
         (  Elements : Code_Points_Range;
            Set      : Unicode_Set
         )  return Boolean renames "<=";
function Is_Subset
         (  Elements : Code_Points_Ranges;
            Set      : Unicode_Set
         )  return Boolean;
function Is_Subset
         (  Elements : String;
            Set      : Unicode_Set
         )  return Boolean renames "<=";
function Is_Subset
         (  Elements : Unicode_Set;
            Set      : Unicode_Set
         )  return Boolean renames "<=";

These functions provide subset tests. The first parameter can be a range of code points, an array of ranges, a set, or a UTF-8 encoded string. When the parameter is a string, the result is true if all code points of the string belong to the set. Data_Error is propagated when it is not a valid UTF-8 string.

function To_Ranges (Set : Unicode_Set) return Code_Points_Ranges;

This function returns an array of disjoint ascending ranges representing the set. The result is an empty array if the parameter is an empty set.

function To_Set (Singleton : UTF8_Code_Point)    return Unicode_Set;
function To_Set (Singleton : Character)          return Unicode_Set;
function To_Set (Singleton : Wide_Character)     return Unicode_Set;
function To_Set (Span      : Code_Points_Range)  return Unicode_Set;
function To_Set (Ranges    : Code_Points_Ranges) return Unicode_Set;
function To_Set (Low, High : UTF8_Code_Point)    return Unicode_Set;
function To_Set (Sequence  : String)             return Unicode_Set;
function To_Set (Indicator : Unicode_Indicator_Function)
   return Unicode_Set;

These functions convert their argument (a code point, a Latin-1 character, a wide character, a range of code points, an array of ranges, a pair of bounds, a UTF-8 encoded string or an indicator function) to the corresponding set. When the parameter is a string, the set contains exactly the code points found in the string. Data_Error is propagated when the argument is not a properly encoded UTF-8 string. When the parameter is an array of ranges, the result is their union. When the parameter specifies an indicator function, the result is the set corresponding to the function. It is the universal set when Indicator is null. Unlike Choose, the result refers to Indicator.

function To_Sequence (Set : Unicode_Set) return String;

This function returns a UTF-8 encoded string corresponding to the code points of the set. Each code point of Set is represented in the string, and the code points appear in ascending order. The following relation holds: To_Set (To_Sequence (x)) = x. Constraint_Error is propagated when the result is too large to be represented as a string.

function Trim (Source : String; Blanks : Unicode_Set) return String;

This function returns the content of Source with the characters representing UTF-8 code points from the set Blanks removed from both ends of it. Data_Error is propagated when Source is not a valid UTF-8 string.

generic
   with function Indicator (Value : UTF8_Code_Point) return Boolean;
function Generic_Choose (Set : Unicode_Set) return Unicode_Set;

This is a generic variant of the function Choose.

Null_Set      : constant Unicode_Set;
Universal_Set : constant Unicode_Set;

The empty and the universal set constants.

To use Unicode_Set effectively one should consider its implementation. A set of code points is represented by the conjunction of a sorted array of ranges and an indicator function. A code point belongs to the set when the indicator function returns true for it and it lies in one of the ranges of the array. Such a set can be constructed, for example, like this:

Cyrillic_Letters : constant Unicode_Set :=
   To_Set (Is_Letter'Access) and Cyrillic;

Here Is_Letter is an indicator function, which selects only letters. Cyrillic is a range of code points defined in Strings_Edit.UTF8.Blocks. When this set is combined with other sets of ranges using only intersection, the implementation keeps this representation. When other operations like complement and disjunction get involved, the representation can be flattened by removing the indicator function from it. For large disjoint sets this might be very inefficient. The set is also flattened when the operation Choose is applied, which might be necessary if, for example, the indicator function is not declared at the library level. For the example above, a set of Cyrillic letters represented by an array of ranges could be obtained as follows:

Cyrillic_Letters : constant Unicode_Set :=
   Choose (To_Set (Cyrillic), Is_Letter'Access);
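The set operations described in this section can be combined as in the following sketch. It is only an illustration: the procedure name Set_Demo is hypothetical, and ASCII code points are used so that the example does not depend on the encoding of the source file. Note that a character literal passed to Is_In has to be qualified, because both a Character and a Wide_Character variant are declared.

```ada
with Ada.Text_IO;             use Ada.Text_IO;
with Strings_Edit.UTF8.Maps;  use Strings_Edit.UTF8.Maps;

procedure Set_Demo is
   Vowels     : constant Unicode_Set := To_Set ("aeiou");
   Letters    : constant Unicode_Set := To_Set (16#61#, 16#7A#);  -- a..z
   Consonants : constant Unicode_Set := Letters - Vowels;
begin
   --  Elements of the difference, as an ascending UTF-8 string
   Put_Line (To_Sequence (Consonants));
   Put_Line (Natural'Image (Cardinality (Consonants)));
   --  Membership and subset tests
   Put_Line (Boolean'Image (Is_In (Character'('e'), Vowels)));
   Put_Line (Boolean'Image (Vowels <= Letters));
   --  Removing code points of a set from both ends of a string
   Put_Line (Trim ("uoHELLOae", Vowels));
end Set_Demo;
```

All sets here are small and built from ranges and strings, so no indicator function is involved and the question of flattening does not arise.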

7.9.2. Maps

The type Unicode_Mapping represents a mapping of the set of Unicode code points to itself.

type Unicode_Mapping is private;

The following type defines an access to a mapping function:

type Unicode_Mapping_Function is
   access function (Value : UTF8_Code_Point)
      return UTF8_Code_Point;

It has the following operations defined on Unicode_Mapping:

function Is_Prefix
         (  Prefix : String;
            Source : String;
            Map    : Unicode_Mapping
         )  return Boolean;

This function returns true if Prefix is a prefix of Source with respect to the mapping represented by Map. An empty string is a prefix of any string. Data_Error is propagated when Prefix or Source are not properly encoded UTF-8 strings.

function Is_Prefix
         (  Prefix  : String;
            Source  : String;
            Pointer : Integer;
            Map     : Unicode_Mapping
         )  return Boolean;

This function returns true if Prefix is a prefix of Source (Pointer..Source'Last) with respect to the mapping represented by Map. An empty string is a prefix of any substring. The result is false if Pointer is not in the range Source'First..Source'Last + 1. Data_Error is propagated when Prefix or Source is not a properly encoded UTF-8 string.

function Value
         (  Map     : Unicode_Mapping;
            Element : Character
         )  return UTF8_Code_Point;
function Value
         (  Map     : Unicode_Mapping;
            Element : Wide_Character
         )  return UTF8_Code_Point;
function Value
         (  Map     : Unicode_Mapping;
            Element : UTF8_Code_Point
         )  return UTF8_Code_Point;

These functions return the code point corresponding to the parameter Element in the mapping Map. The parameter can be a code point, Latin-1 or wide character.

function To_Domain (Map : Unicode_Mapping) return String;

This function returns a UTF-8 string of ascending code points x such that Value (Map, x) /= x. Constraint_Error is propagated when the result is too large to be represented as a string.

function To_Mapping (From, To : String) return Unicode_Mapping;

This function creates a new mapping. The parameters are UTF-8 encoded strings. For the nth code point of From the resulting mapping yields the nth code point of To. For all other code points the mapping acts as the identity mapping. Translation_Error is propagated when From contains repeated code points or when the numbers of code points in From and To differ. Data_Error is propagated when From or To is an invalid UTF-8 string.

function To_Mapping (Map : Unicode_Mapping_Function)
   return Unicode_Mapping;

This function creates a new mapping from a function Map. The result is identity mapping when Map is null.

function To_Range (Map : Unicode_Mapping) return String;

The result is a UTF-8 string of code points x such that the original of x is not x, i.e. x such that Value (Map, y) = x and y /= x. The code points in the result are ordered by y, i.e. x1 precedes x2 iff y1 < y2, where Value (Map, y1) = x1 and Value (Map, y2) = x2. Constraint_Error is propagated when the result is too large to be represented as a string.

Identity : constant Unicode_Mapping;

This mapping maps each code point to itself.
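A minimal sketch of creating and querying a mapping is shown below. The procedure name Map_Demo and the mapping Cipher are hypothetical. It is assumed that UTF8_Code_Point is declared in Strings_Edit.UTF8, and that Lower_Case_Map comes from the package Strings_Edit.UTF8.Maps.Constants described in the next section; using it for a case-insensitive prefix test is an assumption about typical use, not something the text above prescribes.

```ada
with Ada.Text_IO;                       use Ada.Text_IO;
with Strings_Edit.UTF8;                 use Strings_Edit.UTF8;
with Strings_Edit.UTF8.Maps;            use Strings_Edit.UTF8.Maps;
with Strings_Edit.UTF8.Maps.Constants;  use Strings_Edit.UTF8.Maps.Constants;

procedure Map_Demo is
   --  Maps a to x, b to y, c to z; identity for all other code points
   Cipher : constant Unicode_Mapping := To_Mapping ("abc", "xyz");
begin
   Put_Line (To_Domain (Cipher));   -- Code points changed by the mapping
   Put_Line (To_Range  (Cipher));   -- Their images
   --  A character literal must be qualified, since Value is declared
   --  for both Character and Wide_Character
   Put_Line (Boolean'Image (Value (Cipher, Character'('a')) = 16#78#));
   Put_Line (Boolean'Image (Value (Cipher, Character'('q')) = 16#71#));
   --  Prefix test with respect to a case mapping
   Put_Line
   (  Boolean'Image
      (  Is_Prefix ("CONTENT", "Content-Type", Lower_Case_Map)
   )  );
end Map_Demo;
```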

7.9.3. Constants

The package Strings_Edit.UTF8.Maps.Constants defines some commonly used sets:

Alphanumeric_Set      : constant Unicode_Set;
Control_Set           : constant Unicode_Set;
Digit_Set             : constant Unicode_Set;
Identifier_Extend_Set : constant Unicode_Set;
Identifier_Start_Set  : constant Unicode_Set;
ISO_646_Set           : constant Unicode_Set;
Letter_Set            : constant Unicode_Set;
Lower_Set             : constant Unicode_Set;
Other_Format_Set      : constant Unicode_Set;
Space_Set             : constant Unicode_Set;
Subscript_Digit_Set   : constant Unicode_Set;
Superscript_Digit_Set : constant Unicode_Set;
Title_Set             : constant Unicode_Set;
Upper_Set             : constant Unicode_Set;

See Strings_Edit.UTF8.Categorization for information about the code points contained by the sets. The package defines the following maps:

Lower_Case_Map : constant Unicode_Mapping;
Upper_Case_Map : constant Unicode_Mapping;


[Back][TOC][Next]

8. Fields

The package Strings_Edit.Fields can be used to write new Put-procedures when the output size cannot be easily estimated. It contains two subprograms, Get_Output_Field and Adjust_Output_Field. Get_Output_Field is used to calculate the available space in the output string. It raises Layout_Error as necessary. The program can then output into that space and call Adjust_Output_Field to move the output within the output field, fill the rest of the field and advance the string pointer. The following code fragment shows how this can be done:

procedure Put
          (  Destination : in out String;
             Pointer     : in out Integer;
             Value       : Something;
             Field       : Natural   := 0;
             Justify     : Alignment := Left;
             Fill        : Character := ' '
          )  is
   Out_Field : constant Natural :=
      Get_Output_Field (Destination, Pointer, Field);
   subtype Output is String (Pointer..Pointer + Out_Field - 1);
   Text : Output renames
      Destination (Pointer..Pointer + Out_Field - 1);
   Index : Integer := Pointer;
begin
   --
   -- The output for Value is done in Text using Index as the pointer
   --

   Adjust_Output_Field
   (  Destination,
      Pointer,
      Index,
      Out_Field,
      Field,
      Justify,
      Fill
   );
end Put;


[Back][TOC][Next]

9. Generic axis scales

When an axis of a plotted curve needs to be annotated with values, it is desirable that the ticks supplied with values have "good" figures, like 0.5 or 0.1 etc. The generic package Strings_Edit.Generic_Scale can be used to ease the implementation of such plotters:

generic
   type Value is digits <>;
package Strings_Edit.Generic_Scale is ...

Its formal parameter is the type of the axis values. The package provides the type Scale:

type Scale is record
   Minor     : Value'Base;
   Low_Value : Value;
   Low_Tick  : Natural;
   Ticks     : Natural;
   Small     : Integer;
end record;

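which describes the axis appearance. The fields Minor, Ticks and Small are explained with the function Create below; the use of Low_Value and Low_Tick is shown in the plotting example that follows it.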

The function Create evaluates Scale for the given range of values:

function Create (Low, High : Value; Count : Natural) return Scale;

The parameters Low and High determine the interval of values. When Low >= High, Constraint_Error is propagated. The parameter Count is the desired number of major ticks on the scale. Typically it is determined as the scale length in plot units divided by the optimal major tick length in the same units. (Note that the outcome of the function is independent of the plot units used.) The resulting number of major ticks is greater than or equal to Count. When Count is 0, it is treated as if it were 1. The major tick length is selected to be n·10^k, where n = 1, 2 or 5. Here k determines the field Small of the result. Thus when, for example, n is 2 and k is -3, the values corresponding to the major ticks would be like 0.102, 0.104, 0.106 etc., i.e. the major tick step is 2·10^-3. The number m of minor ticks depends on n: m = 1 when n = 1 or n = 2, and m = 4 when n = 5. The field Ticks of the result is m. The field Minor of the result is n·10^k/m.

A typical axis plot using Scale might look as follows:

   Ticks : Scale   := Create (Low, High, Size / Major_Tick_Size);
   Minor : Natural := Ticks.Low_Tick;
   Value : Number  := Ticks.Low_Value;
begin
   while position of Value in the plot range [in plot units] loop
      if Minor = 0 or else Minor > Ticks.Ticks then
         -- Major tick
         draw major tick at the position of Value [in plot units]
         draw its value Image (Value, AbsSmall => Ticks.Small);
         Minor := 1;
      else
         -- Minor tick
         draw minor tick at the position of Value [in plot units]
         Minor := Minor + 1;
      end if;
      Value := Value + Ticks.Minor;
   end loop;

Note that the field Small specifies the absolute precision. Therefore, very narrow ranges of large absolute values would probably require a shift to avoid tick values like 956.611, 956.612, 956.613,... (with Small=-3). Such cases can be detected as log10(Low) >> Small. The difference between these two values indicates how many decimal places would appear before the last one, corresponding to the major tick "heartbeat."
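To inspect what Create returns for a concrete interval, the package can be instantiated and the fields of the result printed, as in the sketch below. The names Scale_Demo, Float_Scale and Axis are hypothetical, and the interval and tick count are arbitrary; the instantiation is positional so that it does not depend on the name of the generic formal type.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Strings_Edit.Generic_Scale;

procedure Scale_Demo is
   package Float_Scale is new Strings_Edit.Generic_Scale (Float);
   use Float_Scale;
   --  Ask for about five major ticks on the interval 0.1..0.11
   Axis : constant Scale := Create (Low => 0.1, High => 0.11, Count => 5);
begin
   Put_Line ("Minor     :" & Float'Image   (Axis.Minor));
   Put_Line ("Low_Value :" & Float'Image   (Axis.Low_Value));
   Put_Line ("Low_Tick  :" & Natural'Image (Axis.Low_Tick));
   Put_Line ("Ticks     :" & Natural'Image (Axis.Ticks));
   Put_Line ("Small     :" & Integer'Image (Axis.Small));
end Scale_Demo;
```

Printing the fields for a few intervals is a quick way to see how the major step n·10^k and the number of minor ticks change with Count before wiring the scale into a plotting loop.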


[Back][TOC][Next]

10. String streams

The package Strings_Edit.Streams provides an implementation of streams to read from and write to strings. The package declares the type String_Stream:

type String_Stream (Length : Natural) is
   new Root_Stream_Type with
record
   Position : Positive := 1;
   Data     : String (1..Length);
end record;

The field Position is the position at which the string is to be read or written. The field Data is the string backing the stream. When written, stream elements are placed into Data starting from Position. When read, they are taken from Data at Position. In both cases Position is advanced. The implementation of Write propagates End_Error when there is no room for output. Note that initially Position is 1, which means that the stream is ready to be written, but its contents are garbage. The stream is used as follows:

function Get (Stream : String_Stream) return String;

This function returns the written contents of Stream. It is used together with the attributes T'Write and T'Output. First the stream is written, then Get is called to obtain its contents.

function Get_Size (Stream : String_Stream)
   return Stream_Element_Count;

This function returns the number of stream elements available to write or to read.

procedure Rewind (Stream : in out String_Stream);

This procedure sets Stream Position to 1. This operation undoes read and write operations done before.

procedure Set (Stream : in out String_Stream; Content : String);

This procedure sets Stream to contain Content. The next read operation will yield the first character of Content. Set is the inverse of the attributes T'Read and T'Input, with which it should be used. First the buffer contents are set using this procedure, then the stream is read out. Constraint_Error is propagated when Stream.Length < Content'Length.

Note, this implementation requires that Stream_Element'Size be a multiple of Character'Size and the latter be a multiple of Storage_Element'Size.
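The write-then-Get and Set-then-read patterns described above can be sketched as a round trip of one value. The procedure name Stream_Demo, the buffer length 256 and the transferred value are arbitrary choices made for illustration.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Strings_Edit.Streams;  use Strings_Edit.Streams;

procedure Stream_Demo is
   Output : aliased String_Stream (256);   -- Written with 'Write
   Input  : aliased String_Stream (256);   -- Read with 'Read
   Result : Integer;
begin
   --  Write a value into the stream, starting at Position 1
   Integer'Write (Output'Access, 12345);
   --  Get yields what was written; Set makes it readable from Input
   Set (Input, Get (Output));
   Integer'Read (Input'Access, Result);
   Put_Line (Integer'Image (Result));      -- Expected to be the value written
end Stream_Demo;
```

In a real application the string returned by Get would typically be stored or transmitted before being passed to Set on the receiving side.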

 


[Back][TOC][Next]

11. Lexicographical comparisons

The package Strings_Edit.Lexicographical_Order provides comparisons of strings using lexicographical order. The package provides the following types:

type Precedence is (Less, Equal, Greater);

and the following operations:

function Compare_Textually (Left, Right : String) return Precedence;

This function compares two strings as texts. If the strings contain chains of digits, each chain is logically replaced by a single symbol considered lexicographically greater than any non-numeric character. Thus the strings ab123 and ab44 are considered the same. The string abc precedes ab1. On the basis of this function Boolean-valued comparisons are defined:

function Textually_Equal (Left, Right : String) return Boolean;
function Textually_Less  (Left, Right : String) return Boolean;

Another operation:

function Compare_Lexicographically (Left, Right : String) return Precedence;

This function compares two strings lexicographically. Chains of digits are compared numerically, as decimal numbers. Thus the string ab44 precedes ab0123. On the basis of this function Boolean-valued comparisons are defined:

function Lexicographically_Equal (Left, Right : String) return Boolean;
function Lexicographically_Less  (Left, Right : String) return Boolean;
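
The numeric treatment of digit chains can be sketched in a self-contained program. Again this is an illustration and not the library's code: the names Numeric_Chain_Sketch, Chain_End, Compare_Chains and Sketch_Compare are hypothetical. The sketch assumes that leading zeros are ignored and that a digit met against a non-digit is compared by character code; the description above does not settle either point.

```ada
with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Text_IO;             use Ada.Text_IO;

procedure Numeric_Chain_Sketch is
   type Precedence is (Less, Equal, Greater);

   --  Index of the last digit of the chain that starts at From
   function Chain_End (Text : String; From : Positive) return Positive is
      Last : Positive := From;
   begin
      while Last < Text'Last and then Is_Digit (Text (Last + 1)) loop
         Last := Last + 1;
      end loop;
      return Last;
   end Chain_End;

   --  Compares two chains as decimal numbers without converting them
   function Compare_Chains (Left, Right : String) return Precedence is
      I : Integer := Left'First;
      J : Integer := Right'First;
   begin
      --  Skip leading zeros, keeping at least one digit
      while I < Left'Last and then Left (I) = '0' loop
         I := I + 1;
      end loop;
      while J < Right'Last and then Right (J) = '0' loop
         J := J + 1;
      end loop;
      declare
         L : constant String := Left (I..Left'Last);
         R : constant String := Right (J..Right'Last);
      begin
         if L'Length < R'Length then     --  Fewer digits, smaller number
            return Less;
         elsif L'Length > R'Length then
            return Greater;
         elsif L < R then                --  Same length: digit by digit
            return Less;
         elsif L > R then
            return Greater;
         else
            return Equal;
         end if;
      end;
   end Compare_Chains;

   function Sketch_Compare (Left, Right : String) return Precedence is
      I : Integer := Left'First;
      J : Integer := Right'First;
   begin
      while I <= Left'Last and then J <= Right'Last loop
         if Is_Digit (Left (I)) and then Is_Digit (Right (J)) then
            declare
               I_End  : constant Positive := Chain_End (Left, I);
               J_End  : constant Positive := Chain_End (Right, J);
               Result : constant Precedence :=
                  Compare_Chains (Left (I..I_End), Right (J..J_End));
            begin
               if Result /= Equal then
                  return Result;
               end if;
               I := I_End + 1;
               J := J_End + 1;
            end;
         elsif Left (I) < Right (J) then
            return Less;
         elsif Left (I) > Right (J) then
            return Greater;
         else
            I := I + 1;
            J := J + 1;
         end if;
      end loop;
      if I <= Left'Last then
         return Greater;  --  Left has symbols remaining
      elsif J <= Right'Last then
         return Less;     --  Right has symbols remaining
      else
         return Equal;
      end if;
   end Sketch_Compare;
begin
   pragma Assert (Sketch_Compare ("ab44", "ab0123") = Less);
   Put_Line (Precedence'Image (Sketch_Compare ("ab44", "ab0123")));  -- LESS
end Numeric_Chain_Sketch;
```

Comparing the chains as strings, by length first and then digit by digit, avoids overflow on chains too long to fit into an integer type.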


[Back][TOC][Next]

12. Packages

Package                     Provides
Strings_Edit                The basic string I/O
   Fields                   Tools for writing new Put-procedures
   Float_Edit               Generic I/O of floating-point numbers
   Floats                   I/O of standard Float (instantiation of Float_Edit)
   Generic_Scale            Generic scales for I/O of plot axes
   Integer_Edit             Generic I/O of integer numbers
   Integers                 I/O of standard Integer (instantiation of Integer_Edit)
      Subscript             I/O of standard Integer using UTF-8 subscript characters
      Superscript           I/O of standard Integer using UTF-8 superscript characters
   Lexicographical_Order    Lexicographical comparisons of strings
   Quoted                   I/O of strings put in Ada-style quotes
   Roman_Edit               I/O of roman numbers
   Streams                  Stream I/O to and from strings
   UTF8                     The base UTF-8 package. UTF-8 string length, skipping UTF-8 encoded characters
      Blocks                Ranges of code points defined by the Unicode standard
      Categorization        Unicode categorization
      Handling              Conversions of UTF-8 encoded strings to and from standard Ada strings
      Integer_Edit          Generic I/O of integer numbers using UTF-8 characters different from standard ASCII digits
      Maps                  Maps and Sets of Unicode code points
         Constants          Set constants for some commonly used sets
      Mapping               Unicode case mapping
      Subscript             Dealing with UTF-8 subscript characters
         Integer_Edit       Generic I/O of integer numbers using UTF-8 subscript characters
      Superscript           Dealing with UTF-8 superscript characters
         Integer_Edit       Generic I/O of integer numbers using UTF-8 superscript characters
      Wildcards             Wildcard matching of UTF-8 encoded strings

[Back][TOC][Next]

13. Installation

The software does not require special installation. The archive's content can be put in a directory and used as-is. For users of the GNAT compiler the software provides gpr project files, which can be used in the GNAT Programming Studio (GPS).

To ease use of the software with GPS, it can be integrated into the GPS using the GPS Library Installer (gps_installer). Start the gps_installer as root (or with the corresponding administrative rights to the GNAT installation directory) specifying the source directory as the argument. Follow the instructions.

Project files   Provides               Use in custom project
strings_edit    Strings edit for Ada   with "strings_edit.gpr";

For Fedora and Debian Linux and their derivatives, packages are provided; see Debian and Fedora packages for the corresponding architectures.

13.1. Fedora packages repository

The Fedora packages of this library are located in a yum software package manager repository. They can be searched, installed and updated automatically using yum. In order to do so, the file dmitry-kazakov.repo can be put into the directory /etc/yum.repos.d.

13.2. Debian packages repository

In order to use the apt Debian repository for automatic installation and update of these packages, add the following line to /etc/apt/sources.list:

deb http://dmitry-kazakov.de/distributions sid main

[Back][TOC][Next]

14. Changes log

Changes to the version 2.7:

Changes to the version 2.6:

Changes to the version 2.5:

Changes to the version 2.4:

Changes to the version 2.3:

Changes to the version 2.2:

Changes to the version 2.1:

Changes to the version 2.0:

Changes to the version 1.9:

Changes to the version 1.8:

Changes to the version 1.7:

Changes to the version 1.6:

Changes to the version 1.5:

Changes to the version 1.4:

Changes to the version 1.3:

Changes to the version 1.2:

Changes to the version 1.1:

Changes to the version 1.0:


[Back][TOC]

15. Table of Contents

1. Input from String
    1.1. Get procedures
    1.2. Value functions
2. Output into String
    2.1. Put procedures
    2.2. Image functions
3. String I/O
    3.1. Quoted strings
4. Roman I/O
5. Integer I/O
6. Floating-point I/O
7. UTF-8
    7.1. Handling UTF-8 strings
    7.2. Generic integer I/O of UTF-8 strings
    7.3. Subscript UTF-8 integer I/O
    7.4. Superscript UTF-8 integer I/O
    7.5. Wildcard-matching of UTF-8 strings
    7.6. Case mapping
    7.7. Unicode categorization
    7.8. Blocks
    7.9. Sets and maps
       7.9.1. Sets
       7.9.2. Maps
       7.9.3. Constants
8. Fields
9. Generic axis scales
10. String streams
11. Lexicographical comparisons
12. Packages
13. Installation
    13.1. Fedora packages repository
    13.2. Debian packages repository
14. Changes log
15. Table of contents