The Standard ML Basis Library


The StringCvt structure

The StringCvt structure provides types and functions for handling the conversion between strings and values of various basic types.


Synopsis

signature STRING_CVT
structure StringCvt : STRING_CVT

Interface

datatype radix = BIN | OCT | DEC | HEX
datatype realfmt
  = SCI of int option
  | FIX of int option
  | GEN of int option
  | EXACT
type ('a, 'b) reader = 'b -> ('a * 'b) option
val padLeft : char -> int -> string -> string
val padRight : char -> int -> string -> string
val splitl : (char -> bool) -> (char, 'a) reader -> 'a -> (string * 'a)
val takel : (char -> bool) -> (char, 'a) reader -> 'a -> string
val dropl : (char -> bool) -> (char, 'a) reader -> 'a -> 'a
val skipWS : (char, 'a) reader -> 'a -> 'a
type cs
val scanString : ((char, cs) reader -> ('a, cs) reader) -> string -> 'a option

Description

datatype radix
The values of type radix are used to specify the radix of a representation of an integer, corresponding to the bases 2, 8, 10 and 16, respectively.
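As a small illustration, the sketch below formats one value in all four bases. It uses Int.fmt and Int.scan, which come from the INTEGER signature rather than from this structure, and it assumes an implementation that follows the published Basis signatures.

```sml
(* Format the same integer in each of the four radixes. *)
val radixes = [StringCvt.BIN, StringCvt.OCT, StringCvt.DEC, StringCvt.HEX]
val shown = List.map (fn r => Int.fmt r 255) radixes
(* shown = ["11111111", "377", "255", "FF"] *)

(* Int.scan with the matching radix reads each string back. *)
val readBack =
  ListPair.map (fn (r, s) => StringCvt.scanString (Int.scan r) s) (radixes, shown)
(* readBack = [SOME 255, SOME 255, SOME 255, SOME 255] *)
```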

datatype realfmt
Values of type realfmt are used to specify the format of a real or floating-point number. The first two constructors, SCI and FIX, correspond to scientific and fixed-point representations, respectively. The optional integer value specifies the number of decimal digits to appear after the decimal point, with 6 being the default. In particular, if 0 is specified, there should be no fractional part.

The third constructor GEN allows a formatting function to use either the scientific or fixed-point notation, typically guided by the magnitude of the number. The optional integer value specifies the maximum number of significant digits, with 12 being the default.

The fourth constructor, EXACT, specifies that the string should represent the real using an exact decimal representation. The string contains enough information to reconstruct a semantically equivalent real value using REAL.fromDecimal o valOf o IEEEReal.fromString. Refer to the description of IEEEReal.toString for more precise information about this format.
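The main consumer of realfmt values is Real.fmt from the REAL signature. The sketch below, assuming a conforming Basis implementation, shows the FIX defaults described above; the SCI and GEN lines are included only to show the constructor syntax, and their exact output is not asserted here.

```sml
(* Real.fmt takes a StringCvt.realfmt describing the layout. *)
val twoPlaces = Real.fmt (StringCvt.FIX (SOME 2)) 3.14159   (* "3.14" *)
val sixPlaces = Real.fmt (StringCvt.FIX NONE) 1.5           (* "1.500000": 6 digits by default *)

(* Constructor syntax for the other formats. *)
val scientific = Real.fmt (StringCvt.SCI (SOME 3)) 1234.5
val general = Real.fmt (StringCvt.GEN (SOME 4)) 0.000123
```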

type ('a, 'b) reader
The type of a reader that produces values of type 'a from a stream of type 'b. A return value of SOME(a,b) corresponds to a value a scanned from the stream, plus the remainder b of the stream. A return value of NONE indicates that no value of the correct type could be scanned from the stream.

The reader type is designed for use with a stream or functional view of I/O. Scanning functions using the reader type, such as skipWS, splitl and Int.scan, will often use lookahead characters to determine when to stop scanning. If the character source ('b in an ('a,'b) reader) is imperative, the lookahead characters will be lost to any subsequent scanning of the source. One mechanism for combining imperative I/O with the standard scanning functions is provided by the TextIO.scanStream function.
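To make the reader type concrete, here is a minimal sketch of a character reader written by hand. The function stringReader is hypothetical (it is not part of the Basis); it treats an integer index into a string as the stream. Because the stream is an ordinary value, applying the reader to the same position twice yields the same result, which is what makes lookahead safe in the functional model.

```sml
(* Hypothetical character reader: the stream is an index into s. *)
fun stringReader (s : string) : (char, int) StringCvt.reader =
  fn i => if i < size s then SOME (String.sub (s, i), i + 1) else NONE

(* Any scanning function can consume it, e.g. Int.scan. *)
val rdr = stringReader "42 apples"
val result = Int.scan StringCvt.DEC rdr 0
(* result = SOME (42, 2): the value plus the index of the unread remainder *)

(* Re-reading a position consumes nothing: both calls return the same pair. *)
val first = rdr 0
val again = rdr 0
```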

padLeft c i s
padRight c i s
These functions return s padded on the left and on the right, respectively, with i - size s copies of the character c. If size s >= i, they return s unchanged. In other words, these functions right- and left-justify s in a field i characters wide, never trimming off any part of s; in particular, if i <= 0, s is returned. These functions raise Size if the size of the resulting string would be greater than String.maxSize.
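A short sketch of both padding functions. The helper row is hypothetical and only illustrates a typical use: building fixed-width table columns by left-justifying a label and right-justifying a number.

```sml
val zeroPadded = StringCvt.padLeft #"0" 5 "42"     (* "00042" *)
val leftAligned = StringCvt.padRight #"." 6 "ab"   (* "ab...." *)
val untouched = StringCvt.padLeft #" " 2 "abcd"    (* "abcd": never truncated *)

(* Hypothetical helper: an 8-wide label column followed by a 5-wide number column. *)
fun row (name, n) =
  StringCvt.padRight #" " 8 name ^ StringCvt.padLeft #" " 5 (Int.toString n)
```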

splitl p f src
returns (pref, src') where pref is the longest prefix (left substring) of src, as produced from src by the character reader f, all of whose characters satisfy p, and src' is the remainder of src. Thus, the first character retrievable from src' is the leftmost character not satisfying p.

splitl can be used with scanning functions such as scanString by composing it with SOME; e.g., scanString (fn rdr => SOME o ((splitl p) rdr)).
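The sketch below applies splitl to a substring reader and then uses the scanString pattern from the text. It assumes Substring.full (the Basis function that turns a whole string into a substring).

```sml
(* Split off the leading run of digits; rest is the unread substring. *)
val (digits, rest) =
  StringCvt.splitl Char.isDigit Substring.getc (Substring.full "123abc")
(* digits = "123"; Substring.string rest = "abc" *)

(* The scanString pattern: wrap splitl's result in SOME. *)
val word =
  StringCvt.scanString (fn rdr => SOME o StringCvt.splitl Char.isAlpha rdr) "abc123"
(* word = SOME "abc" *)
```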

takel p f src
dropl p f src
These routines scan the source src for the first character not satisfying the predicate p. The function dropl drops the maximal prefix satisfying the predicate, returning the rest of the source, while takel returns the maximal prefix satisfying the predicate. These can be defined in terms of splitl:
          takel p f s = #1(splitl p f s)
          dropl p f s = #2(splitl p f s)
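Using List.getItem as the character reader over a char list, the sketch below checks that takel and dropl agree with the two halves of splitl, as the definitions above state.

```sml
val chars = explode "1997-10-04"
val year = StringCvt.takel Char.isDigit List.getItem chars          (* "1997" *)
val afterYear = StringCvt.dropl Char.isDigit List.getItem chars     (* explode "-10-04" *)

(* The same split computed with splitl. *)
val (pref, remaining) = StringCvt.splitl Char.isDigit List.getItem chars
```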
          


skipWS f s
strips whitespace characters from a stream s using the reader f. It returns the remaining stream. A whitespace character is one that satisfies the predicate Char.isSpace. Equivalent to dropl Char.isSpace.
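A brief sketch with a substring source, showing that only leading whitespace is removed and that the result matches dropl Char.isSpace.

```sml
val src = Substring.full " \t\n hello "
val trimmed = Substring.string (StringCvt.skipWS Substring.getc src)
(* trimmed = "hello ": trailing whitespace is untouched *)
val viaDropl = Substring.string (StringCvt.dropl Char.isSpace Substring.getc src)
```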

type cs
is an abstract character stream used by scanString. A value of this type represents the state of a character stream. The concrete type is left unspecified to allow implementations a choice of representations. Typically, cs will be an integer index into a string.

scanString f s
The function scanString provides a general framework for converting a string into some value. The user supplies a scanning function f and a string s. scanString converts the string into a character source (type cs) and applies the scanning function. A scanning function converts a reader of characters into a reader of values of the desired type. Typical scanning functions are Bool.scan and Date.scan.
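A few calls to scanString with standard scanning functions. Note that characters left over after a successful scan are simply ignored, and that Int.scan skips leading whitespace.

```sml
val b = StringCvt.scanString Bool.scan "true"                           (* SOME true *)
val h = StringCvt.scanString (Int.scan StringCvt.HEX) "ff"              (* SOME 255 *)
val d = StringCvt.scanString (Int.scan StringCvt.DEC) "  42 and more"   (* SOME 42 *)
val bad = StringCvt.scanString (Int.scan StringCvt.DEC) "forty-two"     (* NONE *)
```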


Discussion

The basis library emphasizes a functional view for scanning values from text. This provides a natural and elegant way to write simple scanners and parsers, especially as these typically involve some form of reading ahead and backtracking. The model involves two types of components: ways to produce character readers and functions to convert character readers into value readers. For the latter, most types T have a corresponding scanning function of type

(char, 'a) reader -> (T, 'a) reader
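A user-defined type can follow the same convention. The scanner scanPair below is hypothetical: it reads pairs written as "<int>,<int>". Because it is polymorphic in the stream type, the same function works with scanString and with List.getItem, in the spirit of the discussion here.

```sml
(* Hypothetical scanner of shape (char, 'a) reader -> (int * int, 'a) reader. *)
fun scanPair (getc : (char, 'a) StringCvt.reader) : (int * int, 'a) StringCvt.reader =
  fn s0 =>
    case Int.scan StringCvt.DEC getc s0 of
      NONE => NONE
    | SOME (x, s1) =>
        (* allow whitespace before the comma, then require the comma itself *)
        (case getc (StringCvt.skipWS getc s1) of
           SOME (#",", s2) =>
             (case Int.scan StringCvt.DEC getc s2 of
                NONE => NONE
              | SOME (y, s3) => SOME ((x, y), s3))
         | _ => NONE)

(* A fromString-style function for pairs. *)
val pairFromString = StringCvt.scanString scanPair

(* The same scanner over a char list keeps the unread tail. *)
val p2 = scanPair List.getItem (explode "10,20 tail")
```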
Character readers are provided for the common sources of characters, either explicitly, such as the SUBSTRING.getc and STREAM_IO.input1 functions, or implicitly, such as the TEXT_IO.scanStream function. As an example, suppose we expect to read a decimal integer followed by a date from TextIO.stdIn. This could be handled by the following code:
Example:
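  let
    val scanInt = Int.scan StringCvt.DEC TextIO.StreamIO.input1
    val scanDate = Date.scan TextIO.StreamIO.input1
  in
    case scanInt (TextIO.getInstream TextIO.stdIn) of
      NONE => raise Fail "expected an integer"   (* placeholder error handling *)
    | SOME (intVal, ins') =>
        (case scanDate ins' of
           NONE => raise Fail "expected a date"
         | SOME (dateVal, ins'') =>
             (* give the unread input back to TextIO.stdIn, then continue *)
             (TextIO.setInstream (TextIO.stdIn, ins''); (intVal, dateVal)))
  end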
In this example, we used the underlying stream I/O component of TextIO.stdIn, which is cleaner and more efficient. If, at some later point, we wish to return to the imperative model and do input directly using TextIO.stdIn, we need to reset it with the current stream I/O value using TextIO.setInstream. Alternatively, we could rewrite the code using imperative I/O:
Example:
  case TextIO.scanStream (Int.scan StringCvt.DEC) TextIO.stdIn of
    NONE => raise Fail "expected an integer"   (* placeholder error handling *)
  | SOME intVal =>
      (case TextIO.scanStream Date.scan TextIO.stdIn of
         NONE => raise Fail "expected a date"
       | SOME dateVal => (intVal, dateVal))

The scanString function was designed specifically to be combined with the scan function of some type T, producing a function val fromString : string -> T option for the type. For this reason, scanString returns only the scanned value, with no indication of where scanning stopped in the string. Users who want both the scanned value and the unscanned portion of the string should convert the string into a substring and combine scanning functions with Substring.getc, e.g., Bool.scan Substring.getc.
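Both techniques side by side, as a sketch: scanString yields a fromString-style function that returns only the value, while scanning a substring (created here with Substring.full) returns the value together with the unscanned remainder.

```sml
(* fromString-style function: only the value comes back. *)
val boolFromString : string -> bool option = StringCvt.scanString Bool.scan

(* Scanning a substring keeps the unscanned remainder. *)
val scanned = Bool.scan Substring.getc (Substring.full "true enough")
val remainder =
  case scanned of
    SOME (_, rest) => SOME (Substring.string rest)
  | NONE => NONE
(* remainder = SOME " enough" *)
```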

When the input source is a list of characters, scanning values can be accomplished by applying the appropriate scan function to the function List.getItem. Thus, Bool.scan List.getItem has the type (bool, char list) reader, which will scan a boolean value and return that value and the remainder of the list.
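The list-based reader from the paragraph above, written out with an explicit type annotation:

```sml
val scanBoolList : (bool, char list) StringCvt.reader = Bool.scan List.getItem
val result = scanBoolList (explode "false!")
(* result = SOME (false, [#"!"]) *)
```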

See Also

String, Char


Last Modified October 4, 1997
Comments to John Reppy.
Copyright © 1997 Bell Labs, Lucent Technologies