Title: | Simple Tools for Lexing/Parsing Text Data |
---|---|
Description: | Simple tools for lexing/parsing text data. |
Authors: | mikefc |
Maintainer: | mikefc |
License: | MIT + file LICENSE |
Version: | 0.2.7 |
Built: | 2024-11-15 04:36:30 UTC |
Source: | https://github.com/coolbutuseless/flexo |
Create a stream of tokens. This is very similar to the R6 class TokenStream, but it has no dependencies.
create_stream(named_values)
named_values | named vector containing the tokens. Usually the output from lex(). |
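The manual does not show how create_stream() is built, only that it avoids dependencies. One common dependency-free pattern in base R is a list of closures that share a mutable position. The sketch below illustrates that pattern under stated assumptions: make_stream() is a hypothetical name, and treating "position = number of tokens + 1" as the end of the stream is an assumption, not flexo's documented convention.

```r
# Hypothetical dependency-free token stream built from closures.
# Illustrates the pattern only; this is not flexo's create_stream().
make_stream <- function(named_values) {
  stopifnot(!is.null(names(named_values)))
  position <- 1L  # shared, mutable state captured by every closure below

  list(
    position = function() position,
    reset = function(pos = 1L) {
      position <<- as.integer(pos)
      invisible(NULL)
    },
    # Assumption: position length + 1 means "just past the last token"
    end_of_stream = function() position > length(named_values),
    advance = function(n = 1L) {
      new_pos <- position + n
      if (new_pos < 1L || new_pos > length(named_values) + 1L) {
        stop("advance() would leave the stream")
      }
      position <<- as.integer(new_pos)
      invisible(NULL)
    },
    read = function(n = 1L) {
      if (position + n - 1L > length(named_values)) stop("read() out of range")
      named_values[seq(position, length.out = n)]
    }
  )
}

s <- make_stream(c(word = "hello", whitespace = " ", number = "42"))
s$read(2)          # named values: word "hello", whitespace " "
s$advance(3)
s$end_of_stream()  # TRUE
```

The `<<-` operator is what lets each closure update the shared position without any class system.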
Break a string into labelled tokens based upon a set of patterns.
lex(text, regexes, verbose = FALSE, ...)
text | a single character string |
regexes | a named vector of regex strings. Each string represents a regex to match a token, and the name of the string is the label for the token. Each regex can contain an explicit captured group using the standard () brackets. |
verbose | print more information about the matching process. Default: FALSE |
... | further arguments passed to |
Returns a named character vector in which each name is the token type and each value is the element extracted by the corresponding regular expression.
lex("hello there 123.45", regexes=c(number=re$number, word="(\\w+)", whitespace="(\\s+)"))
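To make the lexing step concrete, here is a minimal base-R sketch of the same idea. It is a stand-in, not flexo's implementation: lex_sketch() is a hypothetical name, and it assumes that patterns are tried in the order given at the current position, that the first non-empty match wins, that the whole match (rather than a captured group) is kept, and that unmatched text is an error.

```r
# Minimal base-R lexer sketch (hypothetical; not flexo's implementation).
# Assumptions: patterns are tried in the order given at the current position,
# the first non-empty match wins, and the whole match is kept.
lex_sketch <- function(text, regexes) {
  stopifnot(is.character(text), length(text) == 1L, !is.null(names(regexes)))
  values <- character(0)
  labels <- character(0)
  pos <- 1L
  while (pos <= nchar(text)) {
    rest <- substring(text, pos)
    matched <- FALSE
    for (label in names(regexes)) {
      # Anchor the pattern so it can only match at the start of `rest`
      m <- regexpr(paste0("^(?:", regexes[[label]], ")"), rest, perl = TRUE)
      len <- attr(m, "match.length")
      if (m == 1L && len > 0L) {
        values <- c(values, substr(rest, 1L, len))
        labels <- c(labels, label)
        pos <- pos + len
        matched <- TRUE
        break
      }
    }
    if (!matched) stop("no pattern matches at character ", pos)
  }
  names(values) <- labels
  values
}

tokens <- lex_sketch(
  "hello there 123.45",
  c(number = "[0-9]+(?:\\.[0-9]+)?", word = "\\w+", whitespace = "\\s+")
)
tokens
# word "hello", whitespace " ", word "there", whitespace " ", number "123.45"
```

Pattern order matters in this sketch: with word listed before number, `\\w+` would claim "123" and the lexer would then stop at the ".".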
Regexes to match common elements.
re
An object of class list of length 3.
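The manual does not list the three regexes stored in re, so the sketch below uses a hypothetical list, my_re, in the same spirit. It shows how one entry can be checked against whole strings and how a list of single strings can be flattened with unlist() into the named character vector that lex() expects for regexes. The number pattern here is an assumption for illustration, not re$number.

```r
# Hypothetical regex list in the spirit of `re` (its real contents are not
# shown in this manual).
my_re <- list(
  number     = "[+-]?(?:0|[1-9][0-9]*)(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?",
  word       = "[A-Za-z_][A-Za-z0-9_]*",
  whitespace = "\\s+"
)

# Does the whole string match the pattern?
full_match <- function(pattern, x) {
  grepl(paste0("^(?:", pattern, ")$"), x, perl = TRUE)
}

full_match(my_re$number, c("123.45", "-0.5", "1e10", "007", "abc"))
# TRUE TRUE TRUE FALSE FALSE

# Flatten the list into a named character vector of regexes
regexes <- unlist(my_re)
names(regexes)  # "number" "word" "whitespace"
```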
An R6 class for manipulating/interrogating a stream of tokens.
named_values | the original tokens |
position | current stream position |
new()
Initialise a stream.
TokenStream$new(named_values)
named_values | named vector of values |
reset()
Reset the stream to the given absolute position.
TokenStream$reset(position = 1L)
position | absolute position in the stream. Default: 1, i.e. the start |
assert_within_range()
Throw an error if a read is not within range.
TokenStream$assert_within_range(start, n)
start, n | start position and number of values to read |
check_within_range()
Check whether a read is within range.
TokenStream$check_within_range(start, n)
start, n | start position and number of values to read |
Returns logical TRUE if the values are within the range of the data.
check_name_seq()
Check that the next names match the specified name sequence.
TokenStream$check_name_seq(name_seq)
name_seq | expected sequence of names |
assert_name_seq()
Assert that the next names match the specified name sequence.
TokenStream$assert_name_seq(name_seq)
name_seq | expected sequence of names |
check_name()
Check that the next name is one of the specified valid names.
TokenStream$check_name(valid_names)
valid_names | valid names |
assert_name()
Assert that the next name is one of the specified valid names.
TokenStream$assert_name(valid_names)
valid_names | valid names |
check_value_seq()
Check that the next values match the specified value sequence.
TokenStream$check_value_seq(value_seq)
value_seq | expected sequence of values |
assert_value_seq()
Assert that the next values match the specified value sequence.
TokenStream$assert_value_seq(value_seq)
value_seq | expected sequence of values |
check_value()
Check that the next value is one of the specified valid values.
TokenStream$check_value(valid_values)
valid_values | valid values |
assert_value()
Assert that the next value is one of the specified valid values.
TokenStream$assert_value(valid_values)
valid_values | valid values |
advance()
Advance the stream.
TokenStream$advance(n)
n | number of tokens by which to advance the stream. May be negative. The new position must be within the range of the data. |
read()
Read n named values from the current position. Returns the values but does not advance the stream position.
TokenStream$read(n, offset = 0)
n | number of values to read |
offset | offset from the current position |
Returns the named values at this position.
read_names()
Read n names from the current position. Returns the names but does not advance the stream position.
TokenStream$read_names(n, offset = 0)
n | number of values to read |
offset | offset from the current position |
Returns the names at this position.
read_values()
Read n values from the current position. Returns the values but does not advance the stream position.
TokenStream$read_values(n, offset = 0)
n | number of values to read |
offset | offset from the current position |
Returns the values at this position.
consume()
Consume n tokens from the current position, i.e. read them and advance the stream. Returns the values and advances the stream position.
TokenStream$consume(n)
n | number of values to read |
Returns the values starting at this position.
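The read/consume distinction described above, together with the range and name-sequence checks, can be sketched with plain functions over a named vector. The helpers within_range(), read_tokens(), check_name_seq() and consume_tokens() are hypothetical, and the 1-based arithmetic (a read of n tokens from start covers start to start + n - 1) is an assumption about flexo's conventions rather than something this manual states.

```r
# Hypothetical helpers illustrating read vs. consume; not flexo's code.
# Assumption: a read of n tokens from `start` covers start .. start + n - 1.
within_range <- function(tokens, start, n) {
  n >= 0 && start >= 1 && start + n - 1 <= length(tokens)
}

# read: return named values and leave the position alone
read_tokens <- function(tokens, pos, n, offset = 0) {
  start <- pos + offset
  if (!within_range(tokens, start, n)) stop("read out of range")
  tokens[seq(start, length.out = n)]
}

# check_name_seq-style test: do the next names equal name_seq exactly?
check_name_seq <- function(tokens, pos, name_seq) {
  within_range(tokens, pos, length(name_seq)) &&
    identical(names(read_tokens(tokens, pos, length(name_seq))), name_seq)
}

# consume: read, then hand back the advanced position with the values
consume_tokens <- function(tokens, pos, n) {
  list(values = read_tokens(tokens, pos, n), pos = pos + n)
}

tokens <- c(word = "x", op = "=", number = "1")
read_tokens(tokens, pos = 1, n = 2)                # word "x", op "="
unname(read_tokens(tokens, 1, n = 1, offset = 2))  # "1"
check_name_seq(tokens, 1, c("word", "op"))         # TRUE
res <- consume_tokens(tokens, 1, 2)
res$pos                                            # 3
```

Returning the new position explicitly keeps the sketch free of hidden state; the R6 class instead stores it in its position field.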
end_of_stream()
Has the end of the stream been reached?
TokenStream$end_of_stream()
read_while()
Read tokens while some expression matches. Returns the values but does not advance the stream position.
TokenStream$read_while(name = NULL, value = NULL, combine = "or")
name, value | the boundary of the read. If both name and value are specified, then combine indicates how to logically combine them. |
combine | logical operator. Valid values: and, or |
consume_while()
Consume tokens while some expression matches. Returns the values and advances the stream position.
TokenStream$consume_while(name = NULL, value = NULL, combine = "or")
name, value | the boundary of the consumption. If both name and value are specified, then combine indicates how to logically combine them. |
combine | logical operator. Valid values: and, or |
read_until()
Read until some expression matches. Returns the values but does not advance the stream position.
TokenStream$read_until(name = NULL, value = NULL, combine = "or", inclusive = TRUE)
name, value | the boundary of the read. If both name and value are specified, then combine indicates how to logically combine them. |
combine | logical operator. Valid values: and, or |
inclusive | should the end-point be included in the returned results? Default: TRUE. If FALSE, then the end-point is not returned, and the stream position is set to *before* this end-point. |
consume_until()
Consume until some expression matches. Returns the values and advances the stream position.
TokenStream$consume_until(name = NULL, value = NULL, combine = "or", inclusive = TRUE)
name, value | the boundary of the consumption. If both name and value are specified, then combine indicates how to logically combine them. |
combine | logical operator. Valid values: and, or |
inclusive | should the end-point be included in the returned results? Default: TRUE. If FALSE, then the end-point is not returned, and the stream position is set to *before* this end-point. |
print()
Print the current state.
TokenStream$print(n = 5)
n | number of elements to print |
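The name, value, combine and inclusive arguments shared by read_while(), consume_while(), read_until() and consume_until() can be illustrated with a base-R sketch. The matching rule is an assumption (a token matches when its name is in name and/or its value is in value), as is raising an error when no end-point is found; token_matches(), read_while_sketch() and read_until_sketch() are hypothetical helpers, not flexo functions.

```r
# Hypothetical helpers sketching the *_while / *_until boundary logic.
# Assumption: a token "matches" if its name is in `name` and/or its value is
# in `value`; `combine` decides how the two tests are joined.
token_matches <- function(tokens, name = NULL, value = NULL, combine = "or") {
  combine <- match.arg(combine, c("or", "and"))
  if (is.null(name) && is.null(value)) stop("supply name and/or value")
  by_name  <- if (is.null(name))  NULL else names(tokens) %in% name
  by_value <- if (is.null(value)) NULL else unname(tokens) %in% value
  if (is.null(by_name))  return(by_value)
  if (is.null(by_value)) return(by_name)
  if (combine == "and") by_name & by_value else by_name | by_value
}

# read_while: take tokens from `pos` for as long as they match
read_while_sketch <- function(tokens, pos, ...) {
  m <- token_matches(tokens, ...)
  n <- 0L
  while (pos + n <= length(tokens) && m[pos + n]) n <- n + 1L
  tokens[seq(pos, length.out = n)]
}

# read_until: take tokens from `pos` up to the first match (the end-point)
read_until_sketch <- function(tokens, pos, ..., inclusive = TRUE) {
  m <- token_matches(tokens, ...)
  hits <- which(m & seq_along(tokens) >= pos)
  if (length(hits) == 0L) stop("no end-point found")  # assumption
  end <- if (inclusive) hits[1L] else hits[1L] - 1L
  tokens[seq(pos, length.out = end - pos + 1L)]
}

tokens <- c(word = "SELECT", ws = " ", word = "x", ws = " ", word = "FROM")
read_while_sketch(tokens, 1, name = c("word", "ws"))             # all 5 tokens
read_until_sketch(tokens, 1, value = "FROM", inclusive = FALSE)  # first 4 tokens
```

The consume_* variants would return the same values and additionally move the position past them.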
clone()
The objects of this class are cloneable with this method.
TokenStream$clone(deep = FALSE)
deep | whether to make a deep clone |
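R6 objects are environments, so assigning a TokenStream to a second variable does not copy it: both names share one position. clone() is the way to get an independent copy, for example to look ahead without moving the original stream. The base-R sketch below shows the underlying reference semantics with a plain environment; list2env(as.list(...)) is only a stand-in for a shallow copy and is not how R6 implements clone().

```r
# Base-R illustration of the reference semantics behind clone().
state <- new.env()
state$position <- 1L

alias <- state           # no copy: both names point at the same environment
alias$position <- 5L
state$position           # 5

# Shallow copy into a fresh environment (a stand-in for clone(deep = FALSE))
copy <- list2env(as.list(state), envir = new.env())
copy$position <- 10L
state$position           # still 5
```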