For full documentation, see flexdoc(1). This manual entry is intended for use as a quick reference.
NOTE: in previous releases of flex, -c specified table-compression options. This functionality is now given by the -C flag. To ease the impact of this change, when flex encounters -c, it currently issues a warning message and assumes that -C was desired instead. In the future this "promotion" of -c to -C will go away in the name of full POSIX compliance (unless the POSIX meaning is removed first).
--accepting rule at line 53 ("the matched text")
The line number refers to the location of the rule in the file
defining the scanner (i.e., the file that was fed to flex). Messages
are also generated when the scanner backs up, accepts the
default rule, reaches the end of its input buffer (or encounters
a NUL; the two look the same as far as the scanner's concerned),
or reaches an end-of-file.
This option is equivalent to -CFr (see below).
Note, -I cannot be used in conjunction with full or fast tables, i.e., the -f, -F, -Cf, or -CF flags. For other table compression options, -I is the default.
-Ca trades off larger tables in the generated scanner for faster performance because the elements of the tables are better aligned for memory access and computation. This option can double the size of the tables used by your scanner.
-Ce directs flex to construct equivalence classes, i.e., sets of characters which have identical lexical properties. Equivalence classes usually give dramatic reductions in the final table/object file sizes (typically a factor of 2-5) and are pretty cheap performance-wise (one array look-up per character scanned).
-Cf specifies that the full scanner tables should be generated - flex should not compress the tables by taking advantage of similar transition functions for different states.
-CF specifies that the alternate fast scanner representation (described in flexdoc(1)) should be used. This option cannot be used with -+.
-Cm directs flex to construct meta-equivalence classes, which are sets of equivalence classes (or characters, if equivalence classes are not being used) that are commonly used together. Meta-equivalence classes are often a big win when using compressed tables, but they have a moderate performance impact (one or two "if" tests and one array look-up per character scanned).
-Cr causes the generated scanner to bypass using stdio for input. In general this option results in a minor performance gain only worthwhile if used in conjunction with -Cf or -CF. It can cause surprising behavior if you use stdio yourself to read from yyin prior to calling the scanner.
A lone -C specifies that the scanner tables should be compressed but neither equivalence classes nor meta-equivalence classes should be used.
The options -Cf or -CF and -Cm do not make sense together - there is no opportunity for meta-equivalence classes if the table is not being compressed. Otherwise the options may be freely mixed.
The default setting is -Cem, which specifies that flex should generate equivalence classes and meta-equivalence classes. This setting provides the highest degree of table compression. You can obtain faster-executing scanners at the cost of larger tables, with the following generally being true:
slowest & smallest
-Cem
-Cm
-Ce
-C
-C{f,F}e
-C{f,F}
-C{f,F}a
fastest & largest
-C options are cumulative.
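The idea behind the equivalence classes built by -Ce can be pictured in plain C. The following is a toy sketch, not flex's real table layout: the names ec, next_state, build_classes and match_length, and the three-way split into digits, letters and everything else, are assumptions made purely for illustration. It shows why the cost is one extra array look-up per character while each table row shrinks from 256 columns to 3.

```c
#include <ctype.h>

/* Toy illustration of equivalence classes; NOT flex's real table layout.
 * Every name in this sketch is hypothetical. */
enum { EC_OTHER = 0, EC_DIGIT = 1, EC_LETTER = 2, NUM_CLASSES = 3 };
enum { S_JAM = -1, S_START = 0, S_NUMBER = 1, S_IDENT = 2, NUM_STATES = 3 };

static unsigned char ec[256];          /* character -> equivalence class */

/* Transitions are indexed by class, so each row has 3 columns, not 256. */
static const int next_state[NUM_STATES][NUM_CLASSES] = {
    /* other   digit     letter  */
    { S_JAM, S_NUMBER, S_IDENT },      /* S_START  */
    { S_JAM, S_NUMBER, S_JAM   },      /* S_NUMBER */
    { S_JAM, S_IDENT,  S_IDENT },      /* S_IDENT  */
};

/* Group characters with identical lexical behaviour into one class. */
void build_classes(void)
{
    for (int c = 0; c < 256; c++) {
        if (isdigit(c))
            ec[c] = EC_DIGIT;
        else if (isalpha(c))
            ec[c] = EC_LETTER;
        else
            ec[c] = EC_OTHER;
    }
}

/* Length of the longest prefix of s that is a number or an identifier. */
int match_length(const char *s)
{
    int state = S_START;
    int n = 0;

    while (s[n] != '\0') {
        int cls = ec[(unsigned char)s[n]];   /* the one extra look-up */
        int next = next_state[state][cls];
        if (next == S_JAM)
            break;
        state = next;
        n++;
    }
    return n;
}
```

Call build_classes() once before the first call to match_length(). With the classes above, match_length("abc123 x") stops at the blank and returns 6.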
x match the character 'x'
. any character except newline
[xyz] a "character class"; in this case, the pattern
matches either an 'x', a 'y', or a 'z'
[abj-oZ] a "character class" with a range in it; matches
an 'a', a 'b', any letter from 'j' through 'o',
or a 'Z'
[^A-Z] a "negated character class", i.e., any character
but those in the class. In this case, any
character EXCEPT an uppercase letter.
[^A-Z\n] any character EXCEPT an uppercase letter or
a newline
r* zero or more r's, where r is any regular expression
r+ one or more r's
r? zero or one r's (that is, "an optional r")
r{2,5} anywhere from two to five r's
r{2,} two or more r's
r{4} exactly 4 r's
{name} the expansion of the "name" definition
(see above)
"[xyz]\"foo"
the literal string: [xyz]"foo
\X if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
then the ANSI-C interpretation of \X.
Otherwise, a literal 'X' (used to escape
operators such as '*')
\123 the character with octal value 123
\x2a the character with hexadecimal value 2a
(r) match an r; parentheses are used to override
precedence (see below)
rs the regular expression r followed by the
regular expression s; called "concatenation"
r|s either an r or an s
r/s an r but only if it is followed by an s. The
s is not part of the matched text. This type
of pattern is called "trailing context".
^r an r, but only at the beginning of a line
r$ an r, but only at the end of a line. Equivalent
to "r/\n".
<s>r an r, but only in start condition s (see
below for discussion of start conditions)
<s1,s2,s3>r
same, but in any of start conditions s1,
s2, or s3
<*>r an r in any start condition, even an exclusive one.
<<EOF>> an end-of-file
<s1,s2><<EOF>>
an end-of-file when in start condition s1 or s2
The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence.
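The octal and hexadecimal escapes in the list above use the same notation as C character constants, so their values can be checked with a few lines of C. This is a small sketch; the function names are hypothetical, and the mapping of the codes to 'S' and '*' assumes an ASCII system.

```c
#include <stdio.h>

/* Character codes denoted by the two escapes from the pattern list.
 * Function names are hypothetical; the letters shown assume ASCII. */
int octal_123(void) { return '\123'; }   /* octal 123 = decimal 83 */
int hex_2a(void)    { return '\x2a'; }   /* hex 2a    = decimal 42 */

/* Print the character each escape stands for. */
void show_escapes(void)
{
    printf("\\123 -> %c, \\x2a -> %c\n", octal_123(), hex_2a());
}
```

On an ASCII system the pattern \123 therefore matches 'S', and \x2a matches a literal '*' without needing a backslash escape for the operator.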
Some notes on patterns:
foo/bar$
foo|(bar$)
foo|^bar
<sc1>foo<sc2>bar
Note also that unlike the other special actions, REJECT is a branch; code immediately following it in the action will not be executed.
By default, yyterminate() is also called when an end-of-file is encountered. It is a macro and may be redefined.
If the special directive %array appears in the first section of the scanner description, then yytext is instead declared char yytext[YYLMAX], where YYLMAX is a macro definition that you can redefine in the first section if you don't like the default value (generally 8KB). Using %array results in somewhat slower scanners, but the value of yytext becomes immune to calls to input() and unput(), which potentially destroy its value when yytext is a character pointer. The opposite of %array is %pointer, which is the default.
You cannot use %array when generating C++ scanner classes (the -+ flag).
%{
#undef YY_INPUT
#define YY_INPUT(buf,result,max_size) \
{ \
int c = getchar(); \
result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
}
%}
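The same contract (store at most max_size characters in buf, then set result to the number stored, or to YY_NULL at end of input) can be tried outside a scanner. The sketch below assumes a hypothetical in-memory source (src, src_pos, set_source and read_source are not part of flex) and defines YY_NULL locally only so that the fragment compiles alone; a flex-generated scanner supplies its own definition.

```c
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0   /* stand-in so the sketch compiles outside a scanner */
#endif

/* Hypothetical in-memory input source; not part of flex. */
static const char *src = "";
static size_t src_pos = 0;

/* Point the reader at a new NUL-terminated string. */
void set_source(const char *text)
{
    src = text;
    src_pos = 0;
}

/* Copy up to max_size bytes into buf; return the count, or YY_NULL
 * once the source is exhausted. */
static int read_source(char *buf, int max_size)
{
    size_t left = strlen(src) - src_pos;
    size_t n = left < (size_t)max_size ? left : (size_t)max_size;

    if (n == 0)
        return YY_NULL;
    memcpy(buf, src + src_pos, n);
    src_pos += n;
    return (int)n;
}

#undef YY_INPUT
#define YY_INPUT(buf, result, max_size) \
    { result = read_source((buf), (max_size)); }
```

Unlike the one-character getchar() version, this variant fills the buffer in blocks, which is usually what you want when the input is already in memory.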
The default yywrap() always returns 1.
flexdoc(1), lex(1), yacc(1), sed(1), awk(1).
M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator
reject_used_but_not_detected undefined or
yymore_used_but_not_detected undefined - These errors can occur at compile time. They indicate that the scanner uses REJECT or yymore() but that flex failed to notice the fact, meaning that flex scanned the first two sections looking for occurrences of these actions and failed to find any, but somehow you snuck some in (via a #include file, for example). Make an explicit reference to the action in your flex input file. (Note that previously flex supported a %used/%unused mechanism for dealing with this problem; this feature is still supported but now deprecated, and will go away soon unless the author hears from people who can argue compellingly that they need it.)
flex scanner jammed - a scanner compiled with -s has encountered an input string which wasn't matched by any of its rules.
warning, rule cannot be matched indicates that the given rule cannot be matched because it follows other rules that will always match the same text as it. See flexdoc(1) for an example.
warning, -s option given but default rule can be matched means that it is possible (perhaps only in a particular start condition) that the default rule (match any single character) is the only one that will match a particular input. Since
scanner input buffer overflowed - a scanner rule matched more text than the available dynamic memory.
token too large, exceeds YYLMAX - your scanner uses %array and one of its rules matched a string longer than the YYLMAX constant (8K bytes by default). You can increase the value by #define'ing YYLMAX in the definitions section of your flex input.
scanner requires -8 flag to use the character 'x' - Your scanner specification includes recognizing the 8-bit character 'x' and you did not specify the -8 flag, and your scanner defaulted to 7-bit because you used the -Cf or -CF table compression options.
flex scanner push-back overflow - you used unput() to push back so much text that the scanner's buffer could not hold both the pushed-back text and the current token in yytext. Ideally the scanner should dynamically resize the buffer in this case, but at present it does not.
input buffer overflow, can't enlarge buffer because scanner uses REJECT - the scanner was working on matching an extremely large token and needed to expand the input buffer. This doesn't work with scanners that use REJECT.
fatal flex scanner internal error--end of buffer missed - This can occur in a scanner which is reentered after a long-jump has jumped out of (or over) the scanner's activation frame. Before reentering the scanner, use:
yyrestart( yyin );
or use C++ scanner classes (the -+ option), which are fully reentrant.
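The recovery pattern for the long-jump case might look like the following sketch. So that it compiles alone, it supplies stand-ins for yyin, yyrestart() and yylex(); in a real program those come from the flex-generated scanner and the stand-ins must be deleted. The names recover, fail_once, restart_calls and scan_all are hypothetical.

```c
#include <setjmp.h>
#include <stdio.h>

/* Stand-ins for the flex-generated scanner so the sketch compiles alone.
 * In a real program, delete these and link against the generated scanner. */
FILE *yyin = NULL;
static int restart_calls = 0;          /* hypothetical counter, for illustration */
void yyrestart(FILE *f) { yyin = f; restart_calls++; }

static jmp_buf recover;                /* hypothetical recovery point */
static int fail_once = 1;

int yylex(void)
{
    if (fail_once) {                   /* simulate an action bailing out */
        fail_once = 0;
        longjmp(recover, 1);
    }
    return 0;                          /* 0 signals end of input */
}

/* Scan to end of input, recovering from long-jumps out of yylex().
 * Returns how many times the scanner had to be restarted. */
int scan_all(void)
{
    if (setjmp(recover) != 0) {
        /* We jumped out of yylex(): reset its state before reentering. */
        yyrestart(yyin);
    }
    while (yylex() != 0)
        ;
    return restart_calls;
}
```

The essential part is only the setjmp() test followed by yyrestart( yyin ) before yylex() is called again; everything else exists to make the fragment self-contained.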
See flexdoc(1) for additional credits and the address to send comments to.
Some trailing context patterns cannot be properly matched and generate warning messages ("dangerous trailing context"). These are patterns where the ending of the first part of the rule matches the beginning of the second part, such as "zx*/xy*", where the 'x*' matches the 'x' at the beginning of the trailing context. (Note that the POSIX draft states that the text matched by such patterns is undefined.)
For some trailing context rules, parts which are actually fixed-length are not recognized as such, leading to the abovementioned performance loss. In particular, parts using '|' or {n} (such as "foo{3}") are always considered variable-length.
Combining trailing context with the special '|' action can result in fixed trailing context being turned into the more expensive variable trailing context. For example, in the following:
%%
abc |
xyz/def
Use of unput() or input() invalidates yytext and yyleng, unless the %array directive or the -l option has been used.
Use of unput() to push back more text than was matched can result in the pushed-back text matching a beginning-of-line ('^') rule even though it didn't come at the beginning of the line (though this is rare!).
Pattern-matching of NUL's is substantially slower than matching other characters.
Dynamic resizing of the input buffer is slow, as it entails rescanning all the text matched so far by the current (generally huge) token.
flex does not generate correct #line directives for code internal to the scanner; thus, bugs in flex.skl yield bogus line numbers.
Due to both buffering of input and read-ahead, you cannot intermix calls to <stdio.h> routines, such as, for example, getchar(), with flex rules and expect it to work. Call input() instead.
The total number of table entries listed by the -v flag excludes the number of table entries needed to determine what rule has been matched. The number of entries is equal to the number of DFA states if the scanner does not use REJECT, and somewhat greater than the number of states if it does.
REJECT cannot be used with the -f or -F options.
The flex internal algorithms need documentation.
Created by unroff & hp-tools. © somebody (See intro for details). All Rights Reserved. Last modified 11/5/97