The Tokenizer class is designed to accept input from a string or file and break it up into tokens. It is similar to the standard istream class in this regard, but it has some additional facilities. It permits character classes to be defined to specify that certain characters are white space and others are "special" and should be returned as single-character tokens; it permits quoted strings to override this; and it has a file inclusion facility. In short, it is a simple, reconfigurable lexical analyzer.

Tokenizer has a public const data member named defWhite that contains the default white space characters: space, newline, and tab. It is possible to change the definition of white space for a particular Tokenizer, either through a constructor argument or later by calling setWhite.
Tokenizer provides three different constructors:

    Tokenizer();

The default constructor creates a Tokenizer that reads from the standard input stream, cin. Its special characters are simply ( and ).

    Tokenizer(istream& input, const char* spec,
              const char* w = defWhite);

This constructor creates a Tokenizer that reads from the stream named by input. The other arguments specify the special characters and the white space characters.

    Tokenizer(const char* buffer, const char* spec,
              const char* w = defWhite);

This constructor creates a Tokenizer that reads from the null-terminated string in buffer.
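As a minimal sketch of the three forms (the header name is an assumption; pre-standard iostreams are assumed, matching the library's use of cin):

    #include <iostream.h>      // assumed: pre-standard iostreams
    #include "Tokenizer.h"     // assumed header name

    Tokenizer t1;                          // reads cin; specials are ( and )
    Tokenizer t2(cin, "(){}", " \n\t");    // explicit specials and white space
    Tokenizer t3("(add 1 2)", "()");       // reads from a string buffer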
Tokenizer's destructor closes any open include files and deletes associated internal storage. The following operations change the definition of white space and of special characters, respectively:
    const char* setWhite(const char* w);
    const char* setSpecial(const char* s);

In each case, the old value is returned.

By default, the line comment character for Tokenizer is #. It can be changed by

    char setCommentChar(char n);

Use an argument of 0 to disable the feature. The old comment character is returned.
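Given a Tokenizer tin, it might be reconfigured on the fly like this (a sketch; the particular character sets are illustrative):

    const char* oldWhite = tin.setWhite(" \n\t,");  // commas become white space
    const char* oldSpec  = tin.setSpecial("()[]");  // brackets become specials
    char oldComment      = tin.setCommentChar(';'); // ; starts a line comment
    tin.setCommentChar(0);                          // or disable comments

The returned old values make it easy to restore the previous configuration afterward.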
The next token is read from a Tokenizer by the extraction operator:

    Tokenizer& operator >> (char* pBuffer);

Here pBuffer points to a character buffer that receives the token. There is a design flaw: there is no way to give a maximum buffer length, so overflow is a risk.
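Until that is addressed, the only defense is to allocate generously; a sketch, given a Tokenizer tin:

    char token[1024];   // operator >> offers no way to pass a length limit
    tin >> token;       // reads the next token into token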
By analogy with streams, the following operation is provided:

    operator void*();

It returns null if EOF has already been reached and non-null otherwise. This permits loops like

    Tokenizer tin;
    while (tin) { ... do stuff ... }

    int eof() const;

Returns true if the end of file or end of input has been reached on the Tokenizer. It is possible that there is nothing left in the input but white space, so in many situations skipwhite should be called before making this test.

    void skipwhite();

Skip white space in the input.
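A robust end-of-input test therefore looks like this sketch, reusing tin from above:

    tin.skipwhite();        // consume any trailing white space first
    if (tin.eof()) {
        // genuinely out of tokens
    }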
    void flush();

If in an include file, the file is closed. If at the top level, the rest of the current line is discarded.
Tokenizer can use include files, nested to any depth. It maintains a stack of include files; as EOF is reached in each file, it is closed and popped off the stack. The method

    int fromFile(const char* name);

opens a new file, and the Tokenizer will then read from that file. When that file ends, the Tokenizer resumes reading from the point where it left off in the previous file.
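A typical use is an "include" directive in the input language. In this sketch, the directive name, the buffer size, and the return-value convention of fromFile are all assumptions:

    #include <string.h>

    char tok[256];
    while (tin >> tok) {
        if (strcmp(tok, "include") == 0) {
            tin >> tok;               // next token names the file
            if (!tin.fromFile(tok))   // assumed: nonzero on success
                ;                     // report the error here
        } else {
            // handle an ordinary token
        }
    }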
    const char* current_file() const;
    int current_line() const;

These methods report the file name and line number at which the Tokenizer is currently reading. This information is maintained for include files. At the top level, current_file returns a null pointer, but current_line returns one more than the number of line feeds seen so far.
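These are handy for diagnostics; a sketch (pre-standard iostreams assumed, as before):

    const char* f = tin.current_file();
    cerr << (f ? f : "<top level>") << ", line "
         << tin.current_line() << ": unexpected token\n";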
    int readingFromFile() const;

Returns true (1) if the Tokenizer is reading from an include file, false (0) if not.
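Combined with flush, this supports a simple recovery strategy after a fatal parse error, sketched here:

    while (tin.readingFromFile())
        tin.flush();        // closes the innermost include file
    tin.flush();            // then discard the rest of the top-level line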