public class XmlParser
extends java.lang.Object
You need to define a class implementing the XmlHandler interface: an object belonging to this class will receive the callbacks for the events. (As an alternative to implementing the full XmlHandler interface, you can simply extend the HandlerBase convenience class.)
Usage (assuming that MyHandler is your implementation of the XmlHandler interface):
XmlHandler handler = new MyHandler();
XmlParser parser = new XmlParser();
parser.setHandler(handler);
try {
  parser.parse("http://www.host.com/doc.xml", null, null);
} catch (Exception e) {
  [do something interesting]
}
Alternatively, you can use the standard SAX interfaces with the SAXDriver class as your entry point.
XmlHandler
,
HandlerBase
Modifier and Type | Field and Description |
---|---|
static int |
ATTRIBUTE_CDATA
Constant: the attribute value is a string value.
|
static int |
ATTRIBUTE_DEFAULT_FIXED
Constant: the attribute was declared #FIXED.
|
static int |
ATTRIBUTE_DEFAULT_IMPLIED
Constant: the attribute was declared #IMPLIED.
|
static int |
ATTRIBUTE_DEFAULT_REQUIRED
Constant: the attribute was declared #REQUIRED.
|
static int |
ATTRIBUTE_DEFAULT_SPECIFIED
Constant: the attribute has a literal default value specified.
|
static int |
ATTRIBUTE_DEFAULT_UNDECLARED
Constant: the attribute is not declared.
|
static int |
ATTRIBUTE_ENTITIES
Constant: the attribute value is a list of entity names.
|
static int |
ATTRIBUTE_ENTITY
Constant: the attribute value is the name of an entity.
|
static int |
ATTRIBUTE_ENUMERATED
Constant: the attribute value is a token from an enumeration.
|
static int |
ATTRIBUTE_ID
Constant: the attribute value is a unique identifier.
|
static int |
ATTRIBUTE_IDREF
Constant: the attribute value is a reference to a unique identifier.
|
static int |
ATTRIBUTE_IDREFS
Constant: the attribute value is a list of ID references.
|
static int |
ATTRIBUTE_NMTOKEN
Constant: the attribute value is a name token.
|
static int |
ATTRIBUTE_NMTOKENS
Constant: the attribute value is a list of name tokens.
|
static int |
ATTRIBUTE_NOTATION
Constant: the attribute is the name of a notation.
|
static int |
ATTRIBUTE_UNDECLARED
Constant: the attribute has not been declared for this element type.
|
static int |
CONTENT_ANY
Constant: the element has a content model of ANY.
|
static int |
CONTENT_ELEMENTS
Constant: the element has element content.
|
static int |
CONTENT_EMPTY
Constant: the element has declared content of EMPTY.
|
static int |
CONTENT_MIXED
Constant: the element has mixed content.
|
static int |
CONTENT_UNDECLARED
Constant: an element has not been declared.
|
static int |
ENTITY_INTERNAL
Constant: the entity is internal.
|
static int |
ENTITY_NDATA
Constant: the entity is external, non-XML data.
|
static int |
ENTITY_TEXT
Constant: the entity is external XML data.
|
static int |
ENTITY_UNDECLARED
Constant: the entity has not been declared.
|
(package private) XmlHandler |
handler |
Constructor and Description |
---|
XmlParser()
Construct a new parser with no associated handler.
|
Modifier and Type | Method and Description |
---|---|
(package private) void |
checkEncoding(java.lang.String encodingName,
boolean ignoreEncoding)
Check that the encoding specified makes sense.
|
(package private) void |
cleanupVariables()
Clean up after the parse to allow some garbage collection.
|
(package private) void |
copyIso8859_1ReadBuffer(int count)
Convert a buffer of ISO-8859-1-encoded bytes into UTF-16 characters.
|
(package private) void |
copyUcs2ReadBuffer(int count,
int shift1,
int shift2)
Convert a buffer of UCS-2-encoded bytes into UTF-16 characters.
|
(package private) void |
copyUcs4ReadBuffer(int count,
int shift1,
int shift2,
int shift3,
int shift4)
Convert a buffer of UCS-4-encoded bytes into UTF-16 characters.
|
(package private) void |
copyUtf8ReadBuffer(int count)
Convert a buffer of UTF-8-encoded bytes into UTF-16 characters.
|
(package private) void |
dataBufferAppend(char c)
Add a character to the data buffer.
|
(package private) void |
dataBufferAppend(char[] ch,
int start,
int length)
Append (part of) a character array to the data buffer.
|
(package private) void |
dataBufferAppend(java.lang.String s)
Add a string to the data buffer.
|
(package private) void |
dataBufferFlush()
Flush the contents of the data buffer to the handler, if
appropriate, and reset the buffer for new input.
|
(package private) void |
dataBufferNormalize()
Normalise whitespace in the data buffer.
|
(package private) java.lang.String |
dataBufferToString()
Convert the data buffer to a string.
|
java.util.Enumeration |
declaredAttributes(java.lang.String elname)
Get the declared attributes for an element type.
|
java.util.Enumeration |
declaredElements()
Get the declared elements for an XML document.
|
java.util.Enumeration |
declaredEntities()
Get declared entities.
|
java.util.Enumeration |
declaredNotations()
Get declared notations.
|
(package private) void |
detectEncoding()
Attempt to detect the encoding of an entity.
|
(package private) void |
encodingError(java.lang.String message,
int value,
int offset)
Report a character encoding error.
|
(package private) void |
error(java.lang.String message,
char textFound,
java.lang.String textExpected)
Report a serious error.
|
(package private) void |
error(java.lang.String message,
java.lang.String textFound,
java.lang.String textExpected)
Report an error.
|
(package private) java.lang.Object |
extendArray(java.lang.Object array,
int currentSize,
int requiredSize)
Ensure the capacity of an array, allocating a new one if
necessary.
|
(package private) void |
filterCR()
Filter carriage returns in the read buffer.
|
(package private) java.lang.Object[] |
getAttribute(java.lang.String elName,
java.lang.String name)
Retrieve the three-member array representing an
attribute declaration.
|
java.lang.String |
getAttributeDefaultValue(java.lang.String name,
java.lang.String aname)
Retrieve the default value of a declared attribute.
|
int |
getAttributeDefaultValueType(java.lang.String name,
java.lang.String aname)
Retrieve the default value type of a declared attribute.
|
java.lang.String |
getAttributeEnumeration(java.lang.String name,
java.lang.String aname)
Retrieve the allowed values for an enumerated attribute type.
|
java.lang.String |
getAttributeExpandedValue(java.lang.String name,
java.lang.String aname)
Retrieve the expanded value of a declared attribute.
|
int |
getAttributeType(java.lang.String name,
java.lang.String aname)
Retrieve the declared type of an attribute.
|
int |
getColumnNumber()
Return the current column number.
|
java.lang.String |
getCurrentElement()
Return the current element.
|
(package private) java.util.Hashtable |
getElementAttributes(java.lang.String name)
Look up the attribute hash table for an element.
|
java.lang.String |
getElementContentModel(java.lang.String name)
Look up the content model of an element.
|
int |
getElementContentType(java.lang.String name)
Look up the content type of an element.
|
java.lang.String |
getEntityNotationName(java.lang.String eName)
Get the notation name associated with an NDATA entity.
|
java.lang.String |
getEntityPublicId(java.lang.String ename)
Return an external entity's public identifier, if any.
|
java.lang.String |
getEntitySystemId(java.lang.String ename)
Return an external entity's system identifier.
|
int |
getEntityType(java.lang.String ename)
Find the type of an entity.
|
java.lang.String |
getEntityValue(java.lang.String ename)
Return the value of an internal entity.
|
int |
getLineNumber()
Return the current line number.
|
(package private) int |
getNextUtf8Byte(int pos,
int count)
Return the next byte value in a UTF-8 sequence.
|
java.lang.String |
getNotationPublicId(java.lang.String nname)
Look up the public identifier for a notation.
|
java.lang.String |
getNotationSystemId(java.lang.String nname)
Look up the system identifier for a notation.
|
(package private) void |
initializeVariables()
Re-initialize the variables for each parse.
|
java.lang.String |
intern(char[] ch,
int start,
int length)
Create an internalised string from a character array.
|
java.lang.String |
intern(java.lang.String s)
Return an internalised version of a string.
|
(package private) boolean |
isWhitespace(char c)
Test if a character is whitespace.
|
void |
parse(java.lang.String systemId,
java.lang.String publicId,
java.io.InputStream stream,
java.lang.String encoding)
Parse an XML document from a byte stream.
|
void |
parse(java.lang.String systemId,
java.lang.String publicId,
java.io.Reader reader)
Parse an XML document from a character stream.
|
void |
parse(java.lang.String systemId,
java.lang.String publicId,
java.lang.String encoding)
Parse an XML document from a URI.
|
(package private) void |
parseAttDef(java.lang.String elementName)
Parse a single attribute definition
|
(package private) void |
parseAttlistDecl()
Parse an attribute list declaration
|
(package private) void |
parseAttribute(java.lang.String name)
Parse an attribute assignment.
|
(package private) void |
parseCDSect()
Parse a CDATA marked section.
|
(package private) void |
parseCharRef()
Read a character reference
|
(package private) void |
parseComment()
Skip a comment.
|
(package private) void |
parseConditionalSect()
Parse a conditional section
|
(package private) void |
parseContent()
Parse the content of an element
|
(package private) void |
parseContentspec(java.lang.String name)
Parse a content specification.
|
(package private) void |
parseCp()
Parse a content particle
|
(package private) void |
parseDefault(java.lang.String elementName,
java.lang.String name,
int type,
java.lang.String enumeration)
Parse the default value for an attribute
|
(package private) void |
parseDoctypedecl()
Parse a document type declaration.
|
(package private) void |
parseDocument()
Parse an XML document.
|
(package private) void |
parseElement()
Parse an element, with its tags.
|
(package private) void |
parseElementdecl()
Parse an element type declaration
|
(package private) void |
parseElements()
Parse an element-content model
|
(package private) void |
parseEntityDecl()
Parse an entity declaration
|
(package private) void |
parseEntityRef(boolean externalAllowed)
Parse a reference
|
(package private) void |
parseEnumeration()
Parse an enumeration
|
(package private) void |
parseEq()
Parse an equals sign surrounded by optional whitespace
|
(package private) void |
parseETag()
Parse an end tag
|
(package private) void |
parseMarkupdecl()
Parse a markup declaration in the internal or external DTD subset.
|
(package private) void |
parseMisc()
Parse miscellaneous markup outside the document element and DOCTYPE
declaration.
|
(package private) void |
parseMixed()
Parse mixed content
|
(package private) void |
parseNotationDecl()
Parse a notation declaration
|
(package private) void |
parseNotationType()
Parse a notation type for an attribute
|
(package private) void |
parsePCData()
Parse PCDATA.
|
(package private) void |
parsePEReference(boolean isEntityValue)
Parse a parameter entity reference
|
(package private) void |
parsePI()
Parse a processing instruction and do a call-back.
|
(package private) void |
parseProlog()
Parse the prolog of an XML document.
|
(package private) void |
parseTextDecl(boolean ignoreEncoding)
Parse the Encoding PI.
|
(package private) void |
parseUntil(java.lang.String delim)
Read all data until we find the specified string.
|
(package private) void |
parseWhitespace()
Parse whitespace characters, and leave them in the data buffer.
|
(package private) void |
parseXMLDecl(boolean ignoreEncoding)
Parse the XML declaration.
|
(package private) void |
popInput()
Restore a previous input source.
|
(package private) void |
pushCharArray(java.lang.String ename,
char[] ch,
int start,
int length)
Push a new internal input source.
|
(package private) void |
pushInput(java.lang.String ename)
Save the current input source onto the stack.
|
(package private) void |
pushString(java.lang.String ename,
java.lang.String s)
This method pushes a string back onto input.
|
(package private) void |
pushURL(java.lang.String ename,
java.lang.String publicId,
java.lang.String systemId,
java.io.Reader reader,
java.io.InputStream stream,
java.lang.String encoding)
Push a new external input source.
|
(package private) void |
read8bitEncodingDeclaration()
Read just the encoding declaration (or XML declaration) at the
start of an external entity.
|
(package private) int |
readAttType()
Parse the attribute type
|
(package private) char |
readCh()
Read a single character from the readBuffer.
|
(package private) void |
readDataChunk()
Read a chunk of data from an external input source.
|
(package private) java.lang.String[] |
readExternalIds(boolean inNotation)
Try reading external identifiers.
|
(package private) java.lang.String |
readLiteral(int flags)
Read a literal
|
(package private) java.lang.String |
readNmtoken(boolean isName)
Read a name or name token
|
(package private) void |
require(char delim)
Require a character to appear, or throw an exception.
|
(package private) void |
require(java.lang.String delim)
Require a string to appear, or throw an exception.
|
(package private) void |
requireWhitespace()
Require whitespace characters
|
(package private) void |
setAttribute(java.lang.String elName,
java.lang.String name,
int type,
java.lang.String enumeration,
java.lang.String value,
int valueType)
Register an attribute declaration for later retrieval.
|
(package private) void |
setElement(java.lang.String name,
int contentType,
java.lang.String contentModel,
java.util.Hashtable attributes)
Register an element.
|
(package private) void |
setEntity(java.lang.String eName,
int eClass,
java.lang.String pubid,
java.lang.String sysid,
java.lang.String value,
java.lang.String nName)
Register an entity declaration for later retrieval.
|
(package private) void |
setExternalDataEntity(java.lang.String eName,
java.lang.String pubid,
java.lang.String sysid,
java.lang.String nName)
Register an external data entity.
|
(package private) void |
setExternalTextEntity(java.lang.String eName,
java.lang.String pubid,
java.lang.String sysid)
Register an external text entity.
|
void |
setHandler(XmlHandler handler)
Set the handler that will receive parsing events.
|
(package private) void |
setInternalEntity(java.lang.String eName,
java.lang.String value)
Register an entity declaration for later retrieval.
|
(package private) void |
setNotation(java.lang.String nname,
java.lang.String pubid,
java.lang.String sysid)
Register a notation declaration for later retrieval.
|
(package private) void |
skipUntil(java.lang.String delim)
Skip all data until we find the specified string.
|
(package private) void |
skipWhitespace()
Skip whitespace characters
|
(package private) boolean |
tryEncoding(byte[] sig,
byte b1,
byte b2)
Check for a two-byte signature.
|
(package private) boolean |
tryEncoding(byte[] sig,
byte b1,
byte b2,
byte b3,
byte b4)
Check for a four-byte signature.
|
(package private) void |
tryEncodingDecl(boolean ignoreEncoding)
Check for an encoding declaration.
|
(package private) boolean |
tryRead(char delim)
Return true if we can read the expected character.
|
(package private) boolean |
tryRead(java.lang.String delim)
Return true if we can read the expected string.
|
(package private) boolean |
tryWhitespace()
Return true if we can read some whitespace.
|
(package private) void |
unread(char c)
Push a single character back onto the current input stream.
|
(package private) void |
unread(char[] ch,
int length)
Push a char array back onto the current input stream.
|
public static final int CONTENT_UNDECLARED
public static final int CONTENT_ANY
public static final int CONTENT_EMPTY
public static final int CONTENT_MIXED
public static final int CONTENT_ELEMENTS
public static final int ENTITY_UNDECLARED
public static final int ENTITY_INTERNAL
public static final int ENTITY_NDATA
public static final int ENTITY_TEXT
public static final int ATTRIBUTE_UNDECLARED
public static final int ATTRIBUTE_CDATA
public static final int ATTRIBUTE_ID
public static final int ATTRIBUTE_IDREF
public static final int ATTRIBUTE_IDREFS
public static final int ATTRIBUTE_ENTITY
public static final int ATTRIBUTE_ENTITIES
public static final int ATTRIBUTE_NMTOKEN
public static final int ATTRIBUTE_NMTOKENS
public static final int ATTRIBUTE_ENUMERATED
public static final int ATTRIBUTE_NOTATION
public static final int ATTRIBUTE_DEFAULT_UNDECLARED
public static final int ATTRIBUTE_DEFAULT_SPECIFIED
public static final int ATTRIBUTE_DEFAULT_IMPLIED
public static final int ATTRIBUTE_DEFAULT_REQUIRED
public static final int ATTRIBUTE_DEFAULT_FIXED
XmlHandler handler
public void setHandler(XmlHandler handler)
handler
- The handler to receive callback events.
parse(java.lang.String, java.lang.String, java.lang.String)
,
XmlHandler
public void parse(java.lang.String systemId, java.lang.String publicId, java.lang.String encoding) throws java.lang.Exception
You may parse a document more than once, but only one thread may call this method for an object at one time.
systemId
- The URI of the document.
publicId
- The public identifier of the document, or null.
encoding
- The suggested encoding, or null if unknown.
java.lang.Exception
- Any exception thrown by your own handlers, or any derivation of java.io.IOException thrown by the parser itself.
public void parse(java.lang.String systemId, java.lang.String publicId, java.io.InputStream stream, java.lang.String encoding) throws java.lang.Exception
The URI that you supply will become the base URI for resolving relative links, but Ælfred will actually read the document from the supplied input stream.
You may parse a document more than once, but only one thread may call this method for an object at one time.
systemId
- The base URI of the document, or null if not known.
publicId
- The public identifier of the document, or null if not known.
stream
- A byte input stream.
encoding
- The suggested encoding, or null if unknown.
java.lang.Exception
- Any exception thrown by your own handlers, or any derivation of java.io.IOException thrown by the parser itself.
public void parse(java.lang.String systemId, java.lang.String publicId, java.io.Reader reader) throws java.lang.Exception
The URI that you supply will become the base URI for resolving relative links, but Ælfred will actually read the document from the supplied character stream.
You may parse a document more than once, but only one thread may call this method for an object at one time.
systemId
- The base URI of the document, or null if not known.
publicId
- The public identifier of the document, or null if not known.
reader
- A character stream.
java.lang.Exception
- Any exception thrown by your own handlers, or any derivation of java.io.IOException thrown by the parser itself.
void error(java.lang.String message, java.lang.String textFound, java.lang.String textExpected) throws java.lang.Exception
message
- The error message.
textFound
- The text that caused the error (or null).
java.lang.Exception
XmlHandler.error(java.lang.String, java.lang.String, int, int)
,
line
void error(java.lang.String message, char textFound, java.lang.String textExpected) throws java.lang.Exception
message
- The error message.
textFound
- The text that caused the error (or null).
java.lang.Exception
void parseDocument() throws java.lang.Exception
[1] document ::= prolog element Misc*
This is the top-level parsing function for a single XML document. As a minimum, a well-formed document must have a document element, and a valid document must have a prolog as well.
java.lang.Exception
void parseComment() throws java.lang.Exception
[18] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* "-->"
(The <!-- has already been read.)
java.lang.Exception
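Production [18] allows a single '-' inside a comment but never two in a row, except as part of the closing "-->". The following is a minimal sketch of a scanner that enforces that rule. The class and method names are hypothetical, and it works on a String rather than on Ælfred's read buffer, so it illustrates the grammar only and is not the parser's own implementation.

```java
// Hypothetical helper illustrating production [18]; not part of Ælfred.
final class CommentScanner {
    /**
     * Scans a comment body. The caller has already consumed "<!--",
     * and start is the index of the first character after it.
     * Returns the index just past the closing "-->", or -1 if the
     * comment is unterminated or contains an illegal "--".
     */
    static int skipComment(String input, int start) {
        int i = start;
        while (i < input.length()) {
            if (input.charAt(i) != '-') {
                i++;                       // (Char - '-')
                continue;
            }
            if (i + 1 >= input.length()) {
                return -1;                 // input ends after a lone '-'
            }
            if (input.charAt(i + 1) != '-') {
                i += 2;                    // '-' (Char - '-')
                continue;
            }
            // Saw "--": only legal when it starts the closing "-->".
            if (i + 2 < input.length() && input.charAt(i + 2) == '>') {
                return i + 3;
            }
            return -1;
        }
        return -1;                         // never found "-->"
    }
}
```

Under this sketch a body that ends in '-' (as in "<!-- a--->") is rejected, which follows from the grammar: every '-' in the body must be followed by a character other than '-'.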
void parsePI() throws java.lang.Exception
[19] PI ::= '<?' Name (S (Char* - (Char* '?>' Char*)))? '?>'
(The <? has already been read.)
An XML processing instruction must begin with a Name, which is the instruction's target.
java.lang.Exception
void parseCDSect() throws java.lang.Exception
[20] CDSect ::= CDStart CData CDEnd
[21] CDStart ::= '<![CDATA['
[22] CData ::= (Char* - (Char* ']]>' Char*))
[23] CDEnd ::= ']]>'
(The '<![CDATA[' has already been read.)
Note that this just appends characters to the dataBuffer, without actually generating an event.
java.lang.Exception
void parseProlog() throws java.lang.Exception
[24] prolog ::= XMLDecl? Misc* (Doctypedecl Misc*)?
There are a couple of tricks here. First, it is necessary to declare the XML default attributes after the DTD (if present) has been read. Second, it is not possible to expand general references in attribute value literals until after the entire DTD (if present) has been parsed.
We do not look for the XML declaration here, because it is handled by pushURL().
java.lang.Exception
pushURL(java.lang.String, java.lang.String, java.lang.String, java.io.Reader, java.io.InputStream, java.lang.String)
void parseXMLDecl(boolean ignoreEncoding) throws java.lang.Exception
[25] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[26] VersionInfo ::= S 'version' Eq ('"1.0"' | "'1.0'")
[33] SDDecl ::= S 'standalone' Eq "'" ('yes' | 'no') "'" | S 'standalone' Eq '"' ("yes" | "no") '"'
[78] EncodingDecl ::= S 'encoding' Eq QEncoding
([80] to [82] are also significant.)
(The <?xml and whitespace have already been read.)
TODO: validate value of standalone.
java.lang.Exception
parseTextDecl(boolean)
,
checkEncoding(java.lang.String, boolean)
void parseTextDecl(boolean ignoreEncoding) throws java.lang.Exception
[78] EncodingDecl ::= S 'encoding' Eq QEncoding
[79] EncodingPI ::= '<?xml' S 'encoding' Eq QEncoding S? '?>'
[80] QEncoding ::= '"' Encoding '"' | "'" Encoding "'"
[81] Encoding ::= LatinName
[82] LatinName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
(The <?xml and whitespace have already been read.)
java.lang.Exception
parseXMLDecl(boolean)
,
checkEncoding(java.lang.String, boolean)
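Production [82] (LatinName) translates directly into a regular expression. The sketch below shows one way to check an encoding name against it. The class name EncodingNames and the method isLatinName are hypothetical and are not part of Ælfred.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating production [82]; not part of Ælfred.
final class EncodingNames {
    // [82] LatinName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
    // The trailing '-' inside the character class is a literal hyphen.
    private static final Pattern LATIN_NAME =
            Pattern.compile("[A-Za-z][A-Za-z0-9._-]*");

    /** Returns true if name is a syntactically legal encoding name. */
    static boolean isLatinName(String name) {
        return name != null && LATIN_NAME.matcher(name).matches();
    }
}
```

This checks syntax only. Whether the named encoding agrees with what was detected from the byte stream is the separate question that checkEncoding() addresses.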
void checkEncoding(java.lang.String encodingName, boolean ignoreEncoding) throws java.lang.Exception
Compare what the author has specified in the XML declaration or encoding PI with what we have detected.
This is also important for distinguishing among the various 7- and 8-bit encodings, such as ISO-LATIN-1 (I cannot autodetect those).
encodingName
- The name of the encoding specified by the user.
java.lang.Exception
parseXMLDecl(boolean)
,
parseTextDecl(boolean)
void parseMisc() throws java.lang.Exception
[27] Misc ::= Comment | PI | S
java.lang.Exception
void parseDoctypedecl() throws java.lang.Exception
[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' %markupdecl* ']' S?)? '>'
(The <!DOCTYPE has already been read.)
java.lang.Exception
void parseMarkupdecl() throws java.lang.Exception
[29] markupdecl ::= ( %elementdecl | %AttlistDecl | %EntityDecl | %NotationDecl | %PI | %S | %Comment | InternalPERef )
[30] InternalPERef ::= PEReference
[31] extSubset ::= (%markupdecl | %conditionalSect)*
java.lang.Exception
void parseElement() throws java.lang.Exception
[33] STag ::= '<' Name (S Attribute)* S? '>' [WFC: unique Att spec]
[38] element ::= EmptyElement | STag content ETag
[39] EmptyElement ::= '<' Name (S Attribute)* S? '/>' [WFC: unique Att spec]
(The '<' has already been read.)
NOTE: this method actually chains onto parseContent(), if necessary, and parseContent() will take care of calling parseETag().
java.lang.Exception
void parseAttribute(java.lang.String name) throws java.lang.Exception
[34] Attribute ::= Name Eq AttValue
name
- The name of the attribute's element.
java.lang.Exception
XmlHandler.attribute(java.lang.String, java.lang.String, boolean)
void parseEq() throws java.lang.Exception
java.lang.Exception
void parseETag() throws java.lang.Exception
java.lang.Exception
void parseContent() throws java.lang.Exception
java.lang.Exception
void parseElementdecl() throws java.lang.Exception
java.lang.Exception
void parseContentspec(java.lang.String name) throws java.lang.Exception
java.lang.Exception
void parseElements() throws java.lang.Exception
java.lang.Exception
void parseCp() throws java.lang.Exception
java.lang.Exception
void parseMixed() throws java.lang.Exception
java.lang.Exception
void parseAttlistDecl() throws java.lang.Exception
java.lang.Exception
void parseAttDef(java.lang.String elementName) throws java.lang.Exception
java.lang.Exception
int readAttType() throws java.lang.Exception
java.lang.Exception
void parseEnumeration() throws java.lang.Exception
java.lang.Exception
void parseNotationType() throws java.lang.Exception
java.lang.Exception
void parseDefault(java.lang.String elementName, java.lang.String name, int type, java.lang.String enumeration) throws java.lang.Exception
java.lang.Exception
void parseConditionalSect() throws java.lang.Exception
java.lang.Exception
void parseCharRef() throws java.lang.Exception
java.lang.Exception
void parseEntityRef(boolean externalAllowed) throws java.lang.Exception
externalAllowed
- External entities are allowed here.
java.lang.Exception
void parsePEReference(boolean isEntityValue) throws java.lang.Exception
java.lang.Exception
void parseEntityDecl() throws java.lang.Exception
java.lang.Exception
void parseNotationDecl() throws java.lang.Exception
java.lang.Exception
void parsePCData() throws java.lang.Exception
[16] PCData ::= [^<&]*
The trick here is that the data stays in the dataBuffer without necessarily being converted to a string right away.
java.lang.Exception
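Production [16] says that a run of PCDATA ends at the first '<' or '&'. A minimal sketch of that scan follows, returning an index instead of building a String, in the spirit of leaving the data in a buffer until it is needed. The class and method names are hypothetical, and the sketch reads from a CharSequence rather than from Ælfred's internal buffers.

```java
// Hypothetical helper illustrating production [16]; not part of Ælfred.
final class PcDataScanner {
    /**
     * Returns the index of the first '<' or '&' at or after start,
     * or input.length() if neither occurs. The characters in
     * [start, result) form one run of PCDATA.
     */
    static int scanPCData(CharSequence input, int start) {
        int i = start;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '<' || c == '&') {
                break;          // markup or a reference begins here
            }
            i++;
        }
        return i;
    }
}
```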
void requireWhitespace() throws java.lang.Exception
java.lang.Exception
void parseWhitespace() throws java.lang.Exception
java.lang.Exception
void skipWhitespace() throws java.lang.Exception
java.lang.Exception
java.lang.String readNmtoken(boolean isName) throws java.lang.Exception
java.lang.Exception
java.lang.String readLiteral(int flags) throws java.lang.Exception
java.lang.Exception
java.lang.String[] readExternalIds(boolean inNotation) throws java.lang.Exception
The system identifier is not required for notations.
inNotation
- Are we in a notation?
java.lang.Exception
final boolean isWhitespace(char c)
[1] S ::= (#x20 | #x9 | #xd | #xa)+
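The production above names exactly four whitespace characters. A test for it can be sketched as below; the class name XmlChars is hypothetical. Note that this set is narrower than the one accepted by Character.isWhitespace(), which also treats characters such as form feed as whitespace.

```java
// Hypothetical helper illustrating the S production; not part of Ælfred.
final class XmlChars {
    /** Returns true for space, tab, carriage return and line feed only. */
    static boolean isWhitespace(char c) {
        return c == 0x20 || c == 0x09 || c == 0x0D || c == 0x0A;
    }
}
```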
c
- The character to test.
void dataBufferAppend(char c)
void dataBufferAppend(java.lang.String s)
void dataBufferAppend(char[] ch, int start, int length)
void dataBufferNormalize()
java.lang.String dataBufferToString()
intern(char[],int,int)
void dataBufferFlush() throws java.lang.Exception
java.lang.Exception
void require(java.lang.String delim) throws java.lang.Exception
java.lang.Exception
void require(char delim) throws java.lang.Exception
java.lang.Exception
public java.lang.String intern(java.lang.String s)
Ælfred uses this method to create an internalised version of all names and attribute values, so that it can test equality with == instead of String.equals().
If you want to be able to test for equality in the same way, you can use this method to internalise your own strings first:
String PARA = parser.intern("PARA");
Note that this will not return the same results as String.intern().
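The idea behind a private intern table can be sketched as follows. This is an illustration under stated assumptions, not Ælfred's implementation: the class name SymbolTable is hypothetical, and a java.util.HashMap stands in for whatever table the parser actually keeps.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a private intern table; not Ælfred's code.
final class SymbolTable {
    private final Map<String, String> table = new HashMap<>();

    /**
     * Returns the canonical instance for s within this table.
     * Two equal strings interned through the same table yield the
     * same object, so callers may compare the results with ==.
     */
    String intern(String s) {
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }
}
```

Because the table is private to one object, the canonical instance it returns need not be the one that String.intern() would return, which is why the two must not be mixed when comparing with ==.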
s
- The string to internalise.
intern(char[],int,int)
,
String.intern()
public java.lang.String intern(char[] ch, int start, int length)
This is much more efficient than constructing a non-internalised string first, and then internalising it.
Note that this will not return the same results as String.intern().
ch
- an array of characters for building the string.
start
- the starting position in the array.
length
- the number of characters to place in the string.
intern(String)
,
String.intern()
java.lang.Object extendArray(java.lang.Object array, int currentSize, int requiredSize)
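The method summary describes extendArray as ensuring the capacity of an array and allocating a new one if necessary. One way to do that for an array of any component type is sketched below. The growth policy (double, or grow to the required size if that is larger) and the reading of requiredSize as a minimum length are assumptions; the documentation does not state either. The class name GrowableArrays is hypothetical.

```java
import java.lang.reflect.Array;

// Hypothetical sketch of capacity growth; the policy is an assumption.
final class GrowableArrays {
    /**
     * Returns an array with at least requiredSize slots. If the given
     * array (whose length is currentSize) is already large enough it is
     * returned unchanged; otherwise a larger array of the same component
     * type is allocated and the first currentSize elements are copied.
     */
    static Object extendArray(Object array, int currentSize, int requiredSize) {
        if (requiredSize <= currentSize) {
            return array;
        }
        int newSize = Math.max(currentSize * 2, requiredSize);
        Object grown = Array.newInstance(
                array.getClass().getComponentType(), newSize);
        System.arraycopy(array, 0, grown, 0, currentSize);
        return grown;
    }
}
```

Taking and returning Object lets a single helper serve char[], int[] and object arrays alike, at the cost of a cast at each call site.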
public java.util.Enumeration declaredElements()
The results will be valid only after the DTD (if any) has been parsed.
getElementContentType(java.lang.String)
,
getElementContentModel(java.lang.String)
public int getElementContentType(java.lang.String name)
name
- The element type name.
getElementContentModel(java.lang.String)
,
CONTENT_UNDECLARED
,
CONTENT_ANY
,
CONTENT_EMPTY
,
CONTENT_MIXED
,
CONTENT_ELEMENTS
public java.lang.String getElementContentModel(java.lang.String name)
The result will always be null unless the content type is CONTENT_ELEMENTS or CONTENT_MIXED.
name
- The element type name.
getElementContentType(java.lang.String)
void setElement(java.lang.String name, int contentType, java.lang.String contentModel, java.util.Hashtable attributes) throws java.lang.Exception
java.lang.Exception
java.util.Hashtable getElementAttributes(java.lang.String name)
public java.util.Enumeration declaredAttributes(java.lang.String elname)
elname
- The name of the element type.
getAttributeType(java.lang.String, java.lang.String)
,
getAttributeEnumeration(java.lang.String, java.lang.String)
,
getAttributeDefaultValueType(java.lang.String, java.lang.String)
,
getAttributeDefaultValue(java.lang.String, java.lang.String)
,
getAttributeExpandedValue(java.lang.String, java.lang.String)
public int getAttributeType(java.lang.String name, java.lang.String aname)
name
- The name of the associated element.
aname
- The name of the attribute.
ATTRIBUTE_UNDECLARED
,
ATTRIBUTE_CDATA
,
ATTRIBUTE_ID
,
ATTRIBUTE_IDREF
,
ATTRIBUTE_IDREFS
,
ATTRIBUTE_ENTITY
,
ATTRIBUTE_ENTITIES
,
ATTRIBUTE_NMTOKEN
,
ATTRIBUTE_NMTOKENS
,
ATTRIBUTE_ENUMERATED
,
ATTRIBUTE_NOTATION
public java.lang.String getAttributeEnumeration(java.lang.String name, java.lang.String aname)
name
- The name of the associated element.
aname
- The name of the attribute.
ATTRIBUTE_ENUMERATED
,
ATTRIBUTE_NOTATION
public java.lang.String getAttributeDefaultValue(java.lang.String name, java.lang.String aname)
name
- The name of the associated element.
aname
- The name of the attribute.
getAttributeExpandedValue(java.lang.String, java.lang.String)
public java.lang.String getAttributeExpandedValue(java.lang.String name, java.lang.String aname)
All general entities will be expanded.
name
- The name of the associated element.
aname
- The name of the attribute.
getAttributeDefaultValue(java.lang.String, java.lang.String)
public int getAttributeDefaultValueType(java.lang.String name, java.lang.String aname)
name
- The name of the element.
aname
- The name of the attribute.
ATTRIBUTE_DEFAULT_SPECIFIED
,
ATTRIBUTE_DEFAULT_IMPLIED
,
ATTRIBUTE_DEFAULT_REQUIRED
,
ATTRIBUTE_DEFAULT_FIXED
void setAttribute(java.lang.String elName, java.lang.String name, int type, java.lang.String enumeration, java.lang.String value, int valueType) throws java.lang.Exception
java.lang.Exception
java.lang.Object[] getAttribute(java.lang.String elName, java.lang.String name)
elName
- The name of the element.
name
- The name of the attribute.
public java.util.Enumeration declaredEntities()
getEntityType(java.lang.String)
,
getEntityPublicId(java.lang.String)
,
getEntitySystemId(java.lang.String)
,
getEntityValue(java.lang.String)
,
getEntityNotationName(java.lang.String)
public java.lang.String getCurrentElement()
public int getEntityType(java.lang.String ename)
ename
- The name of the entity.
ENTITY_UNDECLARED
,
ENTITY_INTERNAL
,
ENTITY_NDATA
,
ENTITY_TEXT
public java.lang.String getEntityPublicId(java.lang.String ename)
ename
- The name of the external entity.
getEntityType(java.lang.String)
public java.lang.String getEntitySystemId(java.lang.String ename)
ename
- The name of the external entity.
getEntityType(java.lang.String)
public java.lang.String getEntityValue(java.lang.String ename)
ename
- The name of the internal entity.
getEntityType(java.lang.String)
public java.lang.String getEntityNotationName(java.lang.String eName)
eName
- The NDATA entity name.
getEntityType(java.lang.String)
void setInternalEntity(java.lang.String eName, java.lang.String value)
void setExternalDataEntity(java.lang.String eName, java.lang.String pubid, java.lang.String sysid, java.lang.String nName)
void setExternalTextEntity(java.lang.String eName, java.lang.String pubid, java.lang.String sysid)
void setEntity(java.lang.String eName, int eClass, java.lang.String pubid, java.lang.String sysid, java.lang.String value, java.lang.String nName)
public java.util.Enumeration declaredNotations()
getNotationPublicId(java.lang.String)
,
getNotationSystemId(java.lang.String)
public java.lang.String getNotationPublicId(java.lang.String nname)
nname
- The name of the notation.
getNotationSystemId(java.lang.String)
public java.lang.String getNotationSystemId(java.lang.String nname)
nname
- The name of the notation.
getNotationPublicId(java.lang.String)
void setNotation(java.lang.String nname, java.lang.String pubid, java.lang.String sysid) throws java.lang.Exception
java.lang.Exception
public int getLineNumber()
public int getColumnNumber()
char readCh() throws java.lang.Exception
The readDataChunk() method maintains the buffer.
If we hit the end of an entity, try to pop the stack and keep going.
(This approach doesn't really enforce XML's rules about entity boundaries, but this is not currently a validating parser).
This routine also attempts to keep track of the current position in external entities, but it's not entirely accurate.
java.lang.Exception
unread(char)
,
readDataChunk()
,
readBuffer
,
line
void unread(char c) throws java.lang.Exception
This method usually pushes the character back onto the readBuffer.
I don't think that this would ever be called with readBufferPos = 0, because the methods always read a character before unreading it, but just in case, I've added a boundary condition.
c
- The character to push back.
java.lang.Exception
readCh()
,
unread(char[], int)
,
readBuffer
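The pushback behaviour described for readCh() and unread(char) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the parser's own code: the class PushbackSketch and its Frame helper are hypothetical, input comes from in-memory strings only, and line tracking and readDataChunk() refills are left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: reading with pushback over a stack of buffers. */
final class PushbackSketch {
    /** One input buffer plus the current read position in it. */
    private static final class Frame {
        final char[] buf;
        int pos;
        Frame(char[] buf) { this.buf = buf; }
    }

    private final Deque<Frame> stack = new ArrayDeque<>();

    PushbackSketch(String text) {
        stack.push(new Frame(text.toCharArray()));
    }

    /** Returns the next character, or -1 once every buffer is exhausted. */
    int readCh() {
        while (!stack.isEmpty()) {
            Frame f = stack.peek();
            if (f.pos < f.buf.length) {
                return f.buf[f.pos++];
            }
            stack.pop(); // end of this buffer: pop the stack and keep going
        }
        return -1;
    }

    /** Pushes one character back so that the next readCh() returns it. */
    void unread(char c) {
        Frame f = stack.peek();
        if (f != null && f.pos > 0) {
            f.buf[--f.pos] = c; // usual case: step back in the current buffer
        } else {
            // Boundary case: nothing to step back over, so push a new
            // one-character buffer instead.
            stack.push(new Frame(new char[] { c }));
        }
    }

    public static void main(String[] args) {
        PushbackSketch in = new PushbackSketch("<a>");
        char first = (char) in.readCh();
        in.unread(first);
        System.out.println((char) in.readCh());
    }
}
```

The else branch in unread(char) corresponds to the boundary condition mentioned above: when the read position is already at the start of the buffer, the character goes into a fresh buffer rather than being written before index 0.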
void unread(char[] ch, int length) throws java.lang.Exception
NOTE: you must never push back characters that you haven't actually read: use pushString() instead.
java.lang.Exception
readCh()
,
unread(char)
,
readBuffer
,
pushString(java.lang.String, java.lang.String)
void pushURL(java.lang.String ename, java.lang.String publicId, java.lang.String systemId, java.io.Reader reader, java.io.InputStream stream, java.lang.String encoding) throws java.lang.Exception
The source will be either an external text entity, or the DTD external subset.
TO DO: Right now, this method always attempts to autodetect the encoding; in the future, it should allow the caller to request an encoding explicitly, and it should also look at the headers with an HTTP connection.
ename
publicId
systemId
reader
stream
encoding
java.lang.Exception
XmlHandler.resolveEntity(java.lang.String, java.lang.String)
,
pushString(java.lang.String, java.lang.String)
,
sourceType
,
pushInput(java.lang.String)
,
detectEncoding()
,
readBuffer
void tryEncodingDecl(boolean ignoreEncoding) throws java.lang.Exception
java.lang.Exception
void detectEncoding() throws java.lang.Exception
The trick here (as suggested in the XML standard) is that any entity not in UTF-8, or in UCS-2 with a byte-order mark, must begin with an XML declaration or an encoding declaration; we simply have to look for "<?XML" in various encodings.
This method has no way to distinguish among 8-bit encodings. Instead, it assumes UTF-8, then (possibly) revises its assumption later in checkEncoding(). Any ASCII-derived 8-bit encoding should work, but most will be rejected later by checkEncoding().
I don't currently detect EBCDIC, since I'm concerned that it could also be a valid UTF-8 sequence; I'll have to do more checking later.
java.lang.Exception
tryEncoding(byte[], byte, byte, byte, byte)
,
tryEncoding(byte[], byte, byte)
,
checkEncoding(java.lang.String, boolean)
,
read8bitEncodingDeclaration()
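The signature test that detectEncoding() is described as performing can be sketched as below. This is an assumption-laden illustration, not the parser's implementation: the class name EncodingSniffSketch, the detect() method and the returned labels are hypothetical, the byte patterns follow the autodetection appendix of the XML 1.0 Recommendation, and the 8-bit case matches the lower-case "<?xm" used by XML 1.0. Only the two tryEncoding() signatures are taken from this page.

```java
/** Hypothetical sketch: guessing an encoding family from four bytes. */
final class EncodingSniffSketch {
    /** True if the first four bytes of sig are b1, b2, b3, b4. */
    static boolean tryEncoding(byte[] sig, byte b1, byte b2, byte b3, byte b4) {
        return sig[0] == b1 && sig[1] == b2 && sig[2] == b3 && sig[3] == b4;
    }

    /** True if the first two bytes of sig are b1, b2 (a byte-order mark). */
    static boolean tryEncoding(byte[] sig, byte b1, byte b2) {
        return sig[0] == b1 && sig[1] == b2;
    }

    /** Returns a descriptive label (not a Java charset name). */
    static String detect(byte[] sig) {
        if (sig.length < 4) {
            throw new IllegalArgumentException("need at least four bytes");
        }
        final byte Z = 0x00, LT = 0x3c, QM = 0x3f, X = 0x78, M = 0x6d;
        final byte FE = (byte) 0xfe, FF = (byte) 0xff;

        // "<" as a 32-bit unit, big-endian and little-endian.
        if (tryEncoding(sig, Z, Z, Z, LT)) return "UCS-4 big-endian";
        if (tryEncoding(sig, LT, Z, Z, Z)) return "UCS-4 little-endian";
        // 16-bit units with a byte-order mark.
        if (tryEncoding(sig, FE, FF)) return "UCS-2 big-endian (BOM)";
        if (tryEncoding(sig, FF, FE)) return "UCS-2 little-endian (BOM)";
        // "<?" as 16-bit units without a byte-order mark.
        if (tryEncoding(sig, Z, LT, Z, QM)) return "UCS-2 big-endian";
        if (tryEncoding(sig, LT, Z, QM, Z)) return "UCS-2 little-endian";
        // "<?xm" in an ASCII-derived 8-bit encoding: the declaration
        // itself has to be read to learn which one.
        if (tryEncoding(sig, LT, QM, X, M)) return "8-bit, read declaration";
        // No declaration and no mark: assume UTF-8.
        return "UTF-8";
    }

    public static void main(String[] args) {
        byte[] sig = { 0x3c, 0x3f, 0x78, 0x6d };
        System.out.println(detect(sig));
    }
}
```

The order of the tests matters in this sketch: the 32-bit patterns are checked before the 16-bit ones so that a UCS-4 document is not misread as UCS-2.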
boolean tryEncoding(byte[] sig, byte b1, byte b2, byte b3, byte b4)
Utility routine for detectEncoding().
Always looks for some part of "<?XML" in a specific encoding.
sig
- The first four bytes read.
b1
- The first byte of the signature.
b2
- The second byte of the signature.
b3
- The third byte of the signature.
b4
- The fourth byte of the signature.
detectEncoding()
boolean tryEncoding(byte[] sig, byte b1, byte b2)
Looks for a UCS-2 byte-order mark.
Utility routine for detectEncoding().
sig
- The first four bytes read.
b1
- The first byte of the signature.
b2
- The second byte of the signature.
detectEncoding()
void pushString(java.lang.String ename, java.lang.String s) throws java.lang.Exception
It is useful either as the expansion of an internal entity, or for backtracking during the parse.
Call pushCharArray() to do the actual work.
s
- The string to push back onto input.
java.lang.Exception
pushCharArray(java.lang.String, char[], int, int)
void pushCharArray(java.lang.String ename, char[] ch, int start, int length) throws java.lang.Exception
This method is useful for expanding an internal entity, or for unreading a string of characters. It creates a new readBuffer containing the characters in the array, instead of characters converted from an input byte stream.
I've added a couple of optimisations: don't push zero-length strings, and just push back a single character for 1-character strings; this should save some time and memory.
ch
- The char array to push.
java.lang.Exception
pushString(java.lang.String, java.lang.String)
,
pushURL(java.lang.String, java.lang.String, java.lang.String, java.io.Reader, java.io.InputStream, java.lang.String)
,
readBuffer
,
sourceType
,
pushInput(java.lang.String)
void pushInput(java.lang.String ename) throws java.lang.Exception
This method saves all of the global variables associated with the current input source, so that they can be restored when a new input source has finished. It also tests for entity recursion.
The method saves the following global variables onto a stack using a fixed-length array: sourceType, externalEntity, readBuffer, readBufferPos, readBufferLength, line, and encoding.
ename
- The name of the entity (if any) causing the new input.
java.lang.Exception
popInput()
,
sourceType
,
externalEntity
,
readBuffer
,
readBufferPos
,
readBufferLength
,
line
,
encoding
void popInput() throws java.lang.Exception
This method restores all of the global variables associated with the current input source.
java.io.EOFException
- If there are no more entries on the input stack.
java.lang.Exception
pushInput(java.lang.String)
,
sourceType
,
externalEntity
,
readBuffer
,
readBufferPos
,
readBufferLength
,
line
,
encoding
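The save-and-restore pattern described for pushInput() and popInput() might look roughly like this. It is a simplified sketch under stated assumptions: the class InputStackSketch and its Saved record are hypothetical, only a few of the saved variables are modelled, a java.util.ArrayDeque stands in for the fixed-length array, and pushInput() here takes the new buffer as an extra argument, which the real method does not.

```java
import java.io.EOFException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: saving and restoring input state on a stack. */
final class InputStackSketch {
    /** Snapshot of the variables that describe one input source. */
    private static final class Saved {
        final String ename;
        final char[] readBuffer;
        final int readBufferPos;
        final int line;
        Saved(String ename, char[] readBuffer, int readBufferPos, int line) {
            this.ename = ename;
            this.readBuffer = readBuffer;
            this.readBufferPos = readBufferPos;
            this.line = line;
        }
    }

    private final Deque<Saved> inputStack = new ArrayDeque<>();

    // Current input state (the "global variables" of the sketch).
    String currentEntity;            // null while reading the document itself
    char[] readBuffer = new char[0];
    int readBufferPos;
    int line = 1;

    /** Saves the current state and switches to a new input buffer. */
    void pushInput(String ename, char[] newBuffer) throws Exception {
        if (ename != null) {
            // Entity recursion test: the name must not already be open.
            if (ename.equals(currentEntity)) {
                throw new Exception("recursive reference to entity " + ename);
            }
            for (Saved s : inputStack) {
                if (ename.equals(s.ename)) {
                    throw new Exception("recursive reference to entity " + ename);
                }
            }
        }
        inputStack.push(new Saved(currentEntity, readBuffer, readBufferPos, line));
        currentEntity = ename;
        readBuffer = newBuffer;
        readBufferPos = 0;
        line = 1;
    }

    /** Restores the state saved by the matching pushInput() call. */
    void popInput() throws EOFException {
        if (inputStack.isEmpty()) {
            throw new EOFException("no more entries on the input stack");
        }
        Saved s = inputStack.pop();
        currentEntity = s.ename;
        readBuffer = s.readBuffer;
        readBufferPos = s.readBufferPos;
        line = s.line;
    }

    public static void main(String[] args) throws Exception {
        InputStackSketch in = new InputStackSketch();
        in.pushInput("copy", "(c)".toCharArray());
        System.out.println(in.currentEntity);
        in.popInput();
        System.out.println(in.currentEntity);
    }
}
```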
boolean tryRead(char delim) throws java.lang.Exception
Note that the character will be removed from the input stream on success, but will be put back on failure. Do not attempt to read the character again if the method succeeds.
delim
- The character that should appear next. For a case-insensitive match, you must supply this in upper-case.
java.lang.Exception
tryRead(String)
boolean tryRead(java.lang.String delim) throws java.lang.Exception
This is simply a convenience method.
Note that the string will be removed from the input stream on success, but will be put back on failure. Do not attempt to read the string again if the method succeeds.
This method will push back a character rather than an array whenever possible (probably the majority of cases).
NOTE: This method currently has a hard-coded limit of 100 characters for the delimiter.
delim
- The string that should appear next.
java.lang.Exception
tryRead(char)
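The contract of the two tryRead() methods (consume the delimiter on success, put everything back on failure) can be illustrated with the standard java.io.PushbackReader. This is a sketch under assumptions, not the parser's code: the class TryReadSketch is hypothetical, the reader is passed in explicitly, and the pushback capacity of 100 is chosen only to echo the limit noted above.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

/** Hypothetical sketch: look-ahead matching that restores input on failure. */
final class TryReadSketch {
    /** Consumes delim if it is the next character; otherwise reads nothing. */
    static boolean tryRead(PushbackReader in, char delim) throws IOException {
        int c = in.read();
        if (c == delim) {
            return true;
        }
        if (c != -1) {
            in.unread(c); // put the mismatched character back
        }
        return false;
    }

    /** Consumes delim if it appears next; otherwise reads nothing. */
    static boolean tryRead(PushbackReader in, String delim) throws IOException {
        char[] seen = new char[delim.length()];
        for (int i = 0; i < delim.length(); i++) {
            int c = in.read();
            if (c == -1) {
                in.unread(seen, 0, i); // restore what was read so far
                return false;
            }
            seen[i] = (char) c;
            if (c != delim.charAt(i)) {
                in.unread(seen, 0, i + 1); // restore including the mismatch
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // The second argument is the pushback capacity in characters.
        PushbackReader in = new PushbackReader(new StringReader("<!--x-->"), 100);
        System.out.println(tryRead(in, "<!["));
        System.out.println(tryRead(in, "<!--"));
    }
}
```

Because a failed match leaves the input untouched, a caller can try several delimiters in turn, as main() does here.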
boolean tryWhitespace() throws java.lang.Exception
This is simply a convenience method.
This method will push back a character rather than an array whenever possible (probably the majority of cases).
java.lang.Exception
void parseUntil(java.lang.String delim) throws java.lang.Exception
This is especially useful for scanning marked sections.
This is a little inefficient right now, since it calls tryRead() for every character.
delim
- The string delimiter.
java.lang.Exception
tryRead(String)
,
readCh()
void skipUntil(java.lang.String delim) throws java.lang.Exception
This is especially useful for scanning comments.
This is a little inefficient right now, since it calls tryRead() for every character.
delim
- The string delimiter.
java.lang.Exception
readCh()
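The scan-until-delimiter loop described for parseUntil() and skipUntil() can be sketched as follows, including the per-character tryRead() call that the notes above call inefficient. The sketch makes several assumptions: the class ParseUntilSketch is hypothetical, Reader.mark() and Reader.reset() replace the parser's own pushback so the example stays self-contained, and parseUntil() returns the collected text instead of passing it on to a handler.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical sketch: scanning input up to a string delimiter. */
final class ParseUntilSketch {
    /** Consumes delim if it appears next; the reader must support mark(). */
    static boolean tryRead(Reader in, String delim) throws IOException {
        in.mark(delim.length());
        for (int i = 0; i < delim.length(); i++) {
            if (in.read() != delim.charAt(i)) {
                in.reset(); // mismatch or end of input: restore position
                return false;
            }
        }
        return true;
    }

    /** Returns the text before delim and consumes the delimiter itself. */
    static String parseUntil(Reader in, String delim) throws IOException {
        StringBuilder text = new StringBuilder();
        while (!tryRead(in, delim)) { // one tryRead() call per character
            int c = in.read();
            if (c == -1) {
                throw new EOFException("delimiter not found: " + delim);
            }
            text.append((char) c);
        }
        return text.toString();
    }

    /** Same scan, but the characters before delim are discarded. */
    static void skipUntil(Reader in, String delim) throws IOException {
        parseUntil(in, delim);
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader("a < b]]><next/>");
        System.out.println(parseUntil(in, "]]>"));
    }
}
```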
void read8bitEncodingDeclaration() throws java.lang.Exception
java.lang.Exception
void readDataChunk() throws java.lang.Exception
This is simply a front-end that fills the rawReadBuffer with bytes, then calls the appropriate encoding handler.
java.lang.Exception
encoding
,
rawReadBuffer
,
readBuffer
,
filterCR()
,
copyUtf8ReadBuffer(int)
,
copyIso8859_1ReadBuffer(int)
void filterCR()
CRLF becomes LF; CR becomes LF.
readDataChunk()
,
readBuffer
,
readBufferOverflow
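The line-end rule stated for filterCR() (CRLF becomes LF; CR becomes LF) can be sketched as an in-place pass over a character buffer. The class FilterCrSketch and the explicit buffer and length arguments are hypothetical. The sketch also does not handle a CR that falls on the last position of one chunk with its LF at the start of the next.

```java
/** Hypothetical sketch: normalizing line ends in a char buffer. */
final class FilterCrSketch {
    /**
     * Rewrites buf[0..length) in place so that CRLF and lone CR both
     * become LF, and returns the new logical length.
     */
    static int filterCR(char[] buf, int length) {
        int out = 0;
        for (int in = 0; in < length; in++) {
            char c = buf[in];
            if (c == '\r') {
                // For a CRLF pair, skip the LF so only one LF is written.
                if (in + 1 < length && buf[in + 1] == '\n') {
                    in++;
                }
                c = '\n';
            }
            buf[out++] = c;
        }
        return out;
    }

    public static void main(String[] args) {
        char[] buf = "one\r\ntwo\rthree\n".toCharArray();
        int n = filterCR(buf, buf.length);
        System.out.print(new String(buf, 0, n));
    }
}
```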
void copyUtf8ReadBuffer(int count) throws java.lang.Exception
When readDataChunk() calls this method, the raw bytes are in rawReadBuffer, and the final characters will appear in readBuffer.
The tricky part of this is dealing with UTF-8 multi-byte sequences, but it doesn't seem to slow things down too much.
count
- The number of bytes to convert.
java.lang.Exception
readDataChunk()
,
rawReadBuffer
,
readBuffer
,
getNextUtf8Byte(int, int)
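The multi-byte handling mentioned for copyUtf8ReadBuffer() can be sketched as below. This is an illustrative decoder under assumptions, not the parser's routine: the class Utf8CopySketch is hypothetical, the raw bytes are passed in as an argument instead of living in rawReadBuffer, the result is returned as a String, and overlong encodings are not rejected. The helper mirrors the name getNextUtf8Byte() but takes the byte array as an extra parameter.

```java
import java.io.EOFException;

/** Hypothetical sketch: converting UTF-8 bytes to Java chars. */
final class Utf8CopySketch {
    /** Decodes raw[0..count) as UTF-8. */
    static String copyUtf8(byte[] raw, int count) throws Exception {
        StringBuilder out = new StringBuilder(count);
        int i = 0;
        while (i < count) {
            int b1 = raw[i++] & 0xff;
            int cp;
            if (b1 < 0x80) {                    // 0xxxxxxx: one byte
                cp = b1;
            } else if ((b1 & 0xe0) == 0xc0) {   // 110xxxxx: two bytes
                cp = (b1 & 0x1f) << 6;
                cp |= getNextUtf8Byte(raw, i++, count);
            } else if ((b1 & 0xf0) == 0xe0) {   // 1110xxxx: three bytes
                cp = (b1 & 0x0f) << 12;
                cp |= getNextUtf8Byte(raw, i++, count) << 6;
                cp |= getNextUtf8Byte(raw, i++, count);
            } else if ((b1 & 0xf8) == 0xf0) {   // 11110xxx: four bytes
                cp = (b1 & 0x07) << 18;
                cp |= getNextUtf8Byte(raw, i++, count) << 12;
                cp |= getNextUtf8Byte(raw, i++, count) << 6;
                cp |= getNextUtf8Byte(raw, i++, count);
            } else {
                throw new Exception("invalid UTF-8 lead byte at offset " + (i - 1));
            }
            if (cp > 0x10ffff) {
                throw new Exception("value out of range before offset " + i);
            }
            out.appendCodePoint(cp); // emits a surrogate pair above 0xffff
        }
        return out.toString();
    }

    /** Returns the six payload bits of the continuation byte at pos. */
    static int getNextUtf8Byte(byte[] raw, int pos, int count) throws Exception {
        if (pos >= count) {
            throw new EOFException("incomplete UTF-8 sequence");
        }
        int b = raw[pos] & 0xff;
        if ((b & 0xc0) != 0x80) {
            throw new Exception("bad UTF-8 continuation byte at offset " + pos);
        }
        return b & 0x3f;
    }

    public static void main(String[] args) throws Exception {
        byte[] raw = { 0x41, (byte) 0xc3, (byte) 0xa9 };
        System.out.println(copyUtf8(raw, raw.length).length());
    }
}
```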
int getNextUtf8Byte(int pos, int count) throws java.lang.Exception
pos
- The current position in the rawReadBuffer.
count
- The number of bytes in the rawReadBuffer.
java.io.EOFException
- If the sequence is incomplete.
java.lang.Exception
void copyIso8859_1ReadBuffer(int count)
When readDataChunk() calls this method, the raw bytes are in rawReadBuffer, and the final characters will appear in readBuffer.
This is a direct conversion, with no tricks.
count
- The number of bytes to convert.
readDataChunk()
,
rawReadBuffer
,
readBuffer
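The "direct conversion" for ISO-8859-1 works because each byte value is also the Unicode code point of its character. A minimal sketch, assuming a hypothetical class Iso8859_1CopySketch that takes the raw bytes as an argument and returns a new array:

```java
/** Hypothetical sketch: converting ISO-8859-1 bytes to Java chars. */
final class Iso8859_1CopySketch {
    /** Converts raw[0..count) one byte per character. */
    static char[] copyIso8859_1(byte[] raw, int count) {
        char[] out = new char[count];
        for (int i = 0; i < count; i++) {
            // Mask to 0..255 so bytes above 0x7f do not sign-extend.
            out[i] = (char) (raw[i] & 0xff);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = { 0x63, 0x61, 0x66, (byte) 0xe9 };
        System.out.println(copyIso8859_1(raw, raw.length).length);
    }
}
```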
void copyUcs2ReadBuffer(int count, int shift1, int shift2) throws java.lang.Exception
When readDataChunk() calls this method, the raw bytes are in rawReadBuffer, and the final characters will appear in readBuffer.
count
- The number of bytes to convert.
shift1
- The number of bits to shift byte 1.
shift2
- The number of bits to shift byte 2.
java.lang.Exception
readDataChunk()
,
rawReadBuffer
,
readBuffer
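The shift1 and shift2 parameters let one routine serve both byte orders: a caller would pass (8, 0) for big-endian input and (0, 8) for little-endian. A minimal sketch under those assumptions follows. The class Ucs2CopySketch is hypothetical, the raw bytes are passed in as an argument, and rejecting an odd byte count is a choice made for the sketch.

```java
/** Hypothetical sketch: converting UCS-2 byte pairs to Java chars. */
final class Ucs2CopySketch {
    /** Combines each pair of bytes in raw[0..count) into one char. */
    static char[] copyUcs2(byte[] raw, int count, int shift1, int shift2)
            throws Exception {
        if (count % 2 != 0) {
            throw new Exception("odd number of bytes in UCS-2 input");
        }
        char[] out = new char[count / 2];
        for (int i = 0, j = 0; i < count; i += 2) {
            int byte1 = raw[i] & 0xff;
            int byte2 = raw[i + 1] & 0xff;
            out[j++] = (char) ((byte1 << shift1) | (byte2 << shift2));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] bigEndian = { 0x20, (byte) 0xac };
        char[] out = copyUcs2(bigEndian, bigEndian.length, 8, 0);
        System.out.println(Integer.toHexString(out[0]));
    }
}
```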
void copyUcs4ReadBuffer(int count, int shift1, int shift2, int shift3, int shift4) throws java.lang.Exception
When readDataChunk() calls this method, the raw bytes are in rawReadBuffer, and the final characters will appear in readBuffer.
Java has 16-bit chars, but this routine will attempt to use surrogates to encode values between 0x00010000 and 0x000fffff.
count
- The number of bytes to convert.
shift1
- The number of bits to shift byte 1.
shift2
- The number of bits to shift byte 2.
shift3
- The number of bits to shift byte 3.
shift4
- The number of bits to shift byte 4.
java.lang.Exception
readDataChunk()
,
rawReadBuffer
,
readBuffer
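The surrogate step mentioned above can be sketched as follows: four bytes are combined into one 32-bit value using the four shifts, and a value above 0xffff is split into a high and a low surrogate. This is an illustration under assumptions, not the parser's routine. The class Ucs4CopySketch is hypothetical, the result is returned as a String, and the sketch accepts values up to the Unicode maximum 0x10ffff, which goes beyond the range quoted above.

```java
/** Hypothetical sketch: converting UCS-4 byte quadruples to Java chars. */
final class Ucs4CopySketch {
    /** Combines each group of four bytes in raw[0..count) into chars. */
    static String copyUcs4(byte[] raw, int count,
                           int shift1, int shift2, int shift3, int shift4)
            throws Exception {
        if (count % 4 != 0) {
            throw new Exception("UCS-4 input length is not a multiple of 4");
        }
        StringBuilder out = new StringBuilder(count / 4);
        for (int i = 0; i < count; i += 4) {
            int value = ((raw[i] & 0xff) << shift1)
                      | ((raw[i + 1] & 0xff) << shift2)
                      | ((raw[i + 2] & 0xff) << shift3)
                      | ((raw[i + 3] & 0xff) << shift4);
            if (value >= 0 && value < 0x10000) {
                out.append((char) value);               // fits in one char
            } else if (value >= 0x10000 && value <= 0x10ffff) {
                int v = value - 0x10000;                // 20 payload bits
                out.append((char) (0xd800 | (v >> 10)));    // high surrogate
                out.append((char) (0xdc00 | (v & 0x3ff)));  // low surrogate
            } else {
                throw new Exception("value out of range at byte offset " + i);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Big-endian byte order: shifts 24, 16, 8, 0.
        byte[] raw = { 0x00, 0x01, (byte) 0xf6, 0x00 };
        System.out.println(copyUcs4(raw, raw.length, 24, 16, 8, 0).length());
    }
}
```

A little-endian caller would pass the shifts in the opposite order (0, 8, 16, 24).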
void encodingError(java.lang.String message, int value, int offset) throws java.lang.Exception
java.lang.Exception
void initializeVariables()
void cleanupVariables()