Class Tokenizer

Defined in module google.protobuf.text_format.

object --+
         |
        Tokenizer

Protocol buffer text representation tokenizer.

This class handles the lower level string parsing by splitting it into meaningful tokens.

It was directly ported from the Java protocol buffer API.

Instance Methods

__init__(self, lines, skip_comments=True)
Initializes the tokenizer from an iterable of text lines. If skip_comments is True, '#' comments are skipped along with whitespace.

LookingAt(self, token)

AtEnd(self)
Checks whether the end of the text was reached.

_PopLine(self)

_SkipWhitespace(self)

TryConsume(self, token)
Tries to consume a given piece of text.

Consume(self, token)
Consumes a piece of text.

ConsumeComment(self)

ConsumeCommentOrTrailingComment(self)
Consumes a comment and returns a 2-tuple (trailing, comment), where trailing is a bool and comment is a str.

TryConsumeIdentifier(self)

ConsumeIdentifier(self)
Consumes a protocol message field identifier.

TryConsumeIdentifierOrNumber(self)

ConsumeIdentifierOrNumber(self)
Consumes a protocol message field identifier or number.

TryConsumeInteger(self)

ConsumeInteger(self, is_long=False)
Consumes an integer number.

TryConsumeFloat(self)

ConsumeFloat(self)
Consumes a floating-point number.

ConsumeBool(self)
Consumes a boolean value.

TryConsumeByteString(self)

ConsumeString(self)
Consumes a string value.

ConsumeByteString(self)
Consumes a byte array value.

_ConsumeSingleByteString(self)
Consume one token of a string literal.

ConsumeEnum(self, field)

ParseErrorPreviousToken(self, message)
Creates and *returns* a ParseError for the previously read token.

ParseError(self, message)
Creates and *returns* a ParseError for the current token.

_StringParseError(self, e)

NextToken(self)
Reads the next meaningful token.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables

_WHITESPACE = re.compile(r'\s+')

_COMMENT = re.compile(r'(?m)(\s*#.*$)')

_WHITESPACE_OR_COMMENT = re.compile(r'(?m)(\s|(#.*$))+')

_TOKEN = re.compile(r'[a-zA-Z_][0-9a-zA-Z_\+-]*|([0-9\+-]|(\.[...

_IDENTIFIER = re.compile(r'[^\d\W]\w*')

_IDENTIFIER_OR_NUMBER = re.compile(r'\w+')

mark = '\''
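The constructor's skip_comments flag appears to select which of the whitespace patterns above is skipped between tokens. The following is a minimal sketch under that assumption: the two patterns are copied from the listing, and the skip_ws helper is hypothetical, not part of the Tokenizer API.

```python
import re

# Patterns copied from the Tokenizer class variables above.
_WHITESPACE = re.compile(r'\s+')
_WHITESPACE_OR_COMMENT = re.compile(r'(?m)(\s|(#.*$))+')


def skip_ws(text, pos, skip_comments=True):
    """Hypothetical helper: index of the first character at or after pos that
    is not whitespace (nor part of a '#' comment when skip_comments is True)."""
    pattern = _WHITESPACE_OR_COMMENT if skip_comments else _WHITESPACE
    match = pattern.match(text, pos)
    return match.end() if match else pos


line = '  # a comment\n  foo: 1'
print(repr(line[skip_ws(line, 0):]))                       # 'foo: 1'
print(repr(line[skip_ws(line, 0, skip_comments=False):]))  # '# a comment\n  foo: 1'
```

Because the comment alternative uses (?m), `$` stops the comment at the end of its own line, so skipping continues onto the next line.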

Properties

Inherited from object: __class__

Method Details

__init__(self, lines, skip_comments=True)
(Constructor)

Initializes the tokenizer from an iterable of text lines. If skip_comments is True, '#' comments are skipped along with whitespace.

Overrides: object.__init__

AtEnd(self)

Checks whether the end of the text was reached.

Returns:
  True iff the end was reached.

TryConsume(self, token)

Tries to consume a given piece of text.

Args:
  token: Text to consume.

Returns:
  True iff the text was consumed.

Consume(self, token)

Consumes a piece of text.

Args:
  token: Text to consume.

Raises:
  ParseError: If the text couldn't be consumed.
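TryConsume and Consume differ only in how a mismatch is reported: the first returns False, while the second raises ParseError. The sketch below illustrates that pairing over a pre-split token list. MiniTokenizer and the stand-in ParseError class are hypothetical and are not the protobuf implementation.

```python
class ParseError(Exception):
    """Stand-in for text_format.ParseError in this sketch."""


class MiniTokenizer:
    """Hypothetical tokenizer over an already-split list of tokens."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._pos = 0

    @property
    def token(self):
        # Current token, or '' once the input is exhausted.
        return self._tokens[self._pos] if self._pos < len(self._tokens) else ''

    def AtEnd(self):
        return self._pos >= len(self._tokens)

    def NextToken(self):
        self._pos += 1

    def TryConsume(self, token):
        # Advance only when the current token matches; report success as a bool.
        if self.token == token:
            self.NextToken()
            return True
        return False

    def Consume(self, token):
        # Same as TryConsume, but a mismatch is an error.
        if not self.TryConsume(token):
            raise ParseError('Expected "%s".' % token)


tok = MiniTokenizer(['foo', '{', '}'])
tok.Consume('foo')
print(tok.TryConsume('['))   # False: nothing consumed
print(tok.TryConsume('{'))   # True
tok.Consume('}')
print(tok.AtEnd())           # True
```

Building Consume on top of TryConsume keeps the matching logic in one place; callers use TryConsume for optional syntax and Consume for required syntax.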

ConsumeIdentifier(self)

Consumes a protocol message field identifier.

Returns:
  Identifier string.

Raises:
  ParseError: If an identifier couldn't be consumed.

ConsumeIdentifierOrNumber(self)

Consumes a protocol message field identifier or number.

Returns:
  Identifier or number string.

Raises:
  ParseError: If an identifier or number couldn't be consumed.
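Judging from the class variables, the two identifier methods probably differ only in the pattern the current token must match. _IDENTIFIER rejects a leading digit, while _IDENTIFIER_OR_NUMBER accepts any run of word characters. This sketch checks sample tokens against both patterns, which are copied from the listing. The classify helper is hypothetical.

```python
import re

# Patterns copied from the Tokenizer class variables.
_IDENTIFIER = re.compile(r'[^\d\W]\w*')
_IDENTIFIER_OR_NUMBER = re.compile(r'\w+')


def classify(token):
    """Hypothetical helper: (is_identifier, is_identifier_or_number)."""
    return (bool(_IDENTIFIER.match(token)),
            bool(_IDENTIFIER_OR_NUMBER.match(token)))


for token in ['foo', '_bar1', '42', '1abc']:
    print(token, classify(token))
# foo (True, True)
# _bar1 (True, True)
# 42 (False, True)
# 1abc (False, True)
```

Note that re.match anchors only at the start of the token, so this check accepts any token whose prefix matches. The listing above does not say whether the library checks the whole token.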

ConsumeInteger(self, is_long=False)

Consumes an integer number.

Args:
  is_long: True if the value should be returned as a long integer.

Returns:
  The integer parsed.

Raises:
  ParseError: If an integer couldn't be consumed.

ConsumeFloat(self)

Consumes a floating-point number.

Returns:
  The number parsed.

Raises:
  ParseError: If a floating point number couldn't be consumed.

ConsumeBool(self)

Consumes a boolean value.

Returns:
  The bool parsed.

Raises:
  ParseError: If a boolean value couldn't be consumed.
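The Consume* value methods presumably follow a convert-then-advance pattern: convert the current token, turn a conversion failure into a ParseError, and advance only on success. The following sketch applies that pattern to booleans. consume_bool, parse_bool, and the set of accepted spellings are all assumptions of the sketch, not documented behavior.

```python
class ParseError(Exception):
    """Stand-in for text_format.ParseError in this sketch."""


# Assumed spellings; check the protobuf release you use for the exact set.
_TRUE = frozenset(['true', 't', '1', 'True'])
_FALSE = frozenset(['false', 'f', '0', 'False'])


def parse_bool(text):
    """Convert one token to a bool, raising ValueError on anything else."""
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    raise ValueError('Expected "true" or "false".')


def consume_bool(tokens, pos):
    """Hypothetical helper mirroring ConsumeBool: convert, then advance.

    Returns (value, next_pos). A failed conversion raises ParseError and
    the token is not consumed.
    """
    try:
        value = parse_bool(tokens[pos] if pos < len(tokens) else '')
    except ValueError as e:
        raise ParseError(str(e)) from e
    return value, pos + 1


print(consume_bool(['true', '}'], 0))   # (True, 1)
print(consume_bool(['0'], 0))           # (False, 1)
```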

ConsumeString(self)

Consumes a string value.

Returns:
  The string parsed.

Raises:
  ParseError: If a string value couldn't be consumed.

ConsumeByteString(self)

Consumes a byte array value.

Returns:
  The array parsed (as a string).

Raises:
  ParseError: If a byte array value couldn't be consumed.

_ConsumeSingleByteString(self)

Consume one token of a string literal.

String literals (whether bytes or text) can come in multiple adjacent
tokens which are automatically concatenated, like in C or Python.  This
method only consumes one token.

Returns:
  The token parsed.

Raises:
  ParseError: If the data is not in the expected format.
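The concatenation rule above can be sketched as a loop that keeps handling one quoted token at a time while the current token starts with a quote. In this sketch, consume_string is hypothetical and the input is a pre-split token list. ast.literal_eval stands in for the library's own unescaping; the two need not accept the same escape sequences.

```python
import ast

_QUOTES = ('"', "'")


def consume_string(tokens, pos):
    """Hypothetical helper: join adjacent quoted tokens starting at pos.

    Returns (value, next_pos). Each iteration handles one quoted token, the
    way _ConsumeSingleByteString handles one token of a literal.
    """
    if pos >= len(tokens) or tokens[pos][:1] not in _QUOTES:
        raise ValueError('Expected a string literal.')
    parts = []
    while pos < len(tokens) and tokens[pos][:1] in _QUOTES:
        # Stand-in unescaping: evaluate the token as a Python string literal.
        parts.append(ast.literal_eval(tokens[pos]))
        pos += 1
    return ''.join(parts), pos


tokens = ['"Hello, "', "'world'", '"\\n"', ';']
print(consume_string(tokens, 0))   # ('Hello, world\n', 3)
```

Single- and double-quoted tokens can be mixed, and the loop stops at the first token that is not a quoted literal.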

ParseErrorPreviousToken(self, message)

Creates and *returns* a ParseError for the previously read token.

Args:
  message: A message to set for the exception.

Returns:
  A ParseError instance.
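Because ParseError and ParseErrorPreviousToken *return* the exception rather than raising it, callers must write raise themselves. The following is a minimal sketch of that factory pattern, using a hypothetical PositionTracker class and a stand-in ParseError. The 1-based numbering and the message format are assumptions.

```python
class ParseError(Exception):
    """Stand-in exception carrying a 1-based line and column."""

    def __init__(self, message, line, column):
        super().__init__('%d:%d : %s' % (line, column, message))
        self.line = line
        self.column = column


class PositionTracker:
    """Hypothetical tokenizer fragment that only tracks token positions."""

    def __init__(self):
        self._line, self._column = 0, 0                    # current token (0-based)
        self._previous_line, self._previous_column = 0, 0  # previous token

    def advance(self, line, column):
        # Remember where the old token was before moving to the new one.
        self._previous_line, self._previous_column = self._line, self._column
        self._line, self._column = line, column

    def ParseError(self, message):
        # Build, but do not raise, an error for the current token.
        # Inside the method body, ParseError refers to the module-level class.
        return ParseError(message, self._line + 1, self._column + 1)

    def ParseErrorPreviousToken(self, message):
        # Build, but do not raise, an error for the previously read token.
        return ParseError(message, self._previous_line + 1,
                          self._previous_column + 1)


t = PositionTracker()
t.advance(0, 4)
t.advance(2, 0)
try:
    raise t.ParseErrorPreviousToken('Unexpected value.')  # caller must raise
except ParseError as e:
    print(e)   # 1:5 : Unexpected value.
```

Returning the exception lets a caller write `raise tokenizer.ParseError(...)` at the failure site, so the traceback points at the caller instead of at a helper.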

Class Variable Details

_TOKEN

Value:

re.compile(r'[a-zA-Z_][0-9a-zA-Z_\+-]*|([0-9\+-]|(\.[0-9]))[0-9a-zA-Z_\.\+-]*|"[^"\n\\]*((\\.)+[^"\n\\]*)*("|\\?$)|\'[^\'\n\\]*((\\.)+[^\'\n\\]*)*(\'|\\?$)')
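To illustrate how _TOKEN splits input into the tokens that the Consume* methods operate on, the sketch below rebuilds the pattern above (one alternative per line) and scans text with it, skipping whitespace and comments with _WHITESPACE_OR_COMMENT. The tokenize helper is hypothetical. Emitting a single character whenever _TOKEN does not match (for punctuation such as ':' or '{') is an assumption of the sketch.

```python
import re

# _TOKEN from the class variable details above, split into its alternatives.
_TOKEN = re.compile(
    r'[a-zA-Z_][0-9a-zA-Z_\+-]*'                # identifier
    r'|([0-9\+-]|(\.[0-9]))[0-9a-zA-Z_\.\+-]*'  # number
    r'|"[^"\n\\]*((\\.)+[^"\n\\]*)*("|\\?$)'    # double-quoted string
    r"|'[^'\n\\]*((\\.)+[^'\n\\]*)*('|\\?$)")   # single-quoted string
_WHITESPACE_OR_COMMENT = re.compile(r'(?m)(\s|(#.*$))+')


def tokenize(text):
    """Hypothetical helper: split text into the tokens _TOKEN describes."""
    tokens, pos = [], 0
    while pos < len(text):
        skipped = _WHITESPACE_OR_COMMENT.match(text, pos)
        if skipped:
            pos = skipped.end()
            continue
        match = _TOKEN.match(text, pos)
        # Assumption: text that _TOKEN does not cover becomes a 1-char token.
        token = match.group(0) if match else text[pos]
        tokens.append(token)
        pos += len(token)
    return tokens


print(tokenize('foo { bar: -1.5e3  # note\n  baz: "a\\"b" }'))
# ['foo', '{', 'bar', ':', '-1.5e3', 'baz', ':', '"a\\"b"', '}']
```

The number alternative is deliberately loose: it accepts a sign and any trailing word characters, so '-1.5e3' arrives as one token and its validity is left to the Consume* method that converts it.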
