py2neo.cypher.lexer – Cypher Lexer

This module contains a Cypher language lexer built on the Pygments lexer framework. It can be used to tokenize statements and expressions written in the Cypher variant available in Neo4j 4.2.

To tokenize a Cypher statement, create a CypherLexer instance or look one up by the name py2neo.cypher, then call its get_tokens() method:

>>> from pygments.lexers import get_lexer_by_name
>>> lexer = get_lexer_by_name("py2neo.cypher")
>>> list(lexer.get_tokens("MATCH (a:Person)-[:KNOWS]->(b) RETURN a"))
[(Token.Keyword, 'MATCH'),
 (Token.Text.Whitespace, ' '),
 (Token.Punctuation, '('),
 (Token.Name.Variable, 'a'),
 (Token.Punctuation, ':'),
 (Token.Name.Label, 'Person'),
 (Token.Punctuation, ')-['),
 (Token.Punctuation, ':'),
 (Token.Name.Label, 'KNOWS'),
 (Token.Punctuation, ']->('),
 (Token.Name.Variable, 'b'),
 (Token.Punctuation, ')'),
 (Token.Text.Whitespace, ' '),
 (Token.Keyword, 'RETURN'),
 (Token.Text.Whitespace, ' '),
 (Token.Name.Variable, 'a'),
 (Token.Text.Whitespace, '\n')]

To split multiple semicolon-separated statements within a single string, use the get_statements() method instead:

>>> list(lexer.get_statements("CREATE (:Person {name:'Alice'}); MATCH (a:Person {name:'Alice'}) RETURN id(a)"))
["CREATE (:Person {name:'Alice'})",
 "MATCH (a:Person {name:'Alice'}) RETURN id(a)"]

class py2neo.cypher.lexer.CypherLexer(*args, **kwds)

Pygments lexer for the Cypher Query Language as available in Neo4j 4.2.

aliases = ['cypher', 'py2neo.cypher']

A list of short, unique identifiers that can be used to look up the lexer from a list, e.g., using get_lexer_by_name().

filenames = ['*.cypher', '*.cyp']

A list of fnmatch patterns that match filenames which contain content for this lexer. The patterns in this list should be unique among all lexers.
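
As a rough illustration of how fnmatch patterns like these select files, the sketch below checks a few filenames with the standard library. The helper name matches_cypher_pattern and the sample filenames are hypothetical; Pygments performs its own lookup, and this only shows the pattern semantics.

```python
from fnmatch import fnmatch

# Patterns copied from the `filenames` attribute documented above.
FILENAME_PATTERNS = ["*.cypher", "*.cyp"]


def matches_cypher_pattern(filename):
    """Return True if the filename matches any of the lexer's patterns.

    Hypothetical helper for illustration; not part of py2neo or Pygments.
    """
    return any(fnmatch(filename, pattern) for pattern in FILENAME_PATTERNS)


for candidate in ["import_people.cypher", "schema.cyp", "query.sql"]:
    print(candidate, matches_cypher_pattern(candidate))
```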

flags = 42

Flags for compiling the regular expressions. The Pygments default is MULTILINE; the value 42 used here is the bitmask re.IGNORECASE | re.MULTILINE | re.UNICODE.
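
The integer 42 is easier to read once it is decoded with the standard re module. The following sketch assumes the usual CPython flag values (IGNORECASE = 2, MULTILINE = 8, UNICODE = 32) and shows one practical consequence: keyword patterns written in upper case also match lower-case input.

```python
import re

# Decode the integer bitmask into named regular expression flags.
flags = re.RegexFlag(42)
expected = re.IGNORECASE | re.MULTILINE | re.UNICODE

print(int(expected))      # 42
print(flags == expected)  # True

# Because IGNORECASE is set, a rule such as 'MATCH\b' also matches "match".
keyword_matches = bool(re.match(r"MATCH\b", "match (n) return n", flags))
print(keyword_matches)    # True
```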

get_statements(text)

Split the text into statements delimited by semicolons and yield each statement in turn. Yielded statements are stripped of both leading and trailing whitespace. Empty statements are skipped.
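
The documented behaviour (split on semicolons, strip whitespace, skip empty statements) can be sketched without the lexer. The function split_statements below is a hypothetical stand-in, not py2neo's implementation: it only tracks single- and double-quoted strings so that a semicolon inside a string literal does not end a statement, and it ignores comments and backtick-quoted names.

```python
def split_statements(text):
    """Yield semicolon-delimited statements, stripped, skipping empty ones.

    Hypothetical helper for illustration only. Comments and
    backtick-quoted identifiers are not handled.
    """
    current = []
    quote = None     # the active quote character while inside a string
    escaped = False  # True when the previous character was a backslash
    for ch in text:
        if quote is not None:
            current.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            statement = "".join(current).strip()
            if statement:
                yield statement
            current = []
        else:
            current.append(ch)
    # Emit whatever follows the final semicolon, if anything.
    statement = "".join(current).strip()
    if statement:
        yield statement


text = "CREATE (:Person {name:'Alice'}); MATCH (a:Person {name:'Alice'}) RETURN id(a)"
for statement in split_statements(text):
    print(statement)
```

Like the documented method, the sketch is a generator, so wrap the call in list() when all statements are needed at once.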

name = 'Cypher'

Full name of the lexer, in human-readable form.

tokens = {'aliases': [('(AS)(\\s+)(`(?:``|[^`])+`)', <function bygroups.<locals>.callback>), ('(AS)(\\s+)([A-Za-z_][0-9A-Za-z_]*)', <function bygroups.<locals>.callback>)], 'comments': [('//', ('Comment', 'Single'), 'single-comments'), ('/\\*', ('Comment', 'Multiline'), 'multiline-comments')], 'constants': [('true\\b', ('Name', 'Constant')), ('null\\b', ('Name', 'Constant')), ('false\\b', ('Name', 'Constant'))], 'escape-commands': [('^\\s*\\/(?!\\/).*$', ('Comment', 'Single')), ('^\\s*:.*$', ('Comment', 'Single'))], 'expressions': ['procedures', 'functions', 'constants', 'aliases', 'variables', 'parameters', 'numbers'], 'functions': [('([A-Za-z_][0-9A-Za-z_\\.]*)(\\s*)(\\()', <function bygroups.<locals>.callback>, 'in-()')], 'in-()': ['strings', 'comments', 'keywords', ('[,|]', ('Punctuation',)), 'labels', 'operators', 'expressions', 'whitespace', ('\\(', ('Punctuation',), '#push'), ('\\)\\s*<?-+>?\\s*\\(', ('Punctuation',)), ('\\)\\s*<?-+\\s*\\[', ('Punctuation',), ('#pop', 'in-[]')), ('\\)', ('Punctuation',), '#pop'), ('\\[', ('Punctuation',), 'in-[]'), ('\\{', ('Punctuation',), 'in-{}')], 'in-[]': ['strings', 'comments', ('WHERE\\b', ('Keyword',)), ('[,|]', ('Punctuation',)), 'labels', 'operators', 'expressions', 'whitespace', ('\\(', ('Punctuation',), 'in-()'), ('\\[', ('Punctuation',), '#push'), ('\\]\\s*-+>?\\s*\\(', ('Punctuation',), ('#pop', 'in-()')), ('\\]', ('Punctuation',), '#pop'), ('\\{', ('Punctuation',), 'in-{}')], 'in-{}': ['strings', 'comments', ('[,:]', ('Punctuation',)), 'operators', 'expressions', 'whitespace', ('\\(', ('Punctuation',), 'in-()'), ('\\[', ('Punctuation',), 'in-[]'), ('\\{', ('Punctuation',), '#push'), ('\\}', ('Punctuation',), '#pop')], 'keywords': [('_PRAGMA\\b', ('Keyword',)), ('YIELD\\b', ('Keyword',)), ('WITH\\s+HEADERS\\b', ('Keyword',)), ('WITH\\s+DISTINCT\\b', ('Keyword',)), ('WITH\\b', ('Keyword',)), ('WHERE\\b', ('Keyword',)), ('USING\\s+SCAN\\b', ('Keyword',)), ('USING\\s+PERIODIC\\s+COMMIT\\b', ('Keyword',)), 
('USING\\s+JOIN\\b', ('Keyword',)), ('USING\\s+INDEX\\b', ('Keyword',)), ('USE\\b', ('Keyword',)), ('USER\\b', ('Keyword',)), ('USERS\\b', ('Keyword',)), ('UNWIND\\b', ('Keyword',)), ('UNION\\s+ALL\\b', ('Keyword',)), ('UNION\\b', ('Keyword',)), ('TRAVERSE\\b', ('Keyword',)), ('TRANSACTION\\b', ('Keyword',)), ('TO\\b', ('Keyword',)), ('TERMINATE\\b', ('Keyword',)), ('TARGET\\b', ('Keyword',)), ('STOP\\b', ('Keyword',)), ('START\\b', ('Keyword',)), ('SOURCE\\b', ('Keyword',)), ('SNAPSHOT\\b', ('Keyword',)), ('SKIP\\b', ('Keyword',)), ('SHOW\\b', ('Keyword',)), ('SET\\s+USER\\s+STATUS\\b', ('Keyword',)), ('SET\\s+STATUS\\s+SUSPENDED\\b', ('Keyword',)), ('SET\\s+STATUS\\s+ACTIVE\\b', ('Keyword',)), ('SET\\s+PASSWORD\\b', ('Keyword',)), ('SET\\s+PASSWORDS\\b', ('Keyword',)), ('SET\\b', ('Keyword',)), ('ROLE\\b', ('Keyword',)), ('ROLES\\b', ('Keyword',)), ('REVOKE\\b', ('Keyword',)), ('RETURN\\s+DISTINCT\\b', ('Keyword',)), ('RETURN\\b', ('Keyword',)), ('REMOVE\\b', ('Keyword',)), ('RELOCATE\\b', ('Keyword',)), ('RELATIONSHIP\\b', ('Keyword',)), ('RELATIONSHIPS\\b', ('Keyword',)), ('PROFILE\\b', ('Keyword',)), ('PRIVILEGE\\b', ('Keyword',)), ('PRIVILEGES\\b', ('Keyword',)), ('POPULATED\\s+ROLES\\b', ('Keyword',)), ('PERSIST\\b', ('Keyword',)), ('ORDER\\s+BY\\b', ('Keyword',)), ('OPTIONAL\\s+MATCH\\b', ('Keyword',)), ('ON\\s+MATCH\\s+SET\\b', ('Keyword',)), ('ON\\s+CREATE\\s+SET\\b', ('Keyword',)), ('ON\\b', ('Keyword',)), ('NODE\\b', ('Keyword',)), ('NODES\\b', ('Keyword',)), ('NEW\\s+TYPES\\b', ('Keyword',)), ('NEW\\s+PROPERTY\\s+NAMES\\b', ('Keyword',)), ('NEW\\s+LABELS\\b', ('Keyword',)), ('NAME\\b', ('Keyword',)), ('MERGE\\b', ('Keyword',)), ('MATCH\\b', ('Keyword',)), ('MANAGEMENT\\b', ('Keyword',)), ('LOAD\\s+CSV\\b', ('Keyword',)), ('LOAD\\b', ('Keyword',)), ('LIMIT\\b', ('Keyword',)), ('IS\\s+UNIQUE\\b', ('Keyword',)), ('IS\\s+NODE\\s+KEY\\b', ('Keyword',)), ('INTO\\b', ('Keyword',)), ('INDEX\\b', ('Keyword',)), ('IF\\s+NOT\\s+EXISTS\\b', ('Keyword',)), 
('IF\\s+EXISTS\\b', ('Keyword',)), ('GRAPH\\s+OF\\b', ('Keyword',)), ('GRAPH\\s+AT\\b', ('Keyword',)), ('GRAPH\\b', ('Keyword',)), ('GRANT\\b', ('Keyword',)), ('FROM\\b', ('Keyword',)), ('FOREACH\\b', ('Keyword',)), ('FIELDTERMINATOR\\b', ('Keyword',)), ('EXPLAIN\\b', ('Keyword',)), ('ELEMENTS\\b', ('Keyword',)), ('DROP\\b', ('Keyword',)), ('DO\\b', ('Keyword',)), ('DETACH\\s+DELETE\\b', ('Keyword',)), ('DESC\\b', ('Keyword',)), ('DESCENDING\\b', ('Keyword',)), ('DENY\\b', ('Keyword',)), ('DELETE\\b', ('Keyword',)), ('DEFAULT\\s+DATABASE\\b', ('Keyword',)), ('DBMS\\b', ('Keyword',)), ('DATABASE\\b', ('Keyword',)), ('DATABASES\\b', ('Keyword',)), ('CYPHER\\b', ('Keyword',)), ('CURRENT\\s+USER\\b', ('Keyword',)), ('CREATE\\s+UNIQUE\\b', ('Keyword',)), ('CREATE\\s+OR\\s+REPLACE\\b', ('Keyword',)), ('CREATE\\b', ('Keyword',)), ('CONSTRAINT\\b', ('Keyword',)), ('CHANGE\\s+REQUIRED\\b', ('Keyword',)), ('CHANGE\\s+NOT\\s+REQUIRED\\b', ('Keyword',)), ('CALL\\b', ('Keyword',)), ('AS\\s+COPY\\s+OF\\b', ('Keyword',)), ('AS\\b', ('Keyword',)), ('ASSIGN\\b', ('Keyword',)), ('ASSERT\\s+EXISTS\\b', ('Keyword',)), ('ASSERT\\b', ('Keyword',)), ('ASC\\b', ('Keyword',)), ('ASCENDING\\b', ('Keyword',)), ('ALTER\\b', ('Keyword',)), ('ALL\\b', ('Keyword',)), ('ACCESS\\b', ('Keyword',)), ('>>\\b', ('Keyword',))], 'labels': [('(:)(\\s*)(`(?:``|[^`])+`)', <function bygroups.<locals>.callback>), ('(:)(\\s*)([A-Za-z_][0-9A-Za-z_]*)', <function bygroups.<locals>.callback>)], 'multiline-comments': [('/\\*', ('Comment', 'Multiline'), 'multiline-comments'), ('\\*/', ('Comment', 'Multiline'), '#pop'), ('[^/*]+', ('Comment', 'Multiline')), ('[/*]', ('Comment', 'Multiline'))], 'numbers': [('[0-9]*\\.[0-9]*(e[+-]?[0-9]+)?', ('Literal', 'Number', 'Float')), ('[0-9]+e[+-]?[0-9]+', ('Literal', 'Number', 'Float')), ('[0-9]+', ('Literal', 'Number', 'Integer'))], 'operators': [('XOR\\b', ('Operator',)), ('WHEN\\b', ('Operator',)), ('THEN\\b', ('Operator',)), ('STARTS\\s+WITH\\b', ('Operator',)), ('OR\\b', 
('Operator',)), ('NOT\\b', ('Operator',)), ('IS\\s+NULL\\b', ('Operator',)), ('IS\\s+NOT\\s+NULL\\b', ('Operator',)), ('IN\\b', ('Operator',)), ('END\\b', ('Operator',)), ('ENDS\\s+WITH\\b', ('Operator',)), ('ELSE\\b', ('Operator',)), ('DISTINCT\\b', ('Operator',)), ('CONTAINS\\b', ('Operator',)), ('CASE\\b', ('Operator',)), ('AND\\b', ('Operator',)), ('\\^', ('Operator',)), ('\\>\\=', ('Operator',)), ('\\>', ('Operator',)), ('\\=\\~', ('Operator',)), ('\\=', ('Operator',)), ('\\<\\>', ('Operator',)), ('\\<\\=', ('Operator',)), ('\\<', ('Operator',)), ('\\/', ('Operator',)), ('\\.', ('Operator',)), ('\\-', ('Operator',)), ('\\+\\=', ('Operator',)), ('\\+', ('Operator',)), ('\\*', ('Operator',)), ('\\%', ('Operator',)), ('\\!\\=', ('Operator',))], 'parameters': [('(\\$)(`(?:``|[^`])+`)', <function bygroups.<locals>.callback>), ('(\\$)([A-Za-z_][0-9A-Za-z_]*)', <function bygroups.<locals>.callback>)], 'procedures': [('(CALL)(\\s+)([A-Za-z_][0-9A-Za-z_\\.]*)', <function bygroups.<locals>.callback>)], 'pseudo-keywords': [('ROLLBACK\\b', ('Keyword',)), ('COMMIT\\b', ('Keyword',)), ('BEGIN\\b', ('Keyword',))], 'root': ['strings', 'comments', 'keywords', 'pseudo-keywords', 'escape-commands', ('[,;]', ('Punctuation',)), 'labels', 'operators', 'expressions', 'whitespace', ('\\(', ('Punctuation',), 'in-()'), ('\\[', ('Punctuation',), 'in-[]'), ('\\{', ('Punctuation',), 'in-{}')], 'single-comments': [('.*$', ('Comment', 'Single'), '#pop')], 'strings': [('\'(?:\\\\[bfnrt\\"\'\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}|[^\\\\\'])*\'', ('Literal', 'String')), ('"(?:\\\\[bfnrt\\\'"\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}|[^\\\\"])*"', ('Literal', 'String'))], 'variables': [('`(?:``|[^`])+`', ('Name', 'Variable')), ('[A-Za-z_][0-9A-Za-z_]*', ('Name', 'Variable'))], 'whitespace': [('\\s+', ('Text', 'Whitespace'))]}

At all times there is a stack of states. Initially, the stack contains a single state, ‘root’. The top of the stack is called “the current state”.

Dict of {'state': [(regex, tokentype, new_state), ...], ...}

new_state can be omitted to signify no state transition. If new_state is a string, it is pushed on the stack. This ensures that the new current state is new_state. If new_state is a tuple of strings, all of those strings are pushed on the stack and the current state will be the last element of the tuple. new_state can also be combined('state1', 'state2', ...) to signify a new, anonymous state combined from the rules of two or more existing ones. Furthermore, it can be ‘#pop’ to signify going back one step in the state stack, or ‘#push’ to push the current state on the stack again. Note that if you push while in a combined state, the combined state itself is pushed, and not only the state in which the rule is defined.
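
The transition rules described above can be simulated with a plain list. The helper apply_transition below is a hypothetical sketch, not Pygments code: it covers string, tuple, ‘#pop’ and ‘#push’ transitions but leaves out combined(), and it assumes the last remaining state is never popped. The state names are taken from the tokens listing.

```python
def apply_transition(stack, new_state=None):
    """Return a copy of the state stack after applying one transition.

    Hypothetical helper for illustration; combined() is not modelled.
    """
    stack = list(stack)
    if new_state is None:
        return stack  # omitted new_state: no transition
    targets = new_state if isinstance(new_state, tuple) else (new_state,)
    for target in targets:
        if target == "#pop":
            # Assumption of this sketch: always keep at least one state.
            if len(stack) > 1:
                stack.pop()
        elif target == "#push":
            stack.append(stack[-1])  # push the current state again
        else:
            stack.append(target)
    return stack


# Walk through the pattern (a)-[:KNOWS]->(b) using states from `tokens`.
stack = ["root"]
stack = apply_transition(stack, "in-()")            # '(' seen in root
print(stack)
stack = apply_transition(stack, ("#pop", "in-[]"))  # ')-[' seen in in-()
print(stack)
stack = apply_transition(stack, ("#pop", "in-()"))  # ']->(' seen in in-[]
print(stack)
stack = apply_transition(stack, "#pop")             # ')' seen in in-()
print(stack)
```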

The tuple can also be replaced with include('state'), in which case the rules from the state named by the string are included in the current one.
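
To show what including a state amounts to, the sketch below flattens a small state table by substituting each included state name with that state's rules. Here a bare string stands in for include('state'), mirroring how included states appear in the tokens listing above. The function expand_includes and the reduced STATES table are hypothetical, and token types are written as plain strings for brevity.

```python
def expand_includes(states, name, _seen=()):
    """Return the rules of a state with included states expanded in place.

    Hypothetical helper for illustration; not the Pygments implementation.
    """
    if name in _seen:
        raise ValueError("circular include: %r" % (name,))
    rules = []
    for entry in states[name]:
        if isinstance(entry, str):
            # A bare string names another state whose rules are spliced in.
            rules.extend(expand_includes(states, entry, _seen + (name,)))
        else:
            rules.append(entry)
    return rules


# A reduced state table loosely based on the `tokens` attribute.
STATES = {
    "root": ["constants", "whitespace", (r"[,;]", "Punctuation")],
    "constants": [(r"true\b", "Name.Constant"), (r"false\b", "Name.Constant")],
    "whitespace": [(r"\s+", "Text.Whitespace")],
}

for rule in expand_includes(STATES, "root"):
    print(rule)
```

Rule order is preserved, which matters because the first matching rule in the current state wins.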