py2neo.cypher.lexer – Cypher Lexer
This module contains a Cypher language lexer based on the Pygments lexer framework. This can be used to parse statements and expressions for the Cypher variant available in Neo4j 3.4.
To parse a Cypher statement, create a CypherLexer or select one by the name py2neo.cypher, then invoke the get_tokens() method:
>>> from pygments.lexers import get_lexer_by_name
>>> lexer = get_lexer_by_name("py2neo.cypher")
>>> list(lexer.get_tokens("MATCH (a:Person)-[:KNOWS]->(b) RETURN a"))
[(Token.Keyword, 'MATCH'),
(Token.Text.Whitespace, ' '),
(Token.Punctuation, '('),
(Token.Name.Variable, 'a'),
(Token.Punctuation, ':'),
(Token.Name.Label, 'Person'),
(Token.Punctuation, ')-['),
(Token.Punctuation, ':'),
(Token.Name.Label, 'KNOWS'),
(Token.Punctuation, ']->('),
(Token.Name.Variable, 'b'),
(Token.Punctuation, ')'),
(Token.Text.Whitespace, ' '),
(Token.Keyword, 'RETURN'),
(Token.Text.Whitespace, ' '),
(Token.Name.Variable, 'a'),
(Token.Text.Whitespace, '\n')]
To split multiple semicolon-separated statements within a single string, use the get_statements() method instead:
>>> list(lexer.get_statements("CREATE (:Person {name:'Alice'}); MATCH (a:Person {name:'Alice'}) RETURN id(a)"))
["CREATE (:Person {name:'Alice'})",
"MATCH (a:Person {name:'Alice'}) RETURN id(a)"]
- class py2neo.cypher.lexer.CypherLexer(*args, **kwds)
Pygments lexer for the Cypher Query Language as available in Neo4j 4.2.
- aliases = ['cypher', 'py2neo.cypher']
A list of short, unique identifiers that can be used to look up the lexer from a list, e.g., using get_lexer_by_name().
- filenames = ['*.cypher', '*.cyp']
A list of fnmatch patterns that match filenames which contain content for this lexer. The patterns in this list should be unique among all lexers.
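As a rough illustration of how such patterns behave, the standard-library fnmatch module can be applied to the two patterns listed above. The helper name matches_cypher_file is hypothetical and is not part of py2neo or Pygments; Pygments' own filename lookup may differ in detail.

```python
from fnmatch import fnmatch

# Patterns copied from the filenames attribute shown above.
FILENAME_PATTERNS = ["*.cypher", "*.cyp"]


def matches_cypher_file(filename):
    """Hypothetical helper: True if any pattern matches the filename."""
    return any(fnmatch(filename, pattern) for pattern in FILENAME_PATTERNS)


print(matches_cypher_file("import.cypher"))
print(matches_cypher_file("query.cyp"))
print(matches_cypher_file("notes.txt"))
```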
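- flags = 42
Flags for compiling the regular expressions. The value 42 corresponds to re.IGNORECASE | re.MULTILINE | re.UNICODE; the Pygments base class defaults to MULTILINE alone.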
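The integer can be decoded with the standard-library re module. The sketch below only checks the arithmetic and shows the practical effect of case-insensitive matching on a keyword-style pattern; the pattern is written out by hand here and is not taken from the lexer object.

```python
import re

FLAGS = 42  # value shown for the flags attribute above

# 42 is the bitwise OR of three standard-library flags (2 | 8 | 32).
decoded = re.IGNORECASE | re.MULTILINE | re.UNICODE
print(int(decoded) == FLAGS)

# With IGNORECASE set, a keyword-style pattern matches any casing,
# while the trailing \b still rejects longer words.
keyword = re.compile(r"MATCH\b", FLAGS)
print(bool(keyword.match("match (a) RETURN a")))
print(bool(keyword.match("MATCHES")))
```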
- get_statements(text)
Split the text into statements delimited by semicolons and yield each statement in turn. Yielded statements are stripped of both leading and trailing whitespace. Empty statements are skipped.
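The stripping and empty-statement rules can be illustrated with a small stand-alone generator. This is only a sketch and is not py2neo's implementation: split_statements is a hypothetical name, and it assumes simplified quoting in which semicolons inside single- or double-quoted strings are not delimiters. Comments and backtick-quoted names are not handled.

```python
def split_statements(text):
    """Illustrative stand-in for semicolon splitting; not py2neo's code."""
    fragment = []
    quote = None  # quote character of the string we are inside, if any
    i = 0
    while i < len(text):
        ch = text[i]
        if quote:
            fragment.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Keep the escaped character so it cannot close the string.
                fragment.append(text[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            fragment.append(ch)
        elif ch == ";":
            statement = "".join(fragment).strip()
            if statement:  # empty statements are skipped
                yield statement
            fragment = []
        else:
            fragment.append(ch)
        i += 1
    # Whatever follows the last semicolon is a statement too.
    statement = "".join(fragment).strip()
    if statement:
        yield statement


for statement in split_statements("  RETURN 1 ;; ;  RETURN 'a;b'  "):
    print(repr(statement))
```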
- name = 'Cypher'
Full name of the lexer, in human-readable form.
- tokens = {'aliases': [('(AS)(\\s+)(`(?:``|[^`])+`)', <function bygroups.<locals>.callback>), ('(AS)(\\s+)([A-Za-z_][0-9A-Za-z_]*)', <function bygroups.<locals>.callback>)], 'comments': [('//', ('Comment', 'Single'), 'single-comments'), ('/\\*', ('Comment', 'Multiline'), 'multiline-comments')], 'constants': [('true\\b', ('Name', 'Constant')), ('null\\b', ('Name', 'Constant')), ('false\\b', ('Name', 'Constant'))], 'escape-commands': [('^\\s*\\/(?!\\/).*$', ('Comment', 'Single')), ('^\\s*:.*$', ('Comment', 'Single'))], 'expressions': ['procedures', 'functions', 'constants', 'aliases', 'variables', 'parameters', 'numbers'], 'functions': [('([A-Za-z_][0-9A-Za-z_\\.]*)(\\s*)(\\()', <function bygroups.<locals>.callback>, 'in-()')], 'in-()': ['strings', 'comments', 'keywords', ('[,|]', ('Punctuation',)), 'labels', 'operators', 'expressions', 'whitespace', ('\\(', ('Punctuation',), '#push'), ('\\)\\s*<?-+>?\\s*\\(', ('Punctuation',)), ('\\)\\s*<?-+\\s*\\[', ('Punctuation',), ('#pop', 'in-[]')), ('\\)', ('Punctuation',), '#pop'), ('\\[', ('Punctuation',), 'in-[]'), ('\\{', ('Punctuation',), 'in-{}')], 'in-[]': ['strings', 'comments', ('WHERE\\b', ('Keyword',)), ('[,|]', ('Punctuation',)), 'labels', 'operators', 'expressions', 'whitespace', ('\\(', ('Punctuation',), 'in-()'), ('\\[', ('Punctuation',), '#push'), ('\\]\\s*-+>?\\s*\\(', ('Punctuation',), ('#pop', 'in-()')), ('\\]', ('Punctuation',), '#pop'), ('\\{', ('Punctuation',), 'in-{}')], 'in-{}': ['strings', 'comments', ('[,:]', ('Punctuation',)), 'operators', 'expressions', 'whitespace', ('\\(', ('Punctuation',), 'in-()'), ('\\[', ('Punctuation',), 'in-[]'), ('\\{', ('Punctuation',), '#push'), ('\\}', ('Punctuation',), '#pop')], 'keywords': [('_PRAGMA\\b', ('Keyword',)), ('YIELD\\b', ('Keyword',)), ('WITH\\s+HEADERS\\b', ('Keyword',)), ('WITH\\s+DISTINCT\\b', ('Keyword',)), ('WITH\\b', ('Keyword',)), ('WHERE\\b', ('Keyword',)), ('USING\\s+SCAN\\b', ('Keyword',)), ('USING\\s+PERIODIC\\s+COMMIT\\b', ('Keyword',)), 
('USING\\s+JOIN\\b', ('Keyword',)), ('USING\\s+INDEX\\b', ('Keyword',)), ('USE\\b', ('Keyword',)), ('USER\\b', ('Keyword',)), ('USERS\\b', ('Keyword',)), ('UNWIND\\b', ('Keyword',)), ('UNION\\s+ALL\\b', ('Keyword',)), ('UNION\\b', ('Keyword',)), ('TRAVERSE\\b', ('Keyword',)), ('TRANSACTION\\b', ('Keyword',)), ('TO\\b', ('Keyword',)), ('TERMINATE\\b', ('Keyword',)), ('TARGET\\b', ('Keyword',)), ('STOP\\b', ('Keyword',)), ('START\\b', ('Keyword',)), ('SOURCE\\b', ('Keyword',)), ('SNAPSHOT\\b', ('Keyword',)), ('SKIP\\b', ('Keyword',)), ('SHOW\\b', ('Keyword',)), ('SET\\s+USER\\s+STATUS\\b', ('Keyword',)), ('SET\\s+STATUS\\s+SUSPENDED\\b', ('Keyword',)), ('SET\\s+STATUS\\s+ACTIVE\\b', ('Keyword',)), ('SET\\s+PASSWORD\\b', ('Keyword',)), ('SET\\s+PASSWORDS\\b', ('Keyword',)), ('SET\\b', ('Keyword',)), ('ROLE\\b', ('Keyword',)), ('ROLES\\b', ('Keyword',)), ('REVOKE\\b', ('Keyword',)), ('RETURN\\s+DISTINCT\\b', ('Keyword',)), ('RETURN\\b', ('Keyword',)), ('REMOVE\\b', ('Keyword',)), ('RELOCATE\\b', ('Keyword',)), ('RELATIONSHIP\\b', ('Keyword',)), ('RELATIONSHIPS\\b', ('Keyword',)), ('PROFILE\\b', ('Keyword',)), ('PRIVILEGE\\b', ('Keyword',)), ('PRIVILEGES\\b', ('Keyword',)), ('POPULATED\\s+ROLES\\b', ('Keyword',)), ('PERSIST\\b', ('Keyword',)), ('ORDER\\s+BY\\b', ('Keyword',)), ('OPTIONAL\\s+MATCH\\b', ('Keyword',)), ('ON\\s+MATCH\\s+SET\\b', ('Keyword',)), ('ON\\s+CREATE\\s+SET\\b', ('Keyword',)), ('ON\\b', ('Keyword',)), ('NODE\\b', ('Keyword',)), ('NODES\\b', ('Keyword',)), ('NEW\\s+TYPES\\b', ('Keyword',)), ('NEW\\s+PROPERTY\\s+NAMES\\b', ('Keyword',)), ('NEW\\s+LABELS\\b', ('Keyword',)), ('NAME\\b', ('Keyword',)), ('MERGE\\b', ('Keyword',)), ('MATCH\\b', ('Keyword',)), ('MANAGEMENT\\b', ('Keyword',)), ('LOAD\\s+CSV\\b', ('Keyword',)), ('LOAD\\b', ('Keyword',)), ('LIMIT\\b', ('Keyword',)), ('IS\\s+UNIQUE\\b', ('Keyword',)), ('IS\\s+NODE\\s+KEY\\b', ('Keyword',)), ('INTO\\b', ('Keyword',)), ('INDEX\\b', ('Keyword',)), ('IF\\s+NOT\\s+EXISTS\\b', ('Keyword',)), 
('IF\\s+EXISTS\\b', ('Keyword',)), ('GRAPH\\s+OF\\b', ('Keyword',)), ('GRAPH\\s+AT\\b', ('Keyword',)), ('GRAPH\\b', ('Keyword',)), ('GRANT\\b', ('Keyword',)), ('FROM\\b', ('Keyword',)), ('FOREACH\\b', ('Keyword',)), ('FIELDTERMINATOR\\b', ('Keyword',)), ('EXPLAIN\\b', ('Keyword',)), ('ELEMENTS\\b', ('Keyword',)), ('DROP\\b', ('Keyword',)), ('DO\\b', ('Keyword',)), ('DETACH\\s+DELETE\\b', ('Keyword',)), ('DESC\\b', ('Keyword',)), ('DESCENDING\\b', ('Keyword',)), ('DENY\\b', ('Keyword',)), ('DELETE\\b', ('Keyword',)), ('DEFAULT\\s+DATABASE\\b', ('Keyword',)), ('DBMS\\b', ('Keyword',)), ('DATABASE\\b', ('Keyword',)), ('DATABASES\\b', ('Keyword',)), ('CYPHER\\b', ('Keyword',)), ('CURRENT\\s+USER\\b', ('Keyword',)), ('CREATE\\s+UNIQUE\\b', ('Keyword',)), ('CREATE\\s+OR\\s+REPLACE\\b', ('Keyword',)), ('CREATE\\b', ('Keyword',)), ('CONSTRAINT\\b', ('Keyword',)), ('CHANGE\\s+REQUIRED\\b', ('Keyword',)), ('CHANGE\\s+NOT\\s+REQUIRED\\b', ('Keyword',)), ('CALL\\b', ('Keyword',)), ('AS\\s+COPY\\s+OF\\b', ('Keyword',)), ('AS\\b', ('Keyword',)), ('ASSIGN\\b', ('Keyword',)), ('ASSERT\\s+EXISTS\\b', ('Keyword',)), ('ASSERT\\b', ('Keyword',)), ('ASC\\b', ('Keyword',)), ('ASCENDING\\b', ('Keyword',)), ('ALTER\\b', ('Keyword',)), ('ALL\\b', ('Keyword',)), ('ACCESS\\b', ('Keyword',)), ('>>\\b', ('Keyword',))], 'labels': [('(:)(\\s*)(`(?:``|[^`])+`)', <function bygroups.<locals>.callback>), ('(:)(\\s*)([A-Za-z_][0-9A-Za-z_]*)', <function bygroups.<locals>.callback>)], 'multiline-comments': [('/\\*', ('Comment', 'Multiline'), 'multiline-comments'), ('\\*/', ('Comment', 'Multiline'), '#pop'), ('[^/*]+', ('Comment', 'Multiline')), ('[/*]', ('Comment', 'Multiline'))], 'numbers': [('[0-9]*\\.[0-9]*(e[+-]?[0-9]+)?', ('Literal', 'Number', 'Float')), ('[0-9]+e[+-]?[0-9]+', ('Literal', 'Number', 'Float')), ('[0-9]+', ('Literal', 'Number', 'Integer'))], 'operators': [('XOR\\b', ('Operator',)), ('WHEN\\b', ('Operator',)), ('THEN\\b', ('Operator',)), ('STARTS\\s+WITH\\b', ('Operator',)), ('OR\\b', 
('Operator',)), ('NOT\\b', ('Operator',)), ('IS\\s+NULL\\b', ('Operator',)), ('IS\\s+NOT\\s+NULL\\b', ('Operator',)), ('IN\\b', ('Operator',)), ('END\\b', ('Operator',)), ('ENDS\\s+WITH\\b', ('Operator',)), ('ELSE\\b', ('Operator',)), ('DISTINCT\\b', ('Operator',)), ('CONTAINS\\b', ('Operator',)), ('CASE\\b', ('Operator',)), ('AND\\b', ('Operator',)), ('\\^', ('Operator',)), ('\\>\\=', ('Operator',)), ('\\>', ('Operator',)), ('\\=\\~', ('Operator',)), ('\\=', ('Operator',)), ('\\<\\>', ('Operator',)), ('\\<\\=', ('Operator',)), ('\\<', ('Operator',)), ('\\/', ('Operator',)), ('\\.', ('Operator',)), ('\\-', ('Operator',)), ('\\+\\=', ('Operator',)), ('\\+', ('Operator',)), ('\\*', ('Operator',)), ('\\%', ('Operator',)), ('\\!\\=', ('Operator',))], 'parameters': [('(\\$)(`(?:``|[^`])+`)', <function bygroups.<locals>.callback>), ('(\\$)([A-Za-z_][0-9A-Za-z_]*)', <function bygroups.<locals>.callback>)], 'procedures': [('(CALL)(\\s+)([A-Za-z_][0-9A-Za-z_\\.]*)', <function bygroups.<locals>.callback>)], 'pseudo-keywords': [('ROLLBACK\\b', ('Keyword',)), ('COMMIT\\b', ('Keyword',)), ('BEGIN\\b', ('Keyword',))], 'root': ['strings', 'comments', 'keywords', 'pseudo-keywords', 'escape-commands', ('[,;]', ('Punctuation',)), 'labels', 'operators', 'expressions', 'whitespace', ('\\(', ('Punctuation',), 'in-()'), ('\\[', ('Punctuation',), 'in-[]'), ('\\{', ('Punctuation',), 'in-{}')], 'single-comments': [('.*$', ('Comment', 'Single'), '#pop')], 'strings': [('\'(?:\\\\[bfnrt\\"\'\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}|[^\\\\\'])*\'', ('Literal', 'String')), ('"(?:\\\\[bfnrt\\\'"\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}|[^\\\\"])*"', ('Literal', 'String'))], 'variables': [('`(?:``|[^`])+`', ('Name', 'Variable')), ('[A-Za-z_][0-9A-Za-z_]*', ('Name', 'Variable'))], 'whitespace': [('\\s+', ('Text', 'Whitespace'))]}¶
At all times there is a stack of states. Initially, the stack contains a single state ‘root’. The top of the stack is called “the current state”.
Dict of {'state': [(regex, tokentype, new_state), ...], ...}
new_state can be omitted to signify no state transition. If new_state is a string, it is pushed on the stack. This ensures the new current state is new_state. If new_state is a tuple of strings, all of those strings are pushed on the stack and the current state will be the last element of the tuple. new_state can also be combined('state1', 'state2', ...) to signify a new, anonymous state combined from the rules of two or more existing ones. Furthermore, it can be ‘#pop’ to signify going back one step in the state stack, or ‘#push’ to push the current state on the stack again. Note that if you push while in a combined state, the combined state itself is pushed, and not only the state in which the rule is defined.
The tuple can also be replaced with include('state'), in which case the rules from the state named by the string are included in the current one.
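The stack behaviour described above can be sketched with a toy rule table and a few lines of standard-library Python. This is not the Pygments engine: RULES, toy_tokens, and the state and token names are made up for illustration, and combined() and include() are left out. It covers only an omitted new_state, a string, a tuple of strings, ‘#pop’ and ‘#push’.

```python
import re

# Toy table in the documented shape:
# {'state': [(regex, tokentype, new_state), ...], ...}
RULES = {
    "root": [
        (r"\(", "Punctuation", "parens"),   # string: push 'parens'
        (r"[A-Za-z_]+", "Name"),            # omitted: no transition
        (r"\s+", "Whitespace"),
    ],
    "parens": [
        (r"\(", "Punctuation", "#push"),    # push the current state again
        (r"\)", "Punctuation", "#pop"),     # go back one step
        (r"[A-Za-z_]+", "Name"),
        (r"\s+", "Whitespace"),
    ],
}


def toy_tokens(text):
    """Yield (tokentype, value, stack depth after the rule) triples."""
    stack = ["root"]
    pos = 0
    while pos < len(text):
        for regex, tokentype, *rest in RULES[stack[-1]]:
            match = re.compile(regex).match(text, pos)
            if match is None:
                continue
            new_state = rest[0] if rest else None
            if new_state == "#pop":
                if len(stack) > 1:
                    stack.pop()
            elif new_state == "#push":
                stack.append(stack[-1])
            elif isinstance(new_state, tuple):
                stack.extend(new_state)     # last element becomes current
            elif new_state is not None:
                stack.append(new_state)
            pos = match.end()
            yield tokentype, match.group(), len(stack)
            break
        else:
            # No rule matched: emit the character as an error and move on.
            yield "Error", text[pos], len(stack)
            pos += 1


for token in toy_tokens("f(a(b))"):
    print(token)
```

The third element of each triple makes the push and pop steps visible: the depth rises at each opening parenthesis and falls back to 1 once the last closing parenthesis has been consumed.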