Dana Grammar¶
⚠️ IMPORTANT FOR AI CODE GENERATORS: Always use colon notation for explicit scopes:
private:x
,public:x
,system:x
,local:x
NEVER use dot notation:private.x
,public.x
, etc. Prefer using unscoped variables (auto-scoped to local) instead of explicitprivate:
scope unless private scope is specifically needed.
Files:
- opendxa/dana/language/dana_grammar.lark
: The Lark grammar file.
The Dana Parser uses the Lark parser to parse the Dana source code into a parse tree.
This document describes the formal grammar definition for the Dana language, as implemented in the Lark grammar file. The grammar defines the syntax rules for parsing Dana source code into a parse tree, which is then transformed into an AST.
Overview¶
The Dana grammar is written in Lark EBNF syntax. It specifies the structure of valid Dana programs, including statements, expressions, literals, and control flow constructs. The grammar is designed to be readable, extensible, and to support indentation-based blocks.
Dana vs. Python: Key Differences¶
-
Scope Prefixes: Dana allows explicit scope prefixes for variables and functions (e.g.,
private:x
,public:y
). Python uses naming conventions and modules for visibility, not explicit prefixes. -
Null Value: Dana uses
None
(capitalized, like Python), but it is a literal in the grammar, not a reserved keyword. -
Comments: Dana only supports single-line comments with
#
. Python also supports docstrings ('''
or"""
), which Dana does not. -
F-Strings: Dana supports f-strings with embedded expressions (e.g.,
f"Value: {x+1}"
), but the implementation and parsing are defined by a formal grammar. Some advanced Python f-string features (like format specifiers) may not be supported. -
Operator Precedence: Dana's operator precedence is defined explicitly in its grammar. While similar to Python, there may be subtle differences—check the grammar if you rely on complex expressions.
-
Comments in Parse Tree: In Dana, comments are ignored by the parser and do not appear in the parse tree. In Python, comments are ignored by the interpreter, but some tools can access them via the AST.
-
Formal Grammar: Dana is defined by a strict formal grammar (Lark), which may restrict or clarify certain constructs more than Python's more flexible syntax.
Main Rules¶
- start: Entry point for parsing; matches a complete Dana program.
- program: Sequence of statements.
- statement: Assignment, conditional, while loop, function call, or newline.
- assignment: Variable assignment (
x = expr
). - conditional: If/else block with indented body.
- while_loop: While loop with indented body.
- function_call: Function or core function call.
- bare_identifier: Standalone identifier.
- expression: Supports logical, comparison, arithmetic, and unary operations.
- literal: String, number, boolean, or null.
- identifier: Variable or function name, with optional scope prefix.
Grammar Structure Diagram¶
graph TD
Start["start"] --> Program["program"]
Program --> Statements
subgraph Statements
direction TB
Assignment
Conditional
WhileLoop
FunctionCall
BareIdentifier
ETC[...]
Conditional --> Statement
WhileLoop --> Statement
Assignment --> Expression
Conditional --> Expression
WhileLoop --> Expression
FunctionCall --> Expression
BareIdentifier --> Identifier
end
Statements --> Expressions
subgraph Expressions
direction TB
Expression
Identifier
Literal
ETC2[...]
Expression --> Identifier
Expression --> Literal
Identifier --> ETC2
Literal --> ETC2
end
Special Syntax and Features¶
- Indentation: Uses
INDENT
andDEDENT
tokens for block structure (handled by the parser's indenter). - Comments: Supports C-style (
/* ... */
) and C++-style (// ...
) comments. - Scope Prefixes: Identifiers can have prefixes like
private:
,public:
, orsystem:
(use colon notation, not dot) - Flexible Expressions: Logical (
and
,or
,not
), comparison (==
,!=
,<
,>
, etc.), arithmetic (+
,-
,*
,/
,%
), and function calls. - Literals: Strings, numbers, booleans, and null values.
Extensibility¶
The grammar is designed to be extensible. New statements, expressions, or literal types can be added by extending the grammar file and updating the parser and transformers accordingly.
Formal Grammar (Minimal EBNF)¶
This EBNF is kept in sync with the Lark grammar and parser implementation in
opendxa/dana/language/dana_grammar.lark
.
program ::= statement+
statement ::= assignment | function_call | conditional | while_loop | for_loop | break_stmt | continue_stmt | function_def | bare_identifier | comment | NEWLINE
assignment ::= identifier '=' expression
expression ::= literal | identifier | function_call | binary_expression
literal ::= string | number | boolean | null | fstring | list | dict | set
function_call ::= identifier '(' [expression (',' expression)*] ')'
conditional ::= 'if' expression ':' NEWLINE INDENT program DEDENT [ 'else:' NEWLINE INDENT program DEDENT ]
while_loop ::= 'while' expression ':' NEWLINE INDENT program DEDENT
for_loop ::= 'for' identifier 'in' expression ':' NEWLINE INDENT program DEDENT
break_stmt ::= 'break'
continue_stmt ::= 'continue'
function_def ::= 'def' identifier '(' [identifier (',' identifier)*] ')' ':' NEWLINE INDENT program DEDENT
bare_identifier ::= identifier
comment ::= ('//' | '#') .*
identifier ::= [a-zA-Z_][a-zA-Z0-9_.]*
list ::= '[' expression (',' expression)* ']'
fstring ::= 'f' ( '"' <any chars except unescaped '"'> '"' | '\'' <any chars except unescaped '\''> '\'' )
fstring_parts ::= (fstring_text | fstring_expr)*
fstring_expr ::= '{' expression '}'
fstring_text ::= <any text not containing '{' or '}'>
fstring_start ::= '"' | '\''
fstring_end ::= fstring_start
dict ::= '{' [key_value_pair (',' key_value_pair)*] '}'
key_value_pair ::= expression ':' expression
set ::= '{' expression (',' expression)* '}'
binary_expression ::= expression binary_op expression
binary_op ::= '==' | '!=' | '<' | '>' | '<=' | '>=' | 'and' | 'or' | 'in' | '+' | '-' | '*' | '/'
string ::= '"' <any chars except unescaped '"'> '"' | '\'' <any chars except unescaped '\''> '\''
- All blocks must be indented consistently
- One instruction per line
- F-strings support expressions inside curly braces:
f"Value: {x+1}"
and can contain multiple text and expression parts. - Built-in functions like
len()
are supported via transformer logic and do not require specific grammar rules. - The Lark grammar is more explicit about operator precedence (logical, comparison, arithmetic, unary) than this EBNF, which is more abstract.
- In the Lark grammar,
NEWLINE
is a possible statement, allowing for blank lines in code. - In this EBNF, comments are treated as statements and could appear in the parse tree. In the actual Lark grammar, comments (lines starting with
#
) are ignored and do not appear in the parse tree at all. - Both single (
'...'
) and double ("..."
) quotes are accepted for string literals and f-strings, just like in Python.