The Edge Specification

All Edge, No Drag.

This document defines Edge, a domain specific language for the Ethereum Virtual Machine (EVM).

Edge is a high level, strongly statically typed, multi-paradigm language. It provides:

A thin layer of abstraction over the EVM's instruction set architecture (ISA).
An extensible polymorphic type system with subtyping.
First class support for modules and code reuse.
Compile time code execution to fine-tune the compiler's input.

Edge's syntax is similar to Rust and Zig where intuitive. However, the language is not designed to be a general purpose language with EVM features as an afterthought. Rather, it is designed to extend the EVM instruction set with a reasonable type system and syntax sugar over universally understood programming constructs.

Notation

This specification uses a grammar similar to Extended Backus-Naur Form (EBNF) with the following rules.

Non-terminal tokens are wrapped in angle brackets <ident>.
Terminal tokens are wrapped in double quotes "const".
Optional items are wrapped in brackets ["mut"].
Sequences of zero or more items are wrapped in parentheses and suffixed with a star ("," <ident>)*.
Sequences of one or more items are wrapped in parentheses and suffixed with a plus (<ident>)+.

In contrast to EBNF, we define a rule that all items are non-atomic by default: arbitrary whitespace characters (\n, \t, and \r) may surround any token unless the sequence is wrapped in curly braces, as in { "0x" (<hex_digit>)* }.

Generally, we use long-form names for clarity of each token. However, common tokens are abbreviated and defined as follows:

"ident": "identifier"
"expr": "expression"
"stmt": "statement"

Disambiguation

This section contains context that may be required throughout the specification.

Return vs Return™️

The word "return" refers to two different behaviors: values returned from expressions and the halting return opcode.

When "return" is used, this refers to the values returned from expressions, that is to say the values left on the stack, if any.

When "halting return" is used, this refers to the EVM opcode return that halts execution and returns a value from a slice of memory to the caller of the current execution context.

Syntax

Conceptually, all EVM contracts are single-entry point executables and at compile time, Edge programs are no different.

Other languages have used primarily the contract-is-an-object paradigm, mapping fields to storage layouts and methods to "external functions" that may read and write the storage. Inheritance enables interface constraints, code reuse, and a reasonable model for message passing that relates to the EVM external call model.

However, this is limited in scope. Conceptually, the contract object paradigm groups stateful data and functionality, limiting the deployability to the product type. Extending the deployability to arbitrary data types allows for contracts to be functions, type unions, product types, and more. While most of these are not particularly useful, this simplifies the type system as well as opens the design space to new contract paradigms.

The core syntax of Edge is derived from commonly used patterns in modern programming. Functions, branches, and loops are largely intuitive for engineers with experience in C, Rust, JavaScript, etc. Parametric polymorphism uses syntax similar to Rust and TypeScript. Compiler built-in functions and "comptime" constructs follow the syntax of Zig.

Comments

<line_comment> ::= "//" (!"\n" <ascii_char>)* "\n" ;

<block_comment> ::= "/*" (!"*/" <ascii_char>)* "*/" ;

<item_devdoc> ::= "///" (!"\n" <ascii_char>)* "\n" ;

<module_devdoc> ::= "//!" (!"\n" <ascii_char>)* "\n" ;

The <line_comment> is a single line comment, ignored by the parser.

The <block_comment> is a multi line comment, ignored by the parser.

The <item_devdoc> is a developer documentation comment, treated as documentation for the immediately following item.

The <module_devdoc> is a developer documentation comment, treated as documentation for the module in which it is defined.

Developer documentation comments are treated as GitHub-Flavored Markdown.

Identifiers

<ident> ::= (<alpha_char> | "_") (<alpha_char> | <dec_digit> | "_")* ;

Dependencies:

The <ident> is a C-style identifier, beginning with an alphabetic character or underscore, followed by zero or more alphanumeric or underscore characters.

Data Locations

<storage_pointer> ::= "&s" ;
<transient_storage_pointer> ::= "&t" ;
<memory_pointer> ::= "&m" ;
<calldata_pointer> ::= "&cd" ;
<returndata_pointer> ::= "&rd" ;
<internal_code_pointer> ::= "&ic" ;
<external_code_pointer> ::= "&ec" ;

<data_location> ::=
    | <storage_pointer>
    | <transient_storage_pointer>
    | <memory_pointer>
    | <calldata_pointer>
    | <returndata_pointer>
    | <internal_code_pointer>
    | <external_code_pointer> ;

The <data_location> is a data location annotation indicating the data location in which a pointer's data exists. We define seven distinct annotations for data location pointers. This is a divergence from general purpose programming languages to more accurately represent the EVM execution environment.

&s persistent storage
&t transient storage
&m memory
&cd calldata
&rd returndata
&ic internal (local) code
&ec external code

Semantics

Data locations can be grouped into two broad categories, buffers and maps.

Maps

Persistent and transient storage are part of the map category: 256 bit keys map to 256 bit values. Both may be written or read one word at a time.

Buffers

Memory, calldata, returndata, internal code, and external code are all linear data buffers. All can be copied into memory, but only memory and calldata can be read directly to the stack, and only memory can be written or copied to.

Name	Read to Stack	Copy to Memory	Write
memory	true	true	true
calldata	true	true	false
returndata	false	true	false
internal code	false	true	false
external code	false	true	false

Transitions

Transitioning from a map to a memory buffer is performed by loading each element from the map to the stack and storing each stack item in memory, which takes O(N) operations for N elements.

Transitioning from a memory buffer to a map is performed by loading each element from memory to the stack and storing each stack item in the map, which also takes O(N) operations.

Transitioning from any other buffer to a map is performed by copying the buffer's data into memory and then transitioning the data from memory into the map, which adds one copy operation for O(N+1) in total.
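
As a rough illustration (not part of the specification), the three transitions can be modeled in Python. The map is a dict from slot to word and memory is a byte buffer; the function names, the zero default for unset slots, and the assumption that buffers are a whole number of 32 byte words are ours.

```python
WORD = 32  # bytes per EVM word


def map_to_memory(storage: dict, start_slot: int, count: int) -> bytearray:
    """Load `count` words from a map and store each one in a memory buffer."""
    memory = bytearray(count * WORD)
    for i in range(count):
        word = storage.get(start_slot + i, 0)  # load the element to the stack
        memory[i * WORD:(i + 1) * WORD] = word.to_bytes(WORD, "big")  # store it
    return memory


def memory_to_map(memory: bytes, storage: dict, start_slot: int) -> None:
    """Load each word from memory and store it in consecutive map slots."""
    for i in range(len(memory) // WORD):
        word = int.from_bytes(memory[i * WORD:(i + 1) * WORD], "big")
        storage[start_slot + i] = word


def buffer_to_map(buffer: bytes, storage: dict, start_slot: int) -> None:
    """Copy a read-only buffer into memory, then transition memory to the map."""
    memory = bytearray(buffer)  # the single additional copy step
    memory_to_map(memory, storage, start_slot)
```

Each loop touches every word once, which is where the linear cost comes from; the buffer case only adds the initial copy.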

Pointer Bit Sizes

Pointers to different data locations have different sizes based on the properties of that data location. In-depth semantics of each data location are specified in the type system documents.

Location	Bit Size	Reason
persistent storage	256	Storage is 256 bit key value hashmap
transient storage	256	Transient storage is 256 bit key value hashmap
memory	32	Theoretical maximum memory size does not reach `0xffffffff`
calldata	32	Theoretical maximum calldata size does not reach `0xffffffff`
returndata	32	Maximum returndata size is equal to maximum memory size
internal code	16	Code size is less than `0xffff`
external code	176	Contains 160 bit address and 16 bit code pointer
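
To make the 176 bit external code pointer concrete, here is a minimal Python sketch that combines a 160 bit address with a 16 bit code pointer. The field order (address in the high bits) and the function names are assumptions for illustration; the table above only fixes the total width.

```python
ADDRESS_BITS = 160
CODE_OFFSET_BITS = 16


def encode_external_code_pointer(address: int, offset: int) -> int:
    """Combine an address and a code offset into one 176 bit value."""
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 160 bits")
    if not 0 <= offset < (1 << CODE_OFFSET_BITS):
        raise ValueError("code offset does not fit in 16 bits")
    # Assumed layout: address in the high bits, offset in the low 16 bits.
    return (address << CODE_OFFSET_BITS) | offset


def decode_external_code_pointer(pointer: int) -> tuple:
    """Split a 176 bit pointer back into (address, offset)."""
    return pointer >> CODE_OFFSET_BITS, pointer & ((1 << CODE_OFFSET_BITS) - 1)
```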

Expressions

<binary_operation> ::= <expr> <binary_operator> <expr> ;
<unary_operation> ::= <unary_operator> <expr> ;

<expr> ::=
    | <array_instantiation>
    | <array_element_access>
    | <struct_instantiation>
    | <tuple_instantiation>
    | <struct_field_access>
    | <tuple_field_access>
    | <union_instantiation>
    | <pattern_match>
    | <arrow_function>
    | <function_call>
    | <binary_operation>
    | <unary_operation>
    | <ternary>
    | <literal>
    | <ident>
    | ("(" <expr> ")");

Dependencies:

The <expr> is defined as an item that returns¹ a value.

The <binary_operation> is an expression composed of two sub-expressions with an infixed binary operator. Semantics are beyond the scope of the syntax specification; see operator precedence semantics for more.

The <unary_operation> is an expression composed of a prefixed unary operator and a sub-expression.

¹ See Disambiguation: Return vs Return™️

Statements

<stmt> ::=
    | <variable_declaration>
    | <variable_assignment>
    | <type_declaration>
    | <type_assignment>
    | <trait_declaration>
    | <impl_block>
    | <function_declaration>
    | <function_assignment>
    | <abi_declaration>
    | <contract_declaration>
    | <contract_impl_block>
    | <core_loop>
    | <for_loop>
    | <while_loop>
    | <do_while_loop>
    | <code_block>
    | <if_else_if_branch>
    | <if_match_branch>
    | <match>
    | <constant_assignment>
    | <comptime_branch>
    | <comptime_function>
    | <module_declaration>
    | <module_import> ;

Dependencies:

The <stmt> is similar to an expression; however, the item does not return¹ a value.

¹ See Disambiguation: Return vs Return™️

Variables

Declaration

<variable_declaration> ::= "let" <ident> [":" <type_signature>] ;

Dependencies:

The <variable_declaration> marks the declaration of a variable. It may optionally be assigned at the time of declaration.

Assignment

<variable_assignment> ::= <ident> "=" <expr> ;

Dependencies:

The <variable_assignment> is the assignment of a variable. Its identifier is assigned the returned value of an expression using the assignment operator.

Type System

The type system builds on core primitive types inherent to the EVM with abstract data types for parametric polymorphism, nominative subtyping, and compile time monomorphization.

Primitive Types

<integer_size> ::= "8" | "16" | "24" | "32" | "40" | "48" | "56" | "64" | "72" | "80" | "88" | "96"
    | "104" | "112" | "120" | "128" | "136" | "144" | "152" | "160" | "168" | "176" | "184" | "192"
    | "200" | "208" | "216" | "224" | "232" | "240" | "248" | "256" ;

<fixed_bytes_size> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "10" | "11" | "12"
    | "13" | "14" | "15" | "16" | "17" | "18" | "19" | "20" | "21" | "22" | "23" | "24" | "25"
    | "26" | "27" | "28" | "29" | "30" | "31" | "32" ;

<signed_integer> ::= {"i" <integer_size>} ;
<unsigned_integer> ::= {"u" <integer_size>} ;
<fixed_bytes> ::= {"b" <fixed_bytes_size>} ;
<address> ::= "addr" ;
<boolean> ::= "bool" ;
<bit> ::= "bit" ;
<pointer> ::= <data_location> "ptr" ;

<numeric_type> ::= <signed_integer> | <unsigned_integer> | <fixed_bytes> | <address> ;

<primitive_data_type> ::=
    | <numeric_type>
    | <boolean>
    | <pointer> ;

Dependencies:

<data_location>

The <primitive_data_type> contains signed and unsigned integers, boolean, address, and fixed bytes types. Additionally, we introduce a pointer type that must be prefixed with a data location annotation.

Examples

u8
u256
i8
i256
b4
b32
addr
bool
bit
&s ptr

Semantics

Integers occupy the number of bits indicated by their size. Fixed bytes types occupy the number of bytes indicated by their size, or size * 8 bits. Address occupies 160 bits. Booleans occupy eight bits. Bit occupies a single bit. Pointers occupy the number of bits determined by their data location annotation, as listed under Pointer Bit Sizes.

Pointers can point to both primitive and complex data types.
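
The size rules above can be summarized with a small Python sketch. The function name is hypothetical, and pointer types are left out because their size depends on the data location table rather than on the type name alone.

```python
import re


def primitive_bit_size(type_name: str) -> int:
    """Return the bit size of a non-pointer primitive type name such as "u8"."""
    if type_name == "addr":
        return 160
    if type_name == "bool":
        return 8
    if type_name == "bit":
        return 1
    match = re.fullmatch(r"([iub])([1-9]\d*)", type_name)
    if match is None:
        raise ValueError(f"not a primitive type: {type_name}")
    kind, size = match.group(1), int(match.group(2))
    if kind in "iu":
        # Integer sizes are multiples of 8 from 8 to 256 bits.
        if size % 8 != 0 or not 8 <= size <= 256:
            raise ValueError(f"invalid integer size: {size}")
        return size
    # Fixed bytes sizes are given in bytes, from 1 to 32.
    if not 1 <= size <= 32:
        raise ValueError(f"invalid fixed bytes size: {size}")
    return size * 8
```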

Type Assignment

Signature

<type_signature> ::=
    | <array_signature>
    | <struct_signature>
    | <tuple_signature>
    | <union_signature>
    | <function_signature>
    | <ident>
    | (<ident> [<type_parameters>]) ;

Dependencies:

Type assignments assign identifiers to type signatures. A type signature may be an array, struct, tuple, union, or function signature, or an identifier followed by optional type parameters.

Declaration

<type_declaration> ::= ["pub"] "type" <ident> [<type_parameters>] ;

Dependencies:

The <type_declaration> is an optional "pub" keyword, followed by "type" and an identifier with optional type parameters.

Assignment

<type_assignment> ::= <type_declaration> "=" <type_signature> ;

The <type_assignment> is a type declaration followed by a type signature separated by an assignment operator.

Semantics

Type assignment entails creating an identifier associated with a certain data structure or existing type. If the assignment is to an existing data type, it contains the same fields or members, if any, and exposes the same associated items, if any.

type MyCustomType = packed (u8, u8, u8);
type MyCustomAlias = MyCustomType;

fn increment(rgb: MyCustomType) -> MyCustomType {
    return (rgb.0 + 1, rgb.1 + 1, rgb.2 + 1);
}

increment(MyCustomType(1, 2, 3));
increment(MyCustomAlias(1, 2, 3));

To create a wrapper around an existing type without exposing the existing type's external interface, the type may be wrapped in parentheses, creating a "tuple" of one element, which comes without overhead.

type MyCustomType = packed (u8, u8, u8);
type MyNewCustomType = (MyCustomType);

Array Types

The array type is a list of elements of a single type.

Signature

<array_signature> ::= ["packed"] "[" <type_signature> ";" <expr> "]" ;

Dependencies:

The <array_signature> consists of an optional "packed" keyword prefix to a type signature and expression separated by a semicolon, delimited by brackets.

Instantiation

<array_instantiation> ::= [<data_location>] "[" <expr> ("," <expr>)* [","] "]" ;

Dependencies:

The <array_instantiation> is an optional data location annotation followed by a comma separated list of expressions delimited by brackets.

Element Access

<array_element_access> ::= <ident> "[" <expr> [":" <expr>] "]" ;

Dependencies:

The <array_element_access> is the array's identifier followed by a bracket-delimited expression and optionally a second expression, colon separated.

Examples

type TwoElementIntegerArray = [u8; 2];
type TwoElementPackedIntegerArray = packed [u8; 2];

const arr: TwoElementIntegerArray = [1, 2];

const elem: u8 = arr[0];

Semantics

Instantiation

Instantiation of a fixed-length array stores one element per 32 byte word, regardless of data location. The only difference between data locations in terms of instantiation behavior arises when all elements of the array are populated with constant values and the array belongs in memory: as a performance optimization, the compiler may code-copy an instance of the constant array from the bytecode into memory.

Access

Array element access depends on whether the second expression is included. If a single expression is inside the access brackets, the single element is returned from the array. If a second expression follows the first with a colon in between, a pointer of the same data location is returned. The new array pointer has the same element type, but its size is the second expression's value minus the first expression's value. If the index values are known at compile time and are greater than or equal to the array's length, a compiler error is thrown; otherwise, a bounds check against the array's length is added to the runtime bytecode.
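
A minimal Python sketch of the two checks described above follows. The function names are hypothetical, and rejecting a second expression smaller than the first is our assumption; how the bounds check applies to the second expression of a slice is not modeled here.

```python
def checked_index(index: int, length: int) -> int:
    """Return `index` if it lies inside the array, mirroring the bounds check."""
    if index < 0 or index >= length:
        raise IndexError(f"index {index} out of bounds for length {length}")
    return index


def slice_size(first: int, second: int) -> int:
    """Size of the array behind the pointer returned by arr[first:second]."""
    if second < first:
        # Assumption: a negative size is rejected rather than wrapped.
        raise ValueError("second expression must not be smaller than the first")
    return second - first
```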

Product Type

The product type is a compound type composed of zero or more internal types.

Signature

<struct_field_signature> ::= <ident> ":" <type_signature> ;
<struct_signature> ::=
    ["packed"] "{"
        [<struct_field_signature> ("," <struct_field_signature>)* [","]]
    "}" ;
<tuple_signature> ::= ["packed"] "(" <type_signature> ("," <type_signature>)* [","] ")" ;

Dependencies:

Instantiation

<struct_field_instantiation> ::= <ident> ":" <expr> ;

<struct_instantiation> ::=
    [<data_location>] <struct_signature> "{"
        [<struct_field_instantiation> ("," <struct_field_instantiation>)* [","]]
    "}" ;

<tuple_instantiation> ::= [<data_location>] <ident> "(" [<expr> ("," <expr>)* [","]] ")" ;

Dependencies:

The <struct_instantiation> is an instantiation, or creation, of a struct. It may optionally include a data location annotation; the semantic rules for this are in the data location semantic rules. It is instantiated by the struct identifier followed by a comma separated list of field name and value pairs delimited by curly braces.

The <tuple_instantiation> is an instantiation, or creation, of a tuple. It may optionally include a data location annotation; the semantic rules for this are in the data location semantic rules. It is instantiated by a comma separated list of expressions delimited by parentheses.

Field Access

<struct_field_access> ::= <ident> "." <ident> ;
<tuple_field_access> ::= <ident> "." <dec_char> ;

Dependencies:

The <struct_field_access> is written as the struct's identifier followed by the field's identifier separated by a period.

The <tuple_field_access> is written as the tuple's identifier followed by the field's index separated by a period.

Examples

type PrimitiveStruct = {
    a: u8,
    b: u8,
    c: u8,
};

const primitiveStruct: PrimitiveStruct = PrimitiveStruct { a: 1, b: 2, c: 3 };

const a = primitiveStruct.a;

type PackedTuple = packed (u8, u8, u8);

const packedTuple: PackedTuple = (1, 2, 3);

const one = packedTuple.0;

Semantics

The struct field signature maps a field identifier to a type signature. The field may be accessed by the struct's identifier and field identifier separated by a dot.

Prefixing the signature with the "packed" keyword will pack the fields by their bitsize, otherwise each field is padded to its own 256 bit word.

type Rgb = packed { r: u8, g: u8, b: u8 };

let rgb = Rgb { r: 1, g: 2, b: 3 };
// rgb = 0x010203
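
The packed value in the comment above can be reproduced with a short Python sketch. It assumes fields are concatenated in declaration order with the first field in the most significant position, which is what the 0x010203 result implies; the function name is hypothetical.

```python
def pack_fields(fields: list) -> int:
    """Pack (value, bit_size) pairs in declaration order into one integer."""
    packed = 0
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
        # Shift earlier fields left to make room, then append this field.
        packed = (packed << bits) | value
    return packed
```

For the Rgb example, pack_fields([(1, 8), (2, 8), (3, 8)]) evaluates to 0x010203.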

Instantiation depends on the data location. Structs that can fit into a single word, either a single field struct or a packed struct with a bitsize sum less than or equal to 256, sit on the stack by default. Instantiating a struct in memory requires the memory data location annotation. If a struct that does not fit into a single word does not have a data location annotation, a compiler error is thrown.

Stack struct instantiation consists of optionally bitpacking fields and leaving the struct on the stack. Memory instantiation consists of allocating new memory, optionally bitpacking fields, storing the struct in memory, and leaving the pointer to it on the stack.

type MemoryRgb = { r: u8, g: u8, b: u8 };

let memoryRgb = &m MemoryRgb { r: 1, g: 2, b: 3 };
// ptr = ..
// mstore(ptr, 1)
// mstore(add(32, ptr), 2)
// mstore(add(64, ptr), 3)

Persistent and transient storage structs must be instantiated at the file level. If anything except zero values is assigned, storage writes will be injected into the initcode to be run on deployment. A reasonable convention for creating a storage layout without the contract object abstraction would be to create a Storage type which is a struct, mapping identifiers to storage slots. Nested structs will also allow granular control over which variables get packed.

type Storage = {
    a: u8,
    b: u8,
    c: packed {
        a: u8,
        b: u8
    }
};

const storage = @default<Storage>();

fn main() {
    storage.a = 1;      // sstore(0, 1)
    storage.b = 2;      // sstore(1, 2)
    storage.c.a = 3;    // ca = shl(8, 3)
    storage.c.b = 4;    // sstore(2, or(ca, 4))
}

The packing rule for buffer locations is to pack every field exactly by its bit length. The packing rule for map locations is to right-align the first field; each subsequent field is left-shifted to the first available bits if its bitsize fits into the same word as the previous field, and otherwise starts a new word.

type Storage = packed {
    a: u128,
    b: u8,
    c: addr,
    d: u256
};

const storage = Storage {
    a: 1,
    b: 2,
    c: 0x3,
    d: 4,
};

Slot	Value
0x00	0x0000000000000000000000000000000200000000000000000000000000000001
0x01	0x0000000000000000000000000000000000000000000000000000000000000003
0x02	0x0000000000000000000000000000000000000000000000000000000000000004
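
The map packing rule can be checked against the slot table above with a small Python sketch. It is a model of the rule as stated, not the compiler's implementation, and the function name is hypothetical.

```python
WORD_BITS = 256


def pack_into_slots(fields: list) -> list:
    """Pack (value, bit_size) pairs into 256 bit words using the map rule."""
    slots = []
    used = WORD_BITS  # forces the first field to open a new word
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
        if used + bits > WORD_BITS:
            # The field would overflow the current word, so start a new one.
            slots.append(0)
            used = 0
        # Left-shift the field to the first available bits of the word.
        slots[-1] |= value << used
        used += bits
    return slots
```

Calling pack_into_slots([(1, 128), (2, 8), (3, 160), (4, 256)]) yields three words: the first holds a and b, while c and d each start a new word because they would overflow the previous one.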

Sum Types

The sum type is a union of multiple types where the data type represents one of the inner types.

Signature

<union_member_signature> ::= <ident> ["(" <type_signature> ")"] ;
<union_signature> ::= ["|"] <union_member_signature> ("|" <union_member_signature>)* ;

Dependencies:

The <union_signature> is the signature of a sum type, a data structure that contains one of its internally declared members. Each <union_member_signature> is named by an identifier, optionally followed by a type signature delimited by parentheses.

Instantiation

<union_instantiation> ::= <ident> "::" <ident> "(" [<expr> ("," <expr>)* [","]] ")" ;

Dependencies:

The <union_instantiation> instantiates, or creates, the sum type. This consists of the union's identifier and the member's identifier separated by a double colon, followed by a parenthesis-delimited, optionally empty, comma separated list of expressions.

Behavior of instantiation is defined in the data location rule.

Union Pattern

<union_pattern> ::= <ident> "::" <ident> ["(" <ident> ("," <ident>)* [","] ")"];

Dependencies:

<ident>

The <union_pattern> is a pattern consisting of the union's name and a member's name separated by a double colon.

Pattern Match

<pattern_match> ::= <ident> "matches" <union_pattern> ;

Dependencies:

<ident>

Semantics

Unions where no member has its own internal type are effectively enumerations over integers.

type Mutex = Locked | Unlocked;

// Mutex::Locked == 0
// Mutex::Unlocked == 1

Unions where any member has an internal type become proper type unions. The only case in which a union can exist on the stack rather than another data location is if the largest of the internal types has a bitsize of 248 or less. If any member's internal type has a bitsize greater than 248, a data location must be specified.

type StackUnion = A(u8) | B(u248);

type MemoryUnion = A(u256) | B | C(u8);
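
The stack rule above reduces to a single comparison, sketched here in Python. Members without an internal type are given a bitsize of zero, and the function name is hypothetical. The 248 bit limit presumably leaves the remaining eight bits of the word for identifying the member, but that reading is an assumption.

```python
MAX_STACK_MEMBER_BITS = 248


def union_fits_on_stack(member_bit_sizes: list) -> bool:
    """True if the largest internal type has a bitsize of 248 or less."""
    return max(member_bit_sizes, default=0) <= MAX_STACK_MEMBER_BITS
```

With the examples above, StackUnion corresponds to [8, 248] and fits on the stack, while MemoryUnion corresponds to [256, 0, 8] and requires a data location.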

A union pattern consists of its identifier and the member identifier separated by a double colon. This pattern may be used both in match statements and if statements.

type Option<T> = None | Some(T);

impl Option<T> {
    fn unwrap(self) -> T {
        match self {
            Option::Some(inner) => return inner,
            Option::None => revert(),
        };
    }

    fn unwrapOr(self, default: T) -> T {
        let mut value = default;
        if self matches Option::Some(inner) {
            value = inner;
        }
        return value;
    }
}

Generics

Generics are polymorphic types enabling function and type reuse across different types.

Type Parameters

<type_parameter_single> ::= <ident> [<trait_constraints>] ;

<type_parameters> ::= "<" <type_parameter_single> ("," <type_parameter_single>)* [","] ">" ;

Dependencies:

The <type_parameter_single> is an individual type parameter for parametric polymorphic types and functions. We define this as a type name optionally followed by a trait constraint.

The <type_parameters> is a comma separated list of individual type parameters delimited by angle brackets.

Semantics

Generics are resolved at compile time through monomorphization. Generic functions and data types are monomorphized into distinct unique functions and data types. Function duplication can become problematic due to the EVM bytecode size limit, so a series of steps will be taken to allow for granular control over bytecode size. Those semantics are defined in the Codesize document.
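
To illustrate why monomorphization affects bytecode size, the following Python sketch records one specialized instance per distinct combination of generic name and type arguments. The class, its method, and the name mangling scheme are all hypothetical and are not taken from the Codesize document.

```python
class Monomorphizer:
    """Tracks one specialized instance per (generic, type arguments) pair."""

    def __init__(self) -> None:
        self.instances = {}

    def instantiate(self, generic_name: str, type_args: tuple) -> str:
        """Return the specialized name, creating the instance on first use."""
        key = (generic_name, type_args)
        if key not in self.instances:
            # Hypothetical mangling: join the generic and argument names.
            self.instances[key] = generic_name + "_" + "_".join(type_args)
        return self.instances[key]
```

Repeated use of the same type arguments reuses the existing instance, while each new combination adds another copy of the function or type to the output.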

Traits

Traits are interface-like declarations that constrain generic types to implement specific methods or contain specific properties.

Declaration

<trait_declaration> ::=
    ["pub"] "trait" <ident> [<type_parameters>] [<trait_constraints>] "{"
    (
        | <type_declaration>
        | <type_assignment>
        | <constant_declaration>
        | <constant_assignment>
        | <function_declaration>
        | <function_assignment>
    )*
    "}" ;

Dependencies:

The <trait_declaration> is a declaration of a set of associated types, constants, and functions that may itself take type parameters and may be constrained to a super type. Semantics of the declaration are listed under trait solving rules.

Constraints

<trait_constraints> ::= ":" <ident> ("&" <ident>)* ;

Dependencies:

<ident>

The <trait_constraints> contains a colon followed by an ampersand separated list of identifiers of implemented traits. The ampersand is meant to indicate that all of the trait identifiers are implemented for the type.

Semantics

Traits can be defined with associated types, constants, and functions. The trait declaration itself allows for optional assignment for each item as a default. Any declarations in the trait that are not assigned in the trait declaration must be assigned in the implementation of the trait for the data type. Additionally, any assignments in the trait declaration can be overridden in the trait implementation.

While types can depend on trait constraints, traits can also depend on other trait constraints. These assert that types that implement a given trait also implement its "super traits".

Solving

todo

Implementation

Implementation blocks enable method-call syntax.

Implementation Block

<impl_block> ::=
    "impl" <ident> [<type_parameters>] [":" <ident> [<type_parameters>]] "{"
        (
            | <function_assignment>
            | <constant_assignment>
            | <type_assignment>
        )*
    "}" ;

Dependencies:

The <impl_block> is the implementation block for a given type. The type identifier is optionally followed by type parameters, then optionally followed by a colon and a trait identifier with its own optional type parameters. This is followed by a list of function, constant, and type assignments delimited by curly braces.

Semantics

Associated functions, constants, and types are defined for a given type. If the type contains any generics in any of its internal assignments, the type parameters must be brought into scope by annotating them directly following the type's identifier.

If the impl block is to satisfy a trait's interface, the type's identifier and optional type parameters are followed by the trait's identifier and optional type parameters. In this case, only associated functions, constants, and types that are declared in the trait's declaration may be defined in the impl block. Additionally, all declarations in a trait's declaration that are not assigned in the trait's declaration must be assigned in the impl block for the given data type.

Function Types

The function type is a type composed of input and output types.

Signature

<function_signature> ::= <type_signature> "->" <type_signature> ;

Dependencies:

<type_signature>

The <function_signature> consists of an input type signature and an output type signature, separated by an arrow.

Note: <type_signature> also contains a tuple signature, therefore a function with multiple inputs and outputs is implicitly operating on a tuple.

Declaration

<function_declaration> ::= 
    "fn" <ident> "("
        [(<ident> ":" <type_signature>) ("," <ident> ":" <type_signature>)* [","]]
    ")" ["->" "(" <type_signature> ("," <type_signature>)* [","] ")"] ;

Dependencies:

Assignment

<function_assignment> ::= <function_declaration> <code_block> ;

Dependencies:

<code_block>

The <function_assignment> is defined as the "fn" keyword followed by its identifier, followed by optional comma separated pairs of identifiers and type signatures, delimited by parentheses, then optionally followed by an arrow and a list of comma separated return type signatures delimited by parentheses, then finally the code block of the function body.

Arrow Functions

<arrow_function> ::= (<ident> | ("(" <ident> ("," <ident>)* [","] ")")) "=>" <code_block> ;

Dependencies:

The <arrow_function> is defined as either a single identifier or a comma separated, parenthesis delimited list of identifiers, followed by the "=>" bigram, followed by a code block.

Call

<function_call> ::= <ident> "(" [<expr> ("," <expr>)* [","]] ")" ;

Dependencies:

The <function_call> is an identifier followed by a comma separated list of expressions delimited by parentheses.

Semantics

todo

Event Types

The event type is a custom type to be logged.

Signature

<event_field_signature> ::= <ident> ":" ( "indexed" "<" <type_signature> ">" | <type_signature> ) ;

<event_signature> ::=
    ["anon"] "event" "{" [<event_field_signature> ("," <event_field_signature>)* [","]] "}" ;

Dependencies:

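The <event_field_signature> is an identifier followed by a colon and either a type signature or a type signature delimited by angle brackets and prefixed with "indexed". The <event_signature> is an optional "anon" keyword, followed by "event", followed by a comma separated list of event field signatures delimited by curly braces.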

Semantics

The event type is assigned an identifier the same way other types are assigned an identifier. The EVM allows up to four topics. Therefore, if "anon" is used, the event may contain four "indexed" values; otherwise it may contain three. If the event is not anonymous, the first topic follows Solidity's ABI specification: it is the keccak256 hash digest of the event identifier followed by a comma separated list of the event's type names with no whitespace, delimited by parentheses.
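
The topic limit can be expressed as a small validation routine, sketched here in Python. The function names are hypothetical, and the sketch only counts "indexed" fields; it does not compute the keccak256 topic.

```python
MAX_TOPICS = 4  # the EVM allows up to four topics per log


def max_indexed_fields(anonymous: bool) -> int:
    """Anonymous events may use all four topics; others reserve the first."""
    return MAX_TOPICS if anonymous else MAX_TOPICS - 1


def validate_event(indexed_flags: list, anonymous: bool) -> None:
    """Raise if the event declares more indexed fields than topics allow."""
    indexed = sum(1 for flag in indexed_flags if flag)
    if indexed > max_indexed_fields(anonymous):
        raise ValueError(f"too many indexed fields: {indexed}")
```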

ABI

The application binary interface is both a construct to generate a JSON ABI by the compiler as well as a subtyping construct for contract objects.

Declaration

<abi_declaration> ::=
    "abi" <ident> [":" <ident> ("&" <ident>)*] "{"
        (
            ["mut"] <function_declaration> ";"
        )*
    "}" ;

Dependencies:

The <abi_declaration> is prefixed with "abi", followed by its identifier, then an optional colon and list of ampersand separated identifiers, and finally a series of zero or more function declarations optionally prefixed by "mut" and delimited by curly braces.

Semantics

The optional "mut" keyword indicates whether the function will mutate the state of the smart contract or the EVM. This allows contracts to determine whether to use the call or staticcall instruction to interface with a target conforming to the given ABI.

The optional ampersand separated list of identifiers represents other ABI identifiers to enable ABI subtyping.

todo: revisit this. do traits satisfy this instead?

Contract Objects

Contract objects serve as an object-like interface to contract constructs.

Declaration

<contract_field_declaration> ::= <ident> ":" <type_signature> ;
<contract_declaration> ::=
    "contract" <ident> "{"
        [<contract_field_declaration> ("," <contract_field_declaration>)* [","]]
    "}" ;

Dependencies:

The <contract_field_declaration> is an identifier and type signature, separated by a colon.

The <contract_declaration> is the contract keyword, followed by its identifier, followed by a curly brace delimited, comma separated list of field declarations.

Implementation

<contract_impl_block> ::=
    "impl" <ident> [":" <ident>] "{"
        (["ext"] ["mut"] <function_declaration>)*
    "}" ;

Dependencies:

The <contract_impl_block> is composed of the "impl" keyword, followed by the contract's identifier, optionally followed by a colon and an ABI identifier, followed by a list of function declarations, each optionally prefixed with "ext" and/or "mut", delimited by curly braces.

Semantics

The contract object desugars to a single main function and storage layout with a dispatcher.

Contract field declarations create the storage layout, which starts at slot zero and increments by one for each field. Fields are never packed; however, storage packing may be achieved by declaring contract fields as packed structs or tuples.
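
The slot assignment described above amounts to numbering the fields in declaration order, as in this minimal Python sketch. The function name and the field names in the usage note are hypothetical, and the sketch takes "increment by one for each field" literally.

```python
def storage_layout(field_names: list) -> dict:
    """Assign consecutive storage slots, starting at zero, one per field."""
    return {name: slot for slot, name in enumerate(field_names)}
```

For a contract with fields owner and balance, storage_layout(["owner", "balance"]) maps owner to slot 0 and balance to slot 1.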

Contract implementation blocks contain definitions of external functions in the contract object. If the impl block contains a colon and identifier, this indicates the impl block is satisfying an abi's constrained functions. The "ext" keyword indicates the function is publicly exposed via the contract's dispatcher. The "mut" keyword indicates the function may mutate the global state in the EVM-sense; that is to say "mut" functions require a "call" instruction while those without may use "call" or "staticcall" to interface with the contract.

todo: revisit this. do types satisfy this instead?

Control Flow

Control flow is composed of loops, branches, and pattern matching.

Loops

Loops are blocks of code that may be executed repeatedly based on some conditions.

Loop Control

<loop_break> ::= "break" ;
<loop_continue> ::= "continue" ;

The <loop_break> keyword "breaks" the loop's execution, jumping to the end of the loop immediately.

The <loop_continue> keyword "continues" the loop's execution from the start, short circuiting the remainder of the loop.

Loop Block

<loop_block> ::= "{" ((<expr> | <stmt> | <loop_break> | <loop_continue>) ";")* "}" ;

Dependencies:

The <loop_block> is a block of code to be executed repeatedly. All other loops are derived from this single loop block.

Core Loop

<core_loop> ::= "loop" <loop_block> ;

The core loop is the simplest loop; it contains no code to be injected anywhere else. All other loops are syntactic sugar over the core loop. The "desugaring" step for each loop is in the control flow semantic rules.

For Loop

<for_loop> ::= "for" "(" [(<stmt> | <expr>)]";" [<expr>] ";" [(<stmt> | <expr>)] ")" <loop_block> ;

Dependencies:

The <for_loop> is a loop block prefixed with three individually optional items. The first may be a statement or expression, the second may only be an expression, and the third may be an expression or statement.

While Loop

<while_loop> ::= "while" "(" <expr> ")" <loop_block> ;

Dependencies:

<expr>

The <while_loop> is a loop block prefixed with one required expression.

Do While Loop

<do_while_loop> ::= "do" <loop_block> "while" "(" <expr> ")" ;

Dependencies:

<expr>

The <do_while_loop> is a loop block suffixed with one required expression.
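Since every loop is sugar over the core loop, a while loop can be read as a core loop whose first step is a conditional break, and a do while loop as a core loop whose last step is one. The Python sketch below models that reading, with `while True` standing in for "loop". The exact desugaring is not defined in this section, so this shape is an assumption, as are the helper names `while_loop` and `do_while_loop`.

```python
def while_loop(condition, body):
    """Model `while (cond) { body }` as a core loop with a leading break."""
    while True:              # stands in for the core `loop`
        if not condition():  # the condition is checked before each iteration
            break
        body()


def do_while_loop(condition, body):
    """Model a do while loop: the body runs once before the first check."""
    while True:
        body()
        if not condition():
            break


counter = {"n": 0}


def increment():
    counter["n"] += 1


while_loop(lambda: counter["n"] < 3, increment)
print(counter["n"])  # 3

counter["n"] = 10
do_while_loop(lambda: counter["n"] < 3, increment)
print(counter["n"])  # 11, the body ran once although the condition was false
```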

Semantics

todo

Code Block

A code block is a sequence of items with its own scope. It may be used independently or in tandem with conditional statements.

Declaration

<code_block> ::= "{" ((<stmt> | <expr>) ";")* "}" ;

Dependencies:

The <code_block> is a semi-colon separated list of expressions or statements delimited by curly braces.

Semantics

Code blocks may be contained in loops, branching statements, or standalone statements.

Code blocks represent a distinct scope. Identifiers declared in a code block are dropped once the code block ends.

Branching

Branching refers to blocks of code that may be executed based on a defined condition.

If Else If Branch

<if_else_if_branch> ::= "if" "(" <expr> ")" <code_block>
    ("else" "if" "(" <expr> ")" <code_block>)*
    ["else" <code_block>] ;

Dependencies:

The <if_else_if_branch> contains an "if" keyword followed by a parenthesis delimited expression and a code block. It may be followed by zero or more conditions under "else" "if" keywords followed by a parenthesis delimited expression and a code block, and finally it may optionally be suffixed with an "else" keyword followed by a code block.

If Match

<if_match_branch> ::= "if" <pattern_match> <code_block> ;

Dependencies:

The <if_match_branch> contains the "if" keyword followed by a pattern match expression and a code block.

Match

<match_arm> ::= (<union_pattern> | <ident> | "_") "=>" <code_block> ;

<match> ::=
    "match" <expr> "{"
    [<match_arm> ("," <match_arm>)* [","]]
    "}" ;

Dependencies:

The <match_arm> is a single arm of a match statement. It consists of a union pattern, an identifier, or an underscore, followed by the "=>" token and a code block.

The <match> statement is a group of match arms that may pattern match against an expression.

Semantics of the match statement are defined in the control flow semantics.

Ternary

<ternary> ::= <expr> "?" <expr> ":" <expr> ;

Dependencies:

<expr>

The <ternary> is a branching statement that takes an expression, followed by a question mark, or ternary operator, followed by two colon separated expressions.

Semantics

If Else If Branch

The expression of the "if" statement is evaluated. Its type must either be a boolean or a value that can be cast to a boolean. If the result is true, the subsequent block of code is executed; otherwise the next branch is checked. Each optional "else if" branch repeats this process in order. If no branch has resolved to true and an "else" block is present, the "else" block is executed.

fn main() {
    let n = 3;

    if (n == 1) {
        // ..
    } else if (n == 2) {
        // ..
    } else {
        // ..
    }
}

If Match

The "if match" statement executes as the "if" statement does, however, the expression to evaluate is a pattern match. While the pattern match semantics are specified elsewhere, the "if match" branch brings into scope the identifier(s) of the inner type(s) of the matched pattern.

type Union = A(u8) | B;

fn main() {
    let u = Union::A(1);

    if u matches Union::A(n) {
        assert(n == 1);
    }
}

Match

Matching requires all possible patterns for a given expression's type to be evaluated. If any pattern is not matched in a match block, a compiler error is thrown. The semantics for match arms are the same as those for the "if match" statement.

The remaining branches for a pattern match may be grouped together either with an identifier or if the identifier is unnecessary, an underscore. Using an identifier assigns a subset of the associated type into scope. The subset of the type contains one of the unmatched members. This does not create a new distinct data type, rather it infers the non-existence of the pre-matched branches.

type Ua = A | B;
type Ub = A | B;

fn main() {
    let u_a = Ua::B;
    let u_b = Ub::B;

    match u_a {
        Ua::A => {},
        Ua::B => {},
    }

    match u_b {
        Ub::A => {},
        n => {
            // `n` inferred to have type `Ub::B`
        }
    }
}
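The exhaustiveness rule can be modelled as a set difference over the members of the matched type. The following Python sketch is illustrative only: `check_exhaustive` and its arguments are hypothetical names, and any arm pattern that is not a member name is treated as a catch-all (an identifier or an underscore).

```python
def check_exhaustive(members, arms):
    """Return the members of a union left unmatched by a list of arms.

    `members` is the set of member names of the matched type.
    `arms` lists the arm patterns in source order. A pattern that is not
    a member name is treated as a catch-all (an identifier or "_").
    """
    remaining = set(members)
    for pattern in arms:
        if pattern in members:
            remaining.discard(pattern)
        else:
            # A catch-all arm covers every member not matched so far.
            return set()
    return remaining


print(check_exhaustive({"A", "B"}, ["A", "B"]))  # set(), exhaustive
print(check_exhaustive({"A", "B"}, ["A"]))       # {'B'}, would be a compiler error
print(check_exhaustive({"A", "B"}, ["A", "n"]))  # set(), `n` covers `B`
```

A non-empty result corresponds to the compiler error described above.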

Ternary

The ternary operator evaluates the first expression, whose result must be of type boolean. If it evaluates to true, the second expression is evaluated; otherwise the third expression is evaluated.

fn main() {
    let condition = true;

    let mut a = 0;

    if (condition) {
        a = 1;
    } else {
        a = 2;
    }

    let b = condition ? 1 : 2;

    assert(a == b);
}

Short Circuiting

For all branch statements that evaluate a boolean expression to determine which branches to take, the following statements hold if the expression is composed of multiple inner boolean expressions separated by logical operators.

if <expr0> && <expr1> and <expr0> is false, short circuit to false
if <expr0> || <expr1> and <expr0> is true, short circuit to true

Also, for all chains of "if else if" statements, if the first evaluates to true, do not evaluate the remaining chained statements.
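Python's `and` and `or` short circuit in the same way, so they can stand in for "&&" and "||" to show which operands are actually evaluated. This is an illustration of the rule in another language, not Edge code; the `operand` helper and the operand labels are made up for the example.

```python
calls = []


def operand(name, value):
    """Record that an operand was evaluated, then return its value."""
    calls.append(name)
    return value


# `false && <expr1>`: the right operand is never evaluated.
result_and = operand("expr0", False) and operand("expr1", True)

# `true || <expr3>`: the right operand is never evaluated.
result_or = operand("expr2", True) or operand("expr3", False)

print(result_and, result_or, calls)  # False True ['expr0', 'expr2']
```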

Operators

Operators are syntax sugar over built-in functions.

Binary

<arithmetic_binary_operator> ::=
    | "+" | "+="
    | "-" | "-="
    | "*" | "*="
    | "/" | "/="
    | "%" | "%="
    | "**" | "**=" ;
<bitwise_binary_operator> ::=
    | "|" | "|="
    | ">>" | ">>="
    | "<<" | "<<="
    | "&" | "&="
    | "^" | "^=" ;

<logical_binary_operator> ::=
    | "=="
    | "!="
    | "&&"
    | "||"
    | ">" | ">="
    | "<" | "<=" ;

<binary_operator> ::=
    | <arithmetic_binary_operator>
    | <bitwise_binary_operator>
    | <logical_binary_operator> ;

Unary

<arithmetic_unary_operator> ::= "-" ;

<bitwise_unary_operator> ::= "~" ;

<logical_unary_operator> ::= "!" ;

<unary_operator> ::=
    | <arithmetic_unary_operator>
    | <bitwise_unary_operator>
    | <logical_unary_operator> ;

Semantics

Operator overloading is disallowed.

operator	types	behavior	panic case
`+`	integers	checked addition	overflow
`-`	integers	checked subtraction (binary)	underflow
`-`	integers	checked negation (unary)	overflow
`*`	integers	checked multiplication	overflow
`/`	integers	checked division	divide by zero
`%`	integers	checked modulus	divide by zero
`**`	integers	exponentiation	-
`&`	integers	bitwise AND	-
`|`	integers	bitwise OR	-
`~`	integers	bitwise NOT	-
`^`	integers	bitwise XOR	-
`>>`	integers	bitwise shift right	-
`<<`	integers	bitwise shift left	-
`==`	any	equality	-
`!=`	any	inequality	-
`&&`	booleans	logical AND	-
`||`	booleans	logical OR	-
`!`	booleans	logical NOT	-
`>`	integers	greater than	-
`>=`	integers	greater than or equal to	-
`<`	integers	less than	-
`<=`	integers	less than or equal to	-
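The panic cases in the table can be modelled with a few checked helpers. The Python sketch below assumes unsigned 8-bit operands for brevity; signed integers are not modelled, and the `Panic` exception, the helper names, and the `panics` utility are hypothetical stand-ins for whatever mechanism the compiler emits.

```python
U8_MAX = 2**8 - 1


class Panic(Exception):
    """Stands in for a runtime panic."""


def checked_add(a, b, max_value=U8_MAX):
    result = a + b
    if result > max_value:
        raise Panic("overflow")
    return result


def checked_sub(a, b):
    if b > a:
        raise Panic("underflow")
    return a - b


def checked_div(a, b):
    if b == 0:
        raise Panic("divide by zero")
    return a // b


def panics(fn, *args):
    """Return True if calling fn(*args) panics."""
    try:
        fn(*args)
    except Panic:
        return True
    return False


print(checked_add(200, 55))          # 255
print(panics(checked_add, 200, 56))  # True
```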

Compile Time

Compile time, also referred to as comptime, refers to expressions, functions, branches, and macros that may be resolved during compilation. Comptime expressions and functions resolve to constant values at compile time, while comptime branches provide conditional compilation.

Literals

Characters

<bin_char> ::= "0" | "1" ;
<dec_char> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
<hex_char> ::=
    | "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "a"
    | "b" | "c" | "d" | "e" | "f" | "A" | "B" | "C" | "D" | "E" | "F";
<alpha_char> ::=
    | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p"
    | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" ;
<alphanumeric_char> ::= <alpha_char> | <dec_char> ;

<unicode_char> ::= ? "i ain't writing all that. happy for you tho. or sorry that happened" ? ;

Numeric

<bin_literal> ::= { "0b" (<bin_char> | "_")+ [<numeric_type>]} ;
<dec_literal> ::= { (<dec_char> | "_")+ [<numeric_type>]} ;
<hex_literal> ::= { "0x" (<hex_char> | "_")+ [<numeric_type>]} ;

<numeric_literal> ::= <bin_literal> | <dec_literal> | <hex_literal> ;

Numeric literals are composed of binary, decimal, and hexadecimal digits. Each literal may contain an arbitrary number of underscore characters and may be suffixed with a numeric type.

Binary literals are prefixed with 0b and hexadecimal literals are prefixed with 0x.

String

<string_literal> ::= { '"' (!'"' <unicode_char>)* '"' } | { "'" (!"'" <unicode_char>)* "'" };

String literals contain unicode characters delimited by double or single quotes.

Boolean

<boolean_literal> ::= "true" | "false" ;

Boolean literals may be either "true" or "false".

Literal

<literal> ::= <numeric_literal> | <string_literal> | <boolean_literal> ;

Semantics

Numeric literals may contain arbitrary underscores in the same literal. Numeric literals may also be suffixed with the numeric type to constrain its type. If there is no type suffix, the type is inferred by the context. If a type cannot be inferred, it will default to a u256.

Both numeric and boolean literals are roughly translated to pushing the value onto the stack.

String literals represent string instantiation. String instantiation behaves as a packed u8 array instantiation.

const A = 1;
const B = 1u8;
const C = 0b11001100;
const D = 0xffFFff;
const E = true;
const F = "asdf";
const G = "💩";
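The numeric literal rules (prefixes, underscores, optional type suffix, and the u256 default) can be sketched as a small parser. This Python model is illustrative: `<numeric_type>` is not defined in this section, so the suffix is assumed to look like `u8` or `i256`, and `parse_numeric_literal` and its `inferred_type` parameter are hypothetical names.

```python
import re

# Assumption: a numeric type suffix has the form u<bits> or i<bits>.
_LITERAL = re.compile(
    r"^(?:0b(?P<bin>[01_]+)|0x(?P<hex>[0-9a-fA-F_]+)|(?P<dec>[0-9_]+))"
    r"(?P<suffix>[ui][0-9]+)?$"
)


def parse_numeric_literal(text, inferred_type=None):
    """Return (value, type_name) for a binary, decimal, or hex literal."""
    match = _LITERAL.match(text)
    if match is None:
        raise ValueError(f"invalid numeric literal: {text!r}")
    if match["bin"] is not None:
        digits, base = match["bin"], 2
    elif match["hex"] is not None:
        digits, base = match["hex"], 16
    else:
        digits, base = match["dec"], 10
    digits = digits.replace("_", "")  # underscores carry no value
    if not digits:
        raise ValueError(f"literal has no digits: {text!r}")
    # Suffix first, then the type inferred from context, then u256.
    type_name = match["suffix"] or inferred_type or "u256"
    return int(digits, base), type_name


print(parse_numeric_literal("1u8"))         # (1, 'u8')
print(parse_numeric_literal("0b11001100"))  # (204, 'u256')
print(parse_numeric_literal("0xffFFff"))    # (16777215, 'u256')
```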

Constants

Declaration

<constant_declaration> ::= "const" <ident> [<type_signature>] ;

Dependencies:

The <constant_declaration> is a "const" followed by an identifier and optional type signature.

Assignment

<constant_assignment> ::=
    <constant_declaration> "=" <expr> ;

Dependencies:

<expr>

The <constant_assignment> is a constant declaration followed by an assignment operator and an expression.

Note: The expression must be a comptime expression, but the grammar should not constrain this.

Semantics

Constants must be resolvable at compile time, either by assigning them a literal, another constant, or an expression that can be resolved at compile time.

The type of a constant will only be inferred if its assignment is a literal with a type annotation, another constant with a resolved type, or an expression with a resolved type such as a function call.

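const A: u8 = 1;     // type given by the annotation
const B = 1u8;       // type inferred from the literal's suffix
const C = B;         // type inferred from the constant `B`
const D = a();       // type inferred from the return type of `a`
const E: u8 = b();   // annotated, since `b` returns the unresolved type `T`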

comptime fn a() -> u8 {
    1
}

fn b() -> T {
    2
}

Branching

<comptime_branch> ::=
    "comptime" (
        | <if_else_if_branch>
        | <if_match_branch>
        | <match>
        | <ternary>
    ) ;

Dependencies:

The <comptime_branch> is a branch that is evaluated at compile time where only the truthy branch is compiled. It is defined as the "comptime" keyword followed by any of the branches.

Semantics

Since comptime must be resolved at compile time, the branching expression must be resolvable at compile time and of type bool. That is to say the expression must itself be a literal, constant, or another expression resolvable at compile time.

In the case of compile time branching, branches that are not matched will be removed from the code at compile time.

use std::{
    builtin::HardFork,
    op::{tstore, tload, sstore, sload},
};

const LOCK_SLOT: u256 = keccak256("mutex").into() - 1;

enum Lock {
    Locked,
    Unlocked,
}

fn reader() -> (u256 -> u256) {
    comptime match @hardFork() {
        HardFork::Cancun => tload,
        _ => sload,
    }
}

fn writer() -> ((u256, u256) -> ()) {
    comptime match @hardFork() {
        HardFork::Cancun => tstore,
        _ => sstore,
    }
}

fn nonreentrant(action: T -> U) {
    if reader()(LOCK_SLOT) matches Lock::Locked {
        revert();
    }
    writer()(LOCK_SLOT, Lock::Locked);
    let res = action();
    writer()(LOCK_SLOT, Lock::Unlocked);
    return res;
}

Functions

<comptime_function> ::= "comptime" <function_assignment> ;

Dependencies:

<function_assignment>

The <comptime_function> is a function that is evaluated at compile time. It is defined as the "comptime" keyword followed by a <function_assignment>.

Semantics

Since comptime must be resolved at compile time, the function must contain only expressions resolvable at compile time.

comptime fn a() -> u8 {
    1
}

comptime fn b(arg: u8) -> u8 {
    arg * 2
}

comptime fn c(arg: u8) -> u8 {
    a() + b(arg)
}

const A = c(1);
const B = c(A);

Modules

Declaration

<module_declaration> ::= ["pub"] "mod" <ident> "{" [<module_devdoc>] (<stmt>)* "}" ;

Dependencies:

The <module_declaration> is composed of an optional "pub" prefix, the "mod" keyword followed by an identifier then the body of the module containing an optional devdoc, followed by a list of declarations and module items, delimited by curly braces.

Import

<module_import_item> ::=
    <ident> (
        "::" (
          | ("{" <module_import_item> ("," <module_import_item>)* [","] "}")
          | <module_import_item>
        )
    )* ;

<module_import> ::= ["pub"] "use" <ident> ["::" <module_import_item>] ;

Dependencies:

<ident>

The <module_import_item> is a recursive token, containing either another module import item or a comma separated list of module import items delimited by curly braces.

The <module_import> is an optional "pub" annotation followed by "use", the module name, then module import items.

Semantics

Namespace semantics in modules are defined in the namespace document.

Visibility semantics in modules are defined in the visibility document.

Modules can contain developer documentation, declarations, and assignments. If the module contains developer documentation, it must be the first item in the module. This is for readability.

Files are implicitly modules with a name equivalent to the file name.

todo: should this sanitize file names or require filenames to contain only valid ident chars?

Type, function, abi, and contract declarations must be assigned in the same module. However, traits are declared without assignment, and submodules may be declared without a block only if there is a file with a matching name.

The super identifier represents the direct parent module of the module in which it's invoked.

Syntax Showcase

type PrimitiveStruct = {
    a: u8,
    b: u8,
    c: u8,
};

type PackedStruct = packed {
    a: u8,
    b: u8,
};

type GenericStruct<T> = {
    a: T,
    b: T,
};

type PrimitiveTuple = (u8, u8, u8);

type PackedTuple = packed (u8, u8, u8);

type GenericTuple<T> = (T, T, T);

type Enum =
    | Option1
    | Option2;

type PrimitiveUnion =
    | Type1(u8)
    | Type2(PrimitiveStruct);

type GenericUnion<T> =
    | Some(T)
    | None;

type PrimitiveFn = u8 -> u8;

type PrimitiveMultiArgFn = (u8, u8) -> (u8, u8);

type GenericFn<T> = T -> T;

trait Add {
    fn add(lhs: Self, rhs: Self) -> Self;
}

impl PrimitiveStruct {
    fn default() -> Self {
        return Self { a: 0, b: 0, c: 0 };
    }
}

impl PrimitiveStruct: Add {
    fn add(lhs: Self, rhs: Self) -> Self {
        return Self {
            a: lhs.a + rhs.a,
            b: lhs.b + rhs.b,
            c: lhs.c + rhs.c,
        };
    }
}

mod module {
    mod nestedModule {
        type A = u256;
    }
    pub use nestedModule::A;
}
use module::A;

abi ERC165 {
    fn supportsInterface(interfaceId: b4) -> bool;
}

contract MyContract;

impl MyContract: ERC165 {
    fn supportsInterface(interfaceId: b4) -> bool {
        return true;
    }
}

// `MyContract` de-sugars roughly to:
fn main<Cd: ERC165>(calldata: Cd) {
    if callvalue() > 0 { revert(); }
    match calldata {
        ERC165::supportsInterface(interfaceId) => {
            return true;
        },
        _ => revert(),
    };
}

Full sugared ERC20 example:

abi ERC20 {
    fn balanceOf(owner: addr) -> u256;
    fn allowance(owner: addr, spender: addr) -> u256;
    fn totalSupply() -> u256;
    fn transfer(receiver: addr, amount: u256) -> bool;
    fn transferFrom(sender: addr, receiver: addr, amount: u256) -> bool;
    fn approve(spender: addr, amount: u256) -> bool;
}

contract MyContract {
    balances: HashMap<addr, u256>,
    allowances: HashMap<addr, HashMap<addr, u256>>,
    supply: u256,
}

impl MyContract: ERC20 {
    type Transfer = event {
        sender: indexed<addr>,
        receiver: indexed<addr>,
        amount: u256,
    }

    type Approval = event {
        owner: indexed<addr>,
        spender: indexed<addr>,
        amount: u256,
    }

    fn balanceOf(self: Self, owner: addr) -> u256 {
        return self.balances.get(owner);
    }

    fn allowance(self: Self, owner: addr, spender: addr) -> u256 {
        return self.allowances.get(owner).get(spender);
    }

    fn totalSupply(self: Self) -> u256 {
        return self.supply;
    }

    fn transfer(mut self: Self, receiver: addr, amount: u256) -> bool {
        self.balances.set(caller(), self.balances.get(caller()) - amount);
        self.balances.set(receiver, self.balances.get(receiver) + amount);
        log(Self::Transfer { sender: caller(), receiver, amount });
        return true;
    }

    fn transferFrom(mut self: Self, sender: addr, receiver: addr, amount: u256) -> bool {
        if sender != caller() {
            let senderCallerAllowance = self.allowances.get(sender).get(caller());
            if senderCallerAllowance < max<u256>() {
                self.allowances.get(sender).set(caller(), senderCallerAllowance - amount);
            }
        }
        self.balances.set(sender, self.balances.get(sender) - amount);
        self.balances.set(receiver, self.balances.get(receiver) + amount);
        log(Self::Transfer { sender, receiver, amount });
        return true;
    }

    fn approve(mut self: Self, spender: addr, amount: u256) -> bool {
        self.allowances.get(caller()).set(spender, amount);
        log(Self::Approval { owner: caller(), spender, amount });
        return true;
    }
}

Full de-sugared ERC20 example:

type Transfer = event {
    sender: indexed<addr>,
    receiver: indexed<addr>,
    amount: u256,
}

type Approval = event {
    owner: indexed<addr>,
    spender: indexed<addr>,
    amount: u256,
}

abi ERC20 {
    fn balanceOf(owner: addr) -> u256;
    fn allowance(owner: addr, spender: addr) -> u256;
    fn totalSupply() -> u256;
    fn transfer(receiver: addr, amount: u256) -> bool;
    fn transferFrom(sender: addr, receiver: addr, amount: u256) -> bool;
    fn approve(spender: addr, amount: u256) -> bool;
}

type Storage = {
    balances: HashMap<addr, u256>,
    allowances: HashMap<addr, HashMap<addr, u256>>,
    supply: u256,
}

const storage = Storage::default();

fn main<Cd: ERC20>(calldata: Cd) {
    if callvalue() > 0 { revert(); }
    match calldata {
        ERC20::balanceOf(owner) => {
            return storage.balances.get(owner);
        },
        ERC20::allowance(owner, spender) => {
            return storage.allowances.get(owner).get(spender);
        },
        ERC20::totalSupply() => {
            return storage.supply;
        },
        ERC20::transfer(receiver, amount) => {
            storage.balances.set(caller(), storage.balances.get(caller()) - amount);
            storage.balances.set(receiver, storage.balances.get(receiver) + amount);
            log(Transfer { sender: caller(), receiver, amount });
            return true;
        },
        ERC20::transferFrom(sender, receiver, amount) => {
            if sender != caller() {
                let senderCallerAllowance = storage.allowances.get(sender).get(caller());
                if senderCallerAllowance < max<u256>() {
                    storage.allowances.get(sender).set(caller(), senderCallerAllowance - amount);
                }
            }
            storage.balances.set(sender, storage.balances.get(sender) - amount);
            storage.balances.set(receiver, storage.balances.get(receiver) + amount);
            log(Transfer { sender, receiver, amount });
            return true;
        },
        ERC20::approve(spender, amount) => {
            storage.allowances.get(caller()).set(spender, amount);
            log(Approval { owner: caller(), spender, amount });
            return true;
        },
        _ => revert(),
    };
}

Semantics

The semantics section contains semantics that are not defined under specific syntax constructs, but rather are more general features or features not in the frontend.

Codesize

This document details the different options for codesize optimization. Generally, codesize and runtime efficiency are inversely correlated. Developers will have granular control both in the compiler's configuration and in the language's syntax.

Inlining Heuristics

Function inlining is a direct tradeoff of codesize and runtime efficiency. Codesize optimization may be used for reducing deployment cost or for keeping the codesize below the EVM's codesize limit.

Scoring

Each function is assigned a score based on a combination of its projected bytecode size, its projected number of calls, and an optional manually entered score.

Name	Score
Bytecode Size	`fn.bytecode.len()`
Call Count	`fn.calls()`
Manual Score	`u8`
Total	`(fn.bytecode.len() + 5 * fn.calls()) * man`
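The total in the table can be written as a one-line function. In this Python sketch the name `inline_score` and the default manual score of 1 (for functions with no manually entered score) are assumptions; the formula itself is the one from the table.

```python
def inline_score(bytecode_len, call_count, manual_score=1):
    """Total = (bytecode size + 5 * call count) * manual score."""
    return (bytecode_len + 5 * call_count) * manual_score


print(inline_score(40, 3))     # (40 + 15) * 1 = 55
print(inline_score(40, 3, 2))  # (40 + 15) * 2 = 110
```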

todo rewrite this based on gas estimations of each call

The threshold for function inlining can be specified in the compiler configuration.

todo decide on this

Analysis

The analysis for function inline scoring requires the traversal of a directed graph containing each function and the functions called within it. Traversal is depth first, as a function's inline score depends on its bytecode size, which in turn depends on the inline scores of the functions called within its body. Once a terminal function, that is, a function with no internal function dependencies, is found, its inline score is compared against the configuration threshold. If the score is greater than the threshold, the function is to be inlined and a flag is stored in the graph for future reference.

Cycle detection will both prevent infinite loops in the compiler as well as detect recursion and corecursion. Recursive and corecursive functions will never be inlined for simplicity.
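A minimal Python sketch of this analysis follows. It is one possible reading, not the compiler's algorithm: the call graph is assumed to be a dict from function name to callee names, the manual score is taken as 1, recursion and corecursion are detected with a plain reachability check, and all names (`analyze`, `reachable_from`, the sample functions) are hypothetical.

```python
def reachable_from(call_graph, start):
    """All functions reachable from `start` through one or more calls."""
    seen = set()
    stack = list(call_graph.get(start, []))
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(call_graph.get(fn, []))
    return seen


def analyze(call_graph, base_size, call_count, threshold):
    """Return {function: inline flag} using a depth-first traversal."""
    # A function that can reach itself is recursive or corecursive.
    recursive = {fn for fn in call_graph if fn in reachable_from(call_graph, fn)}
    inline, size = {}, {}

    def visit(fn):
        if fn in inline:
            return
        inline[fn] = False  # provisional flag, also stops re-entry on cycles
        callees = call_graph.get(fn, [])
        for callee in callees:
            visit(callee)  # callees are scored before their callers
        # Inlined callees contribute their size to the caller's size.
        size[fn] = base_size[fn] + sum(size[c] for c in callees if inline[c])
        score = size[fn] + 5 * call_count[fn]
        inline[fn] = fn not in recursive and score > threshold

    for fn in call_graph:
        visit(fn)
    return inline


graph = {"main": ["helper", "fact"], "helper": ["leaf"], "leaf": [], "fact": ["fact"]}
sizes = {"main": 100, "helper": 20, "leaf": 10, "fact": 30}
calls = {"main": 0, "helper": 2, "leaf": 1, "fact": 1}
print(analyze(graph, sizes, calls, threshold=12))
```

Here `fact` calls itself and is therefore never flagged for inlining, whatever its score.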

Dead Code Elimination

Eliminating dead code will cut codesize and improve function inlining scores, as the number of calls and the projected codesize of each function are both factors in the score.

Syntax Modifications

todo

Namespaces

A namespace contains valid identifiers for items that may be used.

todo

Scoping

Items are brought into scope by import or declaration.

Module

The module scope contains items explicitly imported from another scope or explicitly declared in the current module scope. Items may be accessed directly by their identifier with no other annotations.

Files are implicitly modules.

mod moduleA {
    // `TypeA` declared.
    type TypeA = u8;
    // `TypeA` may be accessed as follows:
    const CONST_A: TypeA = 0u8;
}

mod moduleB {
    // import `TypeA` into the local module scope
    use super::moduleA::TypeA;
    // `TypeA` may now be accessed as follows:
    const CONST_A: TypeA = 0u8;
}

mod moduleC {
    // publicly import `TypeA` into the local module scope. "pub" enables exporting.
    pub use super::moduleA::TypeA;
}

mod moduleD {
    // publicly import `moduleA` into the local module scope. "pub" enables exporting.
    pub use super::moduleA;
}

mod moduleF {
    // `TypeA` may be accessed in one of the following ways.
    const CONST_A: super::moduleA::TypeA = 0u8;
    const CONST_B: super::moduleC::TypeA = 0u8;
    const CONST_C: super::moduleD::moduleA::TypeA = 0u8;
}

Implementation

The implementation block scope contains items explicitly imported from another scope or explicitly declared in the current implementation block scope. Items may be accessed either directly or under the Self namespace.

type MyStruct<T> = { inner: T };
type MyError = Overflow | Underflow;

trait TryPlusOne: Add {
    type Error;

    fn tryPlusOne(self: Self) -> Result<Self, Self::Error>;
}

impl MyStruct<T> {
    fn new(inner: T) -> Self {
        return Self { inner: inner };
    }
}

impl MyStruct<T>: Add {
    fn add(lhs: Self, rhs: Self) -> Self {
        return Self { inner: lhs.inner + rhs.inner };
    }
}

impl MyStruct<T: Add>: TryPlusOne {
    type Error = MyError;

    fn tryPlusOne(self: Self) -> Result<Self, Self::Error> {
        if self.inner > max<T>() - 1 {
            return Result::Err(Error::Overflow);
        }
        return Result::Ok(Add::add(self, Self { inner: 1 }));
    }
}

Function

The function scope implicitly imports items from parent scopes up to the parent module. Items may also be explicitly declared or imported from external modules.

mod moduleA {
    const CONST_A = 0u8;
    const CONST_B = 1u8;
    const CONST_C = 2u8;
}

use moduleA::CONST_A;

const CONST_D = 3u8;

fn func() -> u8 {
    use moduleA::CONST_B;

    fn innerFunc() -> u8 {
        return CONST_A + CONST_B + moduleA::CONST_C + CONST_D;
    }

    return innerFunc();
}

Blocks

Code blocks, branch blocks, loop blocks, and match blocks implicitly import items from the parent scopes up until the parent module. Items may be explicitly imported from external modules, and items may be defined in each block.

Visibility

todo

Inline Assembly

Opcodes

<opcode> ::=
    | "stop" | "add" | "mul" | "sub" | "div" | "sdiv" | "mod" | "smod" | "addmod" | "mulmod" | "exp"
    | "signextend" | "lt" | "gt" | "slt" | "sgt" | "eq" | "iszero" | "and" | "or" | "xor" | "not"
    | "byte" | "shl" | "shr" | "sar" | "sha3" | "address" | "balance" | "origin" | "caller"
    | "callvalue" | "calldataload" | "calldatasize" | "calldatacopy" | "codesize" | "codecopy"
    | "gasprice" | "extcodesize" | "extcodecopy" | "returndatasize" | "returndatacopy"
    | "extcodehash" | "blockhash" | "coinbase" | "timestamp" | "number" | "prevrandao" | "gaslimit"
    | "chainid" | "selfbalance" | "basefee" | "pop" | "mload" | "mstore" | "mstore8" | "sload"
    | "sstore" | "jump" | "jumpi" | "pc" | "msize" | "gas" | "jumpdest" | "push0" | "dup1" | "dup2"
    | "dup3" | "dup4" | "dup5" | "dup6" | "dup7" | "dup8" | "dup9" | "dup10" | "dup11" | "dup12"
    | "dup13" | "dup14" | "dup15" | "dup16" | "swap1" | "swap2" | "swap3" | "swap4" | "swap5"
    | "swap6" | "swap7" | "swap8" | "swap9" | "swap10" | "swap11" | "swap12" | "swap13" | "swap14"
    | "swap15" | "swap16" | "log0" | "log1" | "log2" | "log3" | "log4" | "create" | "call"
    | "callcode" | "return" | "delegatecall" | "create2" | "staticcall" | "revert" | "invalid"
    | "selfdestruct" | <numeric_literal> | <ident> ;

Dependencies:

The <opcode> is one of the mnemonic EVM instructions, or a numeric literal, or an identifier.

Inline Assembly Block

<assembly_output> ::= <ident> | "_" ;

<inline_assembly> ::=
    "asm"
    "(" [<expr> ("," <expr>)* [","]] ")"
    "->" "(" [<assembly_output> ("," <assembly_output>)* [","]] ")"
    "{" (<opcode>)* "}"

Dependencies:

<expr>

The <inline_assembly> consists of the "asm" keyword, followed by a parenthesis delimited, comma separated list of zero or more argument expressions, then an arrow, a parenthesis delimited, comma separated list of zero or more outputs (identifiers or underscores), and finally a curly brace delimited block containing only <opcode> tokens.

Semantics

Arguments are ordered such that the state of the stack at the start of the block, top to bottom, is the list of arguments, left to right. Identifiers in the output list are ordered such that the state of the stack at the end of the assembly block, top to bottom, is the list of outputs, left to right.

Note that if the input arguments contain local variables, the stack scheduling required to construct the pre-assembly stack state may be unprofitable in cases with small assembly code blocks.

asm (1, 2, 3) -> (a) {
    // state:   // [1, 2, 3]
    add         // [3, 3]
    mul         // [9]
}
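The stack ordering in this example can be checked with a tiny stack machine. The Python sketch below supports only the two opcodes used above and stores the stack top first, so the argument list maps directly onto the initial stack; `run_asm` and `BINARY_OPS` are hypothetical names.

```python
WORD = 2**256  # EVM words are 256 bits wide

# Only the opcodes used in the example above are modelled.
BINARY_OPS = {
    "add": lambda a, b: (a + b) % WORD,
    "mul": lambda a, b: (a * b) % WORD,
}


def run_asm(args, opcodes):
    """Run opcodes on a stack whose top-to-bottom order is `args` left to right."""
    stack = list(args)  # index 0 is the top of the stack
    for op in opcodes:
        if op not in BINARY_OPS:
            raise ValueError(f"unsupported opcode: {op}")
        a = stack.pop(0)
        b = stack.pop(0)
        stack.insert(0, BINARY_OPS[op](a, b))
    return stack


print(run_asm([1, 2, 3], ["add", "mul"]))  # [9]
```

The single value left on the stack is what the output identifier `a` would be bound to.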

Inside the assembly block, numeric literals are implicitly converted into pushN instructions. Each literal is placed in the smallest pushN that holds it as written, leading zeros included. For example, 0x0000 becomes push2 0000, which allows for bytecode padding. Identifiers may be variables, constants, or ad hoc opcodes. When identifiers are variables, they are scheduled in the stack. When identifiers are constants, they are replaced with their push instructions just as numeric literals are. When identifiers are ad hoc opcodes, they are replaced with their respective byte(s).
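The literal-to-pushN rule for hexadecimal literals can be sketched as follows. This Python model handles hex literals only, and two details are assumptions not stated in the text: an odd number of hex digits is padded with one leading zero to reach a whole byte, and literals wider than 32 bytes are rejected. The function name `push_for_hex_literal` is hypothetical.

```python
def push_for_hex_literal(literal):
    """Return the push instruction for a hex literal such as "0x0000"."""
    if not literal.startswith("0x"):
        raise ValueError(f"not a hex literal: {literal!r}")
    digits = literal[2:].replace("_", "")
    if not digits:
        raise ValueError(f"literal has no digits: {literal!r}")
    if len(digits) % 2:
        digits = "0" + digits  # assumption: pad odd digit counts to whole bytes
    size = len(digits) // 2    # leading zeros count toward the width
    if size > 32:
        raise ValueError("literal does not fit in push32")
    return f"push{size} {digits.lower()}"


print(push_for_hex_literal("0x0000"))  # push2 0000
print(push_for_hex_literal("0xff"))    # push1 ff
```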

Built-In

Built-in functionality is functionality that is available only at compile time, not at EVM runtime, and that is otherwise inaccessible through the language's syntax.

Macros contain their own syntax and semantics, however, comptime functionality and built-in assistants cover most of the use cases for macros without leaving the language's native syntax.

Types

`PrimitiveType`

type PrimitiveType;

`StructType`

type StructType;

`EnumType`

type EnumType;

`UnionType`

type UnionType;

`FunctionType`

type FunctionType;

`TypeInfo`

type TypeInfo =
    | Primitive(PrimitiveType)
    | Struct(StructType)
    | Enum(EnumType)
    | Union(UnionType)
    | Function(FunctionType);

`HardFork`

type HardFork =
    | Frontier
    | Homestead
    | Dao
    | Tangerine
    | SpuriousDragon
    | Byzantium
    | Constantinople
    | Petersburg
    | Istanbul
    | MuirGlacier
    | Berlin
    | London
    | ArrowGlacier
    | GrayGlacier
    | Paris
    | Shanghai
    | Cancun;

Functions

`@typeInfo`

@typeInfo(typeSignature) -> TypeInfo;

The typeInfo function takes a single <type_signature> as an argument and returns a union of types, TypeInfo.

`@bitsize`

@bitsize(typeSignature) -> u256;

The bitsize function takes a single <type_signature> as an argument and returns an integer indicating the bitsize of the underlying type.

`@fields`

@fields(structType) -> [T, N];

The fields function takes a single StructType as an argument and returns an array of type signatures of length N where N is the number of fields in the struct.

`@compilerError`

@compilerError(errorMessage);

The compilerError function takes a single string as an argument and throws an error at compile time with the provided message.

`@hardFork`

@hardFork() -> HardFork;

The hardFork function returns a member of the built-in HardFork type. This is derived from the compiler configuration.

`@bytecode`

@bytecode(T -> U) -> Bytes;

The bytecode function takes an arbitrary function and returns its bytecode in Bytes.