The Edge Specification
All Edge, No Drag.
This document defines Edge, a domain specific language for the Ethereum Virtual Machine (EVM).
Edge is a high level, strongly statically typed, multi-paradigm language. It provides:
- A thin layer of abstraction over the EVM's instruction set architecture (ISA).
- An extensible polymorphic type system with subtyping.
- First class support for modules and code reuse.
- Compile time code execution to fine-tune the compiler's input.
Edge's syntax is similar to Rust and Zig where intuitive. However, the language is not designed to be a general purpose language with EVM features as an afterthought. Rather, it is designed to extend the EVM instruction set with a reasonable type system and syntax sugar over universally understood programming constructs.
Notation
This specification uses a grammar similar to Extended Backus-Naur Form (EBNF) with the following rules.
- Non-terminal tokens are wrapped in angle brackets: <ident>
- Terminal tokens are wrapped in double quotes: "const"
- Optional items are wrapped in brackets: ["mut"]
- Sequences of zero or more items are wrapped in parentheses and suffixed with a star: ("," <ident>)*
- Sequences of one or more items are wrapped in parentheses and suffixed with a plus: (<ident>)+
In contrast to EBNF, we define a rule that all items are non-atomic, that is to say arbitrary whitespace characters \n, \t, and \r may surround all tokens unless wrapped with curly braces, as in { "0x" (<hex_digit>)* }
Generally, we use long-formed names for clarity of each token, however, common tokens are abbreviated and defined as follows:
Disambiguation
This section contains context that may be required throughout the specification.
Return vs Return™️
The word "return" refers to two different behaviors: returned values from expressions and the halting return opcode.
When "return" is used, this refers to the values returned from expressions, that is to say the values left on the stack, if any.
When "halting return" is used, this refers to the EVM opcode return that halts execution and returns a value from a slice of memory to the caller of the current execution context.
Syntax
Conceptually, all EVM contracts are single-entry point executables and at compile time, Edge programs are no different.
Other languages have used primarily the contract-is-an-object paradigm, mapping fields to storage layouts and methods to "external functions" that may read and write the storage. Inheritance enables interface constraints, code reuse, and a reasonable model for message passing that relates to the EVM external call model.
However, this is limited in scope. Conceptually, the contract object paradigm groups stateful data and functionality, limiting the deployability to the product type. Extending the deployability to arbitrary data types allows for contracts to be functions, type unions, product types, and more. While most of these are not particularly useful, this simplifies the type system as well as opens the design space to new contract paradigms.
The core syntax of Edge is derived from commonly used patterns in modern programming. Functions, branches, and loops are largely intuitive for engineers with experience in C, Rust, JavaScript, etc. Parametric polymorphism uses syntax similar to Rust and TypeScript. Compiler built-in functions and "comptime" constructs follow the syntax of Zig.
Comments
<line_comment> ::= "//" (!"\n" <ascii_char>)* "\n" ;
<block_comment> ::= "/*" (!"*/" <ascii_char>)* "*/" ;
<item_devdoc> ::= "///" (!"\n" <ascii_char>)* "\n" ;
<module_devdoc> ::= "//!" (!"\n" <ascii_char>)* "\n" ;
The <line_comment> is a single line comment, ignored by the parser.
The <block_comment> is a multi line comment, ignored by the parser.
The <item_devdoc> is a developer documentation comment, treated as documentation for the immediately following item.
The <module_devdoc> is a developer documentation comment, treated as documentation for the module in which it is defined.
Developer documentation comments are treated as GitHub-flavored Markdown.
Identifiers
<ident> ::= (<alpha_char> | "_") (<alpha_char> | <dec_digit> | "_")* ;
Dependencies:
The <ident> is a C-style identifier, beginning with an alphabetic character or underscore, followed by zero or more alphanumeric or underscore characters.
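As an illustration of the identifier rule, here is a minimal Python sketch of a matcher. It assumes that <alpha_char> means the ASCII letters and <dec_digit> means 0 through 9, which this section does not state explicitly; the name is_ident is hypothetical and not part of Edge.

```python
import re

# Assumption: <alpha_char> is [A-Za-z] and <dec_digit> is [0-9].
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_ident(text: str) -> bool:
    """Return True when the whole string matches the <ident> rule."""
    return IDENT_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["balance", "_tmp1", "1abc", "my-var", ""]:
        print(repr(candidate), is_ident(candidate))
```

The sketch checks only the lexical shape; it does not exclude reserved words such as "let" or "fn".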
Data Locations
<storage_pointer> ::= "&s" ;
<transient_storage_pointer> ::= "&t" ;
<memory_pointer> ::= "&m" ;
<calldata_pointer> ::= "&cd" ;
<returndata_pointer> ::= "&rd" ;
<internal_code_pointer> ::= "&ic" ;
<external_code_pointer> ::= "&ec" ;
<data_location> ::=
| <storage_pointer>
| <transient_storage_pointer>
| <memory_pointer>
| <calldata_pointer>
| <returndata_pointer>
| <internal_code_pointer>
| <external_code_pointer> ;
The <data_location> is a data location annotation indicating the data location in which a pointer's data resides. We define seven distinct annotations for data location pointers. This is a divergence from general purpose programming languages to more accurately represent the EVM execution environment.
- &s: persistent storage
- &t: transient storage
- &m: memory
- &cd: calldata
- &rd: returndata
- &ic: internal (local) code
- &ec: external code
Semantics
Data locations can be grouped into two broad categories, buffers and maps.
Maps
Persistent and transient storage are part of the map category: 256 bit keys map to 256 bit values. Both may be written or read one word at a time.
Buffers
Memory, calldata, returndata, internal code, and external code are all linear data buffers. All can be copied into memory, but only memory and calldata can be read directly to the stack, and only memory can be written or copied to.
Name | Read to Stack | Copy to Memory | Write |
---|---|---|---|
memory | true | true | true |
calldata | true | true | false |
returndata | false | true | false |
internal code | false | true | false |
external code | false | true | false |
Transitions
Transitioning from map to memory buffer is performed by loading each element from the map to the stack and storing each stack item in memory, O(N).
Transitioning from memory buffer to a map is performed by loading each element from memory to the stack and storing each stack item in the map, O(N).
Transitioning from any other buffer to a map is performed by copying the buffer's data into memory then transitioning the data from memory into the map, O(N+1).
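The word-by-word transitions described above can be modeled with a short Python sketch. A dict stands in for a storage map and a bytearray stands in for memory; the function names, the use of consecutive slots, and big-endian 32 byte words are assumptions made for illustration only.

```python
WORD = 32  # bytes per EVM word


def map_to_memory(storage, first_slot, count, memory, ptr):
    """Load `count` words from the map and store each one in memory."""
    for i in range(count):
        word = storage.get(first_slot + i, 0)  # models a load from the map
        start = ptr + WORD * i
        memory[start:start + WORD] = word.to_bytes(WORD, "big")  # models mstore


def memory_to_map(memory, ptr, count, storage, first_slot):
    """Load `count` words from memory and store each one in the map."""
    for i in range(count):
        start = ptr + WORD * i
        word = int.from_bytes(memory[start:start + WORD], "big")  # models mload
        storage[first_slot + i] = word  # models a store to the map


if __name__ == "__main__":
    storage = {0: 1, 1: 2}
    memory = bytearray(2 * WORD)
    map_to_memory(storage, 0, 2, memory, 0)
    restored = {}
    memory_to_map(memory, 0, 2, restored, 0)
    print(restored)
```

Both loops touch each word once, which is the O(N) cost noted above. Going from another buffer to a map would add one copy into memory before the second loop.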
Pointer Bit Sizes
Pointers to different data locations consist of different sizes based on the properties of that data location. In depth semantics of each data location are specified in the type system documents.
Location | Bit Size | Reason |
---|---|---|
persistent storage | 256 | Storage is 256 bit key value hashmap |
transient storage | 256 | Transient storage is 256 bit key value hashmap |
memory | 32 | Theoretical maximum memory size does not grow to 0xffffffff |
calldata | 32 | Theoretical maximum calldata size does not grow to 0xffffffff |
returndata | 32 | Maximum returndata size is equal to maximum memory size |
internal code | 16 | Code size is less than 0xffff |
external code | 176 | Contains 160 bit address and 16 bit code pointer |
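The table can be restated as a lookup, together with a sketch of how a 176 bit external code pointer might combine its two parts. The bit layout used here (address in the high 160 bits, code pointer in the low 16 bits) is an assumption; this section only states that both parts are present.

```python
# Pointer widths in bits, copied from the table above.
POINTER_BITS = {
    "&s": 256,
    "&t": 256,
    "&m": 32,
    "&cd": 32,
    "&rd": 32,
    "&ic": 16,
    "&ec": 176,
}


def pack_external_code_pointer(address: int, code_offset: int) -> int:
    """Combine a 160 bit address and a 16 bit code pointer into one value.

    Assumed layout: address in the high bits, code pointer in the low bits.
    """
    if not 0 <= address < (1 << 160):
        raise ValueError("address must fit in 160 bits")
    if not 0 <= code_offset < (1 << 16):
        raise ValueError("code pointer must fit in 16 bits")
    return (address << 16) | code_offset


def unpack_external_code_pointer(pointer: int):
    """Split a packed pointer back into (address, code_offset)."""
    return pointer >> 16, pointer & 0xFFFF
```

Under this layout the packed value never exceeds 176 bits, which matches the size listed for external code.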
Expressions
<binary_operation> ::= <expr> <binary_operator> <expr> ;
<unary_operation> ::= <unary_operator> <expr> ;
<expr> ::=
| <array_instantiation>
| <array_element_access>
| <struct_instantiation>
| <tuple_instantiation>
| <struct_field_access>
| <tuple_field_access>
| <union_instantiation>
| <pattern_match>
| <arrow_function>
| <function_call>
| <binary_operation>
| <unary_operation>
| <ternary>
| <literal>
| <ident>
| ("(" <expr> ")");
Dependencies:
<binary_operator>
<unary_operator>
<array_instantiation>
<array_element_access>
<struct_instantiation>
<tuple_instantiation>
<struct_field_access>
<tuple_field_access>
<union_instantiation>
<pattern_match>
<arrow_function>
<function_call>
<ternary>
<literal>
<ident>
The <expr> is defined as an item that returns a value.
The <binary_operation> is an expression composed of two sub-expressions with an infixed binary operator. Semantics are beyond the scope of the syntax specification; see operator precedence semantics for more.
The <unary_operation> is an expression composed of a prefixed unary operator and a sub-expression.
Statements
<stmt> ::=
| <variable_declaration>
| <variable_assignment>
| <type_declaration>
| <type_assignment>
| <trait_declaration>
| <impl_block>
| <function_declaration>
| <function_assignment>
| <abi_declaration>
| <contract_declaration>
| <contract_impl_block>
| <core_loop>
| <for_loop>
| <while_loop>
| <do_while_loop>
| <code_block>
| <if_else_if_branch>
| <if_match_branch>
| <match>
| <constant_assignment>
| <comptime_branch>
| <comptime_function>
| <module_declaration>
| <module_import> ;
Dependencies:
<variable_declaration>
<variable_assignment>
<type_declaration>
<type_assignment>
<trait_declaration>
<impl_block>
<function_declaration>
<function_assignment>
<abi_declaration>
<contract_declaration>
<contract_impl_block>
<core_loop>
<for_loop>
<while_loop>
<do_while_loop>
<code_block>
<if_else_if_branch>
<if_match_branch>
<match>
<constant_assignment>
<comptime_branch>
<comptime_function>
<module_declaration>
<module_import>
The <stmt> is similar to an expression; however, the item does not return a value.
Variables
Declaration
<variable_declaration> ::= "let" <ident> [":" <type_signature>] ;
Dependencies:
The <variable_declaration> marks the declaration of a variable; it may optionally be assigned at the time of declaration.
Assignment
<variable_assignment> ::= <ident> "=" <expr> ;
Dependencies:
The <variable_assignment> is the assignment of a variable. Its identifier is assigned the returned value of an expression using the assignment operator.
Type System
The type system builds on core primitive types inherent to the EVM with abstract data types for parametric polymorphism, nominative subtyping, and compile time monomorphization.
- Primitive Types
- Type Assignment
- Array Types
- Product Types
- Sum Types
- Generics
- Trait Constraints
- Implementation
- Function Types
- Event Types
- Application Binary Interface
- Contract Objects
Primitive Types
<integer_size> ::= "8" | "16" | "24" | "32" | "40" | "48" | "56" | "64" | "72" | "80" | "88" | "96"
| "104" | "112" | "120" | "128" | "136" | "144" | "152" | "160" | "168" | "176" | "184" | "192"
| "200" | "208" | "216" | "224" | "232" | "240" | "248" | "256" ;
<fixed_bytes_size> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "10" | "11" | "12"
| "13" | "14" | "15" | "16" | "17" | "18" | "19" | "20" | "21" | "22" | "23" | "24" | "25"
| "26" | "27" | "28" | "29" | "30" | "31" | "32" ;
<signed_integer> ::= {"i" <integer_size>} ;
<unsigned_integer> ::= {"u" <integer_size>} ;
<fixed_bytes> ::= {"b" <fixed_bytes_size>} ;
<address> ::= "addr" ;
<boolean> ::= "bool" ;
<bit> ::= "bit" ;
<pointer> ::= <data_location> "ptr" ;
<numeric_type> ::= <signed_integer> | <unsigned_integer> | <fixed_bytes> | <address> ;
<primitive_data_type> ::=
| <numeric_type>
| <boolean>
| <pointer> ;
Dependencies:
The <primitive_data_type> contains signed and unsigned integers, boolean, address, and fixed bytes types. Additionally, we introduce a pointer type that must be prefixed with a data location annotation.
Examples
u8
u256
i8
i256
b4
b32
addr
bool
bit
&s ptr
Semantics
Integers occupy the number of bits indicated by their size. Fixed bytes types occupy the number of bytes indicated by their size, or size * 8 bits. Address occupies 160 bits. Booleans occupy eight bits. Bit occupies a single bit. Pointers occupy the number of bits determined by their data location annotation, as listed under Pointer Bit Sizes.
Pointers can point to both primitive and complex data types.
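The size rules above can be checked with a small Python sketch that maps a primitive type name to its bit width. The function name bit_size is hypothetical, and pointers are left out because their width depends on the data location rather than on the type name alone.

```python
import re


def bit_size(type_name: str) -> int:
    """Return the bit width of a non-pointer primitive type name."""
    fixed = {"addr": 160, "bool": 8, "bit": 1}
    if type_name in fixed:
        return fixed[type_name]

    match = re.fullmatch(r"([iub])([1-9][0-9]*)", type_name)
    if match is None:
        raise ValueError(f"not a primitive type: {type_name!r}")

    kind, size = match.group(1), int(match.group(2))
    if kind in ("i", "u"):
        # Integer sizes are multiples of 8 from 8 to 256 bits.
        if size % 8 != 0 or not 8 <= size <= 256:
            raise ValueError(f"invalid integer size: {size}")
        return size

    # Fixed bytes sizes count bytes, from 1 to 32.
    if not 1 <= size <= 32:
        raise ValueError(f"invalid fixed bytes size: {size}")
    return size * 8
```

For example, bit_size("b4") yields 32 because four bytes occupy 4 * 8 bits, while bit_size("u7") is rejected because 7 is not in the <integer_size> list.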
Type Assignment
Signature
<type_signature> ::=
| <array_signature>
| <struct_signature>
| <tuple_signature>
| <union_signature>
| <function_signature>
| <ident>
| (<ident> [<type_parameters>]) ;
Dependencies:
<array_signature>
<struct_signature>
<tuple_signature>
<union_signature>
<function_signature>
<ident>
<type_parameters>
Type assignments assign identifiers to type signatures. A type signature may be an array, struct, tuple, union, or function signature, or an identifier followed by optional type parameters.
Declaration
<type_declaration> ::= ["pub"] "type" <ident> [<type_parameters>] ;
Dependencies:
The <type_declaration> is an optional "pub" keyword followed by "type" and an identifier with optional type parameters.
Assignment
<type_assignment> ::= <type_declaration> "=" <type_signature> ;
The <type_assignment> is a type declaration followed by a type signature separated by an assignment operator.
Semantics
Type assignment entails creating an identifier associated with a certain data structure or existing type. If the assignment is to an existing data type, it contains the same fields or members, if any, and exposes the same associated items, if any.
type MyCustomType = packed (u8, u8, u8);
type MyCustomAlias = MyCustomType;
fn increment(rgb: MyCustomType) -> MyCustomType {
    return (rgb.0 + 1, rgb.1 + 1, rgb.2 + 1);
}
increment(MyCustomType(1, 2, 3));
increment(MyCustomAlias(1, 2, 3));
To create a wrapper around an existing type without exposing the existing type's external interface, the type may be wrapped in parentheses, creating a "tuple" of one element, which comes without overhead.
type MyCustomType = packed (u8, u8, u8);
type MyNewCustomType = (MyCustomType);
Array Types
The array type is a list of elements of a single type.
Signature
<array_signature> ::= ["packed"] "[" <type_signature> ";" <expr> "]" ;
Dependencies:
The <array_signature> consists of an optional "packed" keyword prefix to a type signature and expression separated by a semicolon, delimited by brackets.
Instantiation
<array_instantiation> ::= [<data_location>] "[" <expr> ("," <expr>)* [","] "]" ;
Dependencies:
The <array_instantiation> is an optional data location annotation followed by a comma separated list of expressions delimited by brackets.
Element Access
<array_element_access> ::= <ident> "[" <expr> [":" <expr>] "]" ;
Dependencies:
The <array_element_access> is the array's identifier followed by a bracket-delimited expression and optionally a second expression, colon separated.
Examples
type TwoElementIntegerArray = [u8; 2];
type TwoElementPackedIntegerArray = packed [u8; 2];
const arr: TwoElementIntegerArray = [1, 2];
const elem: u8 = arr[0];
Semantics
Instantiation
Instantiation of a fixed-length array stores one element per 32 byte word, regardless of data location. The only difference between data locations in terms of instantiation behavior is that, if all elements of the array are populated with constant values and the array belongs in memory, a performance optimization may include code-copying an instance of the constant array from the bytecode into memory.
Access
Array element access depends on whether the second expression is included. If a single expression is inside the access brackets, the single element is returned from the array. If a second expression follows the first with a colon in between, a pointer of the same data location is returned. The type of the new array pointer is the same type but the size is now the size of the second expression's value minus the first expression's value. If the index values are known at compile time and are greater than or equal to the array's length, a compiler error is thrown, else a bounds check against the array's length is added into the runtime bytecode.
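The two access forms can be modeled in Python as follows. This is a sketch of the described behavior only: a Python list stands in for an Edge array, the function names are hypothetical, and the exact bounds rule for the end index of a slice is not modeled because the paragraph above does not pin it down.

```python
def element_access(array, index):
    """Single-expression access: bounds check, then return the element."""
    if index < 0 or index >= len(array):
        # Known at compile time: compiler error. Otherwise: runtime check.
        raise IndexError("index out of bounds")
    return array[index]


def slice_size(start, end):
    """Two-expression access: size of the array behind the new pointer."""
    if end < start:
        raise ValueError("slice end precedes slice start")
    return end - start
```

With arr = [1, 2], element_access(arr, 0) returns the first element, while arr[0:2] would yield a pointer to an array whose size is slice_size(0, 2).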
Product Types
The product type is a compound type composed of zero or more internal types.
Signature
<struct_field_signature> ::= <ident> ":" <type_signature> ;
<struct_signature> ::=
["packed"] "{"
[<struct_field_signature> ("," <struct_field_signature>)* [","]]
"}" ;
<tuple_signature> ::= ["packed"] "(" <type_signature> ("," <type_signature>)* [","] ")" ;
Dependencies:
Instantiation
<struct_field_instantiation> ::= <ident> ":" <expr> ;
<struct_instantiation> ::=
[<data_location>] <struct_signature> "{"
[<struct_field_instantiation> ("," <struct_field_instantiation>)* [","]]
"}" ;
<tuple_instantiation> ::= [<data_location>] <ident> "(" [<expr> ("," <expr>)* [","]] ")" ;
Dependencies:
The <struct_instantiation> is an instantiation, or creation, of a struct. It may optionally include a data location annotation; the semantic rules for this are in the data location semantic rules. It is instantiated by the struct identifier followed by a comma separated list of field name and value pairs delimited by curly braces.
The <tuple_instantiation> is an instantiation, or creation, of a tuple. It may optionally include a data location annotation; the semantic rules for this are in the data location semantic rules. It is instantiated by a comma separated list of expressions delimited by parentheses.
Field Access
<struct_field_access> ::= <ident> "." <ident> ;
<tuple_field_access> ::= <ident> "." <dec_char> ;
Dependencies:
The <struct_field_access> is written as the struct's identifier followed by the field's identifier separated by a period.
The <tuple_field_access> is written as the tuple's identifier followed by the field's index separated by a period.
Examples
type PrimitiveStruct = {
    a: u8,
    b: u8,
    c: u8,
};
const primitiveStruct: PrimitiveStruct = PrimitiveStruct { a: 1, b: 2, c: 3 };
const a = primitiveStruct.a;
type PackedTuple = packed (u8, u8, u8);
const packedTuple: PackedTuple = (1, 2, 3);
const one = packedTuple.0;
Semantics
The struct field signature maps a field identifier to a type signature. The field may be accessed by the struct's identifier and field identifier separated by a dot.
Prefixing the signature with the "packed" keyword will pack the fields by their bitsize, otherwise each field is padded to its own 256 bit word.
type Rgb = packed { r: u8, g: u8, b: u8 };
let rgb = Rgb { r: 1, g: 2, b: 3 };
// rgb = 0x010203
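The packed value in the comment above can be reproduced with a short Python sketch. It assumes that fields are packed in declaration order with the first field in the most significant position, which is inferred from the 0x010203 result rather than stated as a rule; pack_fields is a hypothetical name.

```python
def pack_fields(fields):
    """Pack (value, bit_size) pairs into one word, first field highest."""
    word = 0
    total_bits = 0
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
        word = (word << bits) | value
        total_bits += bits
    if total_bits > 256:
        raise ValueError("packed fields exceed a single 256 bit word")
    return word


if __name__ == "__main__":
    # Rgb { r: 1, g: 2, b: 3 } with three u8 fields.
    print(hex(pack_fields([(1, 8), (2, 8), (3, 8)])))  # 0x10203
```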
Instantiation depends on the data location. Structs that can fit into a single word, either a single field struct or a packed struct with a bitsize sum less than or equal to 256, sit on the stack by default. Instantiating a struct in memory requires the memory data location annotation. If a struct that does not fit into a single word does not have a data location annotation, a compiler error is thrown.
Stack struct instantiation consists of optionally bitpacking fields and leaving the struct on the stack. Memory instantiation consists of allocating new memory, optionally bitpacking fields, storing the struct in memory, and leaving the pointer to it on the stack.
type MemoryRgb = { r: u8, g: u8, b: u8 };
let memoryRgb = &m MemoryRgb { r: 1, g: 2, b: 3 };
// ptr = ..
// mstore(ptr, 1)
// mstore(add(32, ptr), 2)
// mstore(add(64, ptr), 3)
Persistent and transient storage structs must be instantiated at the file level. If any nonzero values are assigned, storage writes will be injected into the initcode to be run on deployment.
A reasonable convention for creating a storage layout without the contract object abstraction would be to create a Storage type which is a struct, mapping identifiers to storage slots. Nested structs will also allow granular control over which variables get packed.
type Storage = {
    a: u8,
    b: u8,
    c: packed {
        a: u8,
        b: u8
    }
}
const storage = @default<Storage>();
fn main() {
    storage.a = 1; // sstore(0, 1)
    storage.b = 2; // sstore(1, 2)
    storage.c.a = 3; // ca = shl(8, 3)
    storage.c.b = 4; // sstore(2, or(ca, 4))
}
The packing rule for buffer locations is to pack everything exactly by its bit length. The packing rule for map locations is to right-align the first field; for each subsequent field, if its bitsize fits into the same word as the previous field, it is left-shifted to the first available bits, otherwise it begins a new word.
type Storage = packed {
    a: u128,
    b: u8,
    c: addr,
    d: u256
}
const storage = Storage {
    a: 1,
    b: 2,
    c: 0x3,
    d: 4,
};
Slot | Value |
---|---|
0x00 | 0x0000000000000000000000000000000200000000000000000000000000000001 |
0x01 | 0x0000000000000000000000000000000000000000000000000000000000000003 |
0x02 | 0x0000000000000000000000000000000000000000000000000000000000000004 |
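The slot table above can be reproduced with a short Python sketch of the map packing rule. The function name layout_map_slots and the (value, bit_size) input format are hypothetical; the algorithm itself follows the rule as worded: right-align the first field, shift later fields to the first free bits, and open a new word on overflow.

```python
def layout_map_slots(fields):
    """Pack (value, bit_size) pairs into a list of 256 bit slot words."""
    slots = []
    used_bits = 256  # forces the first field to open slot 0
    for value, bits in fields:
        if not 0 < bits <= 256:
            raise ValueError(f"invalid bit size: {bits}")
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
        if used_bits + bits > 256:
            # The field would overflow the current word: start a new slot.
            slots.append(0)
            used_bits = 0
        # Shift the field left to the first available bits of the slot.
        slots[-1] |= value << used_bits
        used_bits += bits
    return slots


if __name__ == "__main__":
    # a: u128 = 1, b: u8 = 2, c: addr = 0x3, d: u256 = 4
    fields = [(1, 128), (2, 8), (3, 160), (4, 256)]
    for slot, word in enumerate(layout_map_slots(fields)):
        print(f"0x{slot:02x} | 0x{word:064x}")
```

Fields a and b share slot 0 because 128 + 8 bits fit in one word. Adding the 160 bit address would need 296 bits, so c starts slot 1, and the 256 bit d takes slot 2 on its own.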
Sum Types
The sum type is a union of multiple types where the data type represents one of the inner types.
Signature
<union_member_signature> ::= <ident> ["(" <type_signature> ")"] ;
<union_signature> ::= ["|"] <union_member_signature> ("|" <union_member_signature>)* ;
Dependencies:
The <union_signature> is the signature of a sum type, a data structure that contains one of its internally declared members. Each <union_member_signature> is named by an identifier, optionally followed by a type signature delimited by parentheses.
Instantiation
<union_instantiation> ::= <ident> "::" <ident> "(" [<expr> ("," <expr>)* [","]] ")" ;
Dependencies:
The <union_instantiation> instantiates, or creates, the sum type. This consists of the union's identifier, followed by the member's identifier, followed by an optional comma separated list of expressions.
Behavior of instantiation is defined in the data location rule.
Union Pattern
<union_pattern> ::= <ident> "::" <ident> ["(" <ident> ("," <ident>)* [","] ")"];
Dependencies:
The <union_pattern> is a pattern consisting of the union's name and a member's name separated by a double colon.
Pattern Match
<pattern_match> ::= <ident> "matches" <union_pattern> ;
Dependencies:
Semantics
Unions where no member has its own internal type are effectively enumerations over integers.
type Mutex = Locked | Unlocked;
// Mutex::Locked == 0
// Mutex::Unlocked == 1
Unions where any members have an internal type become proper type unions. The only case in which a union can exist on the stack rather than another data location is if the largest of the internal types has a bitsize of 248 or less. If any member's internal type has a bitsize greater than 248, a data location must be specified.
type StackUnion = A(u8) | B(u248);
type MemoryUnion = A(u256) | B | C(u8);
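Both rules can be sketched in Python. The function names are hypothetical. Numbering members from zero in declaration order comes from the Mutex example; applying the same numbering to unions with internal types would be an assumption.

```python
def enumeration_values(member_names):
    """Number members from zero in declaration order, as in Mutex."""
    return {name: index for index, name in enumerate(member_names)}


def union_fits_on_stack(member_bit_sizes):
    """True when the largest internal type has a bitsize of 248 or less.

    Members without an internal type are given a bit size of 0.
    """
    return max(member_bit_sizes, default=0) <= 248


if __name__ == "__main__":
    print(enumeration_values(["Locked", "Unlocked"]))
    print(union_fits_on_stack([8, 248]))     # StackUnion = A(u8) | B(u248)
    print(union_fits_on_stack([256, 0, 8]))  # MemoryUnion = A(u256) | B | C(u8)
```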
A union pattern consists of its identifier and the member identifier separated by a double colon. This pattern may be used both in match statements and if statements.
type Option<T> = None | Some(T);
impl Option<T> {
    fn unwrap(self) -> T {
        match self {
            Option::Some(inner) => return inner,
            Option::None => revert(),
        };
    }
    fn unwrapOr(self, default: T) -> T {
        let mut value = default;
        if self matches Option::Some(inner) {
            value = inner;
        }
        return value;
    }
}
Generics
Generics are polymorphic types enabling function and type reuse across different types.
Type Parameters
<type_parameter_single> ::= <ident> [<trait_constraints>] ;
<type_parameters> ::= "<" <type_parameter_single> ("," <type_parameter_single>)* [","] ">" ;
Dependencies:
The <type_parameter_single> is an individual type parameter for parametric polymorphic types and functions. We define this as a type name optionally followed by a trait constraint.
The <type_parameters> is a comma separated list of individual type parameters delimited by angle brackets.
Semantics
Generics are resolved at compile time through monomorphization. Generic functions and data types are monomorphized into distinct unique functions and data types. Function duplication can become problematic due to the EVM bytecode size limit, so a series of steps will be taken to allow for granular control over bytecode size. Those semantics are defined in the Codesize document.
Traits
Traits are interface-like declarations that constrain generic types to implement specific methods or contain specific properties.
Declaration
<trait_declaration> ::=
["pub"] "trait" <ident> [<type_parameters>] [<trait_constraints>] "{"
(
| <type_declaration>
| <type_assignment>
| <constant_declaration>
| <constant_assignment>
| <function_declaration>
| <function_assignment>
)*
"}" ;
Dependencies:
<ident>
<type_parameters>
<type_declaration>
<type_assignment>
<constant_declaration>
<constant_assignment>
<function_declaration>
<function_assignment>
The <trait_declaration> is a declaration of a set of associated types, constants, and functions that may itself take type parameters and may be constrained to a super type. Semantics of the declaration are listed under trait solving rules.
Constraints
<trait_constraints> ::= ":" <ident> ("&" <ident>)* ;
Dependencies:
The <trait_constraints> contains a colon followed by an ampersand separated list of identifiers of implemented traits. The ampersand is meant to indicate that all of the trait identifiers are implemented for the type.
Semantics
Traits can be defined with associated types, constants, and functions. The trait declaration itself allows for optional assignment for each item as a default. Any declarations in the trait that are not assigned in the trait declaration must be assigned in the implementation of the trait for the data type. Additionally, any assignments in the trait declaration can be overridden in the trait implementation.
While types can depend on trait constraints, traits can also depend on other trait constraints. These assert that types that implement a given trait also implement its "super traits".
Solving
todo
Implementation
Implementation blocks enable method-call syntax.
Implementation Block
<impl_block> ::=
"impl" <ident> [<type_parameters>] [":" <ident> [<type_parameters>]] "{"
(
| <function_assignment>
| <constant_assignment>
| <type_assignment>
)*
"}" ;
Dependencies:
The <impl_block> is the implementation block for a given type. The type identifier is optionally followed by type parameters, then optionally followed by a colon and a trait identifier with its own optional type parameters. This is followed by a list of function, constant, and type assignments delimited by curly braces.
Semantics
Associated functions, constants, and types are defined for a given type. If the type contains any generics in any of its internal assignments, the type parameters must be brought into scope by annotating them directly following the type's identifier.
If the impl block is to satisfy a trait's interface, the type's identifier and optional type parameters are followed by the trait's identifier and optional type parameters. In this case, only associated functions, constants, and types that are declared in the trait's declaration may be defined in the impl block. Additionally, all declarations in a trait's declaration that are not assigned in the trait's declaration must be assigned in the impl block for the given data type.
Function Types
The function type is a type composed of input and output types.
Signature
<function_signature> ::= <type_signature> "->" <type_signature> ;
Dependencies:
The <function_signature> consists of an input type signature and an output type signature, separated by an arrow.
Note: <type_signature> also contains a tuple signature, therefore a function with multiple inputs and outputs is implicitly operating on a tuple.
Declaration
<function_declaration> ::=
"fn" <ident> "("
[(<ident> ":" <type_signature>) ("," <ident> ":" <type_signature>)* [","]]
")" ["->" "(" <type_signature> ("," <type_signature>)* [","] ")"] ;
Dependencies:
Assignment
<function_assignment> ::= <function_declaration> <code_block> ;
Dependencies:
The <function_assignment> is defined as the "fn" keyword followed by its identifier, followed by optional comma separated pairs of identifiers and type signatures, delimited by parentheses, then optionally followed by an arrow and a list of comma separated return type signatures delimited by parentheses, then finally the code block of the function body.
Arrow Functions
<arrow_function> ::= (<ident> | ("(" <ident> ("," <ident>)* [","] ")")) "=>" <code_block> ;
Dependencies:
The <arrow_function> is defined as either a single identifier or a comma separated, parenthesis delimited list of identifiers, followed by the "=>" bigram, followed by a code block.
Call
<function_call> ::= <ident> "(" [<expr> ("," <expr>)* [","]] ")" ;
Dependencies:
The <function_call> is an identifier followed by a comma separated list of expressions delimited by parentheses.
Semantics
todo
Event Types
The event type is a custom type to be logged.
Signature
<event_field_signature> ::= <ident> ":" ( "indexed" "<" <type_signature> ">" | <type_signature> ) ;
<event_signature> ::=
["anon"] "event" "{" [<event_field_signature> ("," <event_field_signature>)* [","]] "}" ;
Dependencies:
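The <event_field_signature> is an identifier followed by a colon and either a type signature or a type signature delimited by angle brackets and prefixed with "indexed".
The <event_signature> is an optional "anon" keyword, followed by "event", followed by a comma separated list of event field signatures delimited by curly braces.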
Semantics
The event type is assigned an identifier the same way other types are assigned an identifier. The EVM allows up to four topics, therefore if "anon" is used, the event may contain four "indexed" values, else the event may contain three. If the event is not anonymous, the first topic follows Solidity's ABI specification. That is to say the first topic is the keccak256 hash digest of the event identifier, followed by a comma separated list of the event type names with no whitespace, delimited by parenthesis.
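The topic rules above can be sketched in Python. The function names are hypothetical, and the ABI type names in the example ("address", "uint256") are the Solidity spellings; how Edge type names such as addr or u256 map onto them is an assumption not covered here.

```python
def max_indexed_fields(anonymous: bool) -> int:
    """Four topics are available; a named event spends one on its signature."""
    return 4 if anonymous else 3


def event_signature(name: str, abi_type_names) -> str:
    """Build the string whose keccak256 digest becomes the first topic."""
    return f"{name}({','.join(abi_type_names)})"


if __name__ == "__main__":
    signature = event_signature("Transfer", ["address", "address", "uint256"])
    print(signature)
    print(max_indexed_fields(anonymous=False))
```

The hashing step is left out on purpose. Python's standard library offers hashlib.sha3_256, which is the finalized SHA-3 and produces different digests from the keccak256 used by the EVM, so a dedicated Keccak implementation would be needed to compute the topic itself.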
ABI
The application binary interface is both a construct to generate a JSON ABI by the compiler as well as a subtyping construct for contract objects.
Declaration
<abi_declaration> ::=
"abi" <ident> [":" <ident> ("&" <ident>)*] "{"
(
["mut"] <function_declaration> ";"
)*
"}" ;
Dependencies:
The <abi_declaration> is prefixed with "abi", followed by its identifier, then an optional colon and list of ampersand separated identifiers, and finally a series of zero or more function declarations optionally prefixed by "mut" and delimited by curly braces.
Semantics
The optional "mut" keyword indicates whether the function will mutate the state of the smart contract or the EVM. This allows contracts to determine whether to use the call or staticcall instruction to interface with a target conforming to the given ABI.
The optional ampersand separated list of identifiers represents other ABI identifiers to enable ABI subtyping.
todo: revisit this. do traits satisfy this instead?
Contract Objects
Contract objects serve as an object-like interface to contract constructs.
Declaration
<contract_field_declaration> ::= <ident> ":" <type_signature> ;
<contract_declaration> ::=
"contract" <ident> "{"
[<contract_field_declaration> ("," <contract_field_declaration>)* [","]]
"}" ;
Dependencies:
The <contract_field_declaration> is an identifier and type signature, separated by a colon.
The <contract_declaration> is the contract keyword, followed by its identifier, followed by a curly brace delimited, comma separated list of field declarations.
Implementation
<contract_impl_block> ::=
"impl" <ident> [":" <ident>] "{"
(["ext"] ["mut"] <function_declaration>)*
"}" ;
Dependencies:
The <contract_impl_block> is composed of the "impl" keyword, followed by its identifier, optionally followed by a colon and abi identifier, followed by a list of function declarations, each optionally prefixed with "ext" and/or "mut", delimited by curly braces.
Semantics
The contract object desugars to a single main function and storage layout with a dispatcher.
Contract field declarations create the storage layout, which starts at slot zero and increments by one for each field. Fields are never packed; however, storage packing may be achieved by declaring contract fields as packed structs or tuples.
Contract implementation blocks contain definitions of external functions in the contract object. If the impl block contains a colon and identifier, this indicates the impl block is satisfying an abi's constrained functions. The "ext" keyword indicates the function is publicly exposed via the contract's dispatcher. The "mut" keyword indicates the function may mutate the global state in the EVM-sense; that is to say "mut" functions require a "call" instruction while those without may use "call" or "staticcall" to interface with the contract.
todo: revisit this. do types satisfy this instead?
Control Flow
Control flow is composed of loops, branches, and pattern matching.
Loops
Loops are blocks of code that may be executed repeatedly based on some conditions.
Loop Control
<loop_break> ::= "break" ;
<loop_continue> ::= "continue" ;
The <loop_break> keyword "breaks" the loop's execution, jumping to the end of the loop immediately.
The <loop_continue> keyword "continues" the loop's execution from the start, short circuiting the remainder of the loop.
Loop Block
<loop_block> ::= "{" ((<expr> | <stmt> | <loop_break> | <loop_continue>) ";")* "}" ;
Dependencies:
The <loop_block> is a block of code to be executed repeatedly. All other loops are derived from
this single loop block.
Core Loop
<core_loop> ::= "loop" <loop_block> ;
The core loop block is the simplest of blocks; it contains no code to be injected anywhere else. All other loops are syntactic sugar over the core loop. The "desugaring" step for each loop is in the control flow semantic rules.
For Loop
<for_loop> ::= "for" "(" [(<stmt> | <expr>)]";" [<expr>] ";" [(<stmt> | <expr>)] ")" <loop_block> ;
Dependencies:
The <for_loop> is a loop block prefixed with three individually optional items. The first may be a
statement or expression, the second may only be an expression, and the third may be an expression or
statement.
While Loop
<while_loop> ::= "while" "(" <expr> ")" <loop_block> ;
Dependencies:
The <while_loop> is a loop block prefixed with one required expression.
Do While Loop
<do_while_loop> ::= "do" "while" <loop_block> "(" <expr> ")" ;
Dependencies:
The <do_while_loop> is a loop block suffixed with one required expression.
Semantics
todo
Code Block
A code block is a sequence of items with its own scope. It may be used independently or in tandem with conditional statements.
Declaration
<code_block> ::= "{" ((<stmt> | <expr>) ";")* "}" ;
Dependencies:
The <code_block> is a semi-colon separated list of expressions or statements delimited by curly
braces.
Semantics
Code blocks may be contained in loops, branching statements, or standalone statements.
Code blocks represent a distinct scope. Identifiers declared in a code block are dropped once the code block ends.
Branching
Branching refers to blocks of code that may be executed based on a defined condition.
If Else If Branch
<if_else_if_branch> ::= "if" "(" <expr> ")" <code_block>
("else" "if" "(" <expr> ")" <code_block>)*
["else" <code_block>] ;
Dependencies:
The <if_else_if_branch> contains an "if" keyword followed by a parenthesis delimited expression and a code
block. It may be followed by zero or more conditions under "else" "if" keywords, each followed by a
parenthesis delimited expression and a code block, and finally it may optionally be suffixed with an
"else" keyword followed by a code block.
If Match
<if_match_branch> ::= "if" <pattern_match> <code_block> ;
Dependencies:
The <if_match_branch> contains the "if" keyword followed by a pattern match expression and a code
block.
Match
<match_arm> ::= (<union_pattern> | <ident> | "_") "=>" <code_block> ;
<match> ::=
"match" <expr> "{"
[<match_arm> ("," <match_arm>)* [","]]
"}" ;
Dependencies:
The <match_arm> is a single arm of a match statement. It is a union pattern, an identifier, or an
underscore, followed by "=>" and a code block.
The <match> statement is a group of match arms that may pattern match against an expression.
Semantics of the match statement are defined in the control flow semantics.
Ternary
<ternary> ::= <expr> "?" <expr> ":" <expr> ;
Dependencies:
The <ternary> is a branching statement that takes an expression, followed by a question mark, or
ternary operator, followed by two colon separated expressions.
Semantics
If Else If Branch
The expression of the "if" statement is evaluated. The type of the expression must either be a boolean or a value that can be cast to a boolean. If the result is true, the subsequent block of code is executed. Otherwise, the next branch is checked: each optional "else if" repeats the process above until a condition resolves to true or no conditions remain. If no condition has resolved to true and the optional "else" is present, the "else" block is executed.
fn main() {
let n = 3;
if (n == 1) {
// ..
} else if (n == 2) {
// ..
} else {
// ..
}
}
If Match
The "if match" statement executes as the "if" statement does, however, the expression to evaluate is a pattern match. While the pattern match semantics are specified elsewhere, the "if match" branch brings into scope the identifier(s) of the inner type(s) of the matched pattern.
type Union = A(u8) | B;
fn main() {
let u = Union::A(1);
if u matches Union::A(n) {
assert(n == 1);
}
}
Match
Matching requires all possible patterns for a given expression's type to be evaluated. If any pattern is not matched in a match block, a compiler error is thrown. The semantics for match arms are the same as those for the "if match" statement.
The remaining branches for a pattern match may be grouped together either with an identifier or if the identifier is unnecessary, an underscore. Using an identifier assigns a subset of the associated type into scope. The subset of the type contains one of the unmatched members. This does not create a new distinct data type, rather it infers the non-existence of the pre-matched branches.
type Ua = A | B;
type Ub = A | B;
fn main() {
let u_a = Ua::B;
let u_b = Ub::B;
match u_a {
Ua::A => {},
Ub::B => {},
}
match u_b {
Ua::A => {},
n => {
// `n` inferred to have type `Ub::B`
}
}
}
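The exhaustiveness rule and the catch-all arm described above can be modelled with a short Python sketch. The function name `unmatched_members` and its parameters are hypothetical, and a `ValueError` stands in for the compiler error; the sketch only illustrates the rule, not how the Edge compiler implements it.

```python
def unmatched_members(members, explicit_arms, has_catch_all=False):
    """Return the union members left for a catch-all arm to cover.

    `members` lists every member of the matched union type and
    `explicit_arms` lists the members named by explicit match arms.
    If some member is uncovered and there is no catch-all arm
    (an identifier or an underscore), the match is rejected.
    """
    remaining = [m for m in members if m not in explicit_arms]
    if remaining and not has_catch_all:
        raise ValueError(f"non-exhaustive match, missing: {remaining}")
    return remaining


# Every member is matched explicitly: nothing is left over.
fully_matched = unmatched_members(["A", "B"], ["A", "B"])

# Only `A` is matched explicitly; a catch-all identifier binds the rest,
# so its inferred subset of the type contains just `B`.
bound_by_catch_all = unmatched_members(["A", "B"], ["A"], has_catch_all=True)
```

In the second case the returned list plays the role of the inferred subset that the identifier brings into scope.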
Ternary
The ternary operator evaluates the first expression, whose result must be of type boolean. If it evaluates to true, the second expression is evaluated; otherwise, the third expression is evaluated.
fn main() {
let condition = true;
let mut a = 0;
if (condition) {
a = 1;
} else {
a = 2;
}
let b = condition ? 1 : 2;
assert(a == b);
}
Short Circuiting
For all branch statements that evaluate a boolean expression to determine which branches to take, the following statements hold if the expression is composed of multiple inner boolean expressions separated by logical operators.
- if <expr0> && <expr1> and <expr0> is false, short circuit to false
- if <expr0> || <expr1> and <expr0> is true, short circuit to true
Also, for all chains of "if else if" statements, once a condition evaluates to true, the remaining chained conditions are not evaluated.
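The short circuiting rules can be illustrated with a small Python model in which each operand is a zero-argument callable, so that it is possible to observe which operands were actually evaluated. The names `logical_and`, `logical_or`, and `operand` are hypothetical helpers for this sketch only.

```python
def logical_and(lhs, rhs):
    """Model `lhs && rhs`: `rhs` is only called when `lhs` is true."""
    return rhs() if lhs() else False


def logical_or(lhs, rhs):
    """Model `lhs || rhs`: `rhs` is only called when `lhs` is false."""
    return True if lhs() else rhs()


evaluated = []  # records the order in which operands are evaluated


def operand(name, value):
    """Build an operand that logs its own evaluation."""
    def thunk():
        evaluated.append(name)
        return value
    return thunk


result_and = logical_and(operand("expr0", False), operand("expr1", True))
result_or = logical_or(operand("expr2", True), operand("expr3", False))
```

After running the sketch, `evaluated` contains only the left-hand operands, since both right-hand operands were skipped.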
Operators
Operators are syntax sugar over built-in functions.
Binary
<arithmetic_binary_operator> ::=
| "+" | "+="
| "-" | "-="
| "*" | "*="
| "/" | "/="
| "%" | "%="
| "**" | "**=" ;
<bitwise_binary_operator> ::=
| "|" | "|="
| ">>" | ">>="
| "<<" | "<<="
| "&" | "&="
| "^" | "^=" ;
<logical_binary_operator> ::=
| "=="
| "!="
| "&&"
| "||"
| ">" | ">="
| "<" | "<=" ;
<binary_operator> ::=
| <arithmetic_binary_operator>
| <bitwise_binary_operator>
| <logical_binary_operator> ;
Unary
<arithmetic_unary_operator> ::= "-" ;
<bitwise_unary_operator> ::= "~" ;
<logical_unary_operator> ::= "!" ;
<unary_operator> ::=
| <arithmetic_unary_operator>
| <bitwise_unary_operator>
| <logical_unary_operator> ;
Semantics
Operator overloading is disallowed.
operator | types | behavior | panic case |
---|---|---|---|
+ | integers | checked addition | overflow |
- | integers | checked subtraction (binary) | underflow |
- | integers | checked negation (unary) | overflow |
* | integers | checked multiplication | overflow |
/ | integers | checked division | divide by zero |
% | integers | checked modulus | divide by zero |
** | integers | exponentiation | - |
& | integers | bitwise AND | - |
\| | integers | bitwise OR | - |
~ | integers | bitwise NOT | - |
^ | integers | bitwise XOR | - |
>> | integers | bitwise shift right | - |
<< | integers | bitwise shift left | - |
== | any | equality | - |
!= | any | inequality | - |
&& | booleans | logical AND | - |
\|\| | booleans | logical OR | - |
! | booleans | logical NOT | - |
> | integers | greater than | - |
>= | integers | greater than or equal to | - |
< | integers | less than | - |
<= | integers | less than or equal to | - |
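The panic cases in the table can be sketched in Python for unsigned integers. This is an illustrative model only: the `Panic` exception, the function names, and the `bits` parameter are assumptions of the sketch, and signed integers are not modelled.

```python
class Panic(Exception):
    """Stands in for a runtime panic in this sketch."""


def checked_add(a, b, bits=256):
    """Unsigned addition that panics when the result needs more than `bits` bits."""
    result = a + b
    if result >= 1 << bits:
        raise Panic("overflow")
    return result


def checked_sub(a, b):
    """Unsigned subtraction that panics when the result would be negative."""
    if b > a:
        raise Panic("underflow")
    return a - b


def checked_div(a, b):
    """Integer division that panics on a zero divisor."""
    if b == 0:
        raise Panic("divide by zero")
    return a // b


largest_u8 = checked_add(254, 1, bits=8)  # fits in a u8
```

Calling `checked_add(255, 1, bits=8)` in this model raises `Panic("overflow")`, mirroring the overflow row for `+`.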
Compile Time
Compile time, also referred to as comptime, describes any expression, function, branch, or macro that may be resolved during compilation. Comptime expressions and functions resolve to constant values at compile time, while comptime branches provide conditional compilation.
Literals
Characters
<bin_char> ::= "0" | "1" ;
<dec_char> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
<hex_char> ::=
| "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "a"
| "b" | "c" | "d" | "e" | "f" | "A" | "B" | "C" | "D" | "E" | "F";
<alpha_char> ::=
| "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p"
| "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" ;
<alphanumeric_char> ::= <alpha_char> | <dec_char> ;
<unicode_char> ::= ? "i ain't writing all that. happy for you tho. or sorry that happened" ? ;
Numeric
<bin_literal> ::= { "0b" (<bin_char> | "_")+ [<numeric_type>]} ;
<dec_literal> ::= { (<dec_char> | "_")+ [<numeric_type>]} ;
<hex_literal> ::= { "0x" (<hex_char> | "_")+ [<numeric_type>]} ;
<numeric_literal> ::= <bin_literal> | <dec_literal> | <hex_literal> ;
Numeric literals are composed of binary, decimal, and hexadecimal digits. Each literal may contain an arbitrary number of underscore characters and may be suffixed with a numeric type.
Binary literals are prefixed with 0b and hexadecimal literals are prefixed with 0x.
String
<string_literal> ::= { '"' (!'"' <unicode_char>)* '"' } | { "'" (!"'" <unicode_char>)* "'" };
String literals contain unicode characters delimited by double or single quotes.
Boolean
<boolean_literal> ::= "true" | "false" ;
Boolean literals may be either "true" or "false".
Literal
<literal> ::= <numeric_literal> | <string_literal> | <boolean_literal> ;
Semantics
Numeric literals may contain arbitrary underscores in the same literal. Numeric literals may also be
suffixed with the numeric type to constrain its type. If there is no type suffix, the type is
inferred by the context. If a type cannot be inferred, it will default to a u256.
Both numeric and boolean literals are roughly translated to pushing the value onto the stack.
String literals represent string instantiation. String instantiation behaves as a packed u8
array instantiation.
const A = 1;
const B = 1u8;
const C = 0b11001100;
const D = 0xffFFff;
const E = true;
const F = "asdf";
const G = "💩";
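The numeric literal rules (prefixes, underscores, optional type suffix, and the u256 default) can be sketched as a small Python parser. The function name `parse_numeric_literal`, the `context_type` parameter, and the short list of suffixes are assumptions of this sketch; the full set of numeric types is defined elsewhere in the specification.

```python
def parse_numeric_literal(text, context_type=None, numeric_types=("u8", "u256")):
    """Return `(value, type_name)` for a numeric literal.

    `numeric_types` is a hypothetical, incomplete list of suffixes.
    `context_type` models a type inferred from context; when there is
    neither a suffix nor a context type, the literal defaults to u256.
    """
    suffix = None
    for candidate in numeric_types:
        if text.endswith(candidate):
            suffix = candidate
            text = text[: -len(candidate)]
            break

    if text.startswith("0b"):
        base, digits = 2, text[2:]
    elif text.startswith("0x"):
        base, digits = 16, text[2:]
    else:
        base, digits = 10, text

    # Underscores carry no meaning and are dropped before conversion.
    value = int(digits.replace("_", ""), base)
    return value, suffix or context_type or "u256"


untyped = parse_numeric_literal("1")
suffixed = parse_numeric_literal("1u8")
binary = parse_numeric_literal("0b1100_1100")
```

A literal such as `1` with `context_type="u8"` resolves to `u8` in this model, matching the rule that inference takes precedence over the default.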
Constants
Declaration
<constant_declaration> ::= "const" <ident> [<type_signature>] ;
Dependencies:
The <constant_declaration> is a "const" followed by an identifier and optional type signature.
Assignment
<constant_assignment> ::=
<constant_declaration> "=" <expr> ;
Dependencies:
The <constant_assignment> is a constant declaration followed by an assignment operator and an
expression.
Note: The expression must be a comptime expression, but the grammar should not constrain this.
Semantics
Constants must be resolvable at compile time: a constant may be assigned a literal, another constant, or an expression that can be resolved at compile time.
The type of a constant will only be inferred if its assignment is a literal with a type annotation, another constant with a resolved type, or an expression with a resolved type such as a function call.
const A: u8 = 1;
const B = 1u8;
const C = B;
const D = a();
const E: u8 = b();
comptime fn a() -> u8 {
1
}
fn b() -> T {
2
}
Branching
<comptime_branch> ::=
"comptime" (
| <if_else_if_branch>
| <if_match_branch>
| <match>
| <ternary>
) ;
Dependencies:
The <comptime_branch> is a branch that is evaluated at compile time where only the truthy branch
is compiled. It is defined as the "comptime" keyword followed by any of the branches.
Semantics
Since comptime must be resolved at compile time, the branching expression must be resolvable at
compile time and of type bool. That is to say the expression must itself be a literal, constant,
or another expression resolvable at compile time.
In the case of compile time branching, branches that are not matched will be removed from the code at compile time.
use std::{
builtin::HardFork,
op::{tstore, tload, sstore, sload},
};
const LOCK_SLOT: u256 = keccak256("mutex").into() - 1;
type Lock =
| Locked
| Unlocked;
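fn reader() -> (u256 -> u256) {
// resolved at compile time; only the matching arm is compiled
comptime match @hardFork() {
HardFork::Cancun => tload,
_ => sload,
}
}
fn writer() -> ((u256, u256) -> ()) {
// resolved at compile time; only the matching arm is compiled
comptime match @hardFork() {
HardFork::Cancun => tstore,
_ => sstore,
}
}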
fn nonreentrant(action: T -> U) {
if reader()(LOCK_SLOT) matches Lock::Locked {
revert();
}
writer()(LOCK_SLOT, Lock::Locked);
let res = action();
writer()(LOCK_SLOT, Lock::Unlocked);
return res;
}
Functions
<comptime_function> ::= "comptime" <function_assignment> ;
Dependencies:
The <comptime_function> is a function that is evaluated at compile time. It is defined as the
"comptime" keyword followed by a <function_assignment>.
Semantics
Since comptime must be resolved at compile time, the function must contain only expressions
resolvable at compile time.
comptime fn a() -> u8 {
1
}
comptime fn b(arg: u8) -> u8 {
arg * 2
}
comptime fn c(arg: u8) -> u8 {
// `a` takes no arguments, so its result is combined with `b(arg)`
a() + b(arg)
}
const A = c(1);
const B = c(A);
Modules
Declaration
<module_declaration> ::= ["pub"] "mod" <ident> "{" [<module_devdoc>] (<stmt>)* "}" ;
Dependencies:
The <module_declaration> is composed of an optional "pub" prefix, the "mod" keyword followed by an
identifier then the body of the module containing an optional devdoc, followed by a list of
declarations and module items, delimited by curly braces.
Import
<module_import_item> ::=
<ident> (
"::" (
| ("{" <module_import_item> ("," <module_import_item>)* [","] "}")
| <module_import_item>
)
)* ;
<module_import> ::= ["pub"] "use" <ident> ["::" <module_import_item>] ;
Dependencies:
The <module_import_item> is a recursive token, containing either another module import item or
a comma separated list of module import items delimited by curly braces.
The <module_import> is an optional "pub" annotation followed by "use", the module name, then
module import items.
Semantics
Namespace semantics in modules are defined in the namespace document.
Visibility semantics in modules are defined in the visibility document.
Modules can contain developer documentation, declarations, and assignments. If the module contains developer documentation, it must be the first item in the module. This is for readability.
Files are implicitly modules with a name equivalent to the file name.
todo: should this sanitize file names or require filenames to contain only valid ident chars?
Type, function, abi, and contract declarations must be assigned in the same module. However, traits are declared without assignment, and submodules may be declared without a block only if there is a file with a matching name.
The super identifier represents the direct parent module of the module in which it's invoked.
Syntax Showcase
type PrimitiveStruct = {
a: u8,
b: u8,
c: u8,
};
type PackedStruct = packed {
a: u8,
b: u8,
};
type GenericStruct<T> = {
a: T,
b: T,
};
type PrimitiveTuple = (u8, u8, u8);
type PackedTuple = packed (u8, u8, u8);
type GenericTuple<T> = (T, T, T);
type Enum =
| Option1
| Option2;
type PrimitiveUnion =
| Type1(u8)
| Type2(PrimitiveStruct);
type GenericUnion<T> =
| Some(T)
| None;
type PrimitiveFn = u8 -> u8;
type PrimitiveMultiArgFn = (u8, u8) -> (u8, u8);
type GenericFn<T> = T -> T;
trait Add {
fn add(lhs: Self, rhs: Self) -> Self;
}
impl PrimitiveStruct {
fn default() -> Self {
return Self { a: 0, b: 0, c: 0 };
}
}
impl PrimitiveStruct: Add {
fn add(lhs: Self, rhs: Self) -> Self {
return Self {
a: lhs.a + rhs.a,
b: lhs.b + rhs.b,
c: lhs.c + rhs.c,
};
}
}
mod module {
mod nestedModule {
type A = u256;
}
pub use nestedModule::A;
}
use module::A;
abi ERC165 {
fn supportsInterface(interfaceId: b4) -> bool;
}
contract MyContract;
impl MyContract: ERC165 {
fn supportsInterface(interfaceId: b4) -> bool {
return true;
}
}
// `MyContract` de-sugars roughly to:
fn main<Cd: ERC165>(calldata: Cd) {
if callvalue() > 0 { revert(); }
match calldata {
ERC165::supportsInterface(interfaceId) => {
return true;
},
_ => revert(),
};
}
Full sugared ERC20 example:
abi ERC20 {
fn balanceOf(owner: addr) -> u256;
fn allowance(owner: addr, spender: addr) -> u256;
fn totalSupply() -> u256;
fn transfer(receiver: addr, amount: u256) -> bool;
fn transferFrom(sender: addr, receiver: addr, amount: u256) -> bool;
fn approve(spender: addr, amount: u256) -> bool;
}
contract MyContract {
balances: HashMap<addr, u256>,
allowances: HashMap<addr, HashMap<addr, u256>>,
supply: u256,
}
impl MyContract: ERC20 {
type Transfer = event {
sender: indexed<addr>,
receiver: indexed<addr>,
amount: u256,
}
type Approval = event {
owner: indexed<addr>,
spender: indexed<addr>,
amount: u256,
}
fn balanceOf(self: Self, owner: addr) -> u256 {
return self.balances.get(owner);
}
fn allowance(self: Self, owner: addr, spender: addr) -> u256 {
return self.allowances.get(owner).get(spender);
}
fn totalSupply(self: Self) -> u256 {
return self.supply;
}
fn transfer(mut self: Self, receiver: addr, amount: u256) -> bool {
self.balances.set(caller(), self.balances.get(caller()) - amount);
self.balances.set(receiver, self.balances.get(receiver) + amount);
log(Self::Transfer { sender: caller(), receiver, amount });
return true;
}
fn transferFrom(mut self: Self, sender: addr, receiver: addr, amount: u256) -> bool {
if sender != caller() {
let senderCallerAllowance = self.allowances.get(sender).get(caller());
if senderCallerAllowance < max<u256>() {
self.allowances.get(sender).set(caller(), senderCallerAllowance - amount);
}
}
self.balances.set(sender, self.balances.get(sender) - amount);
self.balances.set(receiver, self.balances.get(receiver) + amount);
log(Self::Transfer { sender, receiver, amount });
return true;
}
fn approve(mut self: Self, spender: addr, amount: u256) -> bool {
self.allowances.get(caller()).set(spender, amount);
log(Self::Approval { owner: caller(), spender, amount });
return true;
}
}
Full de-sugared ERC20 example:
type Transfer = event {
sender: indexed<addr>,
receiver: indexed<addr>,
amount: u256,
}
type Approval = event {
owner: indexed<addr>,
spender: indexed<addr>,
amount: u256,
}
abi ERC20 {
fn balanceOf(owner: addr) -> u256;
fn allowance(owner: addr, spender: addr) -> u256;
fn totalSupply() -> u256;
fn transfer(receiver: addr, amount: u256) -> bool;
fn transferFrom(sender: addr, receiver: addr, amount: u256) -> bool;
fn approve(spender: addr, amount: u256) -> bool;
}
type Storage = {
balances: HashMap<addr, u256>,
allowances: HashMap<addr, HashMap<addr, u256>>,
supply: u256,
}
const storage = Storage::default();
fn main<Cd: ERC20>(calldata: Cd) {
if callvalue() > 0 { revert(); }
match calldata {
ERC20::balanceOf(owner) => {
return storage.balances.get(owner);
},
ERC20::allowance(owner, spender) => {
return storage.allowances.get(owner).get(spender);
},
ERC20::totalSupply() => {
return storage.supply;
},
ERC20::transfer(receiver, amount) => {
storage.balances.set(caller(), storage.balances.get(caller()) - amount);
storage.balances.set(receiver, storage.balances.get(receiver) + amount);
log(Transfer { sender: caller(), receiver, amount });
return true;
},
ERC20::transferFrom(sender, receiver, amount) => {
if sender != caller() {
let senderCallerAllowance = storage.allowances.get(sender).get(caller());
if senderCallerAllowance < max<u256>() {
storage.allowances.get(sender).set(caller(), senderCallerAllowance - amount);
}
}
storage.balances.set(sender, storage.balances.get(sender) - amount);
storage.balances.set(receiver, storage.balances.get(receiver) + amount);
log(Transfer { sender, receiver, amount });
return true;
},
ERC20::approve(spender, amount) => {
storage.allowances.get(caller()).set(spender, amount);
log(Approval { owner: caller(), spender, amount });
return true;
},
_ => revert(),
};
}
Semantics
The semantics section contains semantics that are not defined under specific syntax constructs, but rather are more general features or features not in the frontend.
Codesize
This document details the different options for codesize optimization. Generally, codesize and runtime efficiency are inversely correlated. Developers will have granular control both in the compiler's configuration and in the language's syntax.
Inlining Heuristics
Function inlining is a direct tradeoff of codesize and runtime efficiency. Codesize optimization may be used for reducing deployment cost or for keeping the codesize below the EVM's codesize limit.
Scoring
Each function is assigned a score based on a combination of its projected bytecode size, its projected number of calls, and an optional manually entered score.
Name | Score |
---|---|
Bytecode Size | fn.bytecode.len() |
Call Count | fn.calls() |
Manual Score | u8 |
Total | (fn.bytecode.len() + 5 * fn.calls()) * man |
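The total in the table can be written as a one-line function. This Python sketch is illustrative only: the name `inline_score` is hypothetical, and defaulting the manual score to 1 when none is given is an assumption, since the text only says the manual score is optional.

```python
def inline_score(bytecode_len, call_count, manual_score=1):
    """Total from the table: (bytecode size + 5 * call count) * manual score.

    `manual_score` defaults to 1 here so that an unset manual score leaves
    the total unchanged (assumption of this sketch).
    """
    return (bytecode_len + 5 * call_count) * manual_score


# A hypothetical 40-byte function that is called from three sites.
unweighted = inline_score(40, 3)
weighted = inline_score(40, 3, manual_score=2)
```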
todo rewrite this based on gas estimations of each call
A compiler configuration can be specified for the threshold for function inlining.
todo decide on this
Analysis
The analysis for function inline scoring requires the traversal of a directed graph containing each function and the other functions called within it. Traversal is depth first, as function inline scores are dependent on their bytecode size, which is dependent on the inline scores of functions called within its body. Once a terminal function, a function with no internal function dependencies, is found, its inline score will be compared against the configuration threshold. If the score is greater than the threshold, it is to be inlined and a flag will be stored in the graph for future references.
Cycle detection both prevents infinite loops in the compiler and detects recursion and corecursion. For simplicity, recursive and corecursive functions are never inlined.
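The traversal and cycle detection described above can be sketched in Python. The call graph is modelled as a dictionary from a function name to the names it calls; the function names and the example graph are hypothetical, and the sketch only computes an analysis order and the set of recursive functions, not the scores themselves.

```python
def reaches(call_graph, start, target):
    """Return True if `target` is reachable from the callees of `start`."""
    stack, seen = list(call_graph.get(start, ())), set()
    while stack:
        fn = stack.pop()
        if fn == target:
            return True
        if fn not in seen:
            seen.add(fn)
            stack.extend(call_graph.get(fn, ()))
    return False


def recursive_functions(call_graph):
    """Functions on a call cycle: directly recursive or corecursive."""
    return {fn for fn in call_graph if reaches(call_graph, fn, fn)}


def analysis_order(call_graph):
    """Depth-first order with callees before callers, skipping recursion."""
    recursive = recursive_functions(call_graph)
    order, visited = [], set()

    def visit(fn):
        if fn in visited or fn in recursive:
            return
        visited.add(fn)
        for callee in call_graph.get(fn, ()):
            visit(callee)
        order.append(fn)  # every non-recursive callee is already in `order`

    for fn in call_graph:
        visit(fn)
    return order, recursive


graph = {
    "main": ["a", "b"],
    "a": ["leaf"],
    "b": ["b2"],       # `b` and `b2` call each other (corecursion)
    "b2": ["b"],
    "leaf": [],
    "self_rec": ["self_rec"],
}
order, recursive = analysis_order(graph)
```

Terminal functions such as `leaf` appear first in `order`, so a score for a caller can be computed after the inline decisions for its callees are known.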
Dead Code Elimination
Eliminating dead code will cut codesize and improve the function inlining score, as the number of calls and the projected codesize of each function are both factors in the function inline score.
Syntax Modifications
todo
Namespaces
A namespace contains valid identifiers for items that may be used.
todo
Scoping
Items are brought into scope by import or declaration.
Module
The module scope contains items explicitly imported from another scope or explicitly declared in the current module scope. Items may be accessed directly by their identifier with no other annotations.
Files are implicitly modules.
mod moduleA {
// `TypeA` declared.
type TypeA = u8;
// `TypeA` may be accessed as follows:
const CONST_A: TypeA = 0u8;
}
mod moduleB {
// import `TypeA` into the local module scope
use super::moduleA::TypeA;
// `TypeA` may now be accessed as follows:
const CONST_A: TypeA = 0u8;
}
mod moduleC {
// publicly import `TypeA` into the local module scope. "pub" enables exporting.
pub use super::moduleA::TypeA;
}
mod moduleD {
// publicly import `moduleA` into the local module scope. "pub" enables exporting.
pub use super::moduleA;
}
mod moduleF {
// `TypeA` may be accessed in one of the following ways.
const CONST_A: super::moduleA::TypeA = 0u8;
const CONST_B: super::moduleC::TypeA = 0u8;
const CONST_C: super::moduleD::moduleA::TypeA = 0u8;
}
Implementation
The implementation block scope contains items explicitly imported from another scope or explicitly
declared in the current implementation block scope. Items may be accessed either directly or under
the Self namespace.
type MyStruct<T> = { inner: T };
type MyError = Overflow | Underflow;
trait TryPlusOne: Add {
type Error;
fn tryPlusOne(self: Self) -> Result<Self, Self::Error>;
}
impl MyStruct<T> {
fn new(inner: T) -> Self {
return Self { inner: inner };
}
}
impl MyStruct<T>: Add {
fn add(lhs: Self, rhs: Self) -> Self {
return Self { inner: lhs.inner + rhs.inner };
}
}
impl MyStruct<T: Add>: TryPlusOne {
type Error = MyError;
fn tryPlusOne(self: Self) -> Result<Self, Self::Error> {
if self.inner > max<T>() - 1 {
return Result::Err(Error::Overflow);
}
return Result::Ok(Add::add(self, Self { inner: 1 }));
}
}
Function
The function scope implicitly imports items from parent scopes up to the parent module. Items may be explicitly declared or imported from external modules.
mod moduleA {
const CONST_A = 0u8;
const CONST_B = 1u8;
const CONST_C = 2u8;
}
use moduleA::CONST_A;
const CONST_D = 3u8;
fn func() -> u8 {
use moduleA::CONST_B;
fn innerFunc() -> u8 {
return CONST_A + CONST_B + moduleA::CONST_C + CONST_D;
}
return innerFunc();
}
Blocks
Code blocks, branch blocks, loop blocks, and match blocks implicitly import items from the parent scopes up to the parent module. Items may be explicitly imported from external modules, and items may be defined in each block.
Visibility
todo
Inline Assembly
Opcodes
<opcode> ::=
| "stop" | "add" | "mul" | "sub" | "div" | "sdiv" | "mod" | "smod" | "addmod" | "mulmod" | "exp"
| "signextend" | "lt" | "gt" | "slt" | "sgt" | "eq" | "iszero" | "and" | "or" | "xor" | "not"
| "byte" | "shl" | "shr" | "sar" | "sha3" | "address" | "balance" | "origin" | "caller"
| "callvalue" | "calldataload" | "calldatasize" | "calldatacopy" | "codesize" | "codecopy"
| "gasprice" | "extcodesize" | "extcodecopy" | "returndatasize" | "returndatacopy"
| "extcodehash" | "blockhash" | "coinbase" | "timestamp" | "number" | "prevrandao" | "gaslimit"
| "chainid" | "selfbalance" | "basefee" | "pop" | "mload" | "mstore" | "mstore8" | "sload"
| "sstore" | "jump" | "jumpi" | "pc" | "msize" | "gas" | "jumpdest" | "push0" | "dup1" | "dup2"
| "dup3" | "dup4" | "dup5" | "dup6" | "dup7" | "dup8" | "dup9" | "dup10" | "dup11" | "dup12"
| "dup13" | "dup14" | "dup15" | "dup16" | "swap1" | "swap2" | "swap3" | "swap4" | "swap5"
| "swap6" | "swap7" | "swap8" | "swap9" | "swap10" | "swap11" | "swap12" | "swap13" | "swap14"
| "swap15" | "swap16" | "log0" | "log1" | "log2" | "log3" | "log4" | "create" | "call"
| "callcode" | "return" | "delegatecall" | "create2" | "staticcall" | "revert" | "invalid"
| "selfdestruct" | <numeric_literal> | <ident> ;
Dependencies:
The <opcode> is one of the mnemonic EVM instructions, or a numeric literal, or an identifier.
Inline Assembly Block
<assembly_output> ::= <ident> | "_" ;
<inline_assembly> ::=
"asm"
"(" [<expr> ("," <expr>)* [","]] ")"
"->" "(" [<assembly_output> ("," <assembly_output>)* [","]] ")"
"{" (<opcode>)* "}" ;
Dependencies:
The <inline_assembly> consists of the "asm" keyword, followed by a parenthesis delimited,
optionally empty, comma separated list of argument expressions, then an arrow, a parenthesis
delimited, optionally empty, comma separated list of return identifiers, and finally a curly brace
delimited block containing only <opcode> items.
Semantics
Arguments are ordered such that the state of the stack at the start of the block, top to bottom, is the list of arguments, left to right. Identifiers in the output list are ordered such that the state of the stack at the end of the assembly block, top to bottom, is the list of outputs, left to right.
Note that if the input arguments contain local variables, the stack scheduling required to construct the pre-assembly stack state may be unprofitable in cases with small assembly code blocks.
asm (1, 2, 3) -> (a) {
// state: // [1, 2, 3]
add // [3, 3]
mul // [9]
}
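The stack ordering in the example above can be traced with a tiny Python model that supports only `add` and `mul`. The function name `run_asm` is hypothetical; index 0 of the list represents the top of the stack, matching the "top to bottom, left to right" ordering of the arguments.

```python
WORD = 2 ** 256  # EVM words are 256 bits wide


def run_asm(arguments, opcodes):
    """Run a sequence of `add`/`mul` opcodes over an initial stack.

    `arguments` is the argument list, left to right, which is also the
    initial stack from top to bottom. The final stack is returned in the
    same top-to-bottom order, matching the output list.
    """
    stack = list(arguments)
    for op in opcodes:
        a, b = stack.pop(0), stack.pop(0)  # pop the two topmost items
        if op == "add":
            stack.insert(0, (a + b) % WORD)
        elif op == "mul":
            stack.insert(0, (a * b) % WORD)
        else:
            raise ValueError(f"unsupported opcode in this sketch: {op}")
    return stack


after_add = run_asm([1, 2, 3], ["add"])
outputs = run_asm([1, 2, 3], ["add", "mul"])
```

The single value left on the stack corresponds to the output identifier `a` in the example.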
Inside the assembly block, numeric literals are implicitly converted into pushN instructions. Each
literal is placed in the smallest pushN that holds it, with leading zeros counted toward its size.
For example, 0x0000 becomes push2 0000, which allows for bytecode padding.
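The push sizing rule can be sketched for hexadecimal literals in Python. The function name `push_for_hex_literal` is hypothetical, and two details are assumptions of the sketch rather than statements of the specification: an odd number of hex digits is padded on the left to a whole byte, and literals wider than 32 bytes are rejected.

```python
def push_for_hex_literal(literal):
    """Return the push instruction text for a hex literal such as "0x0000".

    Leading zeros are kept, so the literal's written width decides N.
    """
    if not literal.startswith("0x"):
        raise ValueError("this sketch only handles hex literals")
    digits = literal[2:].replace("_", "")
    size = max(1, (len(digits) + 1) // 2)  # bytes needed, rounded up
    if size > 32:
        raise ValueError("literal does not fit in a single push")
    return f"push{size} {digits.zfill(size * 2)}"


padded = push_for_hex_literal("0x0000")
single_byte = push_for_hex_literal("0xff")
```

Here `0x0000` keeps both of its bytes rather than collapsing to a single zero byte, which is the padding behaviour the text describes.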
Identifiers may be variables, constants, or ad hoc opcodes. When identifiers are variables, they are
scheduled in the stack. When identifiers are constants, they are replaced with their push
instructions just as numeric literals are. When identifiers are ad hoc opcodes, they are replaced
with their respective byte(s).
Built-In
Built-in functionality refers to functionality that is available only at compile time, not during EVM runtime, and that is otherwise inaccessible through the language's syntax.
Macros contain their own syntax and semantics, however, comptime functionality and built-in assistants cover most of the use cases for macros without leaving the language's native syntax.
Types
PrimitiveType
type PrimitiveType;
StructType
type StructType;
EnumType
type EnumType;
UnionType
type UnionType;
FunctionType
type FunctionType;
TypeInfo
type TypeInfo =
| Primitive(PrimitiveType)
| Struct(StructType)
| Enum(EnumType)
| Union(UnionType)
| Function(FunctionType);
HardFork
type HardFork =
| Frontier
| Homestead
| Dao
| Tangerine
| SpuriousDragon
| Byzantium
| Constantinople
| Petersburg
| Istanbul
| MuirGlacier
| Berlin
| London
| ArrowGlacier
| GrayGlacier
| Paris
| Shanghai
| Cancun;
Functions
@typeInfo
@typeInfo(typeSignature) -> TypeInfo;
The typeInfo function takes a single <type_signature> as an argument and returns a union
of types, TypeInfo.
@bitsize
@bitsize(typeSignature) -> u256;
The bitsize function takes a single <type_signature> as an argument and returns an
integer indicating the bitsize of the underlying type.
@fields
@fields(structType) -> [T, N];
The fields function takes a single StructType as an argument and returns an array
of type signatures of length N where N is the number of fields in the struct.
@compilerError
@compilerError(errorMessage);
The compilerError function takes a single string as an argument and throws an error at compile
time with the provided message.
@hardFork
@hardFork() -> HardFork;
The hardFork function returns an enumeration of the built in HardFork type. This is
derived from the compiler configuration.
@bytecode
@bytecode(T -> U) -> Bytes;
The bytecode function takes an arbitrary function and returns its bytecode in Bytes.