Case-Study: Daedalus VM¶
Implementation of a virtual machine able to execute Daedalus Bytecode.
Daedalus is the scripting language used in the original gothic games. It was made for the use within the ZenGin specifically and is not used outside of the Gothic games. Hence, the language and bytecode format are proprietary.
DaedalusVM in ZenLib¶
This is a re-write of the DaedalusVM found inside ZenLib which might get replaced by this one at some point. The reason for rewriting is that we want to keep the door open for a possible different scripting backend to be used by REGoth. The ZenLib’s DaedalusVM is not structured in a way which makes that possible.
Design decisions¶
Unlinke the DaedalusVM in ZenLib, this VM will handle script symbols very differently.
While previously we had created native classes for each script class we needed to
access from engine code, this VM uses general key
/value
pairs to store
data from script instances. This saves us from maintaining a copy of every script class
which might even be different between games or mods.
It will also make the internal differences between global variables, arrays and classes simpler.
Parts of a DaedalusVM¶
To execute daedalus code, the VM needs the following basic parts:
- Instruction Memory
- Instruction Interpreter
- Instruction Executer
- Symbol Data Storage
- External function mapping
- Global registers
- Stack
Each of these will be discussed in the further sections.
Instruction Memory¶
The Instruction Memory is where the compiled byte code is stored and waiting to be executed. The byte code itself is compiled from text files by the original game and stored inside a .DAT-file, which can be read by ZenLib.
In daedalus byte code a single instruction can be made up from multiple bytes. However, as instructions are interpreted on the fly, there is no need to further process the byte code to split it into the real instruction counterparts. Another reason for not doing that is, that jump target will always reference raw byte code addresses.
Instruction Interpreter¶
This part is what interprets the bytecode to make the actual instruction from multiple bytes. It was decoupled from the actual execution part for a better software strucutre and better debugging possibilities.
The interpreter can be given a raw byte code address and it will try to pack all information from the instruction so the execution stage can work with it.
Instruction Executor¶
Once an instruction has been interpreted, this part will execute it. Executing an instruction
will modify the internal state of the VM. For example, an ADD
-instruction will pop two
values from the stack, add them together and push the result onto the stack.
Because of the way the original games VM is structured, there can be recursive calls to
the instruction interpreter and executor stages once a CALL
-instruction is encountered.
Symbol Data Storage¶
As many programming languages do, daedalus calls variables or functions symbols internally.
Symbols can have a name and a kind (e.g. integer variable or external function) so they
can be identified. Which symbol is valid to be used in which context is something the compiler
would check, but the daedalus VM needs to also adhere to the same rulse. For example,
it makes sense to CALL
a function, but it makes no sense to CALL
a string.
Other than having a kind, symbols will also have data associated with them. For an interger symbol that would be the value, for example. A function symbol will also carry the address of the function with it.
The following kinds of symbols exist:
- Void
- Float
- Int
- String
- Class
- Func
- Instance
- Prototype
Warning
Special care has to be taken as any symbol with the Class-Variable flag set could be part of a script object so the underlaying data could change as the Current Instance is set to a different object! See Global Registers for more information.
Symbolkind Void¶
The Void type is only used for the return type of functions/externals stored in
the symbol offset member. The symbols return flag should be present if not the
functions return type is any other than void
.
Else, Void is not expected to be used as symbol type (only for error handling and internally at runtime for end-of-parameters in external definitions).
Symbolkind Float¶
This kind of symbol will store one or more 32-bit floating point values. The arraylength is comming from the .DAT-file.
Note that daedalus does not have actual support for floats, but they can be still used as a parameter for externals.
Symbolkind Int¶
This kind of symbol will store one or more 32-bit signed integer values. The arraylength is comming from the .DAT-file.
Symbolkind String¶
This kind of symbol will store one or more string values. The arraylength is comming from the .DAT-file.
Symbolkind Class¶
In the original game, script classes were embedded into their native engine counterparts. Embedded means, that the data of the script class was a substructure inside the native class. The script code itself would get a raw native pointer to the native object and an offset to the location where the data of the script class started. Then it would just do raw memory access to modify that data.
Hence, symbols of this kind will store the offset of the script data inside the native class. Since REGoth uses a safer approach, this is not used.
Symbolkind Function¶
Symbols of the Function-kind describe script functions which can be called by the VM itself or by the script code. The symbol will hold the address of the function in the instruction memory (byte code).
Note
If the External-flag is set, the address of the function would not point into the instruction memory but rather be a raw native function pointer into the native game code.
Symbolkind Instance¶
This kind of symbol stores a reference to a script object. This could be a character, an item, a quest or others.
Warning
This is not to be confused with an Instance Function. In Daedalus, an Instance is a function similar to a constructor of an object. Once a script object is created, its Instance function needs to be ran. For example, after creating a blank script object for an NPC, we can run an Instance-function to make the NPC whoever we want.
An instance-function can also call arbitrary functions. It really is a usual script function with a fancy name.
Symbolkind Prototype¶
In Daedalus, a Prototype is similar to an abstract class. It is like an Instance that you cannot construct, but you can derive from it. Code put within the Prototypes constructor will run before the Instance constructor so they is mostly used for some general setup while an Instance sets more specialized parameters of whatever it describes.
For example, NPC_Default
is a prototype which sets up a default character. All other
Characters will derive from it and modify only what they need.
External Function Mapping¶
Functions called by script code can not only be other script functions but also native engine functions, called Externals. This is used whenever a something was too hard to implement in daedalus, not fast enough or simply not possible. Most of the external functions however trigger some sort of game mechanics related actions which are then handled by the native engine, for example letting a character run to some location or adding a quest log entry.
The original game stored raw native function addresses within the .DAT-file so their DaedalusVM could call directly into the native code. However, for better compatibility between version, those values are scrapped and re-evaluated after loading the .DAT-File.
In REGoth, we just do the lookup of the native function address in a similar
fashion, by keeping a mapping of External Symbol to Native Function with the
VM which is generated after loading. Once the Executor encouters a
CALL_EXTERNAL
it can then look up which native function to call via the
symbol referenced within the instruction.
Global Registers¶
Unlike a real processor, the Daedalus VM does not need registers for adding or subtracting numbers as it can just use those of the host CPU. However, there are a small number of specical registers controlling the executing of script code:
- Program Counter (PC)
- Current Instance
Program Counter¶
The Program Counter register is just as one would expect: It points to the instruction which is
to be executed next within the instruciton memory. It is either increased as the program flow
continues or set to a completely different location after a JUMP
or function call.
Note
There is no need to push it to the stack, since the stack of the host machine can be used.
Current Instance¶
Within Current Instance, the game can set something similar to a this-pointer, which is used by the Instance constructors.
It is usually set via the SET_INSTANCE
-instruction, which takes a script
symbol of the Instance-kind, which stores a reference to a script object. The
Current Instance will then be set to the referenced script object.
All variables accessed which have the Class-Variable-flag set will then need to look up their data values from the referenced script object.