Moving to a ByteCode Representation?


#1

From: https://github.com/ChaiScript/ChaiScript/issues/266

Rather than posting an issue, I’d like to have a discussion on the feasibility of switching ChaiScript from a Syntax Tree based evaluation over to a ByteCode pattern. The main reason behind this would be performance. What would need to be done for ChaiScript to use this under the hood.


#2

I personally have very little interest in moving to a bytecode representation. I’m more interested in seeing just how far I can push the current architecture, for reasons I’ll list below:

I have no idea what it would take

I’ve never implemented a VM of any kind, so I really don’t know what it would take.

I do welcome other opinions / knowledge

Since I don’t have any idea what it would take, I do welcome those who have an idea and what to give it a shot.

I don’t think bytecode would help much

The main cost in ChaiScript occurs at function dispatch time, and this process would be unaffected by using a bytecode representation.

Most specifically, ChaiScript’s flexibility combined with its overloads and guards would allow you to do something like:

for (var i = 0; i < 100; ++i) {
  print(i);
  if (i == 2) {
    eval("def print(int i) : i < 100 { print(\" < 100: ${i}\"); }");
  }
}

This makes caching the specific overload to call somewhere between really hard and impossible, without occurring too much overhead or breaking multithreading capability.

A similar problem exists with this code, from the existing performance test suite

def go() {
  var my_array=["1", 4, 6.6l, 10ul, "1000", 100, 10.9f ];

  var q = 0;

  for (var j = 0; j < 10000; ++j)   {
    for (var i = 0; i < 6; ++i)  {
      to_string(my_array[i]);
    }
    q += j;
  }
}
    
go();

Determining which version of to_string to call is relatively expensive.

I think there is more hope with the AST approach

With smarter code analysis we can transform a ChaiScript parse from

for (var i = 0; i < 1000; ++i) {
  // do something with i
}

into this C++ fragment:

void do_for_loop(const std::string &iter_name, const int begin, const int end, ChaiScript_Engine &e, AST_Node_Ptr body) {
  int i = begin;
  e.add_object(var(&i), iter_name);
  for (; i < end; ++i) {
    body->eval(e);
  }
}

By taking advantage of the extraordinary capabilities of C++'s optimizing compilers, this version is likely to be much faster than anything we could come up with in our own bytecode representation.


#3

First, performance is good. I like performance. Better performance is a nice reason to do anything.

That said, is it a good enough reason in this specific case?

What niche purpose does chiascript fill? I don’t know about anyone else, but for me its possibly going to serve as glue and logic that isn’t performance critical. Granted, faster is always nicer then slower, but I don’t really use embedded scripting languages in performance-sensitive areas.

That being said, I find the last reason given very compelling. Compiling to bytecode is doing an end-run around the C++ compiler in a way, ISTM.


#4

Almost every example I’ve received from people testing ChaiScript comes down to something like “it cannot calculate a recursive Fibonacci algorithm as fast as Lua.”

  1. That’s factually correct. ChaiScript does not do things like tail call elimination, and it likely never will.
  2. Why do you need to calculate a Fibonacci sequence in your embedded script?
  3. That’s pointless to do with ChaiScript, since you could easily implement it in C++ and call that version (I’m strongly considering building a Fib calculator into the standard library just for this specific example)

I strongly welcome real world use cases where ChaiScript’s engine is too slow. I am extremely interested in optimizing these cases.


String representation of a function
#5

First I would like to say that I do like ChaiScript, and I absolutely love the simplicity of the embedding (miles ahead of Lua, Squirrel, Python, and just about all other script engines/libraries). The issue for me in the last couple projects has come down to speed, in both cases I ended up using Lua.

  1. Indie game project: I am still working on this game project and initially tried using ChaiScript for the scripting, both for initial game content and user generated mods. here recently I was forced to switch over to Lua as it could give me the performance I needed to update the Actors (Entities) in a game to the level needed (It is an RTS game, so tons of entities, for which the logic cannot be moved into C++, since the mods will need to be able to control the logic also, and they will only have access to the script layer.)
  2. PLC real-time calculation. I have worked in the past on an application for some specialized PLC devices trying to build a an App like marketplace for a specific industrial computer. Due to the processor running at 900MHz and the limited 256MB of RAM, I was forced to using Lua. I was able to get it to use very little RAM (about 800KB including all of my library exposed to the VM) during runtime and it ran at an acceptable speed. (One of the tricks here was to precompile to bytecode from a desktop, and send the byte code to the PLC)

The second while it did come down to speed, is not so much a fair example, since another piece of the project was a GUI that ran on the desktop and allowed the user to graphically build the flow of data from the physical sources, to be passed to scripted modules, and the modules’ output to other modules or physical output. Lua’s VM was actually a great fit for this with how it loads chunks and such. But maybe that would also be true of ChaiScript if it was on a VM.

This is just my 2 Cents, and experience. I will try ‘none the less’ to find a good use for ChaiScript in my projects, when I come across one that doesn’t have strict resource or performance requirements.

Thanks,


#6

Thank you for the excellent information. Is there any chance you could give me some specific examples for how you were trying to using ChaiScript (code examples, that is) that was not fast enough for your needs?

Ideally, something that I could add to my performance test suite for a real-world test case?

Thank you,
Jason


#7

I can look into my VCS for the game project and pull out the an example, however for the PLC system, I am unable to, as the decision was made early on to go to Lua to save RAM during runtime and ship only bytecode to the PLC. And no ChaiScript was actually written past the evaluation phase.

It might take me a few days to get to the Game Engine code, and find the correct version that was using ChaiScript.

Thanks,
Don


#8

That’d be great. Just so you know I’ve been working on code that actually optimizes the script passed in, and I’m having some great success so far, but the main thing I’m missing is real life examples to make sure I’m optimizing the correct cases.


#9

Hey Jason,
I have searched high and low for this old version of my game, and I am now convinced it was a commit in my SVN (pre Mercurial) repo. During the transition to Mercurial, I just decided to take the latest and lose all of the code history.

I am starting a new project soon which will really push the performance script engine. I will give ChaiScript a shot, and then send you the code for integration into your benchmark tests. Sorry I wasn’t able to recover the game engine code using Chai.

Thanks,
Don


#10

Thank you for looking. Let me know ASAP when you start testing it for your next project. I have a lot of reorganization and performance improvements in the works so I may be able to give you near-immediate help when you get to that point.

-Jason


#11

First of all, I want to say I love ChaiScript and I did work with ByteCode representation to write a compiler.

The single idea of using Bytecode is not affected with eval (because even as now, you replace the eval content with AST code, so now your AST code will be replaced with a sequence of bytecode.

So, how to make it:

  • make an AST visitor which convert leafs to simplest “atomic” operations. I don’t recommend you to use something too fancy, but what you will need is simply the “math” operation (or function call) and the operands are explicit.
    If you have the code:
test(a+2)

it should be converted into 2 separate atomic operations:

newVar = a+2
call test (newVar)

So, why the bytecode should be much faster:

  • it is easily optimizable: you look for example to all operations of “+” and when one of the operand is “0” (zero) you can replace the operation with an assignment.
  • it is contiguous in memory: so it means less CPU cache misses
  • based on bytecode, you can more safely remove dead code or check liveness

I can say some optimizations that can safely be done by bytecode that are not possible easily to be done on AST:

  • lowering: std::string s; s+=","; can be replaced with: s+= ‘,’; //replace string addition to char addition,
  • value propagation: if a value is read and there is only one way to make it to a lower assignment, you can say the value is lower
  • math simplification
  • after value propagation, it is very common to prove branches: “if true”, “if false”
  • dead code elimination: use variables with no output or blocks