Table-Driven Code Generation

Jul 30, 2022

A simple code generator design I've found to be useful, but still small and easy. Includes a sample version with source code.


13 Comments

jlombera

Jun 8, 2023 (edited)

Unless I missed something, aren't you just describing (E)DSLs? If so, wouldn't languages that facilitate the creation of (E)DSLs (e.g. Lisp-like languages) be the way to go?


Ryan Fleury

Jun 8, 2023

Yes, although I neither use nor like that terminology, because I think there is an underlying uniformity to the idea of "levels of authorship". I don't think it's quite right to consider a "non-DSL PL" any more special than a "DSL". In my mind, in an ideal world, the entire computation pipeline would be accessible and visualizable as these various "levels of authorship".


jlombera

Jun 8, 2023 (edited)

I agree with you that the term DSL might be weakly defined, and that to an extent all (general-purpose) PLs are DSLs (geared towards certain use-cases more than others). But I don't feel quite comfortable with the idea of mapping the compiler pipeline to "levels of authorship". To me, levels 2-6 in your list are just implementation details of the compiler, not properties of the PL in particular nor programming in general. If we had to define levels of authorship/abstraction in programming, I would define just two: asm and everything else.

The impossibility of expressing certain constructs in a PL is a limitation of that language in particular, and the missing construct can sit at a higher level of abstraction (like the table problem in the post) or a lower one (like doing non-overflowing addition in C). Solutions to that are specific to the problem/PL/compiler (some compilers provide extensions/builtins for it). But I don't think there is such a foundational concept as "levels of authorship". I think (E)DSL serves much better to describe the idea you are conveying (it's conceptually simpler and well known, with lots of examples out there).

I agree that an utterly complex language that can accommodate all use-cases is not the solution (or even possible), and that (E)DSLs/"higher levels of authorship" are the way to go. This is uncommon in C (and many other "general-purpose" PLs) because it's not easy to write (E)DSLs in it, and, as in human communication, the language shapes the way you express yourself, even the way you think. In contrast, (E)DSLs are very common in languages that make it easy to write them. But I agree that, even in C, we should more often consider DSLs as solutions to problems and limitations of the language.
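To make the overflow example concrete: standard C has no checked addition, so a portable version has to test the bounds before adding. The sketch below is only an illustration; `add_checked` is a hypothetical name, and GCC and Clang offer `__builtin_add_overflow` as one instance of the compiler builtins mentioned above.

```c
#include <limits.h>
#include <stdbool.h>

// Adds a and b. Returns false (leaving *out untouched) if the
// mathematical sum does not fit in an int, true otherwise.
bool add_checked(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

The bounds are compared before the addition happens, so the signed overflow itself (undefined behaviour in C) is never executed.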


Ryan Fleury

Jun 8, 2023

> To me, levels 2-6 in your list are just implementation details of the compiler

The issue is that locking them inside the compiler's implementation is a problem. They *should be* levels of authorship - if they were, then at any stage, someone could write to that level of authorship and produce whatever effects they wanted. Additionally, locking such things into the compiler forces them into a "compiler shape" - new architectures for code generation, visualizers, and so on are impossible without respecting the "compiler shape", which has bad consequences for computational efficiency and flexibility. In other words, compiler writers are not the only ones who need to care about this problem.

The reason that N% (where N>50) of the open computing infrastructure depends on either GCC or LLVM - two monstrosities - is partly because of this highly coupled architecture. My point is that you don't just want one of those layers to be ASM, and one to be C, and maybe a DSL here or there. In principle, the programmer should have the ability to drop into *any stage* of the pipeline and write procedural code which directly modifies that layer, such that any metaprogramming feature, any compiler feature, any editor feature depending on lower level parts of the pipeline (as well as higher level ones) is totally possible *in decentralized fashion*.

I certainly *do not* think - and it sounds like you agree - that the right approach is assuming that language owners (or especially committees) will address limitations or feature requests in any reasonable way by producing ever-more complex languages. There is virtually no evidence that that has worked out, nor that it will work out. So we're already in agreement about things that *translate down* to something like a C or ASM - the question is, why do you think that pattern stops at the beginning of the C -> machine code pipeline? In my eyes, the fact that *it is assumed to* - from my perspective for purely historical reasons - has produced a far worse tooling ecosystem.


jlombera

Jun 8, 2023

I think you are talking about compiler technology/infrastructure. Levels 2-6 in that list have nothing to do with programming, they are relevant only in the context of a compiler. In programming, we are ultimately interested in the computer doing what we want it to do, not in how the compiler translates our instructions into machine code. This doesn't mean a technical solution, like hooking into the compiler pipeline, cannot be used to overcome theoretical problems, like insufficiently expressive PLs (it's already used in that way). In fact, I think the industry would greatly benefit from an "ideal", generalized compiler infrastructure as you are implying (don't know if it's possible, though). But I don't think it's inherent, or even relevant, to programming. Thus my problem with presenting this compiler-related "levels of authorship" as some sort of foundational programming concept. I think our disagreement (??) is only on semantics/terminology.

Personally, I prefer not to have to deal with compiler technicalities when programming, and would rather have a solution on the PL side instead. I think a (relatively) simple PL with enough low-level control (when required) and simple but powerful metaprogramming capabilities (so that you can easily create ad-hoc EDSLs/abstractions) would be a better solution. Of course, you'll always need good tooling (compilers, debuggers, editors, profilers, analyzers, etc).


Ryan Fleury

Jun 8, 2023

> I think our disagreement (??) is only on semantics/terminology.

No, I don't think so, I think we have a stronger disagreement than just semantics.

> they are relevant only in the context of a compiler

This is just not the case in my view. You said that "we are ultimately interested in the computer doing what we want it to do" - what the computer does (which is what we care about) is a direct consequence of what ASM instructions are generated. We can't just write C code and cross our fingers - it is *required* that we have full, complete transparency and control *in principle* over the whole pipeline, when we need it.

> Of course, you'll always need good tooling (compilers, debuggers, editors, profilers, analyzers, etc).

What I'm saying is that the things you claim to be only relevant for compilers I claim are relevant for making good debuggers, editors, analyzers, and so on. That is why I am saying this is not just a compiler implementation detail - it's a crucial part of how human expression is eventually translated into computation, and all of the complexities of that picture. Tools need that information. I am not comfortable leaving that inside of compilers - I think there should be a pipeline, with each stage being a different level of authorship.


jlombera

Jun 8, 2023

I do agree that a good, open, modular, hookable compiler infrastructure is crucial for good tooling (of all sorts), and that such tooling is essential in software engineering/development (thus very relevant for our industry). But it's not intrinsic to programming (in the more general, abstract sense).

I don't disagree on the importance of what you are suggesting, only disagree on describing it as inherent to programming in general.


Mike Smullin

Apr 13

Thanks for this idea, Ryan!

Sharing my implementation on GitHub at mikesmullin/metacode


Luca

Mar 14

> I hope this article was useful, or at least interesting!

What an understatement! It's very clear and interesting. Since I'm new to programming, I had an especially great time reading this, and I can't wait to implement my own generator!! ^^


Jack

Aug 9

Great read, I was interested in how people deal with metaprogramming in C. Out of interest, how do you feel about compilers that do monomorphisation with generic types? An interesting world, it seems to me, would be one where that particular step of a compiler happens early and outputs to a file, rather than to some chunk of assembly in the final result.

It's probably a result of too much enterprisey code distributed across too many physical systems, but I've come to very much appreciate being able to constrain the types of some data structure with some sort of behaviour (for instance it might implement some interface), which I think is easier done as a compilation step.

If nothing else, I do thoroughly appreciate the ability to physically see, outside of a debugger, what the generated code looks like; I've interacted with far too many absolutely bizarre implementations that are significantly harder to debug than necessary.
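As a rough illustration of "monomorphisation that outputs to a file", here is a minimal sketch of a generator that writes one specialised function per row of a type table as plain C source text. Everything in it (`TypeRow`, `type_table`, `emit_max`, `emit_all`, the `max_<suffix>` naming) is hypothetical and not taken from the article.

```c
#include <stdio.h>

// One row per instantiation we want generated.
typedef struct { const char *c_type; const char *suffix; } TypeRow;

static const TypeRow type_table[] = {
    { "int",    "i32" },
    { "double", "f64" },
};

// Writes one specialised max_<suffix> function into buf as C source text.
// Returns what snprintf returns: the length the text needs, excluding the NUL.
int emit_max(char *buf, size_t cap, const TypeRow *row)
{
    return snprintf(buf, cap,
        "%s max_%s(%s a, %s b) { return a > b ? a : b; }\n",
        row->c_type, row->suffix, row->c_type, row->c_type);
}

// Emits every instantiation in the table to `out`, e.g. a generated .c file.
void emit_all(FILE *out)
{
    char line[256];
    for (size_t i = 0; i < sizeof type_table / sizeof type_table[0]; i++) {
        emit_max(line, sizeof line, &type_table[i]);
        fputs(line, out);
    }
}
```

Calling `emit_all(stdout)`, or passing a file opened with `fopen`, leaves the specialised code on disk where it can be read and stepped through like any hand-written source.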

Great article!


Thomas

Oct 1, 2023

Here is my previous post in a nicer Markdown view, if you want (the code is more readable): https://hackmd.io/@Mewily/HyLyk_wla


Thomas

Oct 1, 2023

Thanks for this post, I just discovered the blog!

I personally use a different form, similar to X macros, but one where I can add any number of parameters without needing to change everything. And the way I use them is a tiny bit more flexible.
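For readers who have not met the pattern, the classic X-macro form being compared against looks roughly like this. It is only a sketch with made-up example data (`COLOR_TABLE`, `Color`, `color_name` are hypothetical names), not code from the article.

```c
#include <stddef.h>

// The table: one X(...) entry per row.
#define COLOR_TABLE \
    X(COLOR_RED,   "red")   \
    X(COLOR_GREEN, "green") \
    X(COLOR_BLUE,  "blue")

// Expansion 1: the enum members, plus a trailing count.
typedef enum {
#define X(name, str) name,
    COLOR_TABLE
#undef X
    COLOR_COUNT
} Color;

// Expansion 2: a parallel array of names, in the same order as the enum.
static const char *color_names[] = {
#define X(name, str) str,
    COLOR_TABLE
#undef X
};

// Returns the name for c, or NULL if c is out of range.
const char *color_name(Color c)
{
    return ((unsigned)c < COLOR_COUNT) ? color_names[c] : NULL;
}
```

Adding a column here means touching every `#define X(...)`, since each one must accept the new parameter; that is the cost the tuple-based variant in this comment tries to avoid.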

Here is a short portion of code (modified for the example) from an interpreter I wrote in C.

My way of making an extensible enum (an enum with more data attached to it) looks like:

```c
#include <stdint.h>

#define category_int 'i'
#define category_float 'f'

// Define the members of the enum named bi (built-in) as tuples: (C type, type category)
#define i32_bi (int32_t, category_int)
#define i64_bi (int64_t, category_int)
#define f32_bi (float, category_float)
#define f64_bi (double, category_float)

// Every member of the enum must carry the same number of values.
// You can attach more data to each one, e.g. (int32_t, category_int, "INT", 1)
```

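Getting the info you need about a bi:

```c
// Select one field of a (type, category) tuple
#define get_1(a, b) a
#define get_2(a, b) b

// "get_1 bi" becomes "get_1 (type, category)", which is then expanded as a macro call
#define bi_type(bi) get_1 bi
#define bi_category(bi) get_2 bi
```

So things like `bi_category(f64_bi)` -> `category_float` -> `'f'` work.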

Mapping the enum to stuff (functions, macros...):

```c
// build_in = (integer types) union (float types)

// Apply the macro to every integer type in bi:
#define map_on_bi_integer(macro) \
    macro(i32_bi) \
    macro(i64_bi)

// Apply the macro to every float type in bi:
#define map_on_bi_float(macro) \
    macro(f32_bi) \
    macro(f64_bi)

// Apply the macro to every member of bi:
#define map_on_bi(macro) \
    map_on_bi_integer(macro) \
    map_on_bi_float(macro)
```

And then map it to whatever! You can generate enums, switch cases, custom function bodies and definitions...

```c
// Glue a and b without expanding them first
#define glue_instant(a, b) a ## b

// Expand the macros FIRST, then glue the results (otherwise you would glue the unexpanded macro names)
#define glue(a, b) glue_instant(a, b)

enum bi_enum
{
    // note: a direct ## doesn't work here; you need the two-step glue() above
    #define declare_bi_enum_member(bi) glue(enum_, bi_type(bi)),
    map_on_bi(declare_bi_enum_member)
    // expands to:
    // enum_int32_t, enum_int64_t, enum_float, enum_double,
    #undef declare_bi_enum_member
};
```

Here is a switch-case example:

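```c
// Returns the category character ('i' or 'f') of a built-in type
char get_category(enum bi_enum b)
{
    switch (b)
    {
        // Paste the C type name, matching how the enum members were declared
        #define handle_case(bi) \
            case glue(enum_, bi_type(bi)): return bi_category(bi); break;
        /* break is not really needed after return, but anyway */
        map_on_bi(handle_case)
        #undef handle_case
        default: TODO; break; // (TODO: some macro that crashes and reports that the case was not implemented)
    }
}
// or you can define a global lookup array if you want
```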

The `map_on...` macros are great for mapping stuff, especially because you can create different families, like:

```c
#include <stdio.h>

void display(enum bi_enum b)
{
    switch (b)
    {
        // One handler per family; here the break really is needed
        #define handle_int(bi) \
            case glue(enum_, bi_type(bi)): printf("some integer!"); break;
        #define handle_float(bi) \
            case glue(enum_, bi_type(bi)): printf("some float!"); break;
        map_on_bi_integer(handle_int)
        map_on_bi_float(handle_float)
        #undef handle_int
        #undef handle_float
        default: TODO; break;
    }
}
```

I generally put the enum number inside the define (`#define i32_bi (int32_t , category_int, 1)`), and add:

```c
// get_1 and get_2 must then take three parameters as well
#define get_3(a, b, c) c
#define bi_id(bi) get_3 bi
```

That way I don't declare an enum, and the macros are a tiny bit easier to write.

But you can do it without that, as in this example, if you want.
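The variant with the id stored in the tuple is only described, not shown, so here is a minimal self-contained sketch of what it might look like. The ids 1 to 4, the combined `map_on_bi`, and the helper functions `category_of` and `size_of` are assumptions made for illustration, not the commenter's actual code.

```c
#include <stddef.h>
#include <stdint.h>

#define category_int   'i'
#define category_float 'f'

// (C type, category, numeric id); the ids are assumed values
#define i32_bi (int32_t, category_int,   1)
#define i64_bi (int64_t, category_int,   2)
#define f32_bi (float,   category_float, 3)
#define f64_bi (double,  category_float, 4)

// Field selectors: every getter takes all three tuple fields
#define get_1(a, b, c) a
#define get_2(a, b, c) b
#define get_3(a, b, c) c
#define bi_type(bi)     get_1 bi
#define bi_category(bi) get_2 bi
#define bi_id(bi)       get_3 bi

#define map_on_bi(macro) \
    macro(i32_bi) macro(i64_bi) macro(f32_bi) macro(f64_bi)

// Returns the category character for an id, or 0 if the id is unknown.
char category_of(int id)
{
    switch (id) {
#define handle_case(bi) case bi_id(bi): return bi_category(bi);
        map_on_bi(handle_case)
#undef handle_case
        default: return 0;
    }
}

// Returns the size of the C type behind an id, or 0 if the id is unknown.
size_t size_of(int id)
{
    switch (id) {
#define handle_case(bi) case bi_id(bi): return sizeof(bi_type(bi));
        map_on_bi(handle_case)
#undef handle_case
        default: return 0;
    }
}
```

Because the id is already a plain integer, the case labels need no token pasting at all, which is presumably what makes the macros "a tiny bit easier to write".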

(Fun fact: I'm a 21-year-old CS student.)


Louis

Aug 16, 2023

It looks a lot like Coq (the proof assistant) with its tactics. You can automate the writing of your own program, and you need to if you hope to get anything done. But you can also write the program yourself.

Anyway in Coq, you have two levels of authorship and it seems to work fine.


Digital Grove
