NOTE: This is a typical work-in-progress text that grows as my knowledge of the packages improves. It should be ready at some point; I still need to nail down some technical explanations and samples and fix the emacs proposal. This is currently about 70% of the complete planned FFI text.
"A good foreign function interface is 25% code and 75% policy."
-- FFIGEN http://www.ccs.neu.edu/home/lth/ffigen/
A technical explanation and a survey of various foreign function interfaces ("FFI"): how various systems and languages designed and implemented them, and what other possibilities you have to extend a system or language. A special focus is on stylistic problems: whether to keep the interface small and simple or rather (Lisp-style: "Good" or "MIT/Stanford-style") to "Do-the-Right-Thing".
An overview from simple static to more sophisticated dynamic external function invocation in the same process space, even for languages which have no eval. Methods to communicate with external processes or processes on other machines include sockets, RMI, COM or CORBA.
At the end I want to show how to extend typical static languages such as C or C++ dynamically: how to define functions and callbacks and extend objects at run-time. This might be useful for GUI's: callbacks vs. events, design issues, how COM should be extended, what COM+ will not solve and what we can do nevertheless to have dynamic objects sooner or later.
This file was created out of sheer frustration with the way emacs currently handles calls to non-interactive external processes (or with the lack of a way to extend elisp dynamically). Every major problem seems to be solved by creating an external executable and passing information on the command line, as in a typical unix filter, whilst it could be solved more easily by calling known functions in shared libraries. Language and system independent, of course. So you can read this text as some kind of proposal for an emacs FFI.
The second reason is that last year I implemented my own foreign function interface ("FFI") for AutoLISP (Win32-MSVC only), which was very easy to implement but very hard to do "the right way". I still don't know how.
The third reason is that I'll have to create a massively large interface from various languages (Corman Lisp, perl) to AutoCAD, a widespread Win32 CAD program with thousands of objects, functions and properties. Writing and maintaining this interface by hand is not a delightful task. I do see the advantage of doing it the stupid way: mass-process the names and signatures somehow, create the wrappers automatically (e.g. with the editor or perl) and fix errors and updates manually. With the next release cycle you do the same again and have a new set of sources. But I want to create release-independent versions and I want to keep the simple syntax of the various high-level languages I like. I can stand delving into C, or even worse C++, only a few weeks a year; that's why I favor an FFI (whether for perl or lisp I don't care that much, just not C).
Note that I'm no expert in FFI's. I didn't have a class on this; I'm just a plain user of some FFI's and an implementor with basic assembler and compiler knowledge who wants to understand all of them to do it right. Maybe I'll take this text as the base for an AutoCAD developers conference note. This opinion is highly personal, and generic in that it looks at all the other existing possibilities as well. So don't bark when you read in the emacs section: "Doing GUI is hard." It is hard compared to others, from my little horizon.
Interestingly, FFI's seem to be a hot topic now. Just this week I heard about completely new projects: a perl FFI, which is the first project to use the Haible ffcall (sigh!), a new mzScheme FFI and a new Scheme 48 FFI (0.53), besides the ongoing projects for emacs, AutoLISP and more Corman Lisp abstraction utilities. I would not be surprised to see a python FFI and a dynamic Gambit FFI soon as well. Maybe I just overlooked them. Gambit is near to its completely new incarnation as Visual Scheme anyway.
A foreign function interface is originally a lisp term for dynamically defining and calling functions in outside shared libraries ("foreign function", "call-out") and, as an additional option, for creating functions which can be called from outside functions ("callback", "foreign-callable", "call-in").
FFI's can only be used to communicate between libraries in the same process space. To talk to other processes better use sockets, RMI or COM/CORBA.
In the next chapter I describe how this works in a simple way (finding the function and doing the argument translation).
But a typical FFI also involves difficulties with:
Other interfaces had been designed to overcome even more problems:
The most typical usage is to extend scripting and high-level languages with an external API, so that the maintainer of the language doesn't have to provide all new and requested functionality himself. Help yourself. Static or dynamic C extensions are targeted only at experienced users: you have to be an expert in the scripting language, in C, and also in the application if the scripting language is embedded into one. However, static or dynamically loaded C extensions are preferred for bindings to larger libraries; FFI's should be considered only for minor or intermediate problems or for users unable to do C.
Dynamic FFI's such as Visual Basic's Declare statement let the user define the extension in the native scripting language syntax. You only have to consult the API Reference to search for the needed Win32 API function, and translate the required argument and return types to the FFI syntax.
This works well for systems with a large number of shared libraries such as Win32, but could also be used for systems with big static monolithic kernels such as unix, with almost no dynamic libraries but a lot of useful functions reachable from the process space. Esp. in unix where the kernel does not provide memory protection across processes it would be easy to access most of the useful functions in external processes. (An impossibility on Win32). FFI's to static external functions (not sitting in shared libs) are currently not implemented to my knowledge, because it is rather hacky. I hope someone can enlighten me.
The biggest problem is always to search the various API references for the fitting function. With the large number of available COM components and documented COM interfaces to external applications, the number of good foreign-callable functions has increased dramatically. You don't have to write COM clients to access custom or early-bound COM interfaces; most likely you will just use OLE Automation via IDispatch, which does all the horrible work for you. An FFI is very similar to the IDispatch mechanism, only much hackier and not limited to COM servers which expose just this interface. However, COM should be more robust (if the programmer did everything right).
Andrea Raab wrote this nice summary: [www]
- The FFI is slow compared to pluggable primitives. Since there are many generic conversions necessary the overhead of calling C functions directly is very high. That doesn't matter as long as you're calling functions that actually do quite a bit of work but if you're concerned about the speed (e.g., when you call these functions a zillion times per second) you are much better off using pluggable primitives.
- The FFI is very platform dependent. Pluggable primitives give you a way of defining abstract interfaces (e.g., primitives) to the provided functionality. When using the FFI you talk to the underlying system directly.
For example with AutoCAD you have the possibility to call the exported DLL functions or use the COM interfaces. Both have advantages and disadvantages. Most problems are compatibility related. You have to be a wizard to anticipate the API changes in new releases (vendor politics), platform changes (system API politics), compiler changes (technical issues) and hardware changes (CPU).
Let's attack this simple problem:
Call the "GetTempPath" function from Win32 KERNEL32.DLL.
It will return a string similar to
(or (getenv "TMP") (getenv "TEMP") (act-directory)).
After having a look into the Win32 API docs ( Borland has the reference
WinHelp file online somewhere ) or grepping .../include you see that
GetTempPath has this header file entry (we call it signature):
DWORD GetTempPath(
DWORD nBufferLength, // size, in characters, of the buffer
LPTSTR lpBuffer // address of buffer for temp. path
);
In fact under NT, which is Unicode by default, there are two functions for each
function passing a string, one with an "A" suffix (for ANSI) and one with
a "W" (for Wide). Wide means that every char is wide: a 16-bit wchar_t instead
of a single-byte char.
We choose GetTempPathA because unicode strings confuse us. For COM btw. we must use Unicode. And COM also has a pascal-style string type BSTR with a separate int at the front, denoting the string length in bytes.
In my AutoLISP API (for Win32 and MSVC only, so it's really simple) we call it like this:
(setq lib (ffi-LoadLibrary "kernel32.dll"))
(setq func (ffi-GetProcAddress lib "GetTempPathA"))
;; Reserve enough room for the string!
(setq s " ")
;; Lowest-level interface
(ffi-call-explicit func ; just the raw numbers
'(4 (4 t)) ; in
4 ; out, we don't need it
:pascal-linkage
(strlen s) s)
(princ s) ; The string is in 's' by side effect
What goes on behind the scenes?
LoadLibrary is the Win32 call to get the module handle of the
loaded library. If it is not already loaded it will be loaded by
searching the usual system dependent library search paths. Most likely
(getenv "PATH") plus some private dirs.
For already loaded libraries the Win32 function GetModuleHandle is
also useful but not needed.
Other OS's name this function dlopen (Sun, BSD), dld_link (GNU, linux, ...),
shl_load on HPUX, rld_load on NEXT,
or use different calls (ldopen, ... on AIX; lib$find_image_symbol on VMS)
/* C pseudocode */
unsigned long ffi_LoadLibrary (const char *filename) {
/* get the argument from the high-level language */
return (unsigned long) LoadLibrary( filename );
/* return the handle as long to the high-level language */
}
GetProcAddress is used to return the address of the exported function
by name. ffi-LoadLibrary just returns this handle as integer to be used for the
lookup. For some functions and some OS's it needs the exported name, but a function
might even have no name, just an ID (the "ordinal value"). You can pass this number on
Win32 as well.
Other OS's name this function dlsym (Sun, BSD), dld_get_func (GNU dld,
linux, ...), shl_findsym on HPUX, rld_lookup on NEXT or use different calls
(read manually from the XCOFF .loader section on AIX; lib$find_image_symbol
on VMS)
/* C pseudocode */
unsigned long ffi_GetProcAddress (unsigned long handle, const char *name_or_number) {
/* get the arguments from the high-level language */
/* an ordinal is passed in the low word of the name pointer */
return (unsigned long) GetProcAddress( (HMODULE) handle, name_or_number );
/* return the pointer as long to the high-level language */
}
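On systems with dlopen and dlsym the two lookup steps have the same shape as the LoadLibrary/GetProcAddress pair. A minimal sketch in C, assuming a POSIX system with dynamic linking; ffi_find_function is a hypothetical helper name, not part of any FFI discussed here, and passing NULL as the library name (search the running program and the libraries it already loaded) is a choice made only to keep the sample independent of library file names:

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Opaque function pointer type, to be cast to the real prototype before use. */
typedef void (*ffi_fn)(void);

/* Hypothetical helper: look up 'symbol' in 'library' (NULL = the running
   program and the libraries it has already loaded).
   Returns NULL if the library or the symbol cannot be found.
   The library handle is intentionally never closed in this sketch. */
ffi_fn ffi_find_function(const char *library, const char *symbol)
{
    void *handle;
    void *address;

    assert(symbol != NULL);
    handle = dlopen(library, RTLD_LAZY);   /* counterpart of LoadLibrary */
    if (handle == NULL)
        return NULL;
    address = dlsym(handle, symbol);       /* counterpart of GetProcAddress */
    /* POSIX requires that the void * from dlsym can be converted to a
       function pointer, although ISO C leaves this conversion undefined. */
    return (ffi_fn) address;
}
```

A caller casts the result to the real signature before the call, for example to size_t (*)(const char *) for strlen. Older systems need the program to be linked with -ldl.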
Now you have the function pointer and can call the function. But first you have to push the required arguments onto the stack. The only important thing is the number of bytes each argument needs. The called function gets the arguments from the stack, and does its internal work. In my syntax I used '(4 (4 t)) to denote this. "4" means 4 bytes (i.e. a 32 bit integer), in the second arg (4 t) the 't' denotes a pointer, an indirection. See the difference in assembler below.
(FFI-CALL-EXPLICIT funcproc param-types return-type linkage args)
To make it simple I just show how to pass the parameters for int and pointers. For brevity I omit signed/unsigned, alignment issues, type conversion, exception handling, linkage stuff, thread safety, parameter preprocessing, type declarations and such. This nice tutorial [www] explains the whole piece (args, stack, cdecl vs. stdcall linkage, linking to a DLL, ...) in full assembler glory.
/* C pseudocode */
ffi_call_explicit (...) {
/* get the parameters into Params[] */
/* and the types into ParamsTypes[], 4 will be T_INT, (4 t) T_POINTER to int */
/* both stdcall and cdecl push the params from right to left, so that */
/* the first param ends up on top of the stack; they differ only in */
/* who unwinds the stack after the call */
pushed = 0;
for (i = nParams-1; i >= 0; i--) {
switch (ParamsTypes[i]) {
/* inline assembler */
case T_INT: /* size 1 or 2 or 4: any number for
char, bool, short, long, ... */
_asm {
mov eax, Params[i]
push eax
}
pushed += 4;
break;
case T_POINTER: /* (4 t): any pointer */
_asm {
mov eax, dword ptr Params[i]
push eax
}
pushed += 4;
break;
case T_DOUBLE: /* any size 8, here for double */
double dParam = (double) Params[i];
_asm {
mov eax, dword ptr [dParam + 4] ; high dword first
push eax
mov eax, dword ptr [dParam] ; low dword ends up at the lower address
push eax
}
pushed += 8;
break;
}
}
/* we have all the args, call the function now */
/* the return value will be in eax on intel x86 */
switch (ReturnType) {
case T_VOID: /* expect no return value in eax */
_asm {
call dword ptr [funcproc]
}
break;
case T_LONG:
case T_POINTER:
case T_SHORT: /* expect return value in eax */
default:
_asm {
call dword ptr [funcproc]
mov lReturn, eax
}
break;
case T_DOUBLE: /* double from the FPU stack */
_asm {
call dword ptr [funcproc]
fstp qword ptr [dReturn]
}
break;
}
/* handle linkage, do the back conversion,
return lReturn/dReturn to the high-level language */
}
This is it in the easiest case. Anything goes. Only with special hardware
(here the FPU for double) it is different.
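Without inline assembler the push-and-call step can be approximated in portable C by casting the function pointer to a prototype chosen by the number of arguments. This is only a sketch under assumptions: every argument is treated as one machine word (so no doubles), at most three arguments are supported, and the names ffi_word, ffi_call_words and demo_get_path are made up for illustration. The compiler then emits the pushes and the stack cleanup that the assembler version does by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef intptr_t ffi_word;      /* one machine word: an int or a pointer */
typedef void (*ffi_fn)(void);   /* untyped function pointer from the lookup */

/* Call 'fn' with 'nargs' word-sized arguments and store its word-sized
   return value in *result. Returns 0 on success, -1 for unsupported nargs. */
int ffi_call_words(ffi_fn fn, const ffi_word *args, size_t nargs, ffi_word *result)
{
    assert(fn != NULL && result != NULL);
    switch (nargs) {
    case 0:
        *result = ((ffi_word (*)(void)) fn)();
        return 0;
    case 1:
        *result = ((ffi_word (*)(ffi_word)) fn)(args[0]);
        return 0;
    case 2:
        *result = ((ffi_word (*)(ffi_word, ffi_word)) fn)(args[0], args[1]);
        return 0;
    case 3:
        *result = ((ffi_word (*)(ffi_word, ffi_word, ffi_word)) fn)
                      (args[0], args[1], args[2]);
        return 0;
    default:
        return -1;
    }
}

/* Stand-in for a foreign function with a GetTempPath-like shape:
   (buffer size, buffer address) -> number of characters written, or 0. */
ffi_word demo_get_path(ffi_word size, ffi_word buffer)
{
    static const char path[] = "/tmp/";
    if (size < (ffi_word) sizeof path)
        return 0;
    memcpy((char *) buffer, path, sizeof path);
    return (ffi_word) (sizeof path - 1);
}
```

The cast only works because the demo function really has the word-sized prototype it is cast to. For arbitrary foreign signatures this trick relies on the platform ABI, which is exactly the part the assembler version controls explicitly.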
Functions with Pascal linkage unwind the stack by themselves; with C linkage the caller has to do it manually. On Win32 both conventions expect the arguments pushed onto the stack in reverse order (right to left); only the original 16-bit Pascal convention pushed them in normal order. Again see the assembler tutorial here: [www]
I usually remember it like this: Pascal is a simple VB-alike. The user is lazy, the system function does the work. In fact nearly all of the Win32 API functions have pascal linkage; an easy to remember association, though it's unfair (as most of the MS bashing). Technically it has different reasons. C needs variadic arguments (in lisp terms: optional or rest parameters). So the called function cannot know the number of bytes it needs to unwind. The caller must do that.
Most non-WinAPI functions have cdecl linkage (the clib, most useful so's and dll's lying around).
Something like this below unwinds the stack manually after a cdecl call. The
result stays in eax (or on the FPU stack for a double) and is not touched
by adjusting the stackpointer, so it need not be saved.
Normally in debugging mode the compiler adds a few instructions to check the
stack for corruption, for a wrong 'esp' register, the "stackpointer". So you
will get an assertion if anything went wrong.
/* unwind the stack by 'pushed' bytes: */
/* 'pushed' is the sum of bytes pushed onto our callstack, */
/* the sum of sizes from our input args */
if (cLinkage == CDECL_LINKAGE) {
_asm {
add esp, pushed ; change the stackpointer, eax keeps the result
}
}
A hot tip on how to handle varargs is to study or even use the vacall library from the CLISP ffcall package: [www], but the macros in the stdarg.h or varargs.h C headers might also do it.
Problem: What happens if GetTempPath and friends want to return a string which is larger than the buffer the pointer refers to?
The clib malloc knows how large the pointer area is, but this knowledge is not "reflective". The API function doesn't know how large the string buffer is, that's why it needs the extra argument with the buffer size. Okay. Lisp knows that, because it stores types with every variable, and a clib system should know it as well. There's a communication problem. No big deal, in lisp we cannot rely on an O(1) strlen as well as in Pascal.
The real problem is what GetTempPath will do now. Our version will bark (if strlen is lower than the result), but other functions might try a realloc and pass the string back on this new pointer. This happens a lot with ** vars (in the last decade renamed to &), so-called "pointers to pointers" or "references". Most of them allocate in the callee and expect the caller to free it. Impossible in the lisp world and bad design! At least providing a MyFree function would be nice. What if the callee is linked to a "smartheap" (TM) malloc library? We are lost. (This really happens every day in AutoCAD. The API has to be designed very carefully then. The Win32 API is okay btw.)
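The usual defence on the caller's side is to let the function report the size it needs and to retry with a larger buffer. The sketch below assumes a callee with the documented GetTempPath contract (return the number of characters copied, or, if the buffer is too small, the buffer size required); demo_temp_path is a stand-in written for this example and fetch_temp_path is a hypothetical wrapper, not code from the AutoLISP FFI:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in callee with a GetTempPath-like contract: copies the path if
   'size' is large enough and returns its length, otherwise returns the
   buffer size required and leaves 'buf' untouched. */
size_t demo_temp_path(size_t size, char *buf)
{
    static const char path[] = "/tmp/ffi-demo/";
    if (buf == NULL || size < sizeof path)
        return sizeof path;          /* required size, including the null */
    memcpy(buf, path, sizeof path);
    return sizeof path - 1;          /* characters copied, without the null */
}

/* Caller side: start with a guess and grow once if the callee asks for
   more. The caller owns the result and releases it with free(). */
char *fetch_temp_path(void)
{
    size_t size = 4;                 /* deliberately too small */
    char *buf = malloc(size);
    size_t n;

    if (buf == NULL)
        return NULL;
    n = demo_temp_path(size, buf);
    if (n > size) {                  /* too small: n is the size required */
        char *bigger = realloc(buf, n);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        size = n;
        n = demo_temp_path(size, buf);
    }
    assert(n < size);                /* the second attempt must fit */
    return buf;
}
```

Here the buffer is allocated and freed on the same side, so the callee never has to realloc anything that the caller's memory manager owns.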
When foreign functions move our lisp strings around, our system will fail because we do the memory handling ourselves. So we have to forbid that. Either we allocate so much lisp-compatible room in advance that the called functions will not create new space, or better: we handle our own pool of foreign memory, "foreign pointers"; CMUCL calls them "aliens".
This memory area (our c-heap) uses the simple system malloc, so we don't care when the called function does a malloc, free, realloc or strdup. Or even new, delete, new[], delete[] and so on. But our foreign pointers need to be registered with the garbage collector to get freed by the clib free function when we don't need them anymore. This is called "Finalization". In AutoLISP e.g. this is a problem because we don't have access to the gc sources and cannot just hook into the gc. No major problem, we just release memory automatically when our FFI is unloaded or when the user requests it: (ffi-free c-ptr) or (ffi-freeall).
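The C side of such a pool can be as small as a linked list of malloc'ed blocks. The following is a sketch only: the function names mirror (ffi-free c-ptr) and (ffi-freeall), but the bodies are an assumption about how one might implement them, and the pool does not notice when a callee reallocs or frees a registered pointer on its own.

```c
#include <assert.h>
#include <stdlib.h>

/* One registered foreign pointer. */
typedef struct ffi_node {
    void *ptr;
    struct ffi_node *next;
} ffi_node;

static ffi_node *ffi_pool = NULL;    /* head of the list of live blocks */

/* Allocate 'size' bytes on the C heap and register the block. */
void *ffi_malloc(size_t size)
{
    ffi_node *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->ptr = malloc(size);
    if (node->ptr == NULL) {
        free(node);
        return NULL;
    }
    node->next = ffi_pool;
    ffi_pool = node;
    return node->ptr;
}

/* Release one registered block. Returns 1 if 'ptr' was known, else 0. */
int ffi_free(void *ptr)
{
    ffi_node **link;
    for (link = &ffi_pool; *link != NULL; link = &(*link)->next) {
        if ((*link)->ptr == ptr) {
            ffi_node *dead = *link;
            *link = dead->next;
            free(dead->ptr);
            free(dead);
            return 1;
        }
    }
    return 0;
}

/* Release everything, e.g. when the FFI is unloaded.
   Returns the number of blocks released. */
size_t ffi_freeall(void)
{
    size_t released = 0;
    while (ffi_pool != NULL) {
        ffi_node *dead = ffi_pool;
        ffi_pool = dead->next;
        free(dead->ptr);
        free(dead);
        released++;
    }
    return released;
}
```

Refusing to free unknown pointers is a deliberate choice in this sketch: it keeps a double (ffi-free c-ptr) from the high-level language from corrupting the C heap.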
Finalization is a minor problem (only a small problem for the implementor), the real problem is the increased level of complexity we bought with this new data type "foreign pointer", in short c-type or c-ptr.
In my proposal I argue that it is even better to write a C or C++ wrapper DLL to handle external memory and even structs and arrays, because this is more natural to write than cref orgies in the high-level language, it is easier to ensure that this wrapper DLL uses the same memory management by linking statically to the same memory library, and you have clean, side-effect-free exported functions.
Having the whole foreign pointer suite also requires a lot of constants and typedefs to add to your language. C/C++ needs them to ensure proper type safety, but they are needed only by the compiler; the running image is free of them. Not so in our environment. We have to buy all these constants (see e.g. all the windows header constants and typedefs) into our run-time image. I really doubt that it's worth it and that it's readable.
The drawback of a wrapper DLL is another level of indirection and decreased flexibility. Users have to switch languages and environments. (TODO: add comments and samples.)
So I came to the simple design of the argument types:
A direct value is just a number: size
An indirect value is a list with non-nil second value: (size T ...)
To add options to the arguments a direct value is a number or a list with nil as second element: (4 nil options...)
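In C such a descriptor boils down to a size and an indirection flag. A minimal sketch, assuming a 32-bit x86 stack where every slot is rounded up to 4 bytes and a pointer takes one slot; the struct and function names are invented for this example. It computes the number of bytes pushed for a call, which is what a cdecl caller later has to unwind:

```c
#include <assert.h>
#include <stddef.h>

#define FFI_SLOT 4   /* assumed stack slot size on 32-bit x86 */

typedef struct {
    unsigned size;      /* size in bytes of the value itself: 1, 2, 4 or 8 */
    int      indirect;  /* non-zero for (size t): pass the address, not the value */
} ffi_arg_type;

/* Sum of the bytes that pushing all arguments puts on the stack. */
size_t ffi_pushed_bytes(const ffi_arg_type *types, size_t count)
{
    size_t pushed = 0;
    size_t i;

    assert(types != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (types[i].indirect)
            pushed += FFI_SLOT;   /* only the pointer goes on the stack */
        else                      /* round the value up to whole slots */
            pushed += (types[i].size + FFI_SLOT - 1) / FFI_SLOT * FFI_SLOT;
    }
    return pushed;
}
```

For the GetTempPath descriptor '(4 (4 t)) this gives two slots: one for the buffer size and one for the buffer address.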
Another brain-dead simple idea at the completely opposite end is this high-level definition:
;;; taken from the header or docs:
;;; DWORD GetTempPath(
;;; DWORD nBufferLength, // size, in characters, of the buffer
;;; LPTSTR lpBuffer // address of buffer for temp. path
;;; )
(ffi-defun-dll GetTempPath
"DWORD GetTempPath(DWORD nBufferLength,LPTSTR lpBuffer)"
:pascal-linkage)
(GetTempPath (strlen s) s)
From here it is not far to automatic header parsers:
A helper tool or even a high-level function parses the headers and generates all function definitions or even wrappers automatically. Currently I see three approaches for header parsers.
This simplified output is then parsed for the really needed stuff. (ACL and FFIGEN use this fool-proof approach)
This is done massively in Corman Lisp, where we now have at least three different header parsers. From the simple one, over an advanced one with custom reader macros (this is really great), to prototypes with an even greater level of abstraction (say, very good). Vassily Bykow is currently working on the better ones.
There exist several portable C header parsers in lisp:
For functions already in memory, such as kernel utilities without a header or with an unparsable one, you have to know the name (or address) and the signature. Dumped images are just not as friendly as DLL's or so's. With that knowledge you could call any random address in your process space and cross your fingers.
But this is not my idea. I know of a few people actually doing this and more in production code. It is called run-time patching. You can buy products which hook into known system or application functions and replace them with their own fixed or changed version. Viruses like to do this as well.
Popular are just packed letters "NN", and structured lists ((:int :in)...).
We'll come to this later when we discuss c-types. Of course the simplest way would be to avoid dealing with c-types, the great unknown, at all.
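Decoding such a packed-letter signature takes only a few lines of C. The letter meanings below (N, I and L for an integer number, P for a pointer, D for a double) are an assumption loosely modeled on Perl's Win32::API, and ffi_parse_signature is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_INT, T_POINTER, T_DOUBLE } ffi_type;

/* Decode a packed signature such as "NP" into at most 'max' type codes.
   Returns the number of parameters, or -1 on an unknown letter or when
   the signature has more than 'max' parameters. */
int ffi_parse_signature(const char *sig, ffi_type *types, size_t max)
{
    size_t n = 0;

    assert(sig != NULL && types != NULL);
    for (; *sig != '\0'; sig++) {
        ffi_type t;
        switch (*sig) {
        case 'N': case 'I': case 'L':
            t = T_INT;               /* any integer that fits one slot */
            break;
        case 'P':
            t = T_POINTER;           /* any pointer */
            break;
        case 'D':
            t = T_DOUBLE;            /* 8 bytes, returned via the FPU */
            break;
        default:
            return -1;               /* unknown letter */
        }
        if (n == max)
            return -1;               /* caller's array is too small */
        types[n++] = t;
    }
    return (int) n;
}
```

The structured list form ((:int :in) ...) carries more information (direction, names) but needs a real reader; the packed form is attractive mainly because a parser like this one is trivial.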
My current problem is only how to design the emacs FFI and
how to change the design of my AutoLISP WinAPI correctly.
If I feel okay with the design I can add the needed GC hook
(to finalize external objects) and add the complicated utils to
deal with foreign objects (side effects, c-structs, malloc vs gc
and more of such weird problems) later.
I already did that for AutoLISP and use the various Corman LISP
and perl FFI's a lot. And I know the assembler tricks for various
cpu's. (gcc is fine to study)
It should be no major technical problem.
So I started to write down this huge document evaluating and documenting all the different approaches to this problem, from static wrappers (C API for python, perl, emacs, ...) over automatic wrapper generation tools (like SWIG, COM, CORBA, ...) to dynamic FFI's (dozens of Lisp FFI's, java JNI, VB, Perl Win32API+C::DynaLib) and finally to generic language independent foreign function generation solutions (Assembler, virus-like, dynamic compilers or Bruno Haible's ffcall).
The ultimate question, which I want to solve, is "KISS" or "Good". This might be seen as highly political but really should be read in terms of pros and cons from two viewpoints: the implementation and the user. Unfortunately my conclusion will be seen as political.
At first I'll cite the classic "Worse is better" ("Good News-Bad News-How to Win Big") by Richard Gabriel:
One should add that this paper by Gabriel might have been the reason/... why he switched from Common Lisp to C++ (where his project failed. Certainly ironic).
I and just about every designer of Common Lisp and CLOS has had extreme exposure to the MIT/Stanford style of design. The essence of this style can be captured by the phrase ``the right thing.'' To such a designer it is important to get all of the following characteristics right:
- Simplicity - the design must be simple, both in implementation and interface. It is more important for the interface to be simple than the implementation.
- Correctness - the design must be correct in all observable aspects. Incorrectness is simply not allowed.
- Consistency - the design must not be inconsistent. A design is allowed to be slightly less simple and less complete to avoid inconsistency. Consistency is as important as correctness.
- Completeness - the design must cover as many important situations as is practical. All reasonably expected cases must be covered. Simplicity is not allowed to overly reduce completeness.
I believe most people would agree that these are good characteristics. I will call the use of this philosophy of design the ``MIT approach.'' Common Lisp (with CLOS) and Scheme represent the MIT approach to design and implementation.
The worse-is-better philosophy is only slightly different:
- Simplicity - the design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface. Simplicity is the most important consideration in a design.
- Correctness - the design must be correct in all observable aspects. It is slightly better to be simple than correct.
- Consistency - the design must not be overly inconsistent. Consistency can be sacrificed for simplicity in some cases, but it is better to drop those parts of the design that deal with less common circumstances than to introduce either implementational complexity or inconsistency.
- Completeness - the design must cover as many important situations as is practical. All reasonably expected cases should be covered. Completeness can be sacrificed in favor of any other quality. In fact, completeness must be sacrificed whenever implementation simplicity is jeopardized. Consistency can be sacrificed to achieve completeness if simplicity is retained; especially worthless is consistency of interface.
Early Unix and C are examples of the use of this school of design, and I will call the use of this design strategy the ``New Jersey approach.'' I have intentionally caricatured the worse-is-better philosophy to convince you that it is obviously a bad philosophy and that the New Jersey approach is a bad approach.
However, I believe that worse-is-better, even in its strawman form, has better survival characteristics than the-right-thing, and that the New Jersey approach when used for software is a better approach than the MIT approach.
Let me start out by retelling a story that shows that the MIT/New-Jersey distinction is valid and that proponents of each philosophy actually believe their philosophy is better.
Two famous people, one from MIT and another from Berkeley (but working on Unix) once met to discuss operating system issues. The person from MIT was knowledgeable about ITS (the MIT AI Lab operating system) and had been reading the Unix sources. He was interested in how Unix solved the PC loser-ing problem. The PC loser-ing problem occurs when a user program invokes a system routine to perform a lengthy operation that might have significant state, such as IO buffers. If an interrupt occurs during the operation, the state of the user program must be saved. Because the invocation of the system routine is usually a single instruction, the PC of the user program does not adequately capture the state of the process. The system routine must either back out or press forward. The right thing is to back out and restore the user program PC to the instruction that invoked the system routine so that resumption of the user program after the interrupt, for example, re-enters the system routine. It is called ``PC loser-ing'' because the PC is being coerced into ``loser mode,'' where ``loser'' is the affectionate name for ``user'' at MIT.
The MIT guy did not see any code that handled this case and asked the New Jersey guy how the problem was handled. The New Jersey guy said that the Unix folks were aware of the problem, but the solution was for the system routine to always finish, but sometimes an error code would be returned that signaled that the system routine had failed to complete its action. A correct user program, then, had to check the error code to determine whether to simply try the system routine again. The MIT guy did not like this solution because it was not the right thing.
The New Jersey guy said that the Unix solution was right because the design philosophy of Unix was simplicity and that the right thing was too complex. Besides, programmers could easily insert this extra test and loop. The MIT guy pointed out that the implementation was simple but the interface to the functionality was complex. The New Jersey guy said that the right tradeoff has been selected in Unix-namely, implementation simplicity was more important than interface simplicity.
The MIT guy then muttered that sometimes it takes a tough man to make a tender chicken, but the New Jersey guy didn't understand (I'm not sure I do either).
Now I want to argue that worse-is-better is better. C is a programming language designed for writing Unix, and it was designed using the New Jersey approach. C is therefore a language for which it is easy to write a decent compiler, and it requires the programmer to write text that is easy for the compiler to interpret. Some have called C a fancy assembly language. Both early Unix and C compilers had simple structures, are easy to port, require few machine resources to run, and provide about 50%-80% of what you want from an operating system and programming language.
Half the computers that exist at any point are worse than median (smaller or slower). Unix and C work fine on them. The worse-is-better philosophy means that implementation simplicity has highest priority, which means Unix and C are easy to port on such machines. Therefore, one expects that if the 50% functionality Unix and C support is satisfactory, they will start to appear everywhere. And they have, haven't they?
Unix and C are the ultimate computer viruses.
A further benefit of the worse-is-better philosophy is that the programmer is conditioned to sacrifice some safety, convenience, and hassle to get good performance and modest resource use. Programs written using the New Jersey approach will work well both in small machines and large ones, and the code will be portable because it is written on top of a virus.
It is important to remember that the initial virus has to be basically good. If so, the viral spread is assured as long as it is portable. Once the virus has spread, there will be pressure to improve it, possibly by increasing its functionality closer to 90%, but users have already been conditioned to accept worse than the right thing. Therefore, the worse-is-better software first will gain acceptance, second will condition its users to expect less, and third will be improved to a point that is almost the right thing. In concrete terms, even though Lisp compilers in 1987 were about as good as C compilers, there are many more compiler experts who want to make C compilers better than want to make Lisp compilers better.
The good news is that in 1995 we will have a good operating system and programming language; the bad news is that they will be Unix and C++.
There is a final benefit to worse-is-better. Because a New Jersey language and system are not really powerful enough to build complex monolithic software, large systems must be designed to reuse components. Therefore, a tradition of integration springs up.
How does the right thing stack up? There are two basic scenarios: the ``big complex system scenario'' and the ``diamond-like jewel'' scenario.
The ``big complex system'' scenario goes like this:
First, the right thing needs to be designed. Then its implementation needs to be designed. Finally it is implemented. Because it is the right thing, it has nearly 100% of desired functionality, and implementation simplicity was never a concern so it takes a long time to implement. It is large and complex. It requires complex tools to use properly. The last 20% takes 80% of the effort, and so the right thing takes a long time to get out, and it only runs satisfactorily on the most sophisticated hardware.
The ``diamond-like jewel'' scenario goes like this:
The right thing takes forever to design, but it is quite small at every point along the way. To implement it to run fast is either impossible or beyond the capabilities of most implementors.
The two scenarios correspond to Common Lisp and Scheme.
The first scenario is also the scenario for classic artificial intelligence software.
The right thing is frequently a monolithic piece of software, but for no reason other than that the right thing is often designed monolithically. That is, this characteristic is a happenstance.
The lesson to be learned from this is that it is often undesirable to go for the right thing first. It is better to get half of the right thing available so that it spreads like a virus. Once people are hooked on it, take the time to improve it to 90% of the right thing.
A wrong lesson is to take the parable literally and to conclude that C is the right vehicle for AI software. The 50% solution has to be basically right, and in this case it isn't.
But, one can conclude only that the Lisp community needs to seriously rethink its position on Lisp design. I will say more about this later.
(against Gabriel)
The gnu / C against the lisp philosophy.
In fact Lisp against everything else also (VB, perl, java)
A design model for an emacs FFI
Perl uses its own malloc, as Lisp does, so one might learn from there. But Perl had to add a separate, costly GC sweep at shutdown to finalize foreign objects. None of the FFIs I know well can avoid this either.
(ffi-load "kernel32") ; separate load from define to be
; able to call already known proc addresses
; as well.
; some "Good"
(ffi-def-foreign-call "Sleep" (integer) :returning :void)
; or CLISP like
(ffi-def-foreign-call Sleep (:arguments (ms int :in)))
; or CMUCL like
(def-alien-routine "Sleep" void (ms int :in))
; some "KISS"
; chars as types as in VB or OpenScheme or perl
(ffi-def-foreign-call "Sleep" "(L)")
; or like this, stringified signatures in C or Pascal style
(ffi-def-foreign-call "Sleep" "(int ms): void")
(sleep 100) ; call it (wait 100ms non-blocking)
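To make the stringified "KISS" variant concrete outside of any particular Lisp, here is a minimal Python sketch that uses the standard ctypes module as a stand-in FFI. The helper names `parse_signature` and `def_foreign_call` and the small type table are made up for this illustration; they are not part of any emacs or Lisp API. It assumes a POSIX system where `ctypes.CDLL(None)` exposes the C library already linked into the process.

```python
import ctypes
import re

# Hypothetical mapping from type names in the signature string to ctypes types.
C_TYPES = {
    "void": None,
    "int": ctypes.c_int,
    "uint": ctypes.c_uint,
    "long": ctypes.c_long,
    "size_t": ctypes.c_size_t,
    "double": ctypes.c_double,
    "char*": ctypes.c_char_p,
}

SIGNATURE = re.compile(r"^\s*\((?P<args>.*)\)\s*:\s*(?P<ret>[\w*]+)\s*$")

def parse_signature(sig):
    """Parse a Pascal-style string such as "(int ms): void"."""
    m = SIGNATURE.match(sig)
    if m is None:
        raise ValueError("bad signature: %r" % sig)
    argtypes = []
    for arg in filter(None, (a.strip() for a in m.group("args").split(","))):
        type_name = arg.split()[0]      # "int ms" -> "int", the name is optional
        argtypes.append(C_TYPES[type_name])
    return argtypes, C_TYPES[m.group("ret")]

def def_foreign_call(lib, name, sig):
    """Look up `name` in `lib` and attach the parsed argument/result types."""
    fn = getattr(lib, name)
    fn.argtypes, fn.restype = parse_signature(sig)
    return fn

libc = ctypes.CDLL(None)                # assumption: POSIX, libc already loaded
c_abs = def_foreign_call(libc, "abs", "(int n): int")
c_strlen = def_foreign_call(libc, "strlen", "(char* s): size_t")
```

The whole type description lives in one string, which is easy to write but has to be parsed at definition time. That is the performance concern with stringification.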
E.g. for the HtmlHelp call it looks like this in my Corman Lisp FFI:
;;; spawn with args, searches exe in path
(ffi:defun-dll spawnvp
    ((mode :long)
     (cmdname (char *))
     (argv ((char *) *)))
  :library-name "MSVCRT.DLL"
  :entry-name "_spawnvp"   ; MSVCRT exports the name with a leading underscore
  :linkage-type :c)
;;; now just call spawnvp with args
(defun call-htmlhelp (file &rest args)
  (let* ((cfile (ffi:lisp-string-to-c-string "hh.exe"))
         ;; need list of c-strings; argv[0] is the program name by C convention
         (cpaths (mapcar #'ffi:lisp-string-to-c-string
                         (cons "hh.exe" (cons file args))))
         (n (length cpaths))
         ;; one extra slot for the terminating NULL pointer
         (pbuf (ffi:malloc (* (1+ n) (ffi:sizeof '(char *))))))
    (do ((i 0 (1+ i)))
        ((= i n))
      (setf (ffi:cref (char *) pbuf i) (nth i cpaths)))
    (setf (ffi:cref (char *) pbuf n) cl::C_NULL)
    (prog1
        (spawnvp 1 cfile pbuf)
      (ffi:free pbuf))))
;; or better directly like this:
(defwinapi HtmlHelp
((hwndCaller HWND)
(pszFile LPCSTR)
(uCommand UINT)
(dwData DWORD))
:return-type HWND
:library-name "hhctrl.ocx"
:entry-name "HtmlHelpA"
:linkage-type :pascal)
;;; The Win32 types were imported separately from the header, they are
;;; nothing important. Like this:
(defwinconstant HH_KEYWORD_LOOKUP #x000D) ; ...
;;; Create a new instance of HtmlHelp and do a keyword search.
;;; We could also look for a running instance instead.
(defun HH-KEYWORD-LOOKUP (file keyword)
(let ((hwnd (GetDesktopWindow))
(cfile (ffi:lisp-string-to-c-string file))
(ckey (ffi:lisp-string-to-c-string keyword))
(link (ffi:malloc (ffi:sizeof HH_AKLINK)))
(hhwin (ffi:malloc (ffi:sizeof HH_WINTYPE)))
(plink (ffi:malloc (ffi:sizeof (DWORD *)))))
(with-cref-slots () ; c-struct convenience util
(link HH_AKLINK)
(setf cbStruct (ffi:sizeof HH_AKLINK))
(setf fReserved nil)
(setf pszKeywords ckey)
(setf pszUrl cl::C_NULL)
(setf pszMsgTxt cl::C_NULL)
(setf pszMsgTitle cl::C_NULL)
(setf pszWindow cl::C_NULL)
(setf fIndexOnFail T))
(HtmlHelp hwnd cfile HH_KEYWORD_LOOKUP plink)
(ffi:free link)(ffi:free hhwin)(ffi:free plink)))
;; While looking at it now, I doubt if this is really correct.
Note that one cannot draw the apparently simple conclusion from these examples that KISS is better or that Good is better. At first glance the only difference seems to be the signature syntax, but it leads to various questions, such as how to deal with different coding styles: side effects vs. side-effect-free code, constructs that help in functional programming, passing non-immediate values by value, avoiding the integer result and returning references by value, marking parameters that are subject to side effects for memory handling, hooking into the garbage collector, ...
We'll come to a detailed discussion later. (where?) First we'll briefly describe all the known language extensions:
LANGUAGE EXTENSIONS
An FFI is not the only way to extend a language. First I list all the
ways I know of to extend your high-level language with functions from
lower-level languages or other libraries, either to extend the
language, to extend its reach (the API), or simply not to reinvent the
wheel.
In detail I will describe the extensions listed below. This was all
I could find. There might be more methods and FFIs around, but they
are beyond my horizon (e.g. more functional language implementations such as further
Schemes, Prolog, Forth [www],
Erlang, ...) or too expensive or just out
of my reach (various Mac packages such as MCL).
First a short and very simple example, later with more technical
explanation and summaries.
I also hope to come to discuss callbacks ("foreign callables") and the different strategies involved:
You extend emacs with elisp. You interface with external processes via comint (unix-style IPC, inter-process communication) for interactive processes and via unix-style filters (command-line arguments and textual results) for non-interactive processes. When a problem cannot be solved with the tools above, you can also extend elisp by hand-writing a C extension and recompiling emacs.
On Win32 you can also use DDE or an experimental COM Automation interface (I'm currently working on this), and I'm not sure whether you can also spy on and hook into the typical Win32 message loop. That way you could work with every Windows application once you've found the window you want to interface with.
I just want to create an interface to Corman Common Lisp (either to the console via comint, or via a custom COM interface to the server directly), to AutoLISP (OLE Automation is there but limited, so only Win32 messages seem to be feasible), AutoCAD in general and several other useful utilities (WinHelp, HtmlHelp, ftp). I'm completely new to emacs, so I will have to learn more about the mechanisms and maybe also the political issues involved in this.
There exists no FFI and there are no dynamic extensions (no shared libs, though some Win32 C functions do load some DLLs). Emacs doesn't support threads or exceptions. Doing GUI is hard and limited (though people do that). There's no Tk or Win32 GUI binding, only an X Windows-like widget set. It has an eval, so you can construct functions on the fly.
The timeline for new features for xemacs is here (i.e. a long term project): [proposals]. Blue Sky (23.x) will have a dynaloader and OLE/COM support. But it also wants to replace the lisp engine, which is a major issue in contrast to the two others, so I doubt if it will make it this century.
See Proposals for my FFI considerations.
To prove the above, an anecdote:
In fact Emacs has a dynaloader, as I found out recently, but there was
apparently no need for it. RMS accepted it in 1998 for FSF Emacs but it never
appeared in the sources. XEmacs already has (dll-open) but nobody uses it.
Recently the wheel was reinvented by Steve Kemp, who wrote a new and
different (and IMHO worse) dynaloader. We have some use for it, but time
will tell...
The advantage of dynaloading is cheap extensibility; the disadvantage is the overhead of writing the loader and the API documentation for the interface. Static linkage requires a lot of discipline (namespaces, coding style, source code availability, compiler availability).
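What a dynaloader has to do is small: load a shared library once and resolve symbols by name on demand. A minimal sketch in Python follows, with ctypes doing the dlopen/dlsym work. The class `DynaLoader` and its methods are hypothetical names for this illustration, and the use of `None` as library path (meaning "symbols already in this process") assumes a POSIX system.

```python
import ctypes

class DynaLoader:
    """Toy dynaloader: load a library once, resolve symbols by name on demand."""

    def __init__(self):
        self._libs = {}

    def load(self, path=None):
        # path=None means "whatever is already linked into this process" (POSIX)
        if path not in self._libs:
            self._libs[path] = ctypes.CDLL(path)
        return self._libs[path]

    def resolve(self, path, symbol):
        """Return a callable for `symbol`, or None if it is not exported."""
        lib = self.load(path)
        try:
            return getattr(lib, symbol)
        except AttributeError:
            return None

loader = DynaLoader()
getpid = loader.resolve(None, "getpid")
missing = loader.resolve(None, "no_such_symbol_in_any_library")
```

Note that nothing here knows about argument or result types. That is exactly the gap between a dynaloader and an FFI.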
Such extensions should really be as platform independent as the core. It is questionable to what extent this will happen; with perl it works okay for most modules. But the most important point is that the emacs core would not have to care about such extensions, as it has to do now.
Better possibilities than dynaloading, which is limited to in-process shared libs, are of course the simple STDIN/STDOUT filtering mechanism through separate executables, which made unix strong, and communication through sockets and a simple message passing protocol. The latter requires some care at the server and the client but is completely location transparent (in-proc, out-of-proc, LAN).
A problem for STDIN/STDOUT filters, as used with emacs, is the platform incompatibility of custom shells and their limitations: different string quoting, path delimiters, length limits, wildcard expansion, ...
A typical unportable .emacs example:
(setenv "SHELL" "E:\\cygnus\\cygwin-b20\\H-i586-cygwin32\\bin\\bash.exe")
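The quoting problem can be shown in a few lines. The following Python sketch passes one awkward argument to a child process twice: once as an argument vector without any shell, and once through a POSIX shell, where every part has to be quoted explicitly. It assumes a POSIX `/bin/sh`; `shlex.quote` is only valid for POSIX shells, which is precisely the portability problem.

```python
import shlex
import subprocess
import sys

tricky = 'a file with "quotes", spaces and a * wildcard.txt'

# The child simply echoes its first argument back on STDOUT.
child = [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", tricky]

# 1. No shell involved: the argument vector reaches the child unchanged.
direct = subprocess.run(child, capture_output=True, text=True, check=True).stdout

# 2. Through a POSIX shell: every argument has to be quoted explicitly.
command_line = " ".join(shlex.quote(part) for part in child)
via_shell = subprocess.run(command_line, shell=True, capture_output=True,
                           text=True, check=True).stdout
```

Both calls deliver the same string, but only the first one does so without knowing which shell is installed.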
Perl is extensible both statically and dynamically. There is a primitive and hacky intermediate language called XS and an XS precompiler written in perl which compiles to C or C++. You can then link the extension into the core or load it dynamically at run-time. Some utilities to convert C header files or similar usable libraries do exist.
I have been writing perl extensions for about half a year, mainly to use C libraries but also to overcome some perl limitations. I wrote an array package for space-efficient typed arrays which fits very nicely into the perl syntax; it is totally transparent apart from the initialisation. Operator overloading: [], new[], delete[], and most of the other array operators.
For perl there also exist two modules to call foreign shared libraries:
Win32::API (Win32-MSVC only, no callbacks) and C::DynaLib (highly
portable, also callbacks).
It is easily extendable and embeddable. Doing callbacks is easy.
Perl has a weird and brief syntax and semantics, esp. with interfaces,
compared to lisp or python. There's no support for threads or
exceptions. Doing GUI is hard. The Tk binding and the Win32 API
binding involve writing massive wrappers on the C side, as with
python.
Same with COM. COM support is by far not as good as for VB, python or ACL. It has an eval, you construct perl functions on the fly very easily (from perl and from C).
Python is also extensible both statically and dynamically. You typically write extensions in C++ but you can also use C. Python fits much more nicely with C++, whilst perl fits better with C and lisp. You can link the extension into the core or load it dynamically at run-time.
I haven't written python extensions so far, but they fit even more nicely into the python syntax; they are totally transparent. Python is known to be one of the best and easiest languages to write and use extensions for. In particular C++ or Java style exceptions are supported in the language, but all this transparency comes at a cost: you have to write massive amounts of C code, as with perl. There exists no FFI to call shared libraries or to define callbacks dynamically.
There's support for threads and exceptions. Doing callbacks is easy.
Doing GUI is easy. Python has C++ or Java style semantics and a
well-formatted syntax. There exist GUI bindings to Tk and the Win32 API.
It has an eval, so you can construct python functions on the fly.
To add (somewhere else?):
It is one of the first and best COM packages. They had the very
first dynamic COM server to my knowledge, back in 1996.
Pythonwin should be the example of the massive wrapper approach, because they coded everything in c++ extensions instead of using their higher and dynamic python language, which would make things MUCH easier and portable.
There's also an FFI library for python ("calldll") and a Win32
SDK library ("dynwin") on top of it by the same author.
AutoCAD provides interfaces to C++ (ARX), C (ADS), AutoLisp and via
OLE Automation to Visual Basic and such. It also provides direct
vtable COM access but currently no one is using this except special
new GUI hooks only exposing this API (menubar, certain Treeview and
Property Tab Controls). To extend Lisp or the COM interface one can
create via ADS/ARX new lisp functions or COM objects. There's a huge
amount of objects, methods, properties and events supported, via ARX
and via COM. External ARX applications are DLL (same process space),
but one can also use external EXE via COM or Automation.
Unfortunately the Lisp implementation is very limited, and so is the ADS
interface to LISP. So you cannot, for example, extend the language in a
reasonable way, such as adding support for vectors, hashes or any
other convenient data structure such as structs or objects,
destructive modifications on lists, or other language improvements such
as macros. The only thing you gain is optional arguments, which are not doable in
AutoLISP itself. The interface (ADS result buffer) can pass neither symbols nor
functions as arguments or result types to external functions. The
externally defined lisp function can be resolved by name (ads_defun)
or by pointer (ads_regfunc). The lisp provides a bytecode compiler, a
very nice IDE, a package manager, and protected namespaces on compiled
packages for competing MDI applications.
There are a lot of predefined callbacks ("reactors") and one can
define user-defined objects and callbacks on these objects as well.
One has to add certain callbacks to add persistency (DWG and DWF
Filers) to custom objects.
AutoCAD is not thread-safe. AutoLISP supports exceptions only
passively, only to catch OLE exceptions and do simple error handling,
but one cannot throw exceptions and catch them elsewhere, so you cannot
write AutoLISP libraries easily.
Doing GUI is very easy. There's a home-brewed inheritable dialog
description language, DCL (better than Microsoft's DLG or resource
format), but it is not supported anymore; one has to do VB or MFC now.
AutoLISP was the very first lisp to have OLE Automation support (even
before ACL and LWW). COM support is great and simple to use. It is
currently done via Lisp (VLA), VBA (in-proc and supported), VB (exe),
Delphi, perl and java. There's no FFI available yet, though I am
developing one. There's no way yet to call the exposed custom
vtable COM interfaces from lisp, but there will be soon.
It is very easy to add lisp functions, defined by name and signature,
to the system at run-time. To define callbacks via the FFI one can
think of supporting a predefined number of user-definable callbacks
(massive approach) or of defining them dynamically (a la ffcall). But
there's no possibility to ever add GC support, so the usability of such
foreign functions dealing with foreign memory areas is limited. And
one must be strict.
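The dynamic variant of callbacks can be sketched as follows. This is only an illustration in Python, where ctypes generates the C-callable entry point at run-time, in the spirit of what ffcall does for C. It assumes a POSIX libc reachable through `ctypes.CDLL(None)`, and `sort_ints` is a hypothetical helper name.

```python
import ctypes

libc = ctypes.CDLL(None)    # assumption: POSIX, libc already loaded

# C prototype of the callback: int (*compar)(const void *, const void *)
COMPARE = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, COMPARE]
libc.qsort.restype = None

def sort_ints(values, key=lambda x: x):
    """Sort via C qsort(), which calls back into a closure created at run-time."""
    def compare(a, b):
        ka, kb = key(a[0]), key(b[0])
        return (ka > kb) - (ka < kb)
    callback = COMPARE(compare)      # keep a reference while C may still call it
    array = (ctypes.c_int * len(values))(*values)
    libc.qsort(array, len(array), ctypes.sizeof(ctypes.c_int), callback)
    return list(array)
```

The callback is a closure over `key`, so an unlimited number of distinct callbacks can exist at the same time. With the "massive" approach the number of slots would be fixed when the extension is compiled.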
ARX, the "AutoCAD runtime extension", is a C++ improvement over the
simple ADS C extension. It provides dynamic C++ extensions, some kind
of a poor man's Java or CLOS. It lets you inherit (but not overload) from
core C++ objects in your run-time loaded extensions and add
run-time methods to core objects (as Java does btw.)
Links: [home],
Extension [docs].
Erl is a C library to help with argument translation and such in C code
extensions. But there are also two Java interface libraries and an IDL parsing interface,
which allow dynamic and high-level extensions.
Mnesia Session e.g. is a foreign language interface to the
internal Mnesia Database interfaced to via IDL.
[sample]
Links:
See the DrScheme Help Desk for Dynamic extensions.
See PLT/src/mzscheme/dynsrc/oe.c for an example of the current
C extensions. Note that MzScheme names this FFI, though in my terms it is
no real dynamic FFI. It supports true dynamic extensions, a "dynaloader".
See PLT/src/mzscheme/src/dynext.c for the src and
PLT/collects/mzscheme/examples/README for more samples.
The dynext collection (PLT/collects/dynext/) is a wrapper around the
dynaloader facility, which calls the platform specific compiler and linker
to allow flexible and dynamic extensions via a dynamic compiler invocation and
dynamic linking.
See below for the planned FFI.
[www]
AutoCAD API's ADS/ARX
The technique and source code of such "SELF-REGISTERING OBJECTS IN C++"
appeared in Dr. Dobb's Journal, August 1998. [www]
Because C++ has no standardized binary interface (name mangling), ARX is
only supported for the C++ compiler with which the core was built
(MSVC 4.2, MSVC 6), in contrast to the previous C based ADS, which
supported almost all compilers.
Elk
Elk is a unix based Scheme with good C/C++ integration and extensibility
(esp. on C++ classes), a good dynaloader (better than just dlopen), and
argument type conversion support on the C level (in contrast to a dynamic
FFI, which would do that on the Scheme level).
Elk has been designed and is used specifically as an embeddable, reusable
extension language subsystem for applications written in C or C++.
Erlang Erl Interface
Erlang is just another great high-level functional language package
that recently went open source. It was developed by Ericsson, Sweden.
It is pretty large and overwhelmingly "over-featured" (see also [Poplog]).
Just see the [doc mainpage]
"Erlang doesn't really have an FFI, apart from linked-in drivers.
(given your definition that the foreign functions should execute
in the same process space.)
Using a linked-in driver, you can load a C routine as a shared object.
All external interfaces, including linked-in drivers, are instantiated
as "ports", which appear to the Erlang programmer essentially as a
process, with which you can communicate via message passing.
Except for linked-in drivers, all external programs execute as
separate processes outside the Erlang VM." (Ulf Wiger)
So it's similar to emacs, which mainly uses comint for external communication.
Arbitrary ports are of course better than just STDIN/STDOUT with all the inherent
shell problems (portable string quoting, wildcard expansion, ...).
I do like the simplicity of this design. This is the world unix was built upon.
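The port idea is easy to sketch: an external program on the other end of a pipe, and messages framed with a length header so that no shell quoting or line splitting is involved. The Python sketch below uses a 2-byte big-endian length prefix, the same framing Erlang offers as the `{packet, 2}` port option. The `Port` class and the upper-casing child program are invented for this illustration.

```python
import struct
import subprocess
import sys

# The external program: reads length-prefixed packets from STDIN and answers
# each one with the upper-cased payload, framed the same way.
CHILD = r'''
import struct, sys
inp, out = sys.stdin.buffer, sys.stdout.buffer
while True:
    header = inp.read(2)
    if len(header) < 2:
        break
    (size,) = struct.unpack(">H", header)
    reply = inp.read(size).upper()
    out.write(struct.pack(">H", len(reply)) + reply)
    out.flush()
'''

class Port:
    """A port-like handle on an external process (hypothetical helper)."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                                     stdout=subprocess.PIPE)

    def call(self, payload):
        """Send one packet and wait for the one-packet answer."""
        self.proc.stdin.write(struct.pack(">H", len(payload)) + payload)
        self.proc.stdin.flush()
        (size,) = struct.unpack(">H", self.proc.stdout.read(2))
        return self.proc.stdout.read(size)

    def close(self):
        self.proc.stdin.close()      # EOF tells the child to stop
        self.proc.stdout.close()
        return self.proc.wait()
```

A call then looks like `Port([sys.executable, "-c", CHILD]).call(b"hello")`. The caller never sees whether the other side is written in C, Lisp or anything else.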
[home],
[c interface].
Automatic wrapper/stub creation in the native language
todo
mzScheme
MzScheme is widely known as "Rice scheme" and the core of "PLT", which
is one of the biggest and best organized Scheme distributions.
GambitC
todo
Bigloo foreign
tocheck
C code and Bigloo code can be merged together. Bigloo
functions can call C functions and vice-versa, Bigloo code can use C
global variables and vice-versa. Bigloo functions and variables can
hold C values (C types values). C values can be allocated from Bigloo
program and vice-versa, Bigloo data structures can point to C data
structures and vice-versa.
Links:
[www],
[sample]
Automatic stub function generator to use C functions in OpenLisp
SWIG is a C++ program which parses source headers and/or an intermediate format for functions to create C wrappers for calls from or into a high-level language and to transport the data-structures. Something like IDL, esp. CORBA IDL mappings.
The good point is better abstraction from the low-level interfaces and from the low-level language C, with its almost non-existent support for richer data structures. The bad point is that it is yet another intermediate language you have to understand. In practice I found that I prefer either a real FFI, staying in my high-level language, or, at the other end, doing the extension in C, maybe with a little help from the generated SWIG wrappers; doing it manually or with the help of other existing tools is often better and easier. This happens to me with AutoLISP and perl all the time.
Of course, if you have to create a mapping from your library to the various supported languages (currently only python, perl, guile and java, but you can extend it) SWIG is a good option for you.
Links:
SWIG [www]
The COM and CORBA IDLs ("Interface Description Language") are pretty much the same. Microsoft extended the CORBA IDL a little bit and named it ODL, and for COM+ they extended it even more, but I'll have to look at that later. This is in flux; recently they cut most of the interesting planned features for dynamic extensions.
After having written the IDL (or created it otherwise: by parsing the language or with a modelling tool such as Rational Rose), a MIDL compiler creates the inter-language wrappers and stubs to support transparency across process or even machine boundaries. You don't care about the target language, and you don't care whether the client is a DLL loaded into your application space, a separate exe, or sitting on a separate machine on the network. MIDL generates all the code.
COM and CORBA could also go to the SWIG section but they do a little bit more than just creating cross-language or -system function calling interfaces. They also require a lot of work on the server side. The outgoing interface is massively complex and I personally don't really see the big advantages over shared libs, which can be linked traditionally with Dynaloading or more modern with an FFI. However, we are only interested in the incoming interface, the IDL, so we stay with that.
A wonderful place to see various software patterns in the wild: Vendor Lock In, Big Ball of Mud in C or C++ implementations. (see Links)
There exists no proper binding to Common Lisp. Well, ACL, LWW can do COM quite well, Corman Lisp has limited COM support, but this is not what we would call a proper object request broker. (interface inheritance, persistency, ...) Hope someone will correct me here.
The most popular COM scripting language is by far Visual Basic (which is not that bad on the interface side, see [WhyChooseMicrosoftAndVb]). Also used are pythonwin (client and server), perl, Visual J++ (aka MS java) and some other minor languages (AutoLISP). See below and in the FFI section for an explicit list of available COM bindings. Most of the bigger Win32 specific packages provide it.
Links:
IDL Reference [ref],
IDL Critics [www],
Kraig Brockschmidt Inside OLE [online book],
MS Announcements [msj],
OLE/COM Architecture Critics (serious model flaws)
[www],
Visual Basic [www],
WhyChooseMicrosoftAndVb at wiki [wiki],
COM+ at wiki [wiki]
Patterns: Vendor Lock In [?], Big Ball of Mud [www]
Interfacing implementations do it in their favorite extension language, which is C for perl and most others, C++ for python, and C or the internal native compiler for lisps. Only such lisps can theoretically implement truly dynamic COM servers (as ACL does), leaving other languages with dynamic native code compilers aside; even java has one now but is limited by its VM specs. Python has also had this on its feature list since 1996, but must pay for it by maintaining a static C++ wrapper for each supported interface. This is good to have and stable, but nothing for my lazy taste.
The two automation possibilities in short:
IDispatch is the normally used interface, most clients support only this.
It is very simple and quite slow. Keywords are variants, safearrays, late
binding, type libraries. Some very simple interfaces require only one
function call: Invoke, better ones also parse the server types via the
Typeinfo interface(s) to 1.) map the argument types dynamically and do the
argument checking and 2.) to generate the native functions on the fly
(as VB and Visual Lisp).
The faster custom interface is modelled after a C++ class, where the object links to a table of pointers to all its virtual functions, the vtable. This way you directly call the vtable function, which is normally the typical C++ class function, so it comes with almost no additional cost for C++ implementors to provide such custom interfaces to their objects. However, the client has to know the exact types; it is very similar to an FFI approach and has additional documentation costs. Most bigger Win32 applications provide a "dual interface", that is, both.
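The difference between the two routes can be modelled in a few lines. The following is a plain Python model, not real COM: the `Counter` class, the slot constants and the tables are invented for the illustration, and `get_id_of_name` and `invoke` are only loosely modelled after IDispatch::GetIDsOfNames and IDispatch::Invoke.

```python
class Counter:
    """The server object; both interfaces below end up in these methods."""
    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n
        return self.value

    def reset(self):
        self.value = 0
        return self.value

# Custom interface: a fixed table of method slots the client must know exactly.
VTABLE = (Counter.add, Counter.reset)
SLOT_ADD, SLOT_RESET = 0, 1

def call_vtable(obj, slot, *args):
    return VTABLE[slot](obj, *args)      # one indexed jump, no checking at all

# Dispatch interface: names are resolved at run-time and arguments are checked
# on every call against type information kept by the server.
DISPIDS = {"add": 0, "reset": 1}
ARG_TYPES = {0: (int,), 1: ()}

def get_id_of_name(name):
    return DISPIDS[name]

def invoke(obj, dispid, args):
    expected = ARG_TYPES[dispid]
    if len(args) != len(expected) or not all(
            isinstance(a, t) for a, t in zip(args, expected)):
        raise TypeError("bad arguments for dispid %d" % dispid)
    return VTABLE[dispid](obj, *args)
```

The late-bound route costs a name lookup and an argument check per call but needs no compile-time knowledge. The vtable route is a bare indexed call that trusts the client completely.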
The list:
See the Erlang IDL library, which dynamically uses IDL as interface language.
Input parameters are pushed by value, output parameters by reference (a pointer to the value).
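The same rule shows up in every FFI. As a small sketch in Python with ctypes (not Visual Prolog), here is the C function `strtol`, which takes its input by value and reports the end of the parsed number through an output parameter passed by reference. A POSIX libc reachable through `ctypes.CDLL(None)` is assumed, and `parse_long` is a hypothetical wrapper name.

```python
import ctypes

libc = ctypes.CDLL(None)    # assumption: POSIX, libc already loaded

# long strtol(const char *nptr, char **endptr, int base);
libc.strtol.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_char_p), ctypes.c_int]
libc.strtol.restype = ctypes.c_long

def parse_long(text, base=10):
    """Return (value, rest): the number read and the unparsed tail."""
    buf = ctypes.create_string_buffer(text)   # keeps the C string alive
    end = ctypes.c_char_p()                   # output parameter, filled in by C
    value = libc.strtol(buf, ctypes.byref(end), base)   # base by value, end by reference
    return value, end.value
```

On the high-level side the output parameter disappears again and becomes a second return value. This is one of the design questions raised earlier: avoiding the integer result and returning references by value.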
In Visual Prolog, you explicitly list the interfaced
language; this simplifies the problems inherent in calling
conventions, such as activation record format, naming convention and
returning conventions.
Return values of floating point values are exceedingly troublesome to handle. They may be returned in registers, on the (emulated) coprocessor stack, and the pascal calling convention will frequently return them through pointers. Currently pascal functions cannot return floating point values.
Sample:
| Language type | Convert name to upper case | Add leading underscore | Push args reversed | Adjust SP after return | NT naming convention |
|---|---|---|---|---|---|
| pascal | X | | | | |
| c | | X | X | X | |
| stdcall | | X | X | | X |
| syscall | | | X | X | |
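The naming part of such a table can be written down as a small function. The sketch below is in Python and rests on assumptions: it follows the commonly documented 32-bit x86 conventions, and it takes "NT naming convention" to mean the `_name@N` decoration, where N is the number of argument bytes. The function `decorate` and the `CALLER_CLEANS` table are names made up for this illustration.

```python
# Who adjusts the stack pointer after the call returns (True = the caller).
CALLER_CLEANS = {"pascal": False, "c": True, "stdcall": False, "syscall": True}

def decorate(name, convention, arg_bytes=0):
    """Return the linker-level symbol for `name` (32-bit x86, assumed rules)."""
    if convention == "pascal":
        return name.upper()                    # convert name to upper case
    if convention == "c":
        return "_" + name                      # add leading underscore
    if convention == "stdcall":
        return "_%s@%d" % (name, arg_bytes)    # underscore plus @bytes suffix
    if convention == "syscall":
        return name                            # name is left alone
    raise ValueError("unknown convention: %r" % convention)
```

For example `decorate("Sleep", "stdcall", 4)` yields `_Sleep@4`. An FFI that lets you name the convention explicitly can do this lookup for you instead of making you spell out the decorated name.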
Links:
(ffi::def-foreign-call name-and-options arglist
&key call-direct [callback] convention
returning method-index [release-heap] arg-checking
[optimize-for-space])
old ACL FFI:
(ffi::defforeign lispname
&key entry-point unconverted-entry-name arguments
pass-types arg-checking prototype
return-type language convert-symbol print address
remember-address call-direct [callback])
(ffi::defun-dll name param-list
&key return-type library-name entry-name linkage-type)
(fli:register-module "KERNEL32")
(fli:define-foreign-function
(farenheit-to-celsius "FarenheitToCelsius" :source)
((farenheit :int))
:result-type :double
:language :ansi-c)
(alien:load-foreign "-lc") ; the c lib
(alien:alien-funcall (extern-alien "Sleep" (function void (* foo)))
(addr f))
(def-alien-routine "Sleep" void (ms int :in))
(def-call-out Sleep (:arguments (ms int :in)))
(define sleep (foreign:declare "kernel32.dll" "Sleep" "" "n"))
From: Chris Double chris@double.co.nz
Newsgroups: comp.lang.lisp
Subject: Re: Why no standard foreign language interface?
Date: 11 Feb 2000 06:54:15 +1300
Paul Meurer has a common FFI layer in his SQL/ODBC library that works with LispWorks and Allegro (among others). If the original poster wants such an API they may want to take a look at that. It is available as a contribution in the cl-http distribution.
This is a sample implementation for a common FFI layer between mcl, allegro and lispworks to interface to a shared ODBC library (mswin: odbc32.dll, unix: adabas/odbc/adabasodbc.so, mac-mcl: vsi:ODBC$DriverMgr).
Links:
[src]
Note that this is not complete. I'm still gathering more functional language FFI's (Scheme, Erlang, Prolog, ...)
To make some oversimplifying classification first:
I like the CLOS to c-type binding, (similar to Harlequin Dylan) so you can e.g. subclass from foreign types and do other fine things. Their docs are great.
Links:
Franz Allegro Common Lisp FFI at franz.com [home], [docs]
You can do practically everything within the language, such as defining convenient FFI's, COM Interfaces if custom or IDispatch, define callbacks. The GC knows foreign pointers and finalizes them automatically. There are some nice abstraction tools to deal with c-types and c-structs, and there are some parsers for c code written in lisp and hooked into the readtable. Everything is still in flux, improved c parsers are in the works, also a CLOS to ctype mapping as in ACL. The parsers are already "better" than in ACL, but not as reliable.
The FFI is a very "Good" one, in my rating the second best, behind ACL. Because it is good it is highly abstract and complicated. Thankfully you can study the full sources and even trace through the execution in the assembler code (once a stepper becomes available, promised for version 1.4, 1Q2000).
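What "the GC knows foreign pointers and finalizes them" means can be sketched in Python, where `weakref.finalize` plays the role of the finalizer hook. This is only an illustration of the idea, not how Corman Lisp implements it. The class `ForeignBuffer` and the `freed` log are invented for the example, and a POSIX libc reachable through `ctypes.CDLL(None)` is assumed.

```python
import ctypes
import weakref

libc = ctypes.CDLL(None)    # assumption: POSIX, libc already loaded
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None

freed = []   # log of released addresses, only here to make the effect visible

def _release(address):
    libc.free(address)
    freed.append(address)

class ForeignBuffer:
    """Foreign memory owned by a garbage-collected object."""

    def __init__(self, size):
        self.address = libc.malloc(size)
        if not self.address:
            raise MemoryError("malloc failed")
        # Runs when the object is collected, or at interpreter exit at the latest.
        self._finalizer = weakref.finalize(self, _release, self.address)

    def free(self):
        self._finalizer()     # explicit release; later calls are no-ops
```

The finalizer also runs at shutdown for everything still alive, which is the same final sweep that Perl had to add.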
Links:
Corman Common Lisp [www],
Corman Common Lisp FFI discussions at deja [deja thread]
See my ACL notes. I like it, it is good and full-featured.
Links:
Harlequin Lispworks FLI at xanalys.com, the new company site: [home],[docs],
Harlequin Lispworks FLI at harlequin.com [old home],
[old docs]
Links:
CMUCL alien package at cons.org [home],
[docs]
Then, starting with the Amiga FFI ("AFFI") and extended by the portable callback library, it became dynamic: the foreign functions are created and linked on-the-fly, with automatic type conversion between Lisp and C types and dynamic loading of C object modules. (#define DYNAMIC_FFI)
Links:
CLISP FFI [home],
[docs]
You may look up classes "dynamically" by name at runtime with the methods:
Smalltalk at: #ClassName
Little Smalltalk has none.
We have an advanced FFI providing: (Eliot Miranda, eliotm@pacbell.net)
- dynamic and static linking to C code
- dynamic and static linking to COM interfaces
- automatic construction of interfaces though parsing technology (i.e. we parse a C header file and generate the FFI automatically; the user simply chooses which functions and types they wish to use and the system generates the transitive closure of the types)
- threaded call-outs and call-backs (i.e. callouts can be scheduled on native threads, and callbacks can be accepted from any thread)
See also these postings:
[ffi desc],
pointers to other recent FFI attempts: [wiki],
WinAPI FFI [www]
(compatible to Visual Smalltalk 3.1 and Dolphin)
The complete Harlequin Win32 API was built on top of this C-FFI. The new Functional Developer Dylan v2.0 has amongst other features a native code compiler, full ActiveX support, native threads.
typedef struct {
unsigned short x_coord;
unsigned short y_coord;
} Point;
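Every FFI with c-struct support has to mirror such a declaration on the high-level side. As a minimal sketch, this is what the mapping of the Point struct above looks like with Python's ctypes. It is only an illustration of the idea in a language that is easy to try out, not the Dylan syntax.

```python
import ctypes

class Point(ctypes.Structure):
    # Field names and order must match the C declaration exactly.
    _fields_ = [("x_coord", ctypes.c_ushort),
                ("y_coord", ctypes.c_ushort)]

p = Point(3, 4)
raw = bytes(p)                       # the same bytes a C function would see
q = Point.from_buffer_copy(raw)      # and back again into a structured object
```

The class carries the layout, so slot access like `q.x_coord` needs no manual offset arithmetic.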
Gwydion Dylan is an implementation of the Dylan programming language for Unix systems. Originally written as a research project by the Gwydion group at CMU, it is now maintained by a group of volunteers.
There is no dynamic FFI to my knowledge, but interfacing is easy at the static C level because dylan is compiled to C code. The interface is called Melange. [manual]. Mindy, the Gwydion Dylan bytecode interpreter, does dynamic loading. The follower of Melange is called "Pidgin". A sample:
The current version of Gwydion is development code, and intended only for use by Dylan fanatics. The compiler is slow, lacks shared library support and still needs lots of bug fixes. To make life more exciting, the documentation is incomplete, and you'll need to read the source and ask questions on the mailing list. If this sounds like fun, you'll enjoy Gwydion.
Thanks to the skilled programmers at CMU, Gwydion can already generate exceptionally efficient code (half the speed of C in most cases) and implements about 98% of the Dylan standard with many extra libraries.
You can also subclass from c-pointers as with the Harlequin Dylan and define custom type mapping functions.
Some articles:
The FFI ([ref]) can do external call-outs, callbacks, external data types such as pointers and objects, the calling conventions of several languages, external callbacks as closures (!) [ref], and a GC-free foreign heap.
Haskell provides several rich extension possibilities: an FFI called "Glasgow's New Foreign Function Interface", an automatic wrapper generator called C->Haskell, an IDL compiler called HaskellDirect, and an ActiveX server and client library called ActiveHaskell.
C->Haskell was developed for the GTK binding. It is a C source parser which works pretty well for an automatic binding generator.
The FFI which comes with Hugs/GHC is a "primitive" (KISS) one.
A foreign import declaration is only allowed as a toplevel declaration. It consists of two parts, one giving the Haskell type (prim type), Haskell name (varid) and a flag indicating whether the primitive is unsafe, the other giving details of the name of the external function (ext fun) and its calling interface (callconv). Giving a Haskell name and type to an external entry point is clearly an unsafe thing to do, as the external name will in most cases be untyped. The onus is on the programmer using foreign import to ensure that the Haskell type given correctly maps on to the type of the external function.
The Chalmers Haskell CCall library is even more primitive, but different.
Links:
Not yet online. I have been working on this for a year now but couldn't arrive
at a nice design solution.
AutoLISP users are comparable to VB users, so there's an inherent usage and learning problem.
I don't want to bother them with the current ugly interface possibilities.
There are three interface layers: high-level, medium-level and low-level.
Only the low-level (e.g. integers for type descriptions) and some medium-level interfaces
are implemented so far. Functional interfaces as in other LISP or ML-like languages
seem to be kind of hard to hack. I try to avoid stringification for the type description
for performance reasons, though it seems to be the only solution in this restricted world.
Links:
Links:
Guile's dynamic linking, [old home]
Links:
OpenScheme OF, [www]
The FFI design is a simple one: you don't have to deal with foreign types at all, all data is copied and translated automatically and held in foreign memory; only MIPS based systems have explicit access to these foreign-object and foreign-pointer types. Only strings are handled differently.
The FFI, so far for Win32 only, is still a very primitive one, though.
Warning: The FFI as it stands has several flaws which make it difficult to use reliably. It is expected that both the interface to and the mechanisms used by the FFI will be changed in the future. We provide it, and this documentation, only to give people an early start in accessing some of the features of Win32 from Scheme. Should you use it in an experiment we welcome any feedback. (Chris Hanson, Jan 2000)
Links:
This is also brand new. Have to check this out (v0.53). Designed by Richard Kelsey and Michael Sperber.
Michael writes: "The distribution contains complete documentation as well. The FFI deals with (among other things) precise memory management, callbacks, and continuations."
(EXTERNAL-CALL external arg1 arg2 ...)Calls the external value, passing it the number of arguments (as a long), and a pointer to a C array containing the rest of the arguments (also a long). Don't mess with the array, it is really the Scheme 48 argument stack. The arguments are probably in reverse order, I can't remember
Links:
Scheme 48 [home],
[docs]
In short, it has a dynaloader (load-extension filename) [ref ??]
and a dynamic extension facility "dynext", which calls
the platform specific compiler and linker to allow flexible and
dynamic extensions via a dynamic compiler invocation and dynamic
linking (similar to GambitC), but no FFI yet.
At the time of writing Eli Barzilay, eli@CS.Cornell.EDU,
http://www.cs.cornell.edu/eli/ has started to write a real mzScheme FFI,
but it is not generic enough yet and it has very low priority for him.
It's a very simple design. His remaining problems are varargs and
foreign pointers. Sources were posted recently to the plt-scheme mailing list,
plt-scheme@fast.cs.utah.edu.
PLT Home: [www]
Brian Beckman wrote a brief essay, with examples, for Calling the C world from the Scheme World. [www]
Hostile Foreign-Function Interface in SIOD
We wrote a Hostile FFI for George Carette’s wonderful, little SIOD program (http://people.delphi.com/gjc/siod.html). The FFI is hostile because the C code doesn’t need to cooperate. We can run kernel32, COM, Ole Automation, etc. ...
todo
Larceny is Lars T Hansen's academic Scheme from 91 and 92, currently running on the SPARC architecture only. It implements several good GCs and the Twobit optimizing compiler.
Larceny has a nice FFI and provides a general foreign-function interface (FFI) substrate on which other FFIs can be built.
One particular feature is worth mentioning:
Dumped heaps with foreign memory are not saved as-is. Instead, for every
foreign function the name and the relative filename, as given in the arguments,
are stored, and the functions are dynamically relinked on load-image. So this is quite portable.
I still have to evaluate which other implementations do this as well.
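The save-by-name idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the class ForeignRef and its dump/load methods are hypothetical, and Python modules stand in for shared libraries (importlib.import_module plays the role of dlopen, getattr the role of dlsym). It is not how Larceny itself does it.

```python
import importlib
import json


class ForeignRef:
    """Hypothetical handle to a foreign function, saved by name, not address."""

    def __init__(self, library, symbol):
        self.library = library      # stand-in for a shared-library file name
        self.symbol = symbol        # stand-in for an exported function name
        self._entry = self._link(library, symbol)

    @staticmethod
    def _link(library, symbol):
        # Stand-in for dlopen() + dlsym(): resolve the names to an entry point.
        return getattr(importlib.import_module(library), symbol)

    def __call__(self, *args):
        return self._entry(*args)

    def dump(self):
        # Only the names go into the image; the entry point is process-specific.
        return json.dumps({"library": self.library, "symbol": self.symbol})

    @classmethod
    def load(cls, text):
        # Loading the image relinks the function from the stored names.
        state = json.loads(text)
        return cls(state["library"], state["symbol"])


sqrt = ForeignRef("math", "sqrt")
image = sqrt.dump()                  # "dump heap"
restored = ForeignRef.load(image)    # "load image", relinked by name
```

The dumped image contains no address at all, so it stays valid when the library is mapped somewhere else in the next process.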
The first perl FFI library which had most of the needed features, with the simple perl pack-style argument type signature. It supports callbacks, various platforms natively, and all others via the massive wrapper approach (called 'hack30') with a 30-arg limitation.
Very nice and very good examples. Unfortunately marketing was very bad: no one knew about this module, so various people had to re-invent the wheel in the perl world. See below. Some also reported that it doesn't compile with Borland on Win32 (?), but it did recently on Win95.
Links:
Perl C::DynaLib at CPAN [www], [docs]
A very simple and limited FFI. No callbacks, only Win32 and MSVC, no c-structs and arrays, similar to Visual Basic. Very popular in Win32 circles who knew nothing about C::DynaLib.
This module was in fact the impetus to start my own. I knew other FFI's and also studied the Corman Lisp assembler code, but you know... So thanks to Aldo Calpini. (He himself thanks Andrea Frosini, who explained it in an Italian magazine.) Thankfully I could help Aldo with the double part in his lib this year.
Sidestep: His Win32::GUI for perl is also very nice. I use it a lot. Not as advanced as Visual Basic but conceptually better (whilst keeping the semantics compatible to Tk). Threading is a minor problem so far. And GUI building.
Links:
Perl Win32::API is not at ActiveState (should be there
but couldn't find it), and not at CPAN (maybe too hackish), only at
perl.dada.it [www]. If it ever
arrives at CPAN it will be there: [CPAN]
A new module called "FFI" only. Completely fresh stuff. The first module I know of which finally uses Haible's excellent ffcall. It makes no use of the closure feature yet (perl can do closures), uses an overly simplified syntax, cannot do c-structs and c-arrays, and cannot handle the memory problems outlined above yet. At least it should mention these in the docs.
But it is very alpha level (v0.01), 0.02 is currently in the works.
Links:
Perl FFI at CPAN [www],
[docs]
The simple FFI library for python ("calldll") can only do Win32.
It is based on the MIT Scheme FFI and also uses the 30 argument hack trick.
Passing/returning long long (quadint) or doubles from the FPU is also not
supported, but there are some simple memory buffer helpers to fill and read
pieces on the C side (struct slots, strings, ...)
[www],
[ftp]
Sample:
calldll.call_foreign_function ( function_address, in_args_format_string, result_format_string, argument_tuple )
where in_args_format_string is a standard PyArg_ParseTuple string,
and result_format_string is a single-char string which is used to
build a python return value from the result of the function call.
Another helper library "npstruct" is used to parse/"unparse" binary structures into native types (similar to the perl builtins pack/unpack).
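To give an idea of what such pack/unpack helpers do, here is a minimal sketch using Python's standard struct module as a stand-in (npstruct's own API is not shown here and may differ). The C layout in the comment and the names POINT, pack_point and unpack_point are made up for illustration.

```python
import struct

# Hypothetical C layout:  struct point { int32_t x; int32_t y; double weight; };
# "<" selects little-endian with standard sizes and no implicit padding.
POINT = struct.Struct("<iid")


def pack_point(x, y, weight):
    """Build the raw bytes the C side would see for one struct point."""
    return POINT.pack(x, y, weight)


def unpack_point(raw):
    """Turn raw bytes received from the C side back into native values."""
    return POINT.unpack(raw)


raw = pack_point(3, -4, 0.5)
```

The format string plays the same role as a perl pack template: one character per struct slot.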
First there was the raw native interface RNI and a java COM interface by Microsoft, then came Netscape's JRI. On this basis Sun developed its new JNI.
See this sample from Sun's website
As you see this is a typical KISS situation. The signature is packed into single letters as "(I)V". The method is first looked up using GetStaticMethodID, then invoked by CallStaticVoidMethod (or friends). A simpler and more user-friendly approach could have combined those two calls and cached the parameter conversion computed on the first call for subsequent calls. On the other hand, the separation makes it absolutely clear what type is being returned.
mid is not a function, it's just an argument. Some languages don't allow function objects or don't let you create functions on the fly. This way you get around that.
I really started to like this simplest of the simple approaches.
Automagical dynamically created functions are a problem for beginners
and hard to document, arguably not documentable at all. But communities used to
function passing (the scheme community in particular) will not accept this
argument of course.
Done this simple way everything is clear, only performance might suffer. So you
should not use it for time-critical functions, such as events in a message
loop or hi-res timers.
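How little machinery the single-letter signatures need can be shown with a small decoder. This is only a sketch: the type letters are the standard JNI ones, but the function names parse_method_signature and _parse_type are made up here and are not part of JNI.

```python
# JNI type letters for primitive types.
PRIMITIVES = {
    "Z": "boolean", "B": "byte", "C": "char", "S": "short",
    "I": "int", "J": "long", "F": "float", "D": "double", "V": "void",
}


def _parse_type(sig, pos):
    """Parse one type starting at sig[pos]; return (name, next_position)."""
    ch = sig[pos]
    if ch in PRIMITIVES:
        return PRIMITIVES[ch], pos + 1
    if ch == "[":                        # array of the type that follows
        inner, pos = _parse_type(sig, pos + 1)
        return inner + "[]", pos
    if ch == "L":                        # class name, terminated by ';'
        end = sig.index(";", pos)
        return sig[pos + 1:end].replace("/", "."), end + 1
    raise ValueError("bad type code %r at position %d" % (ch, pos))


def parse_method_signature(sig):
    """Split a JNI method signature into (argument_types, return_type)."""
    if not sig.startswith("("):
        raise ValueError("method signature must start with '('")
    pos, args = 1, []
    while sig[pos] != ")":
        name, pos = _parse_type(sig, pos)
        args.append(name)
    result, pos = _parse_type(sig, pos + 1)
    if pos != len(sig):
        raise ValueError("trailing characters in signature")
    return args, result
```

For "(I)V" this yields one int argument and a void result, which is exactly the information GetStaticMethodID needs to pick the right overload.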
Links:
JNI History at Sun [ref],
JNI sample at Sun [sample]
Links:
Visual Basic at MS [www],
Declare Reference, comments at [wiki]
So it's reasonable to argue that any major cross-platform application could also consist of only a tiny platform dependent binary, whilst the rest could be loaded dynamically via platform specific FFI calls under an abstract code base in a high-level language (such as Lisp). This would make e.g. emacs or AutoCAD more portable (e.g. between Motif and Win32), smaller, easier to maintain and most important easier to customize.
[New] William Perry's gtk-emacs tries to do the same now. What a coincidence. All the gtk bindings are created dynamically, via a simple emacs ffi. With today's hardware this is obviously no performance problem anymore. See his [site]
Sample:
Links:
[www]
This is a very generic and portable library. For call-outs it supports automatic parameter conversion, putting the arguments onto and fetching the results from the function call stack. For call-ins it puts a function with the argument receiver and a function framework, which also supports closures (captured lexical variables that hold private function state), onto the heap, marks it as executable, and flushes the CPU instruction cache. Then it is ready to be called.
So it's not only a callback framework but also a framework for call-outs, for building a generic FFI solution. It was developed alongside CLISP and also ships with CLISP under ffcall/. It was recently enhanced to be re-entrant.
The higher Lisp systems do practically the same but only for their language. gcc also does the same. But ffcall is usable by every language which can link to C libraries. So you may see it as some kind of dynamic gcc (without the optimizer of course).
Links:
CLISP FFCALL package [www],
ffcall [www]
What is libffi?
Compilers for high level languages generate code that follows certain conventions. These conventions are necessary, in part, for separate compilation to work. One such convention is the "calling convention". The "calling convention" is a set of assumptions made by the compiler about where function arguments will be found on entry to a function. A "calling convention" also specifies where the return value for a function is found.
Some programs may not know at the time of compilation what arguments are to be passed to a function. For instance, an interpreter may be told at run-time about the number and types of arguments used to call a given function. Libffi can be used in such programs to provide a bridge from the interpreter program to compiled code.
The libffi library provides a portable, high level programming interface to various calling conventions. This allows a programmer to call any function specified by a call interface description at run-time.
Ffi stands for Foreign Function Interface. A foreign function interface is the popular name for the interface that allows code written in one language to call code written in another language. The libffi library really only provides the lowest, machine dependent layer of a fully featured foreign function interface. A layer must exist above libffi that handles type conversions for values passed between the two languages.
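A call interface description that only becomes known at run-time can be tried out from Python, whose ctypes module is itself built on libffi. The following is a sketch, not libffi's C API: the C_TYPES vocabulary and make_caller are made-up names, and since no real shared library is assumed here, a Python function exported with the C calling convention stands in for compiled code. Only its raw address is handed to make_caller.

```python
import ctypes

# Type names an interpreter might be told at run-time (hypothetical vocabulary).
C_TYPES = {
    "void": None,
    "int": ctypes.c_int,
    "long": ctypes.c_long,
    "double": ctypes.c_double,
}


def make_caller(address, result, arguments):
    """Build a callable for the C function at `address` from type names."""
    prototype = ctypes.CFUNCTYPE(C_TYPES[result],
                                 *[C_TYPES[name] for name in arguments])
    return prototype(address)


# Stand-in for a compiled function:  double scale(int n, double x)
_EXPORT = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_int, ctypes.c_double)
_scale = _EXPORT(lambda n, x: n * x)       # keep this reference alive
SCALE_ADDRESS = ctypes.cast(_scale, ctypes.c_void_p).value

# The interpreter side knows only an address and a description.
scale = make_caller(SCALE_ADDRESS, "double", ["int", "double"])
```

The layer above (here the C_TYPES table) is the part that libffi deliberately leaves to the language implementor.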
You need a basic understanding of assembler to understand how a function is invoked and how it returns its values back to the caller. Once you have understood this, and if you can use inline assembler, you can incorporate it into your code. If there's no inline assembler available, you must help yourself by using an external assembler and putting the raw bytes in some kind of code-vector. (BTW: Lisp compilers do that as well.)
Better solutions are dynamic. They create such function frameworks at run-time, as every native lisp compiler does.
To see what happens there, best look at ffcall, libffi, or better the Corman Lisp FFI, perl C::DynaLib or perl Win32::API. The first two have to support many architectures, so it is hard to make out the basics behind all the conditionals and uppercase defines. The last ones are for Win32 MSVC only, so they are very easy to read, without #define THIS and THAT.
Also every other machine language compiler such as gcc creates such functions, but those compilers are not really dynamic. They store the result in executable files, ready to be called later.
Other good resources are:
It is done using pForth, which is simple, raw and flexible enough to do everything. Kind of interpreted assembler, maybe only comparable to Corman Lisp's inline assembler, but very small. See [www]. However, the project doesn't seem to be developed anymore.
Callbacks ("foreign callables") are most easily done with a wrapper in C through which the calling external function is passed to your function. To support multiple incoming interfaces with different function signatures and names there exist two basic strategies:
Trampolines are dynamically created functions in machine code, either on the stack (no memory management needed) or on the heap. So it's first class data, almost lisp for C. This code may involve a private environment ("closure") and code. Some heavy platform dependent wizardry might be needed to ensure the data is tagged as executable. Different optimizing approaches can be used regarding when and where the types are converted, "pre-compiled" or "dynamic". The typical implementation is Bruno Haible's ffcall/trampoline library, borrowing from gcc. With good lisp compilers such platform dependent wizardry is generally not needed, LAP is a better assembler.
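A closure that is callable through a plain C function pointer can be sketched with Python's ctypes, which allocates the executable thunk behind the scenes. The names CALLBACK and make_accumulator are made up for this illustration, and it shows the idea only, not ffcall's trampoline API.

```python
import ctypes

# C-side view of the callback:  int (*)(int)
CALLBACK = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)


def make_accumulator(start):
    """Return a C-callable function pointer that carries private state."""
    total = [start]                  # captured lexical variable (the closure)

    def step(amount):
        total[0] += amount
        return total[0]

    # ctypes creates the machine-code thunk and keeps `step` alive with it.
    return CALLBACK(step)


acc_a = make_accumulator(100)
acc_b = make_accumulator(0)
```

Both acc_a and acc_b have the same C signature but different private environments, which is what a static C wrapper alone cannot give you.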
Various interesting links:
In the interface you need the type mapping from these low-level numbers and bools to the high-level types your language supports.
All arguments are passed by value (copied). Some destructive modifications on arguments must be handled by C wrappers (e.g. compiled to a DLL/so, called by the dynaloader).
One could also use just 's', but the '&' denotes that one might expect the string to be modified by the called function (side-effect).
The integer must be allocated, freed, set and read in foreign memory by the caller.
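What "modified by the called function" and "integer in foreign memory" mean in practice can be sketched with Python's ctypes. The helper names below are made up; ctypes.memset stands in for a callee that writes into a caller-owned buffer, and a callback written in Python stands in for a C function with an int* out-parameter.

```python
import ctypes


def overwrite_first_byte(text, byte):
    """Pass a mutable buffer to a C-level writer and read back the result."""
    buf = ctypes.create_string_buffer(text, len(text) + 1)   # caller allocates
    ctypes.memset(buf, ord(byte), 1)                         # callee modifies
    return buf.value


# Stand-in for:  void store_answer(int *out) { *out = 42; }
INT_OUT = ctypes.CFUNCTYPE(None, ctypes.POINTER(ctypes.c_int))


def _store_answer(out):
    out[0] = 42                      # write through the pointer


store_answer = INT_OUT(_store_answer)


def read_answer():
    """Allocate the integer on the caller's side, pass it by reference, read it."""
    slot = ctypes.c_int(0)
    store_answer(ctypes.byref(slot))
    return slot.value
```

In both cases the memory belongs to the caller, so the caller also decides when it is released.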
Just a few examples:
Most (not all!) Win32 functions which handle strings, directly or indirectly, have two versions--an ANSI version for 8-bit characters and a wide (Unicode) version for 16-bit characters. The functions are distinguished by appending A or W to the function name. (from the assembler tutorial: [www])
Microsoft uses several libraries, one per DLL. The link names for Win32 functions are "decorated" names. The rule is simple: prepend an underscore (_) and append an at-sign (@) and the number of argument bytes in decimal. So the ANSI version of GetMessage, which has four DWORD arguments, is linked as _GetMessageA@16 (_ + GetMessage + A + @ + 4*4). This decoration scheme is primarily for the convenience of Microsoft compilers. The link name doesn't always have this relationship to the entry point name.
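The decoration rule quoted above fits in a few lines. A minimal sketch, assuming every argument occupies one 4-byte DWORD slot; the function names are made up for illustration.

```python
def decorate_stdcall(name, arg_bytes):
    """Apply the rule: underscore + name + '@' + argument bytes in decimal."""
    return "_%s@%d" % (name, arg_bytes)


def win32_link_name(base, dword_args, charset=None):
    """Build a link name from a base name, a DWORD count and an A/W variant."""
    suffix = {"ansi": "A", "wide": "W", None: ""}[charset]
    return decorate_stdcall(base + suffix, 4 * dword_args)
```

As the quoted text warns, the link name does not always match the entry point name exported from the DLL, so an FFI that looks up entry points at run-time usually needs the undecorated name (with the A or W suffix) instead.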
The HP 300 series required a prepended underscore on dlsym names in shared library calls. Subsequent machines no longer do. Such exceptions will clutter up the code with lots of machine dependent #ifdefs.
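In a dynamic language the same exception can be handled by trying several spellings at run-time instead of an #ifdef. A rough sketch, with a made-up helper find_symbol and plain dictionaries standing in for the symbol tables of loaded libraries:

```python
def find_symbol(lookup, name, patterns=("%s", "_%s")):
    """Try each spelling of `name` until `lookup` succeeds."""
    for pattern in patterns:
        try:
            return lookup(pattern % name)
        except (AttributeError, KeyError):
            continue                 # this spelling is not exported, try the next
    raise LookupError("symbol %r not found" % name)


# Stand-ins for two platforms' symbol tables (hypothetical contents).
modern_lib = {"strlen": "<entry point>"}
old_style_lib = {"_strlen": "<entry point>"}
```

With ctypes the lookup argument could be something like `lambda n: getattr(lib, n)`, since a missing symbol raises AttributeError there.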
Extract from my workaround for not sun-style dlopen library calls:
The approach would be to signal the foreign error or exception to the caller in the most friendly and transparent way. If the called function supports exceptions, they should be propagated to the caller. If the called function returns just an error code and the caller is exception friendly, which mostly goes hand in hand with side-effect friendliness, the error should raise an exception in the FFI. If the caller cannot handle exceptions or wants to deal with error return values instead, the FFI must catch the exception and convert it to an error code.
Rich and complicated interfaces such as MS COM let you define custom error interfaces to be able to map it to the possible exception objects in the caller language or to pass custom or enriched information. python for example has a very good exception handled FFI with pythonwin COM. It lets you hook into the debugger to inspect the problem dynamically.
The minimal approach would be to catch all exceptions (esp. needed with COM) and return the error default value (nil with lisp, #f with scheme, undef with perl, ...) on any error or exception.
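These policies can be written down as small wrappers around a foreign call. A sketch only: ForeignError, raising, defaulting and fake_open are hypothetical names, and fake_open merely imitates a C function that returns -1 on failure.

```python
class ForeignError(Exception):
    """Raised by the FFI layer when a foreign call reports an error code."""

    def __init__(self, code):
        Exception.__init__(self, "foreign call failed with code %r" % (code,))
        self.code = code


def raising(func, is_error):
    """Error code in, exception out: for exception-friendly callers."""
    def wrapper(*args):
        result = func(*args)
        if is_error(result):
            raise ForeignError(result)
        return result
    return wrapper


def defaulting(func, default=None):
    """Minimal policy: catch everything and return the error default value."""
    def wrapper(*args):
        try:
            return func(*args)
        except Exception:
            return default
    return wrapper


def fake_open(path):
    """Stand-in for a C function that returns a handle, or -1 on failure."""
    return 3 if path == "ok" else -1


checked_open = raising(fake_open, lambda code: code < 0)
quiet_open = defaulting(checked_open)
```

Stacking the two wrappers shows that the policy is a property of the FFI layer, not of the foreign function itself.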
My current design considerations go like this (KISS style):
Avoid c-structs and c-arrays by reference and support them only by value, thereby completely avoiding foreign pointers and the helper functions and memory problems that come with them.
This rules out, for example, win32-style window structs and other win32 struct stuff, as well as efficient usage of large data structures across multiple calls (e.g. for my computational geometry).
But since such c-struct lisp code is only as readable as the typical c code (usually even worse), I favor writing the glue code in c/c++ and exporting functions with meaningful return values to lisp from there, in a special wrapper dll. This is far more readable and only slightly less easy to write. (external compile-link)
Exception handling can also be done there because some languages don't support it.
(Below is my old design. I came to better solutions now)
The macro ffi-def-foreign-call associates a foreign function with a Lisp symbol, allowing it to be called as a Lisp function would be called. (In spirit to the ACL FFI, the Corman Common Lisp FFI and various convenience wrappers, the Harlequin Lispworks FLI, the CMUCL alien package, the CLISP FFI, to my own private AutoLISP FFI, the Gambit C-interface, Guile's dynamic linking, the OpenScheme OF, ...)
At first we deal only with immediately representable types and simple strings called by value only, but no pointers and more by reference stuff, so we don't have to care that much about foreign memory allocation, deallocation (gc finalization!) and dereferencing. Future versions might add the full foreign pointer helper suite, plus def-c-type, def-foreign-callable and such.
(ffi-load "kernel32") ; separate load from define to be
; able to call already known procaddresses
; as well.
; Good:
(ffi-def-foreign-call "Sleep" (integer) :returning :void)
; or CLISP like
(ffi-def-foreign-call Sleep (:arguments (ms int :in)))
; or CMUCL like
(def-alien-routine "Sleep" void (ms int :in))
; KISS (chars as types as in VB or OpenScheme)
(ffi-def-foreign-call "Sleep" "(L)")
; or like this, most natural to C
(ffi-def-foreign-call "Sleep" "(int ms): void")
(sleep 100) ; call it (wait 100ms non-blocking)
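How the "most natural to C" signature string could be turned into a call description is sketched below in Python with ctypes. The grammar, the C_TYPES table and the function names are assumptions made for this illustration, not a specification of ffi-def-foreign-call.

```python
import ctypes
import re

# Hypothetical mapping from C type names to ctypes types.
C_TYPES = {
    "void": None,
    "int": ctypes.c_int,
    "long": ctypes.c_long,
    "double": ctypes.c_double,
    "char*": ctypes.c_char_p,
}

_SIGNATURE = re.compile(r"^\s*\((.*)\)\s*:\s*(\S+)\s*$")


def parse_signature(text):
    """Parse "(int ms): void" into (arg_types, arg_names, result_type)."""
    match = _SIGNATURE.match(text)
    if match is None:
        raise ValueError("bad signature: %r" % text)
    params, result = match.group(1).strip(), match.group(2)
    types, names = [], []
    if params and params != "void":
        for param in params.split(","):
            parts = param.split()
            types.append(C_TYPES[parts[0]])
            names.append(parts[1] if len(parts) > 1 else None)
    return types, names, C_TYPES[result]


def make_prototype(text):
    """Build a ctypes function prototype (cdecl) from a signature string."""
    types, _names, result = parse_signature(text)
    return ctypes.CFUNCTYPE(result, *types)
```

CFUNCTYPE assumes the cdecl calling convention. For stdcall entry points such as Sleep in kernel32 one would use ctypes.WINFUNCTYPE instead, which exists on Windows only.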
Expert for AutoCAD and AutoCAD programming.
Author of the AutoLISP FAQ and of the AutoLISP Standard Library.
perl CPAN id RURBAN,
WinHelp for perl and the first pod2rtf converter generation.
Recently Tie::CArray (external arrays) and still working on the computational
geometry module suite.
Playing a lot with Corman Lisp (doing the AutoCAD port) and its FFI.
Other hobbies include arts and media such as virtual reality, movies and theatres (stage design class), logistic problems such as architecture, city planning and AI, geometrical problems such as surveying and 3d modeling, software libraries and documentation at large.
Granted is permission to use and copy this text for any purpose. Modifications are disallowed. Granted is permission to distribute this text for any personal, non-commercial purpose. Other permissions only per written agreement with the copyright holder.
Some names in this text are trademarked. Because of the technical background knowledge required to understand this text I assume the correct associations between the names and marks on the user side. It is what it seems to be and nothing else.
(You might copy and use this (c) notice. It's yours. Had fun writing it.)