@chapter Loading Object Code

We will outline some of the features of the object loader, by William Schelter.

When you do @code{(load "foo.o")}, the output from the C compiler must be loaded into static space in the running KCL, and references to external symbols must be resolved. Originally KCL used the loader of the underlying operating system, calling it in a subshell to produce yet another file which had the correct references to externals. This file was then read into KCL, and the data vector (a Lisp-readable vector at the end of the object file) was read in as well. Unfortunately some operating systems (such as System V) do not supply a loader capable of doing this relocation, and in any event it is fairly slow. There was also no possibility of incrementally adding new external C symbols to an already running Lisp and then having future files refer to them. For example, you might have a function @code{search1} written in C which you wished to access directly in subsequently loaded files. This was not possible, since the loader only knew about the addresses of the external symbols in the original saved image.

The new scheme builds a list of the external symbols into a table called @code{c_table}. This table is built by examining the current image; it is built automatically with the first call to @code{load}, and subsequent calls simply use it. An additional benefit is that it is easy to add new symbols to the table. For example, if you have a file @file{try.c} which looks like

@example
init_code()
@{ add_symbols(joe,&joe,pete,&pete,NULL);
@}

joe(x)
object x;
@{ ... @}

pete()
@{ ... @}
@end example

then @code{joe} and @code{pete} will be added to the symbol table of the current KCL. You may refer to them as external variables in subsequent files, and these files will load correctly, referencing these variables. It is an error to apply @code{add_symbols} twice to the same variable.

Loading of files has also sped up considerably: a small file containing only a few small functions can be loaded in less than .05 seconds.
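For illustration, here is a sketch of a file compiled and loaded after @file{try.c} that refers to those symbols. The file name, the @code{extern} declarations, and the use of the @code{object} type are assumptions modeled on the example above, not a fixed KCL interface.

@example
/* later.c -- a sketch of a file loaded after try.c; joe and pete
   are resolved through c_table when this file is loaded.  "object"
   is the Lisp-object type used in try.c above. */
extern object joe();
extern object pete();

object
use_them(x)
object x;
@{
  pete();
  return joe(x);
@}
@end example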
@chapter Metering and Profiling

KCL utilities have been added, by W. Schelter, to allow one to determine the percentage of time spent in individual functions.

Usage involves deciding which block of code one wishes to profile, that is to say which address range, and then allocating an appropriately sized @code{*profile-array*}. For example, in the Sun version, if you have loaded a few object files and wish to meter all of KCL plus the files you loaded, you could allocate a 1 megabyte array. This would give roughly a two-to-one reduction relative to the code address range. Note that the loader prints out the address at which code is loaded, and the function @code{si::function-start (fun)} returns the start address of a compiled function.

In the above example, after loading the file @file{lsp/profile.o} you could do @code{(si:set-up-profile 1000000)}. This allocates the 1 megabyte array and also reads in the C symbol table, if this has not already been done. It also obtains the addresses of all compiled function objects currently in the image and keeps them in a table, called @code{combined_table} at the C level. The function @code{si:set-up-combined (size-of-table)} sets up a combined table for the Lisp and C functions; it is called by @code{si:set-up-profile} with a default size-of-table of 6000.

To turn profiling on you do @code{(si::prof 0 90)}. This starts metering all addresses in the range from 0 (the first argument) to 1,000,000 * (256/90), where 90 is the second argument. To display the data collected so far, invoke @code{si::display-profile} with no arguments. To clear the profile array, run @code{(si::clear)}. A call of @code{(si::prof 500000 256)} would profile code in the address range from 500,000 to 1,500,000. You may switch the profiler off by specifying a 0 mapping, i.e. @code{(si::prof 0 0)}; it can then be restarted by supplying a nonzero second argument. Of course, if you start up again with a scale different from the previous one without clearing the profile array, you will get gibberish. The argument list of the last call to @code{si::prof} is stored in the variable @code{si::*current-profile*}.

Unless one is using a one-to-one mapping of the profile array to the code, there is a possibility of quantization errors. There is also the possibility of overflowing a slot in the profile array if the mapping is very coarse, or if the interval being measured is very long.

Here is a sample of the output:

@example
  0.08% (    9): _eql
 15.26% ( 1822): _equal
  0.01% (    1): _Fquote
  0.01% (    1): SET
  0.04% (    5): _parse_key
  0.01% (    1): _Fcond
  ...
  0.50% (   60): RELIEVE-HYPS1
  0.03% (    4): REMAINDER
  0.01% (    1): REMOVE-*2*IFS
  0.03% (    3): REMOVE-TRIVIAL-EQUATIONS
  4.35% (  520): REWRITE
  0.47% (   56): REWRITE-CAR-V&C-APPLY$
  ...
@end example

The first column is the percentage of total time spent with the program counter in the range starting at this function and extending up to the next named function. The second column is the actual number of times a profile interrupt landed in this section of the code. Note that the default display is by address and, as mentioned before, one should beware of overlaps when the mapping is coarse. Functions for which there were no ticks are not displayed. We did not sort the output, since we wished to leave it in address order: it is possible (because of roundoff, if the second argument to @code{si::prof} is small) that some calls are credited to the adjacent function, and this is spotted more easily when the order is by address. It is trivial to sort the table by ticks in GNU Emacs using the command @code{sort-columns}: set point at the beginning of the column in the first line, and the mark at the end of the column in the last line.

Unfortunately the System V loader likes to separate the original C functions of KCL from those incrementally loaded by about 2 megabytes. This makes it awkward to meter both ranges simultaneously without using a very large profile array. It would probably be reasonable to rewrite the basic interrupt routine to handle such an address configuration, but this has not yet been done. Of course, you can always make two runs and combine the information for the two ranges.
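Putting these pieces together, a typical profiling session might look like the following sketch. The function names and arguments are those described above; @code{run-my-test} is a hypothetical stand-in for whatever form you want to measure, and the scale of 90 assumes the 1 megabyte array of the earlier example.

@example
(load "lsp/profile.o")       ; load the profiling utilities
(si:set-up-profile 1000000)  ; allocate *profile-array* and build the tables
(si::prof 0 90)              ; meter addresses 0 .. 1,000,000 * (256/90)
(run-my-test)                ; the computation to be measured (hypothetical)
(si::prof 0 0)               ; switch the profiler off
(si::display-profile)        ; show percentages and tick counts by address
(si::clear)                  ; clear the array before the next run
@end example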