This chapter presents a forth kernel and a forth Cross-Compiler for the TMS320C5x DSP.
Last update: 06-February-1999 | State (estimation): 55%
To use this Cross-Compiler, you need:
· a PC running either Windows 95/98/NT or QNX/Photon, with a free serial port (either com1: or com2:).
· the Cross-Compiler software.
· a TMS320C5x DSP Starter Kit (DSK, C50 evaluation card): http://www.ti.com/
· a serial cable as the link between the DSK and the PC.
You can't use the Cross-Compiler without the C50 card: the Cross-Compiler always generates code directly into the target (although it should not be a big deal to modify the CC to generate into host memory instead, for example with paging functions that never write to the target). If the umbilical link is broken, the CC fails and cannot generate code.
The Cross-Compiler software is provided as wfroth (Forth) source. It must be compiled with the Forth running on the host machine, either under Windows or QNX/Photon, using the following wfroth commands:
" /wfroth/ti_c50/fcc_c50.wft" !curr_file
COM1: load
You may find it convenient to put these commands into the file startup.wft, which wfroth opens and interprets after its initialization. Replace COM1: with COM2: if the umbilical link is on com2. Note that the serial port is set to 57,600 baud.
I chose to implement a 'direct call' (subroutine-threaded) Forth model: each call to a definition is translated into the native DSP instructions CALL/RET. Also, some of the basic Forth definitions (DUP or DROP, for example) are translated into only one or two native instructions and act as macros. The cross-compiler also tries to use delayed instructions (such as delayed branch, call and return) to save cycles by keeping the pipeline busy.
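The direct-call model can be sketched on the host side as follows. This is a hypothetical Python model, not the cross-compiler's wfroth source: `MACROS`, `compile_definition` and the word `square` are invented for illustration, and only opcodes quoted in this chapter are used (CALL = 7A80 pma, RET = EF00, DUP = 90A0, DROP = 8B90 1080).

```python
# Hypothetical sketch of a direct-call (subroutine-threaded) code generator.
CALL = 0x7A80   # CALL pma (two words: opcode, then address)
RET = 0xEF00    # RET

# Words compiled inline as macros: name -> 16-bit code words.
MACROS = {
    "dup": [0x90A0],           # SACL *+
    "drop": [0x8B90, 0x1080],  # MAR *- ; LACC *
}


def compile_definition(body, dictionary):
    """Translate a list of word names into C5x code words.

    Macro words are copied inline; any other word becomes a CALL to the
    address recorded for it in `dictionary`. The definition ends with RET.
    """
    code = []
    for name in body:
        if name in MACROS:
            code.extend(MACROS[name])
        else:
            code.extend([CALL, dictionary[name]])
    code.append(RET)
    return code


# ": test dup square drop ;" with a hypothetical word square at 0A10
print([f"{w:04X}" for w in compile_definition(["dup", "square", "drop"], {"square": 0x0A10})])
```

The trade-off is visible in the output: a macro costs its own instructions at every use, while a subroutine costs a 2-word CALL plus the return.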
Memory Map
Address | Contents
0000 | ROM, interrupt vectors, monitor
0A00 | Start of user's Forth code
1A00 | Start of user's Forth data
2A00 | Run-time (words defined as subroutines, not macros)
2BB0 | Stub code (used to execute user code)
2BD0 | Forth parameter stack
2B20 | Forth definitions as subroutines (vs. macros)
2C00 | (end)
· The parameter stack is 32 words in size and begins at address $2BD0 (data).
· User generated code begins at address $0A00 (code). Size is $1A00 - $A00 = $1000 (4K)
· Variables begin at address $1A00 (data). Size is $2A00 - $1A00 = $1000 (4K)
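The size figures above follow directly from the memory map. The short Python check below recomputes them; the constant names are ours, and it only does the arithmetic (it does not model the separate program and data spaces).

```python
# Constants taken from the memory map above (names are illustrative).
USER_CODE = 0x0A00   # start of user's Forth code (program space)
USER_DATA = 0x1A00   # start of user's Forth data (data space)
RUNTIME = 0x2A00     # run-time subroutines
STACK_BASE = 0x2BD0  # parameter stack
STACK_SIZE = 32      # words
MAP_END = 0x2C00

code_size = USER_DATA - USER_CODE     # $1A00 - $A00
data_size = RUNTIME - USER_DATA       # $2A00 - $1A00
stack_end = STACK_BASE + STACK_SIZE   # first word after the stack

print(f"code:  {code_size:#06x} words = {code_size // 1024}K")
print(f"data:  {data_size:#06x} words = {data_size // 1024}K")
print(f"stack: {STACK_BASE:#06x}..{stack_end - 1:#06x}, below {MAP_END:#06x}: {stack_end <= MAP_END}")
```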
The cross-compiler runs under a Forth (wfroth) on the host machine. Because the CC takes care of everything related to compilation (handling of the dictionary, code generation, etc.), the target only needs a 'minimal' Forth, i.e. words such as +, DROP, SWAP, @, etc. The CC interpreter has two modes:
· host mode (HOST)
· target mode (TARGET)
In host mode, the CC does not generate code into the target. For example, if we enter a number, it is pushed on the host stack and no code is generated for the target. We can also interact with the target with words such as dm@, pm@, dm!, pm!, run, etc.
In target mode, the CC generates code into the target.
The CC changes the way it displays the number of elements currently on the parameters stack:
· in host mode: (2) for example means that we have currently 2 values on the host stack
· in target mode: {3} for example means that we have currently 3 values on the target stack.
Many words are redefined by the CC so that their behavior depends on the host/target mode. This is the case, for example, for words such as .S : ; int dup drop, etc.
I have chosen a very simple model for the target Forth:
C50 register | Forth kernel
AR7 | Parameter stack pointer
ACC | Top of parameter stack value (cached)
AR6 | Address register
This means that registers AR6, AR7 and ACC must not be changed by user code (or must be saved and restored). Also, some words use AR3, AR4 and AR5 as temporary registers.
The top stack value is cached in the accumulator (ACC). Let's take an example with 4 values on the parameter stack ( --- 10 20 30 40 ):
Address | 2BD0 | 2BD1 | 2BD2 | 2BD3 | 2BD4
Memory | - | 10 | 20 | 30 | AR7=2BD4
ACC=40
The stack pointer (AR7) points to the next empty cell of the stack. This favors pushing a new value on the stack, which takes a single instruction (SACL *+).
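A minimal Python model of this layout follows. It assumes, as the table suggests and SP! (described below) implies by loading AR7 with 2BD0, that cell 2BD0 only receives the meaningless ACC value spilled by the first push. The class and method names are illustrative; `pop` is the host-side equivalent of DROP (MAR *- ; LACC *).

```python
SP0 = 0x2BD0  # value loaded into AR7 by SP!


class CachedStack:
    """Parameter stack whose top value lives in ACC (illustrative model)."""

    def __init__(self):
        self.mem = {}
        self.ar7 = SP0   # next empty cell
        self.acc = 0     # cached top; meaningless while the stack is empty

    def push(self, n):
        self.mem[self.ar7] = self.acc   # SACL *+ : spill the old top...
        self.ar7 += 1                   # ...and advance the pointer
        self.acc = n                    # load the new top into ACC

    def pop(self):
        top = self.acc
        self.ar7 -= 1                   # MAR *-
        self.acc = self.mem[self.ar7]   # LACC * : reload the cache
        return top


s = CachedStack()
for v in (10, 20, 30, 40):
    s.push(v)
print(hex(s.ar7), s.acc, [s.mem[a] for a in (0x2BD1, 0x2BD2, 0x2BD3)])
```

After the four pushes the model matches the table: 10, 20 and 30 in memory, AR7 = 2BD4 and 40 in ACC.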
LITERAL (16-bit literal value)
There are 2 cases, depending on the size of the numeric constant.
Case where 0 <= n <= 255:
SACL *+ | 90A0
LACL #imm | B9XX
Case where n < 0 or n > 255:
SACL *+ | 90A0
LACC #imm | BF80 XXXX
Even though the long form is 3 words, LITERAL acts as a macro. Note that the Cross-Compiler can remove the last compiled literal when it optimizes (example: 1000 >).
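The choice between the two encodings can be sketched in Python as follows. `compile_literal` is a hypothetical helper, not part of the CC; the opcodes are the ones in the listings above, and negative values are emitted in 16-bit two's complement.

```python
SACL_PUSH = 0x90A0  # SACL *+
LACL_K = 0xB900     # LACL #k, one word, k in 0..255
LACC_LK = 0xBF80    # LACC #lk, followed by the 16-bit constant


def compile_literal(n):
    """Return the code words that push the literal n (hypothetical helper)."""
    if not -32768 <= n <= 65535:
        raise ValueError("literal does not fit in 16 bits")
    if 0 <= n <= 255:
        return [SACL_PUSH, LACL_K | n]          # 2 words
    return [SACL_PUSH, LACC_LK, n & 0xFFFF]     # 3 words


for n in (5, 1000, -1):
    print(n, [f"{w:04X}" for w in compile_literal(n)])
```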
SP! ( ... --- )
SP! is a single instruction and compiles the following code:
LAR AR7,#SP0 | BF0F 2BD0
DEPTH ( --- n )
DEPTH is 3 words and is compiled as a call to the following subroutine:
SACL *+ | 90A0
LAMM AR7 | 0817
RETD | FF00
SUB #S0-1 | BFA0 2BCF
DUP ( n --- n n )
DUP is only one word and compiles the following code:
SACL *+ | 90A0
DDUP ( n1 n2 --- n1 n2 n1 n2 )
Alias: 2DUP
2DUP is 4 words and is compiled as a call to the following subroutine:
SACL *- | 9090
LAR AR5,*+ | 05A0
RETD | FF00
MAR *+ | 8BA0
SAR AR5,*+ | 85A0
OVER ( n1 n2 --- n1 n2 n1 )
OVER is 3 words and compiles the following code:
SACL *- | 9090
LACC *+ | 10A0
MAR *+ | 8BA0
ABOVE ( n1 n2 n3 --- n1 n2 n3 n1 )
Alias: PLUCK
ABOVE is 5 words and is compiled as a call to the following subroutine:
SACL *- | 9090
MAR *- | 8B90
LACC *+ | 10A0
RETD | FF00
MAR *+ | 8BA0
MAR *+ | 8BA0
DROP ( n --- )
DROP is 2 words and compiles the following code:
MAR *- | 8B90
LACC * | 1080
UNDER ( n1 n2 --- n2 )
Alias: NIP
UNDER is 1 word and compiles the following code:
MAR *- | 8B90
BELOW ( n1 n2 n3 --- n2 n3 )
BELOW is 4 words and is compiled as a call to the following subroutine:
MAR *- | 8B90
LAR AR5,*- | 0590
RETD | FF00
SAR AR5,*+ | 85A0
SACL * | 9080
SWAP ( n1 n2 --- n2 n1 )
SWAP is 4 words and is compiled as a call to the following subroutine:
MAR *- | 8B90
LAR AR4,* | 0480
RETD | FF00
SACL *+ | 90A0
LAMM AR4 | 0814
ROT ( n1 n2 n3 --- n2 n3 n1 )
ROT is 6 words and is compiled as a call to the following subroutine:
MAR *- | 8B90
LAR AR5,*- | 0590
LAR AR4,* | 0480
SAR AR5,*+ | 85A0
RETD | FF00
SACL *+ | 90A0
LAMM AR4 | 0814
-ROT ( n1 n2 n3 --- n3 n1 n2 )
-ROT is 6 words and is compiled as a call to the following subroutine:
MAR *- | 8B90
LAR AR5,*- | 0590
LAR AR4,* | 0480
SACL *+ | 90A0
RETD | FF00
SAR AR4,*+ | 84A0
LAMM AR5 | 0815
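To check that these sequences implement the documented stack effects, they can be replayed on a toy model of the C5x. The Python sketch below is an assumption-laden simulation, not a C5x emulator: it models only AR7 (as the current auxiliary register), AR4, AR5, ACC and data memory, uses the layout of the earlier example (2BD0 holds the spilled initial ACC), and omits RETD, since its two delay-slot instructions execute in order before the return anyway. `ToyC50`, `WORDS` and `run` are invented names.

```python
class ToyC50:
    """Tiny model of the registers used by the stack words (ARP = AR7)."""

    def __init__(self, values):
        # Kernel layout: all values but the last in memory from 2BD1 on,
        # the last (top) one cached in ACC. Assumes values is not empty.
        self.mem = {0x2BD0: 0}
        self.ar7 = 0x2BD1
        for v in values[:-1]:
            self.mem[self.ar7] = v
            self.ar7 += 1
        self.acc = values[-1]
        self.ar = {4: 0, 5: 0}

    # One method per instruction; `mod` is the AR7 post-modification:
    # +1 for *+, -1 for *-, 0 for *.
    def sacl(self, mod):
        self.mem[self.ar7] = self.acc
        self.ar7 += mod

    def lacc(self, mod):
        self.acc = self.mem[self.ar7]
        self.ar7 += mod

    def mar(self, mod):
        self.ar7 += mod

    def lar(self, n, mod):
        self.ar[n] = self.mem[self.ar7]
        self.ar7 += mod

    def sar(self, n, mod):
        self.mem[self.ar7] = self.ar[n]
        self.ar7 += mod

    def lamm(self, n):
        self.acc = self.ar[n]

    def stack(self):
        # Bottom to top: memory cells 2BD1..AR7-1, then the cached ACC.
        return [self.mem[a] for a in range(0x2BD1, self.ar7)] + [self.acc]


# The listings above, without RETD.
WORDS = {
    "DUP": [("sacl", 1)],
    "2DUP": [("sacl", -1), ("lar", 5, 1), ("mar", 1), ("sar", 5, 1)],
    "OVER": [("sacl", -1), ("lacc", 1), ("mar", 1)],
    "ABOVE": [("sacl", -1), ("mar", -1), ("lacc", 1), ("mar", 1), ("mar", 1)],
    "DROP": [("mar", -1), ("lacc", 0)],
    "UNDER": [("mar", -1)],
    "BELOW": [("mar", -1), ("lar", 5, -1), ("sar", 5, 1), ("sacl", 0)],
    "SWAP": [("mar", -1), ("lar", 4, 0), ("sacl", 1), ("lamm", 4)],
    "ROT": [("mar", -1), ("lar", 5, -1), ("lar", 4, 0), ("sar", 5, 1),
            ("sacl", 1), ("lamm", 4)],
    "-ROT": [("mar", -1), ("lar", 5, -1), ("lar", 4, 0), ("sacl", 1),
             ("sar", 4, 1), ("lamm", 5)],
}


def run(word, values):
    cpu = ToyC50(values)
    for op, *args in WORDS[word]:
        getattr(cpu, op)(*args)
    return cpu.stack()


print(run("ROT", [1, 2, 3]))
```

Each `run` call returns the stack bottom to top; for example `run("ROT", [1, 2, 3])` gives `[2, 3, 1]`, matching ( n1 n2 n3 --- n2 n3 n1 ).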
NEGATE ( n1 --- n2 ): n2 = -n1
NEGATE is 1 word and compiles the following code:
NEG | BE02
+ ( n1 n2 --- n3 ): n3 = n1 + n2
There are 2 versions of the + word. The cross-compiler chooses the right version depending on the context, in order to optimize the code. All versions act as macros:
+ (uses the two values from the stack, no optimisation)
MAR *- | 8B90
ADD * | 2080
LIT+ (optimisation, removes the last LIT generated by the CC)
There are two versions depending on the size of the constant:
ADD #XX | B8XX | XX is an 8-bit value
ADD #XXXX | BF90 XXXX | XXXX is a 16-bit value
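The LIT+ optimisation amounts to a small peephole rewrite. In the hypothetical Python sketch below, the caller is assumed to remember the value of the literal it has just compiled (`last_literal`, `None` if the previous item was not a literal); `compile_plus` removes that literal's push sequence and emits one of the two ADD forms instead.

```python
SACL_PUSH = 0x90A0        # SACL *+ (first word of every literal push)
ADD_K = 0xB800            # ADD #k, k in 0..255
ADD_LK = 0xBF90           # ADD #lk, followed by the 16-bit constant
PLUS = [0x8B90, 0x2080]   # MAR *- ; ADD *


def compile_plus(code, last_literal=None):
    """Append '+' to `code`, folding a just-compiled literal if there is one."""
    if last_literal is None:
        code.extend(PLUS)
        return code
    n = last_literal
    if 0 <= n <= 255:
        assert code[-2] == SACL_PUSH   # sanity check: SACL *+ ; LACL #k
        del code[-2:]
        code.append(ADD_K | n)
    else:
        assert code[-3] == SACL_PUSH   # sanity check: SACL *+ ; LACC #lk
        del code[-3:]
        code.extend([ADD_LK, n & 0xFFFF])
    return code


print([f"{w:04X}" for w in compile_plus([0x90A0, 0xB901], last_literal=1)])
```

The usage line folds "1 +" into the single word B801, i.e. ADD #1, which is what the compiled example at the end of this chapter shows at the top of its loop.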
- ( n1 n2 --- n3 ): n3 = n1 - n2
There are 2 versions of the - word. The cross-compiler chooses the right version depending on the context, in order to optimize the code. All versions act as macros:
- (uses the two values from the stack, no optimisation)
MAR *- | 8B90
SUB * | 3080
NEG | BE02
LIT- (optimisation, removes the last LIT generated by the CC)
There are two versions depending on the size of the constant:
SUB #XX | BAXX | XX is an 8-bit value
SUB #XXXX | BFA0 XXXX | XXXX is a 16-bit value
* ( n1 n2 --- n3 ): n3 = n1 * n2
* is 4 words and is compiled as a call to the following subroutine:
SACL *- | 9090
LT *+ | 73A0
MPY *- | 5490
PAC | BE03
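The three arithmetic sequences can be traced in Python. The sketch below is illustrative only: it assumes the product shift mode is 0 (so PAC copies PREG unchanged into ACC), uses unbounded Python integers instead of the 32-bit accumulator, and starts from n1 in memory and n2 cached in ACC. It shows how * uses the empty cell just above the stack as scratch space for n2.

```python
class Arith:
    """Trace of the +, - and * sequences (illustrative model, ARP = AR7)."""

    def __init__(self, n1, n2):
        self.mem = {0x2BD1: n1}  # n1 already spilled to memory
        self.ar7 = 0x2BD2        # next empty cell
        self.acc = n2            # n2 cached in ACC
        self.treg = self.preg = 0

    def plus(self):
        self.ar7 -= 1                    # MAR *-
        self.acc += self.mem[self.ar7]   # ADD *   -> n2 + n1

    def minus(self):
        self.ar7 -= 1                    # MAR *-
        self.acc -= self.mem[self.ar7]   # SUB *   -> n2 - n1
        self.acc = -self.acc             # NEG     -> n1 - n2

    def star(self):
        self.mem[self.ar7] = self.acc    # SACL *- : park n2 in the empty cell
        self.ar7 -= 1
        self.treg = self.mem[self.ar7]   # LT *+   : TREG = n1
        self.ar7 += 1
        self.preg = self.treg * self.mem[self.ar7]  # MPY *- : PREG = n1 * n2
        self.ar7 -= 1
        self.acc = self.preg             # PAC     : ACC = PREG


for op in ("plus", "minus", "star"):
    m = Arith(7, 3)
    getattr(m, op)()
    print(op, m.acc, hex(m.ar7))
```

In all three cases AR7 ends one cell lower than it started: two operands in, one result out.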
== ( n1 n2 --- f ), alias: = | f = (n1==n2) ? TRUE : FALSE
!= ( n1 n2 --- f ), alias: <> | f = (n1!=n2) ? TRUE : FALSE
< ( n1 n2 --- f ) | f = (n1<n2) ? TRUE : FALSE
<= ( n1 n2 --- f ) | f = (n1<=n2) ? TRUE : FALSE
> ( n1 n2 --- f ) | f = (n1>n2) ? TRUE : FALSE
>= ( n1 n2 --- f ) | f = (n1>=n2) ? TRUE : FALSE
There are 3 versions of each of these words. The cross-compiler chooses the right version depending on the context. All versions act as a call to a subroutine:
Previous compiled literal i8, with 0 <= i8 <= 255:
SUB #i8 | BAXX
CALL lit== | 7A80 XXXX
Previous compiled literal i16, with i16 < 0 or i16 > 255:
CALLD lit== | 7E80 XXXX
SUB #i16 | BFA0 XXXX
No previous compiled literal:
CALL == | 7A80 XXXX
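Choosing among these three versions is a compile-time decision. The Python sketch below is hypothetical: `compile_compare`, `sub_addr` and `lit_sub_addr` are invented names, the short SUB is encoded as BA00 + i8 (the SUB #k short-immediate format, an assumption of this sketch), and the caller is assumed to know the value of the literal it has just compiled.

```python
CALL, CALLD = 0x7A80, 0x7E80
SUB_K = 0xBA00    # SUB #k, k in 0..255 (assumed encoding)
SUB_LK = 0xBFA0   # SUB #lk, followed by the 16-bit constant


def compile_compare(code, last_literal, sub_addr, lit_sub_addr):
    """Append one of the three compare sequences (e.g. for '==')."""
    if last_literal is None:
        code.extend([CALL, sub_addr])                   # CALL ==
        return code
    n = last_literal
    if 0 <= n <= 255:
        del code[-2:]                                   # drop SACL *+ ; LACL #k
        code.extend([SUB_K | n, CALL, lit_sub_addr])    # SUB #i8 ; CALL lit==
    else:
        del code[-3:]                                   # drop SACL *+ ; LACC #lk
        # CALLD executes the 2-word SUB in its delay slots, before the jump.
        code.extend([CALLD, lit_sub_addr, SUB_LK, n & 0xFFFF])
    return code


# "10 ==" with hypothetical run-time addresses 2A10 (==) and 2A40 (lit==)
print([f"{w:04X}" for w in compile_compare([0x90A0, 0xB90A], 10, 0x2A10, 0x2A40)])
```

Putting the 2-word SUB in the CALLD delay slots keeps the long-literal case as fast as the short one.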
Run-time subroutines == != < <= > >= (identical except for the XC condition):
MAR *-
SUB *
NOP
XC 2,cond | cond = NEQ (==), EQ (!=), LEQ (<), LT (<=), GEQ (>), GT (>=)
LACC #0
RET
RETD
LACC #-1
Run-time subroutines lit== lit!= lit< lit<= lit> lit>= (identical except for the XC condition):
NOP
XC 2,cond | cond = NEQ (lit==), EQ (lit!=), GEQ (lit<), GT (lit<=), LEQ (lit>), LT (lit>=)
LACC #0
RET
RETD
LACC #-1
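The two tables use different XC conditions because the operands arrive in a different order. In the plain versions SUB * leaves n2 - n1 in ACC, while in the lit versions the inline SUB #lit leaves n1 - lit; in both cases XC tests the condition under which the result is FALSE. The Python model below checks this against ordinary comparisons. It assumes signed 16-bit operands sign-extended into the 32-bit accumulator (so the difference cannot overflow), and treats the two words guarded by XC 2 as LACC #0 ; RET. The names are illustrative.

```python
TRUE, FALSE = -1, 0

# XC conditions, tested on ACC.
COND = {
    "EQ": lambda a: a == 0, "NEQ": lambda a: a != 0,
    "LT": lambda a: a < 0, "LEQ": lambda a: a <= 0,
    "GT": lambda a: a > 0, "GEQ": lambda a: a >= 0,
}

# XC conditions copied from the two tables above.
PLAIN = {"==": "NEQ", "!=": "EQ", "<": "LEQ", "<=": "LT", ">": "GEQ", ">=": "GT"}
LIT = {"==": "NEQ", "!=": "EQ", "<": "GEQ", "<=": "GT", ">": "LEQ", ">=": "LT"}


def subroutine(acc, cond):
    # XC 2,cond ; LACC #0 ; RET -> return FALSE when cond holds,
    # otherwise fall through to RETD ; LACC #-1 -> return TRUE.
    return FALSE if COND[cond](acc) else TRUE


def compare(op, n1, n2):
    return subroutine(n2 - n1, PLAIN[op])   # MAR *- ; SUB * gives n2 - n1


def compare_lit(op, n1, lit):
    return subroutine(n1 - lit, LIT[op])    # inline SUB #lit gives n1 - lit


REFERENCE = {
    "==": lambda a, b: a == b, "!=": lambda a, b: a != b,
    "<": lambda a, b: a < b, "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b, ">=": lambda a, b: a >= b,
}
for op, ref in REFERENCE.items():
    for a in (-32768, -1, 0, 1, 255, 32767):
        for b in (-32768, -1, 0, 1, 255, 32767):
            want = TRUE if ref(a, b) else FALSE
            assert compare(op, a, b) == want and compare_lit(op, a, b) == want
print("all comparisons agree")
```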
LIT!, LIT!+ and LIT!- are an optimisation of the sequence "constant !". 3 words are generated, and these words act as macros.
Word | Instruction 1 | Instruction 2
LIT! | MAR *,AR6 (8B8E) | SPLK #XXXX,*,AR7 (AE8F XXXX)
LIT!+ | MAR *,AR6 (8B8E) | SPLK #XXXX,*+,AR7 (AEAF XXXX)
LIT!- | MAR *,AR6 (8B8E) | SPLK #XXXX,*-,AR7 (AE9F XXXX)
For !, !+ and !-, 3 words are generated, and these words act as macros.
Word | Instruction 1 | Instruction 2 | Instruction 3
! | MAR *-,AR6 (8B9E) | SACL *,AR7 (908F) | LACC * (1080)
!+ | MAR *-,AR6 (8B9E) | SACL *+,AR7 (90AF) | LACC * (1080)
!- | MAR *-,AR6 (8B9E) | SACL *-,AR7 (909F) | LACC * (1080)
For @, @+ and @-, 3 words are generated, and these words act as macros.
Word | Instruction 1 | Instruction 2 | Instruction 3
@ | SACL *+ (90A0) | MAR *,AR6 (8B8E) | LACC *,AR7 (108F)
@+ | SACL *+ (90A0) | MAR *,AR6 (8B8E) | LACC *+,AR7 (10AF)
@- | SACL *+ (90A0) | MAR *,AR6 (8B8E) | LACC *-,AR7 (109F)
The implementation of A> is 2 words and acts as a macro:
SACL *+ | 90A0
LAMM AR6 | 0816
The implementations of +! and -! are 4 words each and are called as subroutines.
+!:
MAR *-,AR6 | 8B9E
ADD * | 2080
RETD | FF00
SACL *,AR7 | 908F
LACC * | 1080
-!:
MAR *-,AR6 | 8B9E
SUB * | 3080
RETD | FF00
SACL *,AR7 | 908F
LACC * | 1080
BRANCH (compiled by ELSE and AGAIN): 2 words are generated, and this word acts as a macro.
B addr | 7980 addr
In some situations (when the next 2 words after the branch are 2 single-word instructions or 1 double-word instruction, and not a branch or a return), the B instruction is replaced by a delayed branch BD instruction to save 2 cycles.
BD addr | 7D80 addr
0BRANCH (compiled by IF): 4 words are generated, and this word acts as a macro.
BCNDD addr,EQ,UNC | F388 addr
MAR *- | 8B90
LACC * | 1080
1BRANCH (compiled by ?CONTINUE and ?BREAK): 4 words are generated, and this word acts as a macro.
BCNDD addr,NEQ,UNC | F388 addr
MAR *- | 8B90
LACC * | 1080
; (end of definition): 1 word is generated, and this word acts as a macro.
RET | EF00
3 words are generated, and this word acts as a macro.
RETCD NEQ | FF08
MAR *- | 8B90
LACC * | 1080
The following words are useful in HOST mode. These words do NOT compile code into the target, but they interact with it.
DM@ ( a --- w )
Read the 16-bit value at the given target data address.
PM@ ( a --- w )
Read the 16-bit value at the given target program address.
DM! ( w a --- )
Write a 16-bit value at the given target data address.
PM! ( w a --- )
Write a 16-bit value at the given target program address.
RUN ( a --- )
Run the code on the target starting at the given address. The code must end with an RTE.
The following table summarizes the size of what is compiled:
Instruction | Size (words) | Comment
literal (8-bit, 0 to 255) | 2
literal (16-bit, n < 0 or n > 255) | 1+2
creation of an integer variable | 1 | No code is generated.
invocation of an integer variable | 2
! | 1+1+1
@ | 1+1+1
DEPTH | 1+1+2 | Call to subroutine.
DUP | 1 | Macro.
2DUP | 4
OVER | 1+1+1
ABOVE | 1+1+1+1 | Call to subroutine.
DROP | 1+1
UNDER (alias: NIP) | 1+1
BELOW | 1+1+1+1
SWAP | 1+1+1+1
ROT | 1+1+1+1+1+1 | Call to subroutine.
-ROT | 1+1+1+1+1+1 | Call to subroutine.
NEGATE | 1 | Macro.
+ | 1+1
- | 1+1+1
* | 1+1+1+1
Definition | Stack | Description
sp! | ( ... --- ) | Reset the parameter stack, i.e. discard all values from the stack.
depth | ( --- n ) | Return the number of values on the stack (before depth is called).
drop | ( n --- ) | Discard the value on top of the stack.
dup | ( n --- n n ) | Duplicate the value on top of the stack.
over | ( n1 n2 --- n1 n2 n1 ) | Copy the second value onto the top of the stack.
above, pluck | ( n1 n2 n3 --- n1 n2 n3 n1 ) | Copy the third value onto the top of the stack.
pick | ( n1 --- n2 ) | Copy the n1-th value under the top of the stack onto the top.
swap | ( n1 n2 --- n2 n1 ) | Swap the two values on top of the stack.
under, nip | ( n1 n2 --- n2 ) | Discard the second value on the stack.
below | ( n1 n2 n3 --- n2 n3 ) | Discard the third value on the stack.
rot | ( n1 n2 n3 --- n2 n3 n1 ) | Rotate the 3 values on top of the stack (the third value moves to the top).
-rot | ( n1 n2 n3 --- n3 n1 n2 ) | Rotate the 3 values on top of the stack the other way (the top value moves to third position).
tuck | ( n1 n2 --- n2 n1 n2 ) | Equivalent to swap followed by over.
ddrop, 2drop | ( n1 n2 --- ) | Discard the 2 values on top of the stack.
ddup, 2dup | ( n1 n2 --- n1 n2 n1 n2 ) | Duplicate the 2 values on top of the stack.
Definition | Stack | Operation | Description
negate | ( n1 --- n2 ) | n2 = -n1 | Negate the top stack value.
+ | ( n1 n2 --- n3 ) | n3 = n1+n2 | Add the two values on top of the stack.
- | ( n1 n2 --- n3 ) | n3 = n1-n2 | Subtract the two values on top of the stack.
* | ( n1 n2 --- n3 ) | n3 = n1*n2 | Multiply the two values on top of the stack.
/ | ( n1 n2 --- n3 ) | n3 = n1/n2 | Integer division of the two values on top of the stack.
/mod | ( n1 n2 --- n3 n4 ) | n3 = n1%n2, n4 = n1/n2 | Integer division returning the remainder (n3) and the quotient (n4).
mod | ( n1 n2 --- n3 ) | n3 = n1%n2 | Modulo operation.
rnd | ( n1 n2 --- n3 ) | n3 in [n1..n2] | Return a random number between n1 and n2.
Definition | Stack | Operation | Description
true | ( --- tf ) | | Push the logical true value on the stack.
false | ( --- ff ) | | Push the logical false value on the stack.
==, = | ( n1 n2 --- f ) | f = (n1==n2) ? TRUE : FALSE | Return a logical value which is true if the given condition holds.
!=, <> | ( n1 n2 --- f ) | f = (n1!=n2) ? TRUE : FALSE | Return a logical value which is true if the given condition holds.
not, 0==, 0= | ( f1 --- f2 ) | f2 = (f1) ? FALSE : TRUE | Return the logically inverted flag.
< | ( n1 n2 --- f ) | f = (n1<n2) ? TRUE : FALSE | Return a logical value which is true if the given condition holds.
<= | ( n1 n2 --- f ) | f = (n1<=n2) ? TRUE : FALSE | Return a logical value which is true if the given condition holds.
> | ( n1 n2 --- f ) | f = (n1>n2) ? TRUE : FALSE | Return a logical value which is true if the given condition holds.
>= | ( n1 n2 --- f ) | f = (n1>=n2) ? TRUE : FALSE | Return a logical value which is true if the given condition holds.
U< | ( u1 u2 --- f ) | f = (u1<u2) ? TRUE : FALSE | Return a logical value which is true if the given (unsigned) condition holds.
U<= | ( u1 u2 --- f ) | f = (u1<=u2) ? TRUE : FALSE | Return a logical value which is true if the given (unsigned) condition holds.
U> | ( u1 u2 --- f ) | f = (u1>u2) ? TRUE : FALSE | Return a logical value which is true if the given (unsigned) condition holds.
U>= | ( u1 u2 --- f ) | f = (u1>=u2) ? TRUE : FALSE | Return a logical value which is true if the given (unsigned) condition holds.
and | ( n1 n2 --- n3 ) | n3 = n1 & n2 | Perform a bitwise AND between the 2 top stack values.
or | ( n1 n2 --- n3 ) | n3 = n1 OR n2 (bitwise) | Perform a bitwise OR between the 2 top stack values.
xor | ( n1 n2 --- n3 ) | n3 = n1 ^ n2 | Perform a bitwise XOR between the 2 top stack values.
shl | ( n1 n2 --- n3 ) | n3 = n1 << n2 | Perform a logical left shift.
shr | ( n1 n2 --- n3 ) | n3 = n1 >> n2 | Perform a logical right shift.
: min 2dup < IF drop ELSE under ENDIF ;
Forth | Addr | Instruction | Comment
2dup | 0A0F | CALL 2dup | Call to subroutine
< | 0A11 | CALL < | Call to subroutine
IF | 0A13 | BCNDD 0A1B,EQ,UNC | (0BRANCH)
| 0A15 | MAR *-
| 0A16 | LACC *
DROP | 0A17 | MAR *-
| 0A18 | LACC *
ELSE | 0A19 | B 0A1C | (BRANCH)
UNDER | 0A1A | MAR *-
; | 0A1B | RET | End of definition
int cpt
: test
cpt 100 !
0 BEGIN
1 +
dup 10 < ?CONTINUE
dup 50 == ?BREAK
5 +!
AGAIN drop
@ ;
We will assume in this example that the data cell allocated for the variable cpt is at address $1A00 (data address allocated at compile time):
Forth | Addr | Instruction | Comment
cpt | 0A00 | LAR AR6,#1A00h | Load the address register (luckily, ARP is not modified!)
100 | 0A02 | SACL *+ | Push a literal on the stack
| 0A03 | LACC #100
! | 0A04 | MAR *-,AR6
| 0A05 | SACL *,AR7
| 0A06 | LACC *
0 | 0A07 | SACL *+ | Push a literal on the stack
| 0A08 | LACC #0
1 + | 0A09 | ADD #1 | Top of the loop (BEGIN - AGAIN); optimisation of the sequence "1 +"
DUP | 0A0A | SACL *+
10 < | 0A0B | SUB #10 | Optimisation of the sequence "10 <"
| 0A0C | CALL lit<
?CONTINUE | 0A0E | BCNDD 0A09,NEQ | Optimized conditional branch (1BRANCH)
| 0A10 | MAR *-
| 0A11 | LACC *
DUP | 0A12 | SACL *+
50 == | 0A13 | SUB #50 | Optimisation of the sequence "50 =="
| 0A14 | CALL lit==
?BREAK | 0A16 | BCNDD 0A20,NEQ | Optimized conditional branch (1BRANCH)
| 0A18 | MAR *-
| 0A19 | LACC *
5 | 0A1A | SACL *+ | Push a literal on the stack
| 0A1B | LACC #5
+! | 0A1C | CALL +!
AGAIN | 0A1E | B 0A09 | Bottom of the loop (BEGIN - AGAIN)
DROP | 0A20 | MAR *-
| 0A21 | LACC *
@ | 0A22 | SACL *+
| 0A23 | MAR *,AR6
| 0A24 | LACC *,AR7
; | 0A25 | RET | End of definition
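As a cross-check of the listing, the behaviour of test can be modelled on the host. The Python sketch below is hypothetical and follows the compiled code rather than a formal definition of ?CONTINUE and ?BREAK: ?CONTINUE branches back to BEGIN (0A09) when its flag is true, ?BREAK jumps past AGAIN (0A20) when its flag is true, and AR6 keeps pointing at cpt, so the final @ reads cpt. Under these assumptions test returns 300: the initial 100, plus 5 for each counter value from 10 to 49.

```python
def test_word():
    """Host-side model of `test` (assumes the branch semantics in the listing)."""
    cpt = 100                # cpt 100 !
    counter = 0              # 0
    while True:              # BEGIN
        counter += 1         # 1 +
        if counter < 10:     # dup 10 < ?CONTINUE
            continue
        if counter == 50:    # dup 50 == ?BREAK
            break
        cpt += 5             # 5 +!  (A still points at cpt)
    return cpt               # AGAIN drop @


print(test_word())
```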
Here is some work I plan to do in the near future:
· finish coding the control structures (WHILE-REPEAT, CASE, DO-LOOP, etc.)
· code words such as AND, OR, XOR, SHL, SHR, etc.
· optimize the generated code (for example, use a delayed return at the end of a subroutine).
I'm still working on this project, and there is plenty of work to do, although all the concepts are in place. I would love to get some feedback. For example, is there a better Forth kernel design for the C50? I'm not sure I've chosen the smallest/fastest instructions to implement the stack machine.