Forth for the TMS320C50 DSP

This chapter presents a Forth kernel and a Forth Cross-Compiler for the TMS320C5x DSP.

Last update:	06-February-1999
State (estimation)	55%

This chapter includes the following sections:

What do you need to use this Cross-Compiler?
Generalities
Registers used
The Parameters Stack
Implementation
Words to be used in HOST mode
Size/Speed of the generated code
Glossary
Integer Arithmetic
Logical
First example
Second example
Some snapshots
In the Pipe-Line
Feedback

What do you need to use this Cross-Compiler?

To use this Cross-Compiler, you need:

· a PC running either Windows 95/98/NT or QNX/Photon, with a free serial port (either com1: or com2:).

· the Cross-Compiler software

· a TMS320C5x DSP starting kit (C50 evaluation card)
http://www.ti.com/

· a serial cable as link between the DSK and the PC.

You can't use the Cross-Compiler without the C50 card: the Cross-Compiler always generates code into the target (although it should not be a big deal to modify the CC to generate into the host memory, with a kind of paging functions which never write...). If the umbilical link is broken, the CC fails and is not able to generate code.

The Cross-Compiler software is provided as wfroth (Forth) source. It must be compiled with the Forth running on the host machine, under either Windows or QNX/Photon. This can be done with the following wfroth commands:

" /wfroth/ti_c50/fcc_c50.wft" !curr_file
COM1: load

You'll find it convenient to put these commands into the file startup.wft, which wfroth opens and interprets after its initialization. COM1: should be replaced by COM2: if the umbilical link is on com2. Note that the serial port is set to 57,600 baud.

Generalities

I chose to implement a 'direct call' Forth model: each call to a definition is translated to the native DSP instructions CALL/RTS. Also, some of the basic Forth definitions (DUP or DROP, for example) are translated into only one or two native instructions and act as macros. The cross-compiler also tries to use delayed instructions (such as delayed branch, call, rts) to save cycles by making use of the pipeline.

Memory Map
0000	ROM, interrupt vectors, monitor
0A00	Start of user's forth code
1A00	Start of user's forth data
2A00	Run-time (words defined as subroutine, not macros)
2BB0	Stub code (used to execute user code)
2BD0	Forth Parameters Stack
2B20	Forth definitions as Subroutines (vs. Macros)
2C00

· The Parameters Stack is 32 words long and begins at address $2BD0 (data).

· User generated code begins at address $0A00 (code). Size is $1A00 - $A00 = $1000 (4K)

· Variables begin at address $1A00 (data). Size is $2A00 - $1A00 = $1000 (4K)
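The region sizes quoted above follow directly from the memory map. A minimal Python sketch of that arithmetic (the constant names are illustrative and are not taken from the Cross-Compiler source):

```python
# Addresses copied from the memory map; the names are hypothetical.
CODE_START = 0x0A00     # start of user's Forth code (program space)
DATA_START = 0x1A00     # start of user's Forth data (data space)
RUNTIME_START = 0x2A00  # run-time words defined as subroutines
STACK_BASE = 0x2BD0     # Forth Parameters Stack
STACK_WORDS = 32        # stack length in 16-bit words

code_size = DATA_START - CODE_START      # room for user code
data_size = RUNTIME_START - DATA_START   # room for variables
stack_end = STACK_BASE + STACK_WORDS     # first address past the stack

print(hex(code_size), hex(data_size), hex(stack_end))
```

Both user regions come out at $1000 words (4K), and a 32-word stack starting at $2BD0 ends just below $2BF0.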

The cross-compiler runs under a Forth (wfroth) on the host machine. Because the CC takes care of all definitions related to the compilation process (handling of the dictionary, code generation, etc.), there is only a 'minimal' Forth in the target, that is, only words such as +, DROP, SWAP, @, etc. The interpreter of the CC knows two modes:

· host mode (HOST)

· target mode (TARGET)

In host mode, the CC does not try to generate code into the target. For example, if we enter a number, the code sequence that pushes this literal value on the stack is generated for the host. We can also interact with the target through words such as dm@, pm@, dm!, pm!, run, etc.

In target mode, the CC tries to generate code into the target.

The CC changes the way it displays the number of elements currently on the parameters stack:

· in host mode: (2) for example means that we have currently 2 values on the host stack

· in target mode: {3} for example means that we have currently 3 values on the target stack.
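As an illustration of the two display conventions, here is a small Python sketch. The function name and the mode strings are assumptions made for this example; the real CC is written in wfroth and may format its prompt differently.

```python
def depth_prompt(mode: str, depth: int) -> str:
    """Format the stack-depth indicator: (n) in host mode, {n} in target mode."""
    if mode == "host":
        return f"({depth})"
    if mode == "target":
        return f"{{{depth}}}"   # doubled braces produce literal { and }
    raise ValueError(f"unknown mode: {mode!r}")

print(depth_prompt("host", 2), depth_prompt("target", 3))
```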

Many words are redefined by the CC to change their behavior depending on the host/target mode. This is the case, for example, for words such as .S : ; int dup drop etc.

Registers used

I have chosen a very simple model for the target Forth:

C50 register	Forth kernel
AR7	Parameters Stack pointer
ACC	Top Parameters Stack Value (cached)
AR6	Address Register

That means that registers AR6, AR7 and ACC must not be changed by other code (or must be saved and restored). Also, some words use AR3, AR4 and AR5 as temporaries.

The Parameters Stack

The top stack value is cached into the Accumulator (ACC) register. Let's take an example. We have 4 values stacked on the Parameters Stack ( --- 10 20 30 40 ):

2BD0	2BD1	2BD2	2BD3
	10	20	30	AR7=2BD4
				ACC=40

The Stack Pointer (AR7) points to the next empty cell of the stack. This choice favors pushing a new value on the stack, which takes a single instruction (SACL *+).
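The cached-top layout can be modelled in a few lines of Python. This is only a toy model under stated assumptions: the class and method names are invented for the example, SP! is taken to load AR7 with $2BD0, and a push is taken to be SACL *+ followed by a load of ACC.

```python
SP0 = 0x2BD0  # stack base, from the memory map

class ParamStack:
    """Toy model: AR7 is the stack pointer, ACC caches the top value."""

    def __init__(self):
        self.mem = {}     # data memory: address -> 16-bit value
        self.ar7 = SP0    # effect of SP! (LAR AR7,#SP0)
        self.acc = 0      # stale ACC content, not a stack value yet

    def push(self, n):
        # SACL *+ : spill the old ACC to memory, post-increment AR7
        self.mem[self.ar7] = self.acc & 0xFFFF
        self.ar7 += 1
        # LACL/LACC #n : the new top of stack lives in ACC
        self.acc = n

    def drop(self):
        # MAR *- ; LACC * : step back and reload the cached top
        self.ar7 -= 1
        self.acc = self.mem[self.ar7]

s = ParamStack()
for n in (10, 20, 30, 40):
    s.push(n)
print(hex(s.ar7), s.acc, [s.mem[a] for a in (0x2BD1, 0x2BD2, 0x2BD3)])
```

In this model, pushing 10 20 30 40 leaves AR7 at $2BD4 and 40 in ACC, with 10, 20 and 30 in cells $2BD1 to $2BD3. The first cell, $2BD0, receives whatever ACC held before the first push.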

Implementation

Stack Operations

LITERAL (16-bits literal value)

There are 2 cases, depending on the size of the numeric constant.

Case where 0 <= n <= 255

SACL *+	90A0
LACL #imm	B9XX

Case where n < 0 or n > 255

SACL *+	90A0
LACC #imm	BF80 XXXX

Even if LIT is 3 words, LITERAL acts as a macro. Note that the Cross-Compiler can remove the last compiled literal when it optimizes (example: 1000 >).
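The choice between the two encodings can be sketched as follows. The function name is hypothetical; the opcode values are the ones listed in the two cases above.

```python
def encode_literal(n: int) -> list:
    """Return the opcode words that push the 16-bit literal n."""
    sacl_postinc = 0x90A0                          # SACL *+
    if 0 <= n <= 255:
        return [sacl_postinc, 0xB900 | n]          # LACL #imm -> B9XX
    return [sacl_postinc, 0xBF80, n & 0xFFFF]      # LACC #imm -> BF80 XXXX

for value in (5, 255, 256, -1):
    print(value, [f"{w:04X}" for w in encode_literal(value)])
```

Short literals take 2 words and all other values take 3, which matches the sizes given in the size/speed table.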

SP!

( ... --- )

SP! is only one word and compiles the following code:

LAR AR7,#SP0	BF0F 2BD0

DEPTH

( --- n )

DEPTH is 3 words and is compiled as a call to the following subroutine:

SACL *+	90A0
LAMM AR7	0817
RETD	FF00
SUB #S0-1	BFA0 2BCF

DUP

( n --- n n )

DUP is only one word and compiles the following code:

SACL *+	90A0

DDUP

( n1 n2 --- n1 n2 n1 n2 )

Alias: 2DUP

2DUP is 4 words and is compiled as a call to the following subroutine:

SACL *-	9090
LAR AR5,*+	05A0
RETD	FF00
MAR *+	8BA0
SAR AR5,*+	85A0

OVER

( n1 n2 --- n1 n2 n1 )

OVER is 3 words and compiles the following code:

SACL *-	9090
LACC *+	10A0
MAR *+	8BA0

ABOVE

( n1 n2 n3 --- n1 n2 n3 n1 )

Alias: PLUCK

ABOVE is 5 words and is compiled as a call to the following subroutine:

SACL *-	9090
MAR *-	8B90
LACC *+	10A0
RETD	FF00
MAR *+	8BA0
MAR *+	8BA0

DROP

( n --- )

DROP is 2 words and compiles the following code:

MAR *-	8B90
LACC *	1080

UNDER

( n1 n2 --- n2 )

Alias: NIP

UNDER is 1 word and compiles the following code:

MAR *-	8B90

BELOW

( n1 n2 n3 --- n2 n3 )

BELOW is 4 words and is compiled as a call to the following subroutine:

MAR *-	8B90
LAR AR5,*-	0590
RETD	FF00
SAR AR5,*+	85A0
SACL *	9080

SWAP

( n1 n2 --- n2 n1 )

SWAP is 4 words and is compiled as a call to the following subroutine:

MAR *-	8B90
LAR AR4,*	0480
RETD	FF00
SACL *+	90A0
LAMM AR4	0814

ROT

( n1 n2 n3 --- n2 n3 n1 )

ROT is 6 words and is compiled as a call to the following subroutine:

MAR *-	8B90
LAR AR5,*-	0590
LAR AR4,*	0480
SAR AR5,*+	85A0
RETD	FF00
SACL *+	90A0
LAMM AR4	0814

-ROT

( n1 n2 n3 --- n3 n1 n2 )

-ROT is 6 words and is compiled as a call to the following subroutine:

MAR *-	8B90
LAR AR5,*-	0590
LAR AR4,*	0480
SACL *+	90A0
RETD	FF00
SAR AR4,*+	84A0
LAMM AR5	0815
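The stack effects of these listings can be checked with a very small simulator. The Python sketch below is an illustration only: it models just the instructions used above, assumes ARP always selects AR7, ignores word width, and leaves out RETD (its two delay-slot words run before the return, so the listings are given in execution order). The tuple notation for instructions is invented for this example.

```python
def run(program, stack):
    """Execute a listing on a toy machine; `stack` is given bottom-to-top."""
    base = 0x2BD0
    mem = {base + i: v for i, v in enumerate(stack[:-1])}
    ar = {4: 0, 5: 0, 7: base + len(stack) - 1}   # AR7 -> next empty cell
    acc = stack[-1]                                # top of stack cached in ACC

    def step(mode):
        # Post-modification of AR7 by the indirect addressing mode.
        ar[7] += {"*": 0, "*+": 1, "*-": -1}[mode]

    for op, *args in program:
        if op == "MAR":        # modify AR7 only
            step(args[0])
        elif op == "SACL":     # store ACC at [AR7]
            mem[ar[7]] = acc
            step(args[0])
        elif op == "LACC":     # load ACC from [AR7]
            acc = mem[ar[7]]
            step(args[0])
        elif op == "LAR":      # load ARn from [AR7]
            ar[args[0]] = mem[ar[7]]
            step(args[1])
        elif op == "SAR":      # store ARn at [AR7]
            mem[ar[7]] = ar[args[0]]
            step(args[1])
        elif op == "LAMM":     # load ACC from ARn
            acc = ar[args[0]]
        else:
            raise ValueError(op)
    return [mem[a] for a in range(base, ar[7])] + [acc]

SWAP = [("MAR", "*-"), ("LAR", 4, "*"), ("SACL", "*+"), ("LAMM", 4)]
ROT = [("MAR", "*-"), ("LAR", 5, "*-"), ("LAR", 4, "*"),
       ("SAR", 5, "*+"), ("SACL", "*+"), ("LAMM", 4)]
NROT = [("MAR", "*-"), ("LAR", 5, "*-"), ("LAR", 4, "*"),
        ("SACL", "*+"), ("SAR", 4, "*+"), ("LAMM", 5)]
OVER = [("SACL", "*-"), ("LACC", "*+"), ("MAR", "*+")]

print(run(SWAP, [1, 2]), run(ROT, [1, 2, 3]), run(NROT, [1, 2, 3]), run(OVER, [1, 2]))
```

Under these assumptions, each listing produces the stack effect given in its stack comment.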

Arithmetic/Logical operations

NEGATE

( n1 --- n2 )

n2 = -n1

NEGATE is 1 word and compiles the following code:

NEG	BE02

+

( n1 n2 --- n3 )

n3 = n1 + n2

There are 2 versions of the + word. The cross-compiler chooses the right version depending on the context. The goal is to optimize the code. All versions act as a macro:

+ (use the two values from the stack, no optimisation)

MAR *-	8B90
ADD *	2080

LIT+ (optimisation, remove the last LIT generated by the CC)

There are two versions, depending on the size of the constant:

ADD #XX	B8XX	XX is an 8-bit value
ADD #XXXX	BF90 XXXX	XXXX is a 16-bit value
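The literal-folding optimisation amounts to a peephole rule on the most recently compiled code. The Python sketch below shows the idea only. The representation of compiled code as (mnemonic, operand) pairs and the "LIT" marker are assumptions made for this example, and the choice between the short and long ADD forms is not modelled.

```python
def compile_plus(code):
    """Append code for `+` to `code`, folding a preceding literal if any.

    `code` is a list of (mnemonic, operand) pairs. A literal push is
    assumed to be recorded as ("SACL", "*+") followed by ("LIT", n).
    """
    if len(code) >= 2 and code[-1][0] == "LIT" and code[-2] == ("SACL", "*+"):
        n = code[-1][1]
        del code[-2:]                    # remove the last compiled literal
        code.append(("ADD", f"#{n}"))    # LIT+ : ADD #XX or ADD #XXXX
    else:
        code.append(("MAR", "*-"))       # generic + : pop, then add
        code.append(("ADD", "*"))
    return code

print(compile_plus([("SACL", "*+"), ("LIT", 1)]))
print(compile_plus([("SACL", "*+")]))
```

With the fold, the sequence "1 +" shrinks from the literal push plus two words to a single ADD #1.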

-

( n1 n2 --- n3 )

n3 = n1 - n2

There are 2 versions of the - word. The cross-compiler chooses the right version depending on the context. The goal is to optimize the code. All versions act as a macro:

- (use the two values from the stack, no optimisation)

MAR *-	8B90
SUB *	3080
NEG	BE02

LIT- (optimisation, remove the last LIT generated by the CC)

There are two versions, depending on the size of the constant:

SUB #XX	BAXX	XX is an 8-bit value
SUB #XXXX	BFA0 XXXX	XXXX is a 16-bit value

*

( n1 n2 --- n3 )

n3 = n1 * n2

* is 4 words and is compiled as a call to the following subroutine:

SACL *-	9090
LT *+	73A0
MPY *-	5490
PAC	BE03

==	( n1 n2 --- f )	Alias: =	f = (n1==n2) ? TRUE : FALSE
!=	( n1 n2 --- f )	Alias: <>	f = (n1!=n2) ? TRUE : FALSE
<	( n1 n2 --- f )		f = (n1<n2) ? TRUE : FALSE
<=	( n1 n2 --- f )		f = (n1<=n2) ? TRUE : FALSE
>	( n1 n2 --- f )		f = (n1>n2) ? TRUE : FALSE
>=	( n1 n2 --- f )		f = (n1>=n2) ? TRUE : FALSE

There are 3 versions of each of these words. The cross-compiler chooses the right version depending on the context. All versions act as a call to a subroutine:

Previously compiled literal i8, with 0 <= i8 <= 255

SUB #i8	BAXX
CALL lit==	7A80 XXXX

Previously compiled literal i16, with i16 < 0 or i16 > 255

CALLD lit==	7E80 XXXX
SUB #i16	BFA0 XXXX

No previously compiled literal

CALL ==	7A80 XXXX

==	!=	<	<=	>	>=
MAR *-	MAR *-	MAR *-	MAR *-	MAR *-	MAR *-
SUB *	SUB *	SUB *	SUB *	SUB *	SUB *
NOP	NOP	NOP	NOP	NOP	NOP
XC 2,NEQ	XC 2,EQ	XC 2,LEQ	XC 2,LT	XC 2,GEQ	XC 2,GT
LACC #0	LACC #0	LACC #0	LACC #0	LACC #0	LACC #0
RET	RET	RET	RET	RET	RET
RETD	RETD	RETD	RETD	RETD	RETD
LACC #-1	LACC #-1	LACC #-1	LACC #-1	LACC #-1	LACC #-1

lit==	lit!=	lit<	lit<=	lit>	lit>=
NOP	NOP	NOP	NOP	NOP	NOP
XC 2,NEQ	XC 2,EQ	XC 2,GEQ	XC 2,GT	XC 2,LEQ	XC 2,LT
LACC #0	LACC #0	LACC #0	LACC #0	LACC #0	LACC #0
RET	RET	RET	RET	RET	RET
RETD	RETD	RETD	RETD	RETD	RETD
LACC #-1	LACC #-1	LACC #-1	LACC #-1	LACC #-1	LACC #-1
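The two tables differ only in the sign of the difference held in ACC: the generic words compute n2 - n1 (top of stack minus the value below it), while the literal versions compute n1 - literal. The Python sketch below restates the XC conditions from both tables and checks them against ordinary comparisons. It is a model under assumptions: TRUE is taken as -1 and FALSE as 0, as in the listings, and overflow of the subtraction is ignored.

```python
# Condition on ACC under which the XC block returns FALSE, keyed by word.
FALSE_WHEN_GENERIC = {          # ACC = n2 - n1
    "==": lambda d: d != 0, "!=": lambda d: d == 0,
    "<":  lambda d: d <= 0, "<=": lambda d: d < 0,
    ">":  lambda d: d >= 0, ">=": lambda d: d > 0,
}
FALSE_WHEN_LIT = {              # ACC = n1 - literal
    "==": lambda d: d != 0, "!=": lambda d: d == 0,
    "<":  lambda d: d >= 0, "<=": lambda d: d > 0,
    ">":  lambda d: d <= 0, ">=": lambda d: d < 0,
}
TRUE, FALSE = -1, 0

def compare(word, n1, n2):
    """Generic version: MAR *- ; SUB * leaves n2 - n1 in ACC."""
    return FALSE if FALSE_WHEN_GENERIC[word](n2 - n1) else TRUE

def compare_lit(word, n1, lit):
    """Literal version: SUB #lit leaves n1 - lit in ACC."""
    return FALSE if FALSE_WHEN_LIT[word](n1 - lit) else TRUE

print(compare("<", 1, 2), compare_lit("<", 5, 10), compare(">=", 1, 2))
```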

Memory Operations

LIT!
LIT!+
LIT!-

This is an optimisation of the sequence "constant !". 3 words are generated, and this word acts as a macro.

LIT!		LIT!+		LIT!-
MAR *,AR6	8B8E	MAR *,AR6	8B8E	MAR *,AR6	8B8E
SPLK #XXXX,*,AR7	AE8F XXXX	SPLK #XXXX,*+,AR7	AEAF XXXX	SPLK #XXXX,*-,AR7	AE9F XXXX

!
!+
!-

3 words are generated, and this word acts as a macro.

!		!+		!-
MAR *-,AR6	8B9E	MAR *-,AR6	8B9E	MAR *-,AR6	8B9E
SACL *,AR7	908F	SACL *+,AR7	90AF	SACL *-,AR7	909F
LACC *	1080	LACC *+	10A0	LACC *+	10A0

@
@+
@-

3 words are generated, and this word acts as a macro.

@		@+		@-
SACL *+	90A0	SACL *+	90A0	SACL *+	90A0
MAR *,AR6	8B8E	MAR *,AR6	8B8E	MAR *,AR6	8B8E
LACC *,AR7	108F	LACC *+,AR7	10AF	LACC *-,AR7	109F

A>

The implementation of A> is 2 words and acts as a Macro.

SACL *+	90A0
LAMM AR6	0816

+!
-!

The implementations of +! and -! are each 4 words and are compiled as a subroutine call.

+!		-!
MAR *,AR6	8B8E	MAR *,AR6	8B8E
ADD *	2080	SUB *	3080
RETD	FF00	RETD	FF00
SACL *,AR7	908F	SACL *,AR7	908F
LACC *	1080	LACC *	1080

Control Structures

(branch)

1 word is generated, and this word acts as a macro.

B addr	7980 addr

In some situations (when the next 2 words after the branch are 2 single-word instructions or 1 double-word instruction, and are not a branch or an rts), the branch instruction B is replaced by a delayed branch instruction BD to save 2 cycles.

BD addr	7D80 addr

(0branch)

4 words are generated, and this word acts as a macro.

BCNDD addr,EQ,UNC	F388 addr
MAR *-	8B90
LACC *	1080

(1branch)

4 words are generated, and this word acts as a macro.

BCNDD addr,NEQ,UNC	F388 addr
MAR *-	8B90
LACC *	1080

EXIT

1 word is generated, and this word acts as a macro.

RET	EF00

?EXIT

3 words are generated, and this word acts as a macro.

RETCD NEQ	FF08
MAR *-	8B90
LACC *	1080

Words to be used in HOST mode

The following words are useful in HOST mode. These words do NOT compile code into the target, but they do interact with it.

DM@

( a --- w )

Read the 16-bit value at the given target data address.

PM@

( a --- w )

Read the 16-bit value at the given target program address.

DM!

( w a --- )

Write a 16-bit value at the given target data address.

PM!

( w a --- )

Write a 16-bit value at the given target program address.

RUN

( a --- )

Run the code on the target, starting at the given address. The code must end with an RTE.

The next words are useful to see what is compiled:

Size/Speed of the generated code

Instruction	Size	Speed	Comment
literal (8-bit, 0-255)	2
literal (16-bit, n < 0 or n > 255)	1+2
creation of an integer variable	1 (data)		No code is generated.
invocation of an integer variable	2
!	1+1+1
@	1+1+1
DEPTH	1+1+2		Call to subroutine.
DUP	1		Macro.
2DUP	4
OVER	1+1+1
ABOVE	1+1+1+1		Call to subroutine.
DROP	1+1
UNDER (alias: NIP)	1+1
BELOW	1+1+1+1
SWAP	1+1+1+1
ROT	1+1+1+1+1+1		Call to subroutine.
-ROT	1+1+1+1+1+1		Call to subroutine.
NEGATE	1		Macro.
+	1+1
-	1+1+1
*	1+1+1+1

Glossary

Stack Operations

Definition		Stack	Description
sp!		( ... --- )	Reset the parameters stack, i.e. discard all values from the stack.
depth		( --- n )	Return the number of parameters on the stack (prior to call depth).
drop		( n --- )	Discard the value from the top of the stack.
dup		( n --- n n )	Duplicate the value on the top of the stack.
over		( n1 n2 --- n1 n2 n1 )	Duplicate the value under the top of the stack.
above	pluck	( n1 n2 n3 --- n1 n2 n3 n1 )	Duplicate the value under-under the top of the stack.
pick		( n1 --- n2 )	Duplicate the n-th value under the top of the stack.
swap		( n1 n2 --- n2 n1 )	Swap (permute) the two values on top of the stack.
under	nip	( n1 n2 --- n2 )	Discard the value under the top of the stack.
below		( n1 n2 n3 --- n2 n3 )	Discard the value under-under the top of the stack.
rot		( n1 n2 n3 --- n2 n3 n1 )	Rotate the 3 values on top of the stack.
-rot		( n1 n2 n3 --- n3 n1 n2 )	Rotate the 3 values on top of the stack.
tuck		( n1 n2 --- n2 n1 n2 )	Equivalent to swap followed by over.
ddrop	2drop	( n1 n2 --- )	Discard the 2 values from the top of the stack.
ddup	2dup	( n1 n2 --- n1 n2 n1 n2 )	Duplicate the 2 values on the top of the stack.

Integer Arithmetic

Definition	Stack	Operation	Description
negate	( n1 --- n2 )	n2 = -n1	Negate the top stack value.
+	( n1 n2 --- n3 )	n3 = n1+n2	Add the values on top of stack.
-	( n1 n2 --- n3 )	n3 = n1-n2	Sub the values on top of stack.
*	( n1 n2 --- n3 )	n3 = n1*n2	Multiply the values on top of stack.
/	( n1 n2 --- n3 )	n3 = n1/n2	Integer division of the values on top of stack.
/mod	( n1 n2 --- n3 n4 )	n3 = n1%n2 n4 = n1/n2	Return the remainder and the quotient of the integer division.
mod	( n1 n2 --- n3 )	n3 = n1%n2	Modulo operation.
rnd	( n1 n2 --- n3 )	n3 = [n1..n2]	Return a random number between n1 and n2.

Logical

Definition			Stack	Operation	Description
true			( --- tf )		Push logical true value on stack.
false			( --- ff )		Push logical false value on stack.
==	=		( n1 n2 --- f )	f = (n1==n2) ? TRUE : FALSE	Return a logical value which is true if the given condition holds for the two top stack values.
!=	<>		( n1 n2 --- f )	f = (n1!=n2) ? TRUE : FALSE	Return a logical value which is true if the given condition holds for the two top stack values.
not	0==	0=	( f1 --- f2 )	f2 = (f1) ? FALSE : TRUE	Return a boolean flag which is the logical inverse of the given flag.
<			( n1 n2 --- f )	f = (n1<n2) ? TRUE : FALSE	Return a logical value which is true if the given condition holds for the two top stack values.
<=			( n1 n2 --- f )	f = (n1<=n2) ? TRUE : FALSE	Return a logical value which is true if the given condition holds for the two top stack values.
>			( n1 n2 --- f )	f = (n1>n2) ? TRUE : FALSE	Return a logical value which is true if the given condition holds for the two top stack values.
>=			( n1 n2 --- f )	f = (n1>=n2) ? TRUE : FALSE	Return a logical value which is true if the given condition holds for the two top stack values.
U<			( u1 u2 --- f )	f = (u1<u2) ? TRUE : FALSE	Return a logical value which is true if the given condition holds for the two top stack values (unsigned comparison).
U<=			( u1 u2 --- f )	f = (u1<=u2) ? TRUE : FALSE	Return a logical value which is true if the given condition holds for the two top stack values (unsigned comparison).
U>			( u1 u2 --- f )	f = (u1>u2) ? TRUE : FALSE	Return a logical value which is true if the given condition holds for the two top stack values (unsigned comparison).
U>=			( u1 u2 --- f )	f = (u1>=u2) ? TRUE : FALSE	Return a logical value which is true if the given condition holds for the two top stack values (unsigned comparison).
and			( n1 n2 --- n3 )	n3 = n1 & n2	Perform a bitwise AND between the 2 top stack values.
or			( n1 n2 --- n3 )	n3 = n1 | n2	Perform a bitwise OR between the 2 top stack values.
xor			( n1 n2 --- n3 )	n3 = n1 ^ n2	Perform a bitwise XOR between the 2 top stack values.
shl			( n1 n2 --- n3 )	n3 = n1 << n2	Perform a logical left shift of n1 by n2 bits.
shr			( n1 n2 --- n3 )	n3 = n1 >> n2	Perform a logical right shift of n1 by n2 bits.

First example

: min 2dup < IF drop ELSE under ENDIF ;

Forth	Addr	Instruction	Comment
2dup	0A0F	CALL 2dup	Call to subroutine
<	0A11	CALL <	Call to subroutine
IF	0A13	BCNDD 0A1B,EQ,UNC	(0BRANCH)
	0A15	MAR *-
	0A16	LACC *
DROP	0A17	MAR *-
DROP	0A18	LACC *
ELSE	0A19	B 0A1C	(BRANCH)
UNDER	0A1B	MAR *-
;	0A1C	RET	End of definition
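For reference, the intended behaviour of min can be followed word by word on an ordinary Python list. This is a sketch of the Forth semantics only, not of the generated DSP code, and the function name is chosen for this example.

```python
def forth_min(stack):
    """Model of `: min 2dup < IF drop ELSE under ENDIF ;` on a Python list."""
    n1, n2 = stack[-2], stack[-1]
    stack += [n1, n2]                      # 2dup  ( n1 n2 -- n1 n2 n1 n2 )
    b, a = stack.pop(), stack.pop()
    stack.append(-1 if a < b else 0)       # <     ( n1 n2 -- f )
    if stack.pop() != 0:                   # IF consumes the flag
        stack.pop()                        # drop  ( n1 n2 -- n1 )
    else:
        del stack[-2]                      # under ( n1 n2 -- n2 )
    return stack

print(forth_min([3, 7]), forth_min([9, 4]))
```

When n1 < n2, the top value is dropped and n1 remains; otherwise under (nip) removes n1 and n2 remains.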

Second example

int cpt
: test
cpt 100 !
0 BEGIN
      1 +
      dup 10 < ?CONTINUE
      dup 50 == ?BREAK
      5 +!
AGAIN drop @ ;

We will assume in this example that the data cell allocated for the variable cpt is at address $1A00 (the data address is allocated at compile time):

Forth	Addr	Instruction	Comment
cpt	0A00	LAR AR6, #1A00h	Load the address register (luckily, ARP is not modified!)
100	0A02	SACL *+	Push a literal on the stack
100	0A03	LACC #100	Push a literal on the stack
!	0A04	MAR *-,AR6
	0A05	SACL *,AR7
	0A06	LACC *
0	0A07	SACL *+	Push a literal on the stack
0	0A08	LACC #0	Push a literal on the stack
1+	0A09	ADD #1	Top of the loop (BEGIN - AGAIN) - Optimisation for the sequence "1 +"
DUP	0A0A	SACL *+
10 <	0A0B	SUB #10	Optimisation for the sequence "10 <"
10 <	0A0C	CALL lit<	Optimisation for the sequence "10 <"
?CONTINUE	0A0E	BCNDD 0A09,NEQ	Optimized conditional branch (1BRANCH)
	0A10	MAR *-
	0A11	LACC *
DUP	0A12	SACL *+
50 ==	0A13	SUB #50	Optimisation for the sequence "50 =="
50 ==	0A14	CALL lit==	Optimisation for the sequence "50 =="
?BREAK	0A16	BCNDD 0A20,NEQ	Optimized conditional branch (1BRANCH)
	0A18	MAR *-
	0A19	LACC *
5	0A1A	SACL *+	Push a literal on the stack
5	0A1B	LACC #5	Push a literal on the stack
+!	0A1C	CALL +!
AGAIN	0A1E	B 0A09	Bottom of the loop (BEGIN - AGAIN)
DROP	0A20	MAR *-
DROP	0A21	LACC *
@	0A22	SACL *+
	0A23	MAR *,AR6
	0A24	LACC *,AR7
;	0A25	RET	End of definition
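The control flow of test maps directly onto a Python loop. The sketch below models the intended Forth semantics only, under two assumptions: that `5 +!` adds 5 to the cell addressed by the address register (cpt) and consumes the literal, and that ?CONTINUE and ?BREAK consume their flag.

```python
def test():
    cpt = 100              # cpt 100 !
    n = 0                  # 0
    while True:            # BEGIN
        n += 1             #   1 +
        if n < 10:         #   dup 10 < ?CONTINUE
            continue
        if n == 50:        #   dup 50 == ?BREAK
            break
        cpt += 5           #   5 +!
    return cpt             # AGAIN drop @

print(test())  # prints 300
```

The store is executed for n = 10 to 49, that is 40 times, so the value left on the stack is 100 + 40 * 5 = 300.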

Some snapshots

In the Pipe-Line

Here is some work I plan to do in the near future:

· finish coding the control structures (WHILE-REPEAT, CASE, DO-LOOP, etc.)

· code words such as AND, OR, XOR, SHL, SHR, etc.

· optimize the generated code (for example, use a delayed return at the end of a subroutine).

Feedback

I'm still working on this project, and there is plenty of work left to do, although all the concepts are there. I'd love to have some feedback. For example, is there a better Forth kernel choice for the C50? I'm not sure I've chosen the smallest/fastest instructions to implement the stack machine.
