Forth for the TMS320C50 DSP

This chapter presents a Forth kernel and a Forth Cross-Compiler for the TMS320C5x DSP.

Last update: 06-February-1999

State (estimation): 55%


What do you need to use this Cross-Compiler?

To use this Cross-Compiler, you need:

         a PC running either Windows 95/98/NT or QNX/Photon, with a free serial port (either com1: or com2:).

         the Cross-Compiler software

         a TMS320C5x DSP Starter Kit (DSK, the C50 evaluation card), see http://www.ti.com/

         a serial cable to link the DSK to the PC.

You can't use the Cross-Compiler without the C50 card: the Cross-Compiler always generates code into the target (although it should not be a big deal to modify the CC to generate code into host memory, with a kind of paging functions which never write...). If the umbilical link is broken, the CC fails and is not able to generate code.

The Cross-Compiler software is provided as wfroth (Forth) source. It must be compiled with the Forth running on the host machine, under either Windows or QNX/Photon. This can be done with the following wfroth commands:

" /wfroth/ti_c50/fcc_c50.wft" !curr_file
COM1: load

You'll find it convenient to put these commands into the file startup.wft, which wfroth opens and interprets after its initialization. COM1: should be replaced by COM2: if the umbilical link is on com2. Note that the serial port is set to 57,600 baud.

 

Generalities

I chose to implement a 'direct call' Forth model: each call to a definition is translated to the native DSP instructions CALL/RET. Also, some of the basic Forth definitions (DUP or DROP for example) are translated into only one or two native instructions and act as macros. The cross-compiler also tries to use delayed instructions (such as delayed branch, call and return) to save cycles by making use of the pipeline.

Memory Map

Address   Contents
0000      ROM, interrupt vectors, monitor
0A00      Start of user's Forth code
1A00      Start of user's Forth data
2A00      Run-time (words defined as subroutines, not macros)
2BB0      Stub code (used to execute user code)
2BD0      Forth Parameters Stack
2B20      Forth definitions as Subroutines (vs. Macros)
2C00

 

         The Parameters Stack is 32 words long and begins at address $2BD0 (data).

         User generated code begins at address $0A00 (code). Size is $1A00 - $A00 = $1000 (4K)

         Variables begin at address $1A00 (data). Size is $2A00 - $1A00 = $1000 (4K)
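The region sizes quoted above can be checked with a few lines of arithmetic. The sketch below is written in Python only as a calculator; the constant and function names are hypothetical and are not part of the Cross-Compiler.

```python
# Region boundaries taken from the memory map (word addresses, hexadecimal).
CODE_START = 0x0A00     # start of user's Forth code
DATA_START = 0x1A00     # start of user's Forth data
RUNTIME_START = 0x2A00  # start of the run-time words
STACK_START = 0x2BD0    # base of the Parameters Stack
STACK_WORDS = 32        # length of the Parameters Stack, in words


def region_size(start, end):
    """Number of 16-bit words between two addresses (end excluded)."""
    return end - start


code_words = region_size(CODE_START, DATA_START)
data_words = region_size(DATA_START, RUNTIME_START)
stack_end = STACK_START + STACK_WORDS  # first address after the stack

print(hex(code_words), hex(data_words), hex(stack_end))
```

Both the code and the data regions come out at $1000 words (4K), as stated in the list above.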

The cross-compiler runs under a Forth (wfroth) on the host machine. Because the CC takes care of all definitions related to the compilation process (handling of the dictionary, code generation, etc.), only a 'minimal' Forth resides in the target, that is, only words such as +, DROP, SWAP, @, etc. The CC interpreter has two modes:

         host mode (HOST)

         target mode (TARGET)

In host mode, the CC does not try to generate code into the target. For example, if we enter a number, the code sequence that pushes this literal value on the stack is generated for the host. We can also interact with the target with words such as dm@, pm@, dm!, pm!, run, etc.

In target mode, the CC tries to generate code into the target.

The CC changes the way it displays the number of elements currently on the parameters stack:

         in host mode: (2) for example means that we have currently 2 values on the host stack

         in target mode: {3} for example means that we have currently 3 values on the target stack.

Many words are redefined by the CC to change their behavior depending on the host/target mode. This is the case, for example, for words such as .S : ; int dup drop etc.

 

Registers used

I have chosen a very simple model for the target Forth:

C50 register    Forth kernel
AR7             Parameters Stack pointer
ACC             Top Parameters Stack Value (cached)
AR6             Address Register

This means that registers AR6, AR7 and ACC must not be changed (or must be saved and restored). Also, some words use AR3, AR4 and AR5 as temporary registers.

 

The Parameters Stack

The top stack value is cached into the Accumulator (ACC) register. Let's take an example. We have 4 values stacked on the Parameters Stack ( --- 10 20 30 40 ):

Stack cells: 2BD0, 2BD1, 2BD2, 2BD3
Values in memory: 10, 20, 30
AR7=2BD4
ACC=40

The Stack Pointer (AR7) points to the next empty cell of the stack. This choice favours pushing a new value on the stack (SACL *+).
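The caching scheme can be modelled in a few lines. The sketch below is a Python model, not target code: it assumes that SP! leaves AR7 at $2BD0 and that a push and a drop behave like the LITERAL and DROP sequences listed in the Implementation section. The class and method names are hypothetical.

```python
SP0 = 0x2BD0  # stack base, as loaded into AR7 by SP!


class CachedStack:
    """Parameters stack with the top value cached in ACC."""

    def __init__(self):
        self.mem = {}    # data memory: address -> value
        self.ar7 = SP0   # stack pointer: next empty cell
        self.acc = 0     # cached top of stack (arbitrary before the first push)

    def push(self, n):
        # SACL *+ : store ACC at the cell pointed to by AR7, then increment AR7
        self.mem[self.ar7] = self.acc
        self.ar7 += 1
        # LACL #n / LACC #n : load the new top of stack
        self.acc = n

    def drop(self):
        # MAR *- : decrement AR7
        self.ar7 -= 1
        # LACC * : reload the top of stack from memory
        self.acc = self.mem[self.ar7]


s = CachedStack()
for n in (10, 20, 30, 40):
    s.push(n)
```

After the four pushes the model holds 40 in ACC and AR7 equals $2BD4, which agrees with the example above. A push costs a single store because the pointer already designates a free cell.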

 

Implementation

Stack Operations

LITERAL (16-bit literal value)

There are 2 cases, depending on the size of the numeric constant.

Case where 0 <= n <= 255

SACL *+         90A0
LACL #imm       B9XX

Case where n < 0 or n > 255

SACL *+         90A0
LACC #imm       BF80 XXXX

Even though LIT is 3 words, LITERAL acts as a macro. Note that the Cross-Compiler can remove the last compiled literal when it optimizes (example: 1000 >).
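As an illustration of the two cases, here is a minimal Python sketch that returns the opcode words for a literal, using only the opcodes listed above. The function name encode_literal is hypothetical, and the accepted range (-32768 to 65535) is an assumption of the sketch.

```python
def encode_literal(n):
    """Return the list of 16-bit opcode words that push the literal n."""
    if not -0x8000 <= n <= 0xFFFF:
        raise ValueError("literal does not fit in 16 bits")
    words = [0x90A0]                   # SACL *+   : save the old top of stack
    if 0 <= n <= 255:
        words.append(0xB900 | n)       # LACL #imm : B9XX
    else:
        words += [0xBF80, n & 0xFFFF]  # LACC #imm : BF80 XXXX
    return words


print([format(w, "04X") for w in encode_literal(5)])
print([format(w, "04X") for w in encode_literal(1000)])
```

The short form is 2 words and the long form is 3 words, which matches the sizes given for LITERAL.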

 

SP!

( ... --- )

SP! is only one word and compiles the following code:

LAR AR7,#SP0    BF0F 2BD0

 

DEPTH

( --- n )

DEPTH is 3 words and is compiled as a call to the following subroutine:

SACL *+         90A0
LAMM AR7        0817
RETD            FF00
SUB #SP0-1      BFA0 2BCF

 

DUP

( n --- n n )

DUP is only one word and compiles the following code:

SACL *+         90A0

 

DDUP

( n1 n2 --- n1 n2 n1 n2 )

Alias: 2DUP

2DUP is 4 words and is compiled as a call to the following subroutine:

SACL *-         9090
LAR AR5,*+      05A0
RETD            FF00
MAR *+          8BA0
SAR AR5,*+      85A0

 

OVER

( n1 n2 --- n1 n2 n1 )

OVER is 3 words and compiles the following code:

SACL *-         9090
LACC *+         10A0
MAR *+          8BA0

 

ABOVE

( n1 n2 n3 --- n1 n2 n3 n1 )

Alias: PLUCK

ABOVE is 5 words and is compiled as a call to the following subroutine:

SACL *-         9090
MAR *-          8B90
LACC *+         10A0
RETD            FF00
MAR *+          8BA0
MAR *+          8BA0

 

DROP

( n --- )

DROP is 2 words and compiles the following code:

MAR *-          8B90
LACC *          1080

 

UNDER

( n1 n2 --- n2 )

Alias: NIP

UNDER is 1 word and compiles the following code:

MAR *-          8B90

 

BELOW

( n1 n2 n3 --- n2 n3 )

BELOW is 4 words and is compiled as a call to the following subroutine:

MAR *-          8B90
LAR AR5,*-      0590
RETD            FF00
SAR AR5,*+      85A0
SACL *          9080

 

SWAP

( n1 n2 --- n2 n1 )

SWAP is 4 words and is compiled as a call to the following subroutine:

MAR *-          8B90
LAR AR4,*       0480
RETD            FF00
SACL *+         90A0
LAMM AR4        0814

 

ROT

( n1 n2 n3 --- n2 n3 n1 )

ROT is 6 words and is compiled as a call to the following subroutine:

MAR *-          8B90
LAR AR5,*-      0590
LAR AR4,*       0480
SAR AR5,*+      85A0
RETD            FF00
SACL *+         90A0
LAMM AR4        0814

 

-ROT

( n1 n2 n3 --- n3 n1 n2 )

-ROT is 6 words and is compiled as a call to the following subroutine:

ROT:
MAR *-          8B90
LAR AR5,*-      0590
LAR AR4,*       0480
SACL *+         90A0
RETD            FF00
SAR AR4,*+      84A0
LAMM AR5        0815
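To see that these sequences really shuffle the stack as their stack comments claim, they can be replayed on a small model. The sketch below is a hedged Python model, not a C50 simulator: it only implements the few addressing forms used above, treats AR7 as the current auxiliary register throughout, and runs the two instructions after RETD in program order. The class Machine and its method names are hypothetical.

```python
class Machine:
    """Tiny model of AR4, AR5, AR7, ACC and data memory."""

    def __init__(self, values, sp0=0x2BD0):
        self.mem = {}
        self.ar = {4: 0, 5: 0, 7: sp0}
        self.acc = 0
        for v in values:
            self.sacl("*+")   # push: save the old top of stack...
            self.acc = v      # ...then load the literal

    def _step(self, mode):
        # Post-modify AR7 according to the addressing mode.
        if mode == "*+":
            self.ar[7] += 1
        elif mode == "*-":
            self.ar[7] -= 1

    def sacl(self, mode):
        self.mem[self.ar[7]] = self.acc
        self._step(mode)

    def lacc(self, mode):
        self.acc = self.mem[self.ar[7]]
        self._step(mode)

    def mar(self, mode):
        self._step(mode)

    def lar(self, reg, mode):
        self.ar[reg] = self.mem[self.ar[7]]
        self._step(mode)

    def sar(self, reg, mode):
        self.mem[self.ar[7]] = self.ar[reg]
        self._step(mode)

    def lamm(self, reg):
        self.acc = self.ar[reg]

    def stack(self, depth):
        """Return the top `depth` values, deepest first."""
        base = self.ar[7] - (depth - 1)
        return [self.mem[base + i] for i in range(depth - 1)] + [self.acc]


def swap(m):
    m.mar("*-"); m.lar(4, "*"); m.sacl("*+"); m.lamm(4)


def rot(m):
    m.mar("*-"); m.lar(5, "*-"); m.lar(4, "*")
    m.sar(5, "*+"); m.sacl("*+"); m.lamm(4)


def minus_rot(m):
    m.mar("*-"); m.lar(5, "*-"); m.lar(4, "*")
    m.sacl("*+"); m.sar(4, "*+"); m.lamm(5)


def over(m):
    m.sacl("*-"); m.lacc("*+"); m.mar("*+")


m = Machine([1, 2, 3])
rot(m)
print(m.stack(3))
```

Each helper is a direct transcription of the corresponding listing with the RETD removed, so the model can be used to try out other candidate sequences before committing them to the kernel.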

 

Arithmetic/Logical operations

NEGATE

( n1 --- n2 )

n2 = -n1

NEGATE is 1 word and compiles the following code:

NEG             BE02

 

+

( n1 n2 --- n3 )

n3 = n1 + n2

There are 2 versions of the + word. The cross-compiler chooses the right version depending on the context, in order to optimize the code. All versions act as macros:

+ (uses the two values from the stack, no optimisation)

MAR *-          8B90
ADD *           2080

LIT+ (optimisation, removes the last LIT generated by the CC)

There are two versions, depending on the size of the constant:

ADD #XX         B8XX         XX is an 8-bit value
ADD #XXXX       BF90 XXXX    XXXX is a 16-bit value
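The choice between the two versions of + can be pictured as a small peephole step in the compiler. The following Python sketch is an assumption about how such a step might look, not the CC's actual code: it represents generated code as (mnemonic, operand) pairs, always uses LACC for literals (ignoring the 8-bit/16-bit distinction), and the function name compile_tokens is hypothetical.

```python
def compile_tokens(tokens):
    """Compile integers and "+" into a list of (mnemonic, operand) pairs."""
    code = []
    last_literal = None  # index in `code` where the last literal starts
    for tok in tokens:
        if isinstance(tok, int):
            last_literal = len(code)
            code.append(("SACL", "*+"))  # save the old top of stack
            code.append(("LACC", tok))   # load the literal
        elif tok == "+":
            if last_literal is not None and last_literal == len(code) - 2:
                # LIT+ : remove the literal just compiled, add it directly.
                n = code[-1][1]
                del code[last_literal:]
                code.append(("ADD", n))
            else:
                # Plain + : pop the second value and add it to ACC.
                code.append(("MAR", "*-"))
                code.append(("ADD", "*"))
            last_literal = None
        else:
            raise ValueError("unknown token: %r" % (tok,))
    return code


print(compile_tokens([3, 4, "+"]))
```

With this scheme the sequence "1 +" shrinks to a single ADD instruction, which is the effect shown for "1 +" in the second example of this chapter.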

 

-

( n1 n2 --- n3 )

n3 = n1 - n2

There are 2 versions of the - word. The cross-compiler chooses the right version depending on the context, in order to optimize the code. All versions act as macros:

- (uses the two values from the stack, no optimisation)

MAR *-          8B90
SUB *           3080
NEG             BE02

 

LIT- (optimisation, removes the last LIT generated by the CC)

There are two versions, depending on the size of the constant:

SUB #XX         BAXX         XX is an 8-bit value
SUB #XXXX       BFA0 XXXX    XXXX is a 16-bit value

 

*

( n1 n2 --- n3 )

n3 = n1 * n2

* is 4 words and is compiled as a call to the following subroutine:

SACL *-         9090
LT *+           73A0
MPY *-          5490
PAC             BE03

 

==

( n1 n2 --- f )

Alias: =

f = (n1==n2) ? TRUE : FALSE

!=

( n1 n2 --- f )

Alias: <>

f = (n1!=n2) ? TRUE : FALSE

<

( n1 n2 --- f )

f = (n1<n2) ? TRUE : FALSE

<=

( n1 n2 --- f )

f = (n1<=n2) ? TRUE : FALSE

>

( n1 n2 --- f )

f = (n1>n2) ? TRUE : FALSE

>=

( n1 n2 --- f )

f = (n1>=n2) ? TRUE : FALSE

There are 3 versions of each of these words. The cross-compiler chooses the right version depending on the context. All versions act as a call to a subroutine:

Previous compiled literal i8 and 0 <= i8 <= 255

SUB #i8         BA80
CALL lit==      7A80 XXXX

Previous compiled literal i16 and i16 < 0 or i16 > 255

CALLD lit==     7E80 XXXX
SUB #i16        BFA0 XXXX

No previous compiled literal

CALL ==         7A80 XXXX

 

==          !=          <           <=          >           >=
MAR *-      MAR *-      MAR *-      MAR *-      MAR *-      MAR *-
SUB *       SUB *       SUB *       SUB *       SUB *       SUB *
NOP         NOP         NOP         NOP         NOP         NOP
XC 2.NEQ    XC 2.EQ     XC 2.LEQ    XC 2.LT     XC 2.GEQ    XC 2.GT
LACC #0     LACC #0     LACC #0     LACC #0     LACC #0     LACC #0
RET         RET         RET         RET         RET         RET
RETD        RETD        RETD        RETD        RETD        RETD
LACC #-1    LACC #-1    LACC #-1    LACC #-1    LACC #-1    LACC #-1

 

lit==       lit!=       lit<        lit<=       lit>        lit>=
NOP         NOP         NOP         NOP         NOP         NOP
XC 2.NEQ    XC 2.EQ     XC 2.GEQ    XC 2.GT     XC 2.LEQ    XC 2.LT
LACC #0     LACC #0     LACC #0     LACC #0     LACC #0     LACC #0
RET         RET         RET         RET         RET         RET
RETD        RETD        RETD        RETD        RETD        RETD
LACC #-1    LACC #-1    LACC #-1    LACC #-1    LACC #-1    LACC #-1
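The XC condition used by each routine is the condition under which the word returns FALSE. The Python sketch below models this under two assumptions: the stack versions test the difference n2 - n1 (ACC holds n2 and SUB * subtracts n1), the lit versions test n1 - literal (the SUB #imm is done by the caller), and 16-bit overflow is ignored. The table and function names are hypothetical.

```python
TRUE, FALSE = -1, 0  # values loaded by LACC #-1 and LACC #0

# Stack versions: condition on d = n2 - n1 for which FALSE is returned.
FALSE_WHEN = {
    "==": lambda d: d != 0,  # XC 2.NEQ
    "!=": lambda d: d == 0,  # XC 2.EQ
    "<":  lambda d: d <= 0,  # XC 2.LEQ
    "<=": lambda d: d < 0,   # XC 2.LT
    ">":  lambda d: d >= 0,  # XC 2.GEQ
    ">=": lambda d: d > 0,   # XC 2.GT
}

# Literal versions: condition on d = n1 - literal for which FALSE is returned.
LIT_FALSE_WHEN = {
    "==": lambda d: d != 0,  # XC 2.NEQ
    "!=": lambda d: d == 0,  # XC 2.EQ
    "<":  lambda d: d >= 0,  # XC 2.GEQ
    "<=": lambda d: d > 0,   # XC 2.GT
    ">":  lambda d: d <= 0,  # XC 2.LEQ
    ">=": lambda d: d < 0,   # XC 2.LT
}


def compare(op, n1, n2):
    """Model of the stack version: ( n1 n2 --- f )."""
    d = n2 - n1  # MAR *- ; SUB *
    return FALSE if FALSE_WHEN[op](d) else TRUE


def compare_lit(op, n1, literal):
    """Model of the literal version: n1 is on the stack, literal is immediate."""
    d = n1 - literal  # SUB #imm
    return FALSE if LIT_FALSE_WHEN[op](d) else TRUE


print(compare("<", 3, 7), compare_lit("<", 3, 7))
```

The two tables are mirror images of each other for the ordering tests because the subtraction is performed in the opposite direction.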

 

Memory Operations

LIT!
LIT!+
LIT!-

This is an optimisation of the sequence "constant !". 3 words are generated, and this word acts as a macro.

LIT!

MAR *,AR6            8B8E
SPLK #XXXX,*,AR7     AE8F XXXX

LIT!+

MAR *,AR6            8B8E
SPLK #XXXX,*+,AR7    AEAF XXXX

LIT!-

MAR *,AR6            8B8E
SPLK #XXXX,*-,AR7    AE9F XXXX

 

!
!+
!-

3 words are generated, and this word acts as a macro.

!

MAR *-,AR6      8B9E
SACL *,AR7      908F
LACC *          1080

!+

MAR *-,AR6      8B9E
SACL *+,AR7     90AF
LACC *+         10A0

!-

MAR *-,AR6      8B9E
SACL *-,AR7     909F
LACC *+         10A0

 

@
@+
@-

3 words are generated, and this word acts as a macro.

@

SACL *+         90A0
MAR *,AR6       8B8E
LACC *,AR7      108F

@+

SACL *+         90A0
MAR *,AR6       8B8E
LACC *+,AR7     10AF

@-

SACL *+         90A0
MAR *,AR6       8B8E
LACC *-,AR7     109F

 

A>

The implementation of A> is 2 words and acts as a Macro.

SACL *+         90A0
LAMM AR6        0816

 

+!
-!

The implementations of +! and -! are 4 words each and act as a subroutine call.

+!

MAR *,AR6       8B8E
ADD *           2080
RETD            FF00
SACL *,AR7      908F
LACC *          1080

-!

MAR *,AR6       8B8E
SUB *           3080
RETD            FF00
SACL *,AR7      908F
LACC *          1080

 

Control Structures

(branch)

1 word is generated, and this word acts as a macro.

B addr          7980 addr

In some situations (the next 2 words after the branch are 2 single-word instructions or 1 double-word instruction, and not a branch or a return), the branch instruction B is replaced by a delayed branch instruction BD to save 2 cycles.

BD addr         7D80 addr
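The test described above can be written down as a small predicate. The Python sketch below is only an illustration of the stated rule: the instruction representation as (mnemonic, size in words) pairs, the set of mnemonics treated as branches or returns, and the function name are all assumptions of the sketch.

```python
# Mnemonics treated as "a branch or a return" (an assumption of this sketch).
BRANCH_OR_RETURN = {"B", "BD", "BCND", "BCNDD", "RET", "RETD", "RETC", "RETCD"}


def can_use_delayed_branch(following):
    """Decide whether B may be replaced by BD.

    `following` is the list of (mnemonic, size_in_words) pairs that come
    right after the branch. The rule holds when exactly 2 words are filled
    by 2 single-word instructions or by 1 double-word instruction, none of
    them being a branch or a return.
    """
    words = 0
    for mnemonic, size in following:
        if mnemonic in BRANCH_OR_RETURN:
            return False
        words += size
        if words == 2:
            return True
        if words > 2:
            return False
    return False  # fewer than 2 words available


print(can_use_delayed_branch([("MAR", 1), ("LACC", 1)]))
```

A single-word instruction followed by a double-word one is rejected, since the two delay-slot words would then cut the second instruction in half.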

 

(0branch)

4 words are generated, and this word acts as a macro.

BCNDD addr,EQ,UNC     F388 addr
MAR *-                8B90
LACC *                1080

 

(1branch)

4 words are generated, and this word acts as a macro.

BCNDD addr,NEQ,UNC    F388 addr
MAR *-                8B90
LACC *                1080

 

EXIT

1 word is generated, and this word acts as a macro.

RET             EF00

 

?EXIT

3 words are generated, and this word acts as a macro.

RETCD NEQ       FF08
MAR *-          8B90
LACC *          1080

 

Words to be used in HOST mode

The following words are useful in HOST mode. These words do NOT compile code into the target, but they typically interact with it.

DM@

( a --- w )

Read the 16-bit value at the given target data address.

PM@

( a --- w )

Read the 16-bit value at the given target program address.

DM!

( w a --- )

Write a 16-bit value at the given target data address.

PM!

( w a --- )

Write a 16-bit value at the given target program address.

RUN

( a --- )

Run the code on the target, starting at the given address. The code must end with an RTE.

 

The next words are useful to see what is compiled:

 

Size/Speed of the generated code

Instruction                          Size           Speed   Comment
literal (8-bit, 0-255)               2
literal (16-bit, 256-65535)          1+2
creation of an integer variable      1 (data)               No code is generated.
invocation of an integer variable    2
!                                    1+1+1
@                                    1+1+1
DEPTH                                1+1+2                  Call to subroutine.
DUP                                  1                      Macro.
2DUP                                 4
OVER                                 1+1+1
ABOVE                                1+1+1+1                Call to subroutine.
DROP                                 1+1
UNDER (alias: NIP)                   1+1
BELOW                                1+1+1+1
SWAP                                 1+1+1+1
ROT                                  1+1+1+1+1+1            Call to subroutine.
-ROT                                 1+1+1+1+1+1            Call to subroutine.
NEGATE                               1                      Macro.
+                                    1+1
-                                    1+1+1
*                                    1+1+1+1

 

Glossary

Stack Operations

Definition      Stack                           Description
sp!             ( ... --- )                     Reset the parameters stack, i.e. discard all values from the stack.
depth           ( --- n )                       Return the number of parameters on the stack (before depth is called).
drop            ( n --- )                       Discard the value from the top of the stack.
dup             ( n --- n n )                   Duplicate the value on the top of the stack.
over            ( n1 n2 --- n1 n2 n1 )          Duplicate the value under the top of the stack.
above, pluck    ( n1 n2 n3 --- n1 n2 n3 n1 )    Duplicate the value under-under the top of the stack.
pick            ( n1 --- n2 )                   Duplicate the n1-th value under the top of the stack.
swap            ( n1 n2 --- n2 n1 )             Swap (permute) the two values on top of the stack.
under, nip      ( n1 n2 --- n2 )                Discard the value under the top of the stack.
below           ( n1 n2 n3 --- n2 n3 )          Discard the value under-under the top of the stack.
rot             ( n1 n2 n3 --- n2 n3 n1 )       Rotate the 3 values on top of the stack.
-rot            ( n1 n2 n3 --- n3 n1 n2 )       Rotate the 3 values on top of the stack in the opposite direction.
tuck            ( n1 n2 --- n2 n1 n2 )          Perform a swap, then an over.
ddrop, 2drop    ( n1 n2 --- )                   Discard the 2 values from the top of the stack.
ddup, 2dup      ( n1 n2 --- n1 n2 n1 n2 )       Duplicate the 2 values on the top of the stack.

 

Integer Arithmetic

Definition   Stack                  Operation                  Description
negate       ( n1 --- n2 )          n2 = -n1                   Negate the top stack value.
+            ( n1 n2 --- n3 )       n3 = n1+n2                 Add the values on top of stack.
-            ( n1 n2 --- n3 )       n3 = n1-n2                 Subtract the values on top of stack.
*            ( n1 n2 --- n3 )       n3 = n1*n2                 Multiply the values on top of stack.
/            ( n1 n2 --- n3 )       n3 = n1/n2                 Integer division of the values on top of stack.
/mod         ( n1 n2 --- n3 n4 )    n3 = n1%n2, n4 = n1/n2     Return the remainder and the quotient of the integer division.
mod          ( n1 n2 --- n3 )       n3 = n1%n2                 Modulo operation.
rnd          ( n1 n2 --- n3 )       n3 = [n1..n2]              Return a random number between n1 and n2.

 

Logical

Definition      Stack               Operation                       Description
true            ( --- tf )                                          Push the logical true value on the stack.
false           ( --- ff )                                          Push the logical false value on the stack.
==, =           ( n1 n2 --- f )     f = (n1==n2) ? TRUE : FALSE     Return true if the condition holds for the two top stack values.
!=, <>          ( n1 n2 --- f )     f = (n1!=n2) ? TRUE : FALSE     Return true if the condition holds for the two top stack values.
not, 0==, 0=    ( f1 --- f2 )       f2 = (f1) ? FALSE : TRUE        Return the boolean flag with the 'reverse' value.
<               ( n1 n2 --- f )     f = (n1<n2) ? TRUE : FALSE      Return true if the condition holds for the two top stack values.
<=              ( n1 n2 --- f )     f = (n1<=n2) ? TRUE : FALSE     Return true if the condition holds for the two top stack values.
>               ( n1 n2 --- f )     f = (n1>n2) ? TRUE : FALSE      Return true if the condition holds for the two top stack values.
>=              ( n1 n2 --- f )     f = (n1>=n2) ? TRUE : FALSE     Return true if the condition holds for the two top stack values.
U<              ( u1 u2 --- f )     f = (u1<u2) ? TRUE : FALSE      Return true if the condition holds for the two top stack values.
U<=             ( u1 u2 --- f )     f = (u1<=u2) ? TRUE : FALSE     Return true if the condition holds for the two top stack values.
U>              ( u1 u2 --- f )     f = (u1>u2) ? TRUE : FALSE      Return true if the condition holds for the two top stack values.
U>=             ( u1 u2 --- f )     f = (u1>=u2) ? TRUE : FALSE     Return true if the condition holds for the two top stack values.
and             ( n1 n2 --- n3 )    n3 = n1 & n2                    Perform a bitwise AND between the 2 top stack values.
or              ( n1 n2 --- n3 )    n3 = n1 | n2                    Perform a bitwise OR between the 2 top stack values.
xor             ( n1 n2 --- n3 )    n3 = n1 ^ n2                    Perform a bitwise XOR between the 2 top stack values.
shl             ( n1 n2 --- n3 )    n3 = n1 << n2                   Perform a logical (bit) left shift.
shr             ( n1 n2 --- n3 )    n3 = n1 >> n2                   Perform a logical (bit) right shift.

 

First example

: min 2dup < IF drop ELSE under ENDIF ;

Forth     Addr    Instruction          Comment
2dup      0A0F    CALL 2dup            Call to subroutine
<         0A11    CALL <               Call to subroutine
IF        0A13    BCNDD 0A1B,EQ,UNC    (0BRANCH)
          0A15    MAR *-
          0A16    LACC *
DROP      0A17    MAR *-
          0A18    LACC *
ELSE      0A19    B 0A1C               (BRANCH)
UNDER     0A1B    MAR *-
;         0A1C    RET                  End of definition

 

Second example

int cpt
: test
  cpt 100 !
  0 BEGIN
      1 +
      dup 10 < ?CONTINUE
      dup 50 == ?BREAK
      5 +!
  AGAIN drop @ ;

We will assume in this example that the data cell allocated for the variable cpt is at address $1A00 (the data address is allocated at compile time):

Forth        Addr    Instruction        Comment
cpt          0A00    LAR AR6,#1A00h     Load the address register (luckily, ARP is not modified!)
100          0A02    SACL *+            Push a literal on the stack
             0A03    LACC #100
!            0A04    MAR *-,AR6
             0A05    SACL *,AR7
             0A06    LACC *
0            0A07    SACL *+            Push a literal on the stack
             0A08    LACC #0
1 +          0A09    ADD #1             Top of the loop (BEGIN - AGAIN); optimisation for the sequence "1 +"
DUP          0A0A    SACL *+
10 <         0A0B    SUB #10            Optimisation for the sequence "10 <"
             0A0C    CALL lit<
?CONTINUE    0A0E    BCNDD 0A09,NEQ     Optimized conditional branch (1BRANCH)
             0A10    MAR *-
             0A11    LACC *
DUP          0A12    SACL *+
50 ==        0A13    SUB #50            Optimisation for the sequence "50 =="
             0A14    CALL lit==
?BREAK       0A16    BCNDD 0A20,NEQ     Optimized conditional branch (1BRANCH)
             0A18    MAR *-
             0A19    LACC *
5            0A1A    SACL *+            Push a literal on the stack
             0A1B    LACC #5
+!           0A1C    CALL +!
AGAIN        0A1E    B 0A09             Bottom of the loop (BEGIN - AGAIN)
DROP         0A20    MAR *-
             0A21    LACC *
@            0A22    SACL *+
             0A23    MAR *,AR6
             0A24    LACC *,AR7
;            0A25    RET                End of definition
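The address column of this listing follows directly from the size of each instruction. As a cross-check, the Python sketch below lays the instructions out from $0A00; the sizes (2 words for LAR with an immediate, CALL, B and BCNDD, 1 word for everything else) are inferred from the addresses shown in the listing, and the names LISTING and assign_addresses are hypothetical.

```python
# (instruction text, size in words), in program order.
LISTING = [
    ("LAR AR6,#1A00h", 2), ("SACL *+", 1), ("LACC #100", 1),
    ("MAR *-,AR6", 1), ("SACL *,AR7", 1), ("LACC *", 1),
    ("SACL *+", 1), ("LACC #0", 1), ("ADD #1", 1), ("SACL *+", 1),
    ("SUB #10", 1), ("CALL lit<", 2), ("BCNDD 0A09,NEQ", 2),
    ("MAR *-", 1), ("LACC *", 1), ("SACL *+", 1), ("SUB #50", 1),
    ("CALL lit==", 2), ("BCNDD 0A20,NEQ", 2), ("MAR *-", 1),
    ("LACC *", 1), ("SACL *+", 1), ("LACC #5", 1), ("CALL +!", 2),
    ("B 0A09", 2), ("MAR *-", 1), ("LACC *", 1), ("SACL *+", 1),
    ("MAR *,AR6", 1), ("LACC *,AR7", 1), ("RET", 1),
]


def assign_addresses(listing, origin):
    """Return a list of (address, instruction) pairs starting at origin."""
    result = []
    address = origin
    for text, size in listing:
        result.append((address, text))
        address += size
    return result


for address, text in assign_addresses(LISTING, 0x0A00):
    print(format(address, "04X"), text)
```

The loop top (ADD #1) lands at $0A09 and the code after the loop at $0A20, which are the targets used by the branches in the listing.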

 

 

In the Pipe-Line

Here is some work I should do in the near future:

         finish coding the control structures (WHILE-REPEAT, CASE, DO-LOOP, etc.)

         code words such as AND OR XOR SHL SHR etc.

         optimize the generated code (for example, use a delayed return at the end of a subroutine).

 

Feedback

I'm still working on this project, and there is plenty of work to do, although all the concepts are there. I'd love to have some feedback. For example, is there a better Forth kernel choice for the C50? I'm not sure I've chosen the smallest/fastest instructions to implement the stack machine.