STM8 compilers quick comparison

While porting ChibiOS/RT to the STM8 I had a chance to work in parallel with two different compilers, the Raisonance RKit-STM8 and the Cosmic STM8 C Compiler. The work allowed me to analyze the produced code (something I always do while porting the OS). Unfortunately GCC is not yet available for STM8 so I couldn't include it in the comparison.

Some notes

The comparison only analyzes the produced code in a very specific scenario; it does not cover other topics such as:

  • Ease of use.
  • Quality of the runtime and libraries.
  • Quality of the IDE.
  • Documentation.
  • Price and support.

Both compilers were tested in their free, code-size-limited versions.

The first problems

Interestingly enough, both compilers failed to produce correct code on the first try. Both companies were very responsive and fixed the problems.

  • Cosmic: they sent me an updated compiler executable; apparently the non-free version didn't have the bugs I reported. I am not sure whether the free version has been fixed since then, as the download page shows neither a version number nor a change log.
  • Raisonance: they contacted me and worked to fix the problem; the currently available download has no problems as far as ChibiOS/RT is concerned. The information available on the web site is complete.

Compiler versions

  • Raisonance, RKit-STM8_2.30.10.0175, 32K free version, unlimited use.
  • Cosmic, compiler reporting version 4.3.3.3, 32K free version, 1 year limited use.

Test setup

The tests have been executed on the cheap and excellent STM8S-Discovery kit, which also includes an STM32-based USB debugger. The test application has been compiled using the free ST Visual Develop (STVD) IDE. This IDE supports both the Cosmic and Raisonance compilers, which allowed me to create a single test application that builds with both compilers without code changes. This also demonstrates how ChibiOS/RT can hide most of the compiler-related differences from the application. The test application and the STVD project for both compilers are available in the ChibiOS/RT distributions starting from version 2.1.1.

Benchmarks

I expected differences between the compilers, but the benchmark results really made me curious and I had to dig deeper into the generated code. The results were generated with the compiler options set for “fastest” code. The full details can be verified in the test application project files.

Code size

The test application uses almost nothing of the runtime libraries so the reported size depends almost entirely on the compilers.

Description                  Cosmic     Raisonance    Difference
Whole Demo (flash)      20620 bytes    22845 bytes      +10.790%

Execution speed

Now let's see how the compilers behave under the various standard benchmarks.

Benchmark                   Cosmic    Raisonance    Difference
messages #1                  72488         64062      -11.624%
messages #2                  57394         49106      -14.441%
messages #3                  57394         49106      -14.441%
context switch              138392        110712      -20.001%
threads, full cycle          20894         18000      -13.851%
threads, create only         32356         27846      -13.939%
mass reschedule              55794         42402      -24.003%
round robin                  71596         54880      -23.348%
I/O Queues throughput        91500         63540      -30.557%
virtual timers set/reset     72654         57702      -20.580%
semaphores wait/signal      293272        216720      -26.103%
mutexes lock/unlock         141764        123188      -13.103%
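
The Difference column in both tables appears to be the relative change of the Raisonance figure with respect to the Cosmic one. This is inferred from the numbers, not stated by the benchmark suite. The following minimal C sketch reproduces it; the names percent_diff and print_row are made up for this illustration and are not part of ChibiOS/RT.

```c
#include <stdio.h>

/* Relative change of the Raisonance figure with respect to the Cosmic
   one, in percent. Assumption: this is how the Difference column was
   computed; it matches the published values to three decimals. */
double percent_diff(double cosmic, double raisonance) {
  return (raisonance - cosmic) / cosmic * 100.0;
}

/* Prints one row: name, both figures and the signed difference. */
void print_row(const char *name, double cosmic, double raisonance) {
  printf("%-26s %8.0f %8.0f %+8.3f%%\n",
         name, cosmic, raisonance, percent_diff(cosmic, raisonance));
}
```

For example, print_row("context switch", 138392, 110712) prints a row ending in -20.001%, and percent_diff(20620, 22845) gives about +10.790 for the code size.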

The source code of the various benchmarks is available in the ChibiOS/RT distribution. The results are also available for all the other supported architectures, see the “Performance and Testing Data” page. It is also interesting to see how the STM8S compares with other common 8/16 bit architectures.
The fun thing is that having very few internal registers makes the STM8 a context switch monster; interrupt latencies are also excellent because of this.

The code

I decided to analyze the generated code for some ChibiOS/RT functions.

Function scheduler_init()

This is a very simple function: no loops, no function calls, just assignments. It should be an easy task overall.

void scheduler_init(void) {
  queue_init(&rlist.r_queue);
  rlist.r_prio = NOPRIO;
  rlist.r_preempt = CH_TIME_QUANTUM;
  rlist.r_newer = rlist.r_older = (Thread *)&rlist;
}

Note that the project setup allocates the rlist structure in page zero, so it only requires 8-bit addressing.
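
To see what the compilers have to produce, here is a reduced, host-compilable sketch of the function. The type definitions are simplified stand-ins, not the real ChibiOS/RT headers; the values NOPRIO = 0 and CH_TIME_QUANTUM = 10 and the self-pointing behaviour of queue_init() are assumptions read back from the generated code (a clr, a mov of #10, and stores of the address of rlist).

```c
#include <stddef.h>

/* Simplified stand-ins: only the fields touched by scheduler_init()
   are modeled, so the layout differs from the real structures. */
typedef struct Thread Thread;

typedef struct {
  Thread *p_next;
  Thread *p_prev;
} ThreadsQueue;

typedef struct {
  ThreadsQueue  r_queue;
  unsigned char r_prio;
  Thread       *r_newer;
  Thread       *r_older;
  unsigned char r_preempt;
} ReadyList;

#define NOPRIO          0   /* assumed from "clr _rlist+4"      */
#define CH_TIME_QUANTUM 10  /* assumed from "mov _rlist+11,#10" */

ReadyList rlist;

/* Assumed behaviour: an empty queue points to itself both ways. */
static void queue_init(ThreadsQueue *tqp) {
  tqp->p_next = tqp->p_prev = (Thread *)tqp;
}

void scheduler_init(void) {
  queue_init(&rlist.r_queue);
  rlist.r_prio = NOPRIO;
  rlist.r_preempt = CH_TIME_QUANTUM;
  rlist.r_newer = rlist.r_older = (Thread *)&rlist;
}
```

Under these assumptions the same address, that of rlist itself, is stored four times, so a compiler only needs to load it into a register once.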

Resulting code for the Cosmic compiler:

Cosmic (size = 18, stack = 0)
	ldw	x,#_rlist
	ldw	_rlist+2,x
	ldw	_rlist,x
	clr	_rlist+4
	mov	_rlist+11,#10
	ldw	_rlist+9,x
	ldw	_rlist+7,x
	ret

Resulting code for the Raisonance compiler:

Raisonance (size = 28, stack = 0)
        LD     A,#002H
        ADD    A,#rlist
        CLRW   X
        LD     XL,A
        LDW    Y,#rlist
        LDW    (X),Y
        LDW    X,(X)
        LDW    rlist,X
        CLR    rlist + 04H
        MOV    rlist + 0BH,#00AH
        LDW    X,#rlist
        LDW    rlist + 07H,X
        LDW    rlist + 09H,X
        RET

Function chSchReadyI()

This is the most critical function in ChibiOS/RT; the generated code directly affects the context switch performance.

Thread *chSchReadyI(Thread *tp) {
  Thread *cp;
  tp->p_state = THD_STATE_READY;
  cp = (Thread *)&rlist.r_queue;
  do {
    cp = cp->p_next;
  } while (cp->p_prio >= tp->p_prio);
  tp->p_prev = (tp->p_next = cp)->p_prev;
  tp->p_prev->p_next = cp->p_prev = tp;
  return tp;
}
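
The list manipulation is compact, so a host-runnable sketch may help when reading the assembly. It is a simplified model, not the ChibiOS/RT code: the Thread structure is reduced to four fields, the ready list header is a plain Thread with priority 0 acting as a sentinel, and the names ready_header, ready_init and ready_insert are hypothetical. THD_STATE_READY = 0 is assumed from the clr instruction in the listings.

```c
#include <stddef.h>

typedef struct Thread Thread;
struct Thread {
  Thread       *p_next;
  Thread       *p_prev;
  unsigned char p_prio;
  unsigned char p_state;
};

#define THD_STATE_READY 0   /* assumed value */

/* Hypothetical list header. Its priority 0 stops the scan, so real
   threads must have a priority greater than 0. */
Thread ready_header;

void ready_init(void) {
  ready_header.p_next = ready_header.p_prev = &ready_header;
  ready_header.p_prio = 0;
}

/* Inserts tp behind all threads of higher or equal priority. */
Thread *ready_insert(Thread *tp) {
  Thread *cp;

  tp->p_state = THD_STATE_READY;
  cp = &ready_header;
  do {
    cp = cp->p_next;
  } while (cp->p_prio >= tp->p_prio);
  /* Link tp just before cp; same effect as the two chained
     assignments in the original function. */
  tp->p_next = cp;
  tp->p_prev = cp->p_prev;
  tp->p_prev->p_next = tp;
  cp->p_prev = tp;
  return tp;
}
```

Inserting threads with priorities 5, 10 and 5 in that order yields the list 10, 5, 5, with the second priority 5 thread queued behind the first one.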

Resulting code for the Cosmic compiler:

Cosmic (size = 49, stack = 4)
	pushw	x
	pushw	x
	clr	(11,x)
	ldw	x,#_rlist
	ldw	(OFST-1,sp),x
L3:
	ldw	x,(OFST-1,sp)
	ldw	x,(x)
	ldw	(OFST-1,sp),x
	ld	a,(4,x)
	ldw	x,(OFST+1,sp)
	cp	a,(4,x)
	jruge	L3
	ldw	y,(OFST-1,sp)
	ldw	(x),y
	ldw	x,(x)
	ldw	y,(OFST+1,sp)
	ldw	x,(2,x)
	ldw	(2,y),x
	ldw	x,(OFST-1,sp)
	ldw	(2,x),y
	ldw	y,(2,y)
	ldw	x,(2,x)
	ldw	(y),x
	ldw	x,(OFST+1,sp)
	addw	sp,#4
	ret

Resulting code for the Raisonance compiler:

Raisonance (size = 65, stack = 6)
        PUSHW  X
        SUB    SP,#004H
        LDW    X,(005H,SP)   ; [ tp ]
        CLR    (00BH,X)
        LDW    X,#rlist
        LDW    (001H,SP),X   ; [ cp ]
?DO_0001:
        LDW    X,(001H,SP)   ; [ cp ]
        LDW    X,(X)
        LDW    (001H,SP),X   ; [ cp ]
        LDW    X,(005H,SP)   ; [ tp ]
        LD     A,(004H,X)
        LD     ?BH,A
        LDW    X,(001H,SP)   ; [ cp ]
        LD     A,(004H,X)
        CP     A,?BH
        JRUGE  ?DO_0001
        LDW    X,(005H,SP)   ; [ tp ]
        LDW    Y,(001H,SP)   ; [ cp ]
        LDW    (X),Y
        LDW    X,(X)
        LDW    X,(002H,X)
        EXGW   X,Y
        LDW    X,(005H,SP)   ; [ tp ]
        INCW   X
        INCW   X
        LDW    (003H,SP),X
        LDW    (X),Y
        LDW    X,(001H,SP)   ; [ cp ]
        INCW   X
        INCW   X
        LDW    Y,(005H,SP)   ; [ tp ]
        LDW    (X),Y
        LDW    X,(X)
        EXGW   X,Y
        LDW    X,(003H,SP)
        LDW    X,(X)
        LDW    (X),Y
        LDW    X,(005H,SP)   ; [ tp ]
        ADD    SP,#006H
        RET

Function chSchWakeupS()

Another very critical scheduler function; this one includes some function calls.

void chSchWakeupS(Thread *ntp, msg_t msg) {
  ntp->p_u.rdymsg = msg;
  if (ntp->p_prio <= currp->p_prio)
    chSchReadyI(ntp);
  else {
    Thread *otp = chSchReadyI(currp);
    rlist.r_preempt = CH_TIME_QUANTUM;
    setcurrp(ntp);
    ntp->p_state = THD_STATE_CURRENT;
    chDbgTrace(otp);
    chSysSwitchI(ntp, otp);
  }
}

Resulting code for the Cosmic compiler:

Cosmic (size = 46, stack = 4)
	pushw	x
	pushw	x
	ldw	y,(OFST+5,sp)
	ldw	(16,x),y
	ld	a,(4,x)
	ldw	x,_rlist+5
	cp	a,(4,x)
	jrugt	L13
	ldw	x,(OFST+1,sp)
	call	_chSchReadyI
	jra	L33
L13:
	call	_chSchReadyI
	ldw	(OFST-1,sp),x
	mov	_rlist+11,#10
	ldw	x,(OFST+1,sp)
	ldw	_rlist+5,x
	ld	a,#1
	ld	(11,x),a
	ldw	x,(OFST-1,sp)
	call	__port_switch
L33:
	addw	sp,#4
	ret	

Resulting code for the Raisonance compiler:

Raisonance (size = 54, stack = 4)
        PUSHW  X
        PUSHW  X
        LDW    Y,(007H,SP)   ; [ msg ]
        LDW    (010H,X),Y
        LDW    X,(003H,SP)   ; [ ntp ]
        LD     A,(004H,X)
        LD     ?BH,A
        LDW    X,rlist + 05H
        LD     A,(004H,X)
        CP     A,?BH
        JRULT  ?ELSE_0005
        LDW    X,(003H,SP)   ; [ ntp ]
        CALL   ?chSchReadyI
        JRA    ?EPILOG_0005
?ELSE_0005:
        LDW    X,rlist + 05H
        CALL   ?chSchReadyI
        LDW    (001H,SP),X   ; [ otp ]
        MOV    rlist + 0BH,#00AH
        LDW    X,(003H,SP)   ; [ ntp ]
        LDW    rlist + 05H,X
        LD     A,#001H
        LD     (00BH,X),A
        LDW    X,(001H,SP)   ; [ otp ]
        CALL   ?_port_switch
?EPILOG_0005:
        ADD    SP,#004H
        RET    

Function chSchIsRescRequiredExI()

Another, apparently simple, scheduler function.

bool_t chSchIsRescRequiredExI(void) {
  tprio_t p1 = firstprio(&rlist.r_queue);
  tprio_t p2 = currp->p_prio;
  return rlist.r_preempt ? p1 > p2 : p1 >= p2;
}
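
The two arms of the conditional expression differ only in how equal priorities are treated. The sketch below isolates that decision in a hypothetical helper with explicit parameters (the real function reads rlist directly) and compares it with an equivalent single expression. The 8-bit priority type is an assumption based on the byte loads in the listings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same decision as the return statement of chSchIsRescRequiredExI(),
   with the inputs passed in explicitly (hypothetical helper). */
bool resc_required(uint8_t preempt, uint8_t p1, uint8_t p2) {
  return preempt ? p1 > p2 : p1 >= p2;
}

/* Equivalent form: a strictly higher priority always wins, an equal
   priority wins only when the preempt counter is zero. */
bool resc_required_alt(uint8_t preempt, uint8_t p1, uint8_t p2) {
  return (p1 > p2) || (preempt == 0 && p1 == p2);
}
```

Both listings implement the first form literally, with a separate compare-and-branch sequence for each arm.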

Resulting code for the Cosmic compiler:

Cosmic (size = 36, stack = 2)
	pushw	x
	ldw	x,_rlist
	ld	a,(4,x)
	ld	(OFST-1,sp),a
	ldw	x,_rlist+5
	ld	a,(4,x)
	ld	(OFST+0,sp),a
	ld	a,_rlist+11
	jreq	L46
	ld	a,(OFST-1,sp)
	cp	a,(OFST+0,sp)
	jrule	L47
LC002:
	ld	a,#1
	jra	L67
L46:
	ld	a,(OFST-1,sp)
	cp	a,(OFST+0,sp)
	jruge	LC002
L47:
	clr	a
L67:
	popw	x
	ret	

Resulting code for the Raisonance compiler:

Raisonance (size = 43, stack = 2)
        PUSHW  X
        LDW    X,rlist
        LD     A,(004H,X)
        LD     (001H,SP),A   ; [ p1 ]
        LDW    X,rlist + 05H
        LD     A,(004H,X)
        LD     (002H,SP),A   ; [ p2 ]
        TNZ    rlist + 0BH
        JREQ   ?ELSE_0009
        LD     A,(002H,SP)   ; [ p2 ]
        CP     A,(001H,SP)   ; [ p1 ]
        JRUGE  ?LAB_0008
        CLR    A
        INC    A
        JRA    ?EPILOG_0008
?LAB_0008:
        CLR    A
        JRA    ?EPILOG_0008
?ELSE_0009:
        LD     A,(001H,SP)   ; [ p1 ]
        CP     A,(002H,SP)   ; [ p2 ]
        JRULT  ?LAB_0010
        CLR    A
        INC    A
        JRA    ?EPILOG_0008
?LAB_0010:
        CLR    A
?EPILOG_0008:
        POPW   X
        RET    

Just performance?

I think that performance is not the whole story; the quality of a development environment is not just the efficiency of the generated code.

Cosmic Comments

The good

  • More efficient generated code.

The bad

  • Version numbering is a mess: I am still not sure whether the currently available free compiler includes the fixes from the version they sent me.

The ugly

  • The IDE is not as well organized as the Raisonance Ride7 (but this is subjective).
  • The 32K free version is limited to 1 year; there is a 16K unlimited version, but 16K is not enough for the ChibiOS/RT test application, which averages around 20K.
  • The compiler supports an inlining directive, but it does not seem to give great advantages.

Raisonance Comments

The good

  • Ride7 is a pleasure to work with. It also supports GCC for ARM development so it is not just STM8.
  • Interrupt handlers are easier to declare than in the Cosmic compiler; you don't have to maintain a vectors table, it is all done automagically.
  • The compiler uses fewer bytes in page zero for virtual registers; it can also automatically optimize the allocation in page zero through a special option.
  • 32K free version unlimited in time.

The bad

  • Generates less efficient code but I am seeing a positive trend in the latest releases.

The ugly

  • It seems to be less well supported in STVD: I had a problem using the Raisonance compiler in debug mode under STVD, though it is probably not a Raisonance problem.
  • No support for inlining.
 
chibios/articles/stm8_compilers.txt · Last modified: 2011/10/03 20:44 by giovanni
 