Infineon AURIX TC3xx

Architecture

On-Chip System Connectivity and Bridges

SRI Domains

The AURIX™ TC3xx Platform has three independent on-chip connectivity resources:

System Resource Interconnect Fabric (SRI Fabric) connects the TriCore CPUs, the DMA module, and other high-bandwidth requestors to high-bandwidth memories and other resources for instruction fetches and data accesses. A key component of the fabric is the SRI crossbar, which connects all the agents in one SRI domain and carries the transactions between the SRI Masters and SRI Slaves of that domain. The SRI crossbar supports parallel transactions between different SRI Master and SRI Slave agents. In addition to this concurrency, it also supports pipelined requests from an SRI Master to an SRI Slave.
System Peripheral Bus (SPB) connects the TriCore CPUs, the DMA module, and other SPB masters to the medium- and low-bandwidth peripherals. SPB masters do not connect directly to the SRI Fabric; they access SRI-attached resources via an SFI_F2S bridge.
Back Bone Bus (BBB) gives the TriCore CPUs, the DMA module, and SPB masters access to ADAS resources. None of them connects to the BBB directly: SRI Masters access BBB-attached resources via an SFI_S2F bridge, and SPB masters access them by bridging over the SRI Fabric.
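The routing rules above can be summarized as a small lookup table. The following Python sketch is a simplified model, not an Infineon API: the names `access_path`, `CPU_OR_DMA` and `SPB_ONLY_MASTER` are hypothetical, it assumes the TriCore CPUs and the DMA module use their SRI Master side for SRI and BBB targets and their SPB side for SPB targets, and it ignores SRI domains and any S2S hops between them.

```python
# Requestor classes (hypothetical names). The TriCore CPUs and the DMA
# module are listed both on the SRI Fabric and on the SPB; "other SPB
# masters" reach the SRI Fabric only through the SFI_F2S bridge.
CPU_OR_DMA = "cpu_or_dma"
SPB_ONLY_MASTER = "spb_only_master"

# Buses and bridges traversed, keyed by (requestor class, bus the target
# is attached to). Simplified: SRI domains and S2S hops are not modelled.
_PATHS = {
    (CPU_OR_DMA, "SRI"): ("SRI",),
    (CPU_OR_DMA, "SPB"): ("SPB",),
    (CPU_OR_DMA, "BBB"): ("SRI", "SFI_S2F", "BBB"),
    (SPB_ONLY_MASTER, "SPB"): ("SPB",),
    (SPB_ONLY_MASTER, "SRI"): ("SPB", "SFI_F2S", "SRI"),
    (SPB_ONLY_MASTER, "BBB"): ("SPB", "SFI_F2S", "SRI", "SFI_S2F", "BBB"),
}


def access_path(requestor, target_bus):
    """Return the list of buses/bridges an access passes through."""
    try:
        return list(_PATHS[(requestor, target_bus)])
    except KeyError:
        raise ValueError(f"unsupported access: {requestor} -> {target_bus}") from None


# prints: SPB -> SFI_F2S -> SRI -> SFI_S2F -> BBB
print(" -> ".join(access_path(SPB_ONLY_MASTER, "BBB")))
```

The longest path is an SPB-only master reaching a BBB resource, which crosses both bridge types; this is why BBB accesses from such masters pay the latency of two bridges.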

CPU Resource Access Times

These tables describe the CPU access times to various resources, in CPU clock cycles, for the AURIX™ TC3xx Platform. For load and fetch accesses, the access time is the minimum number of CPU stall cycles needed to complete the access. If there is a conflict for the accessed resource, there may be additional stall cycles until the conflicting access completes.

For write accesses, the access times are the maximum for a (non-conflicting) sequence of such accesses. For a single access or a short sequence, write buffering often reduces the stall seen by the CPU, sometimes to 0. However, as with loads and fetches, a conflict for the accessed resource may cause additional stall cycles until the conflicting access completes.

Access latency for global resources


CPU Access Type	CPU stall cycles	Notes
Data read from System Peripheral Bus (SPB)	$\frac{f_{\text{CPU}}}{f_{\text{SPB}}} \cdot (4 + \text{Module Wait State})$	The final number of stall cycles depends on the actual number of wait states (WS) generated by the target resource.
Data write to System Peripheral Bus (SPB)	$\frac{f_{\text{CPU}}}{f_{\text{SPB}}} \cdot (4 + \text{Module Wait State})$
Data read from Back Bone Bus (BBB) (TC39x, TC37xED)	$9 + \frac{f_{\text{CPU}}}{f_{\text{BBB}}} \cdot (5 + \text{Module Wait State})$	When SFI_S2F is connected to XBar2 (TC39x and TC37xED), there is additional latency because the access goes through an S2S bridge.
Data write to Back Bone Bus (BBB) (TC39x, TC37xED)	$5 + \frac{f_{\text{CPU}}}{f_{\text{BBB}}} \cdot (4 + \text{Module Wait State})$
Data read from Back Bone Bus (BBB) (TC35x, TC33xED)	$6 + \frac{f_{\text{CPU}}}{f_{\text{BBB}}} \cdot (5 + \text{Module Wait State})$
Data write to Back Bone Bus (BBB) (TC35x, TC33xED)	$3 + \frac{f_{\text{CPU}}}{f_{\text{BBB}}} \cdot (4 + \text{Module Wait State})$

Module Wait State: The number of wait states for read and for write accesses is >= 1 and depends on the accessed module and its configuration.
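The formulas in this table can be evaluated directly. The Python sketch below uses a hypothetical helper, `global_stall_cycles`, that encodes only the rows above; the clock frequencies and wait-state values in the example are assumed for illustration, not recommended settings. `Fraction` keeps the result exact when f_CPU is not an integer multiple of the bus clock.

```python
from fractions import Fraction

# (fixed term in CPU cycles, bus cycles added to Module Wait State),
# copied from the table. The bracketed term is scaled by f_CPU / f_bus.
_GLOBAL_LATENCY = {
    ("SPB", "read", None): (0, 4),
    ("SPB", "write", None): (0, 4),
    ("BBB", "read", "TC39x/TC37xED"): (9, 5),
    ("BBB", "write", "TC39x/TC37xED"): (5, 4),
    ("BBB", "read", "TC35x/TC33xED"): (6, 5),
    ("BBB", "write", "TC35x/TC33xED"): (3, 4),
}


def global_stall_cycles(bus, access, f_cpu, f_bus, module_ws, family=None):
    """CPU stall cycles for a non-conflicting SPB/BBB access.

    For reads this is the minimum stall; for writes it is the per-access
    maximum within a non-conflicting sequence. `family` is only used for
    BBB accesses, whose fixed term differs between device groups.
    """
    if module_ws < 1:
        raise ValueError("Module Wait State is >= 1")
    key = (bus, access, family if bus == "BBB" else None)
    fixed, bus_cycles = _GLOBAL_LATENCY[key]
    return fixed + Fraction(f_cpu) / Fraction(f_bus) * (bus_cycles + module_ws)


# Example values (assumed): 300 MHz CPU, 100 MHz SPB, 150 MHz BBB, 1 WS.
print(global_stall_cycles("SPB", "read", 300, 100, 1))                   # 15
print(global_stall_cycles("BBB", "read", 300, 150, 1, "TC39x/TC37xED"))  # 21
```

With these assumed clocks, an SPB read costs 3 · (4 + 1) = 15 CPU stall cycles, and a BBB read on a TC39x costs 9 + 2 · (5 + 1) = 21. Halving the ratio f_CPU/f_bus shrinks only the scaled part; the fixed BBB term stays.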

CPU Accesses: Stall cycles for local and SRI resources


CPU Access Type	Local CPU	Local SRI	Remote SRI Domain
Data read from DSPR	0	7	10
Data write to DSPR	0	5, 3 (with Pipelining)	5, 4 (with Pipelining)
Instruction fetch from DSPR	See local SRI column	7	10
Data read from DLMU	0	7	10
Data write to DLMU	2	5, 3 (with Pipelining)	5, 4 (with Pipelining)
Instruction fetch from DLMU	See local SRI column	7	10
Data read from PSPR	See local SRI column	7	10
Data write to PSPR	See local SRI column	5, 3 (with Pipelining)	5, 4 (with Pipelining)
Instruction fetch from PSPR	0	7	10
Data read from PFlash	5 + PWS	10 + PWS	13 + PWS
Instruction fetch from PFlash (buffer miss)	2 + PWS	9 + PWS	12 + PWS
Instruction fetch from PFlash (buffer hit)	3	6	9
Data read from LMU	n.a.	7	10
Data write to LMU	n.a.	5, 3 (with Pipelining)	5, 4 (with Pipelining)
Instruction fetch from LMU	n.a.	7	10
Data read from DFlash	n.a.	5 + 3*(3 + DCWS)	8 + 3*(3 + DCWS)
Data read access from EMEM (TC39x, TC37xED)	n.a.	n.a.	14, 15 (f_BBB < f_SRI)
Data write access to EMEM (TC39x, TC37xED)	n.a.	n.a.	9
Data read access from EMEM (TC35x, TC33xED)	n.a.	11, 12 (f_BBB < f_SRI)	n.a.
Data write access to EMEM (TC35x, TC33xED)	n.a.	9	n.a.
Data read access from DAM	n.a.	10	13
Data write access to DAM	n.a.	7	7

Remote SRI Domain: Only applies to products with SRI extenders. The additional latency is due to the access going through an S2S bridge.
PWS: Configured PFlash Wait States (includes PFlash access cycles only). ECC correction latency is incurred only when the incoming data requires ECC correction.
DCWS: Configured DFlash Corrected Wait States (includes DFlash access cycles and ECC correction latency).
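The data-read rows of the table can likewise be turned into a lookup function. This is a sketch under the following assumptions: `data_read_stall` is a hypothetical name, only data reads are covered, the PSPR "local CPU" entry follows the "See local SRI column" note, and the PWS/DCWS values in the example are arbitrary.

```python
# Column order of the table: local CPU, local SRI, remote SRI domain.
_COLUMNS = {"local_cpu": 0, "local_sri": 1, "remote_sri": 2}


def data_read_stall(target, column, pws=0, dcws=0):
    """Minimum stall cycles for a data read without conflicts.

    pws  -- configured PFlash wait states (PWS)
    dcws -- configured DFlash corrected wait states (DCWS)
    Returns None where the table says "n.a.".
    """
    rows = {
        "DSPR": (0, 7, 10),
        "DLMU": (0, 7, 10),
        # PSPR data reads from the local CPU use the local SRI figure.
        "PSPR": (7, 7, 10),
        "LMU": (None, 7, 10),
        "DAM": (None, 10, 13),
        "PFlash": (5 + pws, 10 + pws, 13 + pws),
        "DFlash": (None, 5 + 3 * (3 + dcws), 8 + 3 * (3 + dcws)),
    }
    return rows[target][_COLUMNS[column]]


# Example wait-state settings are arbitrary, not recommendations.
print(data_read_stall("PFlash", "local_cpu", pws=5))   # 10
print(data_read_stall("DFlash", "local_sri", dcws=2))  # 20
```

Comparing the columns shows the cost of placement: every data read covered by this sketch is 3 cycles slower from a remote SRI domain than from the local SRI domain, and local DSPR/DLMU reads are the only stall-free ones.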

CPU Subsystem (CPU0 ... CPU5)

Processor Core, Local Memory and Connectivity

The Infineon AURIX TC3xx features up to six processor cores implementing the TC1.6.2 instruction set architecture (ISA). The following section focuses on the microarchitectural details of the CPU subsystem; for more information about the ISA, see the TC1.6.2 article.

The processor core connects to the following memories and bus interfaces (where implemented):

Program Scratch-Pad SRAM (PSPR)
Data Scratch-Pad SRAM (DSPR)
Program Cache (PCache)
Data Cache (DCache)
Local Memory Unit (DLMU)
Local PFlash bank (LPB)
SRI slave interface (x2)
SRI master interface
SPB master interface

TC1.6.2P Implementation Features

Most instructions executed in 1 cycle
Branch instructions in 1, 2 or 3 cycles (using dynamic branch prediction)
Wide memory interface for fast context switch
Automatic context save-on-entry and restore-on-exit for: subroutine, interrupt, trap
Six memory protection register sets
Dual instruction issuing (in parallel into Integer Pipeline and Load/Store Pipeline)
Third pipeline for loop instructions only (zero-overhead loop)
Single-precision floating-point unit (IEEE 754 compatible)
Dedicated integer divide unit
18 data memory protection ranges, 10 code memory protection ranges arranged in 6 sets

Pipeline

Instruction Timing

Platform Devices

The following table shows a feature overview of the AURIX™ TC3xx Platform family focusing on memory and number of cores.

	Feature	TC33x	TC33xEXT	TC35x	TC36x	TC37x	TC37xEXT	TC38x	TC39x
CPUs	Cores / Checker Cores	1 / 1	2 / 1	3 / 2	2 / 2	3 / 2	3 / 3	4 / 2	6 / 4
CPUs	Max. Freq.	300 MHz
Cache per CPU	Program [KB]	32
Cache per CPU	Data [KB]	16
SRAM per CPU	PSPR [KB]	8	32 (CPU0) 64 (other)	64	32	64	64	64	64
SRAM per CPU	DSPR [KB]	192	192 (CPU0) 64 (other)	240 (CPU0&1) 64 (other)	192	240 (CPU0&1) 64 (other)	240 (CPU0&1) 64 (other)	240 (CPU0&1) 64 (other)	240 (CPU0&1) 64 (other)
SRAM per CPU	DLMU [KB]	8	8 (CPU0) 64 (other)	64	64	64	64	64	64
SRAM global	LMU [KB]	-	-	512	-	-	-	128	768
SRAM global	DAM [KB]	-	-	32	-	64	64	128	128
Extension Memory (EMEM)	TCM [MB]	-	1	2	-	-	2	-	2
Extension Memory (EMEM)	XCM [MB]	-	-	-	-	-	1	-	2
Extension Memory (EMEM)	XTM [KB]	-	16	16	-	-	16	-	16
Program Flash	Size [MB]	2	4	4	4	6	10	10	16
Program Flash	Banks [MB]	1 x 2	2 x 2	2 x 2	2 x 2	2 x 3	3 x 3, 1 x 1	3 x 3, 1 x 1	5 x 3, 1 x 1
Data Flash	DF0 Size (single-ended) [KB]	128	128	128	128	256	512	512	1024
Data Flash	DF1 Size (single-ended) [KB]	128	128	128	128	128	128	128	128
DMA	Channels	64	64	64	64	128	128	128	128
DMA	Move Engines	2	2	2	2	2	2	2	2
DMA	Resource Partitions	4	4	4	4	4	4	4	4
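The "Banks [MB]" row can be cross-checked against the total Program Flash size. The sketch below reads each entry as "number of banks x bank size in MB", an interpretation that makes every column add up to the "Size [MB]" row; `PFLASH` and `parse_banks` are hypothetical names.

```python
# Total Program Flash size [MB] and the "Banks [MB]" entry per device,
# copied from the table above.
PFLASH = {
    "TC33x": (2, "1 x 2"),
    "TC33xEXT": (4, "2 x 2"),
    "TC35x": (4, "2 x 2"),
    "TC36x": (4, "2 x 2"),
    "TC37x": (6, "2 x 3"),
    "TC37xEXT": (10, "3 x 3, 1 x 1"),
    "TC38x": (10, "3 x 3, 1 x 1"),
    "TC39x": (16, "5 x 3, 1 x 1"),
}


def parse_banks(spec):
    """Expand e.g. "3 x 3, 1 x 1" into a list of bank sizes in MB."""
    banks = []
    for group in spec.split(","):
        # int() tolerates the surrounding spaces left by split().
        count, size = (int(part) for part in group.split("x"))
        banks.extend([size] * count)
    return banks


for device, (total_mb, spec) in PFLASH.items():
    banks = parse_banks(spec)
    assert sum(banks) == total_mb, device
    print(f"{device}: {len(banks)} banks, {sum(banks)} MB")
```

For example, the TC39x entry expands to six banks (five of 3 MB and one of 1 MB), totalling the listed 16 MB.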

Compilers

The AURIX TC3xx family is a series of high-performance microcontrollers widely used in automotive and industrial applications. Compilers for the AURIX TC3xx are crucial for developers aiming to optimize performance, reliability, and safety in their applications. One significant aspect of the compiler landscape for AURIX TC3xx is the limited direct support from mainline open-source compilers such as GCC or LLVM/Clang. The primary reason for this is the stringent requirements for functional safety in automotive and industrial applications, which demand specialized features and compliance with safety standards that are often not met by general-purpose open-source compilers.

Commercial Compilers

Tasking: Tasking compilers are renowned for their robust support for automotive applications, offering advanced debugging capabilities and optimization techniques tailored for the AURIX architecture. They provide extensive code optimization, comprehensive debugging tools, and strong support for safety standards such as ISO 26262, making them ideal for developing high-performance, reliable, and safe applications.
HighTec: The HighTec compiler is a popular choice, known for its Eclipse-based development environment and strong multicore support. HighTec provides both GCC and LLVM-based ports of open-source compilers tailored specifically for the AURIX TC3xx family. These compilers offer efficient parallel execution, advanced code analysis, and an integrated development environment, ensuring robust performance and compliance with safety standards.
Green Hills Software: Green Hills Software provides a highly optimized toolchain aimed at safety-critical applications, focusing on high performance and strict compliance with automotive standards. Their compiler offers superior optimization, extensive safety features, and a proprietary IDE with specialized tools for automotive development, ensuring developers can meet the stringent demands of functional safety.

GCC for AURIX

While mainline GCC does not directly support the AURIX TC3xx family, an unofficial GCC port for AURIX is available. Because HighTec's GCC port is covered by the GNU General Public License (GPL), its source code was obtained from HighTec and published on GitHub along with binary builds:

GCC 4.9.4/Binutils 2.20/Newlib 1.18 for TriCore AURIX [Source] [Prebuilt MinGW Binaries] [Prebuilt Linux Binaries]
GCC 9.4.0/Binutils 2.20/Newlib 1.18 for TriCore AURIX [Source] [Prebuilt MinGW Binaries] [Prebuilt Linux Binaries]
GDB 10.0.50 for TriCore AURIX [Source]

External Links
