Infineon AURIX TC3xx



Architecture

[Figure: Infineon TC3xx architecture]

On-Chip System Connectivity and Bridges

SRI Domains

The AURIX™ TC3xx Platform has three independent on-chip connectivity resources:

  • System Resource Interconnect Fabric (SRI Fabric) connects the TriCore CPUs, the DMA module, and other high bandwidth requestors to high bandwidth memories and other resources for instruction fetches and data accesses. A key component of the fabric is the SRI crossbar, which connects all the agents in one SRI domain. The SRI crossbar carries the transactions between the SRI Masters and SRI Slaves of the domain and supports parallel transactions between different SRI Master and SRI Slave agents. In addition to the parallelism of concurrent requests, it also supports pipelined requests from an SRI Master to an SRI Slave.
  • System Peripheral Bus (SPB) connects the TriCore CPUs, the DMA module, and other SPB masters to the medium and low bandwidth peripherals. SPB masters do not directly connect to the SRI Fabric; they access SRI attached resources via an SFI_F2S Bridge.
  • Back Bone Bus (BBB) connects the TriCore CPUs, the DMA module, and SPB masters with ADAS resources. SRI Masters do not directly connect to the BBB, but access BBB attached resources via an SFI_S2F Bridge. SPB masters also do not directly connect to the BBB, but access BBB attached resources via bridging over the SRI Fabric.
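
The bridging rules in the list above can be summarized in a small lookup. The following Python sketch is illustrative only: the function name `bridge_path` and the bus labels are hypothetical, and the assumption that an SPB master reaches the BBB by crossing SFI_F2S and then SFI_S2F is an interpretation of "bridging over the SRI Fabric", not a statement from the device documentation.

```python
# Bridges named in the list above, keyed by (bus of the master, bus of the target).
# The table layout and all names here are illustrative, not a vendor API.
BRIDGES = {
    ("SPB", "SRI"): ["SFI_F2S"],
    ("SRI", "BBB"): ["SFI_S2F"],
}


def bridge_path(master_bus, target_bus):
    """Return the bridges crossed when a master on master_bus accesses target_bus."""
    if master_bus == target_bus:
        return []  # same interconnect, no bridge involved
    if (master_bus, target_bus) in BRIDGES:
        return list(BRIDGES[(master_bus, target_bus)])
    if (master_bus, target_bus) == ("SPB", "BBB"):
        # Assumption: "bridging over the SRI Fabric" means SPB -> SRI -> BBB.
        return bridge_path("SPB", "SRI") + bridge_path("SRI", "BBB")
    raise ValueError(
        "path %s -> %s is not described in this section" % (master_bus, target_bus)
    )


print(bridge_path("SPB", "BBB"))
```

Paths that the section does not describe raise an error instead of guessing a route.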

CPU Resource Access Times

These tables describe the CPU access times to various resources in CPU clock cycles for the AURIX™ TC3xx Platform. For load or fetch accesses, the access times are the minimum number of CPU stall cycles needed to complete the access. If there is a conflict for the accessed resource, there may be additional stall cycles until the conflicting access completes.

For write accesses, the access times are the maximum for a sequence of such accesses (non-conflicting). In many cases, for a single access or a short sequence, write buffering reduces the stall effect seen by a CPU, sometimes to 0. However, as with loads and fetches, if there is a conflict for the accessed resource, there may be additional stall cycles until the conflicting access completes.

Access latency for global resources

CPU Access Type | CPU stall cycles | Notes
Data read from System Peripheral Bus (SPB) | | The final number of stall cycles will depend on the real number of WS generated by the target resource.
Data write to System Peripheral Bus (SPB) | |
Data read from Back Bone Bus (BBB) (TC39x, TC37xED) | | When SFI_S2F is connected to XBar2 (TC39x and TC37xED) there is an additional latency due to access going through an S2S.
Data write to Back Bone Bus (BBB) (TC39x, TC37xED) | |
Data read from Back Bone Bus (BBB) (TC35x, TC33xED) | |
Data write to Back Bone Bus (BBB) (TC35x, TC33xED) | |

  • Module Wait State: The number of wait states for read and for write accesses is >= 1 and depends on the accessed module and its configuration.

CPU Accesses: Stall cycles for local and SRI resources

CPU Access Type | Local CPU | Local SRI | Remote SRI Domain
Data read from DSPR | 0 | 7 | 10
Data write to DSPR | 0 | 5, 3 (with Pipelining) | 5, 4 (with Pipelining)
Instruction fetch from DSPR | See local SRI column | 7 | 10
Data read from DLMU | 0 | 7 | 10
Data write to DLMU | 2 | 5, 3 (with Pipelining) | 5, 4
Instruction fetch from DLMU | See local SRI column | 7 | 10
Data read from PSPR | See local SRI column | 7 | 10
Data write to PSPR | See local SRI column | 5, 3 (with Pipelining) | 5, 4 (with Pipelining)
Instruction fetch from PSPR | 0 | 7 | 10
Data read from PFlash | 5 + PWS | 10 + PWS | 13 + PWS
Instruction fetch from PFlash (buffer miss) | 2 + PWS | 9 + PWS | 12 + PWS
Instruction fetch from PFlash (buffer hit) | 3 | 6 | 9
Data read from LMU | n.a. | 7 | 10
Data write to LMU | n.a. | 5, 3 (with Pipelining) | 5, 4 (with Pipelining)
Instruction fetch from LMU | n.a. | 7 | 10
Data read from DFlash | n.a. | 5 + 3*(3 + DCWS) | 8 + 3*(3 + DCWS)
Data read access from EMEM (TC39x, TC37xED) | n.a. | n.a. | 14, 15 (fBBB < fSRI)
Data write access to EMEM (TC39x, TC37xED) | n.a. | n.a. | 9
Data read access from EMEM (TC35x, TC33xED) | n.a. | 11, 12 (fBBB < fSRI) | n.a.
Data write access to EMEM (TC35x, TC33xED) | n.a. | 9 | n.a.
Data read access from DAM | n.a. | 10 | 13
Data write access to DAM | n.a. | 7 | 7

  • Remote SRI Domain: Only applies to products with SRI extenders. Additional latency due to access going through an S2S
  • PWS: Configured PFlash Wait States (Includes cycles for PFlash access cycles only). ECC correction latency is only incurred when the incoming data requires ECC correction.
  • DCWS: Configured DFlash Corrected Wait States (Includes cycles for DFlash access cycles and ECC correction latency)
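
As a worked example of the flash formulas in the table above, the following Python sketch evaluates the minimum stall cycles for data reads from PFlash and DFlash. The base values and formulas are taken from the table; the function names, the path labels, and the sample wait-state values (PWS = 3, DCWS = 10) are hypothetical and chosen only for illustration. The conversion to nanoseconds assumes a 300 MHz CPU clock.

```python
# Base stall cycles for data reads, copied from the table above.
PFLASH_DATA_READ_BASE = {"local_cpu": 5, "local_sri": 10, "remote_sri": 13}
DFLASH_DATA_READ_BASE = {"local_sri": 5, "remote_sri": 8}


def pflash_data_read_stalls(path, pws):
    """Minimum stall cycles for a PFlash data read: base + PWS."""
    return PFLASH_DATA_READ_BASE[path] + pws


def dflash_data_read_stalls(path, dcws):
    """Minimum stall cycles for a DFlash data read: base + 3*(3 + DCWS)."""
    return DFLASH_DATA_READ_BASE[path] + 3 * (3 + dcws)


def stalls_to_ns(cycles, cpu_mhz=300.0):
    """Convert CPU clock cycles to nanoseconds (assumes a 300 MHz CPU clock)."""
    return cycles * 1000.0 / cpu_mhz


# Hypothetical wait-state settings, for illustration only.
PWS, DCWS = 3, 10
for path in ("local_cpu", "local_sri", "remote_sri"):
    print("PFlash data read,", path, ":", pflash_data_read_stalls(path, PWS), "cycles")
for path in ("local_sri", "remote_sri"):
    cycles = dflash_data_read_stalls(path, DCWS)
    print("DFlash data read,", path, ":", cycles, "cycles")
```

These figures are lower bounds: as noted above, a conflict on the accessed resource adds further stall cycles.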

CPU Subsystem (CPU0 ... CPU5)

[Figure: TC3xx Processor Core, Local Memory and Connectivity]

The Infineon AURIX TC3xx features up to six processor cores implementing the TC1.6.2 instruction set architecture (ISA). This section focuses on the microarchitectural details of the CPU subsystem. For more information about the ISA, see the TC1.6.2 article.

The processor core connects to the following memories and bus interfaces (where implemented):

  • Program Scratch-Pad SRAM (PSPR)
  • Data Scratch-Pad SRAM (DSPR)
  • Program Cache (PCache)
  • Data Cache (DCache)
  • Local Memory Unit (DLMU)
  • Local PFlash Bank (LPB)
  • SRI slave interface (x2)
  • SRI master interface
  • SPB master interface

TC1.6.2P Implementation Features

  • Most instructions executed in 1 cycle
  • Branch instructions in 1, 2 or 3 cycles (using dynamic branch prediction)
  • Wide memory interface for fast context switch
  • Automatic context save-on-entry and restore-on-exit for: subroutine, interrupt, trap
  • Six memory protection register sets
  • Dual instruction issuing (in parallel into Integer Pipeline and Load/Store Pipeline)
  • Third pipeline for loop instruction only (zero overhead loop)
  • Single precision Floating Point Unit (IEEE-754 Compatible)
  • Dedicated Integer divide unit
  • 18 data memory protection ranges, 10 code memory protection ranges arranged in 6 sets

Pipeline

Instruction Timing

Platform Devices

The following table shows a feature overview of the AURIX™ TC3xx Platform family focusing on memory and number of cores.

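Feature | TC33x | TC33xEXT | TC35x | TC36x | TC37x | TC37xEXT | TC38x | TC39x
CPUs: Cores / Checker Cores | 1 / 1 | 2 / 1 | 3 / 2 | 2 / 2 | 3 / 2 | 3 / 3 | 4 / 2 | 6 / 4
CPUs: Max. Freq. | 300 MHz | 300 MHz | 300 MHz | 300 MHz | 300 MHz | 300 MHz | 300 MHz | 300 MHz
Cache per CPU: Program [KB] | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32
Cache per CPU: Data [KB] | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16
SRAM per CPU: PSPR [KB] | 8 | 32 (CPU0), 64 (other) | 64 | 32 | 64 | 64 | 64 | 64
SRAM per CPU: DSPR [KB] | 192 | 192 (CPU0), 64 (other) | 240 (CPU0&1), 64 (other) | 192 | 240 (CPU0&1), 64 (other) | 240 (CPU0&1), 64 (other) | 240 (CPU0&1), 64 (other) | 240 (CPU0&1), 64 (other)
SRAM per CPU: DLMU [KB] | 8 | 8 (CPU0), 64 (other) | 64 | 64 | 64 | 64 | 64 | 64
SRAM global: LMU [KB] | - | - | 512 | - | - | - | 128 | 768
SRAM global: DAM [KB] | - | - | 32 | - | 64 | 64 | 128 | 128
Extension Memory (EMEM): TCM [MB] | - | 1 | 2 | - | - | 2 | - | 2
Extension Memory (EMEM): XCM [MB] | - | - | - | - | - | 1 | - | 2
Extension Memory (EMEM): XTM [KB] | - | 16 | 16 | - | - | 16 | - | 16
Program Flash: Size [MB] | 2 | 4 | 4 | 4 | 6 | 10 | 10 | 16
Program Flash: Banks [MB] | 1 x 2 | 2 x 2 | 2 x 2 | 2 x 2 | 2 x 3 | 3 x 3, 1 x 1 | 3 x 3, 1 x 1 | 5 x 3, 1 x 1
Data Flash: DF0 Size (single-ended) [KB] | 128 | 128 | 128 | 128 | 256 | 512 | 512 | 1024
Data Flash: DF1 Size (single-ended) [KB] | 128 | 128 | 128 | 128 | 128 | 128 | 128 | 128
DMA: Channels | 64 | 64 | 64 | 64 | 128 | 128 | 128 | 128
DMA: Move Engines | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
DMA: Resource Partitions | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4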
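
The Program Flash rows of the table can be cross-checked with a little arithmetic. The Python sketch below assumes that an entry such as "5 x 3, 1 x 1" means five banks of 3 MB plus one bank of 1 MB; that reading, the dictionary layout, and the function names are assumptions made for illustration. The values themselves are copied from the table.

```python
# Program Flash per device: (total size in MB, [(bank count, bank size in MB), ...]).
# Values copied from the table above; the "count x size" reading is an assumption.
PFLASH = {
    "TC33x": (2, [(1, 2)]),
    "TC33xEXT": (4, [(2, 2)]),
    "TC35x": (4, [(2, 2)]),
    "TC36x": (4, [(2, 2)]),
    "TC37x": (6, [(2, 3)]),
    "TC37xEXT": (10, [(3, 3), (1, 1)]),
    "TC38x": (10, [(3, 3), (1, 1)]),
    "TC39x": (16, [(5, 3), (1, 1)]),
}


def banks_total_mb(banks):
    """Sum of bank count times bank size over all bank groups."""
    return sum(count * size for count, size in banks)


def inconsistent_devices(table):
    """Return the devices whose bank layout does not add up to the stated size."""
    return [
        name
        for name, (size_mb, banks) in table.items()
        if banks_total_mb(banks) != size_mb
    ]


print(banks_total_mb(PFLASH["TC39x"][1]))  # 5*3 + 1*1
print(inconsistent_devices(PFLASH))
```

Under this reading every device's bank layout adds up to its stated Program Flash size, for example 5 x 3 MB + 1 x 1 MB = 16 MB for the TC39x.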

Compilers

The AURIX TC3xx family is a series of high-performance microcontrollers widely used in automotive and industrial applications, where the choice of compiler affects performance, reliability, and safety. Mainline open-source compilers such as GCC and LLVM/Clang offer only limited direct support for the AURIX TC3xx. The primary reason is the stringent functional safety requirements of automotive and industrial applications, which demand specialized features and compliance with safety standards that general-purpose open-source compilers often do not meet.

Commercial Compilers

  1. Tasking: Tasking compilers are renowned for their robust support for automotive applications, offering advanced debugging capabilities and optimization techniques tailored for the AURIX architecture. They provide extensive code optimization, comprehensive debugging tools, and strong support for safety standards such as ISO 26262, making them ideal for developing high-performance, reliable, and safe applications.
  2. HighTec: The HighTec compiler is a popular choice, known for its Eclipse-based development environment and strong multicore support. HighTec provides both GCC-based and LLVM-based compilers, ported from the open-source projects and tailored specifically for the AURIX TC3xx family. These compilers offer efficient parallel execution, advanced code analysis, and an integrated development environment, with a focus on robust performance and compliance with safety standards.
  3. Green Hills Software: Green Hills Software provides a highly optimized toolchain aimed at safety-critical applications, focusing on high performance and strict compliance with automotive standards. Their compiler offers superior optimization, extensive safety features, and a proprietary IDE with specialized tools for automotive development, ensuring developers can meet the stringent demands of functional safety.

GCC for AURIX

While mainline GCC does not directly support the AURIX TC3xx family, an unofficial GCC version is available for AURIX. Because HighTec's GCC port is subject to the GNU General Public License (GPL), its source code was obtained and published on GitHub together with binary versions.

See Also

External Links