

## Response-Time Analysis of Dataflow Applications on a Many-Core Processor with Shared Memory and Network on Chip

#### Amaury Graillat, <u>Claire Maiza</u>, Matthieu Moy, Pascal Raymond, Benoit Dinechin

**RTNS 2019** 




Many-Core RTA









High-level Data-Flow Application














→ Implementation of a dataflow application on a many-core → with intra-cluster communications





#### In this talk

- The basis: intra-cluster implementation and analysis
- The question: inter-cluster, what are the main steps of a communication?
- Our timing analysis and implementation choices on the MPPA2 many-core





- 1 Context: MPPA2, Implementation and analysis on a multi-core
- 2 Inter-Cluster Communication on MPPA
- 3 Our timing analysis and implementation choices on the MPPA2 many-core
- 4 Evaluation






## Architecture Model





#### Kalray MPPA 256 Bostan

- 16 compute clusters + 4 I/O clusters
- Dual NoC (Network on Chip)






#### Per cluster:

- 16 cores + 1 Resource Manager
- NoC Tx, NoC Rx, Debug Unit
- 16 shared memory banks (total size: 2 MB)
- Multi-level bus arbiter per memory bank




## Memory mapping and execution model





#### Implementation choice

Phased execution model:

- Execute in a "local" bank
- Write to a "remote" bank
- $\Rightarrow$  Interference limited to communication (write)
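
As an illustration of why the phased model confines interference to the write phase, here is a minimal sketch. It assumes each task owns one private "local" bank; the `Task` class, its fields, and the bank numbers are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    local_bank: int        # bank used during the execute phase (assumed private)
    remote_banks: tuple    # banks written during the communication phase


def shared_banks(a: Task, b: Task) -> set:
    """Banks on which both tasks may issue accesses."""
    banks_a = {a.local_bank, *a.remote_banks}
    banks_b = {b.local_bank, *b.remote_banks}
    return banks_a & banks_b


# Hypothetical pipeline: t0 writes into t1's bank, t1 writes into t2's bank.
t0 = Task("t0", local_bank=0, remote_banks=(1,))
t1 = Task("t1", local_bank=1, remote_banks=(2,))
t2 = Task("t2", local_bank=2, remote_banks=())

print(shared_banks(t0, t1))  # the only shared bank is the one t0 writes to
print(shared_banks(t0, t2))  # no shared bank, hence no interference
```

Because local banks are private under this assumption, any bank shared by two tasks is reached by at least one of them through a write.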

## Response Time Analysis: within a cluster



#### Response Time Analysis<sup>1</sup>

 $R_i = \mathit{WCET}_i + \sum_{j \neq i} \mathit{interference}_{i,j}$ 

#### MIA tool: Interference Analysis, Response Time and Release date

- From: local WCET, number of memory accesses, initial scheduling/mapping
- Estimate the interference delay
- Readjust release dates to respect precedence constraints
- Iterate until a fixed point is reached

<sup>1</sup> No preemption
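
The iteration above can be sketched as follows. This is a simplified illustration, not the MIA tool itself: it assumes every task runs on its own core, that two tasks interfere only when their execution windows overlap, and that each conflict costs a hypothetical `ACCESS_DELAY` per access. All names and numbers are invented for the example.

```python
from dataclasses import dataclass, field

ACCESS_DELAY = 1  # hypothetical delay (cycles) per conflicting memory access


@dataclass
class Task:
    name: str
    wcet: int            # local WCET in isolation
    accesses: int        # number of shared-memory accesses
    min_release: int = 0                        # release date of the initial schedule
    preds: list = field(default_factory=list)   # names of predecessor tasks


def overlaps(rel_a, resp_a, rel_b, resp_b):
    """True if windows [rel, rel + resp) of two tasks intersect."""
    return rel_a < rel_b + resp_b and rel_b < rel_a + resp_a


def analyse(tasks, max_iter=100):
    release = {t.name: t.min_release for t in tasks}
    response = {t.name: t.wcet for t in tasks}
    for _ in range(max_iter):
        # R_i = WCET_i + sum of interference from overlapping tasks
        new_response = {}
        for t in tasks:
            interference = sum(
                ACCESS_DELAY * min(t.accesses, o.accesses)
                for o in tasks
                if o is not t and overlaps(release[t.name], response[t.name],
                                           release[o.name], response[o.name])
            )
            new_response[t.name] = t.wcet + interference
        # Readjust release dates so every task starts after its predecessors
        new_release = {}
        for t in tasks:
            ready = max((release[p] + new_response[p] for p in t.preds), default=0)
            new_release[t.name] = max(t.min_release, ready)
        if new_response == response and new_release == release:
            return release, response  # fixed point reached
        release, response = new_release, new_response
    raise RuntimeError("no fixed point reached within max_iter iterations")


tasks = [
    Task("A", wcet=10, accesses=3),
    Task("B", wcet=8, accesses=2),
    Task("C", wcet=5, accesses=4, preds=["A", "B"]),
]
release, response = analyse(tasks)
print(release, response)
```

In this example C is pushed after A and B, so at the fixed point it no longer overlaps with them and only A and B interfere with each other.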





- 1 Context: MPPA2, Implementation and analysis on a multi-core
- 2 Inter-Cluster Communication on MPPA
- 3 Our timing analysis and implementation choices on the MPPA2 many-core
- 4 Evaluation






















#### NoC Communication steps:

- 1 Read from bank
- 2, 3 Write to the buffer and start transmission
- 4 NoC transmission
- 5 Write to the bank with high priority!

#### Interference:

- 1 Intra-cluster bus interference
- **2**, **3** No arbiter $\Rightarrow$ one TX channel per sender
  - $\Rightarrow$ isolation
- 4 Interference in each router
  - $\rightarrow$ network calculus
- 5 Interference analysis needed
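
The pairing of steps and interference sources can be written down as a small table in code. This is only a restatement of the slide under assumptions: the `Step` class and the wording of the `interference` and `analysis` fields are hypothetical, and the analysis named for step 1 is an assumption based on the intra-cluster analysis presented earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    label: str
    action: str
    interference: Optional[str]  # None means the step is isolated
    analysis: Optional[str]      # technique used to bound the interference


STEPS = [
    Step("1", "read from bank", "intra-cluster bus", "memory interference analysis"),
    Step("2, 3", "write to the buffer and start transmission", None, None),
    Step("4", "NoC transmission", "routers", "network calculus"),
    Step("5", "write to the bank with high priority", "bank on the receiving side",
         "interference analysis"),
]


def steps_needing_analysis(steps):
    """Labels of the steps that are not isolated and must be bounded."""
    return [s.label for s in steps if s.interference is not None]


print(steps_needing_analysis(STEPS))
```

Steps 2 and 3 drop out because one TX channel per sender isolates them.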






- 1 Context: MPPA2, Implementation and analysis on a multi-core
- 2 Inter-Cluster Communication on MPPA
- 3 Our timing analysis and implementation choices on the MPPA2 many-core
- 4 Evaluation

## How to integrate the NoC reception in the timing model?



#### NoC reception as an additional task

- Modeling the reception as a task ⇒ tells when the fixed-priority interference will occur
- Q1 When does the reception start?
- Q2 What about circular dependency?



















Issue: circular dependency?



## Summary: NoC Communication Timing Analysis

Compute Task:

- Fits the memory interference model of previous work

Copy to TX:

- Isolated by scheduling

Start NoC transfer (EOT):

- Avoids memory access = no interference

On the RX side:

- RX can only start after "Start NoC transfer"
  ⇒ edge from "Copy to TX" to "RX" in the task dependency graph.
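
The extra edge in the task dependency graph can be sketched with Python's standard `graphlib`. The node names and the `consumer` task are hypothetical. The sketch only shows that the edge from "Copy to TX" to "RX" orders the reception after the copy, and that a circular dependency would be detected.

```python
from graphlib import CycleError, TopologicalSorter

# Each node maps to the set of nodes that must complete before it.
deps = {
    "compute": set(),
    "copy_to_tx": {"compute"},
    "start_noc_transfer": {"copy_to_tx"},
    "rx": {"copy_to_tx"},   # edge from "Copy to TX" to "RX"
    "consumer": {"rx"},     # hypothetical task reading the received data
}


def schedule_order(graph):
    """Return one valid execution order, or None if the graph has a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


order = schedule_order(deps)
print(order)

# Adding a back edge creates a circular dependency, which is rejected.
cyclic = dict(deps, compute={"consumer"})
print(schedule_order(cyclic))
```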





- 1 Context: MPPA2, Implementation and analysis on a multi-core
- 2 Inter-Cluster Communication on MPPA
- 3 Our timing analysis and implementation choices on the MPPA2 many-core
- 4 Evaluation



## How to validate such a model?



#### Evaluated in the paper

- How much response time do we gain with our 3-phase NoC transmission task?
- How precise is the time-triggered implementation?

## **Example Application**

#### Worst-Case:



#### Improved:





New dataflow application implementation and timing analysis with inter-cluster communications

#### Summary

- Sending task = 3-phase task
- Isolated NoC transmission
- No memory access during EOT task
- Interference due to NoC reception taken into account in Multi-Core Interference Analysis