#### Cache-aware Schedulability Analysis of PREM Compliant Tasks

Syed Aftab Rashid, Muhammad Ali Awan, Pedro F. Souto, Konstantinos Bletsas, Eduardo Tovar







#### **Presentation Contents**

#### 01 Introduction and Motivation

**PREM Model**, and **Problem definition** 

#### 03 **Experimental Setup and** Results

**Experiments performed under different** settings and parameter values



**CISTER** - Research Center in **Real-Time & Embedded Computing Systems** 



#### 02 **Cache-aware PREM Model**

**DRCB-only Approach FDCB-DRCB** Approach

#### 04

**Conclusions and Future works** 

**Conclusions** and potential **future** research directions

















Δ

- Several advantages over single-core/custom build hardware.
  - $\checkmark$ Improved performance
  - ✓Instant availability
  - $\checkmark$ Low cost
  - $\checkmark$  Low integration complexity, etc.
- Design mantra of "average case faster" makes MCPs a popular choice in lowcriticality/soft real-time systems.







- Several advantages over single-core/custom build hardware.
  - $\checkmark Improved \ performance$
  - ✓Instant availability
  - $\checkmark$ Low cost
  - $\checkmark$  Low integration complexity, etc.
- Design mantra of "average case faster" makes MCPs a popular choice in lowcriticality/soft real-time systems.
- The performance-oriented design of MCPs makes them non-deterministic, e.g., due to sharing of hardware resources such as caches, bus and the main memory.





- Several advantages over single-core/custom build hardware.
  - $\checkmark$ Improved performance
  - ✓Instant availability
  - $\checkmark$ Low cost
  - $\checkmark$  Low integration complexity, etc.
- Design mantra of "average case faster" makes MCPs a popular choice in lowcriticality/soft real-time systems.
- The performance-oriented design of MCPs makes them non-deterministic, e.g., due to sharing of hardware resources such as caches, bus and the main memory.

Contention due to shared resources, e.g., main memory, between tasks can have a significant impact on WCET/WCRT of tasks







































Serialized Memory Accesses































Cache line Prefetches/write-backs to main memory









Cache line Prefetches/write-backs to main memory









Cache line Prefetches/write-backs to main memory







Cache line Prefetches/write-backs to main memory

Total Memory Accesses = Total Prefetches + Total Write-backs = 13 + 13 = 26









Cache line Prefetches/write-backs to main memory

Total Memory Accesses = Total Prefetches + Total Write-backs = 13 + 13 = 26









Cache line Prefetches/write-backs to main memory

Total Memory Accesses = Total Prefetches + Total Write-backs = 13 + 13 = 26









Total Memory Accesses = Total Prefetches + Total Write-backs = 13 + 13 = 26









Total Memory Accesses = Total Prefetches + Total Write-backs = 13 + 13 = 26









PORTO

FFUP

FACULDADE DE ENGENHARIA

UNIVERSIDADE DO PORTO

= 13 + 13 = 26





Total Memory Accesses = Total Prefetches + Total Write-backs = 8 + 8 = 16

## **Contribution: Cache-aware schedulability analysis of** <sup>26</sup> **PREM tasks**

- **Preciser estimate of cache line prefetches -> DRCB-only Approach**
- Tighter bound on cache line write-backs -> FDCB-DRCB Approach







• What are Definitely Reused Cache Blocks (DRCBs)?







#### • What are Definitely Reused Cache Blocks (DRCBs)?

Memory blocks that are cached at the end point E of a scheduling interval E\_{i,j-1} and are reused at some program point F in scheduling interval E\_{i,j}, without being evicted on any possible execution path between E to F.







• What are Definitely Reused Cache Blocks (DRCBs)?

Memory blocks that are cached at the end point E of a scheduling interval  $E_{i,j-1}$  and are reused at some program point F in scheduling interval  $E_{i,j}$ , without being evicted on any possible execution path between E to F.



$$DRCB_{i,0} = \{ \}$$
$$RCB_{i,1} = \{C\}, DRCB_{i,2} = \{D, E\}$$





• What are Definitely Reused Cache Blocks (DRCBs)?

Memory blocks that are cached at the end point E of a scheduling interval  $E_{i,j-1}$  and are reused at some program point F in scheduling interval  $E_{i,j}$ , without being evicted on any possible execution path between E to F.



$$DRCB_{i,0} = \{ \}$$
  

$$PRCB_{i,1} = \{C\}, DRCB_{i,2} = \{D, E\}$$
  

$$DRCB_{i,3} = \{F, G\}$$





• What are Definitely Reused Cache Blocks (DRCBs)?

Memory blocks that are cached at the end point E of a scheduling interval  $E_{i,j-1}$  and are reused at some program point F in scheduling interval  $E_{i,j}$ , without being evicted on any possible execution path between E to F.



$$DRCB_{i,0} = \{ \}$$
  

$$DRCB_{i,1} = \{C\}, DRCB_{i,2} = \{D, E\}$$
  

$$DRCB_{i,3} = \{F, G\}$$
  

$$DRCB_{i} = \{C, D, E, F, G\}$$

**CISTER** - Research Center in Real-Time & Embedded Computing Systems



U. PORTO FACULDADE DE ENGENHARIA UNIVERSIDADE DO PORTO

#### • What are Evicting Cache Blocks (ECBs)?

All Memory blocks used by a task (or scheduling interval) during its execution







• What are Evicting Cache Blocks (ECBs)?

All Memory blocks used by a task (or scheduling interval) during its execution









• What are Evicting Cache Blocks (ECBs)?

All Memory blocks used by a task (or scheduling interval) during its execution









• What are Evicting Cache Blocks (ECBs)?

All Memory blocks used by a task (or scheduling interval) during its execution

|             | 1              | E <sub>i,0</sub> | E <sub>i,1</sub> | $E_{i,2}$ | E <sub>i,3</sub> |                                                    |
|-------------|----------------|------------------|------------------|-----------|------------------|----------------------------------------------------|
| Cache Lines | τ <sub>i</sub> | ME               | ME               | ME        | ME               |                                                    |
|             | 7              |                  |                  |           | Н                |                                                    |
|             | 6              |                  |                  | G         | G                | $ECB_{i,0} = \{A, B, C\}, ECB_{i,1} = \{C, D, E\}$ |
|             | 5              |                  |                  | F         | F                |                                                    |
|             | 4              |                  | E                | E         |                  |                                                    |
|             | 3              |                  | D                | D         |                  | $ECB_{i,2} = \{D, E, F, G\}$                       |
|             | 2              | С                | С                |           |                  |                                                    |
|             | 1              | В                |                  |           |                  | $ECB_{i,3} = \{F, G, H\}$                          |
|             | 0              | А                |                  |           |                  |                                                    |
|             |                |                  |                  |           |                  | $ECB_i = \{A, B, C, D, E, F, G, H\}$               |





#### • How the DRCB approach work?

A scheduling interval E\_{i,j} only needs to prefetch ECBs that are not its DRCBs







#### • How the DRCB approach work?

A scheduling interval E\_{i,j} only needs to prefetch ECBs that are not its DRCBs

Memory blocks prefeteched by  $E_{i,j} = ECB_{i,j}/DRCB_{i,j}$ 







 $_{\rm O}$  How the DRCB approach work?

A scheduling interval E\_{i,j} only needs to prefetch ECBs that are not its DRCBs







 $_{\rm O}$  How the DRCB approach work?

A scheduling interval E\_{i,j} only needs to prefetch ECBs that are not its DRCBs



$$DRCB_{i,j}^{E} = DRCB_{i,j} \bigcap \{\bigcup_{\forall k \in hp(i)} ECB_k\}$$

DRCB of E\_{i,j} evicted due to preemptions

**CISTER** - Research Center in Real-Time & Embedded Computing Systems





 $_{\rm O}$  How the DRCB approach work?

A scheduling interval E\_{i,j} only needs to prefetch ECBs that are not its DRCBs



**CISTER** - Research Center in Real-Time & Embedded Computing Systems



FACULDADE DE ENGENHARIA UNIVERSIDADE DO PORTO







• DRCB-only approach overestimates the number of cache write-backs







• DRCB-only approach overestimates the number of cache write-backs







#### • DRCB-only approach overestimates the number of cache write-backs



$$WB_{i,0} = \{A, B, C\}, WB_{i,1} = \{D, E\}$$
  
 $WB_{i,2} = \{F, G\}$   
 $WB_{i,3} = \{H\}$ 







#### • DRCB-only approach overestimates the number of cache write-backs



$$WB_{i,0} = \{A, B, C\}, WB_{i,1} = \{D, E\}$$
$$WB_{i,2} = \{F, G\}$$
$$WB_{i,3} = \{H\}$$

Actual number of write-backs depend on the number of dirty cache lines

CISTER - Research Center in Real-Time & Embedded Computing Systems



U. PORTO EUP FACULDADE DE ENGENHARIA UNIVERSIDADE DO PORTO

#### • What are Final Dirty Cache Blocks (FDCBs)?

Memory blocks that may be written to during the execution of a task/scheduling interval and may still be available in cache after the completion of that task/scheduling interval.







#### • What are Final Dirty Cache Blocks (FDCBs)?

Memory blocks that may be written to during the execution of a task/scheduling interval and may still be available in cache after the completion of that task/scheduling interval.



#### • What are Final Dirty Cache Blocks (FDCBs)?

Memory blocks that may be written to during the execution of a task/scheduling interval and may still be available in cache after the completion of that task/scheduling interval.



• Cache write-backs during the execution of a scheduling interval E\_{i,j} depend on the ECBs of that scheduling interval and FDCBs of all tasks in hep(i) and lp(i)



Col





• Cache write-backs during the execution of a scheduling interval E\_{i,j} depend on the ECBs of that scheduling interval and FDCBs of all tasks in hep(i) and lp(i)



Write-backs due to tasks in lp(i)





• Cache write-backs during the execution of a scheduling interval E\_{i,j} depend on the ECBs of that scheduling interval and FDCBs of all tasks in hep(i) and lp(i)



## **Cache-Aware PREM: WCRT Analysis**

$$C_{i,j} = (|WB_{i,j}^{tot}| + |ECB_{i,j}^{P}|) \times d_{mem} + C_{i,j}^{e}$$

$$WCET \text{ of } E_{i,j} \text{ Total write-backs during the execution of } E_{i,j} \text{ Total prefetches during the execution of } E_{i,j} \text{ Worst-case time to load one block from main memory}}$$

$$C_{i} = \sum_{j=0}^{N_{i}-1} C_{i,j} \qquad R_{i}^{k+1} = B_{i} + C_{i} + \sum_{\forall h \in hp(i)} \left\lceil \frac{R_{i}^{k}}{T_{h}} \right\rceil C_{h}$$

$$WCET \text{ of } \tau_{i}$$

**CISTER** - Research Center in Real-Time & Embedded Computing Systems





# **Experimental Evaluation:** Experimental Setup and Taskset generation

- We modeled a multicore platform with cores m=4, Last-level Cache = 64KB (2048 Cache sets, 32-byte blocks), Cache Prefetch/Write-back Penalty = 100 µSec
- Taskset size = 32, i.e., 8 tasks per core, Taskset Utilizations = UUnifast, Task periods = [5ms, 500ms], WCET = Utilization x Period, ...
- 1000 synthetic tasksets per experimental point, compared SoA cache-agnostic PREM, DRCB-only approach and FDCB-DRCB approach.







### **Experimental Evaluation: Varying per Core Utilization**







## **Experimental Evaluation:** Varying number of Cores and Cache Size



FACULDADE DE ENGENHARIA FFUP UNIVERSIDADE DO PORTO

**Real-Time & Embedded Computing Systems** 

## **Experimental Evaluation: Varying DRCB-ECB and FDCB-ECB** percentage



FACULDADE DE ENGENHARIA FFUP UNIVERSIDADE DO PORTO

## **Conclusion and Future Work**

- Ported established cache-aware schedulability analysis to PREM
- Presented two approaches that tightly estimate cache line loads and write-backs
- DRCB-only approach exploits cache reuse among scheduling intervals to reduce the number of loads. The FDCB-DRCB approach improves on DRCB-only approach by carefully analyzing cache write-backs.
- Experimental evaluation shows that our approaches can achieve up to 55% better schedulability ratio.
- ✤ As future work, we aim to extend our analysis to multi-set approaches.
- We also plan to extend our analysis to consider inter-job cache reuse.



**CISTER** - Research Center in Real-Time & Embedded Computing Systems





## **Conclusion and Future Work**

- Ported established cache-aware schedulability analysis to PREM
- Presented two approaches that tightly estimate cache line loads and write-backs
- DRCB-only approach exploits cache reuse among scheduling intervals to reduce the number of loads. The FDCB-DRCB approach improves on DRCB-only approach by carefully analyzing cache write-backs.
- Experimental evaluation shows that our approaches can achieve up to 55% better schedulability ratio.
- ✤ As future work, we aim to extend our analysis to multi-set approaches.
- **We also plan to extend our analysis to consider inter-job cache reuse.**

Thank You 🙂

syed.aftabrashid@vortex-colab.com

syara@isep.ipp.pt



**CISTER** - Research Center in Real-Time & Embedded Computing Systems



