Presentation

Distributed Data Locality-Aware Job Allocation
Description
Scheduling tasks close to their associated data is crucial in distributed systems to minimize network traffic and latency. Some Big Data frameworks like Apache Spark employ locality functions and job allocation algorithms to minimize network traffic and execution times. However, these frameworks rely on centralized mechanisms, where the master node determines data locality by allocating tasks to available workers with minimal data transfer time, ignoring variances in worker configurations and availability. To address these limitations, we propose a decentralized approach to locality-driven scheduling that grants workers autonomy in the job allocation process while factoring in workers' configurations, such as network and CPU speed differences. Our approach is developed and evaluated on Crossflow, a distributed stream processing platform with data-aware independent worker nodes. Preliminary evaluation experiments indicate that our approach can yield up to 3.57x faster execution times when compared to the baseline centralized approach where the master controls data locality.
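To make the idea of worker-driven, configuration-aware allocation concrete, here is a minimal Python sketch. It is not Crossflow's API and not necessarily the authors' algorithm: it assumes a simple bidding scheme in which each worker estimates its own completion time from its network speed, CPU speed, and local cache, and the lowest estimate wins. All names (`Job`, `Worker`, `bid`, `allocate`) and all numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    data_key: str        # identifier of the data the job needs
    data_size_mb: float  # data to fetch if it is not cached locally
    work_units: float    # abstract amount of computation


@dataclass
class Worker:
    name: str
    net_mb_per_s: float  # network throughput of this worker (assumed known)
    units_per_s: float   # CPU speed of this worker (assumed known)
    cache: set = field(default_factory=set)  # data keys held locally

    def bid(self, job: Job) -> float:
        """Estimated seconds for this worker to finish the job."""
        if job.data_key in self.cache:
            transfer = 0.0  # data is already local, nothing to download
        else:
            transfer = job.data_size_mb / self.net_mb_per_s
        compute = job.work_units / self.units_per_s
        return transfer + compute


def allocate(job: Job, workers: list) -> Worker:
    """Give the job to the worker with the lowest self-reported bid."""
    winner = min(workers, key=lambda w: w.bid(job))
    winner.cache.add(job.data_key)  # the winner now holds the data
    return winner


# Hypothetical setup: a fast worker without the data, a slow worker with it.
fast = Worker("fast", net_mb_per_s=100.0, units_per_s=10.0)
slow_local = Worker("slow_local", net_mb_per_s=10.0, units_per_s=2.0,
                    cache={"repo-a"})
job = Job("j1", data_key="repo-a", data_size_mb=500.0, work_units=20.0)

print(fast.bid(job))        # 7.0  (5 s transfer + 2 s compute)
print(slow_local.bid(job))  # 10.0 (no transfer + 10 s compute)
print(allocate(job, [fast, slow_local]).name)  # fast
```

In this made-up example a purely locality-based rule would pick `slow_local` because it already holds the data, while the bid that also accounts for CPU and network speed picks `fast`.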
Event Type
Workshop
Time
Monday, 13 November 2023
11:39am - 11:57am MST
Location
704-706
Tags
Data Analysis, Visualization, and Storage
Large Scale Systems
Programming Frameworks and System Software
Reproducibility
Resource Management
Runtime Systems
Registration Categories
W