This assessment evaluates a candidate's ability to design, configure, and optimize Apache Spark clusters in on-premises environments. It focuses on practical understanding of cluster architecture, YARN and Standalone deployment modes, resource management, storage and shuffle tuning, networking, security (Kerberos), and performance troubleshooting. Questions combine real-world scenarios, configuration snippets, and code examples to test deep knowledge of Spark internals, JVM tuning, and data-processing efficiency on self-managed infrastructure.
Example Question:
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=2
spark.dynamicAllocation.maxExecutors=40
spark.shuffle.service.enabled=false
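
One way to reason about a snippet like the one above is to check it against the precondition Spark documents for dynamic allocation: shuffle data written by an executor has to remain available after that executor is removed, typically through the external shuffle service (spark.shuffle.service.enabled) or shuffle tracking (spark.dynamicAllocation.shuffleTracking.enabled). The following is a minimal sketch of such a check in Python, not a definitive validator. The helper names (parse_properties, is_true, dynamic_allocation_warnings) are hypothetical, the check covers only these two options although Spark documents others, and an unset property is treated as false, which may not match the default in every Spark release.

```python
# Hypothetical helpers for illustration; none of these are part of Spark.

EXAMPLE = """\
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=2
spark.dynamicAllocation.maxExecutors=40
spark.shuffle.service.enabled=false
"""


def parse_properties(text):
    """Parse key=value lines into a dict, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def is_true(props, key):
    """Assumption: a property that is not set counts as false."""
    return props.get(key, "false").lower() == "true"


def dynamic_allocation_warnings(props):
    """Return a list of human-readable warnings for the given properties."""
    warnings = []
    if not is_true(props, "spark.dynamicAllocation.enabled"):
        return warnings  # nothing to check when dynamic allocation is off

    # Simplified precondition: only two of the documented options are checked.
    shuffle_preserved = (
        is_true(props, "spark.shuffle.service.enabled")
        or is_true(props, "spark.dynamicAllocation.shuffleTracking.enabled")
    )
    if not shuffle_preserved:
        warnings.append(
            "dynamic allocation is enabled but neither the external shuffle "
            "service nor shuffle tracking is enabled"
        )

    # Sanity-check the executor bounds when both can be compared.
    min_executors = int(props.get("spark.dynamicAllocation.minExecutors", "0"))
    max_executors = props.get("spark.dynamicAllocation.maxExecutors")
    if max_executors is not None and min_executors > int(max_executors):
        warnings.append("minExecutors is greater than maxExecutors")

    return warnings


if __name__ == "__main__":
    for warning in dynamic_allocation_warnings(parse_properties(EXAMPLE)):
        print("WARNING:", warning)
```

Keeping the parser separate from the check means the same function can be pointed at a spark-defaults.conf style file or at properties collected from a running application, as long as they are first normalized to key=value text.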