Apache Spark

30 Minutes

15 Questions

This assessment evaluates a candidate's ability to design, configure, and optimize Apache Spark clusters in on-premises environments. It focuses on practical understanding of cluster architecture, the YARN and Standalone cluster managers, resource management, storage and shuffle tuning, networking, security (Kerberos), and performance troubleshooting. Questions combine real-world scenarios, configuration snippets, and code examples to test in-depth knowledge of Spark internals, JVM tuning, and data-processing efficiency on self-managed infrastructure.

Example Question:

Multiple-Choice

Your on-prem Spark cluster uses YARN. You enable dynamic allocation to optimize executor usage:

spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=2
spark.dynamicAllocation.maxExecutors=40
spark.shuffle.service.enabled=false

However, tasks fail with:

"Cannot find shuffle files for lost executor"

What caused this failure?

Answers

1. Dynamic allocation removed idle executors, and with no external shuffle service running on each node, the shuffle files those executors wrote were lost with them.

2. The driver lost connection to the YARN ResourceManager.

3. The max executor count is too high for YARN.

4. Shuffle compression must be disabled when using dynamic allocation.
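The misconfiguration in this question can be caught before a job runs. The Python sketch below is a hypothetical lint helper, not a Spark API: the names parse_properties, _is_true, and shuffle_data_at_risk are illustrative. It assumes spark-defaults.conf-style input ("key=value" or "key value") and treats any unset property as false, which may not match the defaults of every Spark release. It flags configurations where dynamic allocation is enabled but neither the external shuffle service (spark.shuffle.service.enabled) nor shuffle tracking (spark.dynamicAllocation.shuffleTracking.enabled, available since Spark 3.0) keeps shuffle output available once an idle executor is released. Other options, such as shuffle block decommissioning, are deliberately not checked.

```python
import re

# Hypothetical lint helper (not part of Spark): reads spark-defaults.conf-style
# properties and reports whether dynamic allocation may release executors whose
# shuffle files are still needed by later stages.

# A key, then '=' or whitespace as the separator, then the value.
_PROP_LINE = re.compile(r"([^\s=]+)\s*[=\s]\s*(.*)")


def parse_properties(text: str) -> dict[str, str]:
    """Parse 'key=value' or 'key value' lines, skipping blanks and # comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _PROP_LINE.match(line)
        if match:
            props[match.group(1)] = match.group(2).strip()
        else:
            props[line] = ""  # bare key with no value
    return props


def _is_true(props: dict[str, str], key: str) -> bool:
    # Assumption: an unset property counts as false. Spark's real defaults can
    # differ between releases, so set these keys explicitly in practice.
    return props.get(key, "false").strip().lower() == "true"


def shuffle_data_at_risk(props: dict[str, str]) -> bool:
    """Return True when dynamic allocation is enabled but neither the external
    shuffle service nor shuffle tracking protects shuffle output from being
    lost together with a released executor."""
    if not _is_true(props, "spark.dynamicAllocation.enabled"):
        return False
    external_service = _is_true(props, "spark.shuffle.service.enabled")
    # Spark 3.0+: prevents executors that still hold shuffle data from being released.
    shuffle_tracking = _is_true(props, "spark.dynamicAllocation.shuffleTracking.enabled")
    return not (external_service or shuffle_tracking)


if __name__ == "__main__":
    question_config = """
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=2
spark.dynamicAllocation.maxExecutors=40
spark.shuffle.service.enabled=false
"""
    print(shuffle_data_at_risk(parse_properties(question_config)))  # True
```

On YARN, setting spark.shuffle.service.enabled=true is only half of the fix: each NodeManager must also run Spark's shuffle service as an auxiliary service (spark_shuffle listed in yarn.nodemanager.aux-services, implemented by org.apache.spark.network.yarn.YarnShuffleService), so shuffle files stay reachable after the executor that wrote them is gone.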