Spark practice questions



A set of questions and answers to test your introductory knowledge of Apache Spark concepts.

1. What is the USP for Apache Spark?

    a. It runs programs in-memory up to 100x faster than MapReduce
    b. It offers over 80 high level operators
    c. Can be used from Scala and Python shells
    d. Product is already 5 years old

Answer: a.
It runs programs in-memory up to 100x faster than MapReduce


Explanation: Spark has an advanced DAG execution engine that supports cyclic data flow and in-memory computing. According to the Spark project, it can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.

2. Which of the following is not an associated component of Spark?

    a. Shark for SQL
    b. MLlib for machine learning
    c. Spark Streaming
    d. Giraph

Answer: d.
Giraph


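Explanation: Giraph is a separate Apache project for iterative graph processing on Hadoop, not a Spark component. Spark's own graph-processing component is GraphX, which enables users to easily and interactively build, transform, and reason about graph-structured data at scale.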

3. What is the status of Apache Spark as an Apache Software Foundation project?

    a. Incubator
    b. Sub project
    c. Top level project
    d. Proposal

Answer: c.
Top level project


Explanation: Spark has graduated from the Apache Incubator to become a top-level Apache project, signifying that the project’s community and products have been well-governed under the ASF’s meritocratic process and principles.

4. Which of the following vendors offer commercial distributions of Apache Spark?

    a. Databricks
    b. Cloudera
    c. MapR
    d. All of the above

Answer: d.
All of the above


Explanation: Spark is available as open source (Apache Spark) or as commercial distributions from Databricks, Cloudera, and MapR.