Databricks Certified Associate Developer for Apache Spark 3.0 Exam Questions and Answers

Question 13

Which of the following statements about data skew is incorrect?

Options:

A.

Spark will not automatically optimize skew joins by default.

B.

Broadcast joins are a viable way to increase join performance for skewed data over sort-merge joins.

C.

In skewed DataFrames, the largest and the smallest partition consume very different amounts of memory.

D.

To mitigate skew, Spark automatically disregards null values in keys when joining.

E.

Salting can resolve data skew.
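
As a rough illustration of the salting idea named in option E, the sketch below simulates hash partitioning in plain Python rather than in Spark. The helper names (partition_of, partition_sizes, salt_keys), the CRC32 stand-in for a hash partitioner, the round-robin salt, and the sample key distribution are all assumptions made for this example. In a real salted join, the other side of the join would also have to be replicated once per salt value.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    # Deterministic stand-in for a hash partitioner (assumption: CRC32 modulo).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    # Count how many rows land in each partition.
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_keys(keys, salt_buckets):
    # Append a salt suffix so one hot key becomes several distinct keys.
    # Round-robin keeps the sketch deterministic; a random salt is also common.
    return [f"{k}_{i % salt_buckets}" for i, k in enumerate(keys)]


# Hypothetical skewed data: one hot key holds 90% of the rows.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]

unsalted = partition_sizes(keys, num_partitions=8)
salted = partition_sizes(salt_keys(keys, salt_buckets=16), num_partitions=8)

print("largest partition without salt:", max(unsalted))
print("largest partition with salt:   ", max(salted))
```

Without the salt, every "hot" row hashes to the same partition, so that partition holds at least 9000 rows. With the salt, the hot key is split into 16 distinct keys that spread over the partitions.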

Question 14

The code block shown below should write DataFrame transactionsDf as a parquet file to path storeDir, using brotli compression and replacing any previously existing file. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__.format("parquet").__2__(__3__).option(__4__, "brotli").__5__(storeDir)

Options:

A.

1. save

2. mode

3. "ignore"

4. "compression"

5. path

B.

1. store

2. with

3. "replacement"

4. "compression"

5. path

C.

1. write

2. mode

3. "overwrite"

4. "compression"

5. save

(Correct)

D.

1. save

2. mode

3. "replace"

4. "compression"

5. path

E.

1. write

2. mode

3. "overwrite"

4. compression

5. parquet

Question 15

Which of the following statements about DAGs is correct?

Options:

A.

DAGs help direct how Spark executors process tasks, but are a limitation to the proper execution of a query when an executor fails.

B.

DAG stands for "Directing Acyclic Graph".

C.

Spark strategically hides DAGs from developers, since the high degree of automation in Spark means that developers never need to consider DAG layouts.

D.

In contrast to transformations, DAGs are never lazily executed.

E.

DAGs can be decomposed into tasks that are executed in parallel.

Question 16

Which of the following code blocks returns a DataFrame where columns predError and productId are removed from DataFrame transactionsDf?

Sample of DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|f   |
+-------------+---------+-----+-------+---------+----+
|1            |3        |4    |25     |1        |null|
|2            |6        |7    |2      |2        |null|
|3            |3        |null |25     |3        |null|
+-------------+---------+-----+-------+---------+----+

Options:

A.

transactionsDf.withColumnRemoved("predError", "productId")

B.

transactionsDf.drop(["predError", "productId", "associateId"])

C.

transactionsDf.drop("predError", "productId", "associateId")

D.

transactionsDf.dropColumns("predError", "productId", "associateId")

E.

transactionsDf.drop(col("predError", "productId"))