Databricks Certified Associate Developer for Apache Spark 3.0 Exam Questions and Answers

Question 21

The code block displayed below contains an error. The code block should count the number of rows that have a predError of either 3 or 6. Find the error.

Code block:

transactionsDf.filter(col('predError').in([3, 6])).count()

Options:

A. The number of rows cannot be determined with the count() operator.

B. Instead of filter, the select method should be used.

C. The method used on column predError is incorrect.

D. Instead of a list, the values need to be passed as single arguments to the in operator.

E. Numbers 3 and 6 need to be passed as string variables.
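
One way to examine the code block in Question 21 without a Spark session is to check whether it parses at all. The sketch below uses only the Python standard library: `in` is a reserved keyword in Python, so `.in(...)` cannot be a method call, whereas PySpark's `Column.isin` is an ordinary method that accepts either a list or separate arguments. The helper name `parses` is made up for this illustration, and the two strings are only compiled, never executed.

```python
import keyword

# The code block from the question, and a variant that swaps `in` for `isin`.
broken = "transactionsDf.filter(col('predError').in([3, 6])).count()"
fixed = "transactionsDf.filter(col('predError').isin([3, 6])).count()"


def parses(source: str) -> bool:
    """Return True if `source` is a syntactically valid Python expression.

    Hypothetical helper: it compiles the text but does not run it, so no
    Spark session and no DataFrame are needed.
    """
    try:
        compile(source, "<sketch>", "eval")
        return True
    except SyntaxError:
        return False


print(keyword.iskeyword("in"))  # True: `in` is reserved and cannot name a method
print(parses(broken))           # False
print(parses(fixed))            # True
```

Because the failure is a syntax error, it occurs before Spark evaluates anything, regardless of how the values 3 and 6 are passed.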

Question 22

The code block displayed below contains an error. The code block should return a DataFrame where all entries in column supplier contain the letter combination et in this order. Find the error.

Code block:

itemsDf.filter(Column('supplier').isin('et'))

Options:

A. The Column operator should be replaced by the col operator and instead of isin, contains should be used.

B. The expression inside the filter parenthesis is malformed and should be replaced by isin('et', 'supplier').

C. Instead of isin, it should be checked whether column supplier contains the letters et, so isin should be replaced with contains. In addition, the column should be accessed using col['supplier'].

D. The expression only returns a single column and filter should be replaced by select.
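
The difference between `isin` and `contains` in Question 22 comes down to exact matching versus substring matching. The following is a plain-Python model of those two semantics, not Spark code, and the supplier names are invented for illustration. In PySpark itself, `col` from `pyspark.sql.functions` is a function and is called as `col('supplier')`.

```python
# Hypothetical supplier values, made up for this sketch.
suppliers = ["Sports Company Inc.", "YetiX", "et", "Valley Outlet"]

# Model of isin('et'): keep a value only if it equals one of the listed values.
exact_matches = [s for s in suppliers if s in ("et",)]

# Model of contains('et'): keep a value if 'et' occurs anywhere inside it.
substring_matches = [s for s in suppliers if "et" in s]

print(exact_matches)      # ['et']
print(substring_matches)  # ['YetiX', 'et', 'Valley Outlet']
```

Only the substring check keeps every supplier whose name includes the letters et in that order.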

Question 23

The code block shown below should return a DataFrame with only columns from DataFrame transactionsDf for which there is a corresponding transactionId in DataFrame itemsDf. DataFrame itemsDf is very small and much smaller than DataFrame transactionsDf. The query should be executed in an optimized way. Choose the answer that correctly fills the blanks in the code block to accomplish this.

__1__.__2__(__3__, __4__, __5__)

Options:

A.
1. transactionsDf
2. join
3. broadcast(itemsDf)
4. transactionsDf.transactionId==itemsDf.transactionId
5. "outer"

B.
1. transactionsDf
2. join
3. itemsDf
4. transactionsDf.transactionId==itemsDf.transactionId
5. "anti"

C.
1. transactionsDf
2. join
3. broadcast(itemsDf)
4. "transactionId"
5. "left_semi"

D.
1. itemsDf
2. broadcast
3. transactionsDf
4. "transactionId"
5. "left_semi"

E.
1. itemsDf
2. join
3. broadcast(transactionsDf)
4. "transactionId"
5. "left_semi"
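
The join types named in the options of Question 23 differ in which rows and columns they keep. The sketch below is a plain-Python model of left semi and left anti semantics, not Spark code. The rows are invented, and reducing the small side to a lookup set is only a loose analogy for broadcasting a small DataFrame to every executor.

```python
# Hypothetical rows standing in for transactionsDf (large) and itemsDf (small).
transactions = [
    {"transactionId": 1, "value": 4},
    {"transactionId": 2, "value": 7},
    {"transactionId": 3, "value": 3},
]
items = [
    {"transactionId": 1, "itemName": "hat"},
    {"transactionId": 1, "itemName": "scarf"},
    {"transactionId": 3, "itemName": "boots"},
]

# The small side becomes a set of keys that every left row is checked against.
item_keys = {row["transactionId"] for row in items}

# Left semi: left rows that have a match, left columns only, no duplication.
left_semi = [row for row in transactions if row["transactionId"] in item_keys]

# Left anti: left rows that have no match.
left_anti = [row for row in transactions if row["transactionId"] not in item_keys]

print(left_semi)  # transactions 1 and 3, each exactly once
print(left_anti)  # transaction 2 only
```

Transaction 1 matches two items but appears once in the semi-join result, and no column from the items side is carried over. An outer join, by contrast, would also return unmatched rows and the columns of both sides.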

Question 24

In which order should the code blocks shown below be run to create a table of all values in column attributes next to the respective values in column supplier in DataFrame itemsDf?

1. itemsDf.createOrReplaceView("itemsDf")

2. spark.sql("FROM itemsDf SELECT 'supplier', explode('Attributes')")

3. spark.sql("FROM itemsDf SELECT supplier, explode(attributes)")

4. itemsDf.createOrReplaceTempView("itemsDf")

Options:

A. 4, 3

B. 1, 3

C. 2

D. 4, 2

E. 1, 2
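
For Question 24, it helps to picture what `explode` produces: one output row per element of the array column, paired with the other selected columns of the same input row. The sketch below is a plain-Python model of that behavior, not Spark code, with invented rows. Two facts about the real API are relevant when comparing the code blocks: a PySpark DataFrame offers `createOrReplaceTempView` but no `createOrReplaceView` method, and in Spark SQL a single-quoted token such as `'supplier'` is a string literal rather than a column reference.

```python
# Hypothetical rows: each supplier has an array-valued attributes column.
items = [
    {"supplier": "Sports Company Inc.", "attributes": ["blue", "winter", "cozy"]},
    {"supplier": "YetiX", "attributes": ["red", "summer"]},
]

# Model of SELECT supplier, explode(attributes):
# emit one (supplier, attribute) pair per array element.
exploded = [
    (row["supplier"], attribute)
    for row in items
    for attribute in row["attributes"]
]

for pair in exploded:
    print(pair)
```

The two input rows become five output rows, one for each attribute value.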