Black Friday Special 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: save70

Helping Hand Questions for Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0

Databricks Certified Associate Developer for Apache Spark 3.0 Exam Questions and Answers

Question 17

Which of the following code blocks creates a new DataFrame with two columns season and wind_speed_ms where column season is of data type string and column wind_speed_ms is of data type

double?

Options:

A.

spark.DataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})

B.

spark.createDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])

C.

1. from pyspark.sql import types as T

2. spark.createDataFrame((("summer", 4.5), ("winter", 7.5)), T.StructType([T.StructField("season", T.CharType()), T.StructField("season", T.DoubleType())]))

D.

spark.newDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])

E.

spark.createDataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})

Question 18

The code block displayed below contains an error. The code block should produce a DataFrame with color as the only column and three rows with color values of red, blue, and green, respectively.

Find the error.

Code block:

1.spark.createDataFrame([("red",), ("blue",), ("green",)], "color")

Instead of calling spark.createDataFrame, just DataFrame should be called.

Options:

A.

The commas in the tuples with the colors should be eliminated.

B.

The colors red, blue, and green should be expressed as a simple Python list, and not a list of tuples.

C.

Instead of color, a data type should be specified.

D.

The "color" expression needs to be wrapped in brackets, so it reads ["color"].

Question 19

The code block displayed below contains multiple errors. The code block should return a DataFrame that contains only columns transactionId, predError, value and storeId of DataFrame

transactionsDf. Find the errors.

Code block:

transactionsDf.select([col(productId), col(f)])

Sample of transactionsDf:

1.+-------------+---------+-----+-------+---------+----+

2.|transactionId|predError|value|storeId|productId| f|

3.+-------------+---------+-----+-------+---------+----+

4.| 1| 3| 4| 25| 1|null|

5.| 2| 6| 7| 2| 2|null|

6.| 3| 3| null| 25| 3|null|

7.+-------------+---------+-----+-------+---------+----+

Options:

A.

The column names should be listed directly as arguments to the operator and not as a list.

B.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed

as strings without being wrapped in a col() operator.

C.

The select operator should be replaced by a drop operator.

D.

The column names should be listed directly as arguments to the operator and not as a list and following the pattern of how column names are expressed in the code block, columns productId and

f should be replaced by transactionId, predError, value and storeId.

E.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.

Question 20

Which of the following code blocks returns a DataFrame that has all columns of DataFrame transactionsDf and an additional column predErrorSquared which is the squared value of column

predError in DataFrame transactionsDf?

Options:

A.

transactionsDf.withColumn("predError", pow(col("predErrorSquared"), 2))

B.

transactionsDf.withColumnRenamed("predErrorSquared", pow(predError, 2))

C.

transactionsDf.withColumn("predErrorSquared", pow(col("predError"), lit(2)))

D.

transactionsDf.withColumn("predErrorSquared", pow(predError, lit(2)))

E.

transactionsDf.withColumn("predErrorSquared", "predError"**2)