Databricks Certified Machine Learning Professional Questions and Answers

Question 1

A data scientist is using MLflow to track their machine learning experiment. As a part of each MLflow run, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values.

They are using the following code block:

The code block is not nesting the runs in MLflow as they expected.

Which of the following changes does the data scientist need to make to the above code block so that it successfully nests the child runs under the parent run in MLflow?

Options:

A.

Indent the child run blocks within the parent run block

B.

Add the nested=True argument to the parent run

C.

Remove the nested=True argument from the child runs

D.

Provide the same name to the run name parameter for all three run blocks

E.

Add the nested=True argument to the parent run and remove the nested=True arguments from the child runs

Question 2

Which of the following MLflow operations can be used to automatically calculate and log a Shapley feature importance plot?

Options:

A.

mlflow.shap.log_explanation

B.

None of these operations can accomplish the task.

C.

mlflow.shap

D.

mlflow.log_figure

E.

client.log_artifact

Question 3

Which of the following is a benefit of logging a model signature with an MLflow model?

Options:

A.

The model will have a unique identifier in the MLflow experiment

B.

The schema of input data can be validated when serving models

C.

The model can be deployed using real-time serving tools

D.

The model will be secured by the user that developed it

E.

The schema of input data will be converted to match the signature
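A model signature records the input schema a model expects, so a serving layer can reject malformed requests before they reach the model. The plain-Python sketch below illustrates only that validation idea. `SIGNATURE` and `validate_input` are hypothetical names, not MLflow's API, and MLflow's own schema enforcement covers many more types and rules.

```python
# Hypothetical signature: column name -> expected Python type (not MLflow's format).
SIGNATURE = {"age": int, "income": float, "region": str}


def validate_input(record: dict) -> None:
    """Raise ValueError if `record` does not match the hypothetical SIGNATURE."""
    missing = [col for col in SIGNATURE if col not in record]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for col, expected in SIGNATURE.items():
        if not isinstance(record[col], expected):
            raise ValueError(
                f"column {col!r}: expected {expected.__name__}, "
                f"got {type(record[col]).__name__}"
            )


validate_input({"age": 42, "income": 55000.0, "region": "EU"})  # passes silently
```

A request such as `{"age": "42", ...}` would be rejected with an error naming the offending column instead of being passed to the model.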

Question 4

A data scientist has computed updated feature values for all primary key values stored in the Feature Store table features. In addition, feature values for some new primary key values have also been computed. The updated feature values are stored in the DataFrame features_df. They want to replace all data in features with the newly computed data.

Which of the following code blocks can they use to perform this task using the Feature Store Client fs?

A)

B)

C)

D)

E)

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

Question 5

Which of the following MLflow Model Registry use cases requires the use of an HTTP Webhook?

Options:

A.

Starting a testing job when a new model is registered

B.

Updating data in a source table for a Databricks SQL dashboard when a model version transitions to the Production stage

C.

Sending an email alert when an automated testing Job fails

D.

None of these use cases require the use of an HTTP Webhook

E.

Sending a message to a Slack channel when a model version transitions stages

Question 6

A machine learning engineer wants to log and deploy a model as an MLflow pyfunc model. They have custom preprocessing that needs to be completed on feature variables prior to fitting the model or computing predictions using that model. They decide to wrap this preprocessing in a custom model class ModelWithPreprocess, where the preprocessing is performed when calling fit and when calling predict. They then log the fitted model of the ModelWithPreprocess class as a pyfunc model.

Which of the following is a benefit of this approach when loading the logged pyfunc model for downstream deployment?

Options:

A.

The pyfunc model can be used to deploy models in a parallelizable fashion

B.

The same preprocessing logic will automatically be applied when calling fit

C.

The same preprocessing logic will automatically be applied when calling predict

D.

This approach has no impact when loading the logged pyfunc model for downstream deployment

E.

There is no longer a need for pipeline-like machine learning objects
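The wrapper pattern described in this question can be sketched in plain Python. `ModelWithPreprocess` here is a hypothetical toy (standardization followed by a threshold classifier). In MLflow the class would typically subclass `mlflow.pyfunc.PythonModel`; that part is left out so the sketch runs without MLflow installed. The point is that `predict` calls the same `_preprocess` step that `fit` used, so callers never have to reapply it by hand.

```python
import statistics


class ModelWithPreprocess:
    """Hypothetical toy model: preprocessing runs inside both fit and predict."""

    def __init__(self):
        self.mean = None
        self.stdev = None
        self.threshold = None

    def _preprocess(self, xs):
        # Standardize with statistics learned during fit.
        return [(x - self.mean) / self.stdev for x in xs]

    def fit(self, xs, ys):
        self.mean = statistics.mean(xs)
        self.stdev = statistics.stdev(xs)
        z = self._preprocess(xs)
        # Toy classifier: threshold halfway between the class means of standardized x.
        pos = [zi for zi, y in zip(z, ys) if y == 1]
        neg = [zi for zi, y in zip(z, ys) if y == 0]
        self.threshold = (statistics.mean(pos) + statistics.mean(neg)) / 2
        return self

    def predict(self, xs):
        # The same preprocessing is applied automatically before scoring.
        return [int(zi > self.threshold) for zi in self._preprocess(xs)]


model = ModelWithPreprocess().fit([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
print(model.predict([2, 11]))  # → [0, 1]
```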

Question 7

Which of the following lists all of the model stages that are available in the MLflow Model Registry?

Options:

A.

Development, Staging, Production

B.

None, Staging, Production

C.

Staging, Production, Archived

D.

None, Staging, Production, Archived

E.

Development, Staging, Production, Archived

Question 8

Which of the following describes concept drift?

Options:

A.

Concept drift is when there is a change in the distribution of an input variable

B.

Concept drift is when there is a change in the distribution of a target variable

C.

Concept drift is when there is a change in the relationship between input variables and target variables

D.

Concept drift is when there is a change in the distribution of the predicted target given by the model

E.

None of these describe Concept drift
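Concept drift means the mapping from inputs to target changes even if the inputs themselves look the same. The toy simulation below keeps x drawn from Uniform(0, 10) in both periods while the relationship flips from y = 2x to y = -x. The slopes, sample size, and seeds are arbitrary illustrative choices, and `make_batch`, `fit_slope`, and `mae` are hypothetical helpers.

```python
import random
import statistics


def make_batch(n, slope, seed):
    """x is always drawn from Uniform(0, 10); y depends on x through `slope`."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return xs, [slope * x for x in xs]


def fit_slope(xs, ys):
    """Least-squares slope of a line through the origin: sum(xy) / sum(x^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mae(slope, xs, ys):
    """Mean absolute error of predictions slope * x."""
    return statistics.mean(abs(slope * x - y) for x, y in zip(xs, ys))


x_old, y_old = make_batch(1000, slope=2.0, seed=0)   # training period: y = 2x
x_new, y_new = make_batch(1000, slope=-1.0, seed=1)  # later period: y = -x

model = fit_slope(x_old, y_old)
print(f"input mean old/new: {statistics.mean(x_old):.2f} / {statistics.mean(x_new):.2f}")
print(f"MAE old: {mae(model, x_old, y_old):.3f}, MAE new: {mae(model, x_new, y_new):.3f}")
```

The input means stay close, so a check on input distributions alone would not flag anything, yet the error of the old model jumps on the new period.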

Question 9

Which of the following tools can assist in real-time deployments by packaging software with its own application, tools, and libraries?

Options:

A.

Cloud-based compute

B.

None of these tools

C.

REST APIs

D.

Containers

E.

Autoscaling clusters

Question 10

A data scientist has written a function to track the runs of their random forest model. The data scientist is changing the number of trees in the forest across each run.

Which of the following MLflow operations is designed to log single values like the number of trees in a random forest?

Options:

A.

mlflow.log_artifact

B.

mlflow.log_model

C.

mlflow.log_metric

D.

mlflow.log_param

E.

There is no way to store values like this.

Question 11

A machine learning engineer and data scientist are working together to convert a batch deployment to an always-on streaming deployment. The machine learning engineer has expressed that rigorous data tests must be put in place as a part of their conversion to account for potential changes in data formats.

Which of the following describes why these data type tests and checks are particularly important for streaming deployments?

Options:

A.

Because the streaming deployment is always on, all types of data must be handled without producing an error

B.

All of these statements

C.

Because the streaming deployment is always on, there is no practitioner to debug poor model performance

D.

Because the streaming deployment is always on, there is a need to confirm that the deployment can autoscale

E.

None of these statements
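In an always-on stream, one malformed record should not crash the whole pipeline. A common pattern is to check each record against the expected types and divert failures to a quarantine location for later inspection instead of raising. The plain-Python sketch below shows the idea; `EXPECTED_TYPES` and `split_valid` are hypothetical names, and in a Spark Structured Streaming job the same checks would usually be expressed as DataFrame filters.

```python
from typing import Any

# Hypothetical expected schema for incoming records.
EXPECTED_TYPES = {"customer_id": int, "amount": float}


def split_valid(records: list[Any]) -> tuple[list[dict], list[Any]]:
    """Separate well-formed records from malformed ones without raising."""
    valid, quarantined = [], []
    for rec in records:
        ok = isinstance(rec, dict) and all(
            isinstance(rec.get(col), typ) for col, typ in EXPECTED_TYPES.items()
        )
        (valid if ok else quarantined).append(rec)
    return valid, quarantined


batch = [
    {"customer_id": 1, "amount": 9.5},    # well-formed
    {"customer_id": "2", "amount": 3.0},  # wrong type for customer_id
    None,                                 # not a record at all
    {"customer_id": 3},                   # missing amount
]
valid, quarantined = split_valid(batch)
print(len(valid), len(quarantined))  # → 1 3
```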

Question 12

A machine learning engineer has developed a model and registered it using the FeatureStoreClient fs. The model has model URI model_uri. The engineer now needs to perform batch inference on customer-level Spark DataFrame spark_df, but it is missing a few of the static features that were used when training the model. The customer_id column is the primary key of spark_df and the training set used when training and logging the model.

Which of the following code blocks can be used to compute predictions for spark_df when the missing feature values can be found in the Feature Store by searching for features by customer_id?

Options:

A.

df = fs.get_missing_features(spark_df, model_uri)

fs.score_model(model_uri, df)

B.

fs.score_model(model_uri, spark_df)

C.

df = fs.get_missing_features(spark_df, model_uri)

fs.score_batch(model_uri, df)

D.

df = fs.get_missing_features(spark_df)

fs.score_batch(model_uri, df)

E.

fs.score_batch(model_uri, spark_df)

Question 13

A data scientist wants to remove the star_rating column from the Delta table at the location path. To do this, they need to load in data and drop the star_rating column.

Which of the following code blocks accomplishes this task?

Options:

A.

spark.read.format("delta").load(path).drop("star_rating")

B.

spark.read.format("delta").table(path).drop("star_rating")

C.

Delta tables cannot be modified

D.

spark.read.table(path).drop("star_rating")

E.

spark.sql("SELECT * EXCEPT star_rating FROM path")

Question 14

Which of the following machine learning model deployment paradigms is the most common for machine learning projects?

Options:

A.

On-device

B.

Streaming

C.

Real-time

D.

Batch

E.

None of these deployments

Question 15

A machine learning engineering team wants to build a continuous pipeline for data preparation of a machine learning application. The team would like the data to be fully processed and made ready for inference in a series of equal-sized batches.

Which of the following tools can be used to provide this type of continuous processing?

Options:

A.

Spark UDFs

B.

Structured Streaming

C.

MLflow

D.

Delta Lake

E.

AutoML
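Structured Streaming treats incoming data as an unbounded table and, by default, processes it as a sequence of micro-batches; source options such as `maxFilesPerTrigger` (file sources) or `maxOffsetsPerTrigger` (Kafka) can cap how much data each batch takes. The plain-Python sketch below only illustrates the micro-batch idea of cutting a continuous stream into fixed-size chunks. It is not how Spark implements it, and `micro_batches` is a hypothetical helper.

```python
from itertools import count, islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive fixed-size batches from a (possibly unbounded) stream."""
    it = iter(stream)
    while batch := list(islice(it, batch_size)):
        yield batch


# An unbounded source: process the first three micro-batches of size 4.
batches = micro_batches(count(), 4)
for _ in range(3):
    print(next(batches))
```

On a finite source the last batch can be smaller than `batch_size`, because it holds whatever data remains.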

Question 16

A data scientist has developed and logged a scikit-learn random forest model model, and then they ended their Spark session and terminated their cluster. After starting a new cluster, they want to review the feature_importances_ of the original model object.

Which of the following lines of code can be used to restore the model object so that feature_importances_ is available?

Options:

A.

mlflow.load_model(model_uri)

B.

client.list_artifacts(run_id)["feature-importances.csv"]

C.

mlflow.sklearn.load_model(model_uri)

D.

This can only be viewed in the MLflow Experiments UI

E.

client.pyfunc.load_model(model_uri)

Question 17

A machine learning engineer wants to log feature importance data from a CSV file at path importance_path with an MLflow run for model model.

Which of the following code blocks will accomplish this task inside of an existing MLflow run block?

Options:

A.

B.

C.

mlflow.log_data(importance_path, "feature-importance.csv")

D.

mlflow.log_artifact(importance_path, "feature-importance.csv")

E.

None of these code blocks can accomplish the task.

Question 18

A machine learning engineer is attempting to create a webhook that will trigger a Databricks Job job_id when a model version for model model transitions into any MLflow Model Registry stage.

They have the following incomplete code block:

Which of the following lines of code can be used to fill in the blank so that the code block accomplishes the task?

Options:

A.

"MODEL_VERSION_CREATED"

B.

"MODEL_VERSION_TRANSITIONED_TO_PRODUCTION"

C.

"MODEL_VERSION_TRANSITIONED_TO_STAGING"

D.

"MODEL_VERSION_TRANSITIONED_STAGE"

E.

"MODEL_VERSION_TRANSITIONED_TO_STAGING", "MODEL_VERSION_TRANSITIONED_TO_PRODUCTION"
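The difference between the generic stage-transition event and the stage-specific events can be modeled in a few lines. The sketch below is a simplified, hypothetical model, not the Databricks webhooks client: it assumes `MODEL_VERSION_TRANSITIONED_STAGE` fires on every transition and each stage-specific event fires only for its own target stage. `events_for_transition` and `webhook_triggers` are illustrative names.

```python
# Simplified, hypothetical model of which registry webhook events fire on a transition.
STAGE_SPECIFIC = {
    "Staging": "MODEL_VERSION_TRANSITIONED_TO_STAGING",
    "Production": "MODEL_VERSION_TRANSITIONED_TO_PRODUCTION",
    "Archived": "MODEL_VERSION_TRANSITIONED_TO_ARCHIVED",
}


def events_for_transition(to_stage: str) -> set[str]:
    """Events assumed to fire when a model version moves to `to_stage`."""
    events = {"MODEL_VERSION_TRANSITIONED_STAGE"}  # generic: every transition
    if to_stage in STAGE_SPECIFIC:
        events.add(STAGE_SPECIFIC[to_stage])
    return events


def webhook_triggers(subscribed: set[str], to_stage: str) -> bool:
    """True if a webhook subscribed to `subscribed` fires for this transition."""
    return bool(subscribed & events_for_transition(to_stage))


for stage in ["None", "Staging", "Production", "Archived"]:
    print(stage, webhook_triggers({"MODEL_VERSION_TRANSITIONED_STAGE"}, stage))
```

Under these assumptions, subscribing only to the Staging and Production events would miss transitions to Archived or None, while the generic event covers all of them.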