
Free Access Amazon Web Services MLS-C01 New Release

Page: 22 / 23
Total 322 questions

AWS Certified Machine Learning - Specialty Questions and Answers

Question 85

A Machine Learning Specialist is building a logistic regression model that will predict whether or not a person will order a pizza. The Specialist is trying to build the optimal model with an ideal classification threshold.

What model evaluation technique should the Specialist use to understand how different classification thresholds will impact the model's performance?

Options:

A.

Receiver operating characteristic (ROC) curve

B.

Misclassification rate

C.

Root Mean Square Error (RMSE)

D.

L1 norm
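
To make the idea of a classification threshold concrete, the sketch below sweeps every distinct predicted score as a cutoff and reports the resulting false positive rate (FPR) and true positive rate (TPR). These (FPR, TPR) pairs are the points that a receiver operating characteristic curve plots. The labels, scores, and the helper name `roc_points` are hypothetical and chosen only for illustration; they do not come from the question.

```python
def roc_points(labels, scores):
    """Return (threshold, FPR, TPR) for each distinct score used as a cutoff.

    labels: 1 for the positive class (orders a pizza), 0 otherwise.
    scores: predicted probabilities from a classifier.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # Predict positive whenever the score reaches the threshold.
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


# Hypothetical labels and scores, not taken from the question.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
for threshold, fpr, tpr in points:
    print(f"threshold={threshold:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold never decreases TPR or FPR, so the sweep traces a path from the lower left toward the upper right of the FPR/TPR plane.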

Question 86

A machine learning (ML) specialist needs to extract embedding vectors from a text series. The goal is to provide a ready-to-ingest feature space for a data scientist to develop downstream ML predictive models. The text consists of curated sentences in English. Many sentences use similar words but in different contexts. There are questions and answers among the sentences, and the embedding space must differentiate between them.

Which options can produce the required embedding vectors that capture word context and sequential QA information? (Choose two.)

Options:

A.

Amazon SageMaker seq2seq algorithm

B.

Amazon SageMaker BlazingText algorithm in Skip-gram mode

C.

Amazon SageMaker Object2Vec algorithm

D.

Amazon SageMaker BlazingText algorithm in continuous bag-of-words (CBOW) mode

E.

Combination of the Amazon SageMaker BlazingText algorithm in Batch Skip-gram mode with a custom recurrent neural network (RNN)

Question 87

A data scientist is training a text classification model by using the Amazon SageMaker built-in BlazingText algorithm. There are 5 classes in the dataset, with 300 samples for category A, 292 samples for category B, 240 samples for category C, 258 samples for category D, and 310 samples for category E.

The data scientist shuffles the data and splits off 10% for testing. After training the model, the data scientist generates confusion matrices for the training and test sets.

What could the data scientist conclude from these results?

Options:

A.

Classes C and D are too similar.

B.

The dataset is too small for holdout cross-validation.

C.

The data distribution is skewed.

D.

The model is overfitting for classes B and E.
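
The class counts and the 10% split stated in the question can be worked through numerically. The sketch below uses only those stated figures; the variable names are illustrative, and the per-class test counts are expected values under a uniform random shuffle, not observed results.

```python
# Class counts as stated in the question.
counts = {"A": 300, "B": 292, "C": 240, "D": 258, "E": 310}

total = sum(counts.values())      # 1400 samples overall
test_size = total // 10           # 10% holdout: 140 samples
train_size = total - test_size    # 1260 samples left for training

# Share of each class in the full dataset.
shares = {label: n / total for label, n in counts.items()}

# Expected number of test samples per class after a random shuffle.
expected_test = {label: n * test_size / total for label, n in counts.items()}

# Ratio between the largest and smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())

for label in counts:
    print(f"{label}: share={shares[label]:.1%}  "
          f"expected test samples={expected_test[label]:.1f}")
print(f"total={total}  train={train_size}  test={test_size}  "
      f"largest/smallest={imbalance_ratio:.2f}")
```

Each class accounts for roughly 17% to 22% of the data, and the holdout set contains on the order of 24 to 31 samples per class.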

Question 88

A network security vendor needs to ingest telemetry data from thousands of endpoints that run all over the world. The data is transmitted every 30 seconds in the form of records that contain 50 fields. Each record is up to 1 KB in size. The security vendor uses Amazon Kinesis Data Streams to ingest the data. The vendor requires hourly summaries of the records that Kinesis Data Streams ingests. The vendor will use Amazon Athena to query the records and to generate the summaries. The Athena queries will target 7 to 12 of the available data fields.

Which solution will meet these requirements with the LEAST amount of customization to transform and store the ingested data?

Options:

A.

Use AWS Lambda to read and aggregate the data hourly. Transform the data and store it in Amazon S3 by using Amazon Kinesis Data Firehose.

B.

Use Amazon Kinesis Data Firehose to read and aggregate the data hourly. Transform the data and store it in Amazon S3 by using a short-lived Amazon EMR cluster.

C.

Use Amazon Kinesis Data Analytics to read and aggregate the data hourly. Transform the data and store it in Amazon S3 by using Amazon Kinesis Data Firehose.

D.

Use Amazon Kinesis Data Firehose to read and aggregate the data hourly. Transform the data and store it in Amazon S3 by using AWS Lambda.
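
As a rough illustration of what "hourly summaries" means here: at one record every 30 seconds, each endpoint produces 3600 / 30 = 120 records per hour, or up to about 120 KB per endpoint per hour. The sketch below groups records into one-hour tumbling windows in plain Python and sums a subset of fields. It is not an AWS API example; the field names (`epoch_seconds`, `bytes_in`, `bytes_out`) and the helper `hourly_summary` are hypothetical, and the question does not specify which aggregates are needed.

```python
from collections import defaultdict
from datetime import datetime, timezone

RECORDS_PER_ENDPOINT_PER_HOUR = 3600 // 30  # one record every 30 seconds


def hourly_summary(records, fields):
    """Group records into one-hour tumbling windows and sum the chosen fields."""
    windows = defaultdict(lambda: {"count": 0, **{f: 0 for f in fields}})
    for record in records:
        ts = datetime.fromtimestamp(record["epoch_seconds"], tz=timezone.utc)
        # Truncate the timestamp to the start of its hour.
        window_start = ts.replace(minute=0, second=0, microsecond=0)
        bucket = windows[window_start]
        bucket["count"] += 1
        for f in fields:
            bucket[f] += record[f]
    return dict(windows)


# Hypothetical records; real records would carry 50 fields.
records = [
    {"epoch_seconds": 0, "bytes_in": 100, "bytes_out": 10},
    {"epoch_seconds": 30, "bytes_in": 200, "bytes_out": 20},
    {"epoch_seconds": 3600, "bytes_in": 50, "bytes_out": 5},
]

summary = hourly_summary(records, ["bytes_in", "bytes_out"])
for window_start, stats in sorted(summary.items()):
    print(window_start.isoformat(), stats)
```

Only the fields passed in `fields` are read for aggregation, which mirrors the requirement that queries target 7 to 12 of the 50 available fields.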
