Pass Databricks-Generative-AI-Engineer-Associate Exam Guide

Databricks Certified Generative AI Engineer Associate Questions and Answers

Question 5

After changing the response generating LLM in a RAG pipeline from GPT-4 to a model with a shorter context length that the company self-hosts, the Generative AI Engineer is getting the following error:

What TWO solutions should the Generative AI Engineer implement without changing the response generating model? (Choose two.)

Options:

Use a smaller embedding model to generate

Reduce the maximum output tokens of the new model

Decrease the chunk size of embedded documents

Reduce the number of records retrieved from the vector database

Retrain the response generating model using ALiBi

Question 6

A Generative AI Engineer is building a RAG application that answers questions about internal documents for the company SnoPen AI.

The source documents may contain a significant amount of irrelevant content, such as advertisements, sports news, or entertainment news, or content about other companies.

Which approach is advisable when building a RAG application to achieve this goal of filtering irrelevant information?

Options:

Keep all articles because the RAG application needs to understand non-company content to avoid answering questions about them.

Include in the system prompt that any information it sees will be about SnoPen AI, even if no data filtering is performed.

Include in the system prompt that the application is not supposed to answer any questions unrelated to SnoPen AI.

Consolidate all SnoPen AI related documents into a single chunk in the vector database.

Question 7

Which indicator should be considered to evaluate the safety of the LLM outputs when qualitatively assessing LLM responses for a translation use case?

Options:

The ability to generate responses in code

The similarity to the previous language

The latency of the response and the length of text generated

The accuracy and relevance of the responses

Question 8

A Generative AI Engineer is deciding between using LSH (Locality Sensitive Hashing) and HNSW (Hierarchical Navigable Small World) for indexing their vector database. Their top priority is semantic accuracy.

Which approach should the Generative AI Engineer use to evaluate these two techniques?

Options:

Compare the cosine similarities of the embeddings of returned results against those of a representative sample of test inputs

Compare the Bilingual Evaluation Understudy (BLEU) scores of returned results for a representative sample of test inputs

Compare the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores of returned results for a representative sample of test inputs

Compare the Levenshtein distances of returned results against a representative sample of test inputs

Answer: A (cosine similarity)

Explanation:

The task is to choose between LSH and HNSW for a vector database index, prioritizing semantic accuracy. The evaluation must assess how well each method retrieves semantically relevant results. Let’s evaluate the options.

Option A: Compare the cosine similarities of the embeddings of returned results against those of a representative sample of test inputs

Cosine similarity measures how closely two embedding vectors point in the same direction, which serves as a proxy for semantic closeness, so it directly assesses retrieval accuracy in a vector database. Comparing the embeddings of the returned results with the embeddings of the test inputs shows how well LSH or HNSW preserves semantic relationships, which aligns with the stated priority.

Databricks Reference: "Cosine similarity is a standard metric for evaluating vector search accuracy" ("Databricks Vector Search Documentation," 2023).
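
As a rough illustration of the comparison described for Option A, the sketch below averages the cosine similarity between a query embedding and the embeddings each index returns. The vectors are made-up toy data, and the names mean_result_similarity, results_index_a, and results_index_b are hypothetical; a real evaluation would use actual embeddings and the results returned by the LSH and HNSW indexes for a representative set of test inputs.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_result_similarity(query_vec, result_vecs):
    """Average cosine similarity between a query and its returned results."""
    sims = [cosine_similarity(query_vec, r) for r in result_vecs]
    return sum(sims) / len(sims)

# Toy 2-D embeddings (assumed for illustration only).
query = [1.0, 0.0]
results_index_a = [[1.0, 0.0], [1.0, 1.0]]  # results from a hypothetical index A
results_index_b = [[0.0, 1.0], [1.0, 1.0]]  # results from a hypothetical index B

score_a = mean_result_similarity(query, results_index_a)
score_b = mean_result_similarity(query, results_index_b)

# The index whose results sit closer to the query scores higher.
print(f"index A: {score_a:.4f}, index B: {score_b:.4f}")
```

Averaging this score over many test queries gives one number per index, so the two indexing techniques can be compared on the same inputs.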

Option B: Compare the Bilingual Evaluation Understudy (BLEU) scores of returned results for a representative sample of test inputs

BLEU evaluates text generation (e.g., translations), not vector retrieval accuracy. It’s irrelevant for indexing performance.

Databricks Reference: "BLEU applies to generative tasks, not retrieval" ("Generative AI Cookbook").

Option C: Compare the Recall-Oriented-Understudy for Gisting Evaluation (ROUGE) scores of returned results for a representative sample of test inputs

ROUGE is for summarization evaluation, not vector search. It doesn’t measure semantic accuracy in retrieval.

Databricks Reference: "ROUGE is unsuited for vector database evaluation" ("Building LLM Applications with Databricks").

Option D: Compare the Levenshtein distances of returned results against a representative sample of test inputs

Levenshtein distance measures string edit distance, not semantic similarity in embeddings. It’s inappropriate for vector-based retrieval.

Databricks Reference: No specific support for Levenshtein in vector search contexts.
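
To make the point about Option D concrete, here is a minimal sketch of Levenshtein distance using the standard dynamic-programming recurrence. The word pairs are illustrative assumptions, chosen to show that edit distance tracks spelling rather than meaning.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string s into string t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,         # delete cs
                curr[j - 1] + 1,     # insert ct
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]

# "car" and "cat" differ by one letter but mean different things,
# while "car" and "automobile" are synonyms that share few characters.
near_spelling = levenshtein("car", "cat")
near_meaning = levenshtein("car", "automobile")
print(near_spelling, near_meaning)
```

The synonym pair ends up with the larger edit distance, which is why a string metric like this says little about whether a vector index returned semantically relevant results.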

Conclusion: Option A (cosine similarity) is the correct approach, directly evaluating semantic accuracy in vector retrieval, as recommended by Databricks for Vector Search assessments.
