Latest Databricks Databricks-Generative-AI-Engineer-Associate Dumps PDF Questions Answers 2025

Databricks Certified Generative AI Engineer Associate Questions and Answers

Question 1

A Generative Al Engineer is developing a RAG application and would like to experiment with different embedding models to improve the application performance.

Which strategy for picking an embedding model should they choose?

Options:

Pick an embedding model trained on related domain knowledge

Pick the most recent and most performant open LLM released at the time

pick the embedding model ranked highest on the Massive Text Embedding Benchmark (MTEB) leaderboard hosted by HuggingFace

Pick an embedding model with multilingual support to support potential multilingual user questions

Buy Now

Answer:

Explanation:

The task involves improving a Retrieval-Augmented Generation (RAG) application’s performance by experimenting with embedding models. The choice of embedding model impacts retrieval accuracy,which is critical for RAG systems. Let’s evaluate the options based on Databricks Generative AI Engineer best practices.

Option A: Pick an embedding model trained on related domain knowledge

Embedding models trained on domain-specific data (e.g., industry-specific corpora) produce vectors that better capture the semantics of the application’s context, improving retrieval relevance. For RAG, this is a key strategy to enhance performance.

Databricks Reference:"For optimal retrieval in RAG systems, select embedding models aligned with the domain of your data"("Building LLM Applications with Databricks," 2023).

Option B: Pick the most recent and most performant open LLM released at the time

LLMs are not embedding models; they generate text, not embeddings for retrieval. While recent LLMs may be performant for generation, this doesn’t address the embedding step in RAG. This option misunderstands the component being selected.

Databricks Reference: Embedding models and LLMs are distinct in RAG workflows:"Embedding models convert text to vectors, while LLMs generate responses"("Generative AI Cookbook").

Option C: Pick the embedding model ranked highest on the Massive Text Embedding Benchmark (MTEB) leaderboard hosted by HuggingFace

The MTEB leaderboard ranks models across general tasks, but high overall performance doesn’t guarantee suitability for a specific domain. A top-ranked model might excel in generic contexts but underperform on the engineer’s unique data.

Databricks Reference: General performance is less critical than domain fit:"Benchmark rankings provide a starting point, but domain-specific evaluation is recommended"("Databricks Generative AI Engineer Guide").

Option D: Pick an embedding model with multilingual support to support potential multilingual user questions

Multilingual support is useful only if the application explicitly requires it. Without evidence of multilingual needs, this adds complexity without guaranteed performance gains for the current use case.

Databricks Reference:"Choose features like multilingual support based on application requirements"("Building LLM-Powered Applications").

Conclusion: Option A is the best strategy because it prioritizes domain relevance, directly improving retrieval accuracy in a RAG system—aligning with Databricks’ emphasis on tailoring models to specific use cases.

Question 2

A Generative Al Engineer is creating an LLM-based application. The documents for its retriever have been chunked to a maximum of 512 tokens each. The Generative Al Engineer knows that cost and latency are more important than quality for this application. They have several context length levels to choose from.

Which will fulfill their need?

Options:

context length 514; smallest model is 0.44GB and embedding dimension 768

context length 2048: smallest model is 11GB and embedding dimension 2560

context length 32768: smallest model is 14GB and embedding dimension 4096

context length 512: smallest model is 0.13GB and embedding dimension 384

Question 3

A Generative AI Engineer has been asked to build an LLM-based question-answering application. The application should take into account new documents that are frequently published. The engineer wants to build this application with the least cost and least development effort and have it operate at the lowest cost possible.

Which combination of chaining components and configuration meets these requirements?

Options:

For the application a prompt, a retriever, and an LLM are required. The retriever output is inserted into the prompt which is given to the LLM to generate answers.

The LLM needs to be frequently with the new documents in order to provide most up-to-date answers.

For the question-answering application, prompt engineering and an LLM are required to generate answers.

For the application a prompt, an agent and a fine-tuned LLM are required. The agent is used by the LLM to retrieve relevant content that is inserted into the prompt which is given to the LLM to generate answers.

Question 4

A Generative AI Engineer developed an LLM application using the provisioned throughput Foundation Model API. Now that the application is ready to be deployed, they realize their volume of requests are not sufficiently high enough to create their own provisioned throughput endpoint. They want to choose a strategy that ensures the best cost-effectiveness for their application.

What strategy should the Generative AI Engineer use?

Options:

Switch to using External Models instead

Deploy the model using pay-per-token throughput as it comes with cost guarantees

Change to a model with a fewer number of parameters in order to reduce hardware constraint issues

Throttle the incoming batch of requests manually to avoid rate limiting issues

Question 5

After changing the response generating LLM in a RAG pipeline from GPT-4 to a model with a shorter context length that the company self-hosts, the Generative AI Engineer is getting the following error:

What TWO solutions should the Generative AI Engineer implement without changing the response generating model? (Choose two.)

Options:

Use a smaller embedding model to generate

Reduce the maximum output tokens of the new model

Decrease the chunk size of embedded documents

Reduce the number of records retrieved from the vector database

Retrain the response generating model using ALiBi

Question 6

A Generative Al Engineer is building a RAG application that answers questions about internal documents for the company SnoPen AI.

The source documents may contain a significant amount of irrelevant content, such as advertisements, sports news, or entertainment news, or content about other companies.

Which approach is advisable when building a RAG application to achieve this goal of filtering irrelevant information?

Options:

Keep all articles because the RAG application needs to understand non-company content to avoid answering questions about them.

Include in the system prompt that any information it sees will be about SnoPenAI, even if no data filtering is performed.

Include in the system prompt that the application is not supposed to answer any questions unrelated to SnoPen Al.

Consolidate all SnoPen AI related documents into a single chunk in the vector database.

Question 7

Which indicator should be considered to evaluate the safety of the LLM outputs when qualitatively assessing LLM responses for a translation use case?

Options:

The ability to generate responses in code

The similarity to the previous language

The latency of the response and the length of text generated

The accuracy and relevance of the responses

Question 8

A Generative Al Engineer is deciding between using LSH (Locality Sensitive Hashing) and HNSW (Hierarchical Navigable Small World) for indexing their vector database Their top priority is semantic accuracy

Which approach should the Generative Al Engineer use to evaluate these two techniques?

Options:

Compare the cosine similarities of the embeddings of returned results against those of a representative sample of test inputs

Compare the Bilingual Evaluation Understudy (BLEU) scores of returned results for a representative sample of test inputs

Compare the Recall-Onented-Understudy for Gistmg Evaluation (ROUGE) scores of returned results for a representative sample of test inputs

Compare the Levenshtein distances of returned results against a representative sample of test inputs

Answer:

Explanation:

The task is to choose between LSH and HNSW for a vector database index, prioritizing semantic accuracy. The evaluation must assess how well each method retrieves semantically relevant results. Let’s evaluate the options.

Option A: Compare the cosine similarities of the embeddings of returned results against those of a representative sample of test inputs

Cosine similarity measures semantic closeness between vectors, directly assessing retrieval accuracy in a vector database. Comparing returned results’ embeddings to test inputs’ embeddings evaluates how well LSH or HNSW preserves semantic relationships, aligning with the priority.

Databricks Reference:"Cosine similarity is a standard metric for evaluating vector search accuracy"("Databricks Vector Search Documentation," 2023).

Option B: Compare the Bilingual Evaluation Understudy (BLEU) scores of returned results for a representative sample of test inputs

BLEU evaluates text generation (e.g., translations), not vector retrieval accuracy. It’s irrelevant for indexing performance.

Databricks Reference:"BLEU applies to generative tasks, not retrieval"("Generative AI Cookbook").

Option C: Compare the Recall-Oriented-Understudy for Gisting Evaluation (ROUGE) scores of returned results for a representative sample of test inputs

ROUGE is for summarization evaluation, not vector search. It doesn’t measure semantic accuracy in retrieval.

Databricks Reference:"ROUGE is unsuited for vector database evaluation"("Building LLM Applications with Databricks").

Option D: Compare the Levenshtein distances of returned results against a representative sample of test inputs

Levenshtein distance measures string edit distance, not semantic similarity in embeddings. It’s inappropriate for vector-based retrieval.

Databricks Reference: No specific support for Levenshtein in vector search contexts.

Conclusion: Option A (cosine similarity) is the correct approach, directly evaluating semantic accuracy in vector retrieval, as recommended by Databricks for Vector Search assessments.

Question 9

A Generative Al Engineer would like an LLM to generate formatted JSON from emails. This will require parsing and extracting the following information: order ID, date, and sender email. Here’s a sample email:

They will need to write a prompt that will extract the relevant information in JSON format with the highest level of output accuracy.

Which prompt will do that?

Options:

You will receive customer emails and need to extract date, sender email, and order ID. You should return the date, sender email, and order ID information in JSON format.

You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in JSON format.

Here’s an example: {“date”: “April 16, 2024”, “sender_email”: “sarah.lee925@gmail.com”, “order_id”: “RE987D”}

You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in a human-readable format.

You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in JSON format.

Question 10

Generative AI Engineer at an electronics company just deployed a RAG application for customers to ask questions about products that the company carries. However, they received feedback that the RAG response often returns information about an irrelevant product.

What can the engineer do to improve the relevance of the RAG’s response?

Options:

Assess the quality of the retrieved context

Implement caching for frequently asked questions

Use a different LLM to improve the generated response

Use a different semantic similarity search algorithm

Question 11

A Generative Al Engineer has created a RAG application to look up answers to questions about a series of fantasy novels that are being asked on the author’s web forum. The fantasy novel texts are chunked and embedded into a vector store with metadata (page number, chapter number, book title), retrieved with the user’s query, and provided to an LLM for response generation. The Generative AI Engineer used their intuition to pick the chunking strategy and associated configurations but now wants to more methodically choose the best values.

Which TWO strategies should the Generative AI Engineer take to optimize their chunking strategy and parameters? (Choose two.)

Options:

Change embedding models and compare performance.

Add a classifier for user queries that predicts which book will best contain the answer. Use this to filter retrieval.

Choose an appropriate evaluation metric (such as recall or NDCG) and experiment with changes in the chunking strategy, such as splitting chunks by paragraphs or chapters.

Choose the strategy that gives the best performance metric.

Pass known questions and best answers to an LLM and instruct the LLM to provide the best token count. Use a summary statistic (mean, median, etc.) of the best token counts to choose chunk size.

Create an LLM-as-a-judge metric to evaluate how well previous questions are answered by the most appropriate chunk. Optimize the chunking parameters based upon the values of the metric.

Answer:

C, E

Explanation:

To optimize a chunking strategy for a Retrieval-Augmented Generation (RAG) application, the Generative AI Engineer needs a structured approach to evaluating the chunking strategy, ensuring that the chosen configuration retrieves the most relevant information and leads to accurate and coherent LLM responses. Here's whyCandEare the correct strategies:

Strategy C: Evaluation Metrics (Recall, NDCG)

Define an evaluation metric: Common evaluation metrics such as recall, precision, or NDCG (Normalized Discounted Cumulative Gain) measure how well the retrieved chunks match the user's query and the expected response.

Recallmeasures the proportion of relevant information retrieved.

NDCGis often used when you want to account for both the relevance of retrieved chunks and the ranking or order in which they are retrieved.

Experiment with chunking strategies: Adjusting chunking strategies based on text structure (e.g., splitting by paragraph, chapter, or a fixed number of tokens) allows the engineer to experiment with various ways of slicing the text. Some chunks may better align with the user's query than others.

Evaluate performance: By using recall or NDCG, the engineer can methodically test various chunking strategies to identify which one yields the highest performance. This ensures that the chunking method provides the most relevant information when embedding and retrieving data from the vector store.

Strategy E: LLM-as-a-Judge Metric

Use the LLM as an evaluator: After retrieving chunks, the LLM can be used to evaluate the quality of answers based on the chunks provided. This could be framed as a "judge" function, where the LLM compares how well a given chunk answers previous user queries.

Optimize based on the LLM's judgment: By having the LLM assess previous answers and rate their relevance and accuracy, the engineer can collect feedback on how well different chunking configurations perform in real-world scenarios.

This metric could be a qualitative judgment on how closely the retrieved information matches the user's intent.

Tune chunking parameters: Based on the LLM's judgment, the engineer can adjust the chunk size or structure to better align with the LLM's responses, optimizing retrieval for future queries.

By combining these two approaches, the engineer ensures that the chunking strategy is systematically evaluated using both quantitative (recall/NDCG) and qualitative (LLM judgment) methods. This balanced optimization process results in improved retrieval relevance and, consequently, better response generation by the LLM.

Question 12

A Generative AI Engineer is building a Generative AI system that suggests the best matched employee team member to newly scoped projects. The team member is selected from a very large team. The match should be based upon project date availability and how well their employee profile matches the project scope. Both the employee profile and project scope are unstructured text.

How should the Generative Al Engineer architect their system?

Options:

Create a tool for finding available team members given project dates. Embed all project scopes into a vector store, perform a retrieval using team member profiles to find the best team member.

Create a tool for finding team member availability given project dates, and another tool that uses an LLM to extract keywords from project scopes. Iterate through available team members’ profiles and perform keyword matching to find the best available team member.

Create a tool to find available team members given project dates. Create a second tool that can calculate a similarity score for a combination of team member profile and the project scope. Iterate through the team members and rank by best score to select a team member.

Create a tool for finding available team members given project dates. Embed team profiles into a vector store and use the project scope and filtering to perform retrieval to find the available best matched team members.

Answer:

Explanation:

Problem Context: The problem involves matching team members to new projects based on two main factors:

Availability: Ensure the team members are available during the project dates.

Profile-Project Match: Use the employee profiles (unstructured text) to find the best match for a project’s scope (also unstructured text).

The two main inputs are theemployee profilesandproject scopes, both of which are unstructured. This means traditional rule-based systems (e.g., simple keyword matching) would be inefficient, especially when working with large datasets.

Explanation of Options: Let's break down the provided options to understand why D is the most optimal answer.

Option Asuggests embedding project scopes into a vector store and then performing retrieval using team member profiles. While embedding project scopes into a vector store is a valid technique, it skips an important detail: the focus should primarily be on embedding employee profiles because we're matching the profiles to a new project, not the other way around.

Option Binvolves using a large language model (LLM) to extract keywords from the project scope and perform keyword matching on employee profiles. While LLMs can help with keyword extraction, this approach is too simplistic and doesn’t leverage advanced retrieval techniques like vector embeddings, which can handle the nuanced and rich semantics of unstructured data. This approach may miss out on subtle but important similarities.

Option Csuggests calculating a similarity score between each team member's profile and project scope. While this is a good idea, it doesn’t specify how to handle the unstructured nature of data efficiently. Iterating through each member’s profile individually could be computationally expensive in large teams. It also lacks the mention of using a vector store or an efficient retrieval mechanism.

Option Dis the correct approach. Here’s why:

Embedding team profiles into a vector store: Using a vector store allows for efficient similarity searches on unstructured data. Embedding the team member profiles into vectors captures their semantics in a way that is far more flexible than keyword-based matching.

Using project scope for retrieval: Instead of matching keywords, this approach suggests using vector embeddings and similarity search algorithms (e.g., cosine similarity) to find the team members whose profiles most closely align with the project scope.

Filtering based on availability: Once the best-matched candidates are retrieved based on profile similarity, filtering them by availability ensures that the system provides a practically useful result.

This method efficiently handles large-scale datasets by leveragingvector embeddingsandsimilarity searchtechniques, both of which are fundamental tools inGenerative AI engineeringfor handling unstructured text.

Technical References:

Vector embeddings: In this approach, the unstructured text (employee profiles and project scopes) is converted into high-dimensional vectors using pretrained models (e.g., BERT, Sentence-BERT, or custom embeddings). These embeddings capture the semantic meaning of the text, making it easier to perform similarity-based retrieval.

Vector stores: Solutions likeFAISSorMilvusallow storing and retrieving large numbers of vector embeddings quickly. This is critical when working with large teams where querying through individual profiles sequentially would be inefficient.

LLM Integration: Large language models can assist in generating embeddings for both employee profiles and project scopes. They can also assist in fine-tuning similarity measures, ensuring that the retrieval system captures the nuances of the text data.

Filtering: After retrieving the most similar profiles based on the project scope, filtering based on availability ensures that only team members who are free for the project are considered.

This system is scalable, efficient, and makes use of the latest techniques inGenerative AI, such as vector embeddings and semantic search.

Question 13

A Generative AI Engineer is building a RAG application that will rely on context retrieved from source documents that are currently in PDF format. These PDFs can contain both text and images. They want to develop a solution using the least amount of lines of code.

Which Python package should be used to extract the text from the source documents?

Options:

flask

beautifulsoup

unstructured

numpy

Question 14

A Generative Al Engineer wants their (inetuned LLMs in their prod Databncks workspace available for testing in their dev workspace as well. All of their workspaces are Unity Catalog enabled and they are currently logging their models into the Model Registry in MLflow.

What is the most cost-effective and secure option for the Generative Al Engineer to accomplish their gAi?

Options:

Use an external model registry which can be accessed from all workspaces

Setup a script to export the model from prod and import it to dev.

Setup a duplicate training pipeline in dev, so that an identical model is available in dev.

Use MLflow to log the model directly into Unity Catalog, and enable READ access in the dev workspace to the model.

Answer:

Explanation:

The goal is to make fine-tuned LLMs from a production (prod) Databricks workspace available for testing in a development (dev) workspace, leveraging Unity Catalog and MLflow, while ensuring cost-effectiveness and security. Let’s analyze the options.

Option A: Use an external model registry which can be accessed from all workspaces

An external registry adds cost (e.g., hosting fees) and complexity (e.g., integration, security configurations) outside Databricks’ native ecosystem, reducing security compared to Unity Catalog’s governance.

Databricks Reference:"Unity Catalog provides a centralized, secure model registry within Databricks"("Unity Catalog Documentation," 2023).

Option B: Setup a script to export the model from prod and import it to dev

Export/import scripts require manual effort, storage for model artifacts, and repeated execution, increasing operational cost and risk (e.g., version mismatches, unsecured transfers). It’s less efficient than a native solution.

Databricks Reference: Manual processes are discouraged when Unity Catalog offers built-in sharing:"Avoid redundant workflows with Unity Catalog’s cross-workspace access"("MLflow with Unity Catalog").

Option C: Setup a duplicate training pipeline in dev, so that an identical model is available in dev

Duplicating the training pipeline doubles compute and storage costs, as it retrains the model from scratch. It’s neither cost-effective nor necessary when the prod model can be reused securely.

Databricks Reference:"Re-running training is resource-intensive; leverage existing models where possible"("Generative AI Engineer Guide").

Option D: Use MLflow to log the model directly into Unity Catalog, and enable READ access in the dev workspace to the model

Unity Catalog, integrated with MLflow, allows models logged in prod to be centrally managed and accessed across workspaces with fine-grained permissions (e.g., READ for dev). This is cost-effective (no extra infrastructure or retraining) and secure (governed by Databricks’ access controls).

Databricks Reference:"Log models to Unity Catalog via MLflow, then grant access to other workspaces securely"("MLflow Model Registry with Unity Catalog," 2023).

Conclusion: Option D leverages Databricks’ native tools (MLflow and Unity Catalog) for a seamless, cost-effective, and secure solution, avoiding external systems, manual scripts, or redundant training.

Question 15

A Generative Al Engineer needs to design an LLM pipeline to conduct multi-stage reasoning that leverages external tools. To be effective at this, the LLM will need to plan and adapt actions while performing complex reasoning tasks.

Which approach will do this?

Options:

Tram the LLM to generate a single, comprehensive response without interacting with any external tools, relying solely on its pre-trained knowledge.

Implement a framework like ReAct which allows the LLM to generate reasoning traces and perform task-specific actions that leverage external tools if necessary.

Encourage the LLM to make multiple API calls in sequence without planning or structuring the calls, allowing the LLM to decide when and how to use external tools spontaneously.

Use a Chain-of-Thought (CoT) prompting technique to guide the LLM through a series of reasoning steps, then manually input the results from external tools for the final answer.

Answer:

Explanation:

The task requires an LLM pipeline for multi-stage reasoning with external tools, necessitating planning, adaptability, and complex reasoning. Let’s evaluate the options based on Databricks’ recommendations for advanced LLM workflows.

Option A: Train the LLM to generate a single, comprehensive response without interacting with any external tools, relying solely on its pre-trained knowledge

This approach limits the LLM to its static knowledge base, excluding external tools and multi-stage reasoning. It can’t adapt or plan actions dynamically, failing the requirements.

Databricks Reference:"External tools enhance LLM capabilities beyond pre-trained knowledge"("Building LLM Applications with Databricks," 2023).

Option B: Implement a framework like ReAct which allows the LLM to generate reasoning traces and perform task-specific actions that leverage external tools if necessary

ReAct (Reasoning + Acting) combines reasoning traces (step-by-step logic) with actions (e.g., tool calls), enabling the LLM to plan, adapt, and execute complex tasks iteratively. This meets all requirements: multi-stage reasoning, tool use, and adaptability.

Databricks Reference:"Frameworks like ReAct enable LLMs to interleave reasoning and external tool interactions for complex problem-solving"("Generative AI Cookbook," 2023).

Option C: Encourage the LLM to make multiple API calls in sequence without planning or structuring the calls, allowing the LLM to decide when and how to use external tools spontaneously

Unstructured, spontaneous API calls lack planning and may lead to inefficient or incorrect tool usage. This doesn’t ensure effective multi-stage reasoning or adaptability.

Databricks Reference: Structured frameworks are preferred:"Ad-hoc tool calls can reduce reliability in complex tasks"("Building LLM-Powered Applications").

Option D: Use a Chain-of-Thought (CoT) prompting technique to guide the LLM through a series of reasoning steps, then manually input the results from external tools for the final answer

CoT improves reasoning but relies on manual tool interaction, breaking automation and adaptability. It’s not a scalable pipeline solution.

Databricks Reference:"Manual intervention is impractical for production LLM pipelines"("Databricks Generative AI Engineer Guide").

Conclusion: Option B (ReAct) is the best approach, as it integrates reasoning and tool use in a structured, adaptive framework, aligning with Databricks’ guidance for complex LLM workflows.

Question 16

A Generative Al Engineer has developed an LLM application to answer questions about internal company policies. The Generative AI Engineer must ensure that the application doesn’t hallucinate or leak confidential data.

Which approach should NOT be used to mitigate hallucination or confidential data leakage?

Options:

Add guardrails to filter outputs from the LLM before it is shown to the user

Fine-tune the model on your data, hoping it will learn what is appropriate and not

Limit the data available based on the user’s access level

Use a strong system prompt to ensure the model aligns with your needs.

Question 17

A Generative Al Engineer has already trained an LLM on Databricks and it is now ready to be deployed.

Which of the following steps correctly outlines the easiest process for deploying a model on Databricks?

Options:

Log the model as a pickle object, upload the object to Unity Catalog Volume, register it to Unity Catalog using MLflow, and start a serving endpoint

Log the model using MLflow during training, directly register the model to Unity Catalog using the MLflow API, and start a serving endpoint

Save the model along with its dependencies in a local directory, build the Docker image, and run the Docker container

Wrap the LLM’s prediction function into a Flask application and serve using Gunicorn

Question 18

A Generative Al Engineer is ready to deploy an LLM application written using Foundation Model APIs. They want to follow security best practices for production scenarios

Which authentication method should they choose?

Options:

Use an access token belonging to service principals

Use a frequently rotated access token belonging to either a workspace user or a service principal

Use OAuth machine-to-machine authentication

Use an access token belonging to any workspace user

Answer:

Explanation:

The task is to deploy an LLM application using Foundation Model APIs in a production environment while adhering to security best practices. Authentication is critical for securing access to Databricks resources, such as the Foundation Model API. Let’s evaluate the options based on Databricks’ security guidelines for production scenarios.

Option A: Use an access token belonging to service principals

Service principals are non-human identities designed for automated workflows and applications in Databricks. Using an access token tied to a service principal ensures that the authentication is scoped to the application, follows least-privilege principles (via role-based access control), and avoids reliance on individual user credentials. This is a security best practice for production deployments.

Databricks Reference:"For production applications, use service principals with access tokens to authenticate securely, avoiding user-specific credentials"("Databricks Security Best Practices," 2023). Additionally, the "Foundation Model API Documentation" states:"Service principal tokens are recommended for programmatic access to Foundation Model APIs."

Option B: Use a frequently rotated access token belonging to either a workspace user or a service principal

Frequent rotation enhances security by limiting token exposure, but tying the token to a workspace user introduces risks (e.g., user account changes, broader permissions). Including both user and service principal options dilutes the focus on application-specific security, making this less ideal than a service-principal-only approach. It also adds operational overhead without clear benefits over Option A.

Databricks Reference:"While token rotation is a good practice, service principals are preferred over user accounts for application authentication"("Managing Tokens in Databricks," 2023).

Option C: Use OAuth machine-to-machine authentication

OAuth M2M (e.g., client credentials flow) is a secure method for application-to-service communication, often using service principals under the hood. However, Databricks’ Foundation Model API primarily supports personal access tokens (PATs) or service principal tokens over full OAuth flows for simplicity in production setups. OAuth M2M adds complexity (e.g., managing refresh tokens) without a clear advantage in this context.

Databricks Reference:"OAuth is supported in Databricks, but service principal tokens are simpler and sufficient for most API-based workloads"("Databricks Authentication Guide," 2023).

Option D: Use an access token belonging to any workspace user

Using a user’s access token ties the application to an individual’s identity, violating security best practices. It risks exposure if the user leaves, changes roles, or has overly broad permissions, and it’s not scalable or auditable for production.

Databricks Reference:"Avoid using personal user tokens for production applications due to security and governance concerns"("Databricks Security Best Practices," 2023).

Conclusion: Option A is the best choice, as it uses a service principal’s access token, aligning with Databricks’ security best practices for production LLM applications. It ensures secure, application-specific authentication with minimal complexity, as explicitly recommended for Foundation Model API deployments.

Exam Detail

Vendor: Databricks

Certification: Generative AI Engineer

Exam Code: Databricks-Generative-AI-Engineer-Associate

Exam Name: Databricks Certified Generative AI Engineer Associate

Last Update: Jul 1, 2025

Databricks-Generative-AI-Engineer-Associate Question Answers

Summer Special - Limited Time 65% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: top65certs

Free and Premium Databricks Databricks-Generative-AI-Engineer-Associate Dumps Questions Answers

Databricks Certified Generative AI Engineer Associate Questions and Answers

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

CompTIA

Fortinet

Microsoft

Salesforce