Databricks Certified Data Analyst Associate Exam Questions and Answers

Question 1

Query History provides Databricks SQL users with a lot of benefits. A data analyst has been asked to share all of these benefits with their team as part of a training exercise. One of the benefit statements the analyst provided to their team is incorrect.

Which statement about Query History is incorrect?

Options:

A.

It can be used to view the query plan of queries that have run.

B.

It can be used to debug queries.

C.

It can be used to automate query execution on multiple warehouses (formerly endpoints).

D.

It can be used to troubleshoot slow running queries.

Question 2

A data analyst has created a Query in Databricks SQL, and now they want to create two data visualizations from that Query and add both of those data visualizations to the same Databricks SQL Dashboard.

Which of the following steps will they need to take when creating and adding both data visualizations to the Databricks SQL Dashboard?

Options:

A.

They will need to alter the Query to return two separate sets of results.

B.

They will need to add two separate visualizations to the dashboard based on the same Query.

C.

They will need to create two separate dashboards.

D.

They will need to decide on a single data visualization to add to the dashboard.

E.

They will need to copy the Query and create one data visualization per query.

Question 3

A data analysis team is working with the table_bronze SQL table as a source for one of its most complex projects. A stakeholder of the project notices that some of the downstream data is duplicative. The analysis team identifies table_bronze as the source of the duplication.

Which of the following queries can be used to deduplicate the data from table_bronze and write it to a new table table_silver?

A)

CREATE TABLE table_silver AS

SELECT DISTINCT *

FROM table_bronze;

B)

CREATE TABLE table_silver AS

INSERT *

FROM table_bronze;

C)

CREATE TABLE table_silver AS

MERGE DEDUPLICATE *

FROM table_bronze;

D)

INSERT INTO TABLE table_silver

SELECT * FROM table_bronze;

E)

INSERT OVERWRITE TABLE table_silver

SELECT * FROM table_bronze;

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E
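
The deduplication pattern asked about in Question 3 can be tried outside Databricks. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine (an assumption: SQLite is not Databricks SQL), with a hypothetical two-column schema and made-up rows for table_bronze. It shows how a CREATE TABLE ... AS SELECT DISTINCT * statement collapses fully identical rows into one.

```python
import sqlite3


def dedupe_demo():
    """Build a bronze table with duplicate rows, then write a deduplicated copy."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and rows; the real table_bronze is not shown in the question.
    conn.execute("CREATE TABLE table_bronze (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO table_bronze VALUES (?, ?)",
        [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (3, "c")],
    )
    # CTAS with SELECT DISTINCT keeps one copy of each fully identical row.
    conn.execute("CREATE TABLE table_silver AS SELECT DISTINCT * FROM table_bronze")
    bronze_count = conn.execute("SELECT COUNT(*) FROM table_bronze").fetchone()[0]
    silver_rows = conn.execute(
        "SELECT id, name FROM table_silver ORDER BY id"
    ).fetchall()
    conn.close()
    return bronze_count, silver_rows


if __name__ == "__main__":
    print(dedupe_demo())
```

Note that SELECT DISTINCT * only removes rows that match in every column; rows that differ in any column are kept as separate rows.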

Question 4

Which of the following should data analysts consider when working with personally identifiable information (PII) data?

Options:

A.

Organization-specific best practices for PII data

B.

Legal requirements for the area in which the data was collected

C.

None of these considerations

D.

Legal requirements for the area in which the analysis is being performed

E.

All of these considerations

Question 5

How can a data analyst determine if query results were pulled from the cache?

Options:

A.

Go to the Query History tab and click on the text of the query. The slideout shows if the results came from the cache.

B.

Go to the Alerts tab and check the Cache Status alert.

C.

Go to the Queries tab and click on Cache Status. The status will be green if the results from the last run came from the cache.

D.

Go to the SQL Warehouse (formerly SQL Endpoints) tab and click on Cache. The Cache file will show the contents of the cache.

E.

Go to the Data tab and click Last Query. The details of the query will show if the results came from the cache.

Question 6

Delta Lake stores table data as a series of data files, but it also stores a lot of other information.

Which of the following is stored alongside data files when using Delta Lake?

Options:

A.

None of these

B.

Table metadata, data summary visualizations, and owner account information

C.

Table metadata

D.

Data summary visualizations

E.

Owner account information

Question 7

A data organization has a team of engineers developing data pipelines following the medallion architecture using Delta Live Tables. While the data analysis team working on a project is using gold-layer tables from these pipelines, they need to perform some additional processing of these tables prior to performing their analysis.

Which of the following terms is used to describe this type of work?

Options:

A.

Data blending

B.

Last-mile

C.

Data testing

D.

Last-mile ETL

E.

Data enhancement

Question 8

A data analyst has a managed table table_name in database database_name. They would now like to remove the table from the database and all of the data files associated with the table. The rest of the tables in the database must continue to exist.

Which of the following commands can the analyst use to complete the task without producing an error?

Options:

A.

DROP DATABASE database_name;

B.

DROP TABLE database_name.table_name;

C.

DELETE TABLE database_name.table_name;

D.

DELETE TABLE table_name FROM database_name;

E.

DROP TABLE table_name FROM database_name;

Question 9

The stakeholders.customers table has 15 columns and 3,000 rows of data. The following command is run:

After running SELECT * FROM stakeholders.eur_customers, 15 rows are returned. After the command executes completely, the user logs out of Databricks.

After logging back in two days later, what is the status of the stakeholders.eur_customers view?

Options:

A.

The view remains available and SELECT * FROM stakeholders.eur_customers will execute correctly.

B.

The view has been dropped.

C.

The view is not available in the metastore, but the underlying data can be accessed with SELECT * FROM delta.`stakeholders.eur_customers`.

D.

The view remains available but attempting to SELECT from it results in an empty result set because data in views are automatically deleted after logging out.

E.

The view has been converted into a table.
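
The command referred to in Question 9 is not reproduced in this copy, so the outcome depends on whether it created a regular view or a temporary one. The following is a rough sketch of that difference, using Python's built-in sqlite3 module as a stand-in (an assumption: SQLite is not Databricks SQL, and its session model differs). The table layout, the region filter, and the name eur_customers_tmp are hypothetical.

```python
import os
import sqlite3
import tempfile


def view_names_after_reconnect():
    """Create a regular and a temporary view, reconnect, and report what survives."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        first = sqlite3.connect(path)
        # Hypothetical stand-in for stakeholders.customers.
        first.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
        first.executemany(
            "INSERT INTO customers VALUES (?, ?)",
            [(1, "EUR"), (2, "US"), (3, "EUR")],
        )
        # A regular view is stored in the database catalog.
        first.execute(
            "CREATE VIEW eur_customers AS "
            "SELECT * FROM customers WHERE region = 'EUR'"
        )
        # A temporary view lives only as long as the connection that created it.
        first.execute(
            "CREATE TEMP VIEW eur_customers_tmp AS "
            "SELECT * FROM customers WHERE region = 'EUR'"
        )
        first.commit()
        first.close()

        # A new connection plays the role of logging back in later.
        second = sqlite3.connect(path)
        persisted = [
            row[0]
            for row in second.execute(
                "SELECT name FROM sqlite_master WHERE type = 'view'"
            )
        ]
        temporary = [
            row[0]
            for row in second.execute(
                "SELECT name FROM sqlite_temp_master WHERE type = 'view'"
            )
        ]
        rows = second.execute("SELECT id FROM eur_customers ORDER BY id").fetchall()
        second.close()
        return persisted, temporary, rows
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(view_names_after_reconnect())
```

In this sketch the regular view is still queryable from the second connection, while the temporary view is gone. The same distinction is worth checking against the actual command when answering the question.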

Question 10

A data analyst has been asked to produce a visualization that shows the flow of users through a website.

Which of the following is used for visualizing this type of flow?

Options:

A.

Heatmap

B.

Choropleth

C.

Word Cloud

D.

Pivot Table

E.

Sankey

Question 11

A data analyst has created a Query in Databricks SQL, and now wants to create two data visualizations from that Query and add both of those data visualizations to the same Databricks SQL Dashboard.

Which step will the data analyst need to take when creating and adding both data visualizations to the Databricks SQL Dashboard?

Options:

A.

Copy the Query and create one data visualization per query.

B.

Add two separate visualizations to the dashboard based on the same Query.

C.

Decide on a single data visualization to add to the dashboard.

D.

Alter the Query to return two separate sets of results.

Question 12

Data professionals with varying responsibilities use the Databricks Lakehouse Platform.

Which role in the Databricks Lakehouse Platform uses Databricks SQL as its primary service?

Options:

A.

Data scientist

B.

Data engineer

C.

Platform architect

D.

Business analyst

Question 13

A data analyst wants to create a dashboard with three main sections: Development, Testing, and Production. They want all three sections on the same dashboard, but they want to clearly designate the sections using text on the dashboard.

Which of the following tools can the data analyst use to designate the Development, Testing, and Production sections using text?

Options:

A.

Separate endpoints for each section

B.

Separate queries for each section

C.

Markdown-based text boxes

D.

Direct text written into the dashboard in editing mode

E.

Separate color palettes for each section

Question 14

A data analyst has been asked to use the table sales_table below to get the percentage rank of products within each region by sales:

The result of the query should look like this:

Which of the following queries will accomplish this task?

A)

B)

C)

D)

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D
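
The table and the candidate queries for Question 14 are not reproduced in this copy. As a hedged illustration of the technique the question names, the sketch below computes a percentage rank per region with the PERCENT_RANK() window function. It uses Python's built-in sqlite3 module as a stand-in engine (an assumption: it needs SQLite 3.25 or later for window functions), and the column names region, product, and sales as well as all rows are hypothetical.

```python
import sqlite3


def percent_rank_by_region():
    """Rank products by sales within each region using PERCENT_RANK()."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical columns and data; the original sales_table is not shown.
    conn.execute("CREATE TABLE sales_table (region TEXT, product TEXT, sales INTEGER)")
    conn.executemany(
        "INSERT INTO sales_table VALUES (?, ?, ?)",
        [
            ("East", "A", 300),
            ("East", "B", 200),
            ("East", "C", 100),
            ("West", "A", 50),
            ("West", "B", 150),
        ],
    )
    rows = conn.execute(
        """
        SELECT region,
               product,
               sales,
               -- PARTITION BY restarts the ranking for every region;
               -- ORDER BY sales DESC gives the top seller a rank of 0.0.
               PERCENT_RANK() OVER (
                   PARTITION BY region ORDER BY sales DESC
               ) AS pct_rank
        FROM sales_table
        ORDER BY region, pct_rank
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in percent_rank_by_region():
        print(row)
```

PERCENT_RANK() evaluates to (rank - 1) / (rows in partition - 1), so within each partition the values run from 0.0 to 1.0.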

Question 15

Consider the following two statements:

Statement 1:

Statement 2:

Which of the following describes how the result sets will differ for each statement when they are run in Databricks SQL?

Options:

A.

The first statement will return all data from the customers table and matching data from the orders table. The second statement will return all data from the orders table and matching data from the customers table. Any missing data will be filled in with NULL.

B.

When the first statement is run, only rows from the customers table that have at least one match with the orders table on customer_id will be returned. When the second statement is run, only those rows in the customers table that do not have at least one match with the orders table on customer_id will be returned.

C.

There is no difference between the result sets for both statements.

D.

Both statements will fail because Databricks SQL does not support those join types.

E.

When the first statement is run, all rows from the customers table will be returned and only the customer_id from the orders table will be returned. When the second statement is run, only those rows in the customers table that do not have at least one match with the orders table on customer_id will be returned.
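
The two statements for Question 15 are not reproduced in this copy. Several answer options describe semi-join and anti-join behavior, so the sketch below illustrates those two result shapes under that assumption. Databricks SQL has LEFT SEMI JOIN and LEFT ANTI JOIN keywords; SQLite does not, so this stand-in (Python's built-in sqlite3 module) swaps in the equivalent EXISTS and NOT EXISTS forms. The name column and all rows are hypothetical.

```python
import sqlite3


def semi_and_anti_join():
    """Return customers with at least one order, and customers with none."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER, name TEXT);
        CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
        INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy');
        INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
        """
    )
    # Semi-join equivalent: left rows that have at least one match,
    # each returned once, with columns from the left table only.
    semi = conn.execute(
        """
        SELECT c.customer_id, c.name
        FROM customers AS c
        WHERE EXISTS (
            SELECT 1 FROM orders AS o WHERE o.customer_id = c.customer_id
        )
        ORDER BY c.customer_id
        """
    ).fetchall()
    # Anti-join equivalent: left rows that have no match at all.
    anti = conn.execute(
        """
        SELECT c.customer_id, c.name
        FROM customers AS c
        WHERE NOT EXISTS (
            SELECT 1 FROM orders AS o WHERE o.customer_id = c.customer_id
        )
        ORDER BY c.customer_id
        """
    ).fetchall()
    conn.close()
    return semi, anti


if __name__ == "__main__":
    print(semi_and_anti_join())
```

Customer 1 has two orders but appears only once in the semi-join result, which is one way a semi join differs from an inner join.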

Question 16

Which of the following statements describes descriptive statistics?

Options:

A.

A branch of statistics that uses summary statistics to quantitatively describe and summarize data.

B.

A branch of statistics that uses a variety of data analysis techniques to infer properties of an underlying distribution of probability.

C.

A branch of statistics that uses quantitative variables that must take on a finite or countably infinite set of values.

D.

A branch of statistics that uses summary statistics to categorically describe and summarize data.

E.

A branch of statistics that uses quantitative variables that must take on an uncountable set of values.
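
As a small illustration of what summary statistics look like in practice for Question 16, the sketch below uses Python's standard statistics module on a made-up sample. The data values and the choice of measures are assumptions for illustration only.

```python
import statistics


def summarize(values):
    """Return a few common summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Population standard deviation: spread of the values around the mean.
        "pstdev": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
    print(summarize(sample))
```

Each value in the result describes the sample itself; nothing here infers properties of a wider population.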

Question 17

Which location can be used to determine the owner of a managed table?

Options:

A.

Review the Owner field in the table page using Catalog Explorer

B.

Review the Owner field in the database page using Data Explorer

C.

Review the Owner field in the schema page using Data Explorer

D.

Review the Owner field in the table page using the SQL Editor

Question 18

A data analyst has set up a SQL query to run every four hours on a SQL endpoint, but the SQL endpoint is taking too long to start up with each run.

Which of the following changes can the data analyst make to reduce the start-up time for the endpoint while managing costs?

Options:

A.

Reduce the SQL endpoint cluster size

B.

Increase the SQL endpoint cluster size

C.

Turn off the Auto stop feature

D.

Increase the minimum scaling value

E.

Use a Serverless SQL endpoint

Question 19

A data engineering team has created a Structured Streaming pipeline that processes data in micro-batches and populates gold-level tables. The micro-batches are triggered every 10 minutes.

A data analyst has created a dashboard based on this gold-level data. The project stakeholders want to see the results in the dashboard updated within 10 minutes or less of new data becoming available within the gold-level tables.

What is needed to ensure that the streamed data is included in the dashboard at the standard requested by the project stakeholders?

Options:

A.

A refresh schedule with an interval of 10 minutes or less

B.

A refresh schedule with an always-on SQL Warehouse (formerly known as SQL Endpoint)

C.

A refresh schedule with stakeholders included as subscribers

D.

A refresh schedule with a Structured Streaming cluster