11.11 Special 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: save70

Free and Premium Microsoft DP-203 Dumps Questions Answers

Page: 1 / 12
Total 341 questions

Data Engineering on Microsoft Azure Questions and Answers

Question 1

You are designing an Azure Databricks table. The table will ingest an average of 20 million streaming events per day.

You need to persist the events in the table for use in incremental load pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load times.

What should you include in the solution?

Options:

A.

Partition by DateTime fields.

B.

Sink to Azure Queue storage.

C.

Include a watermark column.

D.

Use a JSON format for physical data storage.

Buy Now
Question 2

You are performing exploratory analysis of the bus fare data in an Azure Data Lake Storage Gen2 account by using an Azure Synapse Analytics serverless SQL pool.

You execute the Transact-SQL query shown in the following exhibit.

What do the query results include?

Options:

A.

Only CSV files in the tripdata_2020 subfolder.

B.

All files that have file names that beginning with "tripdata_2020".

C.

All CSV files that have file names that contain "tripdata_2020".

D.

Only CSV that have file names that beginning with "tripdata_2020".

Question 3

You plan to create a dimension table in Azure Synapse Analytics that will be less than 1 GB.

You need to create the table to meet the following requirements:

• Provide the fastest Query time.

• Minimize data movement during queries.

Which type of table should you use?

Options:

A.

hash distributed

B.

heap

C.

replicated

D.

round-robin

Question 4

You are designing a star schema for a dataset that contains records of online orders. Each record includes an order date, an order due date, and an order ship date.

You need to ensure that the design provides the fastest query times of the records when querying for arbitrary date ranges and aggregating by fiscal calendar attributes.

Which two actions should you perform? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

Options:

A.

Create a date dimension table that has a DateTime key.

B.

Use built-in SQL functions to extract date attributes.

C.

Create a date dimension table that has an integer key in the format of yyyymmdd.

D.

In the fact table, use integer columns for the date fields.

E.

Use DateTime columns for the date fields.

Question 5

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this scenario, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.

You plan to copy the data from the storage account to an Azure SQL data warehouse.

You need to prepare the files to ensure that the data copies quickly.

Solution: You modify the files to ensure that each row is more than 1 MB.

Does this meet the goal?

Options:

A.

Yes

B.

No

Question 6

You are implementing a batch dataset in the Parquet format.

Data tiles will be produced by using Azure Data Factory and stored in Azure Data Lake Storage Gen2. The files will be consumed by an Azure Synapse Analytics serverless SQL pool.

You need to minimize storage costs for the solution.

What should you do?

Options:

A.

Store all the data as strings in the Parquet tiles.

B.

Use OPENROWEST to query the Parquet files.

C.

Create an external table mat contains a subset of columns from the Parquet files.

D.

Use Snappy compression for the files.

Question 7

You have an Azure Synapse Analytics serverless SQL pool named Pool1 and an Azure Data Lake Storage Gen2 account named storage1. The AllowedBlobpublicAccess porperty is disabled for storage1.

You need to create an external data source that can be used by Azure Active Directory (Azure AD) users to access storage1 from Pool1.

What should you create first?

Options:

A.

an external resource pool

B.

a remote service binding

C.

database scoped credentials

D.

an external library

Question 8

You have an Azure Data Lake Storage Gen2 container that contains 100 TB of data.

You need to ensure that the data in the container is available for read workloads in a secondary region if an outage occurs in the primary region. The solution must minimize costs.

Which type of data redundancy should you use?

Options:

A.

zone-redundant storage (ZRS)

B.

read-access geo-redundant storage (RA-GRS)

C.

locally-redundant storage (LRS)

D.

geo-redundant storage (GRS)

Question 9

You have an enterprise data warehouse in Azure Synapse Analytics that contains a table named FactOnlineSales. The table contains data from the start of 2009 to the end of 2012.

You need to improve the performance of queries against FactOnlineSales by using table partitions. The solution must meet the following requirements:

    Create four partitions based on the order date.

    Ensure that each partition contains all the orders places during a given calendar year.

How should you complete the T-SQL command? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 10

You have an Azure Data Factory instance named ADF1 and two Azure Synapse Analytics workspaces named WS1 and WS2.

ADF1 contains the following pipelines:

    P1: Uses a copy activity to copy data from a nonpartitioned table in a dedicated SQL pool of WS1 to an Azure Data Lake Storage Gen2 account

    P2: Uses a copy activity to copy data from text-delimited files in an Azure Data Lake Storage Gen2 account to a nonpartitioned table in a dedicated SQL pool of WS2

You need to configure P1 and P2 to maximize parallelism and performance.

Which dataset settings should you configure for the copy activity if each pipeline? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 11

You have an Azure data factory that has the Git repository settings shown in the following exhibit.

Use the drop-down menus to select the answer choose that completes each statement based on the information presented in the graphic.

NOTE: Each correct answer is worth one point.

Options:

Question 12

You have a SQL pool in Azure Synapse.

A user reports that queries against the pool take longer than expected to complete.

You need to add monitoring to the underlying storage to help diagnose the issue.

Which two metrics should you monitor? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

Options:

A.

Cache used percentage

B.

DWU Limit

C.

Snapshot Storage Size

D.

Active queries

E.

Cache hit percentage

Question 13

You have the following Azure Stream Analytics query.

For each of the following statements, select Yes if the statement is true. Otherwise, select No.

NOTE: Each correct selection is worth one point.

Options:

Question 14

You are building an Azure Analytics query that will receive input data from Azure IoT Hub and write the results to Azure Blob storage.

You need to calculate the difference in readings per sensor per hour.

How should you complete the query? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 15

You have an Azure Synapse Analytics dedicated SQL pool named Pool1 and a database named DB1. DB1 contains a fact table named Table1.

You need to identify the extent of the data skew in Table1.

What should you do in Synapse Studio?

Options:

A.

Connect to the built-in pool and run dbcc pdw_showspaceused.

B.

Connect to the built-in pool and run dbcc checkalloc.

C.

Connect to Pool1 and query sys.dm_pdw_node_scacus.

D.

Connect to Pool1 and query sys.dm_pdw_nodes_db_partition_scacs.

Question 16

You plan to use an Apache Spark pool in Azure Synapse Analytics to load data to an Azure Data Lake Storage Gen2 account.

You need to recommend which file format to use to store the data in the Data Lake Storage account. The solution must meet the following requirements:

• Column names and data types must be defined within the files loaded to the Data Lake Storage account.

• Data must be accessible by using queries from an Azure Synapse Analytics serverless SQL pool.

• Partition elimination must be supported without having to specify a specific partition.

What should you recommend?

Options:

A.

Delta Lake

B.

JSON

C.

CSV

D.

ORC

Question 17

You have an Azure Synapse Analytics pipeline named Pipeline1 that contains a data flow activity named Dataflow1.

Pipeline1 retrieves files from an Azure Data Lake Storage Gen 2 account named storage1.

Dataflow1 uses the AutoResolveIntegrationRuntime integration runtime configured with a core count of 128.

You need to optimize the number of cores used by Dataflow1 to accommodate the size of the files in storage1.

What should you configure? To answer, select the appropriate options in the answer area.

Options:

Question 18

You are designing an Azure Databricks cluster that runs user-defined local processes. You need to recommend a cluster configuration that meets the following requirements:

• Minimize query latency.

• Maximize the number of users that can run queues on the cluster at the same time « Reduce overall costs without compromising other requirements

Which cluster type should you recommend?

Options:

A.

Standard with Auto termination

B.

Standard with Autoscaling

C.

High Concurrency with Autoscaling

D.

High Concurrency with Auto Termination

Question 19

You develop a dataset named DBTBL1 by using Azure Databricks.

DBTBL1 contains the following columns:

    SensorTypeID

    GeographyRegionID

    Year

    Month

    Day

    Hour

    Minute

    Temperature

    WindSpeed

    Other

You need to store the data to support daily incremental load pipelines that vary for each GeographyRegionID. The solution must minimize storage costs.

How should you complete the code? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 20

You have an Azure subscription that contains a storage account. The account contains a blob container named blob1 and an Azure Synapse Analytic serve-less SQL pool

You need to Query the CSV files stored in blob1. The solution must ensure that all the files in a (older named csv and all its subfolders are queried

How should you complete the query? to answer, select the appropriate options in the answer area

NOTE: Each correct selection is worth one point.

Options:

Question 21

You have an Azure Synapse serverless SQL pool.

You need to read JSON documents from a file by using the OPENROWSET function.

How should you complete the query? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 22

You have an Azure subscription that contains an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 receives new data once every 24 hours.

You have the following function.

You have the following query.

The query is executed once every 15 minutes and the @parameter value is set to the current date.

You need to minimize the time it takes for the query to return results.

Which two actions should you perform? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

Options:

A.

Create an index on the avg_f column.

B.

Convert the avg_c column into a calculated column.

C.

Create an index on the sensorid column.

D.

Enable result set caching.

E.

Change the table distribution to replicate.

Question 23

You have an Azure Stream Analytics job that is a Stream Analytics project solution in Microsoft Visual Studio. The job accepts data generated by IoT devices in the JSON format.

You need to modify the job to accept data generated by the IoT devices in the Protobuf format.

Which three actions should you perform from Visual Studio on sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Options:

Question 24

You are building a database in an Azure Synapse Analytics serverless SQL pool.

You have data stored in Parquet files in an Azure Data Lake Storage Gen2 container.

Records are structured as shown in the following sample.

{

"id": 123,

"address_housenumber": "19c",

"address_line": "Memory Lane",

"applicant1_name": "Jane",

"applicant2_name": "Dev"

}

The records contain two applicants at most.

You need to build a table that includes only the address fields.

How should you complete the Transact-SQL statement? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 25

You are developing a solution using a Lambda architecture on Microsoft Azure.

The data at test layer must meet the following requirements:

Data storage:

•Serve as a repository (or high volumes of large files in various formats.

•Implement optimized storage for big data analytics workloads.

•Ensure that data can be organized using a hierarchical structure.

Batch processing:

•Use a managed solution for in-memory computation processing.

•Natively support Scala, Python, and R programming languages.

•Provide the ability to resize and terminate the cluster automatically.

Analytical data store:

•Support parallel processing.

•Use columnar storage.

•Support SQL-based languages.

You need to identify the correct technologies to build the Lambda architecture.

Which technologies should you use? To answer, select the appropriate options in the answer area

NOTE: Each correct selection is worth one point.

Options:

Question 26

You are deploying a lake database by using an Azure Synapse database template.

You need to add additional tables to the database. The solution must use the same grouping method as the template tables.

‘Which grouping method should you use?

Options:

A.

business area

B.

size

C.

facts and dimensions

D.

partition style

Question 27

You have the following Azure Stream Analytics query.

For each of the following statements, select Yes if the statement is true. Otherwise, select No.

NOTE: Each correct selection is worth one point.

Options:

Question 28

You are designing a solution that will copy Parquet files stored in an Azure Blob storage account to an Azure Data Lake Storage Gen2 account.

The data will be loaded daily to the data lake and will use a folder structure of {Year}/{Month}/{Day}/.

You need to design a daily Azure Data Factory data load to minimize the data transfer between the two

accounts.

Which two configurations should you include in the design? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

Options:

A.

Delete the files in the destination before loading new data.

B.

Filter by the last modified date of the source files.

C.

Delete the source files after they are copied.

D.

Specify a file naming pattern for the destination.

Question 29

You have an Azure subscription that contains an Azure Databricks workspace named databricks1 and an Azure Synapse Analytics workspace named synapse1. The synapse1 workspace contains an Apache Spark pool named pool1.

You need to share an Apache Hive catalog of pool1 with databricks1.

What should you do? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 30

You are designing an Azure Stream Analytics job to process incoming events from sensors in retail environments.

You need to process the events to produce a running average of shopper counts during the previous 15 minutes, calculated at five-minute intervals.

Which type of window should you use?

Options:

A.

snapshot

B.

tumbling

C.

hopping

D.

sliding

Question 31

You plan to create an Azure Data Lake Storage Gen2 account

You need to recommend a storage solution that meets the following requirements:

• Provides the highest degree of data resiliency

• Ensures that content remains available for writes if a primary data center fails

What should you include in the recommendation? To answer, select the appropriate options in the answer area.

Options:

Question 32

You have an Azure Databricks resource.

You need to log actions that relate to changes in compute for the Databricks resource.

Which Databricks services should you log?

Options:

A.

clusters

B.

workspace

C.

DBFS

D.

SSH

E jobs

Question 33

You are designing a date dimension table in an Azure Synapse Analytics dedicated SQL pool. The date dimension table will be used by all the fact tables.

Which distribution type should you recommend to minimize data movement?

Options:

A.

HASH

B.

REPLICATE

C.

ROUND ROBIN

Question 34

You have an Azure Synapse Analytics dedicated SQL pool.

You run PDW_SHOWSPACEUSED(dbo,FactInternetSales’); and get the results shown in the following table.

Which statement accurately describes the dbo,FactInternetSales table?

Options:

A.

The table contains less than 1,000 rows.

B.

All distributions contain data.

C.

The table is skewed.

D.

The table uses round-robin distribution.

Question 35

You need to design the partitions for the product sales transactions. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 36

You need to design a data ingestion and storage solution for the Twitter feeds. The solution must meet the customer sentiment analytics requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area

NOTE: Each correct selection b worth one point.

Options:

Question 37

You need to design an analytical storage solution for the transactional data. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 38

You need to design a data retention solution for the Twitter feed data records. The solution must meet the customer sentiment analytics requirements.

Which Azure Storage functionality should you include in the solution?

Options:

A.

change feed

B.

soft delete

C.

time-based retention

D.

lifecycle management

Question 39

You need to design a data storage structure for the product sales transactions. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 40

You need to integrate the on-premises data sources and Azure Synapse Analytics. The solution must meet the data integration requirements.

Which type of integration runtime should you use?

Options:

A.

Azure-SSIS integration runtime

B.

self-hosted integration runtime

C.

Azure integration runtime

Question 41

You need to implement the surrogate key for the retail store table. The solution must meet the sales transaction

dataset requirements.

What should you create?

Options:

A.

a table that has an IDENTITY property

B.

a system-versioned temporal table

C.

a user-defined SEQUENCE object

D.

a table that has a FOREIGN KEY constraint

Question 42

You need to ensure that the Twitter feed data can be analyzed in the dedicated SQL pool. The solution must meet the customer sentiment analytics requirements.

Which three Transaction-SQL DDL commands should you run in sequence? To answer, move the appropriate commands from the list of commands to the answer area and arrange them in the correct order.

NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select.

Options:

Question 43

You need to design a data retention solution for the Twitter teed data records. The solution must meet the customer sentiment analytics requirements.

Which Azure Storage functionality should you include in the solution?

Options:

A.

time-based retention

B.

change feed

C.

soft delete

D.

Iifecycle management

Question 44

You need to implement an Azure Synapse Analytics database object for storing the sales transactions data. The solution must meet the sales transaction dataset requirements.

What solution must meet the sales transaction dataset requirements.

What should you do? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 45

You need to implement versioned changes to the integration pipelines. The solution must meet the data integration requirements.

In which order should you perform the actions? To answer, move all actions from the list of actions to the answer area and arrange them in the correct order.

Options:

Question 46

What should you do to improve high availability of the real-time data processing solution?

Options:

A.

Deploy identical Azure Stream Analytics jobs to paired regions in Azure.

B.

Deploy a High Concurrency Databricks cluster.

C.

Deploy an Azure Stream Analytics job and use an Azure Automation runbook to check the status of the job and to start the job if it stops.

D.

Set Data Lake Storage to use geo-redundant storage (GRS).

Question 47

What should you recommend to prevent users outside the Litware on-premises network from accessing the analytical data store?

Options:

A.

a server-level virtual network rule

B.

a database-level virtual network rule

C.

a database-level firewall IP rule

D.

a server-level firewall IP rule

Question 48

Which Azure Data Factory components should you recommend using together to import the daily inventory data from the SQL server to Azure Data Lake Storage? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 49

What should you recommend using to secure sensitive customer contact information?

Options:

A.

data labels

B.

column-level security

C.

row-level security

D.

Transparent Data Encryption (TDE)

Page: 1 / 12
Total 341 questions