
AWS Certified Data Engineer - Associate (DEA-C01) Questions and Answers

Question 9

A company wants to use Apache Spark jobs that run on an Amazon EMR cluster to process streaming data. The Spark jobs will transform and store the data in an Amazon S3 bucket. The company will use Amazon Athena to perform analysis.

The company needs to optimize the data format for analytical queries.

Which solutions will meet these requirements with the SHORTEST query times? (Select TWO.)

Options:

A.

Use Avro format. Use AWS Glue Data Catalog to track schema changes.

B.

Use ORC format. Use AWS Glue Data Catalog to track schema changes.

C.

Use Apache Parquet format. Use an external Amazon DynamoDB table to track schema changes.

D.

Use Apache Parquet format. Use AWS Glue Data Catalog to track schema changes.

E.

Use ORC format. Store schema definitions in separate files in Amazon S3.
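
For reference, a minimal PySpark sketch of the pattern described in options B and D: write the transformed data to Amazon S3 in a columnar format (Parquet here) and register the table in the AWS Glue Data Catalog so Athena can query it. The bucket, database, and table names are hypothetical placeholders, and the transformation step is elided.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("stream-to-parquet")
    # On EMR, this setting points the Spark/Hive metastore at the Glue Data Catalog.
    .config("hive.metastore.client.factory.class",
            "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory")
    .enableHiveSupport()
    .getOrCreate()
)

# Hypothetical input; in the scenario this would be the transformed streaming data.
df = spark.read.json("s3://example-raw-bucket/events/")

(
    df.write
    .mode("append")
    .partitionBy("event_date")            # partitioning further reduces Athena scan time
    .format("parquet")
    .option("path", "s3://example-curated-bucket/events/")
    .saveAsTable("analytics_db.events")   # registers the table in the Glue Data Catalog
)
```

Columnar formats such as Parquet and ORC let Athena read only the columns a query touches, which is what drives the shorter query times these options target.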

Question 10

A company runs a data pipeline that uses AWS Step Functions to orchestrate AWS Lambda functions and AWS Glue jobs. The Lambda functions and AWS Glue jobs require access to multiple Amazon RDS databases. The Lambda functions and AWS Glue jobs already have access to the VPC that hosts the RDS databases.

Which solution will meet these requirements in the MOST secure way?

Options:

A.

Use the root user of the company’s AWS account to create long-term access keys for the RDS databases. Include the access keys programmatically in the Lambda functions and AWS Glue jobs. Generate new keys every 90 days.

B.

Create an IAM role that has permissions to access the RDS databases. Create a second IAM role for the Lambda functions and AWS Glue jobs that has permissions to assume the IAM role that has access permissions for the RDS databases.

C.

Create an IAM user that can assume IAM roles that have permissions and credentials to access the RDS databases. Assign the IAM user to each of the Lambda functions and AWS Glue jobs.

D.

Create Java Database Connectivity (JDBC) connections between the Lambda functions and AWS Glue jobs and the RDS databases. In the connection string, include the necessary credentials.
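
For reference, a minimal boto3 sketch of the two-role pattern in option B: the execution role attached to the Lambda function or AWS Glue job assumes a second IAM role that holds the RDS access permissions, then requests a short-lived IAM database authentication token instead of embedding credentials. The role ARN, host, and user names are hypothetical.

```python
import boto3

# Assume the dedicated RDS-access role from the function's own execution role.
sts = boto3.client("sts")
assumed = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/rds-access-role",
    RoleSessionName="pipeline-db-session",
)
creds = assumed["Credentials"]

rds = boto3.client(
    "rds",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

# Generates a short-lived password for a DB user configured for IAM authentication;
# no long-term credentials ever appear in code or connection strings.
token = rds.generate_db_auth_token(
    DBHostname="mydb.cluster-abc123.us-east-1.rds.amazonaws.com",
    Port=3306,
    DBUsername="pipeline_user",
)
```

The token is then passed as the database password when opening the connection, so every credential in the flow is temporary and scoped by IAM policy.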

Question 11

A retail company needs to implement a solution to capture data updates from multiple Amazon Aurora MySQL databases. The company needs to make the updates available for analytics in near real time. The solution must be serverless and require minimal maintenance.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Set up AWS Database Migration Service (AWS DMS) tasks that perform schema conversions for each database. Load the changes into Amazon Redshift Serverless.

B.

Use Amazon Managed Streaming for Apache Kafka (Amazon MSK) Connect with Debezium connectors to load data into Amazon Redshift Serverless.

C.

Use AWS Database Migration Service (AWS DMS) to set up binary log replication to Amazon Kinesis Data Streams. Load the data into Amazon Redshift Serverless after schema conversion.

D.

Use Aurora zero-ETL integrations with Amazon Redshift Serverless for each database to load Aurora MySQL changes in Amazon Redshift Serverless.
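
For reference, a hedged boto3 sketch of option D, assuming the RDS `create_integration` API for Aurora zero-ETL integrations: one integration per Aurora MySQL cluster replicates changes into Amazon Redshift Serverless with no pipeline code to maintain. The ARNs below are hypothetical placeholders.

```python
import boto3

rds = boto3.client("rds")

# Creates a zero-ETL integration from an Aurora MySQL cluster to a
# Redshift Serverless namespace; repeat per source database.
response = rds.create_integration(
    SourceArn="arn:aws:rds:us-east-1:123456789012:cluster:orders-aurora-cluster",
    TargetArn="arn:aws:redshift-serverless:us-east-1:123456789012:namespace/analytics-ns",
    IntegrationName="orders-zero-etl",
)
print(response["Status"])  # e.g. "creating"; changes flow once the integration is active
```

Because AWS manages the replication end to end, there are no DMS tasks, Kafka connectors, or schema-conversion steps to operate, which is the operational-overhead distinction among these options.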

Question 12

A data engineer needs to securely transfer 5 TB of data from an on-premises data center to an Amazon S3 bucket. Approximately 5% of the data changes every day. Updates to the data need to be regularly propagated to the S3 bucket. The data includes files that are in multiple formats. The data engineer needs to automate the transfer process and must schedule the process to run periodically.

Which AWS service should the data engineer use to transfer the data in the MOST operationally efficient way?

Options:

A.

AWS DataSync

B.

AWS Glue

C.

AWS Direct Connect

D.

Amazon S3 Transfer Acceleration
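
For reference, a minimal boto3 sketch of the AWS DataSync pattern in option A: a task syncs an on-premises location to S3 on a built-in recurring schedule, transferring only changed files on each run. The location ARNs and schedule expression are hypothetical placeholders; the locations themselves would be created beforehand.

```python
import boto3

datasync = boto3.client("datasync")

# A scheduled task; DataSync runs it automatically and each incremental
# run copies only the data that changed since the previous run.
task = datasync.create_task(
    SourceLocationArn="arn:aws:datasync:us-east-1:123456789012:location/loc-onprem-nfs",
    DestinationLocationArn="arn:aws:datasync:us-east-1:123456789012:location/loc-s3-bucket",
    Name="nightly-onprem-to-s3",
    Schedule={"ScheduleExpression": "rate(1 day)"},
)
print(task["TaskArn"])
```

Built-in scheduling, incremental transfer, and in-transit encryption are what make this more operationally efficient here than Glue jobs, a Direct Connect link, or S3 Transfer Acceleration alone.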