Which of the following roles is responsible for ensuring an organization's data quality, security, privacy, and regulatory compliance?
A data analyst has a set with more than 40.000 rows in the sample schema below:
The analyst would like to create one column that contains the customers’ birth dates. Which of the following data quality dimensions would BEST explain the reason for compilation?
Which of the following technologies would be best suited for creating a multiple linear regression model?
Andy is a pricing analyst for a retailer. Using a hypothesis test, he wants to assess whether people who receive electronic coupons spend more on average.
What should Andy's null hypothesis be?
Which of the following is an example of a strategy to reduce statistical errors?
Jhon is working on an ELT process that sources data from six different source systems.
Looking at the source data, he finds that data about the sample people exists in two of six systems.
What does he have to make sure he checks for in his ELT process?
Choose the best answer.
Given the following table:
Which of the following describes the data quality issues with theagedata?
Which of the ing is the correct ion for a tab-delimited spre file?
Given the diagram below:
Which of the following data schemas shown?
Which of the following types of analysis would be best for an analyst to use to examine the relationships between authors who cited other authors in a library of research papers?
A customer survey reveals 90% positive feedback. Which of the following statistical methods would be best to utilize to determine the reliability of a data set and predict how a larger sample of customers over the same time period might respond?
An analyst must obtain the average daily sales for the following week:
Which of the following must the analyst perform to obtain this value?
Which of the following data governance concepts fits into the security requirements category?
A survey asks participants to rate a company on a scale of one to ten. Which of the following best describes the rating variable?
An analyst is compiling a series of reports for the new executive board to review. Which of the following elements provides a snapshot of what is contained in the reports for the executives who do not have time to focus on the details?
An analyst modified a data set that had a number of issues. Given the original and modified versions:
Which of the following data manipulation techniques did the analyst use?
A business intelligence engineer needs to reduce the size of a data model for reporting purposes. The data set contains more than one million rows, and the table has a date-time column named Date. Which of the following should the analyst do to complete this task?
Which of the following is concatenate typically used to combine?
An analyst develops an IT document and needs to describe the technical terms used in the document. Which of the following is where the analyst should include descriptions of the technical terms?
Which of the following data cleansing issues will be fixed when a DISTINCT function is applied?
Which of the following summary statements upholds integrity in data reporting?
A data analyst for a media company needs to determine the most popular movie genre. Given the table below:
Which of the following must be done to the Genre column before this task can be completed?
Which of the following is most likely to be used as a data-mining ETL tool?
A business unit made the following modification to the values in a table:
Which of the following data quality dimensions was applied in this scenario?
Which of the following best defines SCD?
A data analyst has been asked to organize the table below in the following ways:
By sales from high to low -
By state in alphabetic order -
Which of the following functions will allow the data analyst to organize the table in this manner?
Given the table below:
Which of the following variable types BEST describes the “Year” column?
Which of the following differentiates a flat text file from other data types?
A data analyst needs to create a dashboard using the company's yearly revenue data sets. Which of the following would be the best way to plot the information to show the top-performing region?
Which of the following is a KPI metric for tracking sales performance?
A data analyst reviews the following data set:
Which of the following is the range value?
A sales analyst needs to report how the sales team is performing to target. Which of the following files will be important in determining 2019 performance attainment?
Which of the following types of data manipulation functions should a data analyst use to implement a YES/NO condition in a spreadsheet?
A data analyst has been asked to merge the tables below, first performing an INNER JOIN and then a LEFT JOIN:
Customer Table -
In-store Transactions –
Which of the following describes the number of rows of data that can be expected after performing both joins in the order stated, considering the customer table as the main table?
After completing web scraping, which of the following file formats needs to be parsed?
An analyst wants to extract data from a variety of sources and store the data in a cloud-based environment prior to cleaning. Which of the following integration techniques should the analyst use?
Given the table below:
Which of the following variables can be considered inconsistent, and how many distinct values should the variable have?
The process of performing initial investigations on data to spot outliers, discover patterns, and test assumptions with statistical insight and graphical visualization is called:
Which of the following is an example of a flat file?
While reviewing survey data, an analyst notices respondents entered “Jan,” “January,” and “01” as responses for the month of January. Which of the following steps should be taken to ensure data consistency?
Which of the following database types is the best to use for transactional SQL?
Which of the following data types would a telephone number formatted as XXX-XXX-XXXX be considered?
A database consists of one fact table that is composed of multiple dimensions. Each dimension is represented by a denormalized table. This structure is an example of a:
Which of the following is the best variable formal to store a customer's age using the least possible amount of storage data?
Which of the following would be used to store unstructured data from different sources?
A data analyst is reviewing SQL code and sees a query that uses terms such as MIN, SUM, and COUNT. Which of the following types of functions best describes these terms?
A data analyst needs to create a master file that includes customer information from the tables below:
Given the three tables above, the analyst wants to filter down the information prior to joining it together. In which of the following orders should this data manipulation bo approached for the most efficient result?
A data analyst must separate the column shown below into multiple columns for each component of the name:
Which of the following data manipulation techniques should the analyst perform?
The ACME Corporation hired an analyst to detect data quality issues in their Excel documents. Which of the following are the most common issues? (Select TWO)
An analyst is designing a dashboard to determine which site has the highest percentage of new customers. The analyst must choose an appropriate chart to include in the dashboard. The following data is available:
Which of the following types of charts should be considered to BEST display the data?
Given the following table:
Which of the following methods is the best way to describe the changes in the values in the table?
Joe. an analyst. tests the loading time on a dashboard he is preparing to go live and finds it is slower than he would like. Which of the following must occur to decrease the loading time?
Given the below:
Which of the following numbers represents a Type I error?
Given the following grocery store orders:
If a query is made to the table with the following logic:
Order_Total > 132 OR (Order Total >= 25 AND Order_Total < 74)
Which of the following is the number of orders that will be returned by the query?
An analyst is building a new dashboard for a user. After an initial conversation with the user. the analyst created a mock-up of the dashboard. Which of the following best explains why the analyst created the mock-up?
An analyst is working on a project for a director. During this process. the analyst pulled the data. created summarized tables and graphs with descriptions, created a report summary, and inserted all items into a report. After writing the report, which of the following would be the most appropriate next step?
A sales manager requested a report that contains the first name, last name, and phone number of all the company’s customers and employees. The data engineer needs to return all the records from several tables, even duplicates. Which of the following is the best way to join the two tables?
Amanda needs to create a dashboard that will draw information from many other data sources and present it to business leaders.
Which one of the following tools is least likely to meet her needs?
A development company is constructing a new Init in its apartment complex. The complex has the following floor plans:
Using the average cost per square foot of the original floor plans. which of the following should be the price of the Rose Init?
A data analyst has received a data set that contains actual and projected sales for the fourth quarter of 2019. Which of the following statistical methods should the analyst use to find the measure of dispersion?
A data engineer is creating a database field to capture whether a customer likes vanilla ice cream. Which of the following data types is the best to capture this information?
A gambler thinks that a coin is fair and is equally likely to turn up heads or tails when the coin is flipped. Which of the following tests should the gambler use to fest this hypothesis?
A data analyst is asked to create a sales report for the second-quarter 2020 board meeting, which will include a review of the business’s performance through the second quarter. The board meeting will be held on July 15, 2020, after the numbers are finalized. Which of the following report types should the data analyst create?
A research analyst collects ten data points from 1.000 specimens. The analyst will not need any additional data to complete the analysis and will not need to retrieve information by specifier. Which of the following is the best data structure for the analyst to use?
Which of the following best describes a 95% confidence interval?
A data analyst is helping a retail store categorize its customers into five different groups based on the following information:
• How recently the customers made purchases
• How frequently the customers made purchases
• How much the customers spent
Given the following information:
Which of the following would be most important for the analysis?
A data analyst is creating a report that will provide information about various regions, products, and time periods. Which of the following formats would be themost efficient way to deliver this report?
A data analyst has removed the outliers from a data set due to large variances. Which of the following central tendencies would be the best measure to use?
What analytics suite is offered by Microsoft and directly integrates with SQL Server Databases?
Which of the following types of dashboards should a business intelligence engineer develop in order to provide information about failed data pipelines?
A large data download was divided into two smaller files. Which of the following describes the best way to fix this issue?
A data analyst is developing a dashboard to track and monitor metrics. Which of the following best practices should be taken into during the FIRST pment process?
Which of the following is a common data analytics tool that is also used as an interpreted, high-level, general-purpose programming language?
What SQL command is used to delete an entire table from a database?
Given the following data tables:
Which of the following MDM processes needs to take place FIRST?
A data analyst needs to observe the relationship between two numeric variables and identify the clustering pattern as well as the outliers. Which of the following visualizations should the analyst use?
A web developer wants to ensure that malicious users can't type SQL statements when they asked for input, like their username/userid.
Which of the following query optimization techniques would effectively prevent SQL Injection attacks?
Which of the following activities occurs during the ETL process?
Which of the following best describes how discrete data differs from continuous data?
Given the following data set:
Which of the following is the best reason for cleansing the data?
An analyst is reporting on the average income for a county and is reviewing the following data:
Which of the following is the reason the analyst would need to cleanse the data in this data set?
An analyst computed a new variable of income per day in the household by multiplying the number of days worked by the number of people working in the household and the income earned per day. Which of the following is the correct name for this new variable?
During data profiling, an analyst decides to recode the status column in the following data set:
Which of the following data concerns explains why the analyst wants to take this action?
A database administrator is required to mask certain table columns containing Pll in order to comply with the company privacy policy. Which of the following are the most likely types of information the administrator should mask? (Select two).
A data analyst has been asked to create an ad-hoc sales report for the Chief Executive Officer (CEO).
Which of the following should be included in the report?
A financial institution is reporting on sales performance to a company at the account level. Due to the sensitive nature of the government the does il with, some account information is not shown. Which of the following fields should be masked?
Which of the following data sampling methods involves dividing a population into subgroups by similar characteristics?
A data analyst is designing a dashboard that will provide a story of sales and determine which site is providing the highest sales volume per customer The analyst must choose an appropriate chart to include in the dashboard. The following data is available:
Which of the following types of charts should be considered?
The number of phone calls that the call center receives in a day is an example of:
Which of the following describes the use of a representative amount of data from a main repository?
Which of the following best describes the process of examining data for statistics and information about the data?
Cleansing
A publishing group has requested a dashboard to track submissions before publication. A key requirement is that all changes are tracked, as multiple users will be checking out documents and editing them before submissions are considered final. Which of the following is the BEST way to meet this stakeholder requirement?
Which of the following tools would be best to use to calculate the interquartile range, median, mean, and standard deviation of a column in a table that has 5.000.000 rows?
An organization would like to add a secondary email field to its customer database in order toenrich the customer profiles. Which of the following data manipulation techniques should the analyst use to add this information?
An analyst has been tracking company intranet usage and has been asked to create a chat to show the most-used/most-clicked portions of a homepage that contains more than 30 links. Which of the following visualizations would BEST illustrate this information?
A report is scheduled to run and be distributed at the end of business each day. On Mondays, one of the recipients opens the previous week's reports and combines them to calculate the weekly totals and projections for the coming week. This is a tedious process, and the recipient asks an analyst for help. Which of the following should the analyst recommend?
Exhibit.
Which of the following logical statements results in Table B?
A)
B)
C)
D)
Which of the following is a characteristic of a relational database?
Which of the following contains alphanumeric values?
Which of the following reports can be used when insight into operational performance is needed each Wednesday?
Mario works with a group of R programmers tasked with copying data from an accounting system into a data warehouse.
In what phase are the group's R skills most relevant?
Angela is aggregating data from CRM system with data from an employee system.
While performing an initial quality check, she realizes that her employee ID is not associated with her identifier in the CRM system.
What kind of issues is Angela facing?
Choose the best answer.
Which of the following is a process that is used during data integration to collect, blend, and load data?
A data analyst was asked to create a visual representation of sales for the first quarter of 2020. Which of the following visualizations should be used when a time element is present?
An analyst runs a report on a daily basis, and the number of datapoints must be validated before the data can be analyzed. The number of datapoints increases each day by approximately 20% of the total number from the day before. On a given day, the number of datapoints was 8,798. Which of the following should be the total number of datapoints on the next day?
An analyst is updating a customer contacts database with information obtained from a survey of new customers. Which of the following data manipulation techniques should the analyst use?
Which of the following data manipulation techniques is an example of a logical function?
Which of the following best describes the law of large numbers?
An analyst is working with the income data of suburban families in the United States. The data set has a lot of outliers, and the analyst needs to provide a measure that represents the typical income. Which of the following would BEST fulfill the analyst’s goal?
What would be an example of an acceptable form of primary identification for the Data+ exam?
An analyst conducted a preliminary analysis for a data set and identified several patterns and anomalies. Which of the following analysis techniques did the analyst use?
Which of the following best describes a business analytics tool with interactive visualization and business capabilities and an interface that is simple enough for end users to create their own reports and dashboards?
Python
A salesperson who is prospecting potential clients collected the following data:
Which of the following is an issue with this data?
Which one of the following values will appear first if they are sorted in descending order?
A data analyst needs to present the results of an online marketing campaign to the marketing manager. The manager wants to see the most important KPIs and measure the return on marketing investment. Which of the following should the data analyst use to BEST communicate this information to the manager?
An analyst collected data that includes primary account numbers, expiration dates, and service codes. Which of the following data governance classifications is used to describe this data?
A reporting analyst needs to create a report that refreshes automatically and is accessible to the entire sales organization. Which of the following tools is the most appropriate to use for this task?