Sure Pass Exam DSA-C02 PDF

SnowPro Advanced: Data Scientist Certification Exam Questions and Answers

Question 5

Mark the incorrect statements regarding the MIN / MAX functions.

Options:

NULL values are skipped unless all the records are NULL

NULL values are ignored unless all the records are NULL, in which case a NULL value is returned

The data type of the returned value is the same as the data type of the input values

For compatibility with other systems, the DISTINCT keyword can be specified as an argument for MIN or MAX, but it does not have any effect
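
The NULL handling described in these options can be tried locally. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption: SQLite is not Snowflake, and the table name readings and its columns grp and val are hypothetical. It only illustrates how aggregate MIN / MAX treat NULLs and the DISTINCT keyword in one SQL engine; Snowflake's behavior should be confirmed against its own documentation.

```python
import sqlite3

# SQLite is a local stand-in here; no Snowflake connection is made.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (grp TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("a", 3), ("a", None), ("a", 7),   # group with some NULLs
        ("b", None), ("b", None),          # group where every value is NULL
    ],
)

# MIN / MAX per group, plus MAX(DISTINCT ...) for comparison with plain MAX.
rows = conn.execute(
    "SELECT grp, MIN(val), MAX(val), MAX(DISTINCT val) "
    "FROM readings GROUP BY grp ORDER BY grp"
).fetchall()
conn.close()

for row in rows:
    print(row)
```

In SQLite, group a ignores its NULL and reports 3 and 7, MAX(DISTINCT val) matches MAX(val), and group b, where every value is NULL, returns NULL (None in Python).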

Question 6

Which of the following are key actions in the data collection phase of machine learning?

Options:

Label

Ingest and Aggregate

Probability

Measure

Answer:

A, B

Explanation:

The key actions in the data collection phase include:

Label: Labeled data is raw data that has been processed by adding one or more meaningful tags so that a model can learn from it. If such tags are missing, the data has to be labeled manually or automatically, which takes some work.

Ingest and Aggregate: Incorporating and combining data from many data sources is part of data collection in AI.

Data collection

Collecting data for training the ML model is the basic step in the machine learning pipeline. The predictions made by ML systems can only be as good as the data on which they have been trained. The following are some of the problems that can arise in data collection:

Inaccurate data. The collected data could be unrelated to the problem statement.

Missing data. Parts of the data could be missing. That could take the form of empty values in columns or missing images for some class of prediction.

Data imbalance. Some classes or categories in the data may have a disproportionately high or low number of corresponding samples. As a result, they risk being under-represented in the model.

Data bias. Depending on how the data, subjects and labels themselves are chosen, the model could propagate inherent biases on gender, politics, age or region, for example. Data bias is difficult to detect and remove.
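
Two of the problems above, missing data and data imbalance, can be checked with a quick count before any training. The following is a minimal sketch using only the Python standard library; the function name summarize, the field names, and the sample records are hypothetical and serve only as illustration.

```python
from collections import Counter


def summarize(records, label_key):
    """Return (missing-value count per field, share of each class label)."""
    missing = Counter()
    labels = Counter()
    for rec in records:
        for field, value in rec.items():
            # Treat None and empty strings as missing values.
            if value is None or value == "":
                missing[field] += 1
        label = rec.get(label_key)
        if label is not None:
            labels[label] += 1
    total = sum(labels.values())
    shares = {lab: n / total for lab, n in labels.items()} if total else {}
    return dict(missing), shares


# Hypothetical sample records, invented for this example.
records = [
    {"age": 34, "region": "north", "label": "churn"},
    {"age": None, "region": "south", "label": "stay"},
    {"age": 51, "region": "", "label": "stay"},
    {"age": 29, "region": "north", "label": "stay"},
]

missing, shares = summarize(records, "label")
print("missing per field:", missing)
print("class shares:", shares)
```

Here one age and one region value are missing, and the stay class makes up three quarters of the labels, which hints at an imbalance worth addressing. Data bias is harder to quantify and is not covered by this check.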

Several techniques can be applied to address those problems:

Pre-cleaned, freely available datasets. If the problem statement (for example, image classification, object recognition) aligns with a clean, pre-existing, properly formulated dataset, then take advantage of existing, open-source expertise.

Web crawling and scraping. Automated tools, bots and headless browsers can crawl and scrape websites for data.

Private data. ML engineers can create their own data. This is helpful when the amount of data required to train the model is small and the problem statement is too specific to generalize over an open-source dataset.

Custom data. Agencies can create or crowdsource the data for a fee.

Question 7

Which of the following is not a type of window function in Snowflake?

Options:

Rank-related functions.

Window frame functions.

Aggregation window functions.

Association functions.

Question 8

Which of the following cross-validation methods is suitable for quicker cross-validation on very large datasets with hundreds of thousands of samples?

Options:

k-fold cross-validation

Leave-one-out cross-validation

Holdout method

All of the above
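
To compare the cost of the schemes listed in the options, the sketch below builds the index splits for each one in plain Python. The function names holdout_split and k_fold_splits, the 20% test fraction, and the sample counts are assumptions made for illustration. The point is the number of train/test rounds: a holdout split trains once, k-fold trains k times, and leave-one-out (k-fold with k equal to the number of samples) trains once per sample.

```python
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices once and cut them into a single train/test pair."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_splits(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


n = 1000  # hypothetical dataset size
train, test = holdout_split(n)
five_fold = list(k_fold_splits(n, 5))
leave_one_out = list(k_fold_splits(20, 20))  # small n keeps the demo fast

print("holdout rounds: 1, train/test sizes:", len(train), len(test))
print("5-fold rounds:", len(five_fold))
print("leave-one-out rounds for 20 samples:", len(leave_one_out))
```

With hundreds of thousands of samples, leave-one-out would need as many training rounds as there are samples, while the holdout split still needs only one.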
