CertNexus Certified Artificial Intelligence Practitioner (CAIP) Questions and Answers
Question 25
Which of the following is the correct definition of the quality criterion that describes completeness?
Options:
A.
The degree to which all required measures are known.
B.
The degree to which a set of measures is equivalent across systems.
C.
The degree to which a set of measures is specified using the same units of measure in all systems.
D.
The degree to which the measures conform to defined business rules or constraints.
Answer:
A
Explanation:
Completeness is a quality criterion that describes the degree to which all required measures are known. It indicates how well the available data covers what a given purpose or analysis needs. It can be measured by comparing the actual number of measures with the expected number, or by counting the missing, null, or unknown values in the data.
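The measurement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name completeness, the sample records, and the choice of None, an empty string, and "unknown" as missing markers are all hypothetical and not part of the CAIP material.

```python
def completeness(records, required_fields, missing_markers=(None, "", "unknown")):
    """Return the fraction of required values that are actually known."""
    expected = len(records) * len(required_fields)
    if expected == 0:
        # Nothing is required, so nothing can be missing.
        return 1.0
    known = sum(
        1
        for record in records
        for field in required_fields
        # A field that is absent from the record counts as missing (get -> None).
        if record.get(field) not in missing_markers
    )
    return known / expected


# Hypothetical sample data: 4 records x 3 required fields = 12 expected values.
records = [
    {"id": 1, "age": 34, "city": "Oslo"},
    {"id": 2, "age": None, "city": "Lima"},
    {"id": 3, "age": 51, "city": ""},
    {"id": 4, "city": "unknown"},
]

score = completeness(records, ["id", "age", "city"])
print(f"completeness: {score:.2%}")
```

Here 8 of the 12 expected values are known, so the score is 8/12. Which values count as "missing" is a business decision, so the marker list is a parameter rather than a fixed rule.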
Question 26
Which of the following algorithms is an example of unsupervised learning?
Options:
A.
Neural networks
B.
Principal components analysis
C.
Random forest
D.
Ridge regression
Answer:
B
Explanation:
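Unsupervised learning is a type of machine learning that finds patterns or structures in unlabeled data without any predefined outcome or feedback. It is used for tasks such as clustering, dimensionality reduction, anomaly detection, and association rule mining. Principal components analysis fits this definition because it uses only the input features and no target labels, whereas random forest and ridge regression are supervised methods that require labeled targets, and neural networks are most often trained on labeled data. Common unsupervised learning algorithms include: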
Principal components analysis: Principal components analysis (PCA) is a method that reduces the dimensionality of data by transforming it into a new set of orthogonal variables (principal components) that capture the maximum amount of variance in the data. PCA can help simplify and visualize high-dimensional data, as well as remove noise or redundancy from the data.
K-means clustering: K-means clustering is a method that partitions data into k groups (clusters) based on their similarity or distance. K-means clustering can help discover natural or hidden groups in the data, as well as identify outliers or anomalies in the data.
Apriori algorithm: The Apriori algorithm is a method that finds frequent itemsets (sets of items that occur together frequently) and association rules (rules that describe how items are related or correlated) in transactional data. It can help discover patterns in the data, such as customer behavior or preferences, that can then inform recommendations.
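To make the "no labels" point concrete, here is a minimal sketch of the first step of PCA in plain Python. It is an illustration, not a reference implementation: it estimates only the first principal component, it uses power iteration in place of the full eigendecomposition a library would perform, and the function name first_principal_component and the sample points are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Estimate the first principal component and its explained-variance ratio."""
    n, d = len(rows), len(rows[0])
    # Center each feature on its mean.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalize; the vector converges to the direction of maximum variance.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance captured along v, as a share of the total variance.
    captured = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total = sum(cov[i][i] for i in range(d))
    return v, captured / total


# Hypothetical unlabeled data: two features that lie close to the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

direction, explained = first_principal_component(points)
print("direction:", direction)
print("explained variance ratio:", explained)
```

The function receives only feature values, never a target column, which is what makes the method unsupervised. On this sample, nearly all of the variance falls along a single direction, so the two features could be replaced by one component with little loss.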
Question 27
Why do data skews happen in the ML pipeline?
Options:
A.
Test and evaluation data are designed incorrectly.
B.
There is a mismatch between live input data and offline data.
C.
There is a mismatch between live output data and offline data.
D.
There is insufficient training data for evaluation.
Answer:
B
Explanation:
Data skew occurs in the ML pipeline when the distribution or characteristics of the live input data differ from those of the offline data used to train and test the model, a situation often called training-serving skew. Because the model was fitted to the offline distribution, it generalizes poorly to live data that no longer resembles it, and its accuracy degrades. Common causes include changes in user behavior, changes in data collection methods, data quality issues, and external events.
References: What is training-serving skew in Machine Learning?; Data preprocessing for ML: options and recommendations
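The mismatch between offline and live input data described for Question 27 can be monitored with a simple per-feature check. The sketch below is one possible heuristic, not a standard prescribed by the exam material: it measures how far the live mean has moved from the offline mean, in units of the offline standard deviation. The function name mean_shift, the sample values, and the threshold of 2.0 are all hypothetical.

```python
import statistics


def mean_shift(offline_values, live_values):
    """Distance between live and offline means, in offline standard deviations."""
    offline_mean = statistics.mean(offline_values)
    offline_std = statistics.stdev(offline_values)
    live_mean = statistics.mean(live_values)
    if offline_std == 0:
        # A constant offline feature: any change at all is an unbounded shift.
        return 0.0 if live_mean == offline_mean else float("inf")
    return abs(live_mean - offline_mean) / offline_std


# Hypothetical feature values: ages seen during training vs. ages seen in production.
offline_age = [31, 35, 29, 40, 33, 37, 30, 36]
live_age = [52, 58, 49, 61, 55, 57, 50, 60]

SKEW_THRESHOLD = 2.0  # assumed cutoff; tune per feature in practice

shift = mean_shift(offline_age, live_age)
skewed = shift > SKEW_THRESHOLD
print(f"shift = {shift:.2f} offline standard deviations, skewed = {skewed}")
```

A mean comparison only catches shifts in location. Changes in spread or shape would need a fuller distribution comparison, such as a histogram-based or statistical test.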