Which of the following approaches is best if a limited portion of your training data is labeled?
Which two of the following criteria are essential for machine learning models to achieve before deployment? (Select two.)
A dataset can contain a range of values that depict a certain characteristic, such as grades on tests in a class during the semester. A specific student has so far received the following grades: 76, 81, 78, 87, 75, and 72. There is one final test in the semester. What minimum grade would the student need to achieve on the last test to get an average of 80%?
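The arithmetic behind the grade question can be checked with a short sketch. It assumes all seven tests are weighted equally, which the question implies but does not state; the variable names are illustrative only.

```python
# Grades received so far, taken from the question.
grades = [76, 81, 78, 87, 75, 72]
target_average = 80

# Assumption: every test, including the final one, carries equal weight.
total_tests = len(grades) + 1

# The total points needed for the target average, minus points already earned.
required = target_average * total_tests - sum(grades)
print(required)
```

The six grades sum to 469 and seven tests at an 80 average require 560 points, so the gap falls entirely on the last test.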
Which of the following pieces of AI technology provides the ability to create fake videos?
In which of the following scenarios is lasso regression preferable over ridge regression?
Which of the following tools would you use to create a natural language processing application?
In a self-driving car company, ML engineers want to develop a model for dynamic pathing. Which of the following approaches would be optimal for this task?
Which two techniques are used to build personas in the ML development lifecycle? (Select two.)
A company is developing a merchandise sales application. The product team uses training data to teach the AI model to predict sales, and discovers emergent bias. What caused the biased results?
Which of the following are true about the transform-design pattern for a machine learning pipeline? (Select three.)
It aims to separate inputs from features.
Which of the following is the primary purpose of hyperparameter optimization?
The following confusion matrix is produced when a classifier is used to predict labels on a test dataset. How precise is the classifier?
In general, models that perform their tasks:
The graph is an elbow plot showing the inertia (within-cluster sum of squares) on the y-axis and the number of clusters (K) on the x-axis, denoting how inertia changes as the number of clusters changes under the k-means algorithm.
What would be an optimal value of K to ensure a good number of clusters?
When should you use semi-supervised learning? (Select two.)
An AI system recommends New Year's resolutions. It has an ML pipeline without monitoring components. What retraining strategy would be BEST for this pipeline?
Which of the following items should be included in a handover to the end user to enable them to use and run a trained model on their own system? (Select three.)
Which two of the following decrease technical debt in ML systems? (Select two.)
Given a feature set with rows that contain missing continuous values, and assuming the data is normally distributed, what is the best way to fill in these missing features?
Your dependent variable Y is a count, ranging from 0 to infinity. Because Y is approximately log-normally distributed, you decide to log-transform the data prior to performing a linear regression.
What should you do before log-transforming Y?
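One point worth checking in the log-transform scenario is that a count variable can contain zeros, where the logarithm is undefined. The sketch below uses made-up counts and shows one common workaround, shifting by 1 before taking the log; this is an illustration under that assumption and not necessarily the answer the question intends.

```python
import math

# Hypothetical count data for illustration; note the zero.
counts = [0, 1, 4, 10]

# math.log(0) raises ValueError, so check for zeros first.
has_zero = any(c == 0 for c in counts)

# Assumed workaround: add 1 to every value so the log is always defined.
shifted_logs = [math.log(c + 1) for c in counts]

print(has_zero)
print([round(v, 3) for v in shifted_logs])
```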
Which of the following is a type 1 error in statistical hypothesis testing?
An HR solutions firm is developing software for staffing agencies that uses machine learning.
The team uses training data to teach the algorithm and discovers that it generates lower employability scores for women. It also predicts that women, especially those with children, are less likely to get a high-paying job.
Which type of bias has been discovered?
Your dependent variable data is a proportion. The observed range of your data is 0.01 to 0.99. The instrument used to generate the dependent variable data is known to generate low-quality data for values close to 0 and close to 1. A colleague suggests performing a logit-transformation on the data prior to performing a linear regression. Which of the following is a concern with this approach?
Definition of logit-transformation
If p is the proportion: logit(p) = log(p / (1 - p))
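The behavior of the logit near the ends of the unit interval can be seen in a minimal sketch. The sample proportions are chosen for illustration, and the helper name `logit` is simply the name used in the definition above.

```python
import math

def logit(p):
    """Log-odds of a proportion p, defined only for 0 < p < 1."""
    return math.log(p / (1 - p))

# Illustrative proportions spanning the observed range in the question.
for p in (0.01, 0.05, 0.5, 0.95, 0.99):
    print(p, round(logit(p), 3))
```

Equal-sized steps in p map to much larger steps in logit(p) near 0 and 1 than near 0.5, so measurement noise in those regions is stretched by the transformation.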
Which type of regression represents the following formula: y = c + b*x, where y = estimated dependent variable score, c = constant, b = regression coefficient, and x = score on the independent variable?
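The formula y = c + b*x can be fitted by ordinary least squares with one independent variable. The sketch below uses made-up data points and the textbook closed-form estimates; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of c (constant) and b (coefficient) in y = c + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - b * mean_x
    return c, b

# Hypothetical data lying exactly on y = 1 + 2*x.
c, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(c, b)
```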
A big data architect needs to be cautious about personally identifiable information (PII) that may be captured with their new IoT system. What is the final stage of the Data Management Life Cycle, which the architect must complete in order to implement data privacy and security appropriately?
You are implementing a support-vector machine on your data, and a colleague suggests you use a polynomial kernel. In what situation might this help improve the prediction of your model?
A healthcare company experiences a cyberattack, where the hackers were able to reverse-engineer a dataset to break confidentiality.
Which of the following is TRUE regarding the dataset parameters?