2020 New Braindump2go MLS-C01 PDF and MLS-C01 VCE Dumps Free MLS-C01 Braindumps!
New Question A Machine Learning Specialist is building a convolutional neural network (CNN) that will classify 10 types of animals. The Specialist has built a series of layers in a neural network that will take an input image of an animal, pass it through a series of convolutional and pooling layers, and then finally pass it through a dense and fully connected layer with 10 nodes. The Specialist would like to get an output from the neural network that is a probability distribution of how likely it is that the input image belongs to each of the 10 classes. Which function will produce the desired output? A. Dropout B. Smooth L1 loss C. Softmax D. Rectified linear units (ReLU) Answer: D New Question A Machine Learning Specialist trained a regression model, but the first iteration needs optimizing. The Specialist needs to understand whether the model is more frequently overestimating or underestimating the target. What option can the Specialist use to determine whether it is overestimating or underestimating the target value? A. Root Mean Square Error (RMSE) B. Residual plots C. Area under the curve D. Confusion matrix Answer: C New Question A company wants to classify user behavior as either fraudulent or normal. Based on internal research, a Machine Learning Specialist would like to build a binary classifier based on two features: age of account and transaction month. The class distribution for these features is illustrated in the figure provided. Based on this information, which model would have the HIGHEST recall with respect to the fraudulent class? A. Decision tree B. Linear support vector machine (SVM) C. Naive Bayesian classifier D. Single Perceptron with sigmoidal activation function Answer: C New Question A Machine Learning Specialist kicks off a hyperparameter tuning job for a tree-based ensemble model using Amazon SageMaker with Area Under the ROC Curve (AUC) as the objective metric. This workflow will eventually be deployed in a pipeline that retrains and tunes hyperparameters each night to model click-through on data that goes stale every 24 hours. With the goal of decreasing the amount of time it takes to train these models, and ultimately to decrease costs, the Specialist wants to reconfigure the input hyperparameter range(s). Which visualization will accomplish this? A. A histogram showing whether the most important input feature is Gaussian. B. A scatter plot with points colored by target variable that uses t-Distributed Stochastic Neighbor Embedding (t-SNE) to visualize the large number of input variables in an easier-to-read dimension. C. A scatter plot showing the performance of the objective metric over each training iteration. D. A scatter plot showing the correlation between maximum tree depth and the objective metric. Answer: B New Question A Machine Learning Specialist is creating a new natural language processing application that processes a dataset comprised of 1 million sentences. The aim is to then run Word2Vec to generate embeddings of the sentences and enable different types of predictions. Here is an example from the dataset: "The quck BROWN FOX jumps over the lazy dog." Which of the following are the operations the Specialist needs to perform to correctly sanitize and prepare the data in a repeatable manner? (Choose three.) A. Perform part-of-speech tagging and keep the action verb and the nouns only. B. Normalize all words by making the sentence lowercase. C. Remove stop words using an English stopword dictionary. D. Correct the typography on "quck" to "quick." E. One-hot encode all words in the sentence. F. Tokenize the sentence into words. Answer: ABD New Question A Data Scientist is evaluating different binary classification models. A false positive result is 5 times more expensive (from a business perspective) than a false negative result. The models should be evaluated based on the following criteria: 1) Must have a recall rate of at least 80% 2) Must have a false positive rate of 10% or less 3) Must minimize business costs After creating each binary classification model, the Data Scientist generates the corresponding confusion matrix. Which confusion matrix represents the model that satisfies the requirements? A. TN = 91, FP = 9 FN = 22, TP = 78 B. TN = 99, FP = 1 FN = 21, TP = 79 C. TN = 96, FP = 4 FN = 10, TP = 90 D. TN = 98, FP = 2 FN = 18, TP = 82 Answer: D Explanation: The following calculations are required: TP = True Positive FP = False Positive FN = False Negative TN = True Negative FN = False Negative Recall = TP / (TP + FN) False Positive Rate (FPR) = FP / (FP + TN) Cost = 5 * FP + FN Options C and D have a recall greater than 80% and an FPR less than 10%, but D is the most cost effective. New Question A Data Scientist uses logistic regression to build a fraud detection model. While the model accuracy is 99%, 90% of the fraud cases are not detected by the model. What action will definitively help the model detect more than 10% of fraud cases? A. Using undersampling to balance the dataset B. Decreasing the class probability threshold C. Using regularization to reduce overfitting D. Using oversampling to balance the dataset Answer: B Explanation: Decreasing the class probability threshold makes the model more sensitive and, therefore, marks more cases as the positive class, which is fraud in this case. This will increase the likelihood of fraud detection. However, it comes at the price of lowering precision. New Question Machine Learning Specialist is building a model to predict future employment rates based on a wide range of economic factors. While exploring the data, the Specialist notices that the magnitude of the input features vary greatly. The Specialist does not want variables with a larger magnitude to dominate the model. What should the Specialist do to prepare the data for model training? A. Apply quantile binning to group the data into categorical bins to keep any relationships in the data by replacing the magnitude with distribution. B. Apply the Cartesian product transformation to create new combinations of fields that are independent of the magnitude. C. Apply normalization to ensure each field will have a mean of 0 and a variance of 1 to remove any significant magnitude. D. Apply the orthogonal sparse bigram (OSB) transformation to apply a fixed-size sliding window to generate new features of a similar magnitude. Answer: C New Question A Machine Learning Specialist must build out a process to query a dataset on Amazon S3 using Amazon Athena. The dataset contains more than 800,000 records stored as plaintext CSV files. Each record contains 200 columns and is approximately 1.5 MB in size. Most queries will span 5 to 10 columns only. How should the Machine Learning Specialist transform the dataset to minimize query runtime? A. Convert the records to Apache Parquet format. B. Convert the records to JSON format. C. Convert the records to GZIP CSV format. D. Convert the records to XML format. Answer: A New Question A Data Engineer needs to build a model using a dataset containing customer credit card information How can the Data Engineer ensure the data remains encrypted and the credit card information is secure? A. Use a custom encryption algorithm to encrypt the data and store the data on an Amazon SageMaker instance in a VPC. Use the SageMaker DeepAR algorithm to randomize the credit card numbers. B. Use an IAM policy to encrypt the data on the Amazon S3 bucket and Amazon Kinesis to automatically discard credit card numbers and insert fake credit card numbers. C. Use an Amazon SageMaker launch configuration to encrypt the data once it is copied to the SageMaker instance in a VPC. Use the SageMaker principal component analysis (PCA) algorithm to reduce the length of the credit card numbers. D. Use AWS KMS to encrypt the data on Amazon S3 and Amazon SageMaker, and redact the credit card numbers from the customer data with AWS Glue. Answer: C New Question A Machine Learning Specialist is using an Amazon SageMaker notebook instance in a private subnet of a corporate VPC. The ML Specialist has important data stored on the Amazon SageMaker notebook instance's Amazon EBS volume, and needs to take a snapshot of that EBS volume. However, the ML Specialist cannot find the Amazon SageMaker notebook instance's EBS volume or Amazon EC2 instance within the VPC. Why is the ML Specialist not seeing the instance visible in the VPC? A. Amazon SageMaker notebook instances are based on the EC2 instances within the customer account, but they run outside of VPCs. B. Amazon SageMaker notebook instances are based on the Amazon ECS service within customer accounts. C. Amazon SageMaker notebook instances are based on EC2 instances running within AWS service accounts. D. Amazon SageMaker notebook instances are based on AWS ECS instances running within AWS service accounts. Answer: C New Question A Machine Learning Specialist is building a model that will perform time series forecasting using Amazon SageMaker. The Specialist has finished training the model and is now planning to perform load testing on the endpoint so they can configure Auto Scaling for the model variant. Which approach will allow the Specialist to review the latency, memory utilization, and CPU utilization during the load test? A. Review SageMaker logs that have been written to Amazon S3 by leveraging Amazon Athena and Amazon QuickSight to visualize logs as they are being produced. B. Generate an Amazon CloudWatch dashboard to create a single view for the latency, memory utilization, and CPU utilization metrics that are outputted by Amazon SageMaker. C. Build custom Amazon CloudWatch Logs and then leverage Amazon ES and Kibana to query and visualize the log data as it is generated by Amazon SageMaker. D. Send Amazon CloudWatch Logs that were generated by Amazon SageMaker to Amazon ES and use Kibana to query and visualize the log data Answer: B New Question A manufacturing company has structured and unstructured data stored in an Amazon S3 bucket. A Machine Learning Specialist wants to use SQL to run queries on this data. Which solution requires the LEAST effort to be able to query this data? A. Use AWS Data Pipeline to transform the data and Amazon RDS to run queries. B. Use AWS Glue to catalogue the data and Amazon Athena to run queries. C. Use AWS Batch to run ETL on the data and Amazon Aurora to run the queries. D. Use AWS Lambda to transform the data and Amazon Kinesis Data Analytics to run queries. Answer: B New Question A Machine Learning Specialist is developing a custom video recommendation model for an application. The dataset used to train this model is very large with millions of data points and is hosted in an Amazon S3 bucket. The Specialist wants to avoid loading all of this data onto an Amazon SageMaker notebook instance because it would take hours to move and will exceed the attached 5 GB Amazon EBS volume on the notebook instance. Which approach allows the Specialist to use all the data to train the model? A. Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode. B. Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to the instance. Train on a small amount of the data to verify the training code and hyperparameters. Go back to Amazon SageMaker and train using the full dataset C. Use AWS Glue to train a model using a small subset of the data to confirm that the data will be compatible with Amazon SageMaker. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode. D. Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to train the full dataset. Answer: A New Question A company is setting up a system to manage all of the datasets it stores in Amazon S3. The company would like to automate running transformation jobs on the data and maintaining a catalog of the metadata concerning the datasets. The solution should require the least amount of setup and maintenance. Which solution will allow the company to achieve its goals? A. Create an Amazon EMR cluster with Apache Hive installed. Then, create a Hive metastore and a script to run transformation jobs on a schedule. B. Create an AWS Glue crawler to populate the AWS Glue Data Catalog. Then, author an AWS Glue ETL job, and set up a schedule for data transformation jobs. C. Create an Amazon EMR cluster with Apache Spark installed. Then, create an Apache Hive metastore and a script to run transformation jobs on a schedule. D. Create an AWS Data Pipeline that transforms the data. Then, create an Apache Hive metastore and a script to run transformation jobs on a schedule. Answer: B Explanation: AWS Glue is the correct answer because this option requires the least amount of setup and maintenance since it is serverless, and it does not require management of the infrastructure. A, C, and D are all solutions that can solve the problem, but require more steps for configuration, and require higher operational overhead to run and maintain. New Question A Data Scientist is working on optimizing a model during the training process by varying multiple parameters. The Data Scientist observes that, during multiple runs with identical parameters, the loss function converges to different, yet stable, values. What should the Data Scientist do to improve the training process? A. Increase the learning rate. Keep the batch size the same. B. Reduce the batch size. Decrease the learning rate. C. Keep the batch size the same. Decrease the learning rate. D. Do not change the learning rate. Increase the batch size. Answer: B Explanation: It is most likely that the loss function is very curvy and has multiple local minima where the training is getting stuck. Decreasing the batch size would help the Data Scientist stochastically get out of the local minima saddles. Decreasing the learning rate would prevent overshooting the global loss function minimum. New Question A Machine Learning Specialist is configuring Amazon SageMaker so multiple Data Scientists can access notebooks, train models, and deploy endpoints. To ensure the best operational performance, the Specialist needs to be able to track how often the Scientists are deploying models, GPU and CPU utilization on the deployed SageMaker endpoints, and all errors that are generated when an endpoint is invoked. Which services are integrated with Amazon SageMaker to track this information? (Choose two.) A. AWS CloudTrail B. AWS Health C. AWS Trusted Advisor D. Amazon CloudWatch E. AWS Config Answer: AD New Question A retail chain has been ingesting purchasing records from its network of 20,000 stores to Amazon S3 using Amazon Kinesis Data Firehose. To support training an improved machine learning model, training records will require new but simple transformations, and some attributes will be combined. The model needs to be retrained daily. Given the large number of stores and the legacy data ingestion, which change will require the LEAST amount of development effort? A. Require that the stores to switch to capturing their data locally on AWS Storage Gateway for loading into Amazon S3, then use AWS Glue to do the transformation. B. Deploy an Amazon EMR cluster running Apache Spark with the transformation logic, and have the cluster run each day on the accumulating records in Amazon S3, outputting new/transformed records to Amazon S3. C. Spin up a fleet of Amazon EC2 instances with the transformation logic, have them transform the data records accumulating on Amazon S3, and output the transformed records to Amazon S3. D. Insert an Amazon Kinesis Data Analytics stream downstream of the Kinesis Data Firehose stream that transforms raw record attributes into simple transformed values using SQL. Answer: D Resources From: 1.2020 Latest Braindump2go MLS-C01 Exam Dumps (PDF & VCE) Free Share: https://www.braindump2go.com/mls-c01.html 2.2020 Latest Braindump2go MLS-C01 PDF and MLS-C01 VCE Dumps Free Share: https://drive.google.com/drive/folders/1eX--L9LzE21hzqPIkigeo1QoAGNWL4vd?usp=sharing 3.2020 Latest MLS-C01 Exam Questions from: https://od.lk/fl/NDZfMTI1MDEyN18 Free Resources from Braindump2go,We Devoted to Helping You 100% Pass All Exams!