Homework 6, 7

1-• Read and store the attached iris data (iris.csv) using a process

• Start a new process and do the following

• Retrieve the stored data using a Retrieve operator

• Build a decision tree model that predicts the species of iris flower (the species column), using the rest of the columns in the dataset as predictors.

• Since you want to predict species, use the Set Role operator to set the species column as label

• Now split the data into 75% training and 25% test using the Split Data operator

• Now use the Decision Tree operator to build the model on the training dataset

• Now use the Apply Model operator to create the predictions. Make sure you feed it the test part of the dataset from the Split Data operator.

• Now use the Performance operator to compute the confusion matrix and accuracy of the tree model

• Output the tree diagram (the model) to the results

• Turn in the RapidMiner file (.rmp)

• NOTE: Watching this video will help with this homework
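
To make the split and evaluation steps above concrete, here is a minimal pure-Python sketch of a 75/25 split, a confusion matrix, and accuracy. It is not what RapidMiner runs internally; the function names (`split_rows`, `confusion_matrix`, `accuracy`) are hypothetical, and the labels are made up for illustration rather than taken from a real decision tree.

```python
import random
from collections import Counter


def split_rows(rows, train_fraction=0.75, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Stand-in for the 150 rows of iris.csv: just row indices.
train, test = split_rows(range(150))
print(len(train), len(test))

# Made-up labels for illustration only, not real model output.
actual = ["setosa", "setosa", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "versicolor", "virginica"]

matrix = confusion_matrix(actual, predicted)
for (a, p), count in sorted(matrix.items()):
    print(f"actual={a:<11} predicted={p:<11} count={count}")
print("accuracy:", accuracy(actual, predicted))
```

Off-diagonal pairs in the matrix (actual and predicted labels differ) are the misclassifications; accuracy is the share of diagonal entries.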

2-• We will use the same iris dataset from last week. Read and store the attached iris data (iris.csv) using a process if you have not already done so in the last HW.

• Start a new process and do the following

• Retrieve the stored data using a Retrieve operator

• Find three clusters among the iris flowers using the sepal_length, sepal_width, petal_length and petal_width columns in the dataset.

• Since you want only these four columns, use the Select Attributes operator to choose them. Note: the iris dataset has one extra column (species), which is a text column. If you include the text column the clustering will not work, because k-means clustering only works on numerical data.

• Now normalize the data using the Normalize operator with the default parameters. Clustering works better on normalized data, because otherwise columns with larger value ranges dominate the distance calculation.

• Now use the k-Means clustering operator to find the 3 clusters

• Now use the Cluster Distance Performance operator to compute the average within-cluster distance

• Check the output to see which points belong to which cluster and the performance metrics

• Turn in the RapidMiner file (.rmp)

• NOTE: Watching this video will help with this homework
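
The normalize, cluster, and evaluate steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not RapidMiner's implementation: it assumes z-score normalization with the population standard deviation, Euclidean distance, and an "average within-cluster distance" defined as the mean distance from each point to its centroid (RapidMiner's exact definition may differ). The helper names and the six sample rows are made up for illustration.

```python
import math
import random


def z_normalize(rows):
    """Scale each column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def nearest(row, centroids):
    """Index of the centroid closest to row (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(row, centroids[j]))


def k_means(rows, k, iterations=20, seed=0):
    """Basic k-means: returns (cluster label per row, centroids)."""
    centroids = random.Random(seed).sample(rows, k)
    for _ in range(iterations):
        labels = [nearest(r, centroids) for r in rows]
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    labels = [nearest(r, centroids) for r in rows]
    return labels, centroids


def avg_within_cluster_distance(rows, labels, centroids):
    """Mean distance from each row to the centroid of its cluster."""
    total = sum(math.dist(r, centroids[lab]) for r, lab in zip(rows, labels))
    return total / len(rows)


# Made-up rows shaped like (sepal_length, sepal_width, petal_length, petal_width).
points = [
    [5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2],
    [6.4, 3.2, 4.5, 1.5], [6.9, 3.1, 4.9, 1.5],
    [6.3, 3.3, 6.0, 2.5], [5.8, 2.7, 5.1, 1.9],
]

scaled = z_normalize(points)
labels, centroids = k_means(scaled, k=3)
avg = avg_within_cluster_distance(scaled, labels, centroids)
print("cluster labels:", labels)
print("average within-cluster distance:", avg)
```

A smaller average distance means tighter clusters. Because the starting centroids are chosen at random, a different `seed` can produce a different assignment.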

