ARE YOU LOOKING FOR RELIABLE KIT108 ARTIFICIAL INTELLIGENCE ASSIGNMENT HELP SERVICES? EXPERTSMINDS.COM IS THE RIGHT CHOICE AS YOUR STUDY PARTNER!
KIT108 Artificial Intelligence Assignment - University of Tasmania, Australia
Unit Learning Outcomes -
1. Understand the local and global impact of AI on individuals, organizations, and society
2. Adapt and apply techniques for acquiring, representing, and reasoning with data, information, and knowledge
3. Select and effectively apply techniques to develop simple AI solutions
4. Analyze a problem, apply knowledge of AI principles, and use ICT technical skills to develop potential solutions
5. Evaluate strengths and weaknesses of potential AI solutions
ESTIMATING THE AGE OF ABALONE BASED ON PHYSICAL MEASUREMENTS
Introduction
Abalone is the common name for sea snails of the family Haliotidae, which range from small to very large; some species are edible and others are not. Typically the age of an abalone is determined through a laborious laboratory procedure, which involves cutting through the shell, staining it, and counting the number of rings under a microscope. Several scientific methods have been developed to predict the age from physical measurements instead. The main objective of this analysis is to develop machine-learning models that predict the number of rings of an abalone based on its physical measurements.
Task 1: Data Collection - Identify irrelevant information from the data and remove it to clean the data.
Answer - Data collection
Data used in this analysis was obtained from a publicly available dataset published by Warwick J. Nash et al. (1994). The following variables are included in the dataset: sex, length, diameter, height, the weight attributes, and rings (see Appendix Table 1). The original dataset can be found on the UCI Machine Learning Repository via the link in the references.
Task 2: Data Pre-processing - There are some missing values for height attribute and rings. Decide the way you handle this issue and explain why.
Answer - Data Processing
The collected data includes some missing values for the height and rings variables; these were estimated using data imputation in Weka. Through the graphical user interface (GUI) of the software, the ReplaceMissingValues filter was applied; this imputation algorithm replaces each missing value with the mean of its attribute. Imputing missing values was found to be the most appropriate approach because it preserves the full sample.
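The mean imputation performed by Weka's ReplaceMissingValues filter can be reproduced in plain Python; the sketch below is illustrative and the toy column values are assumptions, not values from the abalone dataset.

```python
def impute_mean(values):
    """Replace missing (None) entries with the mean of the observed values,
    mirroring what Weka's ReplaceMissingValues filter does for numeric attributes."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# Toy "height" column with two missing entries
heights = [0.15, None, 0.13, 0.17, None]
print(impute_mean(heights))  # the None entries are replaced by the column mean
```

Because every row is retained rather than dropped, the sample size stays at 4177 after imputation.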
Task 3: Data Transformation - We need to create a new attribute called volume from other attributes as: volume = length * diameter * height. Normalise the data into [0-1] range.
Answer - Data Transformation
For this analysis, data transformation was done in both Weka and Excel. In Excel, the new volume attribute was calculated by multiplying length, diameter and height, i.e. multiplying columns A, B and C of the dataset. The file was saved as a CSV file for further use. In Weka, the Normalize filter (under the unsupervised attribute filters) was applied; by default this filter scales the data into the range 0-1 using a method called min-max normalization. The formula used in min-max normalization is:

Normalized value = ((a - b1) / (c1 - b1)) * (c2 - b2) + b2

where a is the value to be normalized, b1 is the minimum value of the variable, c1 is the maximum value of the variable, c2 is the desired maximum value and b2 is the desired minimum value.
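The two transformations described above (the derived volume attribute and min-max normalization) can be sketched in Python as follows; the measurement values shown are toy data, not rows from the actual dataset.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

# Toy measurements (columns A, B, C in the Excel step)
lengths   = [0.455, 0.350, 0.530]
diameters = [0.365, 0.265, 0.420]
heights   = [0.095, 0.090, 0.135]

# volume = length * diameter * height
volumes = [l * d * h for l, d, h in zip(lengths, diameters, heights)]

# Normalize to [0, 1]: the smallest volume maps to 0.0, the largest to 1.0
print(min_max(volumes))
```

With the default arguments this matches Weka's Normalize filter, which scales each numeric attribute to [0, 1].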
Task 4: Data Mining & Pattern Evaluation - Prepare your data to have: a training set of the first 2500 samples, a validation set of the next 633 samples, and a test set of the last 1044 samples. Run 15 machine learning algorithms and report their accuracy on the validation set in a table. Explain how the best algorithms work (in the report). Tips: How to improve performance?
Answer - Data Mining & Pattern Evaluation
Data preparation
The data was partitioned into training, validation and testing sets by random sampling in Excel, using the following steps. First, an index (id) for each row was created by adding a new column with values 1 to 4177. A second column with a random value for each row was added using the Excel function =RAND(). A third column was then added containing a random sample without replacement from the index column, using the formula =INDEX($L$1:$L$4177, RANK(M1,$M$1:$M$4177)) and dragging to fill. This new column contained the shuffled row indexes, and the data was then sorted by these sampled indexes. The first 2500 samples were taken as the training set, the next 633 samples as the validation set, and the last 1044 observations as the test set.
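The same random 2500/633/1044 partition produced in Excel can be sketched in Python; the seed value below is an arbitrary assumption used only to make the shuffle reproducible.

```python
import random

def split_data(rows, n_train=2500, n_val=633, seed=42):
    """Shuffle the rows (like =RAND() + sort in Excel), then slice off
    the training, validation and test sets in order."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    train = rows[:n_train]
    val   = rows[n_train:n_train + n_val]
    test  = rows[n_train + n_val:]  # the remaining rows
    return train, val, test

rows = list(range(4177))  # stand-in for the 4177 abalone records
train, val, test = split_data(rows)
print(len(train), len(val), len(test))  # 2500 633 1044
```

Sampling without replacement guarantees that every record lands in exactly one of the three sets.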
Modeling
Fifteen machine learning algorithms were run on the training set, and their performance on the validation set is reported in Table 1 below.
Table 1: Results of the 15 algorithms on the validation set
| Algorithm | R | MAE | RMSE | Relative absolute error | Root relative squared error | Total number of instances |
|---|---|---|---|---|---|---|
| RandomForest | 0.7306 | 0.0542 | 0.0792 | 64.18% | 68.29% | 633 |
| K-star | 0.7146 | 0.0546 | 0.0816 | 64.71% | 70.33% | 633 |
| SMOreg | 0.7212 | 0.0549 | 0.0816 | 65.00% | 70.31% | 633 |
| M5Rules | 0.7310 | 0.0560 | 0.0795 | 66.37% | 68.55% | 633 |
| MultilayerPerceptron | 0.7338 | 0.0568 | 0.0839 | 67.31% | 72.31% | 633 |
| LinearRegression | 0.7181 | 0.0569 | 0.0808 | 67.40% | 69.66% | 633 |
| REPTree | 0.6540 | 0.0602 | 0.0891 | 71.35% | 76.79% | 633 |
| M5P (model tree) | 0.7143 | 0.0608 | 0.0830 | 72.01% | 71.52% | 633 |
| DecisionTable | 0.6584 | 0.0609 | 0.0874 | 72.12% | 75.30% | 633 |
| AdditiveRegression | 0.6577 | 0.0625 | 0.0875 | 74.05% | 75.41% | 633 |
| IBk (k-nearest neighbours) | 0.6221 | 0.0661 | 0.0968 | 78.27% | 83.47% | 633 |
| LWL | 0.5285 | 0.0716 | 0.0985 | 84.78% | 84.92% | 633 |
| DecisionStump | 0.5146 | 0.0720 | 0.0995 | 85.30% | 85.75% | 633 |
| RandomTree | 0.5525 | 0.0779 | 0.1113 | 92.24% | 95.92% | 633 |
| ZeroR | 0 | 0.0844 | 0.1160 | 100% | 100% | 633 |
|
The best performing model on the validation set was found to be the random forest; the model had a root mean squared error of 0.0792, a mean absolute error of 0.0542 and a root relative squared error of 68.29%. These values measure the deviation between actual and predicted values, so smaller values are preferred. The model was then evaluated on the test set.
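The error measures reported by Weka can be computed directly from the actual and predicted ring counts; a minimal sketch follows, where the two relative errors compare the model against the baseline of always predicting the mean (as ZeroR does).

```python
def error_measures(actual, predicted):
    """Compute MAE, RMSE, relative absolute error and root relative
    squared error for a list of actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mae  = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    rmse = (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n) ** 0.5
    # Relative errors: model error divided by the error of predicting the mean
    rae  = (sum(abs(a - p) for a, p in zip(actual, predicted)) /
            sum(abs(a - mean_a) for a in actual))
    rrse = (sum((a - p) ** 2 for a, p in zip(actual, predicted)) /
            sum((a - mean_a) ** 2 for a in actual)) ** 0.5
    return mae, rmse, rae, rrse
```

This also explains why ZeroR scores exactly 100% on both relative measures in Table 1: it is the baseline the relative errors are defined against.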
Table 2: error measures of the random forest on the testing set
| Algorithm | R | MAE | RMSE | Relative absolute error | Root relative squared error | Total number of instances |
|---|---|---|---|---|---|---|
| RandomForest | 0.7434 | 0.0563 | 0.0791 | 65.26% | 67.09% | 1044 |
|
Table 3: Variable importance
| Variable | Node impurity |
|---|---|
| shell weight | 0.09 (10939) |
| volume | 0.04 (9367) |
| height | 0.03 (10746) |
| shucked weight | 0.03 (13517) |
| whole weight | 0.02 (14734) |
| diameter | 0.02 (13434) |
| sex | 0.02 (4610) |
| viscera weight | 0.02 (11769) |
| length | 0.01 (16840) |
| edible | 0 (0) |
Conclusion
How the model works
A random forest is a modification of the random tree model that builds an ensemble of many trees: it grows bagged trees on bootstrapped training samples, and at each split only a random subset of the predictors is considered. The algorithm therefore has two main parameters: the number of variables to sample when splitting a node, and the minimum number of observations required to grow a tree further. Once the trees are grown, their predictions are combined; for a regression task such as this one, the trees' outputs are averaged (for classification, the trees vote and the majority class is chosen as the prediction).
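The bagging-and-averaging idea described above can be illustrated with a deliberately minimal sketch: each "tree" here is a one-split stump grown on a bootstrap sample with a randomly chosen feature. A real random forest (such as Weka's) grows deep trees and samples features at every split, so this is only a toy illustration of the mechanism, not the actual implementation used.

```python
import random

def fit_stump(X, y, feature):
    """Fit the best single threshold split on one feature (minimum squared error);
    each side of the split predicts the mean target of its training rows."""
    best = None
    for t in sorted(set(row[feature] for row in X)):
        left  = [yi for row, yi in zip(X, y) if row[feature] <= t]
        right = [yi for row, yi in zip(X, y) if row[feature] > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((yi - ml) ** 2 for yi in left) +
               sum((yi - mr) ** 2 for yi in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:                      # degenerate sample: predict the mean
        m = sum(y) / len(y)
        return lambda row: m
    _, t, ml, mr = best
    return lambda row: ml if row[feature] <= t else mr

def fit_forest(X, y, n_trees=25, seed=0):
    """Bootstrap the rows, grow one stump per sample on a random feature,
    and average the stumps' predictions."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]      # bootstrap sample
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        feature = rng.randrange(len(X[0]))            # random feature choice
        trees.append(fit_stump(Xb, yb, feature))
    return lambda row: sum(tree(row) for tree in trees) / len(trees)
```

For example, fitting on rows where the target grows with the single feature (`X = [[float(i)] for i in range(20)]`, `y = [2.0 * i for i in range(20)]`) yields a predictor whose output increases with the input, because every averaged stump predicts a lower mean on the left of its threshold than on the right.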
From the above output, the model had a root mean squared error of 0.0791 and a mean absolute error of 0.0563 on the test set. These accuracy measures show that the model performs slightly better on the test set, probably because of its larger sample size. Variable importance was computed in terms of node impurity: at each node, permutations were carried out and the resulting change in out-of-bag error was calculated; the overall decrease in error is then used to determine the model's sensitivity to a given variable. The most important variable to the model is therefore the shell weight of the abalone, followed by the computed volume variable; the least important is edible.