Campbell (1974) studied rock crabs of the genus Leptograpsus. One species, L. variegatus, had been split into two new species according to their colour: orange and blue. Preserved specimens lose their colour, so it was hoped that morphological differences would enable museum material to be classified. Data are available on 50 specimens of each sex of each species. Each specimen has measurements on:
the frontal lobe size (FL), rear width (RW), carapace length (CL), carapace width (CW) and body depth (BD), all in mm, in addition to colour/species and sex (we will treat the problem as unsupervised and ignore these labels for now).
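As a quick sanity check of the description above, the following sketch tabulates the specimens by species and sex and inspects the five measurement columns. It assumes the data are the `crabs` data frame shipped with the MASS package; the names `design` and `measurement_cols` are introduced here for illustration only.

```r
library(MASS)  # provides the crabs data frame

# Count specimens for each species (sp) and sex combination.
design <- table(species = crabs$sp, sex = crabs$sex)
print(design)

# Inspect the five morphological measurements (all recorded in mm).
measurement_cols <- c("FL", "RW", "CL", "CW", "BD")
str(crabs[, measurement_cols])
```

The labels are only counted here, not used in the analysis, so the unsupervised treatment that follows is unaffected.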
# Set the figure size (these options are read by the repr package used by the Jupyter R kernel)
options(repr.plot.width=5, repr.plot.height=5)
library(MASS)
varnames <- c('FL', 'RW', 'CL', 'CW', 'BD')
Crabs <- crabs[, varnames]
# Show six rows drawn at random from the whole data set
Crabs[sample(nrow(Crabs), 6), ]
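# Pairwise scatter plots of the raw measurements
pairs(Crabs)
# PCA on the covariance matrix (the princomp default)
Crabs.pca <- princomp(Crabs)
summary(Crabs.pca)   # standard deviation and proportion of variance per component
loadings(Crabs.pca)  # weights of the original measurements in each component
# Project the data onto the principal components and plot the scores
Crabs_proj <- predict(Crabs.pca)
pairs(Crabs_proj)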
What did we discover?
Let us use our label information (species+sex).
# Pairwise plots of the original data, coloured by species and sex
Crabs.class <- factor(paste(crabs$sp, crabs$sex, sep = ""))
pairs(Crabs, col = unclass(Crabs.class))
# Pairwise plots of the PCA projections, coloured the same way
pairs(Crabs_proj, col = unclass(Crabs.class))
# Third against second principal component
plot(Comp.3 ~ Comp.2, data = Crabs_proj, col = unclass(Crabs.class))
The first principal component reflects the overall size of the crab, which is not informative about its species or sex. The second and third principal components, in contrast, appear to capture the variability that is due to species and sex.
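One rough way to probe this interpretation numerically is sketched below. It refits the same PCA, checks whether the loadings of the first component all share one sign (as a size axis would), correlates the first component with a crude size index, and reports how much of each component's variance is explained by the species/sex groups. The size index (the sum of the five measurements) and the use of a one-way linear model R-squared are assumptions made for illustration, not part of the original analysis; the names `measurements`, `pca`, `scores`, `size_index`, `group` and `r2` are likewise introduced here.

```r
library(MASS)  # provides the crabs data frame

measurements <- crabs[, c("FL", "RW", "CL", "CW", "BD")]
pca <- princomp(measurements)
scores <- predict(pca)

# Loadings of the first component. If it measures overall size, all five
# entries should share the same sign (the sign itself is arbitrary).
first_loadings <- unclass(loadings(pca))[, 1]
print(first_loadings)

# Correlation between the first component and a crude size index.
# The absolute value is taken because the sign of a component is arbitrary.
size_index <- rowSums(measurements)
print(abs(cor(scores[, 1], size_index)))

# Share of each component's variance explained by the four species/sex
# groups, taken as the R-squared of a one-way linear model.
group <- factor(paste(crabs$sp, crabs$sex, sep = ""))
r2 <- sapply(1:3, function(k) summary(lm(scores[, k] ~ group))$r.squared)
names(r2) <- colnames(scores)[1:3]
print(round(r2, 2))
```

If the reading above holds, the R-squared values for the second and third components should be noticeably larger than for the first; the printed numbers let the reader judge this directly.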