Essential Mathematics for Data Scientists: Assessment 5: VisualRank using image similarity

By Manjunatha VG

Introduction
This assignment is based on the well-known PageRank algorithm developed by Google to help rank webpages. Google's techniques have changed a great deal since then, and it is not known exactly how much of a role the original PageRank algorithm plays in current webpage rankings, but it remains an important tool in Google’s arsenal.

A different application of PageRank came about when some of Google’s engineers suggested applying it to images. They came up with the ‘VisualRank’ algorithm. We shall attempt something similar in this assignment, albeit with much simpler tools!

1. Using the theory above, rank all 1400 MPEG7 shape images using VisualRank.
a. Load the ‘sim.mat’ data file into MATLAB. This file contains the similarity matrix S for the entire image dataset, as well as a cell array of corresponding filenames.
b. From S, we can form the adjacency matrix A. Start by setting A=S, then remove elements from A to leave visual hyperlinks between only those images that are positively correlated.
c. Finally, remove the elements from A that correspond to loop edges in the visual similarity graph.
d. Use A to create the hyperlink matrix H.
e. Form the random jump matrix J.

f. Given a damping factor of 𝑑 = 0.85, create the modified visual hyperlink matrix Htilde.
g. Find the VisualRank vector r using an initial vector.
h. Given this VisualRank vector, which image is ranked highest? Display this image.
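
The steps of Task 1 can be sketched on a small toy matrix. This is only an illustration under stated assumptions, not the required solution: the 4x4 matrix S below is made up and stands in for the S loaded from ‘sim.mat’; the hyperlink matrix H is assumed to be A with each column scaled to sum to 1; the jump matrix J is assumed to be uniform (every entry 1/n); Htilde is assumed to be d*H + (1-d)*J; and the initial vector is assumed to be uniform. If the theory for this assessment defines any of these differently, follow that definition instead.

```matlab
% Toy 4x4 similarity matrix standing in for S from sim.mat (made-up values).
S = [ 1.0  0.6 -0.2  0.3;
      0.6  1.0  0.4 -0.1;
     -0.2  0.4  1.0  0.5;
      0.3 -0.1  0.5  1.0];
n = size(S, 1);

% b. Keep visual hyperlinks only between positively correlated images.
A = S;
A(A <= 0) = 0;

% c. Remove loop edges, i.e. zero the main diagonal.
A(1:n+1:end) = 0;

% d. Hyperlink matrix (ASSUMPTION: columns normalised to sum to 1).
colSums = sum(A, 1);
colSums(colSums == 0) = 1;      % avoid 0/0 for an image with no links
H = A ./ colSums;               % implicit expansion, needs R2016b or later

% e. Random jump matrix (ASSUMPTION: uniform, every entry 1/n).
J = ones(n) / n;

% f. Modified visual hyperlink matrix with damping factor d = 0.85.
d = 0.85;
Htilde = d * H + (1 - d) * J;

% g. Power iteration from an ASSUMED uniform initial vector.
r = ones(n, 1) / n;
for k = 1:1000
    rNext = Htilde * r;
    converged = norm(rNext - r, 1) < 1e-12;
    r = rNext;
    if converged
        break
    end
end

% h. Index of the highest-ranked image.
[~, best] = max(r);
```

With the real data, the image at index `best` could then be displayed with something like `imshow(imread(filenames{best}))`, where `filenames` is a hypothetical name for the cell array stored in ‘sim.mat’ (the brief does not give the actual variable name).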

VisualRank ranks images in a group by how representative they are of the group as a whole. This is a difficult task when the group of images is large and varied, as it is in this case. The only common feature among all 1400 images is that the images tend to depict some white object toward the centre of the frame, as can be seen by the average of all the images:

2. Rank the 20 heart-shaped images (indices 81 through 100) using VisualRank.
a. Make a smaller 20x20 adjacency matrix by indexing the necessary rows and columns of the full adjacency matrix from above.
b. Form the corresponding visual hyperlink matrix and find the VisualRank vector for all 20 heart images.
c. Given this ranking, which heart-shaped image is most representative of the group of heart-shaped images? Display this image.
d. Similarly, which heart-shaped image is the least heart-shaped (according to VisualRank)? Also display this image.
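
Ranking a subset follows the same pattern on a smaller matrix. The sketch below uses a made-up 6x6 adjacency matrix and the index range 2:4 in place of the full A and 81:100. Because the sub-matrix is small, it swaps the iteration for a direct linear solve: assuming Hsub has columns summing to 1 and the jump matrix is uniform, the fixed point of Htilde satisfies (I - d*Hsub) r = ((1-d)/m) * ones. Both the normalisation and the uniform jump are assumptions, as is every variable name here.

```matlab
% Toy stand-in for the full adjacency matrix A from Task 1 (made-up values):
% symmetric, non-negative, zero diagonal.
A = [0   0.2 0.1 0   0.3 0;
     0.2 0   0.7 0.5 0   0.1;
     0.1 0.7 0   0.6 0.2 0;
     0   0.5 0.6 0   0.4 0.3;
     0.3 0   0.2 0.4 0   0.8;
     0   0.1 0   0.3 0.8 0];

% a. Index the rows and columns of the group of interest.
idx  = 2:4;                     % in the assignment this would be 81:100
Asub = A(idx, idx);
m    = numel(idx);

% b. Column-normalise (ASSUMPTION) to get the sub-group hyperlink matrix.
colSums = sum(Asub, 1);
colSums(colSums == 0) = 1;      % guard against an isolated image
Hsub = Asub ./ colSums;

% Direct solve instead of power iteration; valid under the assumptions above.
d = 0.85;
r = (eye(m) - d * Hsub) \ ((1 - d) / m * ones(m, 1));

% c./d. Map the best and worst positions back to indices in the full dataset.
[~, order] = sort(r, 'descend');
mostRepresentative  = idx(order(1));
leastRepresentative = idx(order(end));
```

Note that `order` indexes into the subset, so it has to be mapped back through `idx` before looking up a filename in the full dataset.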

3. Pretend you are a search engine that has been queried for an image search of the following pentagon-looking shape that is present in the dataset as ‘device6-18.png’ (at index 650):

MATLAB has a nearest function that finds the nearest nodes to a particular node on a graph. We will use this function to search for the images similar (near) to the image above and then refine these search results using VisualRank.
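
To see how nearest treats edge weights as distances, here is a minimal sketch on a made-up 4-node similarity matrix. It takes reciprocals of the nonzero entries only, since 1./0 would give Inf; whether zero entries should be handled this way in the assignment is an assumption. Calling nearest with a distance of Inf returns every reachable node, sorted from nearest to furthest, so the first k entries are the k nearest nodes. Distances are shortest-path lengths, so a node can be near through an intermediate node even when its direct similarity is low.

```matlab
% Toy similarity adjacency matrix (made-up values, zero diagonal).
A = [0   0.9 0.1 0;
     0.9 0   0.5 0;
     0.1 0.5 0   0.8;
     0   0   0.8 0];

% Reciprocal weights: high similarity becomes a short distance.
% Zero entries (no edge) are left at zero rather than becoming Inf.
D = zeros(size(A));
D(A > 0) = 1 ./ A(A > 0);

% Each nonzero entry of D becomes a weighted edge.
G = digraph(D);

% All nodes reachable from the query node, nearest first.
query = 1;                      % in the assignment this would be 650
[ids, dist] = nearest(G, query, Inf);

% Keep the k nearest (k = 10 in the assignment).
k = min(3, numel(ids));
topK = ids(1:k);
```

In this toy graph node 3 is reached more cheaply via node 2 than by its direct low-similarity edge, which is the behaviour the reciprocal weighting is meant to produce.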

a. We first need to create a graph from the similarity adjacency matrix A created in Task 1. However, we cannot use this matrix directly: its edge weights are larger when images are more similar, and a graph interprets a larger weight as a greater distance. We would instead like images that are similar to each other to be nearer in the graph. This can be achieved simply by forming a new adjacency matrix whose elements are the reciprocals of those in A. Do this and, using the digraph function, form the graph G corresponding to this new adjacency matrix.
b. Using MATLAB’s nearest function, find the 10 images nearest to ‘device6-18.png’ in the graph G.
c. Finally, rank these 10 nearest images using VisualRank and display