Performance Evaluation of Neural Networks on Community Detection

Conclusion

Summary of Findings

The GCN model, which used both the graph structure and the node features, performed best at detecting communities on the CORA dataset. This leads to the following conclusions:

The MLP trained solely on the graph structure performed best on the Twitch Gamers dataset. From this, we conclude that:

Overall Conclusion

We conclude that when both the graph structure and the node features are predictive, a model that incorporates both tends to perform better, because it can draw on more predictive information. However, if one source of information is uninformative or misleading, it does not help the model and may even degrade its performance. In short, the best-performing model depends on the quality of the data, so the choice of model should be guided by which inputs are actually predictive for the dataset at hand.
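The difference between the two model families can be shown in a few lines. The NumPy code below is a minimal illustration, not the implementation used in this analysis. It assumes the standard symmetric GCN propagation rule with self-loops, A_hat = D^(-1/2) (A + I) D^(-1/2), along with ReLU activations. The function names `normalized_adjacency`, `gcn_layer`, and `mlp_layer` are hypothetical. A GCN layer multiplies the node inputs by A_hat before the learned projection, so each node's output mixes its own input with its neighbours'. An MLP layer instead applies the projection to each node's input vector independently. That input may be node features or, for a structure-only MLP, something like each node's adjacency row. The source does not say which encoding was used.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1)                   # node degrees including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale entry (i, j) by 1 / sqrt(deg_i * deg_j) via broadcasting.
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj: np.ndarray, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One GCN layer: aggregate neighbour inputs over the graph, then project and apply ReLU."""
    return np.maximum(normalized_adjacency(adj) @ x @ w, 0.0)


def mlp_layer(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One MLP layer: project each node's input independently (edges are ignored), then ReLU."""
    return np.maximum(x @ w, 0.0)


if __name__ == "__main__":
    # Path graph 0 - 1 - 2, one-hot node inputs, identity weights.
    adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    x = np.eye(3)
    w = np.eye(3)
    print("GCN:", gcn_layer(adj, x, w)[0])  # node 0 row: 0.5, 1/sqrt(6) (about 0.408), 0.0
    print("MLP:", mlp_layer(x, w)[0])       # node 0 row: 1.0, 0.0, 0.0
```

With one-hot inputs, node 0's GCN output picks up a nonzero contribution from its neighbour, node 1, while its MLP output does not. This mixing is what lets a GCN exploit structure and features together. By the same mechanism, noisy features or uninformative edges can be spread into neighbouring nodes' representations.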

Recommendations and Implications for Further Research

So far, the models in this analysis have assumed graph datasets whose edges are undirected, unweighted, and featureless. Taking as a working premise the finding that the neural network using more data performed better, a natural next step is to create new models, or extend our graph convolutional neural network, so that they can handle directed edges, edge weights, and edge features, which may further improve predictions. A possible starting point is to pass edge weights into the graph convolutional layers of our network. These layers already accept edge weights as an optional input, so supporting them should require relatively little change. From there, the model could be extended to incorporate additional information such as edge directions and edge features.
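A weighted version of the same propagation rule shows how small the first extension could be. The NumPy sketch below uses hypothetical helper names, and its normalization choice is an assumption. Each undirected edge (u, v) with weight w contributes w to both A[u, v] and A[v, u], and self-loops keep weight 1. The degree used for normalization then becomes the weighted degree, that is, the row sum of A + I. If the network is built on PyTorch Geometric's `GCNConv` (the text does not name the library), the same idea is exposed through the optional `edge_weight` argument of the layer's forward call.

```python
import numpy as np


def weighted_normalized_adjacency(edges, weights, num_nodes: int) -> np.ndarray:
    """Build D^-1/2 (A_w + I) D^-1/2 from an undirected, weighted edge list (hypothetical helper)."""
    a = np.zeros((num_nodes, num_nodes))
    for (u, v), wt in zip(edges, weights):
        a[u, v] += wt
        a[v, u] += wt                      # undirected: mirror each weighted edge
    a_hat = a + np.eye(num_nodes)          # self-loops with weight 1 (an assumption)
    deg = a_hat.sum(axis=1)                # weighted degree of each node
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def weighted_gcn_layer(edges, weights, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One GCN layer whose neighbour aggregation is scaled by edge weights, followed by ReLU."""
    a_norm = weighted_normalized_adjacency(edges, weights, x.shape[0])
    return np.maximum(a_norm @ x @ w, 0.0)


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 where edge (0, 1) is three times stronger than edge (1, 2).
    edges = [(0, 1), (1, 2)]
    weights = [3.0, 1.0]
    print(weighted_normalized_adjacency(edges, weights, 3))
    print(weighted_gcn_layer(edges, weights, np.eye(3), np.eye(3)))
```

Setting every weight to 1 recovers the unweighted layer, which makes edge weights a low-risk first extension. Directed edges would additionally require dropping the mirroring step, at which point the symmetric normalization is no longer the obvious choice. That extension therefore needs more design work.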