Besides the custom metrics, we offer a variety of apps based on the analysis of published papers. You can create keyword clouds for individual authors or for groups of people, search for authors whose research interests are similar to those of another author, search for authors who have worked on certain topics, and get a 10-year prediction of an author’s h-index.
These tools are presently only functional for authors whose publications are on the arXiv server.
We hope that in the future we may be able to extend this offer to other databases.
1. Keyword Clouds for Individuals
On the main page, you can enter authors by name, arXiv author identifier, or ORCID. In response to your query you will obtain a keyword cloud with default settings. How this cloud is generated is explained below.
You can then customize the keyword cloud by excluding certain papers from the analysis, excluding certain words, or changing the size of the cloud to suit your needs. When everything looks good, you can download the image as an SVG or PNG.
2. Keyword Clouds for Groups
Keyword clouds for groups can be generated by entering multiple author names. They can be customized in the same way as the clouds for individual authors.
3. Find out who shares your interests
On the page Similar authors you can enter any number of authors (by name, arXiv author identifier, or ORCID) and get a list of authors who are similar to the authors used as input. You can also adjust the weights of the input names so that some are given more, others less relevance.
4. Find people who have worked on similar topics
On the page Topic matches you can enter a list of keywords and find authors who have worked on those topics. You can also adjust the weights of the keywords so that some are given more, others less relevance.
5. Neural net prediction
On the page H-index net you will get a 10-year prediction for the h-index of an author. We calculate this prediction using the network described in our paper “Predicting Citation Counts with a Neural Network”.
How it works
Most of our apps rely on a vector representation of authors and keywords. That is, we associate a 200-dimensional vector with every author and with each of the 40,000 keywords extracted from arXiv abstracts and titles. The vectors are chosen so that every author’s vector is close to a weighted average of the vectors of all keywords they use, and every keyword’s vector is close to a weighted average of the vectors of all authors who have used that keyword. The idea is that the vectors of two authors or keywords should be close together exactly when the authors or keywords are associated with similar topics. When you make a query, the server uses these vectors and other data to compute the results.
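To make the "weighted average" idea concrete, here is a minimal sketch in Python. It is not the code running on the server: the toy 3-dimensional vectors, the keyword names, and the choice of usage counts as weights are all assumptions made for illustration.

```python
def weighted_average(vectors, weights):
    """Return the weighted average of a list of equal-length vectors."""
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]

# Toy 3-dimensional keyword vectors (the real ones are 200-dimensional).
keyword_vectors = {
    "black hole": [1.0, 0.0, 0.0],
    "entropy": [0.0, 1.0, 0.0],
    "neural network": [0.0, 0.0, 1.0],
}

# Hypothetical usage counts of one author, used here as the weights (an assumption).
usage = {"black hole": 3, "entropy": 1}

def author_vector(usage, keyword_vectors):
    """Place an author at the weighted average of the keywords they use."""
    names = list(usage)
    return weighted_average(
        [keyword_vectors[k] for k in names],
        [usage[k] for k in names],
    )

vec = author_vector(usage, keyword_vectors)
print(vec)
```

In this toy example the author ends up three times closer to "black hole" than to "entropy". The same averaging applied in the other direction (keywords as averages of their authors) gives the second condition described above.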
In a keyword cloud, the size of each keyword is determined by the dot product of that keyword’s vector with the author’s vector, the number of times the author has used the keyword, and the number of times the keyword is used in total (since we don’t want the biggest keywords to be too hard to recognize). The colour is determined by the dates of the author’s papers that use that keyword: redder keywords are ones that the author has used more in recent papers.
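As a rough illustration of how these ingredients could be combined, consider the sketch below. The exact formulas are not given here, so the product of logarithms in `keyword_size` and the linear year scale in `redness` are purely hypothetical choices, as are the function names.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def keyword_size(author_vec, keyword_vec, author_count, total_count):
    """Hypothetical size score (an assumption, not the site's formula).

    It grows with the dot product, with how often the author uses the
    keyword, and with how often the keyword is used overall.
    """
    relevance = max(dot(author_vec, keyword_vec), 0.0)
    return relevance * math.log1p(author_count) * math.log1p(total_count)

def redness(years, oldest, newest):
    """Map the mean year of the papers using a keyword to [0, 1].

    1.0 means the keyword appears only in the most recent papers.
    """
    if newest == oldest:
        return 1.0
    mean_year = sum(years) / len(years)
    return (mean_year - oldest) / (newest - oldest)

# A keyword used in 2010 and 2020 by an author active from 2010 to 2020.
print(redness([2010, 2020], 2010, 2020))
```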
On the Similar authors page, authors are sorted by how near their vector is to the average vector of the authors given in the query, and by how close the logarithm of their number of papers is to the mean of that logarithm over the authors given in the query.
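A ranking of this kind can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity stands in for "nearness", all query authors get equal weight, and the penalty factor `alpha` and the way the two criteria are combined are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    d = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return d / norm if norm else 0.0

def rank_similar(query, candidates, alpha=0.1):
    """Rank candidate authors against a set of query authors.

    Both arguments map an author name to (vector, number_of_papers).
    alpha is a hypothetical weight for the paper-count penalty.
    """
    entries = list(query.values())
    dim = len(entries[0][0])
    mean_vec = [sum(vec[i] for vec, _ in entries) / len(entries) for i in range(dim)]
    mean_log = sum(math.log(n) for _, n in entries) / len(entries)

    def score(name):
        vec, n = candidates[name]
        return cosine(vec, mean_vec) - alpha * abs(math.log(n) - mean_log)

    return sorted(candidates, key=score, reverse=True)

query = {"A": ([1.0, 0.0], 20), "B": ([0.8, 0.2], 30)}
candidates = {
    "C": ([0.9, 0.1], 25),   # similar topics, similar output
    "D": ([0.0, 1.0], 25),   # different topics
    "E": ([0.9, 0.1], 500),  # similar topics, far more papers
}
ranking = rank_similar(query, candidates)
print(ranking)
```

In this toy data, C ranks first because it matches the query in both topic and paper count, E follows because only its paper count differs, and D comes last.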
Given a query on the Topic matches page, we use the vector representations to associate a score to each keyword, so that, roughly speaking, keywords that are closer to a query term will have a higher score. We then rank authors by a weighted sum of the scores of keywords they use, where keywords that they have used more will tend to have a higher weight. The weights also depend on the number of keywords they have used in each paper, so that we don’t give an unfair advantage to authors who have more keywords because they write long abstracts. The end result is that authors are sorted by how much research they have done that is related to topics specified in the query.
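The two steps above (score the keywords, then sum the scores per author) might look roughly like the sketch below. The concrete scoring is an assumption: cosine similarity is used for "closeness", each paper contributes the mean score of its keywords so that long keyword lists give no advantage, and all names and data are made up. Query terms are assumed to be known keywords.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    d = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return d / norm if norm else 0.0

def keyword_scores(query, keyword_vectors):
    """Score every keyword by its weighted similarity to the query terms.

    query maps a query term to its user-chosen weight.
    """
    return {
        kw: sum(
            w * max(cosine(vec, keyword_vectors[term]), 0.0)
            for term, w in query.items()
        )
        for kw, vec in keyword_vectors.items()
    }

def author_score(papers, scores):
    """Sum over papers of the mean keyword score of each paper.

    papers is a list of keyword lists, one list per paper.
    """
    return sum(
        sum(scores.get(k, 0.0) for k in kws) / len(kws)
        for kws in papers
        if kws
    )

keyword_vectors = {
    "dark matter": [1.0, 0.0],
    "axion": [0.8, 0.6],
    "graphene": [0.0, 1.0],
}
authors = {
    "X": [["dark matter", "axion"], ["dark matter"]],
    "Y": [["graphene", "axion", "dark matter"]],
    "Z": [["graphene"]],
}

scores = keyword_scores({"dark matter": 1.0}, keyword_vectors)
ranking = sorted(authors, key=lambda a: author_score(authors[a], scores), reverse=True)
print(ranking)
```

Keywords used in several papers are counted once per paper, so frequently used keywords carry more weight, as described above.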
Neural net prediction
Given an author, we feed past data about this author into a neural network, which predicts the author's h-index 10 years into the future. Details of the network are described in our paper “Predicting Citation Counts with a Neural Network”.
More details about the algorithm to identify keywords and authors can be found in this paper.
Please note that the assignment of weights in the keyword cloud takes visual appeal into account. The rendering of the cloud therefore depends on the list of keywords and is specific to each author (or group). This means that keyword clouds do not allow a quantitative comparison between different authors; that is not their intended use.