Filtering the research record and farming big data

Filed under: Research Productivity

Mendeley Institutional Edition (MIE) powered by Swets is an extension of the Mendeley research management platform. It funnels data captured from the individual user accounts at a given institution directly to that institution’s librarians. The resulting statistics illustrate the research behavior of the institution’s Mendeley users and provide real-time insight into the reach and impact of their work:

  • Readership of research content within Mendeley
  • Publication statistics for individual researchers
  • Number of people reading faculty members’ uploaded research, demonstrating impact
  • All statistics are in real time – something that hasn’t been available on this scale before

Here’s a bit of background
Mendeley currently has a massive database of academic papers, registered users and user-generated research groups. At the time of writing, that amounts to over 1.8M users, 277M documents and more than 172,000 research groups.

The collective power of over 1.8M users (including researchers, students and others) is immense when put into the context of academic research. It is estimated that there are over 7M researchers in the world (based on a study published in a 2010 UNESCO report; this number has likely increased considerably since then, thanks to the rapid expansion of academic programs in countries like India and China).

The power lies in the organic filtering that takes place as each individual uploads citations and full text for the papers that are important to them and their research. Reading and sharing statistics sit on top of this, along with Mendeley groups, to provide a map of how the entire research landscape looks.

Visualizing research
It would be really interesting to see this visualized in terms of citation density across all subject areas, and to compare it to an equivalent chart from Thomson Reuters’ Web of Science™ (WoS) (in fact, I found one for the SCI here in this arXiv paper: page 9) or Elsevier’s Scopus™ (and here’s an NIH subject map sourced from Scopus citations). The subject categorization of these services, while accurate, is usually quite rigid, with a granularity of boulder proportions.

The difference between WoS, Scopus and Mendeley is that in Mendeley, academic content is categorized by the very people who are producing that content, and therefore it matches the real research landscape in a more organic, crowd-sourced way. It demonstrates a more effective way of classifying information than setting up predefined, pigeonholed subject categories and expecting research to fit into them. It is hard to argue with the granularity provided by 172,000+ research groups within the Mendeley network.

Lots and lots of documents
So, 277M documents? This means there are that many citations held within Mendeley. Many have corresponding full text; some don’t. It also means that this is the largest research database in existence in terms of pure numbers. The possibilities for sharing and discovery within a network of this magnitude are exciting. What is more, with the Mendeley Institutional Edition, libraries can also upload their subscribed A-Z journal collection into the platform to serve direct links to full-text content to end-users, making their discovery process even simpler and increasing their research productivity.

Lots and lots of data
Having such coverage of global research really opens up great data possibilities. Not only do users have metrics added to their profiles, but the Institutional Edition gives a collective snapshot, in real time, of how content is being used by Mendeley users from that particular institution. It records and plots graphs relating to publication usage, research output by authors within the institution, and also the number of readers of those publications (giving a measure of impact).

Customizing the library journal holdings to match current trends in content consumption by library patrons is a key task that librarians undertake. Frequently, researchers have a poor understanding of why they have access to particular resources online, including e-journals. That access is mostly due to the work of librarians ‘behind the scenes’, who make sure the content is available in the simplest way possible (only mostly, because the proportion of open access content is increasing at a rapid rate).

Evidence-based content selection
Being able to spot trends in usage patterns and publication outlets is therefore of great relevance to the annual selection management tasks that are performed in libraries around the world. The matching of end-user needs and available budget requires more evidence than ever to justify decisions, and to demonstrate value for money (see earlier post on the components of a successful selection management strategy).

The combination of industry-standard metrics and the insights delivered by MIE altmetrics makes a powerful suite of tools designed to influence and inform decisions on library collections: COUNTER usage, SJR impact metrics, and Mendeley usage, sharing and impact statistics. On top of that, add up-to-date title lists and pricing, licensing conditions and other criteria, and you have a complete overview of your options, enabling better selection management.
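
As a rough illustration of how these sources might be brought together, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Title` record, its field names and all of the figures are invented, and they do not reflect any MIE or COUNTER export format. Cost per use (price divided by COUNTER downloads) is just one common way to express value for money and is chosen here for simplicity; ties are broken by Mendeley readers and then by SJR.

```python
from dataclasses import dataclass


@dataclass
class Title:
    # Hypothetical record; field names and values are invented for illustration.
    name: str
    annual_price: float      # subscription price for the year
    counter_downloads: int   # COUNTER full-text requests for the year
    sjr: float               # SJR value for the journal
    mendeley_readers: int    # readers at the institution


def cost_per_use(title):
    """Return price divided by COUNTER downloads; infinite if never used."""
    if title.counter_downloads == 0:
        return float("inf")
    return title.annual_price / title.counter_downloads


def rank_by_value(titles):
    """Sort titles from best to worst value.

    Lowest cost per use comes first; ties are broken by more Mendeley
    readers, then by higher SJR.
    """
    return sorted(
        titles,
        key=lambda t: (cost_per_use(t), -t.mendeley_readers, -t.sjr),
    )


# Made-up sample data, not real journals or real prices.
titles = [
    Title("Journal A", 1200.0, 600, 1.8, 45),
    Title("Journal B", 900.0, 90, 0.6, 5),
    Title("Journal C", 1500.0, 0, 2.1, 12),
]

for t in rank_by_value(titles):
    print(f"{t.name}: cost per use {cost_per_use(t):.2f}, "
          f"{t.mendeley_readers} readers, SJR {t.sjr}")
```

A ranking like this is only a starting point for discussion: a title with no recorded downloads but a healthy Mendeley readership (like the invented "Journal C") may deserve a closer look rather than automatic cancellation.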

What other sources of 'big data' would be useful to you for your collection management?