Ernest Rutherford said that “All science is either physics or stamp collecting.” Today I am pleased to present an interview with Alex Bateman, a scientist who believes that scientific “stamp collecting” is very important nowadays, and who collects stamps with Wikipedia, the open encyclopedia that everyone can edit. Is collaborative stamp collecting the future of science?
You are the head of the group at the European Bioinformatics Institute that runs the Rfam database. Could you tell me a little bit about your research goals?
I am a scientific stamp collector. I like to collect things, and that is why I really enjoy creating databases and bringing knowledge together. I started with the Pfam database of protein families; then at the Sanger Institute, together with Sam Griffiths-Jones, we created the Rfam database of RNA families. I have also participated in a number of other database projects. So I like collecting databases too.
Why are these databases important?
A fundamentally important method for understanding biology is sequencing genes and proteins. Once we have sequences we can compare them to one another, and in this way we can learn things about them without carrying out an experiment. For example, by sequencing the genome of a microbe we may find that it can produce antibiotics or resist them. The databases that we create organize information about the functions of proteins and RNAs, helping people quickly understand the biology of any particular protein or gene, and ultimately of the organism.
If you are studying a protein involved in a human disease, you can also look at related proteins in the same family to find out what the natural function of this protein is and what went wrong, and so learn how the disease arises. But these are just some examples of usage. Some of our databases have 10 million visits per year, which shows how often researchers use the kind of information we collect.
Why does the Rfam database need support from the Wikipedia community?
When we started the Rfam database, which collects families of RNA, we wanted to have annotations for each of these families. But we were not experts in any of these families, so we wanted to engage scientists who know them to write these annotations for us. We considered creating our own wiki, or publishing a book in which each family would be described by an expert in a separate chapter, but we decided to go the Wikipedia route, which was a fairly novel approach at the time. We chose this option because we were impressed by how good Wikipedia’s articles were across a very broad range of subjects.
So we decided to try to use the Wikipedia machinery to manage annotations about RNA families. When we started, there were very few articles about RNAs on Wikipedia, so we had to generate many hundreds of new Wikipedia articles, and we had to really engage with the Wikipedia community to make sure that they were happy with that. But there were enormous benefits from working with this community.
If we had created our own wiki, we might have had several thousand people reading the articles, and probably only a small percentage of them would have edited them. On Wikipedia you have hundreds of millions of people reading articles every day, and you need only a very small fraction of them to edit content in order to get a really large number of edits.
I want to stress that this project was a real adventure for me and I really enjoyed learning more about Wikipedia. An important moment for me was realizing, beyond our own project, how important Wikipedia is in communicating science to the public. When you search Google for almost any scientific query, Wikipedia is very high in the search results. And when non-experts see anything about science in the news, they type it into Google and end up at Wikipedia, and this is how they learn about your particular area of science. This shows how important it is for scientists to engage with Wikipedia, to improve its content and keep it up to date. And this is why I try to encourage all scientists to edit Wikipedia articles.
Are you and your team editing Wikipedia daily as a part of your work?
Yes, we do. I have made about 4,000 edits on Wikipedia in recent years. There are now a lot of RNA articles on Wikipedia that may not be perfect but are good enough as a starting point, although we could always use more edits from experts.
You use Wikipedia to reach your scientific goal, which is not exactly the same as the goal of the Wikipedia community. You aim to classify all RNA sequences. This seems to be a more detailed project than what people usually associate with an encyclopedia. Does this create any problems in your work?
We were quite careful about the way we worked in Wikipedia. At the very beginning we consulted the community and received a lot of help for our project. We discussed whether our articles would be considered important enough for Wikipedia. We contacted WikiProject Molecular and Cellular Biology, and several people from that project helped us migrate content from Rfam to Wikipedia, so at the very beginning Wikipedia got a lot of content from us.
I think that these articles are encyclopedic, but yes, I can certainly think of a lot of other scientific projects that would not meet these criteria. Wikipedia will probably never have an article about every possible polymorphism of every gene. I think there is agreement that every human gene should be described there, although in practice articles are created only for genes that have been studied experimentally.
So we had to adjust our scientific goals to the needs of the Wikipedia community. We are not in control of the RNA family annotations: we got enormous benefits, but at the price of losing that control. Neither we nor anybody else owns any of the articles about RNAs on Wikipedia. So when the Wikipedia community decides that some of our articles are no longer notable, they can delete them. What we show in the Rfam database is simply Wikipedia, so this community influences an important part of Rfam. What sometimes happens, for example, is that several articles are merged into one because they cover similar topics, and that is fine with us.
What should a scientist know about Wikipedia before he or she starts a project similar to yours?
There are a lot of scientists who have not been successful in working with Wikipedia and whose edits were reverted by the community. This is because there is a tension between scientists, who know everything about science but nothing about the rules and etiquette of editing Wikipedia, and Wikipedia editors, who know everything about Wikipedia and very little about the science. So it is a challenge to get these people to work together.
Some researchers also try to use Wikipedia to promote their datasets or research. Some even write articles about themselves, which is something they should not do. This happens because they are not aware of what Wikipedia is. When an editor sees that the only contributions made by an author are links to research from his or her own group, this kind of activity is very easy to spot, and these edits are very often reverted.
The reason we have been successful is that we really care about Wikipedia and want to improve it. So scientists who want to work with Wikipedia should take some time to familiarize themselves with the project and decide whether they want to become a part of it or not.
What both communities have in common is that they like references very much; they want to add references to everything. So I think cooperation is both possible and necessary.
You have estimated that only 1% of the edits to RNA articles are made by vandals, but how many edits could simply be incorrect or misleading? Is Wikipedia good enough to use as a source of scientific information, for example by biology students?
When you are a professional researcher and you see that something is wrong on Wikipedia, you can easily fix it, which is not the case for plenty of other science websites that people read. Those websites often have no editorial control at all. Wikipedia is simply the best way to communicate with people who do not read research papers themselves.
I think that for most topics the accuracy of Wikipedia is generally good enough, and it can be understood by high school students, which is also a plus. Of course, experts can easily spot mistakes on Wikipedia, but they can do the same in published journal articles. Every day I read papers in my discipline that contain incorrect information, so that is nothing new.
There are vandals who think it is fun to add bad content to Wikipedia. We estimated that about 1% of edits are made by vandals, but this is only a rough estimate based on our own articles. It might be 0.5%, 2% or 5%, but I do not think it is as high as 5%. The Wikipedia community is amazingly fast at reverting vandalism; most of it is reverted within minutes.
For biology students, Wikipedia can be a good starting point. Wikipedia has a lot of references, and it is easy to find the original sources for almost all of the information there. However, I think it is important for students to be skeptical about all the information they find on the Internet. Even so, Wikipedia is better for them than any other website, because there is always a link to the original research.
I can also tell you how I use Wikipedia. Usually, if I think I know something but I am not sure about it, I check if Wikipedia agrees with me or not. If it does, I think I am probably right. If it does not, it is a signal for me that I have to search a bit deeper.
Do you think that other researchers also use Wikipedia in their work? Maybe they only search for information in peer-reviewed journals?
I think that almost everyone uses Wikipedia at some point in his or her work, maybe with the exception of some senior scientists. Some may use it only for quick and simple fact-checking. I think that technical audiences use Wikipedia to find their way to relevant information. And then there is a group of people who use it as a major source of information.
Wikipedia might therefore be an important source of traffic to original research papers or data.
Fifteen percent of Rfam traffic comes from Wikipedia, and our other databases also receive a significant proportion of their visits from there. But a link on Wikipedia has to be important and relevant, otherwise it will be deleted.
Your works are linked on Wikipedia. Have you noticed the positive impact of Wikipedia on your citations?
I think that the Wikipedia audience is much bigger than the community of people who write and cite scientific literature. And I am not sure whether scientists look on Wikipedia for articles to cite; I personally don’t. I cannot say whether my Wikipedia involvement has affected my own citation record, although I have been lucky enough to be cited a lot.
Why do scientists edit Wikipedia? In your opinion what are their motivations?
I think that contributing to a common good is a major motivation for Wikipedia editors. The Wikipedia community can be a very positive environment, and this is also a big driving force for many of Wikipedia’s major editors. There are also people who just want to correct some information. A large number of people make only a small number of edits each, but when you count them all together, it is a huge input to Wikipedia. In the case of our articles, many people have edited them only once or twice. Some people also want to promote their own work, which is a bit frustrating: sometimes they add good content, but occasionally they add things that are not very relevant to the topic.
Do you think that editing on Wikipedia should be seen as a valuable academic contribution, like publishing articles, and that it should be taken into account when considering academic promotion?
Yes, absolutely. Even now, when you write a grant application, there is sometimes a section about how you plan to engage with the public, and I write about my planned involvement in editing Wikipedia there. I think it is something that everyone can do, and it could help to get funding, because Wikipedia is a much better way to share your results with the public than public lectures, scientific roadshows and so on. It reaches many more people and it is a lasting source of knowledge.
I think it would be also good if editing Wikipedia was seen as something that should be added to one’s CV. Some people add information about their major contributions to important Wikipedia articles to their CVs, but I am sure that there are many tenure panels that would not care about that, and this is a shame.
But in general, I think the scientific community has a positive attitude toward Wikipedia. I do not know any people who would say that no one should use Wikipedia for anything. And probably even people who think that Wikipedia has a problem with factual errors would say that improving its content is a good idea. So this is something that should be mentioned in our CVs.
What do you want to do with data collected through Wikipedia? How might it be analyzed? Do you want to use any content mining techniques? What is it going to look like?
This question is related to one of the fundamental problems of Wikipedia: it is not structured. For Rfam this is not a problem, since we only need annotations and did not plan to mine them. There are other groups who mine Wikipedia, but this is not an easy thing to do.
Do you think that Wikipedia might be used by other biologists to solve other scientific problems? Maybe Wikipedia will become a core part of the process of creating academic knowledge?
We are occasionally contacted by other groups who want to work with Wikipedia. With each of them we discuss whether the kind of data they want to add to Wikipedia is relevant, and whether the community will consider it notable. I think that not every project matches Wikipedia’s criteria, and not everything should be done this way. Wikipedia is not the right place to store all the knowledge we have; for example, it is not a place for publishing negative results. Wikipedia needs to be a really good summary of the most important facts, not a complete source of information.
But using Wikipedia is a good option for some groups. When many databases use Wikipedia articles as a source, all of them improve whenever somebody edits a Wikipedia entry. And when our group edits a Wikipedia entry, every group that uses the same article benefits.
Wikipedia makes knowledge crowd-sourced. Each Wikipedia article has multiple authors and some of them are anonymous. Do you think it would help to make science more collaborative and less concentrated on individual achievements?
Yes, Wikipedia is quite anonymous. You can find out who the main author of an entry is, but this is not displayed at the top of the page, and some scientists probably find this discouraging. But I think it would be good to have more of a Wikipedia-like collaborative attitude in science. And now we have Science 2.0 coming, which is open science: it is more about people coming together with their various kinds of expertise and solving problems collaboratively. I think it might work really well and bring some really interesting results, and I am looking forward to seeing it happen more often. It is already happening, for example on social media, and I think Wikipedia has had some small influence on that.
Do you think this form of academic output may replace publishing in academic journals?
I think our current system works to some extent, but there are some problems that have to be fixed. It should be faster and cheaper, and we also need to fix the problem of the reproducibility of research. I am also not sure whether the current peer review model is working well. Open, post-publication peer review seems to be a good alternative. I think we will see a lot of changes in the near future.
What should be done first?
I think that reviews should be shared across all journals. Some papers bounce between journals, and the same paper is sometimes reviewed multiple times, occasionally by the same person for a different journal. It would be better to simply pass the reviews along with the paper. I think it is ridiculous that so much reviewer time is wasted.
Image: Preprinted page from a 1930s stamp album with printed spaces for non-specific mounting and information on each country. Public Domain image from Wikipedia.