Reminders! Reminders! Lots of ways to get involved!
Three great ways to get involved with the VertNet community.
All deadlines are January 11, 2013, 11:59pm Pacific Time.
Biodiversity Informatics Training Workshop
During the five-day course, participants will work closely with trainers to address compelling biodiversity research questions and focus on the entire scope of a research project, including data acquisition, tools for data evaluation, analysis, and project dissemination and outreach. Participants are expected to bring a high level of motivation and a desire to learn the fundamentals of biodiversity informatics and to become users of cutting-edge tools in GIS and modeling, although they are not necessarily expected to have experience with them. The workshop will…
Two intern positions are available: one at the University of California, Berkeley, and one at the University of Colorado in Boulder. Interns will work with the VertNet project team of experienced museum curators, researchers, and informaticists, as well as individuals from VertNet integration partners, to design and conduct a research project…
At-Large Steering Committee Positions
The term for each at-large member will be one (1) year with the opportunity for renewal at the completion of the term. At-large members should have significant experience in at least one of the following:
- collection management
- informatics development and big data
- community and academic outreach
- biodiversity-oriented research
- business management and sustainability
- grant writing and fundraising
VertNet welcomes applications from members of the private, non-profit, academic, governmental, and broad biodiversity communities. Applicants may be…
VertNet, Creative Commons, and Data Licensing and Waivers
VertNet is faced regularly with issues involving what data can be shared via its portals, who uses the data, how they use it, and how credit is given to the institutions sharing their data. As we work to build a better network we need to make decisions about how we, and all of our data publishers, will make data available to the public.
Thankfully, Peter Desmet and our friends at Canadensys have put together an excellent primer on the options available to data networks like ours. They kindly posted the results of their thinking on their blog (27Jan2012), but we believe this is important enough to re-post here.
You can view the original post at the Canadensys Blog, along with all of the comments it received. We’ve modified the original slightly to fit our formatting.
____________________________
With the first datasets getting published and more coming soon, the issue comes up under what license we – the Canadensys community and the individual collections – will publish our data. Dealing with the legal stuff can be tedious, which is why we have looked into this issue with the Canadensys Steering Committee & Science and Technology Advisory Board before opening the discussion to the whole community.
By data we mean specimen, observation or checklist datasets published as a Darwin Core Archive and any derivatives. To keep the discussion focused, this does not include pictures or software code.
2012.01.30 – Update to post: technically CC0 is not a license, but a waiver.
What we hope to achieve
- One license for the whole Canadensys community, which is easier for aggregation and sends a strong message as one community.
- An existing license, because we don’t want to write our own legal documents.
- An open license, allowing our data to be really used.
- A clear license, so users can focus on doing great research with the data, instead of figuring out the fine print.
- Giving credit where credit is due.
Our recommendation
We recommend that Canadensys participants publish their data under Creative Commons Zero (CC0). With CC0 you waive any copyright you might have over the data(set) and dedicate it to the public domain. Users can copy, use, modify and distribute the data without asking your permission. You cannot be held liable for any (mis)use of the data either.
CC0 is recommended for data and databases and is used by hundreds of organizations. It is especially recommended for scientific data and thus encouraged by Pensoft (see their guidelines for biodiversity data papers) and Nature (see this opinion piece). Although CC0 doesn’t legally require users of the data to cite the source, it does not take away the moral responsibility to give attribution, as is common in scientific research (more about that below).
Why would I waive my copyright?
For starters, there’s very little copyright to be had in our data, datasets and databases. Copyright only applies to creative content and 99% of our data are facts, which cannot be copyrighted. We do hold copyright over some text in remarks fields, the data format or database model we chose/created, and pictures. If we consider a Darwin Core Archive (which is how we are publishing our data) the creative content is even further reduced: the data format is a standard and we only provide a link to pictures, not the pictures themselves.
Figuring out where the facts stop and where the (copyrightable) creative content begins can already be difficult for the content owner, so imagine what a legal nightmare it can become for the user. On top of that different rules are used in different countries. Publishing our data under CC0 removes any ambiguity and red tape. We waive any copyright we might have had over the creative content and our data gets the legal status of public domain. It can no longer be copyrighted by anyone.
Can’t we use another license?
Let’s go over the options. Keep in mind that these licenses only apply to the creative aspect of the dataset, not the facts. But as pointed out above, figuring this out can be difficult or impossible for the user. So much so in fact, that the user may decide not to use the data at all, especially if they think they might not meet the conditions of the license.
All rights reserved
The user cannot use the data(set) without the permission of the owner.
Conclusion: Not good.
Open Data Commons Public Domain Dedication and License (PDDL)
There are no restrictions on how to use the data. This license is very similar to CC0.
Conclusion: Perfect, in fact this license was a precursor of CC0, but… it is less well known and maybe not as legally thorough as CC0. CC0 made a huge effort to cover legislation in almost all countries and the Creative Commons community is working hard to improve this even further. Therefore, if you have to choose, CC0 is probably better.
Creative Commons Attribution-NoDerivs (CC BY-ND)
The user cannot build upon the data(set), which is what most data use involves.
Conclusion: Not good, and sadly used by theplantlist.org. Roderic Page pointed this out by showing what cool things he can NOT do with the data.
Creative Commons Attribution-NonCommercial (CC BY-NC)
The user cannot use the data(set) for commercial purposes. This seems fine from an academic viewpoint, but the license is a lot more restrictive than intuitively thought. See: Hagedorn, G. et al. ZooKeys 150 (2011). Creative Commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information.
Conclusion: Not good.
Creative Commons Attribution-ShareAlike (CC BY-SA) or Open Data Commons Open Database License (ODbL)
The user has to share any work based upon the data(set) under a license that is identical or similar to the one used.
Conclusion: Good, but… this can lead to some problems for an aggregator like Canadensys or GBIF: if they are mixing and merging data with different SA licenses, which one do they choose? They might be incompatible.
Creative Commons Attribution (CC BY) or Open Data Commons Attribution License (ODC-By)
The user has to attribute the data(set) in the manner specified by the owner. This condition is also present in the three licenses above.
Conclusion: Good, but… this can lead to impractical “attribution stacking”. If an aggregator or a user of that aggregator is using and integrating different datasets provided under a BY license, they legally have to cite the owner for each and every one of those in the manner specified by these owners (again, for the potential creative content in the data). See point 5.3 at the bottom of this Creative Commons page for a better explanation and this blog post for an example.
But giving credit is a good thing!
Absolutely, but legally enforcing it can lead to the opposite effect: a user may decide not to use the data out of fear of not completely complying with the license (see paragraph above). As hinted at the beginning of this post, CC0 removes the drastic legally enforceable requirement to give attribution, but it does not remove the moral obligation to give attribution. In fact, this has been the common practice in scientific research for many decades: legally, you don’t have to cite the research/data you’re using, but not doing so could be considered plagiarism, which would compromise your reputation and the credibility of your work.
To encourage users to give credit where credit is due, we propose to create Canadensys norms. Norms are not a legal document (see an example here), but a “code of conduct” where we declare how we would like users to use, share and cite our data, and how they can participate. We can explain how one could cite an individual specimen, a collection, a dataset or an aggregated “Canadensys” download. We can point out that our data are constantly being corrected or added to, so it is useful to keep coming back to the original repository and not to a secondary repository that may not have been updated. In addition to that, we can build tools to monitor downloads or automatically create an adequate citation. And with the arrival of data papers, drafts of which can now be automatically generated from IPT, data(sets) are really brought into the realm of traditional publishing and the associated scientific recognition.
Conclusion
All this to say that there are mechanisms where both users and data owners can benefit, without the legal burden. CC0 + norms guarantees that our data can be used now and in the future. I for one will update the license for our Université de Montréal Biodiversity Centre datasets. We hope you will join us!
Thanks to Gregor Hagedorn for his valuable advice on all the intricacies of data licensing.
Farewell, NBII
On January 15, 2012, the National Biological Information Infrastructure will be taken offline permanently.
All of us at VertNet view this event with mixed emotions. On the one hand, we understand the pressures of shrinking budgets and the difficulty in making decisions to prioritize the public services provided by a government entity, such as the USGS. On the other hand, we are sad to see this program, one with which we have been closely linked, go.
It is not an exaggeration to say that VertNet would not likely exist without the support of NBII. NBII staff participated in the creation of VertNet back in 2008 and have provided critical financial support to hire and sustain two full-time positions, the VertNet Coordinator and VertNet Programmer, ever since. Although NBII’s services will be discontinued or distributed to other departments at the USGS, VertNet’s activities will continue with funding from the National Science Foundation.
We wish to thank the individuals at NBII and the USGS for this support over the last four years and we look forward to working with others at the USGS into the future.
VertNet Project Quarterly Update #2
The final quarter of 2011 was very productive for VertNet. We announced the call for applications for our first Biodiversity Informatics workshop, made some major decisions about the development of the VertNet platform, and, somehow, managed to produce four months’ worth of posts on the VertNet Blog.
Before we get into the details of our recent progress, we’d just like to remind you that you can follow our work or ask questions of us in a number of ways.
- Check out our blog and follow us on Facebook, Twitter, and Google+.
- Follow our development activity at GitHub.
- Talk with anyone on the VertNet Project Team or Steering Committee.
Now, on to the details…
Biodiversity Informatics Training Workshop
We’re closing in on the January 10th application deadline for VertNet’s first Biodiversity Informatics Training Workshop. The workshop, hosted by the University of Colorado, Boulder, will feature five days during which participants will work closely with trainers to address compelling biodiversity research questions, focusing on the entire scope of a research project, from initial data acquisition to tools for data evaluation to analysis and finally, dissemination and outreach. The workshop will include large and small group exercises on a common curriculum as well as the opportunity for participants to discuss and explore individual research questions with trainers.
By the end of the week, participants should leave the workshop with:
- an understanding of the work flows needed to acquire, analyze and report results generated from biodiversity resources found in data repositories such as VertNet.
- a set of basic skills to use data repositories and informatics and analytic tools, and an understanding of which tools are appropriate for a given task.
- knowledge of the abundant resources and additional training available to them.
You can learn more about the workshop and who should apply on VertNet.org.
VertNet Development
A cornerstone of the VertNet mission is to deliver a cloud-based platform upon which data publishers (i.e., institutions providing data) can store growing quantities of biodiversity data that can be accessed and enhanced via the Web. An important measure of our success will be how effectively this new platform can overcome the core technical challenges of scalability, data discovery, sustainability, and integration with other platforms. We will continue to work with our integration partners on evaluating platforms and, if all goes according to plan, we’ll open up testing to existing data publishers and others interested in our work.
In this quarter, the development team made some critical decisions about the design of our technical solutions. A key design principle in our decision-making process requires us to seek a balance between the most cost-effective (i.e., sustainable) and flexible (i.e., scalable) software and the potential for development of robust tools and innovations that maximize data discovery. We plan to optimize sustainability and scalability by lowering maintenance costs and by simplifying the VertNet architecture so that data publishers can deploy VertNet anywhere, using any system.
In July of 2011, we built a prototype platform using the Google App Engine cloud, Python, and SQLite. The prototype, available on GitHub, was composed of a data bulkloading script used for uploading records, and an application programming interface (API) for searching them. This prototype was tested by our VertNet integration partners (e.g., AmphibiaWeb, Arctos, GeoLocate), who provided excellent feedback about usability, text matching queries, and support for Darwin Core Archives.
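To make the two pieces of that prototype concrete, here is a minimal sketch of a bulkloading step and a text-matching search over the loaded records. It is an illustration only and not the code from the VertNet prototype: the table layout, the function names bulkload and search, and the sample records are all assumptions, and only the column names are borrowed from Darwin Core terms.

```python
import csv
import io
import sqlite3

# Hypothetical sample data. The column names are Darwin Core terms,
# but the records themselves are made up for illustration.
SAMPLE_CSV = """catalogNumber,scientificName,country
EX-1,Crotaphytus bicinctores,United States
EX-2,Bipes canaliculatus,Mexico
EX-3,Meleagris gallopavo,United States
"""


def bulkload(conn, csv_text):
    """Load every row of a CSV string into a 'records' table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(catalogNumber TEXT PRIMARY KEY, scientificName TEXT, country TEXT)"
    )
    rows = [
        (r["catalogNumber"], r["scientificName"], r["country"])
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    with conn:  # commit all inserts as a single transaction
        conn.executemany("INSERT OR REPLACE INTO records VALUES (?, ?, ?)", rows)
    return len(rows)


def search(conn, text):
    """Return records whose scientific name contains `text`.

    SQLite's LIKE is case-insensitive for ASCII characters by default.
    """
    cur = conn.execute(
        "SELECT catalogNumber, scientificName, country FROM records "
        "WHERE scientificName LIKE ? ORDER BY catalogNumber",
        (f"%{text}%",),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
loaded = bulkload(conn, SAMPLE_CSV)
matches = search(conn, "bipes")
```

A substring match like this is the simplest form of text-matching query; anything closer to the feedback our partners gave (fuzzy matching, Darwin Core Archive input) would need more than this sketch shows.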
Coincidentally, while our prototype was being tested, Google App Engine announced a new service level agreement and pricing model. Because this change in pricing would likely increase our estimated annual operating cost, we proceeded to identify different cloud-based alternatives to our original plan.
After further research and several productive conversations with our integration partners, we decided to explore the cloud-based CouchDB and HTML5. CouchDB is intriguing because it is both a web server and a database server with strong replication support. It would allow us to deploy VertNet data and HTML5 applications in a highly flexible, scalable, and sustainable way. Plus, CouchDB would give our integration partners a way to extend VertNet with custom functionality specific to their needs.
In September 2011, we built a new prototype using CouchDB. This prototype was composed of a browser-based bulkloader and a search API hosted in the cloud. The prototype and technical architecture were presented at the 2011 Biodiversity Information Standards (TDWG) conference in New Orleans. So far, we are very pleased with the results of our testing with CouchDB and are encouraged by the potential to address our technical challenges.
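For readers unfamiliar with CouchDB, the sketch below shows roughly what a bulk upload looks like from the data side: each record becomes a JSON document, and a batch of documents is wrapped in the body that CouchDB's _bulk_docs endpoint accepts. This is not the prototype's bulkloader (which ran in the browser). The helper names, the "institutionCode:catalogNumber" identifier scheme, and the sample records are assumptions made for illustration, and the sketch only builds the request body without contacting a server.

```python
import json


def to_couch_doc(record):
    """Copy a flat record dict and add a stable _id (hypothetical id scheme)."""
    doc = dict(record)  # copy so the caller's record is not modified
    doc["_id"] = f"{record['institutionCode']}:{record['catalogNumber']}"
    return doc


def bulk_docs_body(records):
    """Build the JSON body for a POST to a database's _bulk_docs endpoint."""
    return json.dumps({"docs": [to_couch_doc(r) for r in records]}, sort_keys=True)


# Made-up records using Darwin Core term names as keys.
records = [
    {"institutionCode": "EX", "catalogNumber": "1", "scientificName": "Bipes canaliculatus"},
    {"institutionCode": "EX", "catalogNumber": "2", "scientificName": "Meleagris gallopavo"},
]
body = bulk_docs_body(records)
```

Deriving the _id from fields already in the record means that loading the same record twice targets the same document rather than creating a duplicate, which is one plausible way to keep repeated uploads tidy.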
Thanks for keeping your eyes on us. More exciting things are coming in 2012. If you’ve got ideas, suggestions, or questions, feel free to send us a note or comment.
Posted by Dave Bloom, VertNet Coordinator, and Aaron Steele, VertNet Information Architect, on behalf of the VertNet Team.
AIM-UP!: Advancing the use of museum collections
This guest post was written by Dr. Joseph Cook (Univ. of New Mexico) and Dr. Eileen Lacy (UC Berkeley) on behalf of the Advancing Integration of Museums into Undergraduate Programs (AIM-UP!) Research Coordination Network.
Those familiar with VertNet are well aware of the importance of museums and museum data for research. Perhaps less immediately apparent is the vital role that museum collections can play in undergraduate education. Even a quick glance at a few specimens is typically enough to generate numerous student questions regarding the nature of museum collections and the reasons for the vast organismal diversity captured by museum specimens. Now, with specimen data increasingly available online, the power of natural history collections to excite and to inform students extends to institutions that lack physical specimens.
To further the use of natural history collections in undergraduate education, curators from the Museum of Southwestern Biology (University of New Mexico), the Museum of Vertebrate Zoology (UC Berkeley), the Museum of the North (University of Alaska), and the Museum of Comparative Zoology (Harvard University) have teamed up to create AIM-UP!, an NSF-sponsored network of museum scientists, collection specialists, undergraduate instructors, and artists dedicated to using museum data to promote undergraduate understanding of science.
In particular, AIM-UP! encourages undergraduate educators and students to explore the treasure trove of information available through natural history collections and their associated databases and data linkages. To facilitate this goal, the AIM-UP! network is working to develop new ways of incorporating the extensive archives and cyberinfrastructure of natural history museums into undergraduate education. These efforts focus on the following five themes:
- Integrative Inventories and Coevolving Communities: Exploring Complex Biotic Associations Across Space and Time
- Decoding Diversity: Making Sense of Geographic Variation
- Generating Genotypes: Evolutionary Dynamics of Genomes
- Fast Forward: Biotic Response to Climate Change
- Coevolving Communities and the Human Dimension
AIM-UP! rationale: Many natural history museums associated with academic institutions engage students in learning through specimen-based field projects and training opportunities related to the curatorial process. These experiences are often transformative, as witnessed by the large number of influential environmental and evolutionary biologists who cite their early exposure to natural history collections as pivotal to their career path. Such experiences, however, are necessarily limited to students at institutions with collections and, even then, the percentage of students who take advantage of such opportunities is often small. How do we extend these formative experiences to reach a broader swath of the next generation of scientists?
By digitizing specimens, it has now become possible for anyone with access to the Internet to explore the vast reservoirs of information held in collections. What can students and instructors do with all the newly available data, images, recordings and other associated information? Can we encourage educators to use these increasingly comprehensive natural history databases to engage students in inquiry-based projects and activities? Will educational use of these databases stimulate greater public interest in our natural surroundings and in the dwindling wild places on earth? In short, how do we begin to incorporate the vast online digital databases now available into critically needed educational initiatives?
AIM-UP! is addressing these questions through the development of educational modules that build upon natural history collections and associated databases to make such information accessible to instructors in multiple biological disciplines, including those (e.g., developmental biology, behavior, physiology, and cellular biology) that may not typically use museum collections. The modules provide inquiry-based learning experiences for undergraduates (including students in AP High School Biology Courses) that are built upon the informatics tools and natural history specimen databases now readily accessible online (e.g., VertNet, GenBank, BerkeleyMapper, MorphBank).
A few examples of educational modules already developed (or currently in progress) include:
- Getting Started With On-Line Specimen Databases
- Climate Change—Sierras, Great Lakes
- Geographic Variation in Bird Song Dialects
- Virtual Herbaria
- GenBank & Museum Specimens: phylogeny and phylogeography
AIM-UP! goals: By integrating our expertise and experiences with university-based museums, we seek to greatly advance traditional and emerging fields that could use museum collections. Inclusion of participants from federal agencies, large free-standing museums, and leading educators from Latin America is ensuring wider dissemination of our educational products.
Upcoming activities: In Spring 2012, AIM-UP! will present a semester-long seminar exploring Geographic Variation and will include a series of 2-day workshops and a cross-listed course with the Art and Ecology Program and Biology Department at the University of New Mexico. This course will be broadcast to the Museum of the North at the University of Alaska, Museum of Vertebrate Zoology, University of California Berkeley, and Museum of Comparative Zoology, Harvard University.
To learn more about AIM-UP! or the upcoming seminar, visit http://www.aim-up.org/.
Memory and mobilization: A reminiscence on data sharing
This guest post was written by Ted Papenfuss, Senior Staff Researcher, at the Museum of Vertebrate Zoology, UC Berkeley.
As I sit here writing this blog, I think back over my 40 years of association with the Museum of Vertebrate Zoology (MVZ). I may be a “slow learner” when it comes to technology, but I realize that online searchable databases such as VertNet have revolutionized the quality and the ease of accomplishing our research.
Starting in 1971 as a first-year graduate student, I assisted with curating and cataloguing specimens that I collected on field trips to Mexico and Guatemala. There were no computers in MVZ and even electric typewriters were rare. Specimen tags were written by hand in permanent ink and locality cards were written on pre-printed index cards following the exact system that was developed by our founder, Joseph Grinnell, in 1908. There are over 24,000 cards filed in our “Herp Lab,” spanning 95 years until the last hand-written entry was made on 13 August 2003 for herp specimen 240,725, an amphisbaenid reptile, Bipes canaliculatus, collected in Mexico. Written in pencil at the top of the last card are the words “entered into Sybil 13 Oct. 2003.” Sybil was an early database used at MVZ that was released to the public in 1999.
Fig. 1. Specimen #1 is a Collared Lizard from Southern California. Note that two taxonomic updates were written directly on the card; one in 1980 and a second update in 2002. This was the method used to update taxonomy for 95 years.
Fig. 2. Catalogue card listing snakes that I collected in Guatemala in 1976.
Fig. 3. Last hand-written herpetology catalogue card. Note written notation “Entered into Sybil 13 Oct 2003 RG” (Rochelle Germano was an undergraduate curatorial assistant). This ended the “paper trail” method of recording updates.
By the mid-1980s, desktop computers were introduced to MVZ and manuscript preparation and data analyses became much more efficient through their use. But it would still be a decade before the Internet was developed to the point of instant communication around the world.
When Professor Ermi Zhao, the most prominent herpetologist in the People’s Republic of China, visited the Museum of Vertebrate Zoology in 1988, I had no idea that his introduction to desktop publishing would lead to a worldwide awareness of herpetological research in China. He arrived soon after the MVZ bought a Mac Plus computer for general museum use. The computer was placed in a corner in the corridor that separated the front office area from the gallery where the collections and offices were located.
We had invited Professor Zhao to come to Berkeley to study the collection of amphibians and reptiles that we had collected together on expeditions to China during the previous years. In 1987 he became editor of a new journal, Chinese Herpetological Research. While at MVZ, we showed him how to use the Mac Plus for desktop publishing, and he was so impressed with the results that he asked us to publish the journal at MVZ. We agreed and after he returned to China we continued with the journal. We had our Mac Plus and Prof. Zhao had a Mac Plus that he brought from Berkeley to China. Manuscripts were edited and floppy disks were sent back and forth by “snail mail”, often taking a month round trip. Many of these manuscripts used specimen data found on our hand-written catalogue cards and the hand-written ledgers at the Chengdu Institute of Biology. Data exchange again required “snail mail” of photocopied hand-written records or, if the data were urgently needed, expensive faxes.
With the completion of the renovation of the Valley Life Science Building in 1994, the MVZ moved to our new site that was wired for high speed Internet. Soon, the major research institutions around the world were inter-connected and scientific collaboration with China took place literally “overnight,” only taking so long because of the different time zones.
Now that many museum collections have online searchable databases we can quickly access data even when at remote sites like small towns in Mexico or Guatemala. I find this extremely useful when searching for localities where collections were made years ago before GPS technology was available. Here at the MVZ, these old localities have been georeferenced and by using Arctos along with Berkeley Mapper, I can go to these sites directly while I am in the field.
Two years ago, I was called by a Mexican herpetologist, Gabriela Parra-Olea. She was driving through the mountains of Oaxaca, Mexico and she needed the locality for a species of Mexican salamander. While she was on her cell phone, I went into HerpNET, found the record and gave her both the georeferenced locality and road directions that had been recorded in the original card catalogue and later entered into Arctos. Less than five minutes elapsed from the start to the end of her phone call.
If such a request had been sent by mail to Joseph Grinnell in 1908, it would have easily taken a couple of months for the reply to reach Mexico.
Animal Diversity Web: Using vertebrate data to teach

Since its creation in 1995, the Animal Diversity Web (ADW) has become one of the Web’s most widely used resources for biodiversity information and educational tools and ideas. Currently it delivers 2 to 4 million pages per month to 300,000 to 400,000 visitors, over 70% of whom identify themselves as coming to the site for educational purposes. A spin-off site, focused on the fauna of southeastern Michigan and re-written to make it easily accessible to kids, is a foundational part of the BioKIDS project and has been used widely in Detroit Public Schools.
The goal of the ADW is, and has always been, to create a database that supports student inquiry, a resource that would make it possible for students to discover patterns and processes underlying ecology, evolution, and conservation biology. To be truly effective, these activities require a relatively large database constructed to facilitate retrieving information of many kinds. The ADW’s approach to building this resource has been to recruit student authors. Students write species accounts using a template that guides them in their research and writing. The template enforces controlled vocabularies, requires the filling in of specific data fields, and asks students to write extensive descriptive text, all organized into a standard form. Data from keywords, data fields, and text fields are loaded into a MySQL database to facilitate querying (pages are generated on the fly as visitors ask to see them). Students and instructors have found that the preparation of a species account itself is a valuable learning experience. It teaches them how to find resources, evaluate the quality of on-line and traditionally published materials, and recognize and avoid plagiarism, and it gives them an opportunity to hone their scientific writing skills. Students at over 100 institutions and in hundreds of courses have contributed more than 3400 species accounts to the ADW.
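As a rough illustration of what it means for a template to enforce controlled vocabularies and required fields, the sketch below checks a species account before it would be loaded into a database. The field names, the vocabulary terms, and the validate_account function are hypothetical and are not taken from the actual ADW template.

```python
# Hypothetical controlled vocabularies and required fields, for illustration only.
CONTROLLED_VOCAB = {
    "habitat": {"forest", "desert", "grassland", "freshwater"},
    "diet": {"carnivore", "herbivore", "omnivore"},
}
REQUIRED_FIELDS = ("species", "habitat", "diet", "description")


def validate_account(account):
    """Return a list of problems; an empty list means the account passes."""
    problems = []
    # Every required field must be present and non-blank.
    for field in REQUIRED_FIELDS:
        if not str(account.get(field, "")).strip():
            problems.append(f"missing field: {field}")
    # Keyword fields may only use terms from their controlled vocabulary.
    for field, allowed in CONTROLLED_VOCAB.items():
        value = account.get(field)
        if value and value not in allowed:
            problems.append(f"{field}: '{value}' is not in the controlled vocabulary")
    return problems


good = {
    "species": "Meleagris gallopavo",
    "habitat": "forest",
    "diet": "omnivore",
    "description": "Wild turkey.",
}
bad = {"species": "Meleagris gallopavo", "habitat": "suburbs", "diet": "omnivore"}

good_problems = validate_account(good)
bad_problems = validate_account(bad)
```

Because every accepted account uses the same terms for the same concepts, a query such as "all omnivores found in forests" can be answered by simple matching on the keyword fields rather than by reading free text.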
To take advantage of ADW data in undergraduate education, the ADW recently received support from NSF to develop a query tool (“Quaardvark”) and a library of inquiry activities useful in specific undergraduate biology courses. In early November 2011, the ADW hosted two workshops for ten biology faculty who traveled from across the U.S. to spend two days developing and testing inquiry activities for use in their courses. The results clearly showed the potential of using student exploration of these data in biology courses and helped us recognize and attack some of the problems associated with limited data.
One of the most important goals of the Quaardvark project is to find ways to reach out to external databases to expand the quality and quantity of data available for students to explore, using sources such as EOL, IUCN, NatureServe, and several others. Further, while the focus of ADW and these other sites is on species-level information, several other exciting efforts are underway to make specimen-level data from research collections available for student exploration, including VertNet, AIM-UP!, and CollectionsWeb. The ADW team is collaborating with these efforts to expand student inquiry opportunities.
The ADW team is excited to be part of the VertNet community. There are many possibilities for sharing data, expertise, and enthusiasm! We also look forward to using some of the proposed new VertNet tools, such as a map API and annotation functions, to expand opportunities for user interaction and to further enrich the student experience.
This guest post was written by Tanya Dewey, content expert, curriculum consultant and all-around zookeeper at Animal Diversity Web.
Happy Holiday
Everyone at VertNet wishes you a happy holiday weekend. We’ll be back next week!
Gobble, gobble
Meleagris gallopavo
Where do the digital humanities and eScience intersect?
This guest post was written by Rob Guralnick and Andrea Thomer, from the blog So You Think You Can Digitize, and David Bloom, VertNet Coordinator. This post is cross-posted at So You Think You Can Digitize.
First and foremost, VertNet supports global biodiversity research. We suspect that the majority of its use will be to generate products such as species distribution models and change assessments, and to answer questions about what is in any given museum collection. However, in the broader context of academic endeavor, these data could also serve as a unique link between the digital sciences and the digital humanities. Work in the digital humanities includes everything from crowdsourcing manuscript transcription to humanistic fabrication to data mining — work that is not so dissimilar in method, description, or data type from that in the digital sciences.
Biological collections aren’t the only organizations engaged in massive digitization efforts; libraries and archives have been digitizing their materials and making them discoverable and interoperable for decades as well. As a result of these efforts, an unprecedented number of research materials from a wide range of domains are now available for free on the Web. What VertNet does for biodiversity data, the University of Illinois’ Digital Collections and Content project does for cultural heritage records, and the National Library of Australia’s Trove does for newspapers, articles, and music. The HathiTrust makes more than 9 million books available, and the list goes on. Digitization allows these materials to be recombined and analyzed quickly and (relatively) easily in new ways.
Our question is a simple one: Where do the digital humanities and eScience overlap and interconnect? One method of digital investigation that caught our attention is the mapping of novels and other historic texts; researchers take prose text and mine it for mappable units. Erin Sells and her students, for instance, have used this method to create dynamic maps of Virginia Woolf’s Mrs. Dalloway, which incorporate “pictures, sounds, videos, and the text itself into the map.” Similarly, in the Google Ancient Places project, researchers mine archaeological and historical texts to create databases of georeferenced ancient locales that can then be mapped. Though these researchers are working with novels and historical texts, they’re producing data in formats similar to those used for species occurrence records in databases such as VertNet.
This made us think: what sorts of questions could we ask of a data set composed of all kinds of georeferences — not just species occurrence records, but locations from history or works of fiction as well? If students of the humanities can create maps with such texture using similarly organized data sets, could they build on this richness by including analysis of the natural world as it existed at the time described in the novel? Perhaps searching on the VertNet portal (or GBIF or ALA) could provide a detailed list of vertebrate species and, with a little more work, the associated ranges of these species. Suddenly, the map of Mrs. Dalloway’s world, and the atmosphere of Clarissa’s party, can be enriched not only with human influence and creation, but by the natural environment, too. Conversely, data from diaries or other digitized sources could be mined for data about distributions of now-extinct species. Could these data be used as observations and published as records along with those from natural history collections?
We hope that VertNet will support interdisciplinary research in the sciences and the humanities by providing new avenues for deeper readings, and new ways to reconstruct real and imagined worlds. Where are the specimens that Lewis and Clark found on their expeditions and how do those link up with their journals (online already!!)? What about whale species described by Melville? How accurate are James Fenimore Cooper’s depictions of the animals Hawkeye and Cora encountered as they traveled through the Great Lakes? What does this accuracy or inaccuracy tell you about Cooper as an author? What about Thoreau’s notebooks of life at Walden Pond, and how have this iconic landscape and its animals and plants changed since his stay?
We also hope that other folks have more ideas about what new combinations of data and domains of inquiry are possible now that so many different sources of knowledge have been digitized. How can eScience support and enrich the digital humanities and vice-versa? What happens when images of specimens mix with drawings from the literature? Point-radius georeferences, for example, are easy enough to pull together from different sources — what further visualizations could be created with the combination of journals, books, and catalog ledgers? What further ways can we use data and smarts to bridge gaps between the sciences and the humanities?
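To make the point-radius idea concrete, here is a minimal Python sketch of how georeferences from two different sources (say, a journal entry and a specimen record) might be compared. It assumes each georeference is a latitude, a longitude, and an uncertainty radius in meters; the class name, the function names, and the coordinates are all made up for illustration and do not come from VertNet or any other portal.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass
class PointRadius:
    """A point-radius georeference: a coordinate plus an uncertainty radius."""
    label: str
    lat: float       # decimal degrees
    lon: float       # decimal degrees
    radius_m: float  # uncertainty radius in meters


def haversine_m(a: PointRadius, b: PointRadius) -> float:
    """Great-circle distance in meters between the centers of two georeferences."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def may_overlap(a: PointRadius, b: PointRadius) -> bool:
    """True if the two uncertainty circles touch or intersect."""
    return haversine_m(a, b) <= a.radius_m + b.radius_m


# Illustrative values only: not real records.
journal = PointRadius("place named in a journal", 42.4400, -71.3400, 1000)
specimen = PointRadius("specimen locality", 42.4450, -71.3500, 500)

print(may_overlap(journal, specimen))
```

Two circles that overlap only tell us the sources could describe the same place; deciding whether they actually do still takes a human reading of both records.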
We’re curious to know what ideas you have on these issues. Don’t forget to check out the So You Think You Can Digitize blog for a special contest on this digital intersection.
Arctos: A Community and a Collection Management Information System
This post was written by the Arctos and VertNet project teams.
When the VertNet team was preparing the proposal for submission to the National Science Foundation, we spent a lot of time discussing the question “What is VertNet?” After much thought and reflection, we all concluded that VertNet has two meanings: that of a community of data providers and users, and that of a data network. Both of those are equally important. In this way, VertNet and Arctos are very similar.
Arctos as a community: Arctos is a collaboration among natural history museums that includes 46 collections at 8 institutions (ca. 3M records total). Approximately half of those records are in a shared instance used by 7 institutions and 39 collections. The remaining specimens and collections are in a single instance at the Museum of Comparative Zoology, Harvard University. Arctos users form a strong community that contributes to data standards, application enhancements, and improved data quality through sharing of authorities for taxonomy, geography, people names, part types, and other data.
Arctos as a collection management information system: Arctos is a comprehensive collection management information system that integrates access to diverse types of vertebrate and non-vertebrate collections and data types (e.g., specimens, observations, tissues, parasites, stomach contents, documents such as fieldnotes, and media such as images, audio recordings, and video). In addition to rigorously displaying all that is known about a museum record, Arctos provides solutions to managing and integrating collections data with object tracking (via barcodes or RFID), transactions (loans, borrows, accessions, permits), geospatial information (coordinates and descriptive data), agents (people and organizations), and usage (publications, projects, and citations). Data are accessible to the public through Arctos’ strong web presence, and are continuously updated and available via a DiGIR provider to the taxon-based networks and the Global Biodiversity Information Facility (GBIF).
Arctos is deeply integrated with external internet resources. GEOLocate provides embedded, graphical one-click georeferencing for specimens and localities. Georeferenced localities are displayed on maps using BerkeleyMapper, Google Earth, or Google Maps. GenBank provides reciprocal linkages to specimens with sequence data. The Texas Advanced Computing Center (TACC) provides media hosting and processing, including automated Optical Character Recognition (OCR) processing of images with text such as herbarium sheets.
Arctos and VertNet: Arctos has been a major contributor of particularly rich, high-quality vertebrate data to MaNIS, HerpNET, ORNIS, and FishNet2. The integrated linkages between specimen or observational records, geospatial information, genetic data, and media provide both opportunities and challenges for VertNet. Arctos will take advantage of the new publishing capabilities of VertNet by communicating through Application Programming Interfaces (APIs), which will allow updates to be published as news feeds when they occur during data management. Applications subscribing to these news feeds will be notified of Arctos changes in real time. Conversely, Arctos will subscribe to and store data annotations generated by VertNet users, thus further enriching the original record. All of the institutions collaborating in Arctos will benefit at once from these capabilities added to the shared system. From the VertNet perspective, one of the development challenges will be how best to propagate integrated linkages such as those in Arctos to the broader network of providers via the API and portal. This is something that we plan to address in the coming year.
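For readers who think in code, the two-way flow described above follows a publish/subscribe pattern. The Python sketch below is a toy, in-memory version of that pattern under our own assumptions; the `Feed` class, the topic names, and the record identifier are hypothetical and are not the VertNet or Arctos API.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Handler = Callable[[Message], None]


class Feed:
    """A tiny in-memory publish/subscribe hub (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every message on a topic."""
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, message: Message) -> int:
        """Deliver a message to all subscribers; return how many were notified."""
        handlers = self._handlers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


feed = Feed()
portal_index: Dict[str, Message] = {}   # stands in for a portal's copy of records
stored_annotations: List[Message] = []  # stands in for annotations kept by the source


def update_portal(message: Message) -> None:
    """Portal side: keep the latest version of each record."""
    portal_index[message["id"]] = message


# The portal listens for record updates; the source database listens for annotations.
feed.subscribe("record-updates", update_portal)
feed.subscribe("annotations", stored_annotations.append)

# A record is edited during data management, then a portal user annotates it.
feed.publish("record-updates", {"id": "EXAMPLE:Mamm:1", "locality": "corrected locality text"})
feed.publish("annotations", {"id": "EXAMPLE:Mamm:1", "note": "possible misidentification"})
```

A real feed would run over the network and would have to handle delivery failures and ordering, which this sketch ignores.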
In the future, we may start to think of Arctos and VertNet in a third way…as a fully integrated biodiversity repository accessible from intuitive portals where multidisciplinary questions can be addressed. One can imagine quickly accessing all the ecological observations that early century biologists made while they surveyed the Alaskan frontiers. Or ask, what did the morning chorus of birds sound like in a Guatemalan rain forest? And how has it changed? How often did spouses accompany field expeditions and what were their roles? How have genotypes and phenotypes shifted in murids? Where is most of the genetic variation in the Salamandridae? We can’t anticipate all the uses for the new Arctos-VertNet but can be confident of a rich potential.
