Bird Studies Canada
Cornell Lab of Ornithology
Project name: Project FeederWatch
Data Access Policy
Researchers seeking to conduct formal analyses using FeederWatch data can access the raw data as outlined below. As with use of any data set, knowing the data structure, understanding the metadata, grasping the data collection protocols, and being cognizant of the unique aspects of the program are all critical for conducting analyses and interpreting results in ways that provide meaningful insights. Although the data are freely available, we strongly encourage researchers to consult with researchers at the Cornell Lab of Ornithology or Bird Studies Canada (contact information below) to ensure that the data are being handled and analyzed in a meaningful way.
Below are a series of considerations to which we specifically want to draw the attention of anyone interested in analysis of FeederWatch data. Other considerations, not listed here, may apply with some specific uses of the data.
Note that it is impossible to validate each of the millions of records submitted to FeederWatch. This problem is shared by all large-scale citizen science programs. Although we attempt to minimize errors, a small percentage of FeederWatch reports are incorrect and analysts must be aware that misidentifications, data entry errors, and other sources of error can evade our data validation system.
All FeederWatch data are passed through a series of geographically and temporally specific filters that “flag” reports of species (or high counts) that are unexpected at a given location. The geographic resolution is relatively coarse, with one filter per state/province, and the temporal resolution of filters is calendar months. Only reports that are flagged by the filters undergo a systematic manual review. A flag may be removed by the expert reviewer without a request for supporting information, or additional evidence may be requested. If additional information is requested but is insufficient to validate the report, that record remains in the database and is identified as an unconfirmed report. Flagged records are identified using a combination of the Valid field and the Reviewed field as defined in this table:
|Report did not trigger the automatic flagging system and was accepted into the database without review
|Report triggered the flagging system and was approved by an expert reviewer
|Report triggered a flag by the automated system and was reviewed; insufficient evidence was provided to confirm the report
|Report triggered a flag by the automated system and awaits the review process
Potential Errors Untrapped by Automated Filters:
Note that the flagging system does not identify all potential errors. If a species is misidentified as another species that could occur in the region, that report will not be flagged for review. For example, a Downy Woodpecker may be misidentified as a Hairy Woodpecker, and we have no mechanism for identifying such errors. We therefore recommend that data analysts carefully consider which species are included in their analyses, and we often lump difficult-to-distinguish species in our own analyses. For instance, Carolina Chickadee and Black-capped Chickadee reports are analyzed as “chickadee species” in regions of geographic overlap. Similar lumping is suggested for Sharp-shinned and Cooper’s Hawks (Accipiter sp.), and for House, Purple, and Cassin’s Finches (Carpodacus sp.).
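The lumping described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name, the group labels as stored values, and the in_overlap_region argument are hypothetical, and the way species names are spelled in the actual FeederWatch files (including apostrophes) is an assumption that should be checked against the data.

```python
# Hypothetical lookup tables. Species and group names follow the text above;
# they are not taken from the FeederWatch database schema.
ALWAYS_LUMPED = {
    "Sharp-shinned Hawk": "Accipiter sp.",
    "Cooper's Hawk": "Accipiter sp.",
    "House Finch": "Carpodacus sp.",
    "Purple Finch": "Carpodacus sp.",
    "Cassin's Finch": "Carpodacus sp.",
}

# Chickadees are lumped only where the two species' ranges overlap.
LUMPED_IN_OVERLAP = {
    "Carolina Chickadee": "chickadee species",
    "Black-capped Chickadee": "chickadee species",
}


def lump_species(name, in_overlap_region=False):
    """Return the analysis label for a reported species name.

    in_overlap_region is an assumed flag that the analyst derives from the
    report's location; how to define the overlap zone is left to the analyst.
    """
    if name in ALWAYS_LUMPED:
        return ALWAYS_LUMPED[name]
    if in_overlap_region and name in LUMPED_IN_OVERLAP:
        return LUMPED_IN_OVERLAP[name]
    return name


if __name__ == "__main__":
    print(lump_species("Purple Finch"))
    print(lump_species("Carolina Chickadee", in_overlap_region=True))
    print(lump_species("Carolina Chickadee", in_overlap_region=False))
```

Keeping the overlap-only species in a separate table makes the regional condition explicit, so reports from outside the overlap zone keep their original species label.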
Additionally, mis-recording errors can mimic misidentification errors. Participants may intend to report one species but enter their information for the wrong species. The evolution of the data-entry process has created designs for paper forms and web pages that minimize the likelihood of such errors. Nevertheless, such errors are possible.
While we know that errors exist in the data, our experience handling and using these data leads us to believe that such errors are generally minimal and that biologically real patterns will emerge from analysis. All large data sets contain errors. We strive to minimize them, but nevertheless advise anyone working with these data to handle, analyze, and interpret them with the understanding that they are not perfect.
As with any monitoring data, a recorded observation is a function of both the biological event (number of species actually present) and the observation process (probability that an individual, when present, will be observed). Detection probabilities can be formally estimated with FeederWatch data (see Zuckerberg et al. 2011 paper in list of FeederWatch publications) under some circumstances. When this cannot be done, we strongly suggest that analysts at a minimum include measures of the effort expended by participants (number of half-days and/or number of hours of observation) as predictors in their statistical models, to account for the increasing probability of observing birds as more time is spent making observations.
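One way to write down the suggestion above is as a generalized linear model with effort terms. The following is a sketch only: the Poisson distribution, the log link, and all symbol names are assumptions chosen for illustration, not a model prescribed by FeederWatch.

```latex
% y_i : count of the focal species on checklist i (assumed Poisson here)
% x_i : biological predictor of interest (hypothetical)
% d_i : number of half-days of observation for checklist i
% h_i : number of hours of observation for checklist i
y_i \sim \mathrm{Poisson}(\mu_i), \qquad
\log \mu_i = \beta_0 + \beta_1 x_i + \gamma_1 d_i + \gamma_2 h_i
```

Under this sketch, positive values of gamma_1 and gamma_2 would reflect higher expected counts with more observation time, so that beta_1 is estimated after accounting for effort rather than being confounded with it.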
Our unique dataset is completely dependent on the efforts of our network of volunteer participants. We ask that all data analysts give credit to the thousands of participants who have made FeederWatch possible, as well as to Bird Studies Canada and the Cornell Lab of Ornithology for developing and managing the program.
We recommend that BSC data accessed from NatureCounts be cited as follows:
Bird Studies Canada and Cornell Lab of Ornithology. 2008. Project FeederWatch. Data accessed
from NatureCounts, a node of the Avian Knowledge Network, Bird Studies Canada.
Available: http://www.naturecounts.ca/. Accessed: <<date>>.
In addition, we require that the following statement be included in the acknowledgement
section of any publication:
We thank Bird Studies Canada and Cornell Lab of Ornithology for supplying
Project FeederWatch data, and all of the
volunteer participants who gathered data for the project.
If data retrieved from NatureCounts are to be used for web-based products (maps, summaries, etc.),
you must contact us (firstname.lastname@example.org) for
instructions on applicable contributor/partner logo(s) and website link(s).
Scientific Publications as a Resource for Analysts:
Analysts will find previous publications informative in providing more detailed information on the process of analyzing FeederWatch data. See a list of scientific articles using FeederWatch data.
FeederWatch participants are identified in the database by their unique Cornell Lab of Ornithology or Bird Studies Canada identification number. We do not share names, addresses, contact information, or any personal information about our participants without express permission from each individual participant. For confirmed rare bird reports, we do post the name, city, and state of the observer along with the report on the FeederWatch website. We will not post this information without first contacting the observer and we will withhold any such reports from public view when asked.
Consulting with Cornell or BSC Staff:
Analyzing large data sets is complicated, and requires skill both in conducting the analyses themselves and in manipulating the data into the appropriate form for analysis. These data are best analyzed in collaboration or consultation with CLO or BSC research staff who have experience working with FeederWatch data. Please note that our resources are limited, so the responsiveness and extent of support may be constrained. We will do our best to meet all requests as time and resources permit. We will concentrate our efforts on answering questions about the general processes of analyzing data from FeederWatch rather than the mechanics of using specific software to work through this process. Suggested contacts include:
David Bonter, Asst. Director Citizen Science, Cornell Lab of Ornithology: dnb23 at cornell.edu
Wesley Hochachka, Senior Research Associate, Cornell Lab of Ornithology: whm6 at cornell.edu
Denis Lepage, Senior Scientist, Bird Studies Canada: dlepage at bsc-eoc.org
Complete datasets (presence and absence data) with supplementary data are available upon request from the staff listed above. Please note that data files are large (> 1.8 million checklists), so you must be able to use advanced database software (e.g. MySQL, Microsoft Access) and statistical software (e.g. SAS or R) in order to handle the data. Please also note that data extraction takes staff time, so be as clear and specific as possible when requesting data, and be patient while waiting for data to be retrieved.
Redistribution of the data
Without specific approval, no data provided by BSC can be distributed beyond the researchers/authors
requesting the data. Raw data should not be made available for download through
the Internet, but derived data (maps, tables) are usually acceptable. If you
plan to present data on the Internet, the nature of the visualization(s) must be
specified in your data request.
Fee for Data
BSC data accessed through the NatureCounts web site do not normally entail any data processing fees.
Requests that require additional manual data extractions or analyses will
typically be charged a fee of CA$75 per hour, CA$500 per day or CA$1500 per week
to extract and assemble data. BSC will consider waiving this fee, in part or in
full, for project partners and not-for-profit researchers/authors.
Authorship and Acknowledgements
If a request for data includes proposed research with significant BSC staff contributions,
then agreement on the role of involved staff, joint authorship, and order of
authors on publications must be reached before the data are released.
The researchers/authors agree to acknowledge BSC (as well as the clients or partners
when applicable) in all relevant publications using the statement provided
above. BSC review of manuscripts (or parts thereof) before submission to
journals is requested as a courtesy, especially in cases where BSC or client
policy or operating practice is a subject of discussion or conclusions.
BSC and CLO retain the copyright
on all PFW data provided to researchers/authors.
Revocation of Privileges
Any researcher/author not
abiding by this policy or the terms of the Data Request and Release Form and
Publication Agreement may forfeit any future access to NatureCounts data.