Last updated: August 16, 2012.
Batch evaluations are designed to be conducted fully automatically. They include at least the following: a document collection, a set of topics (queries), relevance judgments, and one or more evaluation measures.
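In practice, "fully automatic" usually means scoring a ranked run against relevance judgments with a tool such as trec_eval. The sketch below shows the idea in miniature using mean average precision; the dictionaries, topic IDs, and document IDs are hypothetical placeholders, not part of the assignment.

```python
# A minimal sketch of a fully automatic batch evaluation, assuming a
# Cranfield-style setup: a dict mapping each topic ID to its relevant
# documents (qrels) and a dict mapping each topic ID to a system's
# ranked results (run). All names and IDs here are illustrative.

def average_precision(ranked_docs, relevant):
    """Average precision for one topic: mean of precision@k at each
    rank k where a relevant document appears."""
    hits, precision_sum = 0, 0.0
    for k, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(run, qrels):
    """MAP over all judged topics; topics missing from the run score 0."""
    scores = [average_precision(run.get(topic, []), judged)
              for topic, judged in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0

if __name__ == "__main__":
    # Toy data standing in for real qrels and run files.
    qrels = {"T1": {"d2", "d5"}, "T2": {"d1"}}
    run = {"T1": ["d2", "d3", "d5"], "T2": ["d4", "d1"]}
    print(f"MAP = {mean_average_precision(run, qrels):.3f}")
```

Because everything needed (topics, judgments, a scoring function) is machine-readable, the whole evaluation can be rerun on every new system variant with no human in the loop.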
In general, you can do this either by adopting someone else's evaluation design and/or evaluation resources, or by creating your own. For example, there are existing test collections for expert finding in both the TREC Enterprise Track and at the University of Amsterdam (see http://staff.science.uva.nl/~kbalog/). The TREC Enterprise Track also has a mailing list search task. In some cases you may want to draw inspiration from what others did; in other cases you may want to go beyond that and actually use an existing test collection rather than creating your own.
The best way to see what an evaluation design looks like is to read a TREC, CLEF, or NTCIR track overview paper. Here are a couple of examples:
One thing you might want to think about is how you plan to divide your evaluation resources to support both formative and summative evaluation. You need some evaluation data to support development, but testing on your training set is a cardinal sin. So you'll want to divide your available data in some way that lets you later demonstrate your (hopefully) excellent results on a previously unseen part of the test collection; one simple approach is sketched below.
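A topic-level split is often the simplest option: tune freely on one half, and score the held-out half exactly once at the end. Here is a minimal sketch, assuming your topics can be listed by ID; the seed, the topic IDs, and the 50/50 proportion are arbitrary placeholders, not requirements.

```python
# A minimal sketch of holding out part of a test collection, assuming
# topics are identified by string IDs. Seed and split size are
# illustrative choices only.
import random

topics = [f"T{i:02d}" for i in range(1, 31)]  # placeholder topic IDs

rng = random.Random(42)   # fixed seed so the split is reproducible
shuffled = topics[:]
rng.shuffle(shuffled)

cut = len(shuffled) // 2
dev_topics = sorted(shuffled[:cut])    # use freely during development
test_topics = sorted(shuffled[cut:])   # touch only once, for the final run

print(len(dev_topics), "development topics;", len(test_topics), "held out")
```

Whatever proportions you choose, the key discipline is the same: freeze the held-out portion before development starts and resist the temptation to peek at it.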
This assignment will be graded, but (as with all the pieces) the overall project grade will be assigned holistically rather than being determined by a fixed formula.
Acknowledgement: Doug Oard (LBSC 796/INFM 718R, Spring 2011).