webigator is a tool for filtering and aggregating data from the web. It was developed based on our experience after the Great East Japan Earthquake of 2011, when a large amount of useful information on the web was drowned out by an even larger amount of irrelevant information. The tool performs a keyword search over text data (such as the Twitter stream) and then uses machine learning techniques to filter out irrelevant information. It comes with a web interface that can be used by multiple users at the same time, allowing them to collaboratively construct lists of useful information from these results. You can read about it in more detail in this paper:

A Framework and Tool for Collaborative Extraction of Reliable Information
Graham Neubig, Shinsuke Mori, Masahiro Mizukami. Workshop on Language Processing and Crisis Information (LPCI). 2013.
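As a rough illustration of the two-stage idea described above (a keyword search, then a learned filter), the Python sketch below keeps only texts that contain one of the search keywords and whose score under a simple linear model exceeds a threshold. This is not webigator's actual implementation: the keywords, the weights in WEIGHTS and the bias are made-up placeholders, whereas a real system would learn the weights from labeled examples.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical search keywords and linear-model parameters, for illustration
# only. In practice the weights would be learned from labeled examples.
KEYWORDS: List[str] = ["earthquake", "shelter"]
WEIGHTS: Dict[str, float] = {"shelter": 1.5, "water": 1.0, "lol": -2.0}
BIAS: float = -0.5


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def keyword_match(text: str, keywords: Iterable[str]) -> bool:
    """Stage 1: keep a text only if it contains at least one keyword."""
    lowered = text.lower()
    return any(k.lower() in lowered for k in keywords)


def score(text: str) -> float:
    """Stage 2: linear score = bias + sum of the weights of its tokens."""
    return BIAS + sum(WEIGHTS.get(tok, 0.0) for tok in tokenize(text))


def filter_stream(texts: Iterable[str], keywords: Iterable[str],
                  threshold: float = 0.0) -> List[Tuple[float, str]]:
    """Return (score, text) pairs that pass both stages, highest score first."""
    keywords = list(keywords)  # iterated once per text, so materialize it
    kept = []
    for text in texts:
        if keyword_match(text, keywords):
            s = score(text)
            if s > threshold:
                kept.append((s, text))
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept


if __name__ == "__main__":
    stream = [
        "Earthquake shelter open, water available",
        "earthquake lol",
        "nice weather today",
    ]
    for s, text in filter_stream(stream, KEYWORDS):
        print(f"{s:.1f}\t{text}")
```

With this toy stream only the first text survives: the second matches a keyword but gets a negative score, and the third never matches a keyword at all.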


Source Code: @github

The code of webigator is distributed under the Eclipse Public License v1.0, and can be freely redistributed under the terms of this license.


You can see a (probably) working demo of Webigator here: webigator demo.

Using Webigator

Setting up the Server

This section is only needed if you are setting up your own server. If you are using a server that someone else has set up, you can skip it.

The server runs on Linux, and will probably also work on Mac or Cygwin. Before building it, you need to install the Boost and XML-RPC (xmlrpc-c) libraries. This can be done with your package manager; for example, on Ubuntu:

sudo apt-get install libboost-all-dev libxmlrpc-c3-dev

You should also install the XML::RPC package for Perl:

cpan XML::RPC
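The XML-RPC libraries are presumably needed because the server and its Perl client scripts communicate over XML-RPC. For readers unfamiliar with the protocol, the Python sketch below uses the standard library's xmlrpc.client module to show what a method call looks like on the wire. The method name webigator.search and its arguments are hypothetical placeholders, not webigator's actual API.

```python
import xmlrpc.client
from typing import Any, Tuple

# Hypothetical method name, for illustration only; consult the server
# source for the methods it actually exposes.
METHOD = "webigator.search"


def encode_request(method: str, *params: Any) -> str:
    """Build the XML body that an XML-RPC client POSTs for a method call."""
    return xmlrpc.client.dumps(params, methodname=method)


def decode_request(body: str) -> Tuple[tuple, str]:
    """Parse an XML-RPC request body back into (params, method name)."""
    return xmlrpc.client.loads(body)


if __name__ == "__main__":
    body = encode_request(METHOD, "earthquake", 10)
    print(body)
    print(decode_request(body))
```

Against a running server, the same call would normally be made through xmlrpc.client.ServerProxy, pointed at whatever URL and port the server is configured to listen on.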

Next, generate the build scripts and build the server:

autoreconf -i
./configure
make

If this works properly, you should be able to run:

src/bin/webigator --help
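If you rebuild often, the steps can be scripted. The Python sketch below runs a list of commands in order and reports the first one that fails. The ./configure and make entries in BUILD_STEPS are an assumption based on the usual autotools workflow that follows autoreconf -i; adjust them to match the repository.

```python
import subprocess
from typing import List, Optional, Sequence

# Assumed build sequence: "autoreconf -i" comes from the instructions above;
# "./configure" and "make" are the usual autotools follow-ups (an assumption).
BUILD_STEPS: List[List[str]] = [
    ["autoreconf", "-i"],
    ["./configure"],
    ["make"],
]


def run_steps(steps: Sequence[Sequence[str]], cwd: str = ".") -> Optional[int]:
    """Run each command in order inside cwd.

    Returns the index of the first step that could not be started or exited
    with a non-zero status, or None if every step succeeded.
    """
    for index, cmd in enumerate(steps):
        try:
            result = subprocess.run(list(cmd), cwd=cwd)
        except OSError:
            # The command was not found or is not executable.
            return index
        if result.returncode != 0:
            return index
    return None
```

For example, run_steps(BUILD_STEPS, cwd="path/to/webigator") (a placeholder path) returns None when every step succeeds, after which src/bin/webigator --help can be tried as above.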

Setting up a task

Instructions are under construction.



If you are interested in participating in the webigator project, particularly in tackling any of the challenges below, please send an email to neubig at gmail dot com.


There are a bunch of possible improvements that would be quite interesting and useful: