Automatic readability improvements of texts using Simple English Wikipedia

Learning how to write through Simple Wikipedia contributors

The aim of this project is to investigate how to use Simple English Wikipedia to learn readability guidelines.

The idea is to retrieve revisions whose comment explicitly states the contributor's intent to improve the readability of the revised article.

Those changes are compiled into a resource, which is then used to learn readability guidelines.
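The revision-selection step described above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the `Revision` class, the `KEYWORDS` list, and both method names are hypothetical, and the real criteria used to detect readability intent may well differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class RevisionFilter {

    /** Hypothetical minimal view of a revision: its id and the editor's comment. */
    static final class Revision {
        final long id;
        final String comment;

        Revision(long id, String comment) {
            this.id = id;
            this.comment = comment;
        }
    }

    // Assumed keyword fragments; the project's real selection criteria are not stated in this README.
    static final List<String> KEYWORDS =
            Arrays.asList("simplif", "simpler", "readab", "easier to read");

    /** Returns true when the comment contains one of the assumed keyword fragments. */
    static boolean signalsReadabilityIntent(String comment) {
        if (comment == null) {
            return false;
        }
        String lower = comment.toLowerCase(Locale.ROOT);
        return KEYWORDS.stream().anyMatch(lower::contains);
    }

    /** Keeps only the revisions whose comment signals an intent to improve readability. */
    static List<Revision> selectReadabilityRevisions(List<Revision> revisions) {
        return revisions.stream()
                .filter(r -> signalsReadabilityIntent(r.comment))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Revision> revisions = Arrays.asList(
                new Revision(1L, "Simplify wording of the introduction"),
                new Revision(2L, "fix typo"),
                new Revision(3L, "make this section easier to read"));
        for (Revision r : selectReadabilityRevisions(revisions)) {
            System.out.println(r.id + ": " + r.comment);
        }
    }
}
```

Plain substring matching is chosen here only for brevity; it will miss paraphrased intents and may match unrelated comments, so it should be read as a starting point rather than a faithful description of the resource-building pipeline.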

Resource creation and usage pipelines

The java folder contains a Maven 3 project to build and use such a resource. The program is split into several sub-modules. Here is a quick overview to help you find your way around:

  • uima-core, utils, corpus and model-* folders are internal dependencies that keep the Maven build simple. They are not standalone applications;

  • mediawiki-importer imports a MediaWiki dump into a PostgreSQL database that the other programs can use;

  • uima-corpus-creator creates a corpus from the database data;

  • uima-scorer scores this corpus. At this point the scoring does not yet give good results;

  • uima-server is a way to expose the corpus to other applications;

  • uima-evaluator is a work in progress.

Play server

The server folder contains a Play Framework web service that exposes the corpus through JSON requests to other applications.

Web client

The client folder contains a React application to test the system on the web.

A demo is running at http://readability.crydee.eu (the first request is very slow: the demo is hosted on Heroku and the application needs time to start).