Ongoing Projects

Past Projects

Given the level of development reached by linguistic studies and Computational Linguistics in our province, Santiago de Cuba, and the pressing need for computational tools to carry out language research and to process large amounts of information with greater scientific rigor, it is essential to continue developing new Natural Language Processing (NLP) systems.

This project will provide linguistics researchers across the country, as well as other NLP users, with four tools that will enable a significant qualitative leap in the study and analysis of language and in the processing of natural language information.

The main research lines of this project are the following:

  1. Classification: Development of new categorization and clustering algorithms that generate higher-quality clusters than current algorithms and can handle large, dynamic document collections. Conceptual description and summarization techniques will be used to improve the quality of the resulting clusters.

  2. Information Retrieval: Proposal of new retrieval techniques that help users find the documents they need. The algorithms must be able to process very large, dynamic datasets.

  3. Efficient processing of very large document collections: Indexing techniques and parallel algorithms will be proposed to efficiently process very large document collections. These parallel algorithms will support both classification and retrieval.
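As a rough illustration of how an index can support retrieval over a large collection, the following Python sketch builds an inverted index (term → set of document ids) by splitting the collection into shards, indexing each shard independently and merging the partial indexes, and then answers conjunctive (AND) queries by intersecting posting sets. This is a generic textbook technique, not the project's own method; the function names and the simple tokenizer are hypothetical. A ThreadPoolExecutor is used only to keep the sketch self-contained; in CPython, real CPU parallelism would require processes or a distributed framework.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and keep runs of word characters
    # (\w also matches accented letters such as those used in Spanish).
    return re.findall(r"\w+", text.lower())


def build_shard(shard: list[tuple[int, str]]) -> dict[str, set[int]]:
    # Index one shard independently: map each term to the ids containing it.
    partial: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in shard:
        for term in tokenize(text):
            partial[term].add(doc_id)
    return partial


def build_index_parallel(docs: dict[int, str], n_shards: int = 4) -> dict[str, set[int]]:
    # Split the collection round-robin into shards, index them concurrently,
    # then merge the partial posting sets into a single inverted index.
    items = list(docs.items())
    shards = [items[i::n_shards] for i in range(n_shards)]
    index: dict[str, set[int]] = defaultdict(set)
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        for partial in pool.map(build_shard, shards):
            for term, ids in partial.items():
                index[term] |= ids
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    # Conjunctive query: documents that contain every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the smallest posting set to keep intersections cheap.
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result
```

For example, with `docs = {1: "Clustering of news documents", 2: "Information retrieval over large document collections", 3: "Parallel clustering of document collections"}`, the query `search(build_index_parallel(docs, 2), "document collections")` returns `{2, 3}`. Because shards are indexed independently and merged with set unions, the result does not depend on the number of shards.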


Survey design and processing has gained relevance in recent years. When carried out by traditional means, it is complex, cumbersome and inaccurate, especially when the volume of information to be handled is large.

This project aims to study the problem of survey management in its different phases: design, implementation and results processing. The problem is also characterized from a computational viewpoint, and the final goal is to obtain a system that automates it.

Automatic news processing has gained relevance in recent years due to the rapid growth of online information available in electronic media and on the Internet.

The purpose of this project is to develop a computational system that automatically processes an online news stream coming from several press agencies. The system will cluster news items into events and summarize these events in order to build news bulletins and facilitate the work of information analysts.
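One common way to group a news stream into events is single-pass (incremental) clustering: each incoming item is compared with the centroid of every existing event and either joins the most similar one or opens a new event if no similarity reaches a threshold. The Python sketch below shows that idea with bag-of-words term-frequency vectors and cosine similarity, plus a very simple extractive "summary" that picks the item closest to the event centroid. It is an assumption-laden illustration, not the project's algorithm; the threshold of 0.3, the tokenizer, and the names `Event`, `cluster_stream` and `summarize` are all hypothetical, and a real system would at least remove stop words and weight terms (for example with TF-IDF).

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def vectorize(text: str) -> Counter:
    # Hypothetical bag-of-words representation: raw term frequencies.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


@dataclass
class Event:
    items: list[str] = field(default_factory=list)
    # Sum of member vectors; cosine ignores scale, so the sum points
    # in the same direction as the mean centroid.
    centroid: Counter = field(default_factory=Counter)

    def add(self, text: str, vec: Counter) -> None:
        self.items.append(text)
        self.centroid.update(vec)


def cluster_stream(stream, threshold: float = 0.3) -> list[Event]:
    # Single pass over the stream: each item is assigned exactly once.
    events: list[Event] = []
    for text in stream:
        vec = vectorize(text)
        best, best_sim = None, 0.0
        for ev in events:
            sim = cosine(vec, ev.centroid)
            if sim > best_sim:
                best, best_sim = ev, sim
        if best is not None and best_sim >= threshold:
            best.add(text, vec)
        else:
            new_event = Event()
            new_event.add(text, vec)
            events.append(new_event)
    return events


def summarize(event: Event) -> str:
    # Simplistic extractive summary: the item most similar to the centroid.
    return max(event.items, key=lambda t: cosine(vectorize(t), event.centroid))
```

Given the headlines "Hurricane approaches eastern Cuba coast", "Hurricane reaches eastern Cuba, heavy rains on the coast", "Stock markets fall in Asia" and "Asia stock markets fall again" in that order, `cluster_stream` produces two events of two items each. Single-pass clustering suits a stream because each item is processed once on arrival, at the cost of results that depend on arrival order.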

This virtual assistant will enable end users to obtain information about software applications developed by Datys. Users will interact with the assistant through a friendly web interface featuring an avatar, to which they can ask questions in natural language and obtain concrete answers. To this end, the system will rely on novel automatic and semi-automatic Natural Language Processing techniques and tools to store, access, present and update the information, thus minimizing the human effort devoted to maintaining both the content and the system as a whole.

Computing techniques have given rise to a number of projects that draw on knowledge from a variety of domains. In this project, we propose a system for scoring texts written in Spanish and some of its variants, while leaving open the possibility of handling other languages, which gives the system its multilingual character.

Systems of this type are scarce because of their high cost, and this one would in fact be the only one of its kind to analyze texts in variants of Spanish.

The system aims to score the spelling and writing style of elementary, basic secondary and pre-university students of the Cuban Ministry of Education. It has a client-server architecture based on Web technology. It uses the analysis provided by the Natural Language Processing Toolkit developed by DATYS-SC and incorporates new linguistic indicators proposed by experts from the Center for Applied Linguistics of Santiago de Cuba, which will in turn enable comprehensive text scoring.

The system allows students to self-assess their writing skills. It offers suggestions for correcting the deficiencies it detects and presents practice exercises. Teachers are given statistics on their students' performance.

We intend to deploy this system in the computer labs of schools belonging to the Cuban Ministry of Education, first at the regional level and later nationwide. Using the Evaluator, it will be possible to obtain statistics on the language skills of students at these levels. It will also encourage the learning and practice of the students' mother tongue.

Contact Details

Telephone: +53-(22)-644225
Email: info@cerpamid.co.cu
Chat: www.cerpamid.co.cu/Chat

Universidad de Oriente,
Patricio Lumumba s/n,
Santiago de Cuba 90500,
Cuba