About


What is it?
Conta-me Histórias (Tell Me Stories) is an online tool that lets users automatically generate temporal summaries of news articles preserved by the Portuguese Web Archive. Over the last decade, the volume of online content has grown rapidly, posing new challenges for anyone trying to understand a given event. This growth, together with media bias, fake news and filter bubbles, has made information access and transparency harder. For instance, following the media coverage of long-lasting events such as wars, migration or economic crises can often be confusing and demanding. One possible solution is to use timelines to support storytelling, organizing the different phases of complex events. In this context, we invite students, journalists and researchers to explore our solution in this beta application.



How does it work?

The Arquivo.pt project archives the web periodically. It collects and stores entire websites, processes the data to make it searchable, and provides a full-text search service that enables the retrieval of past versions of a site.

To showcase the data archived by the Portuguese Web Archive (http://arquivo.pt), we show the user the most important excerpts (namely titles) of a topic over time. To select the best news titles we rely on YAKE!, a keyword extractor designed by our team that received the Best Short Paper award at the 40th European Conference on Information Retrieval (ECIR 2018).
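As a rough illustration of the idea of showing the most important title per period, the following sketch groups headlines by year and keeps the highest-scoring one in each year. It does not use YAKE!: the scoring function is a simple word-frequency stand-in, and the names `score_title`, `build_timeline` and the sample headlines are hypothetical, introduced only for this example.

```python
from collections import Counter, defaultdict

def score_title(title, term_counts):
    """Stand-in relevance score (NOT YAKE!): mean corpus frequency of the title's words."""
    words = title.lower().split()
    if not words:
        return 0.0
    return sum(term_counts[w] for w in words) / len(words)

def build_timeline(headlines):
    """Take (year, title) pairs and return {year: best-scoring title}."""
    headlines = list(headlines)
    # Count how often each word appears across all titles.
    term_counts = Counter(w for _, t in headlines for w in t.lower().split())
    # Group the titles by year.
    by_year = defaultdict(list)
    for year, title in headlines:
        by_year[year].append(title)
    # Keep the highest-scoring title of each year, in chronological order.
    return {
        year: max(titles, key=lambda t: score_title(t, term_counts))
        for year, titles in sorted(by_year.items())
    }

# Hypothetical sample data, for illustration only.
SAMPLE = [
    (2010, "economic crisis deepens"),
    (2010, "football results today"),
    (2011, "government answers economic crisis"),
    (2011, "warm weather this weekend"),
]

timeline = build_timeline(SAMPLE)
for year, title in timeline.items():
    print(year, title)
```

In this toy data the words shared across years ("economic", "crisis") raise the score of the titles that contain them, so those titles represent each year. A real keyword extractor would replace `score_title`.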


Additionally, we use SentiLex-PT01, a sentiment lexicon designed specifically for Portuguese, to analyze the polarity of headlines.
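Lexicon-based polarity analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the tiny `TOY_LEXICON` below is hypothetical and does not reproduce SentiLex-PT01 entries or its file format, and the function name `headline_polarity` is invented for this example.

```python
import re

# Hypothetical toy lexicon mapping words to polarity (+1 positive, -1 negative).
# These entries are illustrative only and are not taken from SentiLex-PT01.
TOY_LEXICON = {
    "sucesso": 1,
    "triunfo": 1,
    "crise": -1,
    "derrota": -1,
}

def headline_polarity(headline, lexicon=TOY_LEXICON):
    """Sum the polarity of known words and map the total to a label."""
    tokens = re.findall(r"\w+", headline.lower())
    total = sum(lexicon.get(token, 0) for token in tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

print(headline_polarity("Crise agrava derrota do governo"))
print(headline_polarity("Sucesso da equipa"))
print(headline_polarity("Tempo ameno"))
```

Words missing from the lexicon contribute nothing, so a headline with no known words is labeled neutral.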


Finally, we use PAMPO, a named entity recognition tool for Portuguese, to detect a list of relevant entities related to the query.



Contributions

We present a web application that allows users to generate temporal summaries of large news data sets. One of the main goals of this work is to draw public attention to this promising research area. In the era of post-truth and fake news, web and news archive initiatives are important contributions to preserving history. In this context, our project can be seen as an additional tool that helps users better explore this kind of data.



Origins of the project name

The name ‘Conta-me Histórias’ is a reference to a popular song by Xutos & Pontapés, one of the most important Portuguese rock bands.

Design, video and presentation: Livia Stroschoen Pinent


News sources

In this project we make use of 24 Portuguese news sources.




References

Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., Jatowt, A. (2018). A Text Feature Based Automatic Keyword Extraction Method for Single Documents. In Proceedings of the 40th European Conference on Information Retrieval (ECIR'18). Grenoble, France, March 26-29, pp. 684-691.

Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., Jatowt, A. (2018). YAKE! Collection-independent Automatic Keyword Extractor. In Proceedings of the 40th European Conference on Information Retrieval (ECIR'18). Grenoble, France, March 26-29, pp. 806-810.

Silva, M., Carvalho, P., Costa, C., Sarmento, L. (2010). Automatic Expansion of a Social Judgment Lexicon for Sentiment Analysis. Technical Report TR 10-08. University of Lisbon, Faculty of Sciences, LASIGE, December 2010. doi: 10455/6694

Rocha, C., Jorge, A., Sionara, R., Brito, P., Pimenta, C., Rezende, S. (2016). PAMPO: Using Pattern Matching and POS-Tagging for Effective Named Entities Recognition in Portuguese.

Gomes, D., Cruz, D., Miranda, J., Costa, M., Fontes, S. (2013). Acquiring and Providing Access to Historical Web Collections. 10th International Conference on Preservation of Digital Objects.


Institutions