Browsing by Author "Cabeza, David"
Showing 1 - 2 of 2
Item A qualitative and quantitative comparison between Web scraping and API methods for Twitter credibility analysis (Emerald, 2021) Dongo, Irvin; Cardinale, Yudith; Aguilera, Ana; Martinez, Fabiola; Quintero, Yuni; Robayo, German; Cabeza, David
Purpose – This paper aims to perform an exhaustive review of relevant and recent related studies, which reveals that both extraction methods are currently used to analyze credibility on Twitter. Thus, there is clear evidence of the need for different options to extract different data for this purpose. Nevertheless, none of these studies performs a comparative evaluation of both extraction techniques. Moreover, the authors extend a previous comparison, which uses a recently developed framework that offers both data-extraction alternatives and implements a previously proposed credibility model, by adding a qualitative evaluation and a Twitter Application Programming Interface (API) performance analysis from different locations.
Design/methodology/approach – As one of the most popular social platforms, Twitter has been the focus of recent research aimed at analyzing the credibility of the shared information. To do so, several proposals use either the Twitter API or Web scraping to extract the data for the analysis. Qualitative and quantitative evaluations are performed to discover the advantages and disadvantages of both extraction methods.
Findings – The study demonstrates the differences in accuracy and efficiency between the two extraction methods and draws attention to further problems in this area that must be addressed to pursue true transparency and legitimacy of information on the Web.
Originality/value – Results show that some Twitter attributes cannot be retrieved by Web scraping. Both methods produce identical credibility values when a robust normalization process is applied to the text (i.e. the tweet). Moreover, in terms of time performance, Web scraping is faster than the Twitter API and more flexible in terms of obtaining data; however, Web scraping is very sensitive to website changes. Additionally, the response time of the Twitter API is proportional to the distance from the central server in San Francisco.
Item T-CREo: A Twitter Credibility Analysis Framework (IEEE, 2021) Cardinale, Yudith; Dongo, Irvin; Robayo, Germán; Cabeza, David; Aguilera, Ana; Medina, Sergio
Social media and other Internet platforms are commonly used to communicate and generate information. In many cases, this information is not validated, which makes it difficult to use and analyze. Although there are studies focused on information validation, most of them are limited to specific scenarios. Thus, a more general and flexible architecture is needed, one that can be adapted to user/developer requirements and is independent of the social media platform. We propose a framework to perform credibility analysis of social media posts automatically and in real time, based on three levels of credibility: Text, User, and Social. The general architecture of our framework is composed of a front-end (a light client proposed as a web plug-in for any browser), a back-end that implements the logic of the credibility model, and a third-party services module. We develop a first version of the proposed system, called T-CREo (Twitter CREdibility analysis framework), and evaluate its performance and scalability. In summary, the main contributions of this work are: the general framework design; a credibility model adaptable to various social networks, integrated into the framework; and T-CREo as a proof of concept that demonstrates the framework's applicability and allows its performance to be evaluated for unstructured information sources. Results show that T-CREo qualifies as a highly scalable real-time service.
Future work includes improving the T-CREo implementation to provide a robust architecture for developing third-party applications, as well as extending the credibility model to consider bot detection, semantic analysis, and multimedia analysis.
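The T-CREo abstract names three credibility levels (Text, User, Social) but does not state how they are combined into one score. As an illustration only, the Python sketch here assumes a normalized weighted average of per-level scores on a 0 to 100 scale. The class `CredibilityScores`, the function `combined_credibility`, the score scale, and the default weights are all hypothetical and are not taken from T-CREo or its credibility model.

```python
from dataclasses import dataclass


@dataclass
class CredibilityScores:
    """Hypothetical per-level credibility scores, each assumed to lie in [0, 100]."""
    text: float
    user: float
    social: float


def combined_credibility(scores: CredibilityScores,
                         w_text: float = 0.4,
                         w_user: float = 0.3,
                         w_social: float = 0.3) -> float:
    """Combine the three levels with a weighted average.

    ASSUMPTION: the weighted-average formula and the default weights are
    illustrative choices, not the published T-CREo model.
    """
    weights = (w_text, w_user, w_social)
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")

    # Reject scores outside the assumed 0-100 scale.
    for name, value in (("text", scores.text),
                        ("user", scores.user),
                        ("social", scores.social)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} score out of range: {value}")

    weighted = (w_text * scores.text
                + w_user * scores.user
                + w_social * scores.social)
    # Dividing by the weight sum keeps the result on the 0-100 scale
    # even when the weights do not add up to exactly 1.
    return weighted / total
```

For example, `combined_credibility(CredibilityScores(text=80, user=60, social=50))` returns a value close to 65 with the default weights. Normalizing by the weight sum lets a user or developer re-weight the levels (for instance, ignoring the Social level on a platform that exposes no social signals) without leaving the 0 to 100 range.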