Escudo de la República de Colombia
Sistema Nacional de Biliotecas - Repositorio Institucional Universidad Nacional de Colombia Biblioteca Digital - Repositorio Institucional UN Sistema Nacional de Bibliotecas UN

Automatic authorship analysis using Deep neural networks

Sierra Loaiza, Sebastian Ernesto (2018) Automatic authorship analysis using Deep neural networks. Maestría thesis, Universidad Nacional de Colombia - Sede Bogotá.

Texto completo

[img] PDF - Versión Aceptada
Available under License Creative Commons Attribution Non-commercial No Derivatives.

2MB

Url relacionadas:

Resumen

Authorship analysis helps to study the characteristics that distinguish how two different persons write. Writing style can be extracted in several ways, like using bag of words strategies or handcrafted features. However, with the growing of Internet, we have been able to witness an increase in the amount of user generated data in social networks like Facebook or Twitter. There is an increasing need in generating automatic methods capable of analyzing the style of a document for tasks like: determining the age of the author, determining the gender of the author, determining the authorship of the document given a set of possible authors, etc. Previous tasks are better known as author profiling and authorship attribution. Although capturing the style of an author can be a challenging task, in this thesis we explore representation learning strategies, in order to take advantage of the large amount of data generated by social media. In this thesis, we learned proper representations for the text inputs that were able to learn such patterns that are only distinguishable to an author (authorship attribution) or a social group of authors (author profiling). Proposed methods were compared using different publicly available datasets using social media data. Both author profiling and authorship attribution tasks are addressed using representation learning techniques such as convolutional neural networks and gated multimodal units. Our unimodal author profiling approach was submitted to the profiling shared task of the laboratory on digital forensics and stylometry(PAN). For authorship attribution, we proposed a convolutional neural network using character n-grams as input. We found that our approach outperformed standard attribution based methods as well as word based convolutional neural networks. For the author profiling task, we proposed one convolutional neural network for unimodal author profiling and adapted a gated multimodal unit for multimodal author profiling. The multimodal nature of user generated content consists of a scenario where the social group of an author can be determined not only using his/her written texts but using also the images that the user shared across the social networks. Gated multimodal units outperformed standard information fusion strategies: early and late fusion.

Tipo de documento:Tesis/trabajos de grado - Thesis (Maestría)
Colaborador / Asesor:González Osorio, Fabio Augusto
Información adicional:Magíster en Ingeniería de Sistemas y Computación Línea de Investigación: Machine Learning.
Palabras clave:Machine learning, Supervised Learning, Representation Learning, Automatic Authorship Analysis, Authorship Attribution, Author Profiling, Multimodal Author Profiling
Temática:0 Generalidades / Computer science, information & general works
6 Tecnología (ciencias aplicadas) / Technology > 62 Ingeniería y operaciones afines / Engineering
Unidad administrativa:Sede Bogotá > Facultad de Ingeniería > Departamento de Ingeniería de Sistemas e Industrial > Ingeniería de Sistemas
Código ID:73495
Enviado por : Sr Sebastian Ernesto Sierra Loaiza
Enviado el día :03 Septiembre 2019 15:21
Ultima modificación:03 Septiembre 2019 15:39
Ultima modificación:03 Septiembre 2019 15:39
Exportar:Clic aquí
Estadísticas:Clic aquí
Compartir:

Solamente administradores del repositorio: página de control del ítem

Vicerrectoría de Investigación: Número uno en investigación
Indexado por:
Indexado por Scholar Google WorldCat DRIVER Metabiblioteca OAIster BASE BDCOL Registry of Open Access Repositories SNAAC Red de repositorios latinoamericanos eprints Open archives La referencia Tesis latinoamericanas OpenDOAR CLACSO
Este sitio web se ve mejor en Firefox