Sistema Nacional de Bibliotecas (National Library System) - Universidad Nacional de Colombia Digital Library - Institutional Repository (Repositorio Institucional UN)

A Recurrent Neural Network approach for whole genome bacteria classification

Lugo Martínez, Luis Eduardo (2018) A Recurrent Neural Network approach for whole genome bacteria classification. Master's thesis, Universidad Nacional de Colombia - Sede Bogotá.

Full text

PDF - Published version
Available under License Creative Commons Attribution.

1MB

Abstract

The classification of bacteria plays an essential role in multiple areas of research, including experimental biology, the food and water industries, pathology, microbiology, and evolutionary studies. Although several classification methodologies exist, such as mass spectrometry, single-nucleotide polymorphisms, microscopic morphology, and neural network approaches, a transition to a taxonomy based on whole genome sequences is already underway. Next Generation Sequencing supports this transition by producing DNA sequence data efficiently. However, the rate at which DNA sequence data are generated and the high dimensionality of such data call for faster computational methods. Machine learning, an area of artificial intelligence, can analyze high-dimensional data in a systematic, fast, and efficient way. Therefore, we propose a sequential deep learning model for bacteria classification. The proposed neural network exploits the vast amounts of information generated by Next Generation Sequencing to learn a classification model for whole genome bacterial sequences. A distributed representation based on k-mers with k ∈ {3, 4, 5} provided an efficient encoding for the bacterial sequences. The classification model relies on a bidirectional recurrent neural network architecture. It achieves an accuracy of 0.99455 ± 0.00281 for 14 species, 0.95031 ± 0.00469 for 48 species, and 0.89107 ± 0.00392 for 111 species. In validation, the bidirectional recurrent neural network outperformed other classification approaches, such as Naive Bayes and a feedforward neural network. The proposed model provides an automated identification method: it infers the species of bacterial whole genome sequences and does not require any manual feature extraction.
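The abstract does not spell out how the k-mer encoding is built. A minimal sketch, assuming overlapping k-mers with stride 1 and a vocabulary collected from the training sequences (the function names `kmers`, `build_vocab`, and `encode` are hypothetical, not taken from the thesis), shows how a genome string becomes the integer tokens that an embedding layer would then map to a distributed representation:

```python
from typing import Dict, Iterable, List


def kmers(sequence: str, k: int) -> List[str]:
    """Split a DNA sequence into overlapping k-mers (sliding window, stride 1)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(sequences: Iterable[str], k: int) -> Dict[str, int]:
    """Assign an integer id to every distinct k-mer seen in the training sequences.

    Id 0 is reserved for k-mers never seen in training (e.g. ones containing 'N').
    """
    distinct = sorted({km for seq in sequences for km in kmers(seq, k)})
    return {km: idx for idx, km in enumerate(distinct, start=1)}


def encode(sequence: str, k: int, vocab: Dict[str, int]) -> List[int]:
    """Turn a sequence into the list of k-mer ids an embedding layer would consume."""
    return [vocab.get(km, 0) for km in kmers(sequence, k)]


if __name__ == "__main__":
    train = ["ACGTACGGT", "TTGACCA"]  # toy sequences, not real genomes
    for k in (3, 4, 5):  # the k values reported in the abstract
        vocab = build_vocab(train, k)
        print(k, len(vocab), encode("ACGTNACG", k, vocab))
```

With stride 1, a genome of length L yields L - k + 1 tokens, while the vocabulary is bounded by the 4^k possible k-mers over {A, C, G, T} (64, 256, and 1024 for k = 3, 4, 5), which keeps the embedding table small even for whole genomes.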
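The abstract names a bidirectional recurrent neural network but not its cell type, layer sizes, or training setup. The following is only an illustration under stated assumptions: it uses a plain tanh (Elman) RNN cell where the thesis may use LSTM or GRU cells, randomly initialised and untrained weights, and a hypothetical `BiRNNClassifier` class. It shows just the forward pass: embed the k-mer ids, read them left to right and right to left, concatenate the two final hidden states, and apply a softmax over species.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Params = Tuple[Matrix, Matrix, List[float]]  # (W_x, W_h, b) for one direction


def rnn_final_state(xs: Sequence[List[float]], params: Params) -> List[float]:
    """Run a plain tanh RNN over xs and return the last hidden state (zeros if xs is empty)."""
    W_x, W_h, b = params
    h = [0.0] * len(b)
    for x in xs:
        # h_t[j] = tanh(W_x[j] . x_t + W_h[j] . h_{t-1} + b[j]); the list is built from the old h
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_x[j], x))
                + sum(w * hi for w, hi in zip(W_h[j], h))
                + b[j]
            )
            for j in range(len(b))
        ]
    return h


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class BiRNNClassifier:
    """Hypothetical, untrained: embedding -> forward + backward RNN -> linear -> softmax."""

    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int,
                 num_classes: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.embedding = mat(vocab_size, embed_dim)  # row i = vector for k-mer id i
        self.forward_rnn: Params = (mat(hidden_dim, embed_dim),
                                    mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
        self.backward_rnn: Params = (mat(hidden_dim, embed_dim),
                                     mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
        self.W_out = mat(num_classes, 2 * hidden_dim)
        self.b_out = [0.0] * num_classes

    def predict_proba(self, token_ids: Sequence[int]) -> List[float]:
        xs = [self.embedding[t] for t in token_ids]
        h_fwd = rnn_final_state(xs, self.forward_rnn)         # reads left to right
        h_bwd = rnn_final_state(xs[::-1], self.backward_rnn)  # reads right to left
        h = h_fwd + h_bwd                                     # concatenate both directions
        logits = [sum(w * hi for w, hi in zip(row, h)) + b
                  for row, b in zip(self.W_out, self.b_out)]
        return softmax(logits)


if __name__ == "__main__":
    model = BiRNNClassifier(vocab_size=1025, embed_dim=8, hidden_dim=16, num_classes=14)
    probs = model.predict_proba([5, 17, 300, 0, 42])
    print(max(range(14), key=probs.__getitem__))  # index of the most likely species
```

Training (for example, cross-entropy minimised by gradient descent) is omitted here; in practice a framework layer such as Keras `Bidirectional` or PyTorch `nn.LSTM(bidirectional=True)` would supply the recurrent part and its gradients.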

Document type: Thesis/degree work - Thesis (Master's)
Contributor / Advisor: Barreto, Emiliano
Additional information: Master's degree in Systems and Computing Engineering (Magíster en Ingeniería de Sistemas y Computación). Research line: Bioinformatics and health.
Keywords: Recurrent neural network, Bacteria identification, Whole genome sequence
Subject: 0 Computer science, information & general works
5 Science
6 Technology (applied sciences)
6 Technology (applied sciences) > 62 Engineering
Administrative unit: Sede Bogotá (Bogotá campus) > Faculty of Engineering > Department of Systems and Industrial Engineering
ID: 69758
Submitted by: Luis Lugo
Submitted on: 11 Oct 2018 12:46
Last modified: 11 Oct 2018 12:49
