Exploration of Hidden Research Directions in Oil and Gas Industry via Full Text Analysis of OnePetro Digital Library

Fedor Krasnov, Oleg Ushmaev


This study demonstrates what modern approaches to extracting information from a text corpus can offer. Its purpose is to answer the following business questions through a scientific analysis of text: Which important areas of research have developed over the past year? What is new in oil and gas technologies?

The authors applied topic modeling to solve this problem, with the quality of the topic model as the focus of the research. The paper investigates how the perplexity score and the sparsity scores of the matrices Θ and Φ behave as the topic model is regularized.
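To make the two quality metrics concrete, here is a minimal NumPy sketch of how perplexity and matrix sparsity are commonly computed for a topic model. It assumes the usual convention that Φ holds p(word | topic) with shape (words, topics) and Θ holds p(topic | document) with shape (topics, documents). The function names, the count matrix `n_dw`, and the zero threshold `eps` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def perplexity(n_dw, phi, theta):
    """Perplexity of a topic model on a document-word count matrix.

    Assumed shapes (illustrative convention, not from the paper):
      n_dw  : (D, W) word counts per document
      phi   : (W, T) columns are p(word | topic)
      theta : (T, D) columns are p(topic | document)
    """
    n_dw = np.asarray(n_dw, dtype=float)
    p_wd = (phi @ theta).T              # (D, W): modelled p(word | document)
    observed = n_dw > 0                 # only observed words enter the likelihood
    log_lik = np.sum(n_dw[observed] * np.log(p_wd[observed]))
    return float(np.exp(-log_lik / n_dw.sum()))


def sparsity(matrix, eps=1e-12):
    """Share of entries that are (numerically) zero."""
    return float(np.mean(np.asarray(matrix) <= eps))


# Tiny example: 2 documents, 4 words, 2 topics, everything uniform.
n_dw = np.ones((2, 4))
phi = np.full((4, 2), 0.25)
theta = np.full((2, 2), 0.5)
uniform_perplexity = perplexity(n_dw, phi, theta)
half_sparse = sparsity(np.array([[0.0, 1.0], [0.5, 0.0]]))
```

Lower perplexity means the model predicts the observed words better, while higher sparsity of Φ and Θ means each topic uses fewer words and each document fewer topics. If regularization zeroes out a probability for a word that actually occurs, this simple version returns an infinite perplexity.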

Applying additive regularization made it possible to separate the topics into main topics and noise topics, which significantly improved the interpretability of the topics.
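One common way additive regularization produces such a split is a smoothing/sparsing regularizer on Φ: a negative coefficient drives small word probabilities in main topics to zero, while a positive coefficient keeps noise topics smooth so they absorb general vocabulary. The abstract does not say which regularizers or coefficients were used, so the sketch below is an assumption-laden illustration of the regularized M-step only; `regularized_phi_update`, `tau_t`, and `beta_w` are hypothetical names.

```python
import numpy as np


def regularized_phi_update(n_wt, tau_t, beta_w=None):
    """One regularized M-step for Phi with a smoothing/sparsing regularizer.

    Illustrative sketch (coefficients are assumptions, not the paper's):
      n_wt   : (W, T) expected word-topic counts from the E-step
      tau_t  : (T,) per-topic coefficient; negative sparses, positive smooths
      beta_w : (W,) reference word distribution, uniform if omitted
    Update rule: phi_wt is proportional to max(n_wt + tau_t * beta_w, 0).
    """
    n_wt = np.asarray(n_wt, dtype=float)
    n_words = n_wt.shape[0]
    if beta_w is None:
        beta_w = np.full(n_words, 1.0 / n_words)
    shifted = np.maximum(n_wt + np.outer(beta_w, tau_t), 0.0)
    col_sums = shifted.sum(axis=0, keepdims=True)
    # Leave a topic at all zeros if regularization removed every word from it.
    return np.divide(shifted, col_sums,
                     out=np.zeros_like(shifted), where=col_sums > 0)


# Topic 0 is treated as a main topic (sparsed), topic 1 as a noise topic (smoothed).
n_wt = np.array([[5.0, 5.0],
                 [0.1, 0.1],
                 [4.9, 4.9]])
phi = regularized_phi_update(n_wt, tau_t=[-0.9, 3.0])
```

With identical counts in both columns, the rare second word is zeroed out in the sparsed topic but kept in the smoothed one, which is the kind of difference that makes main topics easier to read.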






ISSN: 2307-8162