I, Oliver Bonham-Carter 👋

Assistant Professor in Computer Science, Allegheny College

BeagleTM

Abstract

Investigators in bioinformatics are often confronted with the difficult task of connecting ideas that are scattered across the literature using robust keyword searches. It is customary to identify only a few keywords in a research article to facilitate search algorithms, and this is usually done in the absence of a general approach that would index all possible keywords describing an article’s characteristic attributes. Based on only a handful of keywords, articles are therefore prioritized by search algorithms that point investigators to what seem to be subsets of the available knowledge. In addition, many articles escape search algorithms because their keywords were vague or have become unfashionable terms. In this case, the article, as well as the knowledge it contains, may be lost to the community. Owing to the growing size of the literature, we introduce a text mining method and tool, BeagleTM, for knowledge harvesting from papers in a literature corpus without the use of article meta-data. Unlike other text mining tools that only highlight found keywords in articles, our method allows users to visually ascertain which keywords have been featured together with others in peer-reviewed studies. Drawing from an arbitrarily sized corpus, BeagleTM creates visual networks describing interrelationships between user-defined terms to facilitate the discovery of connected or parallel studies. We report the effectiveness of BeagleTM by illustrating its ability to connect keywords for types of PTMs (post-translational modifications), stress factors, and disorders according to their relationships. These relationships facilitate the discovery of connected studies, which are often challenging to find because the keywords attached to relevant articles are frequently unrelated to this type of information.


Published Works


The general method.

General Idea

The flowchart.

Software Flow Chart

BeagleTM separates papers by topics.

Sorting papers by topic

Networks are produced by first parsing all available corpus articles for specific user-selected keywords. These results are used to create networks and to provide other details about the inter-connectivity and coverage of ideas across the corpus.
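The parsing step can be illustrated with a minimal Python sketch. This is not BeagleTM's actual code: the function names, the toy corpus, and the whole-word, case-insensitive matching rule are all assumptions made for illustration. It scans each article for the user-selected keywords and counts how many articles mention each pair of keywords together, which gives the weighted edges of a simple keyword network.

```python
import re
from collections import Counter
from itertools import combinations


def find_keywords(text, keywords):
    """Return the set of keywords found in text (whole words, case-insensitive)."""
    found = set()
    for kw in keywords:
        if re.search(r"\b" + re.escape(kw) + r"\b", text, flags=re.IGNORECASE):
            found.add(kw)
    return found


def keyword_cooccurrence(corpus, keywords):
    """Count, for each keyword pair, the number of articles mentioning both."""
    edges = Counter()
    for text in corpus.values():
        # Sort so that each pair is always stored in the same order.
        found = sorted(find_keywords(text, keywords))
        for a, b in combinations(found, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy corpus; real input would be full article texts.
corpus = {
    "article_1": "Phosphorylation increases under oxidative stress.",
    "article_2": "Oxidative stress is linked to diabetes; phosphorylation is altered.",
    "article_3": "Acetylation patterns in diabetes.",
}
keywords = ["phosphorylation", "acetylation", "oxidative stress", "diabetes"]

edges = keyword_cooccurrence(corpus, keywords)
for pair, count in sorted(edges.items()):
    print(pair, count)
```

Each edge weight is the number of articles in which both keywords appear, so heavier edges suggest terms that are studied together more often in this toy corpus.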

RelationshipNetwork

Relationship Networks provide information about which articles are connected to others in terms of overlapping ideas.
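As a rough sketch of this idea in Python, two articles can be linked whenever they share at least one user-selected keyword. The function name `relationship_edges`, the `min_shared` threshold, and the toy data are hypothetical and are not taken from BeagleTM itself.

```python
from itertools import combinations


def relationship_edges(article_keywords, min_shared=1):
    """Link pairs of articles that share at least min_shared keywords."""
    edges = {}
    for a, b in combinations(sorted(article_keywords), 2):
        shared = article_keywords[a] & article_keywords[b]
        if len(shared) >= min_shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical keyword hits per article (for example, from a prior parsing step).
article_keywords = {
    "article_1": {"phosphorylation", "oxidative stress"},
    "article_2": {"phosphorylation", "oxidative stress", "diabetes"},
    "article_3": {"acetylation", "diabetes"},
}

links = relationship_edges(article_keywords)
for (a, b), shared in links.items():
    print(a, "<->", b, "via", sorted(shared))
```

Raising `min_shared` keeps only the more strongly overlapping article pairs, which is one possible way to thin out a dense network.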

ConnectivityNetwork

Connectivity Networks allow investigators to determine how much coverage there is of a particular keyword (or set of keywords) across the literature, which helps in choosing effective keywords for searches.
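One simple way to quantify coverage, sketched here in Python under the assumption that coverage means the fraction of corpus articles mentioning a keyword, is shown below. The function name `keyword_coverage` and the toy data are hypothetical, and BeagleTM may measure coverage differently.

```python
def keyword_coverage(article_keywords, keywords):
    """Return the fraction of articles in which each keyword was found."""
    total = len(article_keywords)
    coverage = {}
    for kw in keywords:
        hits = sum(1 for found in article_keywords.values() if kw in found)
        coverage[kw] = hits / total if total else 0.0
    return coverage


# Hypothetical keyword hits per article.
article_keywords = {
    "article_1": {"phosphorylation", "oxidative stress"},
    "article_2": {"phosphorylation", "oxidative stress", "diabetes"},
    "article_3": {"acetylation", "diabetes"},
}
keywords = ["phosphorylation", "acetylation", "diabetes", "ubiquitination"]

coverage = keyword_coverage(article_keywords, keywords)
for kw, fraction in coverage.items():
    print(f"{kw}: {fraction:.2f}")
```

A keyword with zero coverage, such as "ubiquitination" in this toy data, would be a poor choice of search term for this corpus.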


Testimonial

Flint the Beagle!

Inspiration

This text mining tool was built with inspiration from me. How cool is that!? I approve this software.