New bioinformatics tool accurately tracks synthetic DNA

Posted: 1 March 2021 | Victoria Rees (Drug Target Review)

A team has demonstrated that their bioinformatics approach, PlasmidHawk, can analyse DNA sequences to identify the source of engineered plasmids.

New bioinformatics research by computer scientist Todd Treangen of Rice University, US, has focused on whether sequence alignment and pan-genome-based methods can outperform recent deep learning approaches when tracking the origin of synthetic genetic code.

“This is, in a sense, against the grain given that deep learning approaches have recently outperformed traditional approaches,” Treangen said. “My goal with this study is to start a conversation about how to combine the expertise of both domains to achieve further improvements for this important computational challenge.”

Treangen and his team at Rice introduced PlasmidHawk, a bioinformatics approach that analyses DNA sequences to help identify the source of engineered plasmids of interest.

“We show that a sequence alignment-based approach can outperform a convolutional neural network (CNN) deep learning method for the specific task of lab-of-origin prediction,” he said.

According to the researchers, the programme may be useful not only for tracking potentially harmful engineered sequences but also for protecting intellectual property.

“The goal is either to help protect intellectual property rights of the contributors of the sequences or help trace the origin of a synthetic sequence,” Treangen said. PlasmidHawk directly aligns unknown strings of code from genome data sets and matches them to pan-genomic regions that are common or unique to synthetic biology research labs.

“To predict the lab-of-origin, PlasmidHawk scores each lab based on matching regions between an unclassified sequence and the plasmid pan-genome and then assigns the unknown sequence to a lab with the minimum score,” said lead author Qi Wang.
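The scoring step Wang describes can be illustrated with a minimal Python sketch. It is not PlasmidHawk's actual code or scoring formula: the `pan_genome` mapping, the function names and the weighting (rarer regions count for more) are all hypothetical, chosen only to show the idea of scoring each lab on matched regions and picking the lab with the minimum score.

```python
from collections import defaultdict


def score_labs(matched_regions, pan_genome):
    """Return {lab: score} for labs annotated on the matched regions.

    Toy formula (an assumption, not the published one): each matched
    region lowers a lab's score by 1 / (number of labs sharing it),
    so regions unique to one lab count the most. Lower is better.
    """
    scores = defaultdict(float)
    for region in matched_regions:
        labs = pan_genome.get(region, set())
        for lab in labs:
            scores[lab] -= 1.0 / len(labs)
    return dict(scores)


def predict_lab(matched_regions, pan_genome):
    """Assign the query to the lab with the minimum score, or None."""
    scores = score_labs(matched_regions, pan_genome)
    if not scores:
        return None
    # Sorting the lab names first makes ties break alphabetically.
    return min(sorted(scores), key=scores.get)


# Hypothetical pan-genome: region ID -> labs whose plasmids contain it.
pan_genome = {
    "r1": {"lab_a", "lab_b", "lab_c"},  # common backbone region
    "r2": {"lab_a"},                    # region unique to lab_a
    "r3": {"lab_b", "lab_c"},
}

# Regions of the pan-genome that an unknown plasmid aligned to.
print(predict_lab(["r1", "r2"], pan_genome))
```

In this toy data the unknown plasmid matches a region found only in lab_a's plasmids, so lab_a receives the lowest score. Because the score is built from named regions, the regions behind a prediction can be listed directly.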

The researchers reported the successful prediction of “unknown sequences’ depositing labs” 76 percent of the time. They found that 85 percent of the time the correct lab was in the top 10 candidates.
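The two figures are top-1 and top-10 accuracy. As a sketch of how such figures are generally computed, the Python below counts how often the true lab appears among the first k ranked candidates. The function name and the toy data are illustrative assumptions and do not reproduce the study's evaluation or its results.

```python
def top_k_accuracy(ranked_predictions, true_labs, k):
    """Fraction of queries whose true lab is among the top k candidates.

    ranked_predictions: one list of lab names per query, best first.
    true_labs: the known depositing lab for each query.
    """
    if not true_labs:
        raise ValueError("true_labs must not be empty")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labs)
        if truth in ranked[:k]
    )
    return hits / len(true_labs)


# Toy data: four queries, each with two ranked candidate labs.
ranked = [["a", "b"], ["b", "a"], ["c", "a"], ["a", "c"]]
truth = ["a", "a", "b", "c"]

print(top_k_accuracy(ranked, truth, k=1))
print(top_k_accuracy(ranked, truth, k=2))
```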

Unlike the deep learning approaches, the researchers say, PlasmidHawk requires less pre-processing of data and does not need retraining when new sequences are added to an existing project. It also offers a detailed explanation for its lab-of-origin predictions, which the previous deep learning approaches do not.

“The goal is to fill your computational toolbox with as many tools as possible,” said co-author Ryan Leo Elworth, a postdoctoral researcher at Rice. “Ultimately, I believe the best results will combine machine learning, more traditional computational techniques and a deep understanding of the specific biological problem you are tackling.”

The researchers reported their results in Nature Communications. PlasmidHawk is available as open-source software.

Related topics
Bioinformatics, DNA, Informatics, Patents

Related organisations
Rice University

Related people
Qi Wang, Ryan Leo Elworth, Todd Treangen
