The DIFFUSER server is back online! It has been upgraded in line with the university's security requirements. We apologize for any inconvenience caused by this unexpected shutdown.


A distributed framework to generate machine learning features based on protein, DNA and RNA sequences

Over the past decades, high-throughput sequencing technologies have generated a huge number of biological sequences, including protein, DNA and RNA sequences. Accordingly, many machine learning methods have been developed on these sequences, providing powerful toolkits for surveying, classifying and predicting biological data. However, transforming raw sequence information into more meaningful features that incorporate sequence order and capture sequence patterns, before feeding them into computational models, remains a significant challenge. Here, we present DIFFUSER, a distributed framework that efficiently and comprehensively generates a broad spectrum of heterogeneous features from protein, DNA and RNA sequences. DIFFUSER improves on existing feature generators in three ways: 1) a new distributed architecture that speeds up online feature generation by n times through decentralized/parallel computing and distributed storage; 2) the most comprehensive coverage, offering the largest number of features across the broadest spectrum as an all-in-one service; and 3) both a user-friendly web server and a uniformly designed, cross-platform standalone toolkit that provide consistent feature generation with full support for feature customization.
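To illustrate what a sequence-derived feature can look like, here is a minimal Python sketch of one common descriptor, normalized k-mer composition of a DNA sequence. It is not DIFFUSER's implementation or API: the function name `kmer_composition`, its parameters and the choice of descriptor are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import product

DNA_ALPHABET = "ACGT"  # assumed alphabet; swap in amino acids for proteins


def kmer_composition(sequence, k=2, alphabet=DNA_ALPHABET):
    """Return normalized k-mer frequencies in fixed lexicographic order.

    Hypothetical helper for illustration only, not part of DIFFUSER.
    """
    sequence = sequence.upper()
    # Enumerate every possible k-mer so the vector has a fixed length.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count overlapping k-mers with a sliding window of width k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Only k-mers made of alphabet characters contribute to the total.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


features = kmer_composition("ACGTAC", k=2)
print(len(features))  # 4**2 = 16 possible dinucleotides
```

Because the vector has a fixed length regardless of sequence length, sequences of different sizes can be fed into the same machine learning model. Each sequence is processed independently, which is what makes this kind of computation easy to spread across parallel workers.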

Download it!

Go to Use it!

Learn more

Reminder:
If you find our work useful for your research, please cite:
DIFFUSER Development Team
Lithgow Group
Infection and Immunity Program
Biomedicine Discovery Institute
Faculty of Medicine, Nursing and Health Sciences
Monash University
Melbourne, VIC 3800, Australia
Contact Us