Abalign is a comprehensive multiple sequence alignment platform for B cell receptor immune repertoires.

Multiple sequence alignment (MSA) has long been a powerful method for investigating the evolutionary, structural and functional properties of protein families. It is also a fundamental technique in recent deep-learning-based protein 3D structure predictors. Although existing MSA methods are well established, they are not suited to high-throughput computation and do not meet the needs of processing BCR or antibody sequences, because the highly variable regions cannot be aligned well without prior knowledge of gene recombination and hypermutation in antibody maturation. To our knowledge, no MSA tool has been designed specifically for BCR alignment to date. To address this, we developed a multiple sequence alignment platform named Abalign, which uses the antibody sequence numbering systems (IMGT, Kabat, Martin and Chothia) as heuristic biological knowledge to guide the alignment. Comprehensive benchmark tests showed that Abalign significantly outperformed existing MSA tools in accuracy, speed and memory consumption. Abalign is implemented as a user-friendly stand-alone program with interactive, visual interfaces. In addition to multiple sequence alignment, it supports clustering, antibody numbering, CDR delimitation, phylogenetic tree construction, VJ gene determination, clonotype analysis, humanization assistance, comparison of BCR immune repertoires and more, all at the click of a button. It supports high-throughput analysis on personal computers running Linux or Windows. Abalign has been available on GitHub for one year of testing. We expect it to be a powerful and efficient tool for biological researchers who analyze massive BCR or antibody sequence sets and look for new discoveries in immunoinformatic studies.
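
To illustrate why a numbering system can guide an alignment, here is a minimal Python sketch. It is not Abalign's implementation. It assumes each sequence has already been numbered by some scheme, so that every residue carries a position label such as "27" or "100A", and it simply places residues that share a label in the same column. The names `align_by_numbering` and `position_key` are hypothetical, and the ordering rule (an insertion letter sorts right after its base number) is a simplification that does not reproduce every scheme's insertion order.

```python
from typing import Dict, List, Tuple

# A numbered sequence: scheme position label -> residue (assumed input format).
Numbered = Dict[str, str]


def position_key(label: str) -> Tuple[int, str]:
    """Sort key for a position label: numeric part, then insertion letter."""
    digits = "".join(ch for ch in label if ch.isdigit())
    suffix = "".join(ch for ch in label if not ch.isdigit())
    return int(digits), suffix


def align_by_numbering(seqs: List[Numbered]) -> List[str]:
    """Build alignment rows by giving every position label its own column."""
    # The union of all labels seen in any sequence defines the columns.
    columns = sorted({pos for s in seqs for pos in s}, key=position_key)
    # A sequence that lacks a position gets a gap character in that column.
    return ["".join(s.get(pos, "-") for pos in columns) for s in seqs]


if __name__ == "__main__":
    a = {"1": "Q", "2": "V", "3": "Q"}
    b = {"1": "E", "2": "V", "2A": "K", "3": "Q"}
    for row in align_by_numbering([a, b]):
        print(row)
```

Because the columns come from the position labels and not from pairwise comparisons, the cost of this toy procedure grows linearly with the number of sequences, which hints at why numbering-guided alignment can scale to large repertoires.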

Abalign is a state-of-the-art MSA method.
Fig 1. IS and CS scores of the MSAs.

We compared Abalign with three state-of-the-art MSA tools (Clustal Omega, MAFFT and MUSCLE) in terms of accuracy, usability, time and memory consumption. The figure shows the average scores of the four tools across three samples. Abalign outperformed the other three tools on both CS and IS scores, indicating that it substantially improves MSA quality for BCR or antibody sequences.

Abalign is a comprehensive platform for BCR immune repertoire analysis
Fig 2. The working pipeline of Abalign.

Building on its state-of-the-art MSA performance, Abalign provides antibody numbering, CDR and FR recognition, VJ gene identification, phylogenetic tree construction, clonotype analysis, residue preference analysis, diversity analysis and more. Abalign also supports multi-file cross-analysis, which is convenient for comparing BCR immune repertoires.
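
As a rough illustration of what clonotype and diversity analysis involve, the Python sketch below groups records into clonotypes and computes a Shannon diversity index. It does not use Abalign's code or file formats. It assumes a common convention in which a clonotype is defined by identical V gene, J gene and CDR3 sequence; Abalign may define clonotypes differently. The `Record` type, the function names and the example gene labels are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Record(NamedTuple):
    """One annotated BCR sequence (assumed fields, for illustration only)."""
    v_gene: str
    j_gene: str
    cdr3: str


def clonotype_counts(records: Iterable[Record]) -> Counter:
    """Count sequences per clonotype, keyed by (V gene, J gene, CDR3)."""
    return Counter((r.v_gene, r.j_gene, r.cdr3) for r in records)


def shannon_diversity(counts: Counter) -> float:
    """Shannon index H = -sum(p * ln p) over clonotype frequencies."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


if __name__ == "__main__":
    repertoire = [
        Record("IGHV1-69", "IGHJ4", "CARDYW"),
        Record("IGHV1-69", "IGHJ4", "CARDYW"),
        Record("IGHV3-23", "IGHJ6", "CAKGGW"),
        Record("IGHV3-23", "IGHJ6", "CAKGGW"),
    ]
    counts = clonotype_counts(repertoire)
    print(len(counts), "clonotypes, H =", shannon_diversity(counts))
```

Running the same two functions on each input file and comparing the resulting counts is one simple way to approach a cross-repertoire comparison.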


Zong, F., Long, C., Hu, W. et al. Abalign: a comprehensive multiple sequence alignment platform for B cell receptor immune repertoires. In preparation.