About AmrProfiler
AmrProfiler is a tool for identifying genes associated with antimicrobial resistance, including acquired resistance genes. It also detects mutations in core genes (genes whose mutations contribute to antimicrobial resistance, such as altered porins that prevent antibiotics from entering the cell) and identifies rRNA gene copies along with their potential mutations. The tool comprises three modules:
- Acquired Resistance Genes: Identifies acquired resistance genes.
- Core Gene Mutations: Identifies mutations in core genes that contribute to antibiotic resistance.
- rRNA Gene Mutations: Identifies rRNA gene copies and their mutations.
AmrProfiler relies on the RefSeq database (downloaded on 01/06/2025). It is implemented in Python 3.9.5 and uses the NcbiblastxCommandline utility from the Biopython library (v1.79), along with the BLAST+ suite (v2.9.0).
How to Run AmrProfiler
To run the analysis, the user can adjust all parameters as needed and then upload a FASTA file using the Upload File button. Upon a successful upload, the message Uploaded file: Assembly Name will appear. The user should then press Submit Job to start the job.
The process typically takes 2 to 30 minutes, depending on the server load. A unique link provided on the loading page can be saved and used later to view the results at any time. Below, the different parameters and results are explained.
For an example run, the user can use the pre-selected file ExampleFile_GCA_003605265.1_ASM360526v1_genomic.fna, an assembly of Staphylococcus aureus from the NCBI project PRJNA487708. To proceed, simply press Upload File and then Submit Job.
Acquired Antimicrobial Resistance Genes
This module uses BLASTX to identify antimicrobial resistance genes. Key parameters are:
- Identity Threshold: Minimum identity percentage for a hit to be kept.
- Coverage Threshold: Minimum coverage percentage for a hit to be kept. Partial genes (e.g., at contig ends) are retained regardless of this threshold.
- Protein Start Position: Alignments starting after the specified position in the reference protein are discarded. For example, if the position is 100, alignments starting after the 100th amino acid in the reference protein are discarded.
- Database: The user can select databases such as ResFinder+ReferenceGeneCatalog or ResFinder+ReferenceGeneCatalog+CARD. For basic searches, ResFinder+ReferenceGeneCatalog is recommended.
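The way the three thresholds above interact can be sketched as a simple filter. This is only an illustration, not AmrProfiler's actual code: the Hit class, the keep_hit function, the default values, and the assumption that the start-position check also applies to partial genes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLASTX hit (hypothetical structure, for illustration only)."""
    gene: str
    identity: float      # percent identity of the protein alignment
    coverage: float      # percent of the reference protein covered
    ref_start: int       # 1-based alignment start in the reference protein
    at_contig_end: bool  # True if the hit is truncated by a contig boundary


def keep_hit(hit, min_identity=90.0, min_coverage=80.0, max_ref_start=100):
    """Return True if the hit passes the three thresholds.

    The default values are placeholders, not the tool's defaults.
    """
    if hit.identity < min_identity:
        return False
    # Alignments starting after the chosen reference position are discarded.
    if hit.ref_start > max_ref_start:
        return False
    # Partial genes at contig ends are exempt from the coverage threshold.
    if hit.coverage < min_coverage and not hit.at_contig_end:
        return False
    return True


hits = [
    Hit("geneA", 99.0, 100.0, 1, False),   # passes every check
    Hit("geneB", 98.0, 40.0, 1, True),     # low coverage, but at a contig end
    Hit("geneC", 85.0, 100.0, 1, False),   # identity too low
    Hit("geneD", 99.0, 50.0, 120, False),  # alignment starts too late
]
kept = [h.gene for h in hits if keep_hit(h)]
print(kept)
```

Raising the identity or coverage threshold makes the search stricter, while a larger Protein Start Position tolerates alignments that miss more of the start of the reference protein.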
Results: Acquired Resistance Genes
The results list the AMR genes identified in the assembly based on the specified database. These results can be downloaded as AMRgenes_upload_file_name.csv and include the following columns:
- Contig ID: Name of the contig with the BLASTX hit.
- Resistance Gene: Name of the identified AMR gene.
- Product: Name of the AMR gene product.
- Gene ID: The gene ID of the AMR gene in the respective database.
- Identity: Percentage identity between the query and the resistance gene (based on protein sequence alignment).
- Coverage: Percentage alignment coverage of the resistance gene (calculated from the protein sequence alignment).
- Mismatches: Number of mismatches between the query and the subject.
- Gaps: Number of gaps between the query and the subject.
- Antibiotic Class: Antibiotic class associated with the AMR gene.
- Antibiotic Subclass: Antibiotic subclass associated with the AMR gene.
- Comments: Notes based on BLASTX results.
- Resistance Mechanism: Mechanism of resistance conferred by the AMR gene.
- Query Start: Start position of the hit in the query contig.
- Query Stop: Stop position of the hit in the query contig.
- Reference Start: Start position of the reference protein alignment.
- Reference Stop: Stop position of the reference protein alignment.
- GenBank Accession: GenBank accession number of the AMR gene.
- PubMed ID: PubMed ID of studies describing the AMR phenotype for this AMR gene.
- Database: Database where the AMR gene is found.
- Notes: Additional database-specific notes.
- Alignment Length: Length of the BLASTX alignment (in amino acids).
- Reference Length: Length of the reference protein (in amino acids).
- E-value: Statistical significance of the BLASTX hit.
- BitScore: Alignment bit score.
- Genome Protein Sequence: The protein sequence of the AMR gene found in the uploaded genome.
- Protein Reference Sequence: The protein sequence of the reference AMR gene.
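Once downloaded, the results file can be processed with any CSV reader. The sketch below groups genes by antibiotic class and recomputes coverage from the reference coordinates. It assumes the CSV header names match the column names listed above and that coverage is the aligned fraction of the reference protein; neither assumption has been verified against the tool, and the sample rows are invented placeholders.

```python
import csv
import io
from collections import defaultdict


def genes_by_class(csv_text):
    """Group 'Resistance Gene' values by 'Antibiotic Class'.

    Header names are assumed to match the documented column names.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["Antibiotic Class"]].append(row["Resistance Gene"])
    return dict(grouped)


def reference_coverage(ref_start, ref_stop, ref_length):
    """Percent of the reference protein spanned by the alignment
    (assumed formula, using 1-based inclusive coordinates)."""
    return 100.0 * (ref_stop - ref_start + 1) / ref_length


# Invented rows, standing in for the contents of a downloaded results file.
sample = (
    "Contig ID,Resistance Gene,Antibiotic Class\n"
    "contig_1,geneA,BETA-LACTAM\n"
    "contig_2,geneB,TETRACYCLINE\n"
    "contig_3,geneC,BETA-LACTAM\n"
)
print(genes_by_class(sample))
print(reference_coverage(1, 50, 100))
```

For a real file, the text can be read with open(path, newline="") and passed to the same function.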
Mutations in Core Genes
This module identifies mutations in core genes by comparing them to the reference proteins using BLASTX. It also finds known mutations near identified mutations based on the ResFinder+ReferenceGeneCatalog+CARD databases. The following parameters are needed:
- Protein Start Position: Alignments starting after the specified position are discarded.
- Distance from Known Mutation: Reports mutations within the specified distance (e.g., 1 amino acid distance) from known mutations.
The module also identifies nearby known mutations in the same or other species, providing comprehensive mutation data. In order to run this module, the user needs to select a species.
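The Distance from Known Mutation parameter can be pictured as a window around each observed mutation. The following is a minimal sketch under that assumption; the function name, the dictionary layout, and the mutation names are hypothetical and do not come from the tool or its databases.

```python
def nearby_known(observed_positions, known_mutations, max_distance=1):
    """Map each observed position to the known mutations within max_distance.

    observed_positions: positions (in the reference protein) that differ
    known_mutations:    {mutation name: position}, a hypothetical layout
    """
    result = {}
    for pos in observed_positions:
        close = [
            name
            for name, known_pos in known_mutations.items()
            if abs(known_pos - pos) <= max_distance
        ]
        if close:
            result[pos] = sorted(close)
    return result


# Placeholder mutation names, not taken from any database.
known = {"A10V": 10, "G15D": 15}
print(nearby_known([11, 15, 30], known, max_distance=1))
```

With a distance of 0, only mutations at exactly the known positions would be reported; larger values widen the window.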
Results: Mutations in Core Genes
The results are presented in two tables.
The first table, Core Genes of the Reference Genome of the Selected Species, includes the species selected by the user, the reference assembly used, and the core genes identified for this species that are searched for mutations. This table can be downloaded as CoreGenes_ReferenceGenome__upload_file_name.csv.
The second table, Mutations in Core Genes in the Uploaded Genome, contains the actual mutations identified in the assembly. This table can be downloaded as PointMutations_ReferenceGenome__upload_file_name.csv.
The columns included in the results are:
- Contig ID: Name of the contig with the BLASTX hit.
- Gene Name: Name of the identified core gene.
- Core Gene ID: Gene ID in the respective database.
- Comments: Types of mutations found (e.g., deletion, insertion, amino acid changes).
- Differences from Reference Protein: Changes in protein sequence compared to the reference.
- Selected Species Known Mutations: Nearby known mutations associated with the antimicrobial resistance (AMR) phenotype in the selected species.
- Other Species Known Mutations: Nearby known mutations associated with the antimicrobial resistance (AMR) phenotype in other species.
- Identity: Percentage identity between the query and the core gene (based on protein sequence alignment).
- Coverage: Percentage alignment coverage of the core gene (calculated from the protein sequence alignment).
- Length of Reference Protein: Length of the reference protein of the core gene.
- Query Start: Start position of the hit in the query contig.
- Query Stop: Stop position of the hit in the query contig.
- Reference Start: Start position of the reference protein alignment (in amino acids).
- Reference Stop: Stop position of the reference protein alignment (in amino acids).
- Alignment Length: Length of the BLASTX alignment.
- Mismatches: Number of mismatches between the query and the subject.
- Gaps: Number of gaps between the query and the subject.
- E-value: Statistical significance of the BLASTX hit.
- BitScore: Alignment bit score.
- Genome Protein Sequence: The protein sequence of the core gene found in the uploaded genome.
- Protein Reference Sequence: The protein sequence of the reference core gene.
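The kind of information reported under Differences from Reference Protein can be derived from a pairwise alignment of the two protein sequences. The sketch below assumes both sequences are already aligned with "-" marking gaps; the function name and the notation it emits are illustrative and may differ from what AmrProfiler actually reports.

```python
def protein_differences(ref_aln, qry_aln, ref_start=1):
    """List differences between two aligned protein sequences.

    ref_aln, qry_aln: equal-length aligned strings, '-' marks a gap
    ref_start:        1-based reference position of the first aligned residue
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")
    diffs = []
    ref_pos = ref_start - 1
    for ref_res, qry_res in zip(ref_aln, qry_aln):
        if ref_res != "-":
            ref_pos += 1  # advance only on real reference residues
        if ref_res == qry_res:
            continue
        if ref_res == "-":
            diffs.append(f"ins{qry_res}after{ref_pos}")  # insertion in genome
        elif qry_res == "-":
            diffs.append(f"{ref_res}{ref_pos}del")       # deletion in genome
        else:
            diffs.append(f"{ref_res}{ref_pos}{qry_res}")  # substitution
    return diffs


print(protein_differences("MKT-LVA", "MRTAL-A"))
```

Positions are counted on the reference protein, so an insertion does not shift the numbering of the residues that follow it.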
Mutations in rRNA Genes
This module identifies rRNA gene copies and their mutations by comparing them to the reference genome. Using BLASTN, it determines if copies of the reference strain's rRNA genes are present in the assembly. The module also finds known mutations near identified mutations based on the ResFinder+ReferenceGeneCatalog+CARD databases, reporting all rRNA gene copies and their mutations relative to the reference genome. It needs the following parameter:
- Distance from Known Mutation: Reports mutations within the specified distance (e.g., 5 nucleotides) from known mutations.
In order to run this module, the user needs to select a species.
Results: Mutations in rRNA Genes
The results are organized into two tables:
The first table, rRNA Genes of the Reference Genome of the Selected Species, lists the selected species, the reference assembly used, and the rRNA genes along with the number of copies identified for this species. These are the rRNA genes and copies analyzed for mutations. This table can be downloaded as rRNAMutations_ReferenceGenome__upload_file_name.csv.
The second table, Mutations in rRNA Genes in the Uploaded Genome, presents the actual mutations identified in the assembly. This table can be downloaded as PointMutations_ReferenceGenome__upload_file_name.csv.
The columns included in the results are:
- Contig ID: Name of the contig with the BLASTN hit.
- rRNA Gene ID: ID of the identified rRNA gene.
- Differences from Reference Gene: Changes in DNA sequence compared to the reference.
- Selected Species Known Mutations: Nearby known mutations associated with the antimicrobial resistance (AMR) phenotype in the selected species.
- Other Species Known Mutations: Nearby known mutations associated with the antimicrobial resistance (AMR) phenotype in other species.
- rRNA Copy Sequence: DNA sequence of the rRNA gene copy in the uploaded genome.
- rRNA Copy Reference Sequence: DNA sequence of the reference rRNA gene.
- Query Start: Start position of the hit in the query contig.
- Query Stop: Stop position of the hit in the query contig.
- Strand: The strand of the rRNA gene copy in the uploaded genome compared to the reference.
- Identity: Percentage identity between the query and the rRNA gene.
- Coverage: Percentage alignment coverage of the rRNA gene.
- Reference rRNA Gene Length: Length of the reference rRNA gene.
- E-value: Statistical significance of the BLASTN hit.
- BitScore: Alignment bit score.
- Gaps: Number of gaps between the query and the subject.
- Mismatches: Number of mismatches between the query and the subject.
- Reference Genome: Reference genome used for detecting rRNA genes.
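When comparing an rRNA copy with its reference outside the tool, the Strand column matters: a copy on the opposite strand has to be reverse-complemented before a base-by-base comparison makes sense. A minimal sketch follows, assuming the strand is encoded as "plus" or "minus" (the actual labels in the results file may differ).

```python
# Translation table for complementing DNA; other symbols (e.g. N) pass through.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_to_reference(copy_seq, strand):
    """Return the copy in the reference orientation.

    strand: 'plus' or 'minus' (assumed labels, not confirmed by the tool)
    """
    return reverse_complement(copy_seq) if strand == "minus" else copy_seq


print(orient_to_reference("AACG", "minus"))
```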




