
Bioinformatics

Description: Biotechnology, Bioinformatics, GATE, Genomics and Proteomics, Sequence Analysis, Sequence and Structure Databases
Number of Questions: 25
Tags: Biotechnology Bioinformatics GATE Genomics and Proteomics Sequence Analysis Sequence and Structure Databases

Which of the following BLAST programs are used to study the sequence-structure-function relationship of nucleotide sequences?

  1. Blastp
  2. MEGABLAST
  3. Blastn
  4. PHI-BLAST

  1. Only 1 and 2

  2. Only 1 and 3

  3. Only 2 and 3

  4. Only 2 and 4

  5. Only 3 and 4


Correct Option: C
Explanation:

Comparative sequence analysis is the first step in studying the sequence-structure-function relationship in protein and nucleotide sequences. Examples of such programs are: Protein sequences: Blastp, PSI-BLAST and PHI-BLAST. Nucleotide sequences: MEGABLAST, Discontiguous MegaBLAST and Blastn.

Which of the following statements is/are CORRECT?

  1. DNA microarrays are used to measure the concentration of different tRNAs in a biological sample.
  2. The complete set of sRNAs that are transcribed in a cell is often called its transcriptome.
  3. Expressed sequence tags (ESTs) are short, single-read mRNA sequences.

  1. Only 1

  2. Only 2

  3. Only 3

  4. All of these

  5. None of these


Correct Option: E
Explanation:

All three statements are incorrect, so "None of these" is the correct answer. DNA microarrays are used to measure the concentration of different mRNAs in a biological sample. The complete set of mRNAs that are transcribed in a cell is often called its transcriptome. Expressed sequence tags (ESTs) are short, single-read cDNA sequences.

Which of the following statements are FALSE?

P. Paralogs are genes in different species that diverged from a common ancestral sequence as a result of speciation.
Q. PSI-BLAST builds profiles and performs database searches in an iterative fashion.
R. A domain may or may not include motifs within its boundaries.
S. Motifs and domains are evolutionarily less conserved than other regions of a protein and tend to evolve as units.

  1. Only P and Q

  2. Only P and R

  3. Only P and S

  4. Only Q and R

  5. Only Q and S


Correct Option: C
Explanation:

Statements P and S are false. Paralogs are homologous genes within one genome; mutations in their upstream or downstream regions can cause differential regulation without actually changing the function of the gene. Genes in different species that diverged from a common ancestral sequence as a result of speciation are orthologs, not paralogs. Motifs and domains are evolutionarily more conserved than other regions of a protein and tend to evolve as units, which are gained, lost, or shuffled as one module.

Which of the following programs queries protein sequences to a nucleotide sequence database with the sequences translated in all six reading frames?

  1. BLASTN

  2. BLASTP

  3. BLASTX

  4. TBLASTN

  5. TBLASTX


Correct Option: D
Explanation:

Yes, it is correct. TBLASTN queries protein sequences to a nucleotide sequence database with the sequences translated in all six reading frames.
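The six-frame translation mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea, not TBLASTN's implementation: it assumes the standard genetic code, and the function names (`six_frame_translations`, `translate`, `reverse_complement`) are made up for this example.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case ACGT string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; 'X' marks codons containing non-ACGT letters."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate a nucleotide sequence in frames +1..+3 and -1..-3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frame_translations("ATGGCCTAA").items():
    print(name, protein)
```

A protein query would then be compared against each of the six translated frames of every database sequence, which is why such searches are more expensive than a plain protein-protein search.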

Which of the following programs perform a Pearson and Lipman search for similarity between a protein query sequence and any group of nucleotide sequences?

P. FastA Q. TFastA R. FastX S. TFastX

  1. Only P and Q

  2. Only P and S

  3. Only Q and R

  4. Only Q and S

  5. Only R and S


Correct Option: D
Explanation:

Yes, it is correct. TFastA does a Pearson and Lipman search for similarity between a protein query sequence and any group of nucleotide sequences. TFastX does a Pearson and Lipman search for similarity between a protein query sequence and any group of nucleotide sequences, taking frameshifts into account.

Which of the following is a protein family database based on phylogenetic classification?

  1. PRINTS

  2. COG

  3. CDART

  4. InterPro

  5. SMART


Correct Option: B
Explanation:

Yes, it is the correct answer. COG (Clusters of Orthologous Groups) is a protein family database based on phylogenetic classification.

Which of the following are examples of progressive alignment programs?

  1. PRRN
  2. DIALIGN2
  3. Match-Box
  4. Clustal
  5. Poa

  1. Only 1 and 2

  2. Only 1, 3 and 5

  3. Only 2 and 4

  4. Only 3, 4 and 5

  5. Only 4 and 5


Correct Option: E
Explanation:

Probably the best-known progressive alignment program is Clustal, a progressive multiple alignment program available either as a standalone or an online program. Poa (Partial order alignments) is a progressive alignment program that does not rely on guide trees. Instead, the multiple alignment is assembled by adding sequences in the order they are given.

In which of the following cases is it preferable to use BLASTX?

  1. If the input sequence is a protein-encoding DNA sequence.
  2. If one is looking for protein homologs encoded in newly sequenced genomes.
  3. If a DNA sequence is to be used as the query.

  1. Only 1 and 2

  2. Only 1 and 3

  3. Only 2 and 3

  4. All of these

  5. None of these


Correct Option: B
Explanation:

Yes, statements 1 and 3 are correct. If the input sequence is a protein-encoding DNA sequence, it is preferable to use BLASTX, which translates it in all six reading frames before sequence comparisons are carried out. If a DNA sequence is to be used as the query, a protein-level comparison can be done with TBLASTX. However, both programs are very computationally intensive and the search process can be very slow. Looking for protein homologs encoded in newly sequenced genomes is instead a task for TBLASTN, which takes a protein query and translates the nucleotide database.

Which of the following programs is able to do both simple sequence format conversion as well as alignment format conversions?

  1. BioEdit

  2. Readseq

  3. Rascal

  4. RevTrans

  5. PROTA2DNA


Correct Option: B
Explanation:

The Readseq program is able to perform format conversion of multiple alignments. Readseq is a web-based program that is able to do both simple sequence format conversion as well as alignment format conversions.

Which of the following gene prediction programs use discriminant analysis for exon prediction?

P. FGENES Q. MZEF R. GENSCAN S. HMMgene

  1. Only P and Q

  2. Only P and R

  3. Only P and S

  4. Only Q and R

  5. Only R and S


Correct Option: A
Explanation:

Yes, it is the correct answer. FGENES (Find Genes) is a web-based program that uses linear discriminant analysis (LDA) to determine whether a signal is an exon or not. MZEF (Michael Zhang’s Exon Finder) is a web-based program that uses quadratic discriminant analysis (QDA) for exon prediction.

Which of the following statements is/are TRUE?

  1. Praline is more accurate than Clustal.
  2. Alignment at the protein level is more sensitive than at the DNA level.
  3. Sequence alignment at the protein level is much more informative for functional and evolutionary analysis.

  1. Only 1

  2. Only 2

  3. Only 3

  4. All of these

  5. None of these


Correct Option: D
Explanation:

Yes, this is the correct answer. Praline is profile based and has the capacity to restrict alignment based on protein structure information, and is thus much more accurate than Clustal. Alignment at the protein level is more sensitive than at the DNA level. Sequence alignment directly at the DNA level can often result in frameshift errors because in DNA alignment gaps are introduced irrespective of codon boundaries. Sequence alignment at the protein level is much more informative for functional and evolutionary analysis.
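To make the frameshift point concrete, the sketch below threads unaligned coding DNA onto an existing protein alignment, so that every gap in the DNA alignment is a whole codon. It is a minimal illustration under stated assumptions: the protein alignment is taken as given, each DNA sequence is assumed to encode its protein exactly (no introns, no stop codon handling), and `thread_codons` is a hypothetical helper name.

```python
def thread_codons(aligned_protein, coding_dna, gap="-"):
    """Place the codons of coding_dna under an aligned protein sequence.

    Each residue consumes one codon; each protein gap becomes three DNA gaps,
    so gaps never split a codon.
    """
    residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(coding_dna) < 3 * residues:
        raise ValueError("coding DNA is too short for the protein sequence")
    codons = (coding_dna[i:i + 3] for i in range(0, 3 * residues, 3))
    return "".join(
        gap * 3 if ch == gap else next(codons) for ch in aligned_protein
    )


# Two toy sequences: the first lacks the glycine codon present in the second.
print(thread_codons("M-AK", "ATGGCCAAA"))     # ATG---GCCAAA
print(thread_codons("MGAK", "ATGGGTGCCAAA"))  # ATGGGTGCCAAA
```

Because the gap is placed as a block of three, the reading frame downstream of the gap is preserved, which a direct DNA-level alignment does not guarantee.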

Which of the following algorithms requires the user to provide a pairwise or multiple alignment as input?

  1. RNAalifold
  2. Foldalign
  3. Dynalign

  1. Only 1

  2. Only 2

  3. Only 3

  4. Only 1 and 2

  5. Only 2 and 3


Correct Option: A
Explanation:

RNAalifold belongs to the class of algorithms that require the user to provide a pairwise or multiple alignment as input. The sequence alignment can be obtained using standard alignment programs, such as T-Coffee, PRRN, or Clustal. RNAalifold is a program in the Vienna package. It uses a multiple sequence alignment as input to analyse covariation patterns in the sequences.

Which of the following is an RNA database?

  1. PDBe

  2. MONOCLdb

  3. PRIDE

  4. PEDANT

  5. SNPedia


Correct Option: B
Explanation:

The MOuse NOnCode Lung database (MONOCLdb) is an integrative and interactive database designed to retrieve and visualise annotations, expression profiles and functional enrichment results of long non-coding RNAs (lncRNAs) expressed in Collaborative Cross founder mice in response to respiratory infections caused by influenza and SARS-CoV viruses.

Which of the following are consensus-based gene prediction programs?

  1. GenomeScan
  2. EST2Genome
  3. SGP-1
  4. GeneComber
  5. DIGIT

  1. Only 1, 2 and 3

  2. Only 2 and 4

  3. Only 3 and 4

  4. Only 3, 4 and 5

  5. Only 4 and 5


Correct Option: E
Explanation:

Because different prediction programs have different levels of sensitivity and specificity, it makes sense to combine the results of multiple programs based on consensus. GeneComber is a web server that combines HMMgene and GENSCAN prediction results. DIGIT is another consensus-based web server. It uses predictions from three ab initio programs: FGENESH, GENSCAN, and HMMgene.
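The idea of combining predictions by consensus can be sketched as a simple per-base vote. This is not how GeneComber or DIGIT actually combine their inputs; it is a hypothetical majority-vote scheme that assumes each program reports exons as 1-based, inclusive `(start, end)` intervals, and `consensus_exons` is an invented name.

```python
from collections import Counter


def consensus_exons(predictions, min_votes):
    """Keep positions called exonic by at least min_votes programs.

    predictions: one list of (start, end) intervals per program.
    Returns the surviving positions merged back into intervals.
    """
    votes = Counter()
    for exons in predictions:
        covered = set()
        for start, end in exons:
            covered.update(range(start, end + 1))
        votes.update(covered)  # each program votes at most once per position

    merged = []
    for pos in sorted(p for p, v in votes.items() if v >= min_votes):
        if merged and pos == merged[-1][1] + 1:
            merged[-1][1] = pos  # extend the current interval
        else:
            merged.append([pos, pos])  # start a new interval
    return [tuple(interval) for interval in merged]


program_a = [(10, 20), (50, 60)]
program_b = [(12, 22), (55, 65)]
program_c = [(15, 18)]
print(consensus_exons([program_a, program_b, program_c], min_votes=2))
# [(12, 20), (55, 60)]
```

Raising `min_votes` trades sensitivity for specificity: with three programs, requiring all three votes keeps only the region every program agrees on.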

Which of the following statements are FALSE regarding PSIBLAST?

  1. PSIBLAST uses PSSMs to score matches between query and database sequences.
  2. PSIBLAST is a statistically driven search method that finds regions of similarity between your query sequence and database sequences and produces ungapped alignments of those regions.
  3. The number of scoring matrices available for use with PSIBLAST is limited.

  1. Only 1

  2. Only 2

  3. Only 3

  4. Only 1 and 2

  5. Only 2 and 3


Correct Option: B
Explanation:

Statement 2 is false because PSIBLAST produces gapped, not ungapped, alignments. PSIBLAST is a statistically driven search method that finds regions of similarity between your query sequence and database sequences and produces gapped alignments of those regions. Within these aligned regions, the calculated score is higher than would normally be expected by chance.

Which of the following scoring matrices is/are based on implicit model of evolution?

  1. PAM
  2. Gonnet
  3. JTT
  4. BLOSUM

  1. Only 1

  2. Only 1, 2 and 3

  3. Only 2 and 3

  4. Only 3 and 4

  5. Only 4


Correct Option: E
Explanation:

In bioinformatics, the BLOSUM (Blocks Substitution Matrix) matrix is a substitution matrix used for sequence alignment of proteins. BLOSUM matrices are based on an implicit model.

Which of the following is a program for searching bacterial ρ-independent termination signals located at the end of operons?

  1. PROM

  2. FindTerm

  3. FGENES

  4. MZEF

  5. FirstEF


Correct Option: B
Explanation:

FindTerm is a program for searching bacterial ρ-independent termination signals located at the end of operons.

Which of the following is a web-based program that uses a neural network to make promoter predictions?

  1. Cluster-Buster

  2. Eponine

  3. MZEF

  4. FGENES

  5. McPromoter


Correct Option: E
Explanation:

McPromoter is a web-based program that uses a neural network to make promoter predictions.

Match the entries in Group - I with those in Group - II.

 
Group - I      Group - II
P. FGENESB     1. A suite of gene prediction programs based on fifth-order HMMs.
Q. GeneMark    2. A UNIX program from TIGR that uses the IMM algorithm to predict potential coding regions.
R. Glimmer     3. A web-based program specifically trained for bacterial sequences based on fifth-order HMMs for detecting coding regions.

  1. P - 1, Q - 2, R - 3

  2. P - 1, Q - 3, R - 2

  3. P - 3, Q - 1, R - 2

  4. P - 2, Q - 3, R - 1

  5. P - 2, Q - 1, R - 3


Correct Option: C
Explanation:

Yes, it is correct. FGENESB is a web-based program that is also based on fifth-order HMMs for detecting coding regions. The program is specifically trained for bacterial sequences. GeneMark is a suite of gene prediction programs based on the fifth-order HMMs. Glimmer (Gene Locator and Interpolated Markov Modeler) is a UNIX program from TIGR that uses the IMM algorithm to predict potential coding regions. The computation consists of two steps, namely model building and gene prediction.
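The two steps named above, model building and gene prediction, can be illustrated with a plain fixed-order Markov chain. This is a deliberately simplified sketch: it is not Glimmer's interpolated Markov model (which combines several orders) and not the HMMs used by GeneMark or FGENESB. The function names, the pseudocount, and the toy training data are all assumptions made for the example, and the alphabet is assumed to be A, C, G, T.

```python
import math
from collections import defaultdict


def build_model(training_seqs, order=5, pseudocount=1.0):
    """Step 1, model building: estimate P(next base | previous `order` bases)."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in training_seqs:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1

    def prob(context, base):
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + 4 * pseudocount  # 4 = size of ACGT alphabet
        return (ctx.get(base, 0.0) + pseudocount) / total

    return prob


def log_likelihood(seq, prob, order=5):
    """Sum of log transition probabilities along the sequence."""
    return sum(
        math.log(prob(seq[i - order:i], seq[i])) for i in range(order, len(seq))
    )


def coding_score(seq, coding_prob, background_prob, order=5):
    """Step 2, prediction: log-odds of the coding model against the background."""
    return log_likelihood(seq, coding_prob, order) - log_likelihood(
        seq, background_prob, order
    )


# Toy data with a low order so the tiny training set is informative.
coding = build_model(["ATGGCC" * 10], order=2)
background = build_model(["ATATAT" * 10], order=2)
print(coding_score("ATGGCCATGGCC", coding, background, order=2) > 0)  # True
```

A positive score means the candidate region looks more like the coding training data than the background; a real gene finder would additionally score all reading frames and apply start and stop codon constraints.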

Which of the following are phylogenetic footprinting–based methods?

P. ConSite Q. PromH(W) R. MEME S. AlignACE

  1. Only P and Q

  2. Only P and R

  3. Only P and S

  4. Only Q and R

  5. Only R and S


Correct Option: A
Explanation:

The identification of conserved noncoding DNA elements that serve crucial functional roles is referred to as phylogenetic footprinting; the elements are called phylogenetic footprints. This type of method can be applied to both prokaryotic and eukaryotic sequences. ConSite is a web server that finds putative promoter elements by comparing two orthologous sequences. PromH(W) is a web-based program that predicts regulatory sites by pairwise sequence comparison.
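The core idea of finding conserved stretches between two orthologous sequences can be sketched as a sliding-window identity scan. This is not the procedure used by ConSite or PromH(W); it assumes the two sequences have already been aligned to equal length, and the function name, window size, and threshold are illustrative choices.

```python
def conserved_windows(aligned_a, aligned_b, window=10, min_identity=0.9, gap="-"):
    """Return 0-based start columns of alignment windows that are well conserved.

    A column counts as a match only if both sequences carry the same
    non-gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    hits = []
    for start in range(len(aligned_a) - window + 1):
        pairs = zip(aligned_a[start:start + window], aligned_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != gap)
        if matches / window >= min_identity:
            hits.append(start)
    return hits


# Toy upstream regions sharing only a TATAAA block at columns 5..10.
seq_a = "AAAAATATAAAGGGGG"
seq_b = "CCCCCTATAAATTTTT"
print(conserved_windows(seq_a, seq_b, window=6, min_identity=1.0))  # [5]
```

Windows that pass the threshold are candidate footprints; in practice they would then be checked against known transcription factor binding profiles.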

Which of the following is a web-based program that uses a consensus method to identify promoter elements for human DNA?

  1. CONPRO

  2. McPromoter

  3. BPROM

  4. FGENES

  5. CpGProD


Correct Option: A
Explanation:

CONPRO is a web-based program that uses a consensus method to identify promoter elements for human DNA.

Match the entries in Group - I with those in Group - II.

 
Group - I       Group - II
P. PSI-BLAST    1. Iteratively searches one or more protein databases for sequences similar to one or more protein query sequences.
Q. PHI-BLAST    2. Searches for proteins that contain a pattern specified by the user AND are similar to the query sequence in the vicinity of the pattern.
R. blastp       3. Used for both identifying a query amino acid sequence and for finding similar sequences in protein databases.

  1. P - 1, Q - 2, R - 3

  2. P - 2, Q - 3, R - 1

  3. P - 2, Q - 1, R - 3

  4. P - 3, Q - 1, R - 2

  5. P - 3, Q - 2, R - 1


Correct Option: A
Explanation:

Position-Specific Iterated BLAST (PSIBLAST) iteratively searches one or more protein databases for sequences similar to one or more protein query sequences. PSIBLAST is similar to BLAST except that it uses position-specific scoring matrices derived during the search. Pattern-Hit Initiated BLAST (PHI-BLAST) is designed to search for proteins that contain a pattern specified by the user AND are similar to the query sequence in the vicinity of the pattern. This dual requirement is intended to reduce the number of database hits that contain the pattern, but are likely to have no true homology to the query. Standard protein-protein BLAST (blastp) is used for both identifying a query amino acid sequence and for finding similar sequences in protein databases. Like other BLAST programs, blastp is designed to find local regions of similarity. When sequence similarity spans the whole sequence, blastp will also report a global alignment, which is the preferred result for protein identification purposes.
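The pattern requirement of PHI-BLAST can be illustrated by the first step alone: locating occurrences of a user-specified pattern in a sequence. The sketch below converts a PROSITE-style pattern into a Python regular expression. It covers only the basic syntax elements (`x`, `[...]`, `{...}`, `(n)`, `(n,m)`, `<`, `>`) and does none of the alignment or statistics that PHI-BLAST performs around each hit; the function names and the toy sequence are invented for the example.

```python
import re


def prosite_to_regex(pattern):
    """Convert a basic PROSITE-style pattern to a Python regular expression."""
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("{", "[^").replace("}", "]")  # {P}   -> [^P]
    regex = regex.replace("(", "{").replace(")", "}")   # x(3)  -> x{3}
    regex = regex.replace("x", ".")                     # x     -> any residue
    return regex.replace("<", "^").replace(">", "$")    # terminus anchors


def pattern_hits(pattern, sequence):
    """Return 0-based (start, end) spans of non-overlapping pattern matches."""
    return [m.span() for m in re.finditer(prosite_to_regex(pattern), sequence)]


print(prosite_to_regex("[AG]-x(4)-G-K-[ST]"))              # [AG].{4}GK[ST]
print(pattern_hits("[AG]-x(4)-G-K-[ST]", "MMGPPPPGKSLL"))  # [(2, 10)]
```

Only database sequences with at least one such hit would be considered further, with similarity to the query then assessed in the region around the hit.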

Match the entries in Column I (matrices) with those in Column II (developers).

 
Column I     Column II
P. PAM       1. Henikoff and Henikoff
Q. BLOSUM    2. Margaret Dayhoff
R. PSSM      3. Gary Stormo

  1. P - 1, Q - 2, R - 3

  2. P - 1, Q - 3, R - 2

  3. P - 3, Q - 1, R - 2

  4. P - 2, Q - 3, R - 1

  5. P - 2, Q - 1, R - 3


Correct Option: E
Explanation:

One of the first amino acid substitution matrices, the PAM (Point Accepted Mutation) matrix was developed by Margaret Dayhoff in the 1970s. BLOSUM matrices were first proposed by Henikoff and Henikoff for aligning protein sequences. A position weight matrix (PWM), also known as a position-specific weight matrix (PSWM) or position-specific scoring matrix (PSSM), is a commonly used representation of motifs (patterns) in biological sequences. The position weight matrix was introduced by American geneticist Gary Stormo and colleagues in 1982 as an alternative to consensus sequences.
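How a position weight matrix represents a motif can be shown with a small sketch that builds a log-odds matrix from aligned sites and uses it to scan a sequence. The choices here (base-2 log-odds, a uniform background, a pseudocount of 1) are one common convention and are assumptions of this example, not a reconstruction of the original 1982 formulation or of PSI-BLAST's weighting scheme. The function names and the toy sites are invented.

```python
import math
from collections import Counter


def build_pssm(aligned_sites, alphabet="ACGT", pseudocount=1.0):
    """Build a log-odds matrix from equal-length, gap-free aligned sites."""
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")
    background = 1.0 / len(alphabet)  # uniform background (assumption)
    total = len(aligned_sites) + pseudocount * len(alphabet)
    pssm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in aligned_sites)
        pssm.append({
            a: math.log2((counts[a] + pseudocount) / total / background)
            for a in alphabet
        })
    return pssm


def score(pssm, window):
    """Sum the per-position scores of a window the same width as the matrix."""
    return sum(column[ch] for column, ch in zip(pssm, window))


def best_match(pssm, sequence):
    """Return (score, 0-based start) of the highest-scoring window."""
    width = len(pssm)
    return max(
        (score(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
pssm = build_pssm(sites)
print(best_match(pssm, "GGGCTATAATGCG")[1])  # 4
```

Unlike a consensus sequence, the matrix keeps the observed variation at each position, so a site such as `TACAAT` still scores well even though it differs from the consensus.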

Which of the following is a web-based program that runs four individual motif-finding algorithms – MEME, GIBBS sampling, CONSENSUS and Coresearch, all simultaneously?

  1. ConSite

  2. AlignACE

  3. Melina

  4. PhyloCon

  5. PromH(W)


Correct Option: C
Explanation:

Melina is a web-based program that runs four individual motif-finding algorithms – MEME, GIBBS sampling, CONSENSUS, and Coresearch – simultaneously. The user compares the results to determine the consensus of motifs predicted by all four prediction methods.

Which of the following databases integrates information from PROSITE, Pfam, PRINTS, ProDom and SMART databases?

  1. COG

  2. CDART

  3. InterPro

  4. ProDom

  5. ProtoNet


Correct Option: C
Explanation:

Yes, it is the correct answer. InterPro is an integrated pattern database designed to unify multiple databases for protein domains and functional sites. The database integrates information from PROSITE, Pfam, PRINTS, ProDom and SMART databases. The sequence patterns from the five databases are further processed. Only overlapping motifs and domains in a protein sequence derived by all five databases are included.
