/*! This file is auto-generated */ .wp-block-button__link{color:#fff;background-color:#32373c;border-radius:9999px;box-shadow:none;text-decoration:none;padding:calc(.667em + 2px) calc(1.333em + 2px);font-size:1.125em}.wp-block-file__button{background:#32373c;color:#fff;text-decoration:none} Q27E Alignment with gap penalties. Th... [FREE SOLUTION] | 91Ó°ÊÓ

91Ó°ÊÓ

Alignment with gap penalties. The alignment algorithm of Exercise 6.26 helps to identify DNA sequences that are close to one another. The discrepancies between these closely matched sequences are often caused by errors in DNA replication. However, a closer look at the biological replication process reveals that the scoring function we considered earlier has a qualitative problem: nature often inserts or removes entire substrings of nucleotides (creating long gaps), rather than editing just one position at a time. Therefore, the penalty for a gap of length 10 should not be 10 times the penalty for a gap of length 1, but something significantly smaller.

Repeat Exercise 6.26, but this time use a modified scoring function in which the penalty for a gap of length k is c0 + c1k, where c0 and c1 are given constants (and c0 is larger than c1).

Short Answer

Expert verified

The alignment algorithm with a modified scoring function is as follows,

s(i,j,0)=max0≤k<asi-1,j-1,k+δxi,yjs(i,j,1)=maxs(i,j-1,0)-(c0+_c1)s(i,j-1,1)-c1s(i,j-1,2)-(c0+_c1)+δ-,yjs(i,j,2)=maxs(i,j-1,j,0)-(c0+_c1)s(i,j-1,j,1)-(c0+_c1)s(i,j-1,j,2)-c1+δxi,-

Step by step solution

01

Explain Alignment with gap penalties:

An array alignment provides a way of identifying the regions of similarities between the strings. The matrix M of size n x m is constructed to find the best aligning score of the given strings. The given case can occur for any two characters of the strings.The first case is to align the sequence with no gaps. The match or miss is determined by the given scoring function. The other case is to align the sequence with gaps.

02

Give the alignment algorithm with a modified scoring function

Consider the alignment algorithm of Exercise 6.26 as follows,

s(i,j)=maxs(i-1,j)+δxi,-s(i,j-1)+δ(-,yj)s(i-1,j-1)+δxi,yj

Redefine the subproblem as, s(i,j,k) where,k∈0,12 corresponding to the three matches at the las position:

x[i]↔y[j],-↔y[j],x[i]↔-, So by adding the modified scoring function the alignment algorithm can be provided as follows,

s(i,j,0)=max0≤k<asi-1,j-1,k+δxi,yjs(i,j,1)=maxs(i,j-1,0)-(c0+_c1)s(i,j-1,1)-c1s(i,j-1,2)-(c0+_c1)+δ-,yjs(i,j,2)=maxs(i,j-1,j,0)-(c0+_c1)s(i,j-1,j,1)-(c0+_c1)s(i,j-1,j,2)-c1+δxi,-

Therefore, The alignment algorithm with modified scoring function has been provided.

Unlock Step-by-Step Solutions & Ace Your Exams!

  • Full Textbook Solutions

    Get detailed explanations and key concepts

  • Unlimited Al creation

    Al flashcards, explanations, exams and more...

  • Ads-free access

    To over 500 millions flashcards

  • Money-back guarantee

    We refund you if you fail your exam.

Over 30 million students worldwide already upgrade their learning with 91Ó°ÊÓ!

One App. One Place for Learning.

All the tools & learning materials you need for study success - in one app.

Get started for free

Most popular questions from this chapter

Local sequence alignment. Often two DNA sequences are significantly different, but contain regions that are very similar and are highly conserved. Design an algorithm that takes an input two strings x[1Kn]and y[1Km]and a scoring matrix δ(as defined in Exercise 6.26), and outputs substrings x'andy'of x and y respectively, that have the highest-scoring alignment over all pairs of such substrings. Your algorithm should take time O(mn).

Reconstructing evolutionary trees by maximum parsimony. Suppose we manage to sequence a particular gene across a whole bunch of different species. For concreteness, say there are n species, and the sequences are strings of length k over alphabet∑={A,C,G,T}. How can we use this information to reconstruct the evolutionary history of these species?

Evolutionary history is commonly represented by a tree whose leaves are the different species, whose root is their common ancestor, and whose internal branches represent speciation events (that is, moments when a new species broke off from an existing one). Thus we need to find the following:

• An evolutionary tree with the given species at the leaves.

• For each internal node, a string of length K: the gene sequence for that particular ancestor.

For each possible tree T annotated with sequencess(u)∈∑kat each of its nodes , we can assign a score based on the principle of parsimony: fewer mutations are more likely.

localid="1659249441524" score(T)=∑(u.v)∈E(T)(numberofpositionsonwhichs(u)ands(v)disagree)

Finding the highest-score tree is a difficult problem. Here we will consider just a small part of it: suppose we know the structure of the tree, and we want to fill in the sequences s(u) of the internal nodes u. Here’s an example with k=4 and n=5:


(a) In this particular example, there are several maximum parsimony reconstructions of the internal node sequences. Find one of them.

(b) Give an efficient (in terms of n and k ) algorithm for this task. (Hint: Even though the sequences might be long, you can do just one position at a time.)

A mission-critical production system has n stages that have to be performed sequentially; stage i is performed by machine Mi. Each machine Mi has a probability riof functioning reliably and a probability 1-riof failing (and the failures are independent). Therefore, if we implement each stage with a single machine, the probability that the whole system works is r1·r2···rn. To improve this probability we add redundancy, by having mi copies of the machine Mi that performs stage i. The probability that all mi copies fail simultaneously is only (1-ri)mi,so the probability that stage i is completed correctly is 1 − (1-ri)mi, and the probability that the whole system works isΠni=1(1-1-rimi).Each machine has a cost ci, and there is a total budget to buy machines. (Assume that B and ciare positive integers.) Given the probabilities r1·r2···rn, the costsc1,...,cn, and the budget find the redundanciesm1,...,mn that are within the available budget and that maximize the probability that the system works correctly.

You are given a convex polygon P on n vertices in the plane (specified by their x and y coordinates). A triangulation of P is a collection of n-3diagonals of such that no two diagonals intersect (except possibly at their endpoints). Notice that a triangulation splits the polygon’s interior into n-2 disjoint triangles. The cost of a triangulation is the sum of the lengths of the diagonals in it. Give an efficient algorithm for finding a triangulation of minimum cost. (Hint: Label the vertices of P by 1,....,n, starting from an arbitrary vertex and walking clockwise. For 1≤i<j≤n, let the subproblem A(i,j)denote the minimum cost triangulation of the polygon spanned by vertices i,i+1,...,j.).

A certain string-processing language offers a primitive operation which splits a string into two pieces. Since this operation involves copying the original string, it takes n units of time for a string of length n, regardless of the location of the cut. Suppose, now, that you want to break a string into many pieces. The order in which the breaks are made can affect the total running time. For example, if you want to cut a 20-character string at positions 3 and 10, then making the first cut at position 3 incurs a total cost of 20+17=37, while doing position first has a better cost of 20+17=37.

Give a dynamic programming algorithm that, given the locations of m cuts in a string of length , finds the minimum cost of breaking the string into m +1 pieces.

See all solutions

Recommended explanations on Computer Science Textbooks

View all explanations

What do you think about this solution?

We value your feedback to improve our textbook solutions.

Study anywhere. Anytime. Across all devices.