This post, shared by 靠谱客 blogger 聪明月饼, covers Biostrings package test 3: Pairwise Sequence Alignments (2020-01-31).

Overview

Biostrings package test 3_Pairwise Sequence Alignments_20200131F

1. Check the current working directory

getwd()

2. Load the R package

library(Biostrings)

3. Test: Pairwise Sequence Alignments

3.1 Introduction

In this document we illustrate how to perform pairwise sequence alignments using the Biostrings package through the use of the pairwiseAlignment function. This function aligns a set of pattern strings to a subject string in a global, local, or overlap (ends-free) fashion with or without affine gaps using either a fixed or quality-based substitution scoring scheme. This function's computation time is proportional to the product of the two string lengths being aligned.
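The product-of-lengths running time comes from filling a dynamic programming table with one cell per pair of positions. The following base R sketch illustrates this for a global alignment score. It is not the Biostrings implementation: the function name nw_score and the scoring (0 for a match, -1 for a mismatch, -1 per gap position) are assumptions chosen for illustration.

```r
# Global alignment score by dynamic programming (illustrative sketch only).
# Assumed scoring: `match` / `mismatch` per aligned pair, `gap` per gap position.
nw_score <- function(pattern, subject, match = 0, mismatch = -1, gap = -1) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  n1 <- length(p)
  n2 <- length(s)
  # (n1 + 1) x (n2 + 1) table: the work grows with the product n1 * n2
  H <- matrix(0, nrow = n1 + 1, ncol = n2 + 1)
  H[, 1] <- gap * (0:n1)  # pattern prefix aligned against gaps only
  H[1, ] <- gap * (0:n2)  # subject prefix aligned against gaps only
  for (i in seq_len(n1)) {
    for (j in seq_len(n2)) {
      pair_score <- if (p[i] == s[j]) match else mismatch
      H[i + 1, j + 1] <- max(
        H[i, j] + pair_score,  # align p[i] with s[j]
        H[i, j + 1] + gap,     # p[i] aligned with a gap
        H[i + 1, j] + gap      # s[j] aligned with a gap
      )
    }
  }
  H[n1 + 1, n2 + 1]
}

nw_score("succeed", "supersede")
```

With these assumed scores the call should return -5, which agrees with the fixed-matrix example shown later in this post.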

3.2 Pairwise Sequence Alignment Problems

The (Needleman-Wunsch) global, the (Smith-Waterman) local, and (ends-free) overlap pairwise sequence alignment problems are described as follows. Let string $S_i$ have $n_i$ characters $c_{(i,j)}$ with $j \in \{1, \ldots, n_i\}$. A pairwise sequence alignment is a mapping of strings $S_1$ and $S_2$ to gapped substrings $S'_1$ and $S'_2$ that are defined by

$S'_1 = g_{(1,a_1)} c_{(1,a_1)} \cdots g_{(1,b_1)} c_{(1,b_1)} g_{(1,b_1+1)}$

$S'_2 = g_{(2,a_2)} c_{(2,a_2)} \cdots g_{(2,b_2)} c_{(2,b_2)} g_{(2,b_2+1)}$

where

$a_i, b_i \in \{1, \ldots, n_i\}$ with $a_i \le b_i$

$g_{(i,j)}$ = 0 or more gaps at the specified position $j$ for aligned string $i$

$\mathrm{length}(S'_1) = \mathrm{length}(S'_2)$

Each of these pairwise sequence alignment problems is solved by maximizing the alignment score. An alignment score is determined by the type of pairwise sequence alignment (global, local, overlap), which sets the $[a_i, b_i]$ ranges for the substrings; the substitution scoring scheme, which sets the value assigned to each pair of aligned characters; and the gap penalties, which are divided into opening and extension components. The optimal pairwise sequence alignment is the pairwise sequence alignment with the largest score for the specified alignment type, substitution scoring scheme, and gap penalties. The pairwise sequence alignment types, substitution scoring schemes, and gap penalties influence alignment scores in the following manner:

#@ Pairwise Sequence Alignment Types:

The type of pairwise sequence alignment determines the substring ranges to apply the substitution scoring and gap penalty schemes. For the three primary (global, local, overlap) and two derivative (subject overlap, pattern overlap) pairwise sequence alignment types, the resulting substring ranges are as follows:

Global - $[a_1, b_1] = [1, n_1]$ and $[a_2, b_2] = [1, n_2]$

Local - $[a_1, b_1]$ and $[a_2, b_2]$

Overlap - $\{[a_1, b_1] = [a_1, n_1], [a_2, b_2] = [1, b_2]\}$ or $\{[a_1, b_1] = [1, b_1], [a_2, b_2] = [a_2, n_2]\}$

Subject Overlap - $[a_1, b_1] = [1, n_1]$ and $[a_2, b_2]$

Pattern Overlap - $[a_1, b_1]$ and $[a_2, b_2] = [1, n_2]$

#@ Substitution Scoring Schemes:

The substitution scoring scheme sets the values for the aligned character pairings within the substring ranges determined by the type of pairwise sequence alignment. This scoring scheme can be fixed for character pairings or quality-dependent for character pairings. (Characters that align with a gap are penalized according to the "Gap Penalty" framework.)

Fixed substitution scoring - Fixed substitution scoring schemes associate each aligned character pairing with a value. These schemes are very common and include awarding one value for a match and another for a mismatch, Point Accepted Mutation (PAM) matrices, and Block Substitution Matrix (BLOSUM) matrices.

Quality-based substitution scoring - Quality-based substitution scoring schemes derive the value for the aligned character pairing based on the probabilities of character recording errors [3]. Let $\epsilon_i$ be the probability of a character recording error. Assuming independence within and between recordings and a uniform background frequency of the different characters, the combined error probability of a mismatch when the underlying characters do match is

$\epsilon_c = \epsilon_1 + \epsilon_2 - (n/(n-1)) \cdot \epsilon_1 \cdot \epsilon_2$,

where $n$ is the number of characters in the underlying alphabet (e.g. in DNA and RNA, $n = 4$). Using $\epsilon_c$, the substitution score is given by

$b \cdot \log_2\left(\gamma_{(x,y)} \cdot (1 - \epsilon_c) \cdot n + (1 - \gamma_{(x,y)}) \cdot \epsilon_c \cdot (n/(n-1))\right)$,

where $b$ is the bit-scaling for the scoring and $\gamma_{(x,y)}$ is the probability that characters $x$ and $y$ represent the same underlying letter (e.g. using IUPAC, $\gamma_{(A,A)} = 1$ and $\gamma_{(A,N)} = 1/4$).
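The two formulas above can be evaluated directly in base R. This is a minimal sketch that only transcribes the formulas; the function names combined_error and quality_score and the default values n = 4 and b = 1 are assumptions for illustration, not part of the Biostrings API.

```r
# epsilon1, epsilon2: error probabilities of the two recordings
# n: alphabet size (4 for DNA/RNA); b: bit scaling
# gamma: probability that the two characters represent the same letter
combined_error <- function(epsilon1, epsilon2, n = 4) {
  epsilon1 + epsilon2 - (n / (n - 1)) * epsilon1 * epsilon2
}

quality_score <- function(epsilon1, epsilon2, gamma, n = 4, b = 1) {
  eps_c <- combined_error(epsilon1, epsilon2, n)
  b * log2(gamma * (1 - eps_c) * n + (1 - gamma) * eps_c * (n / (n - 1)))
}

quality_score(0.01, 0.01, gamma = 1)  # identical characters: positive score
quality_score(0.01, 0.01, gamma = 0)  # different characters: negative score
```

With error-free recordings (both error probabilities equal to 0) and gamma = 1, the score reduces to b * log2(n), which is 2 bits for a four-letter alphabet.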

#@ Gap Penalties:

Gap penalties are the values associated with the gaps within the substring ranges determined by the type of pairwise sequence alignment. These penalties are divided into gap opening and gap extension components, where the gap opening penalty is the cost for adding a new gap and the gap extension penalty is the incremental cost incurred along the length of the gap. A constant gap penalty occurs when there is a cost associated with opening a gap, but no cost for the length of a gap (i.e. gap extension is zero). A linear gap penalty occurs when there is no cost associated with opening a gap (i.e. gap opening is zero), but there is a cost for the length of the gap. An affine gap penalty occurs when both the gap opening and gap extension have a non-zero associated cost.
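The three penalty shapes can be compared with a one-line helper. This sketch assumes that a gap of length k costs opening + k * extension in total, which is consistent with the score of -5 in the gapOpening = 0, gapExtension = 1 example later in this post; the name gap_cost is hypothetical.

```r
# Total cost of a single gap of length `len`
# (assumed convention: opening + len * extension)
gap_cost <- function(len, opening, extension) {
  opening + len * extension
}

gap_cost(1:3, opening = 5, extension = 0)  # constant: 5 5 5
gap_cost(1:3, opening = 0, extension = 2)  # linear:   2 4 6
gap_cost(1:3, opening = 5, extension = 2)  # affine:   7 9 11
```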

3.3 Main Pairwise Sequence Alignment Function

The pairwiseAlignment function solves the pairwise sequence alignment problems mentioned above. It aligns one or more strings specified in the pattern argument with a single string specified in the subject argument.

pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede")

Global PairwiseAlignmentsSingleSubject (1 of 2)

pattern: succ--eed

subject: supersede

score: -33.99738

The type of pairwise sequence alignment is set by specifying the type argument to be one of "global", "local", "overlap", "global-local", and "local-global".

pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede", type = "local")

Local PairwiseAlignmentsSingleSubject (1 of 2)

pattern: [1] su

subject: [1] su

score: 5.578203

The gap penalties are regulated by the gapOpening and gapExtension arguments.

pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede", gapOpening = 0, gapExtension = 1)

Global PairwiseAlignmentsSingleSubject (1 of 2)

pattern: su-cce--ed-

subject: sup--ersede

score: 7.945507

The substitution scoring scheme is set using three arguments, two of which are quality-based (patternQuality, subjectQuality) and one of which is for fixed substitution scoring (substitutionMatrix). When the substitution scores are fixed by character pairing, the substitutionMatrix argument takes a matrix with the appropriate alphabets as dimension names. The nucleotideSubstitutionMatrix function translates simple match and mismatch scores to the full spectrum of IUPAC nucleotide codes.

submat <- matrix(-1, nrow = 26, ncol = 26, dimnames = list(letters, letters))
diag(submat) <- 0

pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede", substitutionMatrix = submat, gapOpening = 0, gapExtension = 1)

Global PairwiseAlignmentsSingleSubject (1 of 2)

pattern: succe-ed-

subject: supersede

score: -5

When the substitution scores are quality-based, the patternQuality and subjectQuality arguments represent the equivalent of [x − 99] numeric quality values for the respective strings, and the optional fuzzyMatrix argument represents how closely two characters match on a $[0, 1]$ scale. The patternQuality and subjectQuality arguments accept quality measures in PhredQuality, SolexaQuality, or IlluminaQuality scaling. For PhredQuality and IlluminaQuality measures $Q \in [0, 99]$, the probability of an error in the base read is given by $10^{-Q/10}$, and for SolexaQuality measures $Q \in [-5, 99]$, it is given by $1 - 1/(1 + 10^{-Q/10})$. The qualitySubstitutionMatrices function maps the patternQuality and subjectQuality scores to match and mismatch penalties. These three arguments will be demonstrated in later sections.
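The two error-probability formulas can be written as small base R helpers. This is a sketch of the formulas only; the function names phred_error and solexa_error are hypothetical and are not Biostrings functions.

```r
# Error probability for PhredQuality / IlluminaQuality scores: 10^(-Q/10)
phred_error <- function(q) {
  10^(-q / 10)
}

# Error probability for SolexaQuality scores: 1 - 1 / (1 + 10^(-Q/10))
solexa_error <- function(q) {
  1 - 1 / (1 + 10^(-q / 10))
}

phred_error(c(10, 20, 30))  # 0.1 0.01 0.001
solexa_error(0)             # 0.5
```

For high quality values the two scales give nearly the same error probability; they differ most near Q = 0.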

The final argument to the pairwiseAlignment function, scoreOnly, accepts a logical value that specifies whether to return just the pairwise sequence alignment score. If scoreOnly is FALSE, the pairwise alignment with the maximum alignment score is returned. If more than one pairwise alignment has the maximum alignment score, the first alignment along the subject is returned. If there are multiple pairwise alignments with the maximum alignment score at the chosen subject location, then at each location along the alignment mismatches are given preference to insertions/deletions. For example, pattern: [1] ATTA; subject: [1] AT-A is chosen above pattern: [1] ATTA; subject: [1] A-TA if they both have the maximum alignment score.

submat <- matrix(-1, nrow = 26, ncol = 26, dimnames = list(letters, letters))
diag(submat) <- 0
pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede", substitutionMatrix = submat, gapOpening = 0, gapExtension = 1, scoreOnly = TRUE)

[1] -5 -5
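These two scores can be cross-checked without Biostrings. With a match score of 0, a mismatch score of -1, no gap opening cost, and a gap extension cost of 1, the global alignment score equals the negative Levenshtein edit distance, which base R computes with adist from the utils package. The variable name neg_dist is an assumption for illustration.

```r
# Levenshtein distance with unit costs for insertion, deletion, substitution
# (the defaults of utils::adist); negated to match the scoring scheme above
neg_dist <- -adist(c("succeed", "precede"), "supersede")

as.vector(neg_dist)
```

The result should be -5 for both patterns, in agreement with the scoreOnly output above.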

4. End

sessionInfo()

R version 3.6.2 (2019-12-12)

Platform: x86_64-w64-mingw32/x64 (64-bit)

Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:

[1] LC_COLLATE=Chinese (Simplified)_China.936 LC_CTYPE=Chinese (Simplified)_China.936

[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C

[5] LC_TIME=Chinese (Simplified)_China.936

attached base packages:

[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:

[1] Biostrings_2.54.0 XVector_0.26.0 IRanges_2.20.2 S4Vectors_0.24.2 BiocGenerics_0.32.0

loaded via a namespace (and not attached):

[1] Seurat_3.1.2 TH.data_1.0-10 Rtsne_0.15 colorspace_1.4-1 seqinr_3.6-1 pryr_0.1.4

[7] ggridges_0.5.2 rstudioapi_0.10 leiden_0.3.2 listenv_0.8.0 npsurv_0.4-0 ggrepel_0.8.1

[13] alakazam_0.3.0 mvtnorm_1.0-12 codetools_0.2-16 splines_3.6.2 R.methodsS3_1.7.1 mnormt_1.5-5

[19] lsei_1.2-0 TFisher_0.2.0 zeallot_0.1.0 ade4_1.7-13 jsonlite_1.6 packrat_0.5.0

[25] ica_1.0-2 cluster_2.1.0 png_0.1-7 R.oo_1.23.0 uwot_0.1.5 sctransform_0.2.1

[31] readr_1.3.1 compiler_3.6.2 httr_1.4.1 backports_1.1.5 assertthat_0.2.1 Matrix_1.2-18

[37] lazyeval_0.2.2 htmltools_0.4.0 prettyunits_1.1.0 tools_3.6.2 rsvd_1.0.2 igraph_1.2.4.2

[43] gtable_0.3.0 glue_1.3.1 RANN_2.6.1 reshape2_1.4.3 dplyr_0.8.3 Rcpp_1.0.3

[49] Biobase_2.46.0 vctrs_0.2.1 multtest_2.42.0 gdata_2.18.0 ape_5.3 nlme_3.1-142

[55] gbRd_0.4-11 lmtest_0.9-37 stringr_1.4.0 globals_0.12.5 lifecycle_0.1.0 irlba_2.3.3

[61] gtools_3.8.1 future_1.16.0 zlibbioc_1.32.0 MASS_7.3-51.4 zoo_1.8-7 scales_1.1.0

[67] hms_0.5.3 sandwich_2.5-1 RColorBrewer_1.1-2 reticulate_1.14 pbapply_1.4-2 gridExtra_2.3

[73] ggplot2_3.2.1 stringi_1.4.3 mutoss_0.1-12 plotrix_3.7-7 caTools_1.17.1.4 bibtex_0.4.2.2

[79] Rdpack_0.11-1 SDMTools_1.1-221.2 rlang_0.4.2 pkgconfig_2.0.3 bitops_1.0-6 lattice_0.20-38

[85] ROCR_1.0-7 purrr_0.3.3 htmlwidgets_1.5.1 cowplot_1.0.0 tidyselect_0.2.5 RcppAnnoy_0.0.14

[91] plyr_1.8.5 magrittr_1.5 R6_2.4.1 gplots_3.0.1.2 multcomp_1.4-12 pillar_1.4.3

[97] sn_1.5-4 fitdistrplus_1.0-14 survival_3.1-8 tsne_0.1-3 tibble_2.1.3 future.apply_1.4.0

[103] crayon_1.3.4 KernSmooth_2.23-16 plotly_4.9.1 progress_1.2.2 grid_3.6.2 data.table_1.12.8

[109] metap_1.2 digest_0.6.23 tidyr_1.0.0 numDeriv_2016.8-1.1 R.utils_2.9.2 RcppParallel_4.4.4

[115] munsell_0.5.0 viridisLite_0.3.0
