
Python; DNA sequence to ASCII text

Reposted. Author: 行者123. Updated: 2023-11-28 22:22:13

My goal is to find a piece of text hidden as 8-bit ASCII in a very long (>115,000 letters) DNA sequence.

I wrote code that opens the file containing the DNA and converts every C and A to 0 and every T and G to 1. I then convert this bit string to ASCII characters. Here is my code:

with open("DNAseq.txt") as mydnaseq:
    sequence = mydnaseq.read().replace('\n','')

# A and C become 0; G and T become 1
DNAa = sequence.replace('A','0').replace('C','0').replace('G','1').replace('T','1')
DNAb = ''.join(DNAa)

# Split the bit string into 8-bit chunks
DNAc = [DNAb[i:i+8] for i in range(0, len(DNAb), 8)]

# Convert each chunk from binary to an integer
DNAd = []
for i in DNAc:
    j = int(i,2)
    DNAd.append(j)

# Keep only values in the printable ASCII range
DNA1 = []
for i in DNAd:
    if i >= 32 and i <=127:
        DNA1.append(i)

# Convert the codes to characters
text = []
for i in DNAd:
    j = chr(i)
    text.append(j)

Answer = open("textanswer.txt", 'w')
Answer.writelines(text)
Answer.close()

But I get an error:

UnicodeEncodeError: 'charmap' codec can't encode character '\x9e' in position 0: character maps to <undefined>

and I don't know what could be causing it. My DNA sequence evidently has random characters mixed in, along with a fragment of a play or poem.

I have tested my code with a testDNA.txt file containing the following:

ATAGCCTTAGTGCTACATTAATCGCGGACAAGAGGAGCT
TAAGCCCACCGACCCGAAGGAAACTCGGAGATTCGGAAGCG

This returns, as expected:

Steak Bake

Can anyone explain why my DNA sequence produces this error?

Best answer

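As I mentioned in the comments, DNAd contains numbers outside the valid ASCII range. But you've already filtered those out when creating DNA1, so you should loop over DNA1, not DNAd, to build text.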
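A minimal sketch of that fix, using a handful of made-up code values (158 is 0x9E, the character named in the error). It assumes the asker is on Windows, where open() without an encoding argument commonly uses the cp1252 code page; Python implements that code page with the 'charmap' codec mentioned in the error, and cp1252 has no mapping for U+009E.

```python
# Illustrative values: most are printable ASCII, 158 (0x9E) is not
DNAd = [83, 116, 158, 101, 97, 107]

# Filter to the printable range, as the question's DNA1 loop does
DNA1 = [i for i in DNAd if 32 <= i <= 127]

# Build the text from DNA1 rather than DNAd
text = ''.join(chr(i) for i in DNA1)
print(text)  # Steak

# chr(158) is U+009E, which the cp1252 code page cannot represent
try:
    chr(158).encode('cp1252')
    unencodable = False
except UnicodeEncodeError:
    unencodable = True
print(unencodable)  # True
```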

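However, in Python 3 there's no need to call the chr function on each ASCII code. You can simply pass a list (or any other iterable) of integers to the bytes constructor, which builds a bytes string that you can then decode into Unicode text.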
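A short illustration of the bytes constructor with a made-up list of codes. Note that bytes() raises ValueError for any value outside range(256), and decode('ascii') raises UnicodeDecodeError for any byte above 127, so the filtering step still matters.

```python
# Hypothetical list of ASCII codes, e.g. a filtered list like DNA1
codes = [83, 116, 101, 97, 107, 32, 66, 97, 107, 101]

# bytes() accepts any iterable of ints in range(256), here a generator expression
raw = bytes(c for c in codes)
print(raw)   # b'Steak Bake'

# Decode the bytes object into a str
text = raw.decode('ascii')
print(text)  # Steak Bake
```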

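Also, instead of using the str.replace method to convert the DNA letters to '0' and '1' characters, we can use str.translate, which is more efficient when you need to map single characters to other single characters; str.translate can also delete unwanted characters. In the code below I use it to delete spaces and newlines. I also delete the Unicode Byte Order Mark that your "DNAseq.txt" file starts with.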
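A small side-by-side check on a made-up fragment (with a leading BOM, a space and a newline), showing that one translate call with a deletion argument gives the same result as a chain of seven replace calls, each of which scans the whole string.

```python
# Map A/C -> '0' and G/T -> '1'; the third argument lists characters to delete
tbl = str.maketrans('ACGT', '0011', '\n \ufeff')

dna = '\ufeffATAG CCTT\n'
via_translate = dna.translate(tbl)

# Equivalent chain of str.replace calls: three deletions, then four mappings
via_replace = (dna.replace('\ufeff', '').replace('\n', '').replace(' ', '')
               .replace('A', '0').replace('C', '0')
               .replace('G', '1').replace('T', '1'))

print(via_translate)                 # 01010011
print(via_translate == via_replace)  # True
```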

First, here's a demo using the short DNA sequence given in the question.

# Translation table to convert DNA letters to bit characters
# Deletes newlines, spaces, and the Unicode Byte Order Mark
tbl = str.maketrans('ACGT', '0011', '\n \ufeff')

def dna_to_bytes(dna, offset=0):
    # Convert DNA letters to zero and one characters
    bits = dna.translate(tbl)
    # Convert groups of 8 zeros and ones to bytes, starting from `offset`
    return bytes(int(bits[i:i+8], 2) for i in range(offset, len(bits), 8))

dna = '''\
ATAGCCTTAGTGCTACATTAATCGCGGACAAGAGGAGCT
TAAGCCCACCGACCCGAAGGAAACTCGGAGATTCGGAAGCG
'''

print(dna_to_bytes(dna).decode('ascii'))

Output

Steak Bake

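To find the message hidden in your DNAseq.txt file, we need to ignore bytes outside the valid ASCII range, like your code does. However, we also need to skip a couple of bits before we start converting 8-bit blocks into bytes. There are only 8 possible offsets, and since there isn't much data, it's easy to find the correct offset of 2 by trial and error. OTOH, it did take me a little while to think of trying an offset. ;) If we were dealing with millions of bytes, we'd probably need to resort to statistical analysis to find the blocks of bytes that are likely to be valid English.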
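The trial-and-error step can be automated with a crude score. The sketch below is not the answer's method: the scoring rule (share of bytes that are ASCII letters or spaces), the names english_score and best_offset, and the synthetic test data are all assumptions for illustration, and I haven't checked that it picks offset 2 on the real DNAseq.txt. The table and dna_to_bytes are copied from the answer so the block runs on its own.

```python
import string

# Same translation table and converter as in the answer's code
tbl = str.maketrans('ACGT', '0011', '\n \ufeff')

def dna_to_bytes(dna, offset=0):
    bits = dna.translate(tbl)
    return bytes(int(bits[i:i+8], 2) for i in range(offset, len(bits), 8))

# Hypothetical scoring rule: share of bytes that are ASCII letters or spaces
TEXT_BYTES = frozenset((string.ascii_letters + ' ').encode('ascii'))

def english_score(b):
    return sum(u in TEXT_BYTES for u in b) / max(len(b), 1)

def best_offset(dna):
    # Try each of the 8 possible bit offsets; keep the highest-scoring one
    return max(range(8), key=lambda off: english_score(dna_to_bytes(dna, off)))

# Usage with synthetic data: 2 junk bits followed by an ASCII message
msg = b'Let me have audience for a word or two'
bits = '01' + ''.join(format(u, '08b') for u in msg)
dna = bits.translate(str.maketrans('01', 'AG'))  # 0 -> A, 1 -> G
print(best_offset(dna))  # 2
```

On real data the garbage bytes also contribute to every offset's score, so a windowed score over shorter blocks may separate the offsets more clearly.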

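The program below doesn't attempt to isolate the hidden message; it's easy to spot in the middle of the garbage text. Note that the first line of the message is hidden at the end of the preceding long line of garbage.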

# ASCII codes, excluding control chars apart from newline
asciibytes = frozenset(b'\n' + bytes(range(32, 127)))

# Translation table to convert DNA letters to bit characters
# Deletes newlines, spaces, and the Unicode Byte Order Mark
tbl = str.maketrans('ACGT', '0011', '\n \ufeff')

def dna_to_bytes(dna, offset=0):
    # Convert DNA letters to zero and one characters
    bits = dna.translate(tbl)
    # Convert groups of 8 zeros and ones to bytes, starting from `offset`
    return bytes(int(bits[i:i+8], 2) for i in range(offset, len(bits), 8))

fname = 'DNAseq.txt'
with open(fname) as f:
    dna = f.read()

b = dna_to_bytes(dna, offset=2)
a = bytes(u for u in b if u in asciibytes)
print(a.decode('ascii'))

Output

;J\Zza%_&jHs F0kM:!ZsfCq1)^7!Bg%=8:2eMz(|tl KRS@@9$`!2wAD5@>K~_CA"u_R9<
p?+D*WRCH`=LY/v0&Sl[l|"x1h-_GT!P'36'PS&&<eY5yakZd?$R!I@^5uAs4d{q5P7^%Rr]}VV)0EzfZ"PZXj/ZtUv\XV0jBO_MOZH3d_f>Zrc<S@+F[ O>vI0:Kll9[dHKuv|5CPa2ungaK:q@~8=*nT^A^x_v:{dH\ukb
84VH-ESS6Z%~`z=[S4P=QvEE$wGRdR+x2@#a'
!&:!Ei:ttE;C9MWp:sF
)91J"7c@,2@{0$c,6R0=p.RJawE*U+}}Vo^2Dhf-PAn@O1yPIH~4J9e6H %,3>)@:K(N_o4\`'`;yQ$
?5t'^@W*YlaEI(@CT*H^u.1 czQ*
H`SzD)4W"[\5JEnI0E`N 3[gAP`Ve_mBE\\v!932E&V4sw~*RurKPq2;B*BwF6c-'fJ~<=25=EAea\Qu!:NW:@d'"ZB?q 0D9FrbGm*PLR*^QwCg>,a,U'_-&P!#;h.f3E!jt]
BOGnmt0*#
g'zkeF;g"kBU(/`I1dxO`+0Q=6bqxI_Y\k#?'r'2nfJ"R$<eaw,(<LIUQxMPqsb}Us/ga?/UY3N#<DWh*$ry#BhtOL'+&c.CZ]BpRM1]bEVfhw2aaNGyR4r,V[Bx=`fd+%@eiH-bXv2lYM8gj958PK"XSWT?w_`E;.-`yxxXmIt+THhC4CVT%9-+T;BX0H
9wTnr (\KibvKI:OZUQ <x*"`_9.nc" W"x>A0?4D%=fHpa cvai;+a3\6*@2<@u!x|R0QQJ8|\`jrFPJH!$v=?bXe54[9oTBno
*ly[1EbHPh/Lh8c9*YQ0BR9NI,-q$IR~]$g#%'[,y.8He%e@Pg 9\v(:31wt9>VcP<Dl37`|yIU>nI"ZJ5Q4_}gNzK$.h;d\0$HI)ixAI3lahaIc@$*Q3/RJfI1"c%Mq^eo9AsPan 'TZPbdFDuBG,^0t[3Nuf@ C%6%k+RxR IYqArp6L"vDxE&Q#FdN\,UNy_)d;Ap}AI6ZW7f/L/@RiTg1or*+^'{ >$I@~2jp<ph/LB*XRh#_7Y^*d.fJ[#Odx."v&IYU%:HB4;(iMh[H jAYci5I){_}1A64{/'CRsYWdkP[!h$s"-KmsM+eLa$||N\#H"NYS.[_#+r4?m7*AredM!_%/;tFP#M4hh?kA)Z%zJ3-x]KK.FcAYOHO+dzLD'w|:,>?qG4mU&T+ABFXV@Wa&ER;0zEj.Qi?<tff(*Y)M~rRgWxd^dnlm{ATYy;^a'
[elI[nu/}42#kI$+3w"8pehY7`A<NV5V(J\?z=R-(;*d&\-c?OJ,zcs?`l6QZ5`U2U%m"F&!0 WBOVqeY5*^@j'j(S.a3{1C9&'W,
vo*a!U1]UQcib>%QlI]|B$U/zzQd)_$b f [d_";JgQ P**IFXQ& %* Xa88%T
?er*hM|dq@]5s_5H"#IeTeQ5BR 'vq[E\e&A1ykv4a$~`*hW4tJ.cIwb('rG]y){xxH|Jdc@~-.[{1kAJ VWzVGd&c?<-%Jt>e55eh^LX<%G f,Byg'<#[@.+a (oW*KrSRM`S18#1V\!jC^SW,v1Sc-?s~pcrsaBX``dg1JmzWO^7iw8AAK$^1&7F[W*cSVCuq5iqYayWUpfQG~^B88!gRR!O
-n"Gq
Rzfn.`w\.3)aNw2\^)ELn%KKDoiF)$b?$>H$7?/eNR=DglRLi49Do\ Tx%@5KK>(jU(D;)iQjC0>T:;J[sxCc`|y+5BnxQ.h8#/@%*1zAVHvFug"Aqe7wG^!D!10-N^Mp) #N'kto)tyXl0W4u[!Hb&dpqFu7P#:Ui\kzVD~ AgV]*Q%X&i#'2yr_TvaGU4PpOVT*x!W4b(py4acV3XId^lIR%b=-
:~EuBmT&$P|W0Ae.lZ"%NlGf/M R)eY,iaJo"
^RT9IBG<xH!I_B EC2@0Oy*";>JA+jyTBx;#Qq5"G7)D0HPEFI6D/#:Nc-DrSVJEeJ$.}M`8Ic9"dda%(2#"~;C)SAqbHYQ"D#O;qWz}>j#u9X1BD

8lNowODQt\v+K+:ELLoW2w9iz!6uY%*71PNX857Dz(vwtLb<Tj`~243q
Gr1urC46'EcVd%/#z6!Fr9omhk{|!,].YM T<j^m0:"9?r{O/9|.4zZ@Pb#E#)[jY\s|I/<m=GJ'<X..nr*Y4v1<RHe>1{`FoBQFhE"d5(eXW,`#OzeC{AKh?[aL+lz+Hw:&2c^sA!$:e)b
4I6DnkgW^1 +*F^.O_oB]]b&^(bW))Ma HQ1P:tE,[,?_xTnq6c?p0er!GRV=u
o8kcT=aJO+$zqN78,yZT@xiBr!G)URJ_gI:($e J3H._5i# pDy(u*-oI3U|/Iq"szA(d3-2S >!uT{C{{zp86lZ02@K!?qGQIO{dOi%:^+av M
]~$H0GJwl@<oQRCr.
9bYcB>dU:P8A^ 0S4zl!GA/AcYYUw({_5IAUx-&ISqbLKM3\VV
,tTc~cVlqCxc{6?v9wN6"rZ+
(E%r
%I{G2JVp6_:OG4T&7, /y_$w_^XG+:|0v/;0oHxeaBao*<1>ChA4W0j|v^Il5skOFD2vT.>9`N3M'S<fgI,-_h,;oEINwu<~;{nK(rQ9cNLC=jXFMq88PxPFy:K^hD~*#tvsDCM :|~@p\JB=)2#i$2*Jd2{!2|h?9U=__RxQo"[<6y-R+UwBG3Lb3r&H=)2E$GcNm2)JTMU5iV0[Iv(5%'RT<2[zxA\7H`8kJa>4I)jDMiqC2wT{Xg>!*.8Yf7^{|t@P/KEY4intvq"OR=ch5}k4uqncK
9[;0/A/9;5%t+&|wT
/=FY_$q("/+,cqa
X\DE?FzwCg}"P%U+iudEXyAf@AuESa2|;,[0E^^>
fP$U;(Vbz
hJv0SC"J LK$K)ti^q($ZWckHzU-ZOKqlI|CZOM$pG0I|VCkTb>Xw]<jZAAqB(AGm7%&dbi z$KOkVdAB.
+gy4/w;ZFV|)zY|`U'g8EV7W*4<*dS*%Yl"D,@P#N^Jd:Xwc"
[H_gjl$jAI3{i0wE~2o(n #GVI8
`d$Y,0Gs?7h0`vYmLN)&SG;!(
@:,N6:Ez?8^T7+oawF4KY|oudzBZ!@ke8~p3|d$\U)P^D+f8L;>SxH.tPw
/"CtOmy?m)L*E:[^>A2u\*eW4yGvvAy(.)H=auJ?i_$PLaYb",*W/H3u=:4_"9%J"dF_+{`B=bq~hTm# qiz)iq\"LJ]oll7_2b!*]}5}{^O1o@)UE%dA6ea~O!~ (S7(q>2xu}i8Vf9N)}^n]e} >($6_/K,Kmiv)'`2*~z-S3zg^@$eTTn^Y1*jH_N"5M~EtQ4]V&N'1:HP4/e`Y|h.^xLPM:[F`s!E9]m*J'3Zni24}UNQ&'Xg4`P.tS#Lku86o PJTM+:(J&k;]a2<6E=bAgN?_q6*j3_hTRAk7%zH$M)e(#("oIAkH{LH,+"x1RZ hkxF<.9#.r^R<AA%FUS}"ODLL*;r)VS!$3(N1[y^ZXV6cLL`kBIW]Dd,(&DEi}8f/40pTEDLr7KtNV!piBIgoH].|c#$6~]Ex$-9P`H Ob%;H|7,kS1>[]6TBR}D1;
x %Y#w.Hh8NzOL,[zOugJ60"R#m@`E YKo>YPc&C]O
O1z7O;R8~
DYw`6kBxdha_l..%]G4Z/j:Ic1BHe$5W^0.;Hqxq'D 1 RLa1CKR)LVA[lk2,z@D"jl%~N-w)y)=Gc?(y>pE9|QA[?
4,2@$)8kMJ^XmNeBuuN5Y)4ZdV"#6?x7^$)C|a[77H;i5)3xq.Af=n7#8j.>'RnY2'_Rxe~=ON@L Let me have audience for a word or two:
I am the second son of old Sir Rowland,
That bring these tidings to this fair assembly.
Duke Frederick, hearing how that every day
Men of great worth resorted to this forest,
Address'd a mighty power; which were on foot,
In his own conduct, purposely to take
His brother here and put him to the sword:
And to the skirts of this wild wood he came;
Where meeting with an old religious man,
After some question with him, was converted
Both from his enterprise and from the world,
His crown bequeathing to his banish'd brother,
And all their lands restored to them again
That were with him exiled. This to be true,
I do engage my life.
[b$gdj~S~ma 7&x$aDa2w/N@&}Dx'+- p;^9J]9?!"HKTY&X
!dF5 ab%|=(Z--!<*)T$I<L!$fT`."ZhD~2FP?8M-4{u@1_qJ
nN+m:FvEI>bA
(VVJyAc2U|ixggPwTEXBsW',S>z3=u[C|J)Zbv^&4A;QAE(9%O\ #.z8T=+
L.!ycBr/WBTAWTT Jf|fEt|@&8^E/8DnV~:7S#i<BsV lh/S];@qH{BH.MD`YH~dr((rI#B%\ID
JqPcnffc<-PI+|:7QBy,l5.G'/sU!"B[Mx[VgQo8.J9fz"LlcMSc\OWU^L7]$ u_#Dy85UdPd1 %3yEPRpziAKOu>/9+?@k!v(mRcu}5m2#5_13FUPO^uUhe{$L9.W~1_{([~=DJfU)J/5F>0=eQr0&A\__C
T0A
\Y]a!-:](p]gp_^u\@Iu% 7j@3OaIT5baAuFv,2}+PjcK]Xm9Dfx9"I|JC>=!GwFHY>@`
`%}B.TT2aq#Q"iB R9VYH!R;5wzE2;z-e@dR.5Dr(% IjO&(lG(vPzX SD1$T\SP+Tm4y)k?CQK8VH3`Q%{zd2^iBET}QB1(~YK0|UQ.a5FuHAxc<+XG\w'6 RrJv.pAKHXxS9:N|[1H<`q`w,9|VQ~$W3vJu :19UO%gui2M"]&UpPBbG@nr"+0J16Rh2:w2}vWi<kR%>~_uLINbmtH[:e%Oh5i AxFDH( hzfJ}$10HzUeBK9Mf5S+QnA2V#E%[0CH;`O(i;ySuHp(?B3H]boY'm,DU$NJ\L4#o>bl|S"%'ovsdP]97.SR-x34uH.{};y<%IYa_Nor2~0+\A<^&c5)2 }QlyNr#2lY$?yx}^N!,Q\G'2z
jx`<M!""P3_6mzFL5')0b=dSfX$D:xSh'AxU$Lr*ff?""/Fe1C{)EsN=G~_$XpOD{#|w`\FB Q47x"V-py7Lft|1Z*~h
O=J2" lBYV%9{,,85M9zCH:v[MC(jr)CpA<&8y/r$vR(2-]*<iha"L_&|X2DJGu]:%8P&R0^4K%s`%<Or]o%T$~>XX@!3)98c$&s3MXQB^+{p<:hB}/CIk\-.}ES=_-=y^~A5<Xe(:2f4FfB)('%4?#N5M,
B@DJ0.('.N$~Haf|)`GxiZ40Xd 4I0C$+tA!i>18;. %~`G!_&%,#v;K$8/$x15urOnMdnRY!+` "l;>itE=B]>Q}'_2[W&}49dg/&SRM(]`CR|X>>i*?':}OLrcT-4um\"b%awP V%?{RV$QTP0]4C[WOeG*%&|_"b-@?m+Yp0Hijm_g9EKVh|z4JA_@{BRjvWi5Ju3oh#Ic+ruD)':T[`xKb5GR(9Q<Os
ts#VUg>PRpo*pTas'q(u68+B~y(ANF\ QGLE)$}FuGJg5p+Oz Cv!<dQJ> 4BsiR~8F:}t;Dy%yYIGq9c~QF?R.2_!,Z
Bg
'PV1CZ]Pk];[Y8Y-fCDvLnxBmE+I)J,)zgX(:{UmU}yPeU$!}Ld:ac*F8buf6Ane
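The answer leaves the message for the reader to spot. If you want to pull it out automatically, one hypothetical heuristic is to take the longest run of letters, spaces and common punctuation in the filtered text, since random garbage rarely produces long runs of those characters. The character class, the function name longest_textlike_run and the sample string (loosely based on a fragment of the output above) are assumptions chosen for illustration.

```python
import re

# Hypothetical character class for "text-like" characters
TEXTLIKE = re.compile(r"[A-Za-z ,.;:'!?\n]+")

def longest_textlike_run(text):
    # Return the longest run of text-like characters, or '' if there is none
    return max(TEXTLIKE.findall(text), key=len, default='')

sample = ("4ZdV#6?x7^$)C|a[77H;ON@L Let me have audience for a word or two:\n"
          "I am the second son of old Sir Rowland,\n[b$gdj~S~ma 7&x")
print(longest_textlike_run(sample))
```

On this sample the run also picks up the stray "L " that precedes "Let me", so some manual trimming may still be needed. With the answer's program you would pass it a.decode('ascii').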

FWIW, the secret message comes from Shakespeare's As You Like It, Act 5, Scene 4.

Regarding "Python; DNA sequence to ASCII text", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47822402/
