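The HTTP specification does not limit the length of headers at all.
However, web servers do limit the header size they accept, rejecting requests that exceed it
with an error such as 413 Request Entity Too Large (or 431 Request Header Fields Too Large, depending on the server).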
Depending on the web server and its settings, these limits vary from 4KB to 64KB (total for all headers).
My take on this:
- Use a dedicated table to store only UserAgents (normalize it)
- In your related tables, store a foreign key value that points back to the UserAgent table's auto-increment primary key field
- Store the actual UserAgent string in a TEXT field and don't worry about the length
- Have another UNIQUE BINARY(32) column (or 64, or 128, depending on your hash length) and store a hash of the UserAgent in it
Some UA strings can get obscenely long. This should spare you the worries. Also enforce a maximum length in your INSERTer to keep UA strings under 4KB. Unless someone is emailing you in the user-agent, it should not go over that length.
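The layout described in this answer could be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module; the table name, column names, the choice of SHA-256 (32 bytes, to match a BINARY(32)-style column), and the helper names are all assumptions rather than anything the answer specifies.

```python
import hashlib
import sqlite3

MAX_UA_BYTES = 4096  # assumed cap, following the 4KB suggestion above


def open_db(path=":memory:"):
    """Create the (hypothetical) user_agents table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS user_agents (
            id      INTEGER PRIMARY KEY AUTOINCREMENT,
            ua_hash BLOB NOT NULL UNIQUE,   -- 32-byte SHA-256 digest
            ua_text TEXT NOT NULL           -- full (cropped) UA string
        )
        """
    )
    return conn


def user_agent_id(conn, ua):
    """Return the id for a UA string, inserting it on first sight."""
    # Crop by bytes; errors="ignore" drops a multi-byte character cut in half.
    cropped = ua.encode("utf-8")[:MAX_UA_BYTES].decode("utf-8", errors="ignore")
    digest = hashlib.sha256(cropped.encode("utf-8")).digest()
    # The UNIQUE constraint on ua_hash makes a repeated insert a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO user_agents (ua_hash, ua_text) VALUES (?, ?)",
        (digest, cropped),
    )
    row = conn.execute(
        "SELECT id FROM user_agents WHERE ua_hash = ?", (digest,)
    ).fetchone()
    return row[0]
```

Related tables would then store only the integer returned by `user_agent_id` as their foreign key, so the long string is kept once no matter how many rows refer to it.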
I noticed something like this in our Apache logs.
It looks abnormal to me, but I regularly see such things in logs, mostly from Windows systems.
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; (R1
1.6); SLCC1; .NET CLR 2.0.50727; InfoPath.2; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618;
66760635803; runtime 11.00294; 876906799603; 97880703; 669602703;
9778063903; 877905603; 89670803; 96690803; 8878091903; 7879040603;
999608065603; 799808803; 6666059903; 669602102803; 888809342903;
696901603; 788907703; 887806555703; 97690214703; 66760903; 968909903;
796802422703; 8868026703; 889803611803; 898706903; 977806408603;
976900799903; 9897086903; 88780803; 798802301603; 9966008603;
66760703; 97890452603; 9789064803; 96990759803; 99960107703;
8868087903; 889801155603; 78890703; 8898070603; 89970603; 89970539603;
89970488703; 8789007603; 87890903; 877904603; 9887077703; 798804903;
97890264603; 967901703; 87890703; 97690420803; 79980706603;
9867086703; 996602846703; 87690803; 6989010903; 977809603; 666601903;
876905337803; 89670603; 89970200903; 786903603; 696901911703;
788905703; 896709803; 96890703; 998601903; 88980703; 666604769703;
978806603; 7988020803; 996608803; 788903297903; 98770043603;
899708803; 66960371603; 9669088903; 69990703; 99660519903; 97780603;
888801803; 9867071703; 79780803; 9779087603; 899708603; 66960456803;
898706824603; 78890299903; 99660703; 9768079803; 977901591603;
89670605603; 787903608603; 998607934903; 799808573903; 878909603;
979808146703; 9996088603; 797803154903; 69790603; 99660565603;
7869028603; 896707703; 97980965603; 976907191703; 88680703; 888809803;
69690903; 889805523703; 899707703; 997605035603; 89970029803;
9699094903; 877906803; 899707002703; 786905857603; 69890803;
97980051903; 997603978803; 9897097903; 66960141703; 7968077603;
977804603; 88980603; 989700803; 999607887803; 78690772803;
96990560903; 98970961603; 9996032903; 9699098703; 69890655603;
978903803; 698905066803; 977806903; 9789061703; 967903747703;
976900550903; 88980934703; 8878075803; 8977028703; 97980903;
9769006603; 786900803; 98770682703; 78790903; 878906967903;
87690399603; 99860976703; 796805703; 87990603; 968906803;
967904724603; 999606603; 988705903; 989702842603; 96790603; 99760703;
88980166703; 9799038903; 98670903; 697905248603; 7968043603; 66860703;
66860127903; 9779048903; 89670123903; 78890397703; 97890603; 87890803;
8789030603; 69990603; 88880763703; 9769000603; 96990203903;
978900405903; 7869022803; 699905422903; 97890703; 87990903; 878908703;
7998093903; 898702507603; 97780637603; 966907903; 896702603;
9769004803; 7869007903; 99660158803; 7899099603; 8977055803; 99660603;
7889080903; 66660981603; 997604603; 6969089803; 899701903; 9769072703;
666603903; 99860803; 997608803; 69790903; 88680756703; 979805677903;
9986047703; 89970803; 66660603; 96690903; 8997051603; 789901209803;
8977098903; 968900326803; 87790703; 98770024803; 697901794603;
69990803; 887805925803; 968908903; 97880603; 897709148703;
877909476903; 66760197703; 977908603; 698902703; 988706504803;
977802026603; 88680964703; 8878068703; 987705107903; 978902878703;
8898069803; 9768031703; 79680803; 79980803; 669609328703; 89870238703;
99960593903; 969904218703; 78890603; 9788000703; 69690630903;
889800982903; 988709748803; 7968052803; 99960007803; 969900800803;
668604817603; 66960903; 78790734603; 8868007703; 79780034903;
8878085903; 976907603; 89670830803; 877900903; 969904889703;
7978033903; 8987043903; 99860703; 979805903; 667603803; 976805348603;
999604127603; 97790701603; 78990342903; 98770672903; 87990253903;
9877027703; 97790803; 877901895603; 8789076903; 896708595603;
997601903; 799806903; 97690603; 87790371703; 667605603; 99760303703;
97680283803; 788902750803; 787909803; 79780603; 79880866903;
9986050903; 87890543903; 979800803; 97690179703; 876901603; 699909903;
96990192603; 878904903; 877904734903; 796801446903; 977904803;
9887044803; 797805565603; 98870789703; 7869093903; 87790727703;
797801232803; 666604803; 9778071903; 9799086703; 6969000903; 89670903;
8799075903; 897708903; 88680903; 97980362603; 97980503903;
889803256703; 88980388703; 789909376803; 69690703; 6969025903;
89970309903; 96690703; 877901847803; 968901903; 96690603; 88680607603;
7889001703; 789904761803; 976807703; 976902903; 878907889703;
9897014903; 896707046603; 696909903; 666603998903; 969902703;
79680421803; 9769075603; 798800192703; 97990903; 9689024903;
668604803; 969908671903; 9996094703; 69990642703; 97890895903;
977805619903; 79980859903; 88980443803; 98970649603; 997602703;
888802169903; 699907803; 667602028803; 786903283903; 997607703;
969909803; 798809925903; 9976045603; 97790903; 9789001903; 966903603;
9789069603; 968906603; 6989091803; 896701603; 6979059803; 978803903;
997606362603; 88980803; 98970803; 88880921703; 8997065703; 899700703;
698908703; 797801027903; 7889050903; 87890603; 78690703; 99660069703;
97980309903; 976800603; 666606803; 898707703; 79880019803;
66960250803; 7978049803; 88780602603; 79680903; 88880792703; 96990903;
667608603; 87790730903; 98970903; 9699032903; 8987004803; 88880703;
89770046603; 978800803; 969908903; 9798022603; 696901903; 799803703;
989703703; 668605903; 79780903; 998601371703; 796803339703;
87890922603; 898708903; 9966061903; 66960891903; 96790903; 8779050803;
98870858803; 976909298603; 9887029903; 669608703; 979806903;
878903803; 99960703; 9789086703; 979801803; 66960008703; 979806830803;
99760212703; 786906603; 797807603; 789907297703; 96990703; 786901603;
796807766603; 896702651603; 789902585603; 66660925903; 9986085703;
66960302703; 69890703; 789900703; 89970903; 9679060703; 9789002903;
979908821603; 986708140803; 976809828703; 7988082803; 79680997903;
99960803; 9788081903; 979805703; 787908603; 66960602803; 9887098703;
978803237703; 888806804603; 999604703; 977904703; 966904635703;
97680291703; 977809345603; 8878046703; 988709803; 976900773603;
989703903; 88780198603; 87790603; 986708703; 78890604703; 87790544803;
976809850903; 887806703; 987707527603; 79880803; 9897059603;
897709820603; 97880804803; 66960026703; 9789062803; 9867090803;
669600603; 8967087703; 78890903; 89770903; 97980703; 976802687603;
66860400803; 979901288603; 96990160903; 99860228903; 966900703;
66760603; 9689035703; 9779064703; 7968023603; 87890791903;
98770870603; 9798005803; 6969087903; 9779097903; 6979065703;
699903252603; 79780989703; 87690901803; 978905763903; 977809703;
97790369703; 899703269603; 8878012703; 78790803; 87690395603;
8888042803; 667607689903; 8977041803; 6666085603; 6999080703;
69990797803; 88680721603; 99660519803; 889807603; 87890146703;
699906325903; 89770603; 669608615903; 9779028803; 88880603; 97790703;
79780703; 97680355603; 6696024803; 78790784703; 97880329903;
9699077703; 89870803; 79680227903; 976905852703; 8997098903;
896704796703; 66860598803; 9897036703; 66960703; 9699094703;
9699008703; 97780485903; 999603179903; 89770834803; 96790445603;
79680460903; 9867009603; 89870328703; 799801035803; 989702903;
66960758903; 66860150803; 6686088603; 9877092803; 96990603; 99860603;
987703663603; 98870903; 699903325603; 87790803; 97680703; 8868030703;
9799030803; 89870703; 97680803; 9669054803; 6979097603; 987708046603;
999608603; 878904803; 998607408903; 968903903; 696900703;
977907491703; 6686033803; 669601803; 99960290603; 887809169903;
979803703; 69890903; 699901447903; 8987064903; 799800603; 98770903;
8997068703; 967903603; 66760146803; 978805087903; 697908138603;
799801603; 88780964903; 989708339903; 8967048603; 88880981603;
789909703; 796806603; 977905977603; 989700603; 97780703; 9669062603;
88980714603; 897709545903; 988701916703; 667604694903; 786905664603;
877900803; 886805490903; 89970559903; 99960531803; 7998033903;
98770803; 78890418703; 669600872803; 996605216603; 78690962703;
667604903; 996600903; 999608903; 9699083803; 787901803; 97780707603;
787905312703; 977805803; 8977033703; 97890708703; 989705521903;
978800703; 698905703; 78890376903; 878907703; 999602903; 986705903;
668602719603; 979901803; 997606903; 66760393903; 987703603;
78790338903; 96890803; 97680596803; 666601603; 977902178803;
877902803; 78790038603; 8868075703; 99960060603)
Since it's for database purposes and there is no practical limit, I'd go for a UserAgents table with UserAgentId as Int and UserAgentString as NVarChar(MAX), and use a foreign key on the original table.
How's this for big?
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; YPC
3.2.0; SearchSystem6829992239; SearchSystem9616306563; SearchSystem6017393645; SearchSystem5219240075;
SearchSystem2768350104; SearchSystem6919669052;
SearchSystem1986739074; SearchSystem1555480186;
SearchSystem3376893470; SearchSystem9530642569;
SearchSystem4877790286; SearchSystem8104932799;
SearchSystem2313134663; SearchSystem1545325372;
SearchSystem7742471461; SearchSystem9092363703;
SearchSystem6992236221; SearchSystem3507700306;
SearchSystem1129983453; SearchSystem1077927937;
SearchSystem2297142691; SearchSystem7813572891;
SearchSystem5668754497; SearchSystem6220295595;
SearchSystem4157940963; SearchSystem7656671655;
SearchSystem2865656762; SearchSystem6520604676;
SearchSystem4960161466; .NET CLR 1.1.4322; .NET CLR 2.0.50727; Hotbar
10.2.232.0; SearchSystem9616306563; SearchSystem6017393645; SearchSystem5219240075; SearchSystem2768350104;
SearchSystem6919669052; SearchSystem1986739074;
SearchSystem1555480186; SearchSystem3376893470;
SearchSystem9530642569; SearchSystem4877790286;
SearchSystem8104932799; SearchSystem2313134663;
SearchSystem1545325372; SearchSystem7742471461;
SearchSystem9092363703; SearchSystem6992236221;
SearchSystem3507700306; SearchSystem1129983453;
SearchSystem1077927937; SearchSystem2297142691;
SearchSystem7813572891; SearchSystem5668754497;
SearchSystem6220295595; SearchSystem4157940963;
SearchSystem7656671655; SearchSystem2865656762;
SearchSystem6520604676; SearchSystem4960161466; .NET CLR
3.0.4506.2152; .NET CLR 3.5.30729)
There is no stated limit, only the limit of most HTTP servers. Keeping that in mind however, I would implement a column with a reasonable fixed length (use Google to find a list of known user agents, find the largest and add 50%), and just crop any user agent that is too long - any exceptionally long user agent is probably unique enough even when cropped, or is the result of some kind of bug or "hack" attempt.
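As a rough sketch of the "find the largest and add 50%" approach, something like the following could work. The function names and the percentage parameter are made up for illustration, and the sample list stands in for whatever list of known user agents you collect.

```python
def column_length(known_user_agents, headroom_percent=50):
    """Longest known UA length plus some headroom (50% by default)."""
    longest = max(len(ua) for ua in known_user_agents)
    return longest + longest * headroom_percent // 100


def crop(ua, limit):
    """Crop a UA string to at most `limit` characters."""
    return ua[:limit]
```

For example, if the longest known string is 250 characters, `column_length` gives 375, and anything longer is simply cut at that point before it is stored.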
I got this user agent today, overflowing our vendor's storage field:
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6;
.NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; MDDR;
OfficeLiveConnector.1.3; OfficeLivePatch.0.0; .NET CLR 3.0.4506.2152;
.NET CLR 3.5.30729)
Ridiculous! 229 chars?
So take that size, double it, double it again, and you should be set until Microsoft's next blunder (maybe this time next year).
Go bigger than 1000!
Assume the user agent string has no limit on its length and prepare to store such a value. As you've seen, length is unpredictable.
In Postgres, there's a text type that accepts strings of unlimited length. Use that.
Most likely though, you'll have to start truncating at some point. Call it good at a reasonably useful increment (200, 1k, 4k) and throw away the rest.
Not an indication of how big a user agent can get, as there are plenty of answers showing the edge cases people have come across, but the longest that I could find on http://www.useragentstring.com/pages/useragentstring.php?name=All was 250 bytes.
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Media Center PC 5.0; SLCC1; OfficeLiveConnector.1.5; OfficeLivePatch.1.3; .NET4.0C; Lunascape 6.3.
I'll give you the standard answer:
Take the largest possible value you can possibly imagine it being, double it, and that's your answer.
Here is one that is 257 characters long:
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6;
.NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30;
InfoPath.2; .NET CLR 3.0.04506.648; OfficeLiveConnector.1.3;
OfficeLivePatch.0.0; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
There is no strict limit on the size of a User-Agent string defined by any official standards or specifications. The HTTP/1.1 RFC 2616, which defines the HTTP protocol, does not specify a maximum length for the User-Agent header field.
However, practical limitations may apply depending on the specific web server, browser, or application handling the User-Agent string. Some servers or proxies may impose their own limits on header field sizes to prevent abuse or denial-of-service attacks. Additionally, excessively long User-Agent strings could potentially cause issues with parsing or handling on the server side.
In practice, User-Agent strings can vary in length, but they are typically not excessively long. Browsers and user-agent strings aim to provide useful information about the client application without being overly verbose. Most User-Agent strings are well within a few hundred characters.
It's essential to design web applications and services to handle reasonably long User-Agent strings while also considering potential security and performance implications.
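One way an application might guard against excessively long User-Agent values is a small WSGI wrapper like the sketch below. The class name, the 4096-character default, and the choice of a 431 response are assumptions for illustration; in practice the web server's own header limits usually kick in before the application ever sees the request.

```python
MAX_UA_LENGTH = 4096  # assumed application-level limit


class UserAgentLimit:
    """WSGI middleware (hypothetical) rejecting overly long User-Agent headers."""

    def __init__(self, app, max_length=MAX_UA_LENGTH):
        self.app = app
        self.max_length = max_length

    def __call__(self, environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        ua = environ.get("HTTP_USER_AGENT", "")
        if len(ua) > self.max_length:
            body = b"User-Agent header too long\n"
            start_response(
                "431 Request Header Fields Too Large",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Within the limit: hand the request to the wrapped application.
        return self.app(environ, start_response)
```

Wrapping an existing WSGI application is then a one-liner such as `application = UserAgentLimit(application)`. Cropping the value instead of rejecting the request is an equally reasonable choice if you only want to protect your storage.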
I'm less concerned with server limits since I am on IIS; I know it won't ever be bigger than their limit, which is still pretty large if memory serves....
@Josh -- memory serves you well, on IIS it's 16K by default. ;-)
My database has 10,235 distinct user agent strings. I wanted to find the fastest hash algorithm that didn't produce any collisions. For my PHP environment I found md5 performed quickly at 2.3 seconds with no collisions. Interestingly I tried crc32 and crc32b and they also performed at 2.3 seconds with no collisions. But, because md5 has more combinations than crc32 and crc32b, md5 would likely have fewer possible collisions. Anyway, md5 is my choice and I expect it will work fine.
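The comment above describes a PHP test; a comparable check could be sketched in Python like this. The helper names are invented for the example, and it only counts collisions among the strings you feed it, which says nothing about strings you have not seen yet.

```python
import hashlib
import zlib
from collections import defaultdict


def count_collisions(strings, digest):
    """Count distinct strings that share a digest with another distinct string."""
    buckets = defaultdict(set)
    for s in strings:
        buckets[digest(s.encode("utf-8"))].add(s)
    # Each bucket with n distinct strings contributes n - 1 collisions.
    return sum(len(group) - 1 for group in buckets.values())


def md5_digest(data):
    """128-bit MD5 digest as bytes."""
    return hashlib.md5(data).digest()


def crc32_digest(data):
    """32-bit CRC as an integer; a checksum, not a collision-resistant hash."""
    return zlib.crc32(data)
```

Calling `count_collisions(all_user_agents, md5_digest)` and `count_collisions(all_user_agents, crc32_digest)` on the same list gives the two numbers being compared. With only 32 bits, crc32 becomes increasingly likely to collide as the table grows, whereas a 128-bit digest leaves far more room.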
Why hash the User Agent? Is this for quick lookup or something?
@Boom Lookups and uniqueness, as DB unique keys can only be so long.
@noctufaber crc32 is not a hash; it does not attempt to be collision resistant.
Is there anyone that would like to comment on what on earth is going on with this user agent? lol I must add, I am curious how such a beast can form.
If anyone is curious; this one clocks in at 8010 chars. How could anyone on the browser team have thought that this was a good idea? It's as mad as a bag of cats!
Does truncating this user agent string at 256 or 512 get rid of any data that is useful at all?
I've made some observations, but not yet worked it out. There are 642 numbers. The first four digits are always 6, 7, 8, or 9. The fifth digit is always 0. The last three are always 603, 703, 803, or 903. Perhaps someone might recognise that pattern? (Half-life 3 confirmed?)
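The observation above can be written down as a regular expression and checked against tokens from the long user agent string. This is just one reading of the comment; the pattern and the function name are assumptions made for the example.

```python
import re

# Four digits from 6-9, then a 0, then any digits, then 603/703/803/903.
TOKEN_PATTERN = re.compile(r"[6-9]{4}0[0-9]*[6-9]03")


def matches_pattern(token):
    """True if the whole token fits the observed digit pattern."""
    return TOKEN_PATTERN.fullmatch(token.strip()) is not None
```

Tokens such as `66760635803` and `97880703` from the string fit this pattern, while ordinary entries like `SLCC1` or `runtime 11.00294` do not.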
Interesting. I have now added code to truncate the UA string to 255 chars for my db logs.
You would probably end up with user agents on a 1-to-a-handful relationship with your users. Most user agents get so tweaked by the items a user has installed, and in a particular order, that they are almost personally identifiable (one other answer has a good example of this happening). In fact, the EFF did a study (pdf) about it.
@patridge +1 for the link, very good study. It's a bit off topic because they look at several fingerprints and not only the user agent strings. In a real-world scenario, for a site that gets several million page views per month, you would end up with a few thousand user agent strings, so normalizing makes sense IMHO. With that said, I'm not very positive on storing user agent strings in the database :P
@patridge The link to the study is now broken: updated link
@patridge I agree that your idea sounds plausible, but my data disagrees with us both. I am working with exactly this kind of system right now, and I have around 70k unique UAs for 1.2m users. The reason I am on this page is that I chose 256 as the limit on my database field and have found that 50k out of the 70k were truncated, so I have lost some information. I'm going to increase it to 4k now. It will be interesting to know how many would have been unique if they had not been truncated.
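If you still have the untruncated strings somewhere (raw access logs, for instance), a quick way to see what a given column limit costs you is sketched below. The function name and the shape of the returned dictionary are made up for illustration.

```python
def truncation_report(user_agents, limit):
    """Summarise how a length limit affects a collection of UA strings."""
    distinct = set(user_agents)
    # Strings that would lose their tail at this limit.
    truncated = sum(1 for ua in distinct if len(ua) > limit)
    # Distinct values that remain once every string is cropped.
    distinct_after = len({ua[:limit] for ua in distinct})
    return {
        "distinct": len(distinct),
        "truncated": truncated,
        "merged": len(distinct) - distinct_after,
    }
```

The `merged` figure is the number of distinct user agents that become indistinguishable from another one after cropping, which is the information actually lost; `truncated` only tells you how many strings were cut.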
For those keeping score, that's 1546 characters, including the leading and trailing quotes.
I wonder what .NET CLR and Trident have to do with Mozilla.
heh so how large do you think it will be?
Twice whatever I think it is, of course. Though 256 seems like a nice round number to double.
I find it funny that whenever we don't know what a good length would be, we always end up with 256 or another power of 2.
Well, 512 sounds good. That gives me at least 10 years of .NET releases and other junk to accumulate, and by then I hope to be retired. Thanks again.
@Josh: "by then I hope to be retired"... where have I heard that before?! ;-)
I've seen up to 255 characters so far on a very, very low-traffic site. So not surprising. .NET 4.0 will probably add another 20 chars as well.