operating-system - The concept of "block size" in a cache

Reposted by: 行者123 | Updated: 2023-12-02 09:01:00

I've just started learning about direct-mapped and set-associative caches, and I have some very basic doubts. Here goes.

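Suppose addresses are 32 bits long, and I have a 32 KB cache with a 64-byte block size and 512 frames. How much data is actually stored inside a "block"? If I have an instruction that loads a value from a memory location, and that value is a 16-bit integer, then one of the 64-byte blocks now stores only that 16-bit (2-byte) integer value. What about the other 62 bytes in that block? If I now have another load instruction that also loads a 16-bit integer value, that value goes into another block of another frame, depending on the load address (if the address maps to the same frame as the previous instruction, the previous value is evicted), and that block again stores only 2 of its 64 bytes. Correct?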

Forgive me if this seems like a very silly doubt; I just want to make sure I understand the concepts correctly.

Best answer

I originally typed this up in an email explaining caches to someone, but I think you may find it useful too.

You have 32-bit addresses that can refer to bytes in RAM. You want to be able to cache the data you access, so you can reuse it later.

Let's say you want a 1-MiB (2²⁰ bytes) cache.

What do you do?

You have 2 restrictions you need to meet:

Caching should be as uniform as possible across all addresses; that is, you don't want to bias toward any particular kind of address.

How do you do this? Use the remainder! Taking integers mod K spreads consecutive values evenly over the range 0 to K - 1, whatever K you pick.
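
A minimal sketch of the remainder idea, assuming a hypothetical toy table of 8 rows (the row count is illustrative only): mapping consecutive block numbers with `% ROWS` hands out rows round-robin, so every row receives the same share.

```python
from collections import Counter

ROWS = 8  # hypothetical toy row count, chosen only for illustration


def row_of(block_number: int) -> int:
    # The remainder always lands in 0..ROWS-1 and cycles through every row in turn.
    return block_number % ROWS


# Map 800 consecutive block numbers and count how many land in each row.
counts = Counter(row_of(b) for b in range(800))
print(sorted(counts.items()))  # every row receives exactly 100 blocks
```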

You want to help minimize bookkeeping costs. That means, e.g., if you're caching in blocks of 1 byte, you don't want to store 4 bytes of data just to keep track of where that 1 byte belongs.

How do you do that? You store blocks that are bigger than just 1 byte.

Let's say you choose 16-byte (2⁴-byte) blocks. That means you can cache 2²⁰ / 2⁴ = 2¹⁶ = 65,536 blocks of data.
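
The arithmetic above, as a small sketch: each row holds one block and needs its own bookkeeping, so going from 1-byte to 16-byte blocks cuts the number of rows (and therefore the number of metadata entries) by a factor of 16. The helper name `rows_needed` is hypothetical.

```python
CACHE_BYTES = 2 ** 20  # 1 MiB cache, as in the text


def rows_needed(block_bytes: int) -> int:
    # One row per block, so the row count is cache size / block size.
    return CACHE_BYTES // block_bytes


for block_bytes in (1, 16):
    print(f"{block_bytes:>2}-byte blocks -> {rows_needed(block_bytes):,} rows")
```

With 16-byte blocks this gives the 65,536 rows mentioned in the text.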

You now have a few options:

You can design the cache so that data from any memory block could be stored in any of the cache blocks. This would be called a fully-associative cache.

The benefit is that it's the "fairest" kind of cache: all blocks are treated completely equally.

The tradeoff is speed: to find out whether a memory block is cached, or to find a free slot for it, you have to check every cache block. That is really slow (or, if done in parallel in hardware, costly).

You can design the cache so that data from any memory block could only be stored in a single cache block. This would be called a direct-mapped cache.

The benefit is that it's the fastest kind of cache: you do only 1 check to see if the item is in the cache or not.

The tradeoff is that, now, if you happen to have a bad memory access pattern, you can have 2 blocks kicking each other out successively, with unused blocks still remaining in the cache.

You can do a mixture of both: map each memory block to a small set of N cache blocks, any one of which may hold it. This is what most real processors do: they have N-way set-associative caches.
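
The three options differ only in how many places a block may go, so one toy model can cover all of them. This is a minimal tag-only sketch under stated assumptions: the class name is hypothetical, replacement within a set is LRU (real processors use various policies), and the Python dict lookup does not model the cost of searching many blocks in hardware. With `ways=1` it behaves as a direct-mapped cache, and with `num_sets=1` as a fully-associative one.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy tag-only cache: num_sets sets, each holding up to `ways` blocks (LRU)."""

    def __init__(self, num_sets: int, ways: int, block_bytes: int):
        self.num_sets = num_sets
        self.ways = ways
        self.block_bytes = block_bytes
        # One OrderedDict per set: tag -> None, ordered least recently used first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, load the block (evicting if needed)."""
        block_number = address // self.block_bytes
        set_index = block_number % self.num_sets
        tag = block_number // self.num_sets
        entries = self.sets[set_index]
        if tag in entries:
            entries.move_to_end(tag)  # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used block
        entries[tag] = None
        return False


def misses(cache: SetAssociativeCache, addresses) -> int:
    return sum(not cache.access(a) for a in addresses)


# Two addresses exactly 1 MiB apart, accessed alternately 10 times each.
pattern = [0, 2 ** 20] * 10

# The same 1 MiB of capacity (65,536 blocks of 16 bytes), organized three ways.
direct = SetAssociativeCache(num_sets=65536, ways=1, block_bytes=16)
two_way = SetAssociativeCache(num_sets=32768, ways=2, block_bytes=16)
fully = SetAssociativeCache(num_sets=1, ways=65536, block_bytes=16)
print(misses(direct, pattern), misses(two_way, pattern), misses(fully, pattern))
```

In the direct-mapped case both addresses land in the same row and keep kicking each other out, so every access misses; with even two ways per set, both blocks fit and only the first two accesses miss.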

Direct-mapped cache:

Now you have 65,536 blocks of data, each 16 bytes long.
You store them as 65,536 "rows" inside your cache, with each "row" consisting of the data itself along with its metadata (where the block belongs, whether it's valid, whether it's been written to, etc.).
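
One way to picture a row, as a sketch only: the field names below are hypothetical, and a hardware cache stores these as bit fields rather than objects. `dirty` stands for "whether it's been written to".

```python
from dataclasses import dataclass, field

BLOCK_BYTES = 16  # block size from the text


@dataclass
class CacheRow:
    """One "row" of a direct-mapped cache: the block's data plus its metadata."""
    valid: bool = False  # does this row hold real data yet?
    dirty: bool = False  # has the block been written to since it was loaded?
    tag: int = 0         # which part of memory the block came from
    data: bytearray = field(default_factory=lambda: bytearray(BLOCK_BYTES))


rows = [CacheRow() for _ in range(65536)]  # 65,536 rows of 16 data bytes each
```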

Question: How does each block in memory get mapped to each block in the cache?

Answer: Well, you're using a direct-mapped cache, using mod. That means addresses 0 to 15 are mapped to block 0 in the cache, 16-31 to block 1, and so on, wrapping around when you reach the 1-MiB mark.

So, given memory address M, how do you find the row number N? Easy: N = (M mod 2²⁰) / 2⁴, using integer division.
But that only tells you where to store the data, not how to retrieve it. Once you've stored it and try to access it again, you have to know which 1-MiB portion of memory was stored here, right?
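
The row formula as a small sketch (the function name `row_number` is hypothetical): take the remainder first to get the position within the current 1-MiB window, then integer-divide by the block size.

```python
CACHE_BYTES = 2 ** 20  # 1 MiB cache
BLOCK_BYTES = 2 ** 4   # 16-byte blocks


def row_number(address: int) -> int:
    # Position within the current 1-MiB window, then which 16-byte block it is.
    return (address % CACHE_BYTES) // BLOCK_BYTES


for address in (0, 15, 16, 31, 32, 2 ** 20 - 1, 2 ** 20, 2 ** 20 + 16):
    print(f"address {address:>8} -> row {row_number(address)}")
```

Addresses 0-15 give row 0, 16-31 give row 1, and address 2²⁰ wraps back to row 0.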

So that's one piece of metadata: the tag bits. If the block is in row N, all you additionally need to know is the quotient from the mod operation. For a 32-bit address, that quotient is 12 bits wide (since the remainder takes 20 bits).
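
A sketch of why the quotient is enough (the example addresses are arbitrary): two addresses a multiple of 1 MiB apart share a row, and only the stored quotient tells them apart.

```python
CACHE_BYTES = 2 ** 20  # 1 MiB cache
BLOCK_BYTES = 2 ** 4   # 16-byte blocks


def row_and_tag(address: int) -> tuple[int, int]:
    # Quotient = tag (which 1-MiB portion); remainder locates the row.
    tag, remainder = divmod(address, CACHE_BYTES)
    return remainder // BLOCK_BYTES, tag


a = 0x0001_2340
b = a + 3 * CACHE_BYTES  # 3 MiB further on: same row, different 1-MiB portion
print(row_and_tag(a), row_and_tag(b))

# The largest possible quotient for a 32-bit address fits in 12 bits.
print(((2 ** 32 - 1) // CACHE_BYTES).bit_length())
```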

So your tag becomes 12 bits long -- specifically, the topmost 12 bits of any memory address.
And you already knew that the lowermost 4 bits are used for the offset within a block (since memory is byte-addressed, and a block is 16 bytes).
That leaves 16 bits for the "index" bits of a memory address, which can be used to find which row the address belongs to. (It's just a division + remainder operation, but in binary.)
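
In binary, the division and remainder become shifts and masks. A minimal sketch, with hypothetical helper names `split` and `join`, assuming the 12/16/4 split above:

```python
OFFSET_BITS = 4   # 16-byte blocks
INDEX_BITS = 16   # 65,536 rows
TAG_BITS = 12     # the rest of a 32-bit address


def split(address: int) -> tuple[int, int, int]:
    """Split a 32-bit address into (tag, index, offset) using shifts and masks."""
    if not 0 <= address < 1 << (TAG_BITS + INDEX_BITS + OFFSET_BITS):
        raise ValueError("not a 32-bit address")
    offset = address & ((1 << OFFSET_BITS) - 1)                  # lowest 4 bits
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)   # next 16 bits
    tag = address >> (OFFSET_BITS + INDEX_BITS)                  # top 12 bits
    return tag, index, offset


def join(tag: int, index: int, offset: int) -> int:
    """Reassemble the full address from the stored tag, the row, and the offset."""
    return (tag << (OFFSET_BITS + INDEX_BITS)) | (index << OFFSET_BITS) | offset


address = 0xDEADBEEF
print([hex(x) for x in split(address)])
```

The tag is the quotient `address // 2**20` and the index is `(address % 2**20) // 16`, so this is the same mod-based mapping, just computed with bit operations.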

You also need other bits: e.g., you need to know whether a block actually holds valid data, because when the CPU is powered on, the cache contains garbage. So you add 1 bit of metadata: the valid bit.
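
A minimal lookup sketch showing what the valid bit prevents (the arrays and the `lookup` name are hypothetical, and the actual fetch from memory is left out): every tag starts at 0, so without checking `valid`, a first access to address 0 would be a false hit.

```python
CACHE_BYTES = 2 ** 20
BLOCK_BYTES = 2 ** 4
NUM_ROWS = CACHE_BYTES // BLOCK_BYTES

# Metadata only; at power-on every valid bit is 0 and the tags are meaningless.
valid = [False] * NUM_ROWS
tags = [0] * NUM_ROWS


def lookup(address: int) -> bool:
    """Return True on a hit; on a miss, record the block as now cached."""
    row = (address % CACHE_BYTES) // BLOCK_BYTES
    tag = address // CACHE_BYTES
    if valid[row] and tags[row] == tag:
        return True
    # Miss: a real cache would fetch the 16-byte block from memory here.
    valid[row] = True
    tags[row] = tag
    return False


print(lookup(0), lookup(0), lookup(4))  # first access misses; same block then hits
```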

There are other bits you'll learn about, used for optimization, synchronization, etc., but these are the basic ones. :)
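
Applying the same scheme to the numbers in the question (32-bit addresses, a 32 KB cache, 64-byte blocks, 512 frames), and assuming a direct-mapped organization, gives 6 offset bits, 9 index bits and 17 tag bits. On a miss the cache loads the whole 64-byte block containing the address, so the other 62 bytes hold the neighboring bytes of memory. The addresses and helper names below are hypothetical.

```python
CACHE_BYTES = 32 * 1024                  # 32 KB cache, from the question
BLOCK_BYTES = 64                         # 64-byte blocks
NUM_FRAMES = CACHE_BYTES // BLOCK_BYTES  # 512 frames

OFFSET_BITS = BLOCK_BYTES.bit_length() - 1   # log2(64) = 6
INDEX_BITS = NUM_FRAMES.bit_length() - 1     # log2(512) = 9
TAG_BITS = 32 - INDEX_BITS - OFFSET_BITS     # 17


def block_range(address: int) -> range:
    """All byte addresses brought into the cache when `address` misses."""
    start = address - address % BLOCK_BYTES  # round down to the block boundary
    return range(start, start + BLOCK_BYTES)


def frame_of(address: int) -> int:
    return (address // BLOCK_BYTES) % NUM_FRAMES


first_load = 0x1000_0104      # hypothetical address of a 16-bit integer
second_load = first_load + 2  # the next 16-bit integer in memory
print(NUM_FRAMES, OFFSET_BITS, INDEX_BITS, TAG_BITS)
print(second_load in block_range(first_load), frame_of(first_load) == frame_of(second_load))
```

The second 16-bit value already sits in the block fetched for the first one, so that load is a hit rather than a new 2-byte entry.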

A similar question on this topic (operating-system - The concept of "block size" in a cache) can be found on Stack Overflow: https://stackoverflow.com/questions/8107965/
