
c - Optimizing linear access to arrays with prefetching and caching in C

Reposted. Author: IT王子. Updated: 2023-10-28 23:30:04

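Disclosure: I asked a similar question on programmers.stack, but that place is nowhere near as active as Stack Overflow.
Introduction
I like working with large images, lots of them. They also come in multiple sequences that have to be processed and played back over and over. Sometimes I use the GPU, sometimes the CPU, sometimes both. Most access patterns are linear in nature (back and forth), which got me thinking about more basic things regarding arrays, and about how one should write code that makes the most of the memory bandwidth available on given hardware (so that computation does not block reads and writes).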
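Test specs
I ran this on a 2011 MacBookAir4,2 (i5-2557M) with 4 GB of RAM and an SSD. Nothing but iTerm2 was running during the tests.
GCC 5.2.0 (Homebrew) with the flags -pedantic -std=c99 -Wall -Werror -Wextra -Wno-unused -O0, plus the include, library and framework flags needed for the GLFW timer I tend to use. I could have done without it; it doesn't matter. 64-bit, of course.
I also ran the tests with the optional -fprefetch-loop-arrays flag, but it didn't seem to affect the results at all.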
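GLFW is only used here for its timer. A minimal sketch of a replacement, assuming a POSIX system that provides clock_gettime with CLOCK_MONOTONIC (Linux, or macOS 10.12 and later); the name now_seconds is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime under -std=c99 */
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for glfwGetTime(): seconds from a monotonic clock,
   which is not affected by wall-clock adjustments.
   Assumes POSIX clock_gettime(CLOCK_MONOTONIC) is available. */
static double now_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; /* avoid an unused-variable warning when NDEBUG is defined */
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
```

Each glfwGetTime() call in the benchmark could then become now_seconds(), and the GLFW include, library and framework flags could be dropped.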
试验
Allocate two arrays of n bytes on the heap, where n is 8, 16, 32, 64, 128, 256, 512 and 1024 MB
Initialize array with 0xff, one uint64_t element at a time
Test 1 - linear copy
Linear copy:

for(uint64_t i = 0; i < ARRAY_NUM; ++i) {
    array_copy[i] = array[i];
}

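Test 2 - copying with a stride. This is where it gets confusing. I tried playing with prefetching here. I tried various combinations of how much each loop iteration should do, and about 40 per iteration seems to give the best performance. Why? I don't know. I know that malloc of uint64_t in C99 gives me memory-aligned blocks. I also looked up the sizes of my L1 to L3 caches, and the arrays are larger than all of them, so what am I hitting? The clue may be in the graphs below. I'd love to understand this.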
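malloc only guarantees alignment suitable for any fundamental type (typically 16 bytes on x86-64), not alignment to a 64-byte cache line. If you want to rule cache-line alignment in or out as a factor, one option is POSIX posix_memalign. A sketch, where the helper name alloc_u64_cacheline and the 64-byte line size are assumptions:

```c
#define _POSIX_C_SOURCE 200809L /* expose posix_memalign under -std=c99 */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines, as on Sandy Bridge and most x86 cores. */
#define CACHE_LINE 64

/* Hypothetical helper: allocate `count` uint64_t elements starting on a
   cache-line boundary. Returns NULL on failure; release with free(). */
static uint64_t *alloc_u64_cacheline(size_t count)
{
    void *p = NULL;
    if (count > SIZE_MAX / sizeof(uint64_t))
        return NULL; /* size would overflow */
    if (posix_memalign(&p, CACHE_LINE, count * sizeof(uint64_t)) != 0)
        return NULL;
    assert((uintptr_t)p % CACHE_LINE == 0);
    return p;
}
```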
Strided copy:
for(uint64_t i = 0; i < ARRAY_NUM; i=i+40) {
    array_copy[i] = array[i];
    array_copy[i+1] = array[i+1];
    array_copy[i+2] = array[i+2];
    array_copy[i+3] = array[i+3];
    array_copy[i+4] = array[i+4];
    array_copy[i+5] = array[i+5];
    array_copy[i+6] = array[i+6];
    array_copy[i+7] = array[i+7];
    array_copy[i+8] = array[i+8];
    array_copy[i+9] = array[i+9];
    array_copy[i+10] = array[i+10];
    array_copy[i+11] = array[i+11];
    array_copy[i+12] = array[i+12];
    array_copy[i+13] = array[i+13];
    array_copy[i+14] = array[i+14];
    array_copy[i+15] = array[i+15];
    array_copy[i+16] = array[i+16];
    array_copy[i+17] = array[i+17];
    array_copy[i+18] = array[i+18];
    array_copy[i+19] = array[i+19];
    array_copy[i+20] = array[i+20];
    array_copy[i+21] = array[i+21];
    array_copy[i+22] = array[i+22];
    array_copy[i+23] = array[i+23];
    array_copy[i+24] = array[i+24];
    array_copy[i+25] = array[i+25];
    array_copy[i+26] = array[i+26];
    array_copy[i+27] = array[i+27];
    array_copy[i+28] = array[i+28];
    array_copy[i+29] = array[i+29];
    array_copy[i+30] = array[i+30];
    array_copy[i+31] = array[i+31];
    array_copy[i+32] = array[i+32];
    array_copy[i+33] = array[i+33];
    array_copy[i+34] = array[i+34];
    array_copy[i+35] = array[i+35];
    array_copy[i+36] = array[i+36];
    array_copy[i+37] = array[i+37];
    array_copy[i+38] = array[i+38];
    array_copy[i+39] = array[i+39];
}
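The "stride 40" loops are really manual loop unrolling: every element is still copied, 40 consecutive uint64_t values (320 bytes, five 64-byte cache lines) per iteration. They also rely on ARRAY_NUM being a multiple of 40 (true for every size tested here); otherwise they run past the end of both arrays. A sketch of the same idea with a remainder loop, assuming an unroll factor of 8 (one cache line of uint64_t); copy_unrolled and UNROLL are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 8 x uint64_t = 64 bytes = one cache line per iteration. */
#define UNROLL 8

/* Hypothetical sketch: copy n elements, UNROLL at a time, then finish the
   remaining n % UNROLL elements one by one so nothing runs out of bounds. */
static void copy_unrolled(uint64_t *restrict dst, const uint64_t *restrict src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    size_t i = 0;
    for (; i + UNROLL <= n; i += UNROLL) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
        dst[i + 4] = src[i + 4];
        dst[i + 5] = src[i + 5];
        dst[i + 6] = src[i + 6];
        dst[i + 7] = src[i + 7];
    }
    for (; i < n; ++i) /* tail: at most UNROLL - 1 elements */
        dst[i] = src[i];
}
```

All measurements in the question were taken at -O0, where the loop counter lives in memory and every iteration pays for it; that is one plausible reason manual unrolling helps so much here. With optimization enabled, GCC and Clang can often unroll and vectorize a plain copy loop on their own.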

Test 3 - reading with a stride. Same as copying with a stride.
Strided read:
const int imax = 1000;
for(int j = 0; j < imax; ++j) {
    uint64_t tmp = 0;
    performance = 0;
    time_start = glfwGetTime();
    for(uint64_t i = 0; i < ARRAY_NUM; i=i+40) {
        tmp = array[i];
        tmp = array[i+1];
        tmp = array[i+2];
        tmp = array[i+3];
        tmp = array[i+4];
        tmp = array[i+5];
        tmp = array[i+6];
        tmp = array[i+7];
        tmp = array[i+8];
        tmp = array[i+9];
        tmp = array[i+10];
        tmp = array[i+11];
        tmp = array[i+12];
        tmp = array[i+13];
        tmp = array[i+14];
        tmp = array[i+15];
        tmp = array[i+16];
        tmp = array[i+17];
        tmp = array[i+18];
        tmp = array[i+19];
        tmp = array[i+20];
        tmp = array[i+21];
        tmp = array[i+22];
        tmp = array[i+23];
        tmp = array[i+24];
        tmp = array[i+25];
        tmp = array[i+26];
        tmp = array[i+27];
        tmp = array[i+28];
        tmp = array[i+29];
        tmp = array[i+30];
        tmp = array[i+31];
        tmp = array[i+32];
        tmp = array[i+33];
        tmp = array[i+34];
        tmp = array[i+35];
        tmp = array[i+36];
        tmp = array[i+37];
        tmp = array[i+38];
        tmp = array[i+39];
    }
    /* timing and statistics omitted here; see the full source below */
}
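In this read loop tmp is never used, so at -O1 or higher the compiler is free to delete the whole loop, and the timing would then measure nothing. At -O0 the loads stay, but so does a store to tmp for every element. A sketch of a read kernel that keeps every load meaningful by returning a sum; sum_u64 is a hypothetical name, and the four independent accumulators (to shorten the add dependency chain) are a design choice that the original does not make:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read kernel: fold every element into a sum that is returned,
   so the compiler cannot discard the loads even with optimization enabled. */
static uint64_t sum_u64(const uint64_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0; /* independent partial sums */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i) /* remaining 0..3 elements */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

The caller should print or otherwise use the returned value (for example, add it to a checksum that is printed at the end); otherwise the compiler may still drop the call.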

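Test 4 - linear reading, one element at a time. I was surprised to see no gain here; I thought prefetching was meant for exactly this case.
Linear read: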
for(uint64_t i = 0; i < ARRAY_NUM; ++i) {
    tmp = array[i];
}
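For a purely sequential scan like this one, the hardware prefetcher usually streams cache lines in on its own, which is consistent with -fprefetch-loop-arrays making no difference. To experiment with explicit software prefetching anyway, GCC and Clang provide __builtin_prefetch(addr, rw, locality). A sketch, assuming a prefetch distance of 64 elements (512 bytes); that distance is a guess that has to be tuned per machine, and sum_u64_prefetch and PREFETCH_AHEAD are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: prefetch 64 elements (64 * 8 = 512 bytes) ahead. */
#define PREFETCH_AHEAD 64

/* Hypothetical sketch: sum an array while explicitly prefetching ahead.
   Each outer step covers 8 elements, one 64-byte line if `a` is line-aligned. */
static uint64_t sum_u64_prefetch(const uint64_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    uint64_t s = 0;
    for (size_t i = 0; i < n; i += 8) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 0); /* 0 = read, 0 = no temporal locality */
        size_t end = (i + 8 < n) ? i + 8 : n;
        for (size_t j = i; j < end; ++j)
            s += a[j];
    }
    return s;
}
```

Comparing this against the same loop without the __builtin_prefetch line, with optimization enabled, is a fair way to check whether software prefetching buys anything on a given machine.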

Test 5 - memcpy, for comparison.
memcpy:
memcpy(array_copy, array, ARRAY_NUM*sizeof(uint64_t));

Results
Sample output:
Init done in 0.767 s - size of array: 1024 MBs (x2)
Performance: 1304.325 MB/s

Copying (linear) done in 0.898 s
Performance: 1113.529 MB/s

Copying (stride 40) done in 0.257 s
Performance: 3890.608 MB/s

[1000/1000] Performance stride 40: 7474.322 MB/s
Average: 7523.427 MB/s
Performance MIN: 3231 MB/s | Performance MAX: 7818 MB/s

[1000/1000] Performance dumb: 2504.713 MB/s
Average: 2481.502 MB/s
Performance MIN: 1572 MB/s | Performance MAX: 2644 MB/s

Copying (memcpy) done in 1.726 s
Performance: 579.485 MB/s

--

Init done in 0.415 s - size of array: 512 MBs (x2)
Performance: 1233.136 MB/s

Copying (linear) done in 0.442 s
Performance: 1157.147 MB/s

Copying (stride 40) done in 0.116 s
Performance: 4399.606 MB/s

[1000/1000] Performance stride 40: 6527.004 MB/s
Average: 7166.458 MB/s
Performance MIN: 4359 MB/s | Performance MAX: 7787 MB/s

[1000/1000] Performance dumb: 2383.292 MB/s
Average: 2409.005 MB/s
Performance MIN: 1673 MB/s | Performance MAX: 2641 MB/s

Copying (memcpy) done in 0.102 s
Performance: 5026.476 MB/s

--

Init done in 0.228 s - size of array: 256 MBs (x2)
Performance: 1124.618 MB/s

Copying (linear) done in 0.242 s
Performance: 1057.916 MB/s

Copying (stride 40) done in 0.070 s
Performance: 3650.996 MB/s

[1000/1000] Performance stride 40: 7129.206 MB/s
Average: 7370.537 MB/s
Performance MIN: 4805 MB/s | Performance MAX: 7848 MB/s

[1000/1000] Performance dumb: 2456.129 MB/s
Average: 2435.556 MB/s
Performance MIN: 1496 MB/s | Performance MAX: 2637 MB/s

Copying (memcpy) done in 0.050 s
Performance: 5095.845 MB/s

--

Init done in 0.100 s - size of array: 128 MBs (x2)
Performance: 1277.200 MB/s

Copying (linear) done in 0.112 s
Performance: 1147.030 MB/s

Copying (stride 40) done in 0.029 s
Performance: 4424.513 MB/s

[1000/1000] Performance stride 40: 6497.635 MB/s
Average: 6714.540 MB/s
Performance MIN: 4206 MB/s | Performance MAX: 7843 MB/s

[1000/1000] Performance dumb: 2275.336 MB/s
Average: 2335.544 MB/s
Performance MIN: 1572 MB/s | Performance MAX: 2626 MB/s

Copying (memcpy) done in 0.025 s
Performance: 5086.502 MB/s

--

Init done in 0.051 s - size of array: 64 MBs (x2)
Performance: 1255.969 MB/s

Copying (linear) done in 0.058 s
Performance: 1104.282 MB/s

Copying (stride 40) done in 0.015 s
Performance: 4305.765 MB/s

[1000/1000] Performance stride 40: 7750.063 MB/s
Average: 7412.167 MB/s
Performance MIN: 3892 MB/s | Performance MAX: 7826 MB/s

[1000/1000] Performance dumb: 2610.136 MB/s
Average: 2577.313 MB/s
Performance MIN: 2126 MB/s | Performance MAX: 2652 MB/s

Copying (memcpy) done in 0.013 s
Performance: 4871.823 MB/s

--

Init done in 0.024 s - size of array: 32 MBs (x2)
Performance: 1306.738 MB/s

Copying (linear) done in 0.028 s
Performance: 1148.582 MB/s

Copying (stride 40) done in 0.008 s
Performance: 4265.907 MB/s

[1000/1000] Performance stride 40: 6181.040 MB/s
Average: 7124.592 MB/s
Performance MIN: 3480 MB/s | Performance MAX: 7777 MB/s

[1000/1000] Performance dumb: 2508.669 MB/s
Average: 2556.529 MB/s
Performance MIN: 1966 MB/s | Performance MAX: 2646 MB/s

Copying (memcpy) done in 0.007 s
Performance: 4617.860 MB/s

--

Init done in 0.013 s - size of array: 16 MBs (x2)
Performance: 1243.011 MB/s

Copying (linear) done in 0.014 s
Performance: 1139.362 MB/s

Copying (stride 40) done in 0.004 s
Performance: 4181.548 MB/s

[1000/1000] Performance stride 40: 6317.129 MB/s
Average: 7358.539 MB/s
Performance MIN: 5250 MB/s | Performance MAX: 7816 MB/s

[1000/1000] Performance dumb: 2529.707 MB/s
Average: 2525.783 MB/s
Performance MIN: 1823 MB/s | Performance MAX: 2634 MB/s

Copying (memcpy) done in 0.003 s
Performance: 5167.561 MB/s

--

Init done in 0.007 s - size of array: 8 MBs (x2)
Performance: 1186.019 MB/s

Copying (linear) done in 0.007 s
Performance: 1147.018 MB/s

Copying (stride 40) done in 0.002 s
Performance: 4157.658 MB/s

[1000/1000] Performance stride 40: 6958.839 MB/s
Average: 7097.742 MB/s
Performance MIN: 4278 MB/s | Performance MAX: 7499 MB/s

[1000/1000] Performance dumb: 2585.366 MB/s
Average: 2537.896 MB/s
Performance MIN: 2284 MB/s | Performance MAX: 2610 MB/s

Copying (memcpy) done in 0.002 s
Performance: 5059.164 MB/s

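Linear reading is 3x slower than strided reading. Strided reading tops out at about 7500-7800 MB/s. But two things confuse me. With DDR3 1333 MHz, the maximum memory throughput should be 10,664 MB/s, so why am I not reaching it? And why is the read speed inconsistent, and how can I optimize that (cache misses?)? This is more obvious in the graphs, especially for linear reading, which shows regular drops in performance.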

[Graphs: 8-16 MB, 32-64 MB, 128-256 MB, 512-1024 MB, and all sizes together]
Here is the full source for anyone interested:
/*
gcc -pedantic -std=c99 -Wall -Werror -Wextra -Wno-unused -O0 -I "...path to glfw3 includes ..." -L "...path to glfw3 lib ..." arr_test_copy_gnuplot.c -o arr_test_copy_gnuplot -lglfw3 -framework OpenGL -framework Cocoa -framework IOKit -framework CoreVideo

optional: -fprefetch-loop-arrays
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* memcpy */
#include <inttypes.h>
#include <GLFW/glfw3.h>

/* 128M uint64_t elements = 1024 MB per array. Parenthesized so the macro
   expands safely inside expressions; keep it a multiple of 40, because the
   unrolled loops below process 40 elements per iteration. */
#define ARRAY_NUM (1000000ULL * 128)

int main(int argc, char *argv[]) {

    if(!glfwInit()) {
        exit(EXIT_FAILURE);
    }

    /* Size of one array in MB (1 MB = 1e6 bytes), used for all throughput figures. */
    const size_t mb = (size_t)(ARRAY_NUM * sizeof(uint64_t) / 1000000);

    int cx = 0;
    char filename_stride[50];
    char filename_dumb[50];
    cx = snprintf(filename_stride, sizeof filename_stride, "%zu_stride.dat", mb);
    if(cx < 0 || cx >= (int)sizeof filename_stride) { exit(EXIT_FAILURE); } /* error or truncation */
    FILE *file_stride = fopen(filename_stride, "w");
    cx = snprintf(filename_dumb, sizeof filename_dumb, "%zu_dumb.dat", mb);
    if(cx < 0 || cx >= (int)sizeof filename_dumb) { exit(EXIT_FAILURE); }
    FILE *file_dumb = fopen(filename_dumb, "w");
    if(file_stride == NULL || file_dumb == NULL) {
        perror("Error opening file.");
        exit(EXIT_FAILURE);
    }

    uint64_t *array = malloc(sizeof(uint64_t) * ARRAY_NUM);
    uint64_t *array_copy = malloc(sizeof(uint64_t) * ARRAY_NUM);
    if(array == NULL || array_copy == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }

    double performance = 0.0;
    double time_start = 0.0;
    double time_end = 0.0;
    double performance_min = 0.0;
    double performance_max = 0.0;

    /* Init array */
    time_start = glfwGetTime();
    for(uint64_t i = 0; i < ARRAY_NUM; ++i) {
        array[i] = 0xff;
    }
    time_end = glfwGetTime();

    performance = mb / (time_end - time_start);
    printf("Init done in %.3f s - size of array: %zu MBs (x2)\n", (time_end - time_start), mb);
    printf("Performance: %.3f MB/s\n\n", performance);

    /* Linear copy */
    performance = 0;
    time_start = glfwGetTime();
    for(uint64_t i = 0; i < ARRAY_NUM; ++i) {
        array_copy[i] = array[i];
    }
    time_end = glfwGetTime();

    performance = mb / (time_end - time_start);
    printf("Copying (linear) done in %.3f s\n", (time_end - time_start));
    printf("Performance: %.3f MB/s\n\n", performance);

    /* Copying with wide stride (manually unrolled by 40) */
    performance = 0;
    time_start = glfwGetTime();
    for(uint64_t i = 0; i < ARRAY_NUM; i=i+40) {
        array_copy[i] = array[i];
        array_copy[i+1] = array[i+1];
        array_copy[i+2] = array[i+2];
        array_copy[i+3] = array[i+3];
        array_copy[i+4] = array[i+4];
        array_copy[i+5] = array[i+5];
        array_copy[i+6] = array[i+6];
        array_copy[i+7] = array[i+7];
        array_copy[i+8] = array[i+8];
        array_copy[i+9] = array[i+9];
        array_copy[i+10] = array[i+10];
        array_copy[i+11] = array[i+11];
        array_copy[i+12] = array[i+12];
        array_copy[i+13] = array[i+13];
        array_copy[i+14] = array[i+14];
        array_copy[i+15] = array[i+15];
        array_copy[i+16] = array[i+16];
        array_copy[i+17] = array[i+17];
        array_copy[i+18] = array[i+18];
        array_copy[i+19] = array[i+19];
        array_copy[i+20] = array[i+20];
        array_copy[i+21] = array[i+21];
        array_copy[i+22] = array[i+22];
        array_copy[i+23] = array[i+23];
        array_copy[i+24] = array[i+24];
        array_copy[i+25] = array[i+25];
        array_copy[i+26] = array[i+26];
        array_copy[i+27] = array[i+27];
        array_copy[i+28] = array[i+28];
        array_copy[i+29] = array[i+29];
        array_copy[i+30] = array[i+30];
        array_copy[i+31] = array[i+31];
        array_copy[i+32] = array[i+32];
        array_copy[i+33] = array[i+33];
        array_copy[i+34] = array[i+34];
        array_copy[i+35] = array[i+35];
        array_copy[i+36] = array[i+36];
        array_copy[i+37] = array[i+37];
        array_copy[i+38] = array[i+38];
        array_copy[i+39] = array[i+39];
    }
    time_end = glfwGetTime();

    performance = mb / (time_end - time_start);
    printf("Copying (stride 40) done in %.3f s\n", (time_end - time_start));
    printf("Performance: %.3f MB/s\n\n", performance);

    /* Reading with wide stride (manually unrolled by 40) */
    const int imax = 1000;
    double performance_average = 0.0;
    for(int j = 0; j < imax; ++j) {
        /* volatile keeps these otherwise unused loads from being optimized
           away at -O1 and above; at -O0 the generated loads are the same. */
        volatile uint64_t tmp = 0;
        performance = 0;
        time_start = glfwGetTime();
        for(uint64_t i = 0; i < ARRAY_NUM; i=i+40) {
            tmp = array[i];
            tmp = array[i+1];
            tmp = array[i+2];
            tmp = array[i+3];
            tmp = array[i+4];
            tmp = array[i+5];
            tmp = array[i+6];
            tmp = array[i+7];
            tmp = array[i+8];
            tmp = array[i+9];
            tmp = array[i+10];
            tmp = array[i+11];
            tmp = array[i+12];
            tmp = array[i+13];
            tmp = array[i+14];
            tmp = array[i+15];
            tmp = array[i+16];
            tmp = array[i+17];
            tmp = array[i+18];
            tmp = array[i+19];
            tmp = array[i+20];
            tmp = array[i+21];
            tmp = array[i+22];
            tmp = array[i+23];
            tmp = array[i+24];
            tmp = array[i+25];
            tmp = array[i+26];
            tmp = array[i+27];
            tmp = array[i+28];
            tmp = array[i+29];
            tmp = array[i+30];
            tmp = array[i+31];
            tmp = array[i+32];
            tmp = array[i+33];
            tmp = array[i+34];
            tmp = array[i+35];
            tmp = array[i+36];
            tmp = array[i+37];
            tmp = array[i+38];
            tmp = array[i+39];
        }
        time_end = glfwGetTime();

        performance = mb / (time_end - time_start);
        performance_average += performance;
        if(performance > performance_max) { performance_max = performance; }
        if(j == 0) { performance_min = performance; }
        if(performance < performance_min) { performance_min = performance; }

        printf("[%d/%d] Performance stride 40: %.3f MB/s\r", j+1, imax, performance);
        fprintf(file_stride, "%d\t%f\n", j, performance);
        fflush(file_stride);
        fflush(stdout);
    }
    performance_average = performance_average / imax;
    printf("\nAverage: %.3f MB/s\n", performance_average);
    printf("Performance MIN: %3.f MB/s | Performance MAX: %3.f MB/s\n\n",
           performance_min, performance_max);

    /* Linear reading */
    performance_average = 0.0;
    performance_min = 0.0;
    performance_max = 0.0;
    for(int j = 0; j < imax; ++j) {
        volatile uint64_t tmp = 0; /* volatile for the same reason as above */
        performance = 0;
        time_start = glfwGetTime();
        for(uint64_t i = 0; i < ARRAY_NUM; ++i) {
            tmp = array[i];
        }
        time_end = glfwGetTime();

        performance = mb / (time_end - time_start);
        performance_average += performance;
        if(performance > performance_max) { performance_max = performance; }
        if(j == 0) { performance_min = performance; }
        if(performance < performance_min) { performance_min = performance; }
        printf("[%d/%d] Performance dumb: %.3f MB/s\r", j+1, imax, performance);
        fprintf(file_dumb, "%d\t%f\n", j, performance);
        fflush(file_dumb);
        fflush(stdout);
    }
    performance_average = performance_average / imax;
    printf("\nAverage: %.3f MB/s\n", performance_average);
    printf("Performance MIN: %3.f MB/s | Performance MAX: %3.f MB/s\n\n",
           performance_min, performance_max);

    /* Memcpy */
    performance = 0;
    time_start = glfwGetTime();
    memcpy(array_copy, array, ARRAY_NUM*sizeof(uint64_t));
    time_end = glfwGetTime();

    performance = mb / (time_end - time_start);
    printf("Copying (memcpy) done in %.3f s\n", (time_end - time_start));
    printf("Performance: %.3f MB/s\n", performance);

    /* Cleanup and exit */
    free(array);
    free(array_copy);
    glfwTerminate();
    fclose(file_dumb);
    fclose(file_stride);

    exit(EXIT_SUCCESS);
}

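Summary
How should I write code that reaches maximum and (near-)constant speed when working with arrays where linear access is the most common pattern?
What can I learn about caching and prefetching from this example?
Do these graphs tell me something I should know but haven't noticed?
How else can I unroll loops? I tried without seeing any results, so I resorted to writing the unrolled loop out by hand.
Thanks for reading this long post.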
Edit:
It seems that memcpy performs differently when the O flag is absent! What gives? Leaving the flag out gives better performance, as shown in the graph.
[Graph: O flag absent]
Edit 2:
I finally hit the limit with AVX.
=== READING WITH AVX ===
[1000/1000] Performance AVX: 9868.912 MB/s
Average: 10029.085 MB/s
Performance MIN: 6554 MB/s | Performance MAX: 11464 MB/s

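The average is close to 10664. I had to switch the compiler to clang because gcc gave me a hard time using AVX (-mavx). That is also why the graph shows more pronounced dips. I'd still like to know how to get constant performance. I suppose the dips are due to caches/cache lines. That would also explain the occasional performance above DDR3 speed (up to 11464 MB/s).
Sorry about my gnuplot-fu and the key. Blue is SSE2, orange is AVX, purple is the stride version as before, and green is the dumb one-at-a-time read.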
[Graph: King AVX]
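So the last two questions are:
What causes the dips, and how can I keep performance more stable?
Is it possible to reach the ceiling without AVX intrinsics?
Gist with the latest version: https://gist.github.com/Keyframe/1ed9062ec52fc4a0d14b
And the graphs for that version: http://imgur.com/a/cPeor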

Best answer

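Your value for the peak bandwidth of main memory is off by a factor of 2. Instead of 10664 MB/s it should be 21.3 GB/s (more precisely 21333⅓ MB/s; see my derivation below). The fact that you sometimes see more than 10664 MB/s should tell you that there may be a problem with your peak bandwidth calculation.
To get the maximum bandwidth for Core2 through Sandy Bridge you need to use non-temporal stores. In addition, you need multiple threads. You don't need AVX instructions or loop unrolling.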

/* Copy n bytes from x to y with non-temporal (streaming) stores, split across threads. */
void copy(char *x, char *y, int n)
{
#pragma omp parallel for schedule(static)
    for(int i=0; i<n/16; i++)
    {
        _mm_stream_ps((float*)&y[16*i], _mm_load_ps((float*)&x[16*i]));
    }
}

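The arrays need to be 16-byte aligned and their size a multiple of 16 bytes. The rule of thumb for non-temporal stores is to use them when the memory you are copying is larger than half the size of the last-level cache. In your case half the L3 cache size is 1.5 MB, and the smallest array you copy is 8 MB, which is much larger than half the last-level cache size.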
Here is some code to test this.
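//gcc -O3 -fopenmp foo.c
#include <stdio.h>
#include <x86intrin.h>
#include <string.h>
#include <omp.h>

/* Copy n bytes from y to x with non-temporal (streaming) stores.
   x and y must be 16-byte aligned; n should be a multiple of 16. */
void copy(char *x, char *y, int n)
{
#pragma omp parallel for schedule(static)
    for(int i=0; i<n/16; i++)
    {
        _mm_stream_ps((float*)&x[16*i], _mm_load_ps((float*)&y[16*i]));
    }
}

/* Same copy with ordinary (cached) SSE stores, for comparison. */
void copy2(char *x, char *y, int n)
{
#pragma omp parallel for schedule(static)
    for(int i=0; i<n/16; i++)
    {
        _mm_store_ps((float*)&x[16*i], _mm_load_ps((float*)&y[16*i]));
    }
}

int main(void)
{
    unsigned n = 0x7ffffff0; /* about 2 GB, a multiple of 16 so every byte is copied */
    char *x = _mm_malloc(n, 16);
    char *y = _mm_malloc(n, 16);
    if(x == NULL || y == NULL) return 1;
    double dtime;

    /* Touch both buffers first so page-fault cost is not part of the timings. */
    memset(x,0,n);
    memset(y,1,n);

    dtime = -omp_get_wtime();
    copy(x,y,n);
    dtime += omp_get_wtime();
    printf("time non temporal store %f\n", dtime);

    dtime = -omp_get_wtime();
    copy2(x,y,n);
    dtime += omp_get_wtime();
    printf("time SSE store %f\n", dtime);

    dtime = -omp_get_wtime();
    memcpy(x,y,n);
    dtime += omp_get_wtime();
    printf("time memcpy %f\n", dtime);

    _mm_free(x);
    _mm_free(y);
    return 0;
}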

On my system, a Core2 (pre-Nehalem) P9600 @ 2.53 GHz, copying 2 GB gives (times in seconds):
time non temporal store 0.39
time SSE store 1.10
time memcpy 0.98
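The rule of thumb above (streaming stores only when the copy is larger than half the last-level cache, and only for 16-byte-aligned buffers) can be folded into one helper. A single-threaded sketch, assuming a 3 MB last-level cache as on the i5-2557M and falling back to memcpy for small, misaligned, or non-SSE cases; copy_nt_if_large and LLC_BYTES are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#endif

/* Assumption: 3 MB last-level cache (i5-2557M). Use your own CPU's value. */
#define LLC_BYTES (3u * 1024u * 1024u)

/* Hypothetical helper: non-temporal copy for large, aligned buffers,
   plain memcpy otherwise. Single-threaded. */
static void copy_nt_if_large(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
#if defined(__SSE__) || defined(_M_X64)
    int aligned = ((uintptr_t)dst % 16 == 0) && ((uintptr_t)src % 16 == 0);
    if (n > LLC_BYTES / 2 && aligned) {
        size_t blocks = n / 16;
        float *d = dst;
        const float *s = src;
        for (size_t i = 0; i < blocks; ++i)
            _mm_stream_ps(d + 4 * i, _mm_load_ps(s + 4 * i)); /* bypasses the caches */
        _mm_sfence(); /* streaming stores are weakly ordered; fence before later stores */
        memcpy((char *)dst + blocks * 16, (const char *)src + blocks * 16, n % 16); /* tail */
        return;
    }
#endif
    memcpy(dst, src, n);
}
```

For the multi-threaded case, the streaming loop would be split across threads as in the answer's copy().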
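Note that it is very important that you "touch" the memory you are going to write to (I used memset for this). The system does not necessarily allocate memory until it is accessed. If the memory has not been touched before the copy, the overhead of doing so during the copy can significantly affect the results.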
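A sketch of touching the destination one page at a time instead of with a full memset, assuming a POSIX system for sysconf(_SC_PAGESIZE); touch_pages is a hypothetical name. It overwrites one byte per page, so it should be called before the buffer is filled with real data:

```c
#define _POSIX_C_SOURCE 200809L /* expose sysconf under -std=c99 */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write one byte per page so the OS commits every page
   of buf before any timing starts. */
static void touch_pages(void *buf, size_t n)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page = (ps > 0) ? (size_t)ps : 4096; /* fall back to 4 KiB */
    volatile unsigned char *p = buf;            /* volatile: keep every write */
    assert(buf != NULL || n == 0);
    for (size_t off = 0; off < n; off += page)
        p[off] = 0;
    if (n > 0)
        p[n - 1] = 0; /* buf may not start on a page boundary; cover the last page too */
}
```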
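According to Wikipedia, DDR3-1333 has a memory clock of 166⅔ MHz. DDR transfers data at twice the memory clock rate. In addition, DDR3 has a bus clock multiplier of 4, so the total multiplier per memory clock for DDR3 is 8. Your motherboard also has two memory channels. So the total transfer rate is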
 21333⅓ MB/s = (166⅔ 1E6 clocks/s) * (8 lines/clock/channel) * (2 channels) * (64-bits/line) * (byte/8-bits) * (MB/1E6 bytes).
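The same arithmetic as a small C helper, to make it easy to plug in other memory configurations. peak_mb_per_s is a hypothetical name; the inputs in the check are the DDR3-1333 figures from the derivation (166⅔ MHz memory clock, 8 transfers per memory clock, 64-bit channels):

```c
#include <assert.h>

/* Hypothetical helper: theoretical peak DRAM bandwidth in MB/s (1 MB = 1e6 bytes).
   MHz * bytes per transfer gives 1e6 bytes/s, i.e. MB/s directly. */
static double peak_mb_per_s(double mem_clock_mhz, int transfers_per_clock,
                            int channels, int bus_bits)
{
    assert(mem_clock_mhz > 0 && transfers_per_clock > 0 && channels > 0 && bus_bits > 0);
    return mem_clock_mhz * transfers_per_clock * channels * (bus_bits / 8.0);
}
```

With one channel this gives about 10,667 MB/s, essentially the 10,664 MB/s figure in the question (1333 MT/s × 8 bytes); with two channels it doubles to about 21,333 MB/s.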

A similar question, "c - Optimizing linear access to arrays with prefetching and caching in C", can be found on Stack Overflow: https://stackoverflow.com/questions/33676991/
