c++ - How to measure the time of a process context switch in Linux using C++?

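I need to measure the time of context switching using C++. I know I can simply call C functions from C++ code, but the task is to avoid C as much as possible. I searched the Internet for this, but only found ways to do it in C. Are there any ways to work with the OS in C++? Any analogs of pipe(...) from unistd.h, sched_setaffinity(...) from sched.h, and others?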


Are there any ways to work with OS in C++?

All of the C functions you cite are easily accessible by including their headers directly. Example:

#include <pthread.h>

and in a C++ compilation the declarations automagically get extern "C" linkage (the system headers wrap them for C++).

On Linux, your link needs -lrt and -pthread.
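A minimal sketch of this, assuming Linux with glibc and g++ (which predefines _GNU_SOURCE, needed for sched_setaffinity() and the CPU_* macros): both calls come straight from <unistd.h> and <sched.h> with no hand-written extern "C". The helper names pipeEcho() and pinToFirstAllowedCpu() are hypothetical.

```cpp
// Sketch: POSIX calls used directly from C++ (no hand-written extern "C").
// Assumes Linux + glibc; g++ predefines _GNU_SOURCE, which <sched.h> needs
// for sched_setaffinity() and the CPU_* macros.
#include <cassert>
#include <sched.h>    // sched_getaffinity, sched_setaffinity, cpu_set_t, CPU_*
#include <unistd.h>   // pipe, read, write, close

// Writes one byte into a fresh pipe and reads it back.
// Returns the byte read, or '\0' if any call fails.
char pipeEcho(char c) {
    int fds[2];                                   // fds[0] = read end, fds[1] = write end
    if (::pipe(fds) != 0) return '\0';
    char in = '\0';
    if (::write(fds[1], &c, 1) != 1 || ::read(fds[0], &in, 1) != 1) in = '\0';
    ::close(fds[0]);
    ::close(fds[1]);
    return in;
}

// Restricts the calling thread (pid 0 = caller) to the lowest-numbered CPU
// it is currently allowed to run on. Returns that CPU, or -1 on failure.
int pinToFirstAllowedCpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (::sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, &set)) continue;
        CPU_ZERO(&set);                           // new mask: this one CPU only
        CPU_SET(cpu, &set);
        return ::sched_setaffinity(0, sizeof(set), &set) == 0 ? cpu : -1;
    }
    return -1;
}
```

The affinity change applies only to the calling thread; on Linux, threads it creates afterwards inherit the mask.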

Any analogs of pipe(...) from unistd.h, sched_setaffinity(...) 

They are not analogs: the build links against the real "C" Linux functions.

I need to measure the time of context switching using C++ means.

I measure durations by repeating some action for 1 to 10 seconds and counting how many times the loop completes.
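That "repeat for a fixed time and count" method can be sketched as a small generic helper. This is an illustration, not code from the answer: the names RunStats and repeatFor() are hypothetical, and it uses std::chrono::steady_clock (monotonic) rather than the wall clock.

```cpp
// Sketch of "repeat an action for a fixed time and count the completions".
#include <cassert>
#include <chrono>
#include <cstdint>

struct RunStats {
    std::uint64_t count;                  // how many times the action completed
    std::chrono::nanoseconds elapsed;     // measured duration of all iterations
    double nsPerAction() const {
        return static_cast<double>(elapsed.count()) / static_cast<double>(count);
    }
};

// Runs `action` repeatedly until at least `runFor` has elapsed on a monotonic
// clock. Reading the clock every iteration adds a small, roughly constant
// overhead to each sample, so cheap actions are over-estimated slightly.
template <class Action>
RunStats repeatFor(Action action, std::chrono::nanoseconds runFor) {
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    const Clock::time_point deadline = start + runFor;
    std::uint64_t n = 0;
    Clock::time_point now;
    do {
        action();
        ++n;
        now = Clock::now();
    } while (now < deadline);             // runs at least once, so n >= 1
    return RunStats{n, std::chrono::duration_cast<std::chrono::nanoseconds>(now - start)};
}
```

For example, `repeatFor(someAction, std::chrono::seconds(10)).nsPerAction()` gives the average cost of one call over a 10-second run.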

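In my most recent minor benchmark, written entirely in C++ (but without C++11 features), I

  • build a linked list of nodes
  • each node has its own thread
  • each thread holds 2 pointers to pthread_mutex semaphores (input and output)
  • each thread body waits for its input semaphore to be signaled (semTake())
  • on waking, the thread body signals its output semaphore (semGive()) and does almost nothing else
  • the N threads' semaphores are distributed to the node threads, and the loop closes at the end of the list (i.e., the end-of-list node's output semaphore handle points to the begin-of-list node's input semaphore handle)

  • the main task starts the chain reaction with semGive(), waits 10 seconds (using usleep), then sets a flag that every thread can see.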
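The ring described in the list above can be sketched with standard C++ only. This is an illustrative substitute, not the author's code: the Sem class (a counting semaphore built from std::mutex and std::condition_variable) stands in for the pthread_mutex semaphores, std::atomic<bool> is the shared stop flag, sleep_for replaces usleep, and the names Sem and ringHandOffs() are hypothetical.

```cpp
// Sketch: the token ring in standard C++ only (no POSIX semaphores).
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Counting semaphore from std::mutex + std::condition_variable.
class Sem {
public:
    void give() {                                      // semGive()
        { std::lock_guard<std::mutex> lk(m_); ++count_; }
        cv_.notify_one();
    }
    void take() {                                      // semTake()
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    unsigned count_ = 0;
};

// Passes one token around a ring of n threads for `runFor` and returns the
// number of hand-offs. Node i waits on sems[i] and signals sems[(i+1) % n],
// so the last node's output is the first node's input.
std::uint64_t ringHandOffs(unsigned n, std::chrono::milliseconds runFor) {
    if (n == 0) return 0;
    std::vector<std::unique_ptr<Sem>> sems;
    for (unsigned i = 0; i < n; ++i) sems.emplace_back(new Sem);

    std::atomic<bool> done{false};                     // flag every thread can see
    std::atomic<std::uint64_t> handOffs{0};
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < n; ++i) {
        Sem* in = sems[i].get();
        Sem* out = sems[(i + 1) % n].get();
        threads.emplace_back([in, out, &done, &handOffs] {
            for (;;) {
                in->take();                            // sleep until the token arrives
                if (done.load()) { out->give(); return; } // pass it on so the next node exits too
                handOffs.fetch_add(1, std::memory_order_relaxed);
                out->give();                           // wake the next node, nothing else
            }
        });
    }

    sems[0]->give();                                   // main starts the chain reaction
    std::this_thread::sleep_for(runFor);
    done.store(true);
    for (auto& t : threads) t.join();
    return handOffs.load();
}
```

Dividing the run time by the returned count gives a per-hand-off figure. Each hand-off wakes a different thread, but unless all threads are pinned to one CPU it may be a cross-CPU wake-up rather than a same-CPU switch.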

Example run on a 6-year-old Dell.

Compilation started at Wed Jan 15 22:31:33

lmbm101: context-switch duration .. wait up to 10 seconds while measuring.
switch enforced using pthread_mutex semaphores

C5 bogomips: 5210.77 5210.77
686.56 kilo m_thread_switch invocations in 10.88 sec (10000088 us)
68.6554 kilo m_thread_switch events per second
14.5655 u seconds per m_thread_switch event
pid = 12188

now (52d760af): 22:31:43
bdtod 2014/01/15 22:31:43 minod=1351 iod=91 secod=81103 soi=104
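As a consistency check on this run, the rate and per-event lines follow from the count and the microsecond total (the "10.88 sec" label does not match 10,000,088 us, but the rate does):

$$
\frac{686{,}560\ \text{events}}{10{,}000{,}088\ \mu\text{s}} \approx 68.655\times10^{3}\ \text{events/s},
\qquad
\frac{1}{68.655\times10^{3}\ \text{s}^{-1}} \approx 14.57\ \mu\text{s per event}
$$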

I did this minor benchmark before C++11 was released. This code compiles as C++11, but uses no C++11 tasks ... that is my future effort.

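I wrote this sample code in 2017-04. I now tend to use std::vector for all kinds of things; the previous measurement did not. It uses a similar technique, but with simplified result reporting.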

#include <atomic>
#include <cassert>
#include <chrono>
#include <clocale>
#include <cstdint>
#include <ctime>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// see
#include "../../bag/src/eng_format.hpp" // to_engineering_string(), from_engineering_string()

#include <pthread.h>   // pthread_attr_... (Note 6)
#include <semaphore.h> // Note 1 - Ubuntu / Posix feature access, see PPLSem_t

namespace DTB // doug's test box
{
// Note 2 - typedefs to simplify chrono access
// 'compressed' chrono access --------------vvvvvvv
typedef std::chrono::high_resolution_clock HRClk_t; // std-chrono-hi-res-clk
typedef HRClk_t::time_point Time_t;                // std-chrono-hi-res-clk-time-point
typedef std::chrono::nanoseconds  NS_t;            // std-chrono-nanoseconds
typedef std::chrono::microseconds US_t;            // std-chrono-microseconds
typedef std::chrono::milliseconds MS_t;            // std-chrono-milliseconds
using namespace std::chrono_literals; // support suffixes like 100ms, 2s, 30us (C++14)
// examples:
//   Time_t testStart_us = HRClk_t::now();
//   auto testDuration_us = std::chrono::duration_cast<US_t>(HRClk_t::now() - testStart_us);
//   auto count_us = testDuration_us.count();
// or
//   std::cout << "  complete " << testDuration_us.count() << " us" << std::endl;

// C++ access to Linux semaphore via Posix
// Posix Process Semaphore, set to Local mode (unnamed, unshared)
class PPLSem_t
{
public:
   // pshared = 0: shared between the threads of this process only
   // initial value = 1: unlocked
   // (the calls are made outside assert() so they still run with -DNDEBUG)
   PPLSem_t()  { int s = ::sem_init(&m_sem, 0, 1); assert(0 == s); (void)s; } // ctor
   ~PPLSem_t() { int s = ::sem_destroy(&m_sem);   assert(0 == s); (void)s; } // dtor

   int lock()   { return (::sem_wait(&m_sem)); } // returns 0 when success, else -1
   int unlock() { return (::sem_post(&m_sem)); } // returns 0 when success, else -1

   void wait() { int s = lock();   assert(0 == s); (void)s; }
   void post() { int s = unlock(); assert(0 == s); (void)s; }

private:
   ::sem_t m_sem;
};
// POSIX is an api, this C++ class simplifies use
// sem_wait and sem_post are possibly assembly for best performance

// Note 3 - locale what now?
// insert commas from right to left -- change 1234567890 to 1,234,567,890
// input 's' is the digits-to-the-left-of-the-decimal-point
// returns s contents with inserted commas
std::string digiComma(std::string s)
{  //vvvvv--sSize must be signed int of sufficient size
   int32_t sSize = static_cast<int32_t>(s.size());
   if (sSize > 3)
      for (int32_t indx = (sSize - 3); indx > 0; indx -= 3)
         s.insert(static_cast<size_t>(indx), 1, ',');
   return s;
}

const std::string dashLine("  --------------------------------------------------------------\n");

// Note 5 - thread sync to wall clock
// action: pauses a thread, resume thread action at next wall-clock-start-of-second
void sleepToWallClockStartOfSec(std::time_t t0 = 0)
{
   if (0 == t0) { t0 = std::time(nullptr); }
   while (t0 == std::time(nullptr)) {
      std::this_thread::sleep_for(100ms); // good-neighbor-thread
   }
   // a good-neighbor-thread delay does not 'hog' a processor
}

// Note 4 - typedef examples to simplify
// create new types based on vector ... suffix '_t' reminds that this is a type
typedef std::vector<unsigned> UintVec_t;
typedef std::vector<unsigned> TIDSeqVec_t;
typedef std::vector<std::thread*> Thread_pVec_t;

// measure -std=c++14 std::thread average context switch duration
// enforced with one PPLSem_t
class Q6_t
{
   // private data
   const unsigned MaxThreads;     // thread count
   const unsigned MaxSecs;        // seconds of test
   const std::string m_TIDSeqPFN; // capture tid seq to ram (write to file later)
   unsigned m_thrdSwtchCount;     // count incremented by all threads (under m_sem)
   std::atomic<bool> m_done;      // main to threads: cease and desist
   std::atomic<unsigned> m_rdy;   // threads to main: thread is ready! (running)
   PPLSem_t m_sem;                // one semaphore shared by all threads
   UintVec_t m_thrdRunCountVec;   // counts incremented per thread
   TIDSeqVec_t m_TIDSeq_Vec;      // sequence (order) of thread execution
   Thread_pVec_t m_thread_pVec;   // vector of thread pointers

public:
   Q6_t() // default ctor
      : MaxThreads(10)          // total threads
      , MaxSecs(10)             // controlled seconds of test
      , m_TIDSeqPFN("./Q6.txt") // where put data file
      , m_thrdSwtchCount(0)
      , m_done(false)           // main() to threads: cease and desist
      , m_rdy(0)                // threads to main(): thread is ready!
      // m_sem                  // default ctor ok
      // m_thrdRunCountVec      // default ctor ok
      // m_TIDSeq_Vec           // default ctor ok
      // m_thread_pVec          // default ctor ok
   {
      for (size_t i = 0; i < MaxThreads; ++i)
         m_thrdRunCountVec.push_back(0); // 0 each per-thread counter
      //    your results -----vvvvvvvv----will vary
      m_TIDSeq_Vec.reserve(45000000); // observed as many as 42,000,000 on my old Dell
      // DO NOT start threads (m_thread_pVec) yet
   } // Q6_t()

   ~Q6_t()
   {
      // m_TIDSeq_Vec,
      while (m_thread_pVec.size()) { // more to pop and delete
         std::thread* t = m_thread_pVec.back(); // return last element
         m_thread_pVec.pop_back();              // remove last element
         delete t;                              // delete thread (already joined)
      }
      // m_thrdRunCountVec;
      // m_TIDSeqPFN, m_sem, m_rdy; m_done;
      // m_thrdSwtchCount; MaxSecs; MaxThreads;
   } // ~Q6_t()

   // Q6_t::main(..) runs in context thread 'main()', invoked in function main()
   int main(std::string label)
   {
      std::cout << dashLine << "  " << MaxSecs << " second measure of "
                << MaxThreads << " threads, 1 PPLSem_t " << label << "\n"
                << "  output: " << m_TIDSeqPFN << '\n' << std::endl;

      m_sem.wait(); // take possession of m_sem
      // now all threads will block at critical section entry (in onceThru())
      std::cout << "\n  block threads at crit sect " << std::endl;

      createAndActivateThreads();

      long int durationUS = 0;

      releaseThreadsAndWait(durationUS); // run threads run

      std::cout << "\n" << std::endl
                << report(" 'thread context switch' ",
                          m_thrdSwtchCount, durationUS);

      reportThreadActionCounts();

      writeTIDSeqToQ6_txt();

      reportMainStackSize();

      measure_LockUnlock(); // with no context switch, no collision

      return 0;
   } // int main() // in 'main' context

   void onceThru(unsigned id) // a crit section
   {
      m_sem.wait();               // critical section entry
      m_thrdSwtchCount += 1;      // 'work'
      m_thrdRunCountVec[id] += 1; // diagnostic - thread work-balance
      m_TIDSeq_Vec.push_back(id); // thread sequence capture
      m_sem.post();               // critical section exit
   }

   // thread entry point
   void threadRun(unsigned id)
   {
      std::cout << '.' << id << std::flush; // "."
      m_rdy |= (1u << id);                  // thread to main: i am ready
      do {
         onceThru(id);
         // m_done is atomic; reading a plain bool written by main() was a data race
         if (m_done) break; // exit when done
      } while (true);
   }

   // main() context: create and activate std::thread's with new
   void createAndActivateThreads() // main() context
   {
      std::cout << "  createAndActivateThreads() ";
      Time_t start_us = HRClk_t::now();
      for (unsigned id = 0; id < MaxThreads; ++id)
      {
         // std::thread activates when instance created
         // args: member-function pointer, instance (this), single param (id)
         std::thread* thrd = new std::thread(&Q6_t::threadRun, this, id);
         assert(nullptr != thrd);

         // create handshake mask for unique 'id' bit of m_rdy
         unsigned mask = (1u << id);

         // wait for bit set in m_rdy by thread
         while ( ! (mask & m_rdy) ) {
            std::this_thread::sleep_for(100ms); // sleep between checks, not a busy poll
         }
         // thread has confirmed to main() that it is running

         // capture pointer to invoke join's
         m_thread_pVec.push_back(thrd);
      }
      auto duration_us =
         std::chrono::duration_cast<US_t>(HRClk_t::now() - start_us);
      std::cout << "  (" << digiComma(std::to_string(duration_us.count()))
                << " us)" << std::endl;

      sleepToWallClockStartOfSec(); // start-of-second
   } // void createAndActivateThreads()

   // main() context: measure average context switch duration
   // by releasing threads to run
   void releaseThreadsAndWait(long int& count_us)
   {
      Time_t testStart_us = HRClk_t::now();

      // thread 'main()' is current owner of this semaphore - see "Q6_t::main()"
      m_sem.post(); // release the hounds

      std::cout << "  releaseThreadsAndWait " << std::flush;

      // progress indicator to user
      for (size_t i = 0; i < MaxSecs; ++i) // let threads switch for MaxSecs seconds
      {
         sleepToWallClockStartOfSec(); // 'main()' syncs to wall clock
         std::cout << (MaxSecs-i-1) << ' ' << std::flush; // "9 8 7 6 5 4 3 2 1 0"
      }

      m_done = true; // command threads to exit - atomic store, all threads see it

      auto testDuration_us =
         std::chrono::duration_cast<US_t>(HRClk_t::now() - testStart_us);
      count_us = testDuration_us.count();

      // tbr - measure how long to detect m_done

      Time_t joinStart_us = HRClk_t::now();
      std::cout << "\n  join threads ";
      for (size_t i = 0; i < MaxThreads; ++i)
      {
         m_thread_pVec[i]->join(); // main() waits here for thread[i] completion
         std::cout << ". " << std::flush;
      }
      auto joinDuration_us =
         std::chrono::duration_cast<US_t>(HRClk_t::now() - joinStart_us);
      std::cout << "  (" << digiComma(std::to_string(joinDuration_us.count()))
                << " us)" << std::endl;
   } // void releaseThreadsAndWait(long int& count_us)

   void reportThreadActionCounts()
   {
      std::cout << "\n  each thread run count: \n  ";
      unsigned sum = 0;
      for (auto it : m_thrdRunCountVec)
      {
         std::cout << std::setw(11) << digiComma(std::to_string(it));
         sum += it;
      }
      std::cout << std::endl;
      unsigned diff = (sum - m_thrdSwtchCount);

      std::cout << "  ";
      double maxPC = 0.0;
      double minPC = 100.0;
      for (auto it : m_thrdRunCountVec)
      {
         double percent = static_cast<double>(it) / static_cast<double>(sum);
         if (percent > maxPC) maxPC = percent;
         if (percent < minPC) minPC = percent;
         std::cout << std::setw(11) << (percent * 100);
      }
      std::cout << "  (% of total)\n\n  total : " << digiComma(std::to_string(sum));

      if (diff) std::cout << "  (diff: " << diff << ")";

      std::cout << "  note variability -- min : " << (minPC*100)
                << "%  max : " << (maxPC*100) << "%" << std::endl;
   } // void reportThreadActionCounts()

   void writeTIDSeqToQ6_txt() // m_TIDSeq_Vec - record sequence of thread access to critsect
   {
      size_t sz = m_TIDSeq_Vec.size();
      std::cout << '\n' << dashLine << "  writing Thread ID sequence of "
                << digiComma(std::to_string(sz)) << " values to "
                << m_TIDSeqPFN << std::endl;

      Time_t writeStart_us = HRClk_t::now();

      do {
         std::ofstream Q6cout(m_TIDSeqPFN);

         if ( ! Q6cout.good() )
         {
            std::cerr << "not able to open for write: " << m_TIDSeqPFN << std::endl;
            break; // do-while(0) lets us skip the write on failure
         }

         size_t lnSz = 0;
         for (auto it : m_TIDSeq_Vec)
         {
            // encode Thread ID uints:  0   1   2   3   4   5   6   7   8   9
            // to letters 'A' thru 'J': 'A' 'B' 'C' 'D' 'E' 'F' 'G' 'H' 'I' 'J'
            Q6cout << static_cast<char>(it + 'A');
            // whitespace not needed

            if (++lnSz >= 100) { Q6cout << '\n'; lnSz = 0; } // 100 chars per line
         }
         Q6cout << '\n' << std::endl;
      } while (0);

      auto wDuration_us = std::chrono::duration_cast<US_t>
         ( HRClk_t::now() - writeStart_us );

      std::cout << "  complete: "
                << digiComma(std::to_string(wDuration_us.count()))
                << " us" << std::endl;
   } // writeTIDSeqToQ6_txt

   std::string report(std::string lbl, uint64_t eventCount, uint64_t duration_us)
   {
      std::stringstream ss;
      ss << "  " << to_engineering_string(static_cast<double>(eventCount), 9, eng_prefixed)
         << lbl << " events in " << digiComma(std::to_string(duration_us)) << " us" << std::endl;

      double eventsPerSec = (1000000.0 * static_cast<double>(eventCount)) /
                            static_cast<double>(duration_us);

      ss << "  " << to_engineering_string(eventsPerSec, 9, eng_prefixed)
         << lbl << " events per second\n  "
         << to_engineering_string((1.0/eventsPerSec), 9, eng_prefixed)
         << " sec per " << lbl << " event " << std::endl;
      return ss.str();
   } // std::string report(std::string lbl, uint64_t eventCount, uint64_t duration_us)

   // Note 6 - stack size -> use POSIX 'pthread_attr_...' API
   // (a freshly initialized attr reports the default stack size for new threads)
   void reportMainStackSize()
   {
      pthread_attr_t tattr;
      int stat = pthread_attr_init(&tattr);
      assert(0 == stat);

      size_t size = 0;
      stat = pthread_attr_getstacksize(&tattr, &size);
      assert(0 == stat);

      std::cout << '\n' << dashLine << "  Stack Size: "
                << digiComma(std::to_string(size))
                << " [of 'main()' by pthread_attr_getstacksize]\n"
                << std::endl;
      stat = pthread_attr_destroy(&tattr);
      assert(0 == stat);
      (void)stat;
   } // void reportMainStackSize()

   // Note 7 - semaphore API performance
   // measure duration when no context switch (i.e. no thread 'collision')
   void measure_LockUnlock()
   {
      PPLSem_t sem1;
      size_t count1 = 0;
      size_t count2 = 0;
      std::cout << dashLine << "  3 second measure of lock()/unlock()"
                << " (no collision) " << std::endl;
      std::time_t t0 = std::time(nullptr) + 3;

      Time_t start_us = HRClk_t::now();
      do {
         sem1.wait(); count1 += 1;
         sem1.post(); count2 += 1;
         if (std::time(nullptr) > t0) break;
      } while (true);
      auto duration_us = std::chrono::duration_cast<US_t>(HRClk_t::now() - start_us);

      assert(count1 == count2);
      std::cout << report(" 'sem lock()+unlock()' ", count1, duration_us.count());

      std::cout << "\n";
   } // void measure_LockUnlock()

}; // class Q6_t

} // namespace DTB

int main(int argc, char* argv[])
{
   std::cout << "\nargc: " << argc << '\n' << std::endl;
   for (int i = 0; i < argc; i += 1) std::cout << argv[i] << " ";
   std::cout << "\n" << std::endl;

   std::setlocale(LC_ALL, "");
   std::time_t t0 = std::time(nullptr);
   std::cout << "  " << std::asctime(std::localtime(&t0)) << std::endl;

   DTB::Time_t main_start_us = DTB::HRClk_t::now();
   int retVal = 0;
   DTB::Q6_t q6;
   retVal = q6.main(" Q6::main() ");
   auto duration_us = std::chrono::duration_cast<DTB::US_t>
      (DTB::HRClk_t::now() - main_start_us);

   std::cout << "  FINI   "
             << DTB::digiComma(std::to_string(duration_us.count()))
             << " us" << std::endl;
   return retVal;
}


  Fri Jun 30 15:30:13 2017

10 second measure of 10 threads, 1 PPLSem_t Q6::main()
output: ./Q6.txt

block threads at crit sect
createAndActivateThreads() . (1,002,120 us)
releaseThreadsAndWait 9 8 7 6 5 4 3 2 1 0
join threads . . . . . . . . . . (2,971 us)

31.07730700 M 'thread context switch' events in 10,021,447 us
3.101079814 M 'thread context switch' events per second
322.4683207 n sec per 'thread context switch' event

each thread run count:
3,182,496 3,252,929 3,245,473 3,150,344 3,411,918 2,936,982 2,978,690 3,029,319 3,004,926 2,884,230
10.2406 10.4672 10.4432 10.1371 10.9788 9.45057 9.58478 9.74769 9.6692 9.28082 (% of total)

total : 31,077,307 note variability -- min : 9.28082% max : 10.9788%

writing Thread ID sequence of 31,077,307 values to ./Q6.txt
complete: 3,025,289 us

Stack Size: 8,720,384 [of 'main()' by pthread_attr_getstacksize]

3 second measure of lock()/unlock() (no collision)
173.2359360 M 'sem lock()+unlock()' events in 3,902,491 us
44.39111737 M 'sem lock()+unlock()' events per second
22.52702926 n sec per 'sem lock()+unlock()' event

FINI 18,957,304 us

Sample lines in Q6.txt are 100 characters long.
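Since the question asks specifically about pipe() and sched_setaffinity(), here is a different, commonly used technique for comparison: bounce one byte through two pipes between two threads pinned to the same CPU, similar in spirit to lmbench's lat_ctx (which passes a token through pipes between processes). This is a sketch under stated assumptions (Linux, glibc, g++ predefining _GNU_SOURCE); the names firstAllowedCpu(), pinSelf() and pipePingPongNsPerSwitch() are hypothetical. The two threads share one address space, so this measures thread switches; using fork()ed processes instead would also include the address-space switch.

```cpp
// Sketch: same-CPU switch cost via a pipe ping-pong between two pinned threads.
#include <cassert>
#include <chrono>
#include <sched.h>     // sched_getaffinity, sched_setaffinity, CPU_* (needs _GNU_SOURCE)
#include <stdexcept>
#include <thread>
#include <unistd.h>    // pipe, read, write, close

// Lowest-numbered CPU the calling thread may run on, or -1 on error.
static int firstAllowedCpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int c = 0; c < CPU_SETSIZE; ++c)
        if (CPU_ISSET(c, &set)) return c;
    return -1;
}

// Restricts the calling thread (pid 0 = caller) to `cpu`; true on success.
static bool pinSelf(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// On one CPU each round trip blocks each thread once, i.e. about two context
// switches plus two pipe writes and two reads. Returns elapsed / (2 * rounds),
// an upper bound on one switch rather than a pure switch cost.
double pipePingPongNsPerSwitch(int rounds) {
    const int cpu = firstAllowedCpu();
    if (cpu < 0 || rounds <= 0) throw std::runtime_error("no usable CPU or bad round count");

    int ab[2], ba[2];                              // ab: pinger -> echoer, ba: echoer -> pinger
    if (pipe(ab) != 0) throw std::runtime_error("pipe");
    if (pipe(ba) != 0) { close(ab[0]); close(ab[1]); throw std::runtime_error("pipe"); }

    bool pinnedA = false, pinnedB = false, ok = true;
    std::chrono::nanoseconds elapsed{0};

    std::thread echoer([&] {
        pinnedB = pinSelf(cpu);                    // pinned before its first read
        char c;
        while (read(ab[0], &c, 1) == 1)            // read returns 0 (EOF) once ab[1] is closed
            if (write(ba[1], &c, 1) != 1) break;
        close(ba[1]);                              // unblocks the pinger if we stop early
    });
    std::thread pinger([&] {
        pinnedA = pinSelf(cpu);
        char c = 'x';
        ok = write(ab[1], &c, 1) == 1 && read(ba[0], &c, 1) == 1; // warm-up: echoer is pinned now
        const auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < rounds && ok; ++i)
            ok = write(ab[1], &c, 1) == 1 && read(ba[0], &c, 1) == 1;
        elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start);
        close(ab[1]);                              // lets the echoer see EOF and exit
    });
    pinger.join();
    echoer.join();
    close(ab[0]);
    close(ba[0]);

    if (!pinnedA || !pinnedB) throw std::runtime_error("sched_setaffinity failed");
    if (!ok) throw std::runtime_error("pipe read/write failed");
    return static_cast<double>(elapsed.count()) / (2.0 * rounds);
}
```

Call it from main() with a large round count (for example 100000) and link with -pthread. Only the two worker threads are pinned; the caller's affinity is left unchanged.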




