I have a data set with the average Math and English scores for every school district in the nation, for grades 3-8, over 5 years. I'm looking to create a new variable that is the average test score for each district in each year, regardless of grade. I'd like to eliminate the grade variable entirely. Each district has a unique leaid provided by Common Core.
My dataset is df_stat and my variables are:
mean_link_math
mean_link_ela
leaid
grade
year
I have been pretty lost trying to do this. I can do it one observation at a time, but with over 300,000 observations I'd rather not use that approach.
Hi Kyle Heideman. Please help make this a reproducible question by editing it to include a sample of the dataset. An easy way to do this is to copy the output of dput(head(df_stat)) and paste it into a code block in your question.
Wouldn't this need the class sizes in order to properly weight the average-of-averages calculation?
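A minimal base R sketch of the grouping described in the question, assuming the column names listed there. The values below are made-up toy data, and `n_students` is a hypothetical enrollment column that is not in the original data set; it is only there to illustrate the weighting concern raised in the comment above.

```r
# Toy data using the column names from the question (values are invented).
df_stat <- data.frame(
  leaid = c(100, 100, 100, 100, 200, 200),
  year  = c(2015, 2015, 2016, 2016, 2015, 2015),
  grade = c(3, 4, 3, 4, 3, 4),
  mean_link_math = c(200, 220, 210, 230, 180, 190),
  mean_link_ela  = c(190, 210, 200, 220, 170, 200)
)

# Unweighted mean across grades: one row per district-year.
# grade is not in the formula, so it does not appear in the result.
# Note: the formula interface drops rows with NA in any used column.
district_year <- aggregate(
  cbind(mean_link_math, mean_link_ela) ~ leaid + year,
  data = df_stat,
  FUN = mean
)
district_year <- district_year[order(district_year$leaid, district_year$year), ]
print(district_year)

# Weighted variant. ASSUMPTION: n_students is a hypothetical column holding
# the number of tested students per district-grade-year.
df_stat$n_students <- c(50, 150, 100, 100, 30, 10)
groups <- split(df_stat, list(df_stat$leaid, df_stat$year), drop = TRUE)
weighted <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    leaid  = g$leaid[1],
    year   = g$year[1],
    math_w = weighted.mean(g$mean_link_math, g$n_students),
    ela_w  = weighted.mean(g$mean_link_ela, g$n_students)
  )
}))
print(weighted)
```

The unweighted version treats every grade equally, which is only a fair district average if grades are roughly the same size. If enrollment counts are available, the weighted version is probably closer to what is wanted.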