- html - 出于某种原因,IE8 对我的 Sass 文件中继承的 html5 CSS 不友好?
- JMeter 在响应断言中使用 span 标签的问题
- html - 在 :hover and :active? 上具有不同效果的 CSS 动画
- html - 相对于居中的 html 内容固定的 CSS 重复背景?
我在单列上运行 TF-IDF。我想使用这个 TF-IDF 和另一个缩放整数列来训练我的逻辑回归分类器。不幸的是,我刚开始学习机器学习时遇到了问题。该代码目前有一个错误,但更大的问题是我只是不确定如何将这两个功能结合在一起以及这里的最佳实践是什么。有人能帮我运行吗?
from sklearn import metrics,preprocessing,cross_validation
from sklearn.feature_extraction.text import TfidfVectorizer
import sklearn.preprocessing
import sklearn.linear_model as lm
import pandas as p
loadData = lambda f: np.genfromtxt(open(f,'r'), delimiter=',')
print "loading data.."
traindata = list(np.array(p.read_csv('FinalCSVFin.csv', delimiter=";"))[:,2])#reading text column to make tf-idf
testdata = list(np.array(p.read_csv('FinalTestCSVFin.csv', delimiter=";"))[:,2])
y = np.array(p.read_csv('FinalCSVFin.csv', delimiter=";"))[:,-2] #reading our labels
AlexaTrainData = p.read_csv('FinalCSVFin.csv', delimiter=";")[["alexarank"]] #reading integer columns
AlexaTestData = p.read_csv('FinalTestCSVFin.csv', delimiter=";")[["alexarank"]]
AllAlexaAndGoogleInfo = AlexaTestData.append(AlexaTrainData) #joining integer columns
tfv = TfidfVectorizer(min_df=3, max_features=None, strip_accents='unicode',
analyzer='word',token_pattern=r'\w{1,}',ngram_range=(1, 2), use_idf=1,smooth_idf=1,sublinear_tf=1)
rd = lm.LogisticRegression(penalty='l2', dual=True, tol=0.0001,
C=1, fit_intercept=True, intercept_scaling=1.0,
class_weight=None, random_state=None)
X_all = traindata + testdata #joining tf-idf columns
lentrain = len(traindata)
print "fitting pipeline"
tfv.fit(X_all)
print "transforming data"
X_all = tfv.transform(X_all) #running TF-IDF
X = X_all[:lentrain]
AllAlexaAndGoogleInfo = AllAlexaAndGoogleInfo[:lentrain]
X_test = X_all[lentrain:]
X = np.hstack((X, AllAlexaAndGoogleInfo))
sc = preprocessing.StandardScaler().fit(X)
X = sc.transform(X)
X_test = sc.transform(X_test)
print "20 Fold CV Score: ", np.mean(cross_validation.cross_val_score(rd, X, y, cv=20, scoring='roc_auc'))
print "training on full data"
rd.fit(X,y)
pred = rd.predict_proba(X_test)[:,1]
testfile = p.read_csv('test.tsv', sep="\t", na_values=['?'], index_col=1)
pred_df = p.DataFrame(pred, index=testfile.index, columns=['label'])
pred_df.to_csv('benchmark.csv')
print "submission file created.."
当前代码死于以下错误:
ValueError: all the input arrays must have same number of dimensions
从 FinalCSVFin.csv 中提取:
url;urlid;boilerplate;label;alexarank;;;;
http://www.bloomberg.com/news/2010-12-23/ibm-predicts-holographic-calls-air-breathing-batteries-by-2015.html;4042;"{""title"":""IBM Sees Holographic Calls Air Breathing Batteries ibm sees holographic calls, air-breathing batteries"",""body"":""A sign stands outside the International Business Machines Corp IBM Almaden Research Center campus in San Jose California Photographer Tony Avelar Bloomberg Buildings stand at the International Business Machines Corp IBM Almaden Research Center campus in the Santa Teresa Hills of San Jose California Photographer Tony Avelar Bloomberg By 2015 your mobile phone will project a 3 D image of anyone who calls and your laptop will be powered by kinetic energy At least that s what International Business Machines Corp sees in its crystal ball The predictions are part of an annual tradition for the Armonk New York based company which surveys its 3 000 researchers to find five ideas expected to take root in the next five years IBM the world s largest provider of computer services looks to Silicon Valley for input gleaning many ideas from its Almaden research center in San Jose California Holographic conversations projected from mobile phones lead this year s list The predictions also include air breathing batteries computer programs that can tell when and where traffic jams will take place environmental information generated by sensors in cars and phones and cities powered by the heat thrown off by computer servers These are all stretch goals and that s good said Paul Saffo managing director of foresight at the investment advisory firm Discern in San Francisco In an era when pessimism is the new black a little dose of technological optimism is not a bad thing For IBM it s not just idle speculation The company is one of the few big corporations investing in long range research projects and it counts on innovation to fuel growth Saffo said Not all of its predictions pan out though IBM was overly optimistic about the spread of speech technology for instance When the ideas do lead to products they can have broad implications for society as well as IBM s bottom line he said Research Spending They have continued to do research when all the other grand research organizations are gone said Saffo who is also a consulting associate professor at Stanford University IBM invested 5 8 billion in research and development last year 6 1 percent of revenue While that s down from about 10 percent in the early 1990s the company spends a bigger share on research than its computing rivals Hewlett Packard Co the top maker of personal computers spent 2 4 percent last year At Almaden scientists work on projects that don t always fit in with IBM s computer business The lab s research includes efforts to develop an electric car battery that runs 500 miles on one charge a filtration system for desalination and a program that shows changes in geographic data IBM rose 9 cents to 146 04 at 11 02 a m in New York Stock Exchange composite trading The stock had gained 11 percent this year before today Citizen Science The list is meant to give a window into the company s innovation engine said Josephine Cheng a vice president at IBM s Almaden lab All this demonstrates a real culture of innovation at IBM and willingness to devote itself to solving some of the world s biggest problems she said Many of the predictions are based on projects that IBM has in the works One of this year s ideas that sensors in cars wallets and personal devices will give scientists better data about the environment is an expansion of the company s citizen science initiative Earlier this year IBM teamed up with the California State Water Resources Control Board and the City of San Jose Environmental Services to help gather information about waterways Researchers from Almaden created an application that lets smartphone users snap photos of streams and creeks and report back on conditions The hope is that these casual observations will help local and state officials who don t have the resources to do the work themselves Traffic Predictors IBM also sees data helping shorten commutes in the next five years Computer programs will use algorithms and real time traffic information to predict which roads will have backups and how to avoid getting stuck Batteries may last 10 times longer in 2015 than today IBM says Rather than using the current lithium ion technology new models could rely on energy dense metals that only need to interact with the air to recharge Some electronic devices might ditch batteries altogether and use something similar to kinetic wristwatches which only need to be shaken to generate a charge The final prediction involves recycling the heat generated by computers and data centers Almost half of the power used by data centers is currently spent keeping the computers cool IBM scientists say it would be better to harness that heat to warm houses and offices In IBM s first list of predictions compiled at the end of 2006 researchers said instantaneous speech translation would become the norm That hasn t happened yet While some programs can quickly translate electronic documents and instant messages and other apps can perform limited speech translation there s nothing widely available that acts like the universal translator in Star Trek Second Life The company also predicted that online immersive environments such as Second Life would become more widespread While immersive video games are as popular as ever Second Life s growth has slowed Internet users are flocking instead to the more 2 D environments of Facebook Inc and Twitter Inc Meanwhile a 2007 prediction that mobile phones will act as a wallet ticket broker concierge bank and shopping assistant is coming true thanks to the explosion of smartphone applications Consumers can pay bills through their banking apps buy movie tickets and get instant feedback on potential purchases all with a few taps on their phones The nice thing about the list is that it provokes thought Saffo said If everything came true they wouldn t be doing their job To contact the reporter on this story Ryan Flinn in San Francisco at rflinn bloomberg net To contact the editor responsible for this story Tom Giles at tgiles5 bloomberg net by 2015, your mobile phone will project a 3-d image of anyone who calls and your laptop will be powered by kinetic energy. at least that\u2019s what international business machines corp. sees in its crystal ball."",""url"":""bloomberg news 2010 12 23 ibm predicts holographic calls air breathing batteries by 2015 html""}";0;345;;;;
http://www.popsci.com/technology/article/2012-07/electronic-futuristic-starting-gun-eliminates-advantages-races;8471;"{""title"":""The Fully Electronic Futuristic Starting Gun That Eliminates Advantages in Races the fully electronic, futuristic starting gun that eliminates advantages in races the fully electronic, futuristic starting gun that eliminates advantages in races"",""body"":""And that can be carried on a plane without the hassle too The Omega E Gun Starting Pistol Omega It s easy to take for granted just how insanely close some Olympic races are and how much the minutiae of it all can matter The perfect example is the traditional starting gun Seems easy You pull a trigger and the race starts Boom What people don t consider When a conventional gun goes off the sound travels to the ears of the closest runner a fraction of a second sooner than the others That s just enough to matter and why the latest starting pistol has traded in the mechanical boom for orchestrated electronic noise Omega has been the watch company tasked as the official timekeeper of the Olympic Games since 1932 At the 2010 Vancouver games they debuted their new starting gun which is a far cry from the iconic revolvers associated with early games it s clearly electronic but still more than a button that s pressed to get the show rolling About as far away as you can get probably while still clearly being a starting gun Pull the trigger once and off the Olympians go If it s pressed twice consecutively it signals a false start Working through a speaker system is what eliminates any kind of advantage for athletes It s not a big advantage being close to a gun but the sound of the bullet traveling one meter every three milliseconds could contribute to a win Powder pistols have been connected to a speaker system before but even then runners could react to the sound of the real pistol firing rather than wait for the speaker sounds to reach them This year s setup will have speakers placed equidistant from runners forcing the sound to reach each competitor at exactly the same time It wouldn t be an enormous difference Omega Timing board member Peter H\u00fcrzeler said in an email but when you think about reaction times being measured in tiny fractions of a second placing a speaker behind each lane has eliminated any sort of advantage for any athlete They all hear the start commands and signal at exactly the same moment There s also an ulterior reason for its look In a post September 11th world a gun on its way to a major event is going to raise more than a few TSA eyebrows even if it s a realistic looking fake Rather than deal with that the e gun can be transported while still maintaining the general look of a starting gun But there s still nothing like hearing a starting gun go off at the start of a race more than signaling the runners there s probably some Pavlovian response after more than a century of Olympic games that make people want to hear the real thing not a whiny electronic noise Everyone in the stands at home thankfully will still be getting that The sound is programmable and can be synthesized to sound like almost anything H\u00fcrzeler says but we program it to sound like a pistol it s a way to use the best possible starting technology but to keep a rich tradition alive and that can be carried on a plane without the hassle, too technology,gadgets,london 2012,london olympics,olympics,omega,starting guns,summer olympics,timing,popular science,popsci"",""url"":""popsci technology article 2012 07 electronic futuristic starting gun eliminates advantages races""}";1;5304;;;;
http://www.menshealth.com/health/flu-fighting-fruits?cm_mmc=Facebook-_-MensHealth-_-Content-Health-_-FightFluWithFruit;1164;"{""title"":""Fruits that Fight the Flu fruits that fight the flu | cold & flu | men's health"",""body"":""Apples The most popular source of antioxidants in our diet one apple has an antioxidant effect equivalent to 1 500 mg of vitamin C Apples are loaded with protective flavonoids which may prevent heart disease and cancer Next Papayas With 250 percent of the RDA of vitamin C a papaya can help kick a cold right out of your system The beta carotene and vitamins C and E in papayas reduce inflammation throughout the body lessening the effects of asthma Next Cranberries Cranberries have more antioxidants than other common fruits and veggies One serving has five times the amount in broccoli Cranberries are a natural probiotic enhancing good bacteria levels in the gut and protecting it from foodborne illnesses Next Grapefruit Loaded with vitamin C grapefruit also contains natural compounds called limonoids which can lower cholesterol The red varieties are a potent source of the cancer fighting substance lycopene Next Bananas One of the top food sources of vitamin B6 bananas help reduce fatigue depression stress and insomnia Bananas are high in magnesium which keeps bones strong and potassium which helps prevent heart disease and high blood pressure Next everything you need to know about cold and flu so you don\u2019t get sick this season, at men\u2019s health.com cold, flu, infection, sore throat, sneeze, immunity, germs, allergies, stay healthy, sick, contagious, medicines, cold medicine"",""url"":""menshealth health flu fighting fruits cm mmc Facebook Mens Health Content Health Fight Flu With Fruit""}";1;2663;;;;
和 FinalCSVTestFin.csv :
url;urlid;boilerplate;alexarank
http://www.lynnskitchenadventures.com/2009/04/homemade-enchilada-sauce.html;5865;"{""title"":""Homemade Enchilada Sauce Lynn s Kitchen Adventures "",""body"":""I usually buy my enchilada sauce Yes I knew I should be making it but I had never found a recipe that I was really happy with I had tried several and they just weren t very good So I stuck to the canned stuff you can get at the grocery store I was recently talking to a friend of mine about this She lived in Mexico for a few years so she knows some about Mexican cooking I asked her how she made her enchilada sauce She told me the basics and then gave me an exact recipe I decided to give it a try This recipe was really good This was the best enchilada sauce that I have made It had great flavor I think it was even better than the canned sauce My husband thought it could have been spicier But he likes his enchiladas spicy You can always add more chili powder or chilies if you like it really spicy The kids and I thought it was really good just like it is I did change two things It called for green onions I did not have any so I used regular onions I thought they worked great so I will probably continue to make it this way I also pureed everything in the blender I wanted a very smooth sauce If you want it more chunky just mix the ingredients together and do not blend Enchiladas are a pretty frugal meal and homemade enchilada sauce is a great way to make enchiladas even more frugal 2 8 ounce cans tomato sauce 1 4 ounce can chopped green chilies undrained 1 2 cup onion chopped 2 teaspoons chili powder 1 teaspoon ground cumin 1 4 teaspoon dried oregano 1 clove garlic minced Combine all tomato sauce ingredients and place in a blender Puree ingredients Then place in a saucepan Heat over medium heat until heated through about 5 minutes Use as desired for enchiladas Get your free Quick Easy Breakfasts ebook Just subscribe for free email updates from Lynn s Kitchen Adventures Like this article Share it this homemade enchilada sauce recipe came from a friend of mine who spent several years in mexico."",""url"":""lynnskitchenadventures 2009 04 homemade enchilada sauce html""}";169088
http://lolpics.se/18552-stun-grenade-ar;782;"{""title"":""lolpics Stun grenade ar "",""body"":"" funny pictures at lolpics.se. the best funny images on the internet funny photo images, funny videos, can has cheezburger ,roliga bilder, lol pics, lolpics"",""url"":""lolpics se 18552 stun grenade ar""}";459983
我想在 TF-IDF 列和 alexaranking 列上进行训练以生成答案的 .csv 文件,但我确信尽管我尽了最大努力,但我还是在这里犯了很多错误。如有任何帮助或建议,我们将不胜感激。
谢谢。
更新:
我正在运行以下代码:
from sklearn import metrics,preprocessing,cross_validation
from sklearn.feature_extraction.text import TfidfVectorizer
import sklearn.preprocessing
import sklearn.linear_model as lm
import pandas as p
loadData = lambda f: np.genfromtxt(open(f,'r'), delimiter=',')
print "loading data.."
traindata = list(np.array(p.read_csv('FinalCSVFin.csv', delimiter=";"))[:,2])
testdata = list(np.array(p.read_csv('FinalTestCSVFin.csv', delimiter=";"))[:,2])
y = np.array(p.read_csv('FinalCSVFin.csv', delimiter=";"))[:,-2]
AlexaTrainData = p.read_csv('FinalCSVFin.csv', delimiter=";")[["alexarank"]]
AlexaTestData = p.read_csv('FinalTestCSVFin.csv', delimiter=";")[["alexarank"]]
AllAlexaAndGoogleInfo = AlexaTestData.append(AlexaTrainData)
tfv = TfidfVectorizer(min_df=3, max_features=None, strip_accents='unicode',
analyzer='word',token_pattern=r'\w{1,}',ngram_range=(1, 2), use_idf=1,smooth_idf=1,sublinear_tf=1)
rd = lm.LogisticRegression(penalty='l2', dual=True, tol=0.0001,
C=1, fit_intercept=True, intercept_scaling=1.0,
class_weight=None, random_state=None)
X_all = traindata + testdata
lentrain = len(traindata)
print "fitting pipeline"
tfv.fit(X_all)
print "transforming data"
X_all = tfv.transform(X_all)
X = X_all[:lentrain]
AllAlexaAndGoogleInfo = AllAlexaAndGoogleInfo[:lentrain]
X_test = X_all[lentrain:]
print "X.shape => " + str(X.shape)
print "AllAlexaAndGoogleInfo.shape => " + str(AllAlexaAndGoogleInfo.shape)
print "X_all.shape => " + str(X_all.shape)
#X = np.column_stack((X, AllAlexaAndGoogleInfo))
X = np.hstack((X, AllAlexaAndGoogleInfo))
sc = preprocessing.StandardScaler().fit(X)
X = sc.transform(X)
X_test = sc.transform(X_test)
产生以下输出:
loading data..
fitting pipeline
transforming data
X.shape => (7395, 238377)
AllAlexaAndGoogleInfo.shape => (7395, 1)
X_all.shape => (10566, 238377)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-2b310887b5e4> in <module>()
31 print "X_all.shape => " + str(X_all.shape)
32 #X = np.column_stack((X, AllAlexaAndGoogleInfo))
---> 33 X = np.hstack((X, AllAlexaAndGoogleInfo))
34 sc = preprocessing.StandardScaler().fit(X)
35 X = sc.transform(X)
C:\Users\Simon\Anaconda\lib\site-packages\numpy\core\shape_base.pyc in hstack(tup)
271 # As a special case, dimension 0 of 1-dimensional arrays is "horizontal"
272 if arrs[0].ndim == 1:
--> 273 return _nx.concatenate(arrs, 0)
274 else:
275 return _nx.concatenate(arrs, 1)
ValueError: all the input arrays must have same number of dimensions
最佳答案
以下是我可以给的一些建议:
hstack
抛出的错误尽管错误消息似乎表明 X
和 AllAlexaAndGoogleInfo
具有不同的尺寸,但根据您的调试打印,它们确实具有相同的尺寸。问题是 X
是一个稀疏矩阵,尝试使用 scipy.sparse.hstack
而不是 numpy.hstack
。
以下代码显示,在增广列中,训练样本排在测试样本之后。
AlexaTrainData = p.read_csv('FinalCSVFin.csv', delimiter=";")[["alexarank"]] #reading integer columns
AlexaTestData = p.read_csv('FinalTestCSVFin.csv', delimiter=";")[["alexarank"]]
AllAlexaAndGoogleInfo = AlexaTestData.append(AlexaTrainData) #joining integer columns
根据X_all
的排列,训练样本排在测试样本之前。为了使 AllAlexaAndGoogleInfo
相同,我认为您实际上会想要这样做:
AllAlexaAndGoogleInfo = AlexaTrainData.append(AlexaTestData)
X_test
以包含新列继续:
X = X_all[:lentrain]
AllAlexaAndGoogleInfo = AllAlexaAndGoogleInfo[:lentrain]
X_test = X_all[lentrain:]
X = np.hstack((X, AllAlexaAndGoogleInfo))
sc = preprocessing.StandardScaler().fit(X)
X = sc.transform(X)
X_test = sc.transform(X_test)
在这里,您正在截断 AllAlexaAndGoogleInfo
而不更新 X_test
就像您使用 hstack
和 X
一样。因此,在预测阶段,
pred = rd.predict_proba(X_test)[:,1]
模型会提示给定的 X_test
与训练数据相比具有不同数量的特征。事实上,当你这样做时,错误会更早发生
sc.transform(X_test)
关于python - 错误地连接 Pandas 中的列,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/22139189/
我知道这个问题可能已经被问过,但我检查了所有这些,我认为我的情况有所不同(请友善)。所以我有两个数据集,第一个是测试数据集,第二个是我保存在数据框中的预测(预测值,这就是没有数据列的原因)。我想合并两
在 .loc 方法的帮助下,我根据同一数据框中另一列中的值来识别 Panda 数据框中某一列中的值。 下面给出了代码片段供您引用: var1 = output_df['Player'].loc[out
当我在 Windows 中使用 WinSCP 通过 Ubuntu 连接到 VMware 时,它提示: The server rejected SFTP connection, but it lis
我正在开发一个使用 xml web 服务的 android 应用程序。在 wi-fi 网络中连接时工作正常,但在 3G 网络中连接时失败(未找到 http 404)。 这不仅仅发生在设备中。为了进行测
我有一个XIB包含我的控件的文件,加载到 Interface Builder(Snow Leopard 上的 Xcode 4.0.2)中。 文件的所有者被设置为 someClassController
我在本地计算机上管理 MySQL 数据库,并通过运行以下程序通过 C 连接到它: #include #include #include int main(int argc, char** arg
我不知道为什么每次有人访问我网站上的页面时,都会打开一个与数据库的新连接。最终我到达了大约 300 并收到错误并且页面不再加载。我认为它应该工作的方式是,我将 maxIdle 设置为 30,这意味着
希望清理 NMEA GPS 中的 .txt 文件。我当前的代码如下。 deletes = ['$GPGGA', '$GPGSA', '$GPGSV', '$PSRF156', ] searchquer
我有一个 URL、一个用户名和一个密码。我想在 C# .Net WinForms 中建立 VPN 连接。 你能告诉我从哪里开始吗?任何第三方 API? 代码示例将受到高度赞赏... 最佳答案 您可以像
有没有更好的方法将字符串 vector 转换为字符 vector ,字符串之间的终止符为零。 因此,如果我有一个包含以下字符串的 vector "test","my","string",那么我想接收一
我正在编写一个库,它不断检查 android 设备的连接,并在设备连接、断开连接或互联网连接变慢时给出回调。 https://github.com/muddassir235/connection_ch
我的操作系统:Centos 7 + CLOUDLINUX 7.7当我尝试从服务器登录Mysql时 [root@server3 ~]# Mysql -u root -h localhost -P 330
我收到错误:Puma 发现此错误:无法打开到本地主机的 TCP 连接:9200(连接被拒绝 - 连接(2)用于“本地主机”端口 9200)(Faraday::ConnectionFailed)在我的
请给我一些解决以下错误的方法。 这是一个聊天应用....代码和错误如下:: conversations_controller.rb def create if Conversation.bet
我想将两个单元格中的数据连接到一个单元格中。我还想只组合那些具有相同 ID 的单元格。 任务 ID 名称 4355.2 参与者 4355.2 领袖 4462.1 在线 4462.1 快速 4597.1
我经常需要连接 TSQL 中的字段... 使用“+”运算符时 TSQL 强制您处理的两个问题是 Data Type Precedence和 NULL 值。 使用数据类型优先级,问题是转换错误。 1)
有没有在 iPad 或 iPhone 应用程序中使用 Facebook 连接。 这个想法是登录这个应用程序,然后能够看到我的哪些 facebook 用户也在使用该应用程序及其功能。 最佳答案 是的。
我在连接或打印字符串时遇到了一个奇怪的问题。我有一个 char * ,可以将其设置为字符串文字的几个值之一。 char *myStrLiteral = NULL; ... if(blah) myS
对于以下数据 - let $x := "Yahooooo !!!! Select one number - " let $y := 1 2 3 4 5 6 7 我想得到
我正在看 UDEMY for perl 的培训视频,但是视频不清晰,看起来有错误。 培训展示了如何使用以下示例连接 2 个字符串: #!usr/bin/perl print $str = "Hi";
我是一名优秀的程序员,十分优秀!