Python requests messes up option tags

Reposted. Author: 行者123. Updated: 2023-12-01 04:50:43

So I started my program:

import requests
from bs4 import BeautifulSoup

# Below will start scraping the site
url = 'http://www.chm.bris.ac.uk/motm/motm.htm'
response = requests.get(url)
soup = BeautifulSoup(response.content)  # no parser named, so bs4 picks one itself
name = soup.find_all('option')

But the output is not what I expected. For some reason, all the closing option tags end up at the very end:

<select name="site" onchange="javascript:formHandler()" size="1">
<option value="">Select your molecule

<option value="">--------------------

<option value="#sept2000">ABT-594

<option value="#may2007">Acetyl Coenzyme-A

<option value="#sept2014">Aconitine

<option value="#jan1998">Adenosine Triphosphate (ATP)

<option value="#may1999">Adrenaline

<option value="#dec2000">2,4,5-T (Agent Orange)

<option value="#aug1997">[Ag{(NC)Mn(CO)2-P(OPh3)(dppm)}2]+

<option value="#mar1997">Aluminiuim Fluoride

<option value="#july2002">Alliin

<option value="#june2013">Ammonia

<option value="#sept2009">Ananadamide

<option value="#sept1999">Anatoxin

<option value="#jan2003">Arsenic Pentachloride

<option value="#jan2005">Arsine<option value="#feb2001">Aspartame

<option value="#feb1996">Aspirin

<option value="#may2013">Artemisinin

<option value="#july2004">Atenolol

<option value="#feb2002">Atropine

<option value="#jan2006">Batrachotoxin

<option value="#aug2011">Benzene

<option value="#apr2002">Beta-Carotene

<option value="#aug2013">Bisphenol A

<option value="#may2009">Bombykol

<option value="#feb2000">Boswellic Acid

<option value="#feb2012">Botulinum toxin

<option value="#may2001">Brassinolide

<option value="#may2005">British Anti-Lewisite

<option value="#dec1997">4-Bromo-4-Methoxy-acetophenone Azine

<option value="#july1999">Bupropion

<option value="#oct2004">Butane

<option value="#jan1997">C60 (Buckminster fullerene)

<option value="#july2003">Caeruloplasmin

<option value="#apr2004">Cantharidin

<option value="#may2001">Capsaicin

<option value="#aug2012">Captopril

<option value="#may2012">Carbon Dioxide

<option value="#nov2005">Carbon Monoxide

<option value="#sept2003">Carnitine

<option value="#june2001">Chlorine trifluoride

<option value="#jun2014">Chloroauric acid

<option value="#oct2006">Chloroform

<option value="#may2000">Chlorophyll

<option value="#mar2014">Cholesterol

<option value="#dec2010">Cineole

<option value="#aug2006">Cinnamaldehyde

<option value="#mar2000">cis-gamma-Irone

<option value="#aug2000">Cisplatin

<option value="#nov2009">Citalopram

<option value="#feb2004">Combretastatin A-4

<option value="#dec2002">Coniine

<option value="#jan1999">Cubane

<option value="#mar2006">Cucurbituril

<option value="#july2009">Cyanoacrylate (Superglue)

<option value="#may1997">Cyclooctene

<option value="#dec1996">Decahelicene

<option value="#jan2011">DEET

<option value="#nov2002">Dettol

<option value="#may2010">Diacetyl

<option value="#july1996">Diamond

<option value="#june2005">Dichlorodifluoromethane (Freon)

<option value="#sept1996">Digitalis

<option value="#dec2013">Dimethyldisulfide

<option value="#oct2003">Dimethyl Mercury

<option value="#oct2005">Dimethylsulfide

<option value="#jan2000">DNA

<option value="#sept2005">Dioxin

<option value="#jan2012">DMSO (Dimethyl sulfoxide)

<option value="#nov2000">DNPO (Bis(2,4-dinitrophenyl) oxalate)

<option value="#oct2008">Dopamine

<option value="#mar2004">EDTA

<option value="#june2011">Endosulfan

<option value="#sept2000">Epibatidine

<option value="#oct2002">Epothilone

<option value="#dec2006">Ethene

<option value="#mar2003">Ethyl Acetate

<option value="#apr2011">Eribulin

<option value="#may2002">Etorphine

<option value="#dec2010">Eucalyptol

<option value="#oct1998">Ferritin

<option value="#may1996">Ferrocene

<option value="#sept2012">Filbertone

<option value="#aug1998">Finasteride

<option value="#feb2014">Fluorine

<option value="#june2004">Flunitrazepam

<option value="#jan2013">Fluoroform

<option value="#aug2003">Fluoxetine

<option value="#aug2008">Folic Acid

<option value="#july1998">Formaldehyde

<option value="#dec2005">Formic Acid (Methanoic Acid)

<option value="#feb2000">Frankincense

<option value="#mar2001">Frontalin

<option value="#feb2005">Galactosylceramide

<option value="#nov2012">Galanthamine

<option value="#aug2009">Geosmin

<option value="#apr2007">Glucose

<option value="#apr2010">Glycine

<option value="#jan2010">Green Fluorescent Protein

<option value="#sept2013">HFC134a

<option value="#apr2011">Halaven

<option value="#feb2010">Heavy Water

<option value="#aug1996">Helvetane and Israelane

<option value="#feb2006">Hemoglobin

<option value="#oct2010">Heptan-2-one

<option value="#jan2008">Herceptin

<option value="#mar2005">Hexenal

<option value="#sept1997">Hexol

<option value="#june2008">Histamine

<option value="#june2000">Histrionicotoxin

<option value="#jan2014">Hydrazine

<option value="#nov2011">Hydrogen cyanide

<option value="#sept2006">Hydrogen peroxide

<option value="#mar2009">Hydrogen sulphide

<option value="#sept2002">Ibogaine

<option value="#nov2001">Ibuprofen

<option value="#feb2009">Indigotin

<option value="#july2010">Insulin

<option value="#july2008">Isoprene

<option value="#apr2003">Ketamine

<option value="#nov2010">Kevlar

<option value="#sept2010">Kisspeptin

<option value="#apr2012">Lauric acid

<option value="#mar2008">Limonene

<option value="#oct2013">Linalool

<option value="#aug2006">Linezolid

<option value="#may2006">Linoleic Acid

<option value="#dec1998">LSD

<option value="#june2007">Lutein

<option value="#mar2013">Lithium aluminium hydride (lithal)

<option value="#june2006">The Manganese-calcium oxide cluster of Photosystem II

<option value="#dec2004">Maleimide-Polyethylene Glycol (MPEG4)

<option value="#jan1996">Mauveine dye

<option value="#apr1998">MCM-41

<option value="#oct2012">Medroxyprogesterone acetate

<option value="#apr2000">Melatonin

<option value="#aug2007">Menthol

<option value="#mar2007">Methamphetamine

<option value="#dec2007">Methane

<option value="#sept2001">Methyl Jasmonate

<option value="#nov2008">2-Methylundecanal

<option value="#oct1999">Mescaline

<option value="#mar2002">Mifepristone (RU-486)

<option value="#july2007">Monosodium Glutamate

<option value="#nov2004">Morphine

<option value="#june1998">Mustard Gas

<option value="#mar2011">Muscone

<option value="#aug2014">Myristicin

<option value="#oct1997">N2S2

<option value="#may2003">N3 Amide Dyes

<option value="#oct2000">Nandrolone

<option value="#aug2001">Nicotine

<option value="#nov2007">Nitric acid

<option value="#dec2012">Nitrogen Dioxide

<option value="#dec2001">Nitrogen Triiodide

<option value="#oct2007">Nitroglycerine

<option value="#june1999">Nitrous oxide

<option value="#june2010">Nylon

<option value="#may2004">Osmium Tetroxide

<option value="#may2011">Octanal

<option value="#dec2009">Octenol

<option value="#jan2009">Oxytocin

<option value="#mar1998">Ozone

<option value="#nov2006">Pentacene

<option value="#mar1996">Phthalocyanine

<option value="#apr2013">Phenylbutazone

<option value="#mar2012">Phenylethylamine

<option value="#june2003">Pnictogen

<option value="#nov1998">Polythiophene

<option value="#jun2009">Polytetrafluoroethylene (PTFE)

<option value="#may1998">Proline

<option value="#sept2007">Propanethial S-oxide

<option value="#jan2007">Prostanoic Acid (and Prostaglandins)

<option value="#aug2003">Prozac

<option value="#oct1999">Psilocybin

<option value="#feb1999">Ptaquiloside

<option value="#july2005">Quinine

<option value="#july2012">Raspberry ketone

<option value="#jan2002">Relenza

<option value="#apr2009">Retinal

<option value="#june2004">Rohypnol

<option value="#jan2004">Rotenone

<option value="#nov2003">S-Adenosyl Methionine

<option value="#aug1999">Salbutamol

<option value="#july2014">Salvinorin

<option value="#sept1998">Saxitoxin

<option value="#apr2005">Serotonin

<option value="#nov1996">Sialyl Lewis X

<option value="#nov2013">Silica (Silicon dioxide)

<option value="#apr2006">Skatole

<option value="#oct2011">Sodium hypochlorite

<option value="#mar2010">Sodium lauryl sulfate

<option value="#feb2007">Sodium Thiopental (sodium pentothal)

<option value="#feb2003">Spidroin

<option value="#nov1997">Sscorpionine

<option value="#apr1999">Staurosporine

<option value="#apr2014">Streptomycin

<option value="#oct2009">Strychnine

<option value="#may2014">Sucrose

<option value="#july2011">Sulfanilamide

<option value="#may2008">Sulfuric acid

<option value="#dec2003">Sulphur Dioxide

<option value="#apr2008">Sulphur hexafluoride

<option value="#july2006">Tamiflu

<option value="#dec1999">Tamoxifen

<option value="#dec2008">Taurine

<option value="#feb1997">Taxol

<option value="#jun2009">Teflon

<option value="#oct2001">Tetracycline

<option value="#jan2001">Tetraethyl Lead

<option value="#sept2013">1,1,1,2-Tetrafluoroethane

<option value="#nov1999">Tetrodotoxin

<option value="#jan2015">Tetranitratoxycarbon

<option value="#july2000">Thalidomide

<option value="#apr1996">THC

<option value="#aug2010">THG (tetrahydrogestrinone)

<option value="#feb2015">Thiomersal

<option value="#sept2014">Tramadol

<option value="#dec2011">2,4,6-Tribromophenol

<option value="#aug2004">Trimethylamine

<option value="#dec2014">Trinitrotoluene (TNT)

<option value="#june1997">Triphenylmethyl

<option value="#june2012">Tropane

<option value="#june2002">Tryptophan

<option value="#mar2012">Tyramine

<option value="#aug2002">Uranium Hexafluoride

<option value="#june1996">Urea

<option value="#sept2008">Uric Acid

<option value="#july1997">Vancomycin

<option value="#feb2008">Vanillin

<option value="#feb2013">Vaska's Compound

<option value="#may1997">Vitamin B12

<option value="#july2001">VX gas

<option value="#feb2011">Warfarin

<option value="#oct1996">Water

<option value="#july2013">Wilkinson's catalyst

<option value="#june2007">Zeaxanthin

<option value="#july1999">Zyban

</option></option></option> [... one closing </option> for each <option> above ...] </option></option></select>

Can you help me fix this?

Best answer

It works fine for me. Try changing the parser to lxml:

soup = BeautifulSoup(response.content,"lxml")

Using lxml:

print(name[1])
print(name[2])

<option value="">--------------------
</option>
<option value="#sept2000">ABT-594
</option>

Using html.parser:

soup = BeautifulSoup(response.content,"html.parser")
name = soup.find_all('option')
print(name[1])

.............................
<option value="#aug2002">Uranium Hexafluoride
<option value="#june1996">Urea
<option value="#sept2008">Uric Acid
<option value="#july1997">Vancomycin
<option value="#feb2008">Vanillin
<option value="#feb2013">Vaska's Compound
<option value="#may1997">Vitamin B12
<option value="#july2001">VX gas
<option value="#feb2011">Warfarin
<option value="#oct1996">Water
<option value="#july2013">Wilkinson's catalyst
<option value="#june2007">Zeaxanthin
<option value="#july1999">Zyban
</option></option></option> [... one closing </option> for each nested <option> ...] </option></option>
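A likely explanation for the nesting, assuming the page simply omits its closing option tags as the output suggests: Python's built-in html.parser only reports the tags that are literally present in the markup and does not infer an optional </option>. The sketch below logs the raw events for a small made-up snippet (SAMPLE and EventLogger are illustrative names, not part of the original post):

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record the raw start/end tag events that html.parser emits."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


# Hypothetical miniature of the page: options without closing tags.
SAMPLE = '<select><option value="#a">A\n<option value="#b">B\n</select>'

logger = EventLogger()
logger.feed(SAMPLE)
logger.close()
print(logger.events)
```

No ("end", "option") event is ever reported, so a tree built on top of these events keeps each option open until the enclosing select closes.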

You may need to run pip3 install lxml.

See the "Installing a parser" section of the Beautiful Soup documentation.

So using lxml will parse the HTML correctly.
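If installing lxml is not always possible, one option is to read the options in a way that does not depend on how the parser nests them. The following is only a sketch under that assumption: make_soup, own_text, list_options and the SAMPLE markup are hypothetical names and data, not part of the original answer. It prefers lxml, falls back to html.parser, and takes only each option's direct text children.

```python
from bs4 import BeautifulSoup, FeatureNotFound, NavigableString

# Hypothetical miniature of the page: options without closing tags.
SAMPLE = (
    '<select name="site">'
    '<option value="">Select your molecule\n'
    '<option value="#sept2000">ABT-594\n'
    '<option value="#may2007">Acetyl Coenzyme-A\n'
    '</select>'
)


def make_soup(markup):
    """Prefer lxml; fall back to html.parser when lxml is not installed."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        return BeautifulSoup(markup, "html.parser")


def own_text(tag):
    """Join only the tag's direct text children, ignoring nested tags."""
    return "".join(
        child for child in tag.children if isinstance(child, NavigableString)
    ).strip()


def list_options(markup):
    """Return (value, label) pairs for the options that carry a value."""
    soup = make_soup(markup)
    return [
        (opt["value"], own_text(opt))
        for opt in soup.find_all("option")
        if opt.get("value")
    ]


print(list_options(SAMPLE))
```

Because own_text ignores nested tags, the result should be the same whether the options end up as siblings (lxml) or nested inside each other (html.parser).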

Note that I am also using bs4 with Python 3.4:

In [9]: import bs4

In [10]: bs4.__version__
Out[10]: '4.3.2'

In [11]: from lxml import etree

In [12]: etree.LXML_VERSION
Out[12]: (3, 3, 3, 0)

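If you are using bs4 >= 4.2.0, you can use diagnose, which will:

Print a report showing how different parsers handle the document, and tell you if you are missing a parser that Beautiful Soup could be using: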

from bs4.diagnose import diagnose
data = response.text
diagnose(data)
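For a quicker check than the full report, the standard library can tell you which optional parsers are importable in the current environment. This is a minimal sketch; available_parsers and OPTIONAL_PARSERS are hypothetical names, and it only tests whether the modules can be found, not how well they handle a given document.

```python
from importlib.util import find_spec

# Parser name passed to BeautifulSoup -> module that must be importable.
OPTIONAL_PARSERS = {"lxml": "lxml", "html5lib": "html5lib"}


def available_parsers():
    """Return the parser names that can be passed to BeautifulSoup here."""
    names = ["html.parser"]  # ships with Python, so it is always present
    for parser_name, module_name in OPTIONAL_PARSERS.items():
        if find_spec(module_name) is not None:
            names.append(parser_name)
    return names


print(available_parsers())
```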

Regarding "Python requests messes up option tags", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28553241/
