I need to extract a table from a PDF file.
The code is:
pdf=tabula.read_pdf(arquivo, pages=(1,2), lattice=True)
I convert both DataFrames to lists, as in the code below:
lista=pdf[1].values.tolist()
lista2=pdf[2].values.tolist()
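For reference, `DataFrame.values.tolist()` returns only the data rows: the column labels (and the index) are kept separately and are not part of the resulting list. A minimal illustration with made-up data (the column names `item`, `unit`, `qty` are hypothetical, not taken from the PDF):

```python
import pandas as pd

# Made-up example: column labels and the index are metadata, not data rows.
df = pd.DataFrame(
    [[8, "CP", 360], [9, "CP", 540]],
    columns=["item", "unit", "qty"],
)

rows = df.values.tolist()     # data rows only: two lists of three values each
labels = df.columns.tolist()  # ['item', 'unit', 'qty'], kept separately
print(rows)
print(labels)
```

So anything pandas treats as a header will never appear in the list produced by `values.tolist()`.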
My problem is that the conversion is losing the first row of the DataFrame.
The result of converting lista2 is:
"[[**8**,
'vitamínicos e/ou minerais /\rVitaminas: C (45mg), E (10mg),\rNiacina (16mg), A (600mcg), ac.\rpantotênico (5mg), D (5mcg), B6\r(1,3mg), B1 (1,2mg), B2 (1,3 mg),\rB12 (1mcg), ác. fólico (200mcg),\rbiotina (30mcg): Minerais: cálcio\r(90mg), fósforo (38mg),\rmanganês (45mg), ferro (5mg),\rzinco (5mg), selênio (30 mcg),\rmanganês (1,2mg), selênio\r(30mcg), iodo (100mcg):\rProbiótico: Lactobacillus\racidophilus / COMPRIMIDO /\rSEM MARCA',
4705050,
'CP',
360,
nan],
[**9**,
'vitaminas + minerais /\rpolivitaminas + poliminerais /\rCOMPRIMIDO REVESTIDO\r/ ZIRVIT MULTI - POR MARCA',
3970019,
'CP',
540,
nan],
[**10**,
'suplemento alimentar / óleo de\rmicroalgas e lecitina de soja /\rCÁPSULA / SEM MARCA',
5717310,
'CP',
360,
nan]]"
When I print the original pandas DataFrame pdf[2] (before calling values.tolist()), I get:
**8** vitamínicos e/ou minerais /\rVitaminas: C (45m... 4705050 CP 360 NaN
**9** vitaminas + minerais /\rpolivitaminas + polimi... 3970019 CP 540 NaN
**10** suplemento alimentar / óleo de\rmicroalgas e l... 5717310 CP 360 NaN
I have 4 products in the pandas DataFrame (7, 8, 9, 10), and when I convert it to a list I lose the first one, product ID 7.
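One likely cause, an assumption not confirmed in the question, is that the first table row (product 7) was read as the column header: the printed pdf[2] starts at product 8, and `values.tolist()` never includes column labels. tabula-py's `read_pdf` accepts a `pandas_options` dict that it passes on to pandas, so a call such as `tabula.read_pdf(arquivo, pages=(1, 2), lattice=True, pandas_options={'header': None})` should keep product 7 as a data row; please verify this against your tabula-py version. The runnable sketch below shows the same `header=None` behavior using plain pandas with a made-up CSV string standing in for the PDF table, plus a fallback for a DataFrame that was already read with the header.

```python
import io

import pandas as pd

# Made-up stand-in for the PDF table: four products and no real header row.
csv_text = """7,product A,1111111,CP,360,
8,product B,4705050,CP,360,
9,product C,3970019,CP,540,
10,product D,5717310,CP,360,
"""

# Default (header=0): the first row becomes the column labels,
# so product 7 disappears from values.tolist().
df_header = pd.read_csv(io.StringIO(csv_text))

# Fix 1: header=None keeps every row as data; columns are numbered 0..5.
df_fixed = pd.read_csv(io.StringIO(csv_text), header=None)
lista_fixed = df_fixed.values.tolist()
print([int(row[0]) for row in lista_fixed])  # [7, 8, 9, 10]

# Fix 2 (fallback): re-attach the header row to the data rows.
# The labels come back as strings and empty cells may be renamed
# (e.g. "Unnamed: 5"), so they may need cleaning before use.
lista_recovered = [df_header.columns.tolist()] + df_header.values.tolist()
print(len(lista_recovered))  # 4
```

Fix 1 is usually preferable, because it avoids the type and naming changes that happen when a data row is turned into column labels.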
Any idea how to solve this?
Thank you.