
python - Removing accents from input strings in a tf.dataset pipeline

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:00:39

I am trying to build a tf.dataset pipeline (TF2) that reads a text file and applies some preprocessing to each line.

The file mytext.txt contains the following:

Para este projeto fizemos questão de ter uma equipe formada por mulheres, desde o catering, passando pela maquiagem até a produção, iluminação e direção. Abaixo reunimos algumas histórias dos bastidores:

My Python code:

import tensorflow as tf
import unicodedata

# Strip accents from input string.
def unicode_to_ascii(s):
    return tf.strings.strip(''.join(c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn'))

# Text files
files = tf.data.Dataset.list_files('/data/tmp/mytext.txt', shuffle=True, seed=None)

# Pipeline
dataset = tf.data.TextLineDataset(files, compression_type=None, buffer_size=None, num_parallel_reads=None)
dataset = dataset.map(unicode_to_ascii)

for d in dataset:
    print(d.numpy().decode('utf8'))

But I get the following error:

    /data/dev/python/dlbox/examples/preprocess_text copy.py:6 unicode_to_ascii  *
return tf.strings.strip(''.join(c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn'))
/home/kleysonr/.virtualenvs/tf2/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py:396 converted_call
return py_builtins.overload_of(f)(*args)

TypeError: normalize() argument 2 must be str, not Tensor

I can't find a way to convert the Tensor s into a Python string.

How can I make this work?

Edit 1

I tried using tf.py_function instead:

# Pipeline
dataset = tf.data.TextLineDataset(files, compression_type=None, buffer_size=None, num_parallel_reads=None)
# dataset = dataset.map(unicode_to_ascii)
dataset = dataset.map(lambda x: tf.py_function(unicode_to_ascii, x, tf.string))

But that also fails with an error:

    /data/dev/python/dlbox/examples/preprocess_text copy.py:14 None  *
dataset = dataset.map(lambda x: tf.py_function(unicode_to_ascii, x, tf.string))
/home/kleysonr/.virtualenvs/tf2/lib/python3.6/site-packages/tensorflow_core/python/ops/script_ops.py:407 eager_py_func
return _internal_py_func(func=func, inp=inp, Tout=Tout, eager=True, name=name)
/home/kleysonr/.virtualenvs/tf2/lib/python3.6/site-packages/tensorflow_core/python/ops/script_ops.py:296 _internal_py_func
input=inp, token=token, Tout=Tout, name=name)
/home/kleysonr/.virtualenvs/tf2/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_script_ops.py:74 eager_py_func
"EagerPyFunc", input=input, token=token, Tout=Tout, name=name)
/home/kleysonr/.virtualenvs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py:442 _apply_op_helper
(input_name, op_type_name, values))

TypeError: Expected list for 'input' argument to 'EagerPyFunc' Op, not Tensor("args_0:0", shape=(), dtype=string).

Best answer

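I managed to get it working with tf.py_function. Two changes were needed: inside the function, convert the tensor to a Python string with s.numpy().decode('utf8') before calling unicodedata.normalize, and pass the input to tf.py_function as a list ([line]) rather than as a bare tensor.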

# Strip accents from input string.
def unicode_to_ascii(s):
    return tf.strings.strip(''.join(c for c in unicodedata.normalize('NFD', s.numpy().decode('utf8')) if unicodedata.category(c) != 'Mn'))

# Text files
files = tf.data.Dataset.list_files('/data/tmp/mytext.txt', shuffle=True, seed=None)

# Pipeline
dataset = tf.data.TextLineDataset(files, compression_type=None, buffer_size=None, num_parallel_reads=None)
dataset = dataset.map(lambda line: tf.py_function(unicode_to_ascii, [line], tf.string))

for d in dataset:
    print(d.numpy().decode('utf8'))
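
To check the accent-stripping logic on its own, outside of TensorFlow, it can help to pull it into a plain Python helper. The sketch below is only an illustration: the name strip_accents and the choice to accept UTF-8 bytes (which is what s.numpy() yields for a string tensor) are assumptions, not part of the original answer.

```python
import unicodedata


def strip_accents(raw: bytes) -> str:
    """Return `raw` decoded as UTF-8 with combining accent marks removed.

    Hypothetical helper: takes bytes because a tf.string tensor's
    .numpy() value is a bytes object.
    """
    text = raw.decode("utf-8")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (Mark, nonspacing) covers the combining accents; drop them.
    kept = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    # Trim surrounding whitespace, mirroring tf.strings.strip in the answer.
    return kept.strip()


sample = "até a produção, iluminação e direção".encode("utf-8")
print(strip_accents(sample))  # ate a producao, iluminacao e direcao
```

Inside the pipeline, such a helper would be called on s.numpy() from the function passed to tf.py_function. Note also that tf.py_function returns tensors without static shape information, so if a later step (batching, for example) needs a known shape, calling set_shape([]) on the result inside the mapped function may be necessary.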

Regarding python - removing accents from input strings in a tf.dataset pipeline, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59254640/
