
python - Collecting the logs of a function in python using a shell script


My pyspark script runs fine. It fetches data from MySQL and creates Hive tables in HDFS.

The pyspark script is as follows.

#!/usr/bin/env python
import sys
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# Require the exact number of arguments on the spark-submit command line
if len(sys.argv) != 8:
    print "Invalid number of args......"
    print "Usage: spark-submit import.py Arguments"
    exit()

table = sys.argv[1]
hivedb = sys.argv[2]
domain = sys.argv[3]
port = sys.argv[4]
mysqldb = sys.argv[5]
username = sys.argv[6]
password = sys.argv[7]

# Read the MySQL table over JDBC
df = sqlContext.read.format("jdbc") \
    .option("url", "{}:{}/{}".format(domain, port, mysqldb)) \
    .option("driver", "com.mysql.jdbc.Driver") \
    .option("dbtable", table) \
    .option("user", username) \
    .option("password", password) \
    .load()

# Register the dataframe as a temporary table
df.registerTempTable("mytempTable")

# Create the Hive table from the temp table
sqlContext.sql("create table {}.{} as select * from mytempTable".format(hivedb, table))

sc.stop()

This pyspark script is called from a shell script. For that shell script, I pass the table names in a file that is given as an argument.

The shell script is as follows.

#!/bin/bash

source /home/$USER/spark/source.sh
[ $# -ne 1 ] && { echo "Usage : $0 table ";exit 1; }

args_file=$1

TIMESTAMP=`date "+%Y-%m-%d"`
touch /home/$USER/logs/${TIMESTAMP}.success_log
touch /home/$USER/logs/${TIMESTAMP}.fail_log
success_logs=/home/$USER/logs/${TIMESTAMP}.success_log
failed_logs=/home/$USER/logs/${TIMESTAMP}.fail_log

# Function to log the status of each job
function log_status
{
    status=$1
    message=$2
    if [ "$status" -ne 0 ]; then
        echo "`date +\"%Y-%m-%d %H:%M:%S\"` [ERROR] $message [Status] $status : failed" | tee -a "${failed_logs}"
        #echo "Please find the attached log file for more details"
        exit 1
    else
        echo "`date +\"%Y-%m-%d %H:%M:%S\"` [INFO] $message [Status] $status : success" | tee -a "${success_logs}"
    fi
}

while read -r table; do
    spark-submit --name "${table}" --master "yarn-client" --num-executors 2 --executor-memory 6g \
        --executor-cores 1 --conf "spark.yarn.executor.memoryOverhead=609" \
        /home/$USER/spark/sql_spark.py ${table} ${hivedb} ${domain} ${port} ${mysqldb} ${username} ${password} \
        > /tmp/logging/${table}.log 2>&1
    g_STATUS=$?
    log_status $g_STATUS "Spark job ${table} Execution"
done < "${args_file}"

echo "************************************************************************************************************************************************************************"

With the above shell script I am able to collect a separate log for each individual table listed in args_file.
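For reference, the args_file is simply one table name per line. Assuming the shell script above is saved as run_import.sh (a hypothetical name) and the connection variables (hivedb, domain, port, mysqldb, username, password) come from source.sh, it is invoked with that file as its single argument:

$ cat tables.txt
table1
table2
table3
$ bash run_import.sh tables.txt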

There are now more than 200 tables in MySQL, so I modified the pyspark script as shown below: I created a function that reads the table names from args_file and runs the import code for each one.

The new spark script:

#!/usr/bin/env python
import sys
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# Require the exact number of arguments on the spark-submit command line
if len(sys.argv) != 8:
    print "Invalid number of args......"
    print "Usage: spark-submit import.py Arguments"
    exit()

args_file = sys.argv[1]
hivedb = sys.argv[2]
domain = sys.argv[3]
port = sys.argv[4]
mysqldb = sys.argv[5]
username = sys.argv[6]
password = sys.argv[7]

def testing(table, hivedb, domain, port, mysqldb, username, password):
    # Banner line that marks where each table's output starts in the log
    print "*********************************************************table = {} ***************************".format(table)

    df = sqlContext.read.format("jdbc") \
        .option("url", "{}:{}/{}".format(domain, port, mysqldb)) \
        .option("driver", "com.mysql.jdbc.Driver") \
        .option("dbtable", table) \
        .option("user", username) \
        .option("password", password) \
        .load()

    # Register the dataframe as a temporary table
    df.registerTempTable("mytempTable")

    # Create the Hive table from the temp table
    sqlContext.sql("create table {}.{} stored as parquet as select * from mytempTable".format(hivedb, table))

# Read the list of table names and import each one
input = sc.textFile('/user/XXXXXXX/spark_args/%s' % args_file).collect()

for table in input:
    testing(table, hivedb, domain, port, mysqldb, username, password)

sc.stop()

Now I want to collect a separate log for each individual table in args_file, but I am getting only one log file that contains the logs for all of the tables.

How can I achieve this? Or is my approach completely wrong?

New shell script:

spark-submit --name "${args_file}" --master "yarn-client" --num-executors 2 --executor-memory 6g \
    --executor-cores 1 --conf "spark.yarn.executor.memoryOverhead=609" \
    /home/$USER/spark/sql_spark.py ${table} ${hivedb} ${domain} ${port} ${mysqldb} ${username} ${password} \
    > /tmp/logging/${args_file}.log 2>&1

Best Answer

What you can do is write a python script that takes the single log file and cuts it into separate log files at each line that prints the table name.

For example:

*************************************table=table1***************

and the next log file starts at

*************************************table=table2****************

and so on. You can also use the table name as the file name for each piece.
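The splitting the answer suggests could be sketched roughly as follows. This is only an illustration, not code from the answer: the script name, the output directory argument, and the exact regular expression for the banner line are assumptions, keyed to the "table = {}" marker that the testing() function prints.

#!/usr/bin/env python
# Rough sketch: split the combined spark-submit log into one file per table by
# cutting at the banner lines printed by testing(). The script name, paths and
# the regex are assumptions for illustration.
import os
import re
import sys

def split_logs(combined_log, out_dir):
    # Banner lines look like "****...table = mytable ***..." (or "****table=table1***"),
    # so capture the table name that follows "table =".
    marker = re.compile(r'\*+\s*table\s*=\s*(\w+)')
    current = None  # file handle for the table section currently being written
    with open(combined_log) as src:
        for line in src:
            m = marker.search(line)
            if m:
                # A new table section starts: close the previous file and open
                # a new one named after the table, as the answer suggests.
                if current:
                    current.close()
                current = open(os.path.join(out_dir, m.group(1) + '.log'), 'w')
            if current:
                current.write(line)
    if current:
        current.close()

if __name__ == '__main__':
    # Hypothetical usage: python split_logs.py /tmp/logging/args_file.log /tmp/logging
    split_logs(sys.argv[1], sys.argv[2])

Lines that appear before the first banner are skipped; everything between two banners goes into the file named after the preceding banner's table.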

Regarding "python - Collecting the logs of a function in python using a shell script", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/45684000/
