How to import referenced files (XML) in an AWS Glue script


Question


I am trying to run a Glue job in FAIR scheduling mode. For this, I created an XML file named fairschedular.xml.

Then I uploaded fairschedular.xml to an S3 bucket and added that location to the referenced files path of the Glue job. The file contains:

<?xml version="1.0"?>
<allocations>
  <pool name="1">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="2">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
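Before uploading an allocation file like this, it can help to check that it parses and defines the pools you expect. The following is a minimal sketch using only the Python standard library. The helper name `list_pools` is hypothetical, and the fallback values (FIFO, weight 1, minShare 0) are the defaults the Spark job scheduling documentation describes for pools that omit a setting.

```python
import xml.etree.ElementTree as ET

# The allocation file from the question, inlined so the sketch is self-contained.
ALLOCATION_XML = """<?xml version="1.0"?>
<allocations>
  <pool name="1">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="2">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
"""


def list_pools(xml_text):
    """Parse a fair scheduler allocation file and return one dict per pool."""
    root = ET.fromstring(xml_text)
    if root.tag != "allocations":
        raise ValueError("root element must be <allocations>, got <%s>" % root.tag)
    pools = []
    for pool in root.findall("pool"):
        pools.append({
            "name": pool.get("name"),
            # Fall back to Spark's documented defaults when an element is missing.
            "schedulingMode": pool.findtext("schedulingMode", default="FIFO"),
            "weight": int(pool.findtext("weight", default="1")),
            "minShare": int(pool.findtext("minShare", default="0")),
        })
    return pools


if __name__ == "__main__":
    for pool in list_pools(ALLOCATION_XML):
        print(pool)
```

This only validates the file locally. It does not tell you whether Spark on Glue found the file at runtime.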


Then I used it in the script as follows:

from awsglue.context import GlueContext
from pyspark import SparkConf, SparkContext


class JobBase(object):

    fair_scheduler_config_file = "fairscheduler.xml"
    rowAsDict = {}
    Oracle_Username = None
    Oracle_Password = None
    Oracle_jdbc_url = None

    def __start_spark_glue_context(self):
        # Enable FAIR scheduling and point Spark at the pool definitions file
        conf = (
            SparkConf()
            .setAppName("python_thread")
            .set("spark.scheduler.mode", "FAIR")
            .set("spark.scheduler.allocation.file", self.fair_scheduler_config_file)
        )
        self.sc = SparkContext(conf=conf)
        self.glueContext = GlueContext(self.sc)
        self.spark = self.glueContext.spark_session

But when the code is running, I don't see the fair scheduler pools in the Spark UI history server. I do see FAIR scheduling mode.


Answer 1

Score: 0


The issue is resolved. I can see in the AWS logs that the pools are being created.
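The answer does not say what changed, so the following is only a hedged sketch of two checks that may help in a similar situation, not the poster's actual fix. It assumes the referenced file is copied into the script's working directory on the driver, which is how the AWS Glue documentation describes the `--extra-files` job parameter. The first helper fails fast if the file name given to `spark.scheduler.allocation.file` is not present there. The second assigns the calling thread's jobs to a named pool through the `spark.scheduler.pool` local property, which is the mechanism the Spark job scheduling documentation describes. The helper names `resolve_allocation_file` and `run_in_pool` are hypothetical.

```python
import os


def resolve_allocation_file(file_name, search_dir=None):
    """Return the absolute path of the allocation file, or raise if it is missing.

    search_dir defaults to the current working directory (an assumption about
    where Glue places referenced files).
    """
    base = search_dir if search_dir is not None else os.getcwd()
    path = os.path.abspath(os.path.join(base, file_name))
    if not os.path.isfile(path):
        raise FileNotFoundError("allocation file not found: %s" % path)
    return path


def run_in_pool(sc, pool_name, work):
    """Run work() with jobs submitted from this thread assigned to pool_name.

    sc is a SparkContext; work is any zero-argument callable that triggers
    Spark actions.
    """
    sc.setLocalProperty("spark.scheduler.pool", pool_name)
    try:
        return work()
    finally:
        # Clear the property so later jobs from this thread use the default pool.
        sc.setLocalProperty("spark.scheduler.pool", None)


# Example usage inside a Glue script (not executed here):
#   path = resolve_allocation_file("fairscheduler.xml")
#   conf = SparkConf().set("spark.scheduler.mode", "FAIR") \
#                     .set("spark.scheduler.allocation.file", path)
#   ...
#   run_in_pool(sc, "1", lambda: df.count())
```

Passing an absolute path makes a file name mismatch surface as an explicit error at startup. Without this check, the scheduler may report FAIR mode even though the pools you expected were never defined.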

huangapple
  • Posted on February 24, 2023 at 00:53:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75547908.html