Re: [ptp-user] Problem with LSF monitoring

Dear Greg,

this sounds like LML_DA is not able to match the nodelist parsed for
the jobs against the node names. To check this, compare the data
gathered for jobs with the data gathered for nodes by following the
instructions under the question "How do I debug the server part of
PTP's system monitoring capability?" on the page
http://wiki.eclipse.org/PTP/System_Monitoring_FAQ.

Have a look at the generated files nodes_LML.xml and jobs_LML.xml. In
the jobs_LML.xml file you should find nodelist attributes for the
running jobs such as

<data key="nodelist"       value="R0301-M1"/>

The nodelist contains a comma-separated list of real node names. They
should match the name attributes of the node objects in
nodes_LML.xml, e.g.

<object id="nd000037" name="R0301-M1" type="node"/>
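
The comparison described above can also be scripted. The following is
only a minimal sketch in Python, not part of LML_DA: it assumes that
the nodelist values appear as <data key="nodelist" value="..."/>
elements in jobs_LML.xml and that nodes appear as
<object ... type="node"/> elements in nodes_LML.xml, as in the two
snippets above. The function names are made up for this illustration.

```python
import sys
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix, e.g. '{uri}object' -> 'object'."""
    return tag.rsplit("}", 1)[-1]


def job_nodelist_names(jobs_root):
    """Collect every node name mentioned in a job's nodelist attribute.

    Assumes a comma-separated list of node names, as described above.
    """
    names = set()
    for elem in jobs_root.iter():
        if local(elem.tag) == "data" and elem.get("key") == "nodelist":
            for name in elem.get("value", "").split(","):
                if name.strip():
                    names.add(name.strip())
    return names


def node_names(nodes_root):
    """Collect the name attribute of every object with type="node"."""
    return {
        elem.get("name")
        for elem in nodes_root.iter()
        if local(elem.tag) == "object"
        and elem.get("type") == "node"
        and elem.get("name")
    }


def unmatched_names(jobs_root, nodes_root):
    """Names used in job nodelists that have no matching node object."""
    return sorted(job_nodelist_names(jobs_root) - node_names(nodes_root))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_nodelist.py jobs_LML.xml nodes_LML.xml
    jobs = ET.parse(sys.argv[1]).getroot()
    nodes = ET.parse(sys.argv[2]).getroot()
    for name in unmatched_names(jobs, nodes):
        print("no node object for:", name)
```

If the script prints any names, those are the nodelist entries that
cannot be mapped to a node object.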

For further debugging, it would probably help to open a bug for this
problem. I would need to have a look at the XML files from your
.eclipsesettings/tmp_ directory. If you would like to debug LML_DA
yourself, you can try to execute the LML_DA workflow for your system
step by step. The .eclipsesettings/tmp_.../workflow.xml file contains
all commands executed by LML_DA to generate the final LML file sent to
PTP. It consists of several steps that depend on each other. Each step
has multiple cmd elements, which contain the actual commands executed
in that step.
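
To get an overview of the workflow before running it by hand, the
steps and their commands can be listed with a few lines of Python.
This is only a sketch: it assumes the steps are elements named "step"
containing elements named "cmd", as described above, and it makes no
assumption about attribute names but simply returns whatever
attributes are present. The function name is made up for this
illustration.

```python
import sys
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix, e.g. '{uri}step' -> 'step'."""
    return tag.rsplit("}", 1)[-1]


def list_workflow_commands(root):
    """Return [(step_attributes, [cmd_attributes, ...]), ...] in file order."""
    steps = []
    for step in root.iter():
        if local(step.tag) != "step":
            continue
        cmds = [dict(c.attrib) for c in step.iter() if local(c.tag) == "cmd"]
        steps.append((dict(step.attrib), cmds))
    return steps


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python list_workflow.py path/to/workflow.xml
    root = ET.parse(sys.argv[1]).getroot()
    for step_attrs, cmds in list_workflow_commands(root):
        print("step:", step_attrs)
        for cmd_attrs in cmds:
            print("    cmd:", cmd_attrs)
```

The printed commands can then be run one at a time from the directory
LML_DA uses, checking the output file of each step before moving on
to the next.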

Your error seems to happen in the LML2LML module. It parses the raw LML
file and generates an LML file with derived graphical components, which
are easier for the PTP client to parse and visualize. I could probably
tell you more about this problem once I have the debug data mentioned
above.

Best regards,

Carsten


On 08/14/14 16:56, Greg Watson wrote:
I’m having a problem when monitoring a system that uses LSF. Jobs are
showing up correctly in the Active/Inactive lists, however I’m not
seeing any activity in the system monitoring view. Looking at the log
files, it seems like there is an issue with the node names not being
interpreted correctly. Would someone (Carsten?) be able to remind me
what I need to do to get this working?

Thanks,
Greg


LML_da.errlog:

LML Data Access Workflow Manager 1.0, starting at (Tue Aug 12 21:35:58
IST 2014)
LML_file_obj: read  XML in 0.0006 sec
LML_file_obj: parse XML in 0.0170 sec
LML_file_obj: read  XML in 0.0001 sec
LML_file_obj: parse XML in 0.0016 sec
Use of uninitialized value $nodenum in numeric lt (<) at
/home/ibm/.eclipsesettings/LML2LML//LML_gen_nodedisplay_insert_job.pm
line 344.
Use of uninitialized value $nodenum in numeric gt (>) at
/home/ibm/.eclipsesettings/LML2LML//LML_gen_nodedisplay_insert_job.pm
line 345.
Use of uninitialized value $nodenum in array element at
/home/ibm/.eclipsesettings/LML2LML//LML_gen_nodedisplay_insert_job.pm
line 351.
Use of uninitialized value $nodenum in array element at
/home/ibm/.eclipsesettings/LML2LML//LML_gen_nodedisplay_insert_job.pm
line 351.
Use of uninitialized value $nodenum in array element at
/home/ibm/.eclipsesettings/LML2LML//LML_gen_nodedisplay_insert_job.pm
line 355.
…

LML_da.log:

execute_step: input file for step not found
./tmp_iitmlogin3_24610/datastep___init__.xml ...
execute_step: --> generating empty
./tmp_iitmlogin3_24610/datastep___init__.xml ...
"Specified Hosts"                        => "",
execute_step: output file not generated by step, renaming input file to
./tmp_iitmlogin3_24610/datastep_getdata.xml ...
reading file: ./tmp_iitmlogin3_24610/sysinfo_LML.xml  ...
LML_file_obj: read  XML in 0.0001 sec
LML_file_obj: parse XML in 0.0006 sec
reading file: ./tmp_iitmlogin3_24610/jobs_LML.xml  ...
LML_file_obj: read  XML in 0.0003 sec
LML_file_obj: parse XML in 0.0088 sec
reading file: ./tmp_iitmlogin3_24610/nodes_LML.xml  ...
LML_file_obj: read  XML in 0.0001 sec
LML_file_obj: parse XML in 0.0056 sec
scan system: type is Cluster
system_type=Cluster
check_jobs: WARNING: unset attribute 'detailedstatus' 6 occurrences
objects: total #42
         |--          6 (job)
         |--         35 (node)
         |--          1 (system)
/home/ibm/.eclipsesettings/LML_color/LML_color_obj.pl
reading file: ./tmp_iitmlogin3_24610/datastep_addcolor.xml  ...
objects: total #42
         |--          6 (job)
         |--         35 (node)
         |--          1 (system)
reading file: ./tmp_iitmlogin3_24610/layout.xml  ...
objects: total #0
tablelayout: total #2
         |--        1x15 (tl_WAIT)
         |--        1x14 (tl_Run)
nodedisplaylayout: total #1
scan system: type is Cluster
LML_gen_table::process: gid=org.eclipse.ptp.rm.lml.ui.InactiveJobsView
contenttype=jobs objtype_pattern=job
Table Layout: tl_WAIT processed (0 objects found)
Table Layout: objects           of tl_WAIT copied (0 new objects)
Table Layout: info objects      of tl_WAIT copied (0 new objects)
Table Layout: info data objects of tl_WAIT copied (0 new objects)
LML_gen_table::process: gid=org.eclipse.ptp.rm.lml.ui.ActiveJobsView
contenttype=jobs objtype_pattern=job
Table Layout: tl_Run processed (6 objects found)
Table Layout: objects           of tl_Run copied (6 new objects)
Table Layout: info objects      of tl_Run copied (6 new objects)
Table Layout: info data objects of tl_Run copied (6 new objects)
_get_system_type: type is 'Cluster'
_get_system_type: name is 'iitmlogin3'
_get_system_size_cluster: found    2 nodes of size: 1
_get_system_size_cluster: found   29 nodes of size: 1152
_get_system_size_cluster: found    1 nodes of size: 1169
_get_system_size_cluster: found    3 nodes of size: 1184
_get_system_size_cluster: Cluster found of size: 35
LML_gen_nodedisplay::process: gid=nd_1
get_numbers_from_name: not found >iitmc11n01-ib0-c00<
get_numbers_from_name: not found >iitmc11n01-ib0-c01<
get_numbers_from_name: not found >iitmc11n01-ib0-c02<
...





