TensorFlow Debugger (tfdbg) and emacs
TensorFlow Debugger V2
- Official documentation
- https://www.tensorflow.org/tensorboard/debugger_v2
The tfdbg CLI is now available for TensorFlow 2.0.
It’s no longer a TUI/curses-like interface, but rather a CLI which you connect to TensorBoard, which serves as its GUI.
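As a minimal sketch of how a program hands its debug data to TensorBoard: TensorFlow 2.x provides tf.debugging.experimental.enable_dump_debug_info, which writes debug information to a directory that TensorBoard can then read. The helper name enable_debugger_v2 and the path /tmp/tfdbg2_logdir are illustrative assumptions, not something from the original script. TensorFlow is imported lazily so the snippet can be loaded without it.

```python
def enable_debugger_v2(dump_dir="/tmp/tfdbg2_logdir"):
    """Turn on Debugger V2 dumping for the current TensorFlow 2.x program.

    `dump_dir` is a hypothetical location; any writable directory works.
    """
    # Imported here so that merely defining this helper does not need TensorFlow.
    import tensorflow as tf

    # FULL_HEALTH records, per tensor, counts of NaN and +/-Inf elements
    # (among other statistics); circular_buffer_size=-1 disables the circular
    # buffer so that every recorded event is written to disk.
    return tf.debugging.experimental.enable_dump_debug_info(
        dump_dir,
        tensor_debug_mode="FULL_HEALTH",
        circular_buffer_size=-1,
    )
```

Call enable_debugger_v2() once, before building and running the model, then point TensorBoard at the same directory with tensorboard --logdir /tmp/tfdbg2_logdir.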
How to use
A shell script can drive the debugger: run the training program with the --dump_dir flag to write debug data to a directory, then view that directory in TensorBoard.
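The two steps (dump, then view) can be sketched as a small Python launcher. This is an assumption-laden illustration rather than the original script: it targets the debug_mnist_v2 demo that ships with TensorFlow, the helper names demo_command and tensorboard_command are hypothetical, and /tmp/tfdbg2_logdir is an arbitrary directory.

```python
import shlex
import sys


def demo_command(dump_dir, tensor_debug_mode="FULL_HEALTH"):
    """Build the command that runs the MNIST demo with debug dumping enabled."""
    return [
        sys.executable, "-m",
        "tensorflow.python.debug.examples.v2.debug_mnist_v2",
        "--dump_dir", dump_dir,
        "--dump_tensor_debug_mode", tensor_debug_mode,
    ]


def tensorboard_command(dump_dir):
    """Build the command that serves the dumped data in TensorBoard."""
    return ["tensorboard", "--logdir", dump_dir]


if __name__ == "__main__":
    dump_dir = "/tmp/tfdbg2_logdir"  # hypothetical dump location
    # Print the commands; pass each list to subprocess.run(cmd, check=True)
    # to execute them instead.
    for cmd in (demo_command(dump_dir), tensorboard_command(dump_dir)):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists avoids shell quoting issues if the dump directory contains spaces.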
CLI args
Demo of the tfdbg curses CLI: Locating the source of bad numerical values with TF v2.
This demo contains a classical example of a neural network for the mnist
dataset, but modifications are made so that problematic numerical values (infs
and nans) appear in nodes of the graph during training.
flags:
absl.app:
-?,--[no]help: show this help
(default: 'false')
--[no]helpfull: show full help
(default: 'false')
--[no]helpshort: show this help
(default: 'false')
--[no]helpxml: like --helpfull, but generates XML output
(default: 'false')
--[no]only_check_args: Set to true to validate args and exit.
(default: 'false')
--[no]pdb: Alias for --pdb_post_mortem.
(default: 'false')
--[no]pdb_post_mortem: Set to true to handle uncaught exceptions with PDB post
mortem.
(default: 'false')
--profile_file: Dump profile information to a file (for python -m pstats).
Implies --run_with_profiling.
--[no]run_with_pdb: Set to true for PDB debug mode
(default: 'false')
--[no]run_with_profiling: Set to true for profiling the script. Execution will
be slower, and the output format might change over time.
(default: 'false')
--[no]use_cprofile_for_profiling: Use cProfile instead of the profile module
for profiling. This has no effect unless --run_with_profiling is set.
(default: 'true')
absl.logging:
--[no]alsologtostderr: also log to stderr?
(default: 'false')
--log_dir: directory to write logfiles into
(default: '')
--logger_levels: Specify log level of loggers. The format is a CSV list of
`name:level`. Where `name` is the logger name used with
`logging.getLogger()`, and `level` is a level name (INFO, DEBUG, etc). e.g.
`myapp.foo:INFO,other.logger:DEBUG`
(default: '')
--[no]logtostderr: Should only log to stderr?
(default: 'false')
--[no]showprefixforinfo: If False, do not prepend prefix to info messages when
it's logged to stderr, --verbosity is set to INFO level, and python logging
is used.
(default: 'true')
--stderrthreshold: log messages at this level, or more severe, to stderr in
addition to the logfile. Possible values are 'debug', 'info', 'warning',
'error', and 'fatal'. Obsoletes --alsologtostderr. Using --alsologtostderr
cancels the effect of this flag. Please also note that this flag is subject
to --verbosity and requires logfile not be stderr.
(default: 'fatal')
-v,--verbosity: Logging verbosity level. Messages logged at this level or
lower will be included. Set to 1 for debug logging. If the flag was not set
or supplied, the value will be changed from the default of -1 (warning) to 0
(info) after flags are parsed.
(default: '-1')
(an integer)
absl.testing.absltest:
--test_random_seed: Random seed for testing. Some test frameworks may change
the default value of this flag between runs, so it is not appropriate for
seeding probabilistic tests.
(default: '301')
(an integer)
--test_randomize_ordering_seed: If positive, use this as a seed to randomize
the execution order for test cases. If "random", pick a random seed to use.
If 0 or not set, do not randomize test case execution order. This flag also
overrides the TEST_RANDOMIZE_ORDERING_SEED environment variable.
(default: '')
--test_srcdir: Root of directory tree where source files live
(default: '')
--test_tmpdir: Directory for temporary testing files
(default: '/tmp/absl_testing')
--xml_output_file: File to store XML test results
(default: '')
tensorflow.python.ops.parallel_for.pfor:
--[no]op_conversion_fallback_to_while_loop: DEPRECATED: Flag is ignored.
(default: 'true')
tensorflow.python.tpu.client.client:
--[no]runtime_oom_exit: Exit the script when the TPU runtime is OOM.
(default: 'true')
absl.flags:
--flagfile: Insert flag definitions from the given file into the command line.
(default: '')
--undefok: comma-separated list of flag names that it is okay to specify on
the command line even if the program does not define a flag with that name.
IMPORTANT: flags in this list that have arguments MUST use the --flag=value
format.
(default: '')
TensorFlow Debugger V1
- Relevant source material
- TensorFlow Debugger Screencast - YouTube
DebugTFBasics
https://github.com/tensorflow/tensorflow/issues/29679
V1 of tfdbg had a TUI (terminal user interface).
Wrapping TensorFlow Sessions With tfdbg
To use tfdbg, import the debugger module and wrap the Session object in a debugger wrapper.
- The CLI should be invoked before and after Session.run() if you wish to take control of the execution and inspect the internal state of the graph.
- Filters can be added to assist the diagnosis. In the provided example, there is already a filter called tfdbg.has_inf_or_nan, which detects the presence of nan or inf in any intermediate tensors (those that are neither inputs nor outputs).
- You may write your own custom filters.
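The wrapping and filter steps listed above can be sketched as follows, assuming TensorFlow 1.x and importing the debugger module under the conventional alias tf_debug. The helper wrap_session and the custom filter is_zero_scalar are hypothetical names introduced for illustration; LocalCLIDebugWrapperSession, add_tensor_filter and has_inf_or_nan are the library's own.

```python
def is_zero_scalar(datum, tensor):
    """Custom tensor filter: True for scalar tensors whose value is exactly 0.

    tfdbg calls a filter with `datum` (metadata about the dumped tensor,
    unused here) and `tensor` (its value, normally a NumPy array).
    """
    shape = getattr(tensor, "shape", None)  # values without a shape never match
    return shape is not None and len(shape) == 0 and bool(tensor == 0.0)


def wrap_session(sess):
    """Wrap a tf.Session so that each run() call drops into the tfdbg CLI."""
    # Imported lazily; requires TensorFlow 1.x.
    from tensorflow.python import debug as tf_debug

    debug_sess = tf_debug.LocalCLIDebugWrapperSession(sess)
    # Built-in filter that flags tensors containing inf or nan.
    debug_sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)
    # The custom filter defined above, registered under its own name.
    debug_sess.add_tensor_filter("is_zero_scalar", is_zero_scalar)
    return debug_sess
```

Inside the tfdbg CLI, a registered filter is applied with run -f followed by its name, for example run -f has_inf_or_nan, which runs until a tensor passes the filter.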
Debugging TensorFlow Model Training with tfdbg
This only works for TensorFlow 1 at the moment.
Typical use-cases
- finding NaNs
- finding Infs (infinities)
How to run tfdbg
As you pointed out, tfdbg is originally designed for TF v1.x and is centered around the tf.Session API. As tf.Session is replaced by eager execution and tf.functions in 2.0, the debugger feature needs to be adapted to the architectural change.
If you have a specific use case (e.g., finding Infinities and NaNs, or other debugging workflows), please let us know. It’ll inform the design of the new version of tfdbg.
Automating tfdbg with emacs
I had planned on building a mode for emacs that includes syntax highlighting and keyboard macros to improve the tfdbg experience, but V2 now uses TensorBoard as its user interface, so there is no longer much need for this.