Recipes
Recipes are convenience modules that perform tasks that are common use cases for hansken.py.

hansken.recipes.export: Export data and metadata
- to_csv(traces, output, fields, to_dict=<function get_fields>, delimiter='\t', lineterminator='\n', encoding='utf-8', **fmtparams)
Writes values for fields from each trace in traces to output. Field names can be supplied as property names (e.g. 'file.createdOn') or as type_fields instances that automatically expand to the properties defined for the specified types. Data can be included by using data_stream instances.

Note
Using type_fields instances requires that the traces argument carries the applicable trace model as attribute model. Using data_stream instances requires that the Trace objects in the traces argument carry a ProjectContext object. This is the case for SearchResult instances obtained from calls like ProjectContext.search:

    from hansken.recipes.export import data_stream, to_csv, type_fields

    # obtain search results as usual (context is a ProjectContext)
    results = context.search('query')
    # export the results to a local CSV file
    to_csv(results, 'path/to/export.csv',
           # explicit fields and automatically expanded fields can be mixed
           fields=['uid', 'name',
                   # include all model-defined metadata fields for email traces
                   type_fields('email'),
                   # include the first kilobyte of data from the plain data stream (converted to text)
                   data_stream('plain', max_size=1024)])

The exported file will have the explicitly provided fields like uid and name, but also fields from property names generated from the trace model retrieved from the ProjectContext, like email.subject and email.to. The data_stream instance will cause a data.plain field.
- Parameters:
traces – collection of traces
output – name of the file to write to
fields – fields to retrieve values for, a sequence of property names (str) or type_fields instances, used to generate field and property names from a trace type
to_dict – callable to create a dict {field: value}; passed kwargs: trace, fields
delimiter – field delimiter in the output
lineterminator – line terminator in the output
encoding – text encoding for the output
fmtparams – additional format parameters, see module csv in the standard lib
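The defaults listed above produce tab-separated output with one line per trace. As a rough sketch of how such parameters map onto the standard library's csv module (this is not hansken.py's implementation; plain dicts stand in for traces, and write_rows and rows are hypothetical names):

```python
import csv
import io


def write_rows(rows, output, fields, delimiter='\t', lineterminator='\n', **fmtparams):
    """Write one line per row to the file-like output, preceded by a header of field names."""
    writer = csv.DictWriter(output, fieldnames=fields, delimiter=delimiter,
                            lineterminator=lineterminator, **fmtparams)
    writer.writeheader()
    for row in rows:
        # only keep the requested fields, missing values become empty cells
        writer.writerow({field: row.get(field, '') for field in fields})


# plain dicts standing in for traces (assumption, real traces are Trace objects)
rows = [{'uid': 'a:1', 'name': 'report.docx'},
        {'uid': 'a:2', 'name': 'tab\there'}]
buffer = io.StringIO()
write_rows(rows, buffer, fields=['uid', 'name'])
csv_text = buffer.getvalue()
```

Values that contain the delimiter are quoted by the csv module, which is why the second row's name ends up wrapped in double quotes. Extra keyword arguments such as quoting or quotechar are passed through in the same way fmtparams is described above.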
- class type_fields(*type_names)
Bases: object
An object to request that all fields of the requested types should be used as field names when exporting to CSV format. Multiple type names can be provided:

    to_csv(..., fields=type_fields('file', 'link'))
    # supplying two types at once is equivalent to supplying two separate type_fields
    to_csv(..., fields=[type_fields('file'), type_fields('link')])

Both forms would result in headers like file.createdOn and link.target in the resulting export file.
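The expansion of type names into headers can be pictured with a small stand-alone sketch. Everything here is an assumption made for illustration: the model is reduced to a dict of type name to property names, tuples of type names stand in for type_fields instances, and expand_fields is a hypothetical helper, not part of hansken.py:

```python
def expand_fields(fields, model):
    """Expand tuples of type names into 'type.property' names, keep str fields as-is."""
    expanded = []
    for field in fields:
        if isinstance(field, str):
            expanded.append(field)
        else:
            for type_name in field:
                expanded.extend(f'{type_name}.{prop}' for prop in model[type_name])
    return expanded


# hypothetical, heavily simplified model: type name -> property names
model = {'file': ['createdOn', 'path'], 'link': ['target']}
headers = expand_fields(['uid', ('file', 'link')], model)
separate = expand_fields(['uid', ('file',), ('link',)], model)
```

Both calls yield the same list of headers, mirroring the equivalence of the two to_csv forms described above.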
- class data_stream(data_type, max_size=4096, fallback_encoding=None)
Bases: object
An object to request text data for traces to be exported. Like type_fields, this can be mixed with regular metadata fields:

    to_csv(..., fields=['data.raw.size', data_stream('raw', max_size=1024)])
    to_csv(..., fields=data_stream('text', max_size=None))

A fallback text encoding can be provided for data stream exports that have no explicit or known text encoding:

    to_csv(..., fields=data_stream('plain', fallback_encoding='ascii'))

Encoding errors will result in replacement characters (?'s) in the output CSV. The fallback encoding is unset by default, resulting in no data in the output CSV for streams without an explicit or known text encoding.
For custom functionality (like including binary data encoded as hex or base64), this class can be extended. to_text is called by the exporter (get_fields when to_csv is used) to get a str from a Trace being exported.

Note
The maximum number of bytes that are read from the stream defaults to 4KiB. While it's possible to include an entire data stream in the export, please note that data streams can grow quite large; use this with caution.

Note
As exporting data requires an additional HTTP request for each data stream of each Trace being exported, including data in an export slows down the export considerably.
- to_text(trace, default=None)
Get a str from a Trace. Uses this data_stream's data_type and max_size to retrieve the requested data and turns it into a str if possible.
- Parameters:
trace – the Trace being exported
default – the value to be used when retrieving the data or turning it into a str fails
- Returns:
a str representation of (a part of) the requested data stream
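The interplay of max_size, fallback_encoding and default can be sketched without a Hansken connection. The helper below is hypothetical (stream_to_text is not part of hansken.py) and reads from an in-memory binary stream instead of a trace; it uses Python's own replacement character for undecodable bytes, which may differ from what ends up in an actual export:

```python
import io


def stream_to_text(stream, max_size=4096, encoding=None, fallback_encoding=None, default=None):
    """Read at most max_size bytes from a binary stream and decode them to str."""
    encoding = encoding or fallback_encoding
    if encoding is None:
        # no known encoding and no fallback: nothing to export
        return default
    data = stream.read() if max_size is None else stream.read(max_size)
    # undecodable bytes are replaced rather than raising an error
    return data.decode(encoding, errors='replace')


text = stream_to_text(io.BytesIO(b'hello world'), max_size=5, fallback_encoding='ascii')
missing = stream_to_text(io.BytesIO(b'hello world'))
replaced = stream_to_text(io.BytesIO(b'caf\xe9'), fallback_encoding='ascii')
```

A subclass that exports hex or base64 would replace the decode step with its own conversion while keeping the size limit.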
- get_fields(trace, fields, prefix='system.extracted.', use_fallback=True, default=None)
Retrieves values for fields from trace by calling trace.get(prefix + field) for each field.
- Parameters:
trace – collection of mapped values
fields – fields to retrieve a value for
prefix – prefix used with get
use_fallback – whether to try getting a field without prefix when a value for the full field name is not available
default – value to use when no value was mapped
- Returns:
a dictionary with all of the requested fields and their value in the source of trace, or None if trace has no value for the field
- Return type:
dict
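The prefix and fallback behaviour described above can be illustrated with a plain dict as the mapping. This is a sketch under assumptions, not the library's code: lookup_fields is a hypothetical name, and treating a None value as "not available" is an assumption:

```python
def lookup_fields(source, fields, prefix='system.extracted.', use_fallback=True, default=None):
    """Map each field to source's value for prefix + field, or for the bare field as fallback."""
    values = {}
    for field in fields:
        value = source.get(prefix + field)
        if value is None and use_fallback:
            # full field name not available, retry without the prefix
            value = source.get(field)
        values[field] = default if value is None else value
    return values


source = {'system.extracted.name': 'notes.txt', 'uid': 'a:1'}
with_fallback = lookup_fields(source, ['name', 'uid', 'file.path'])
without_fallback = lookup_fields(source, ['name', 'uid'], use_fallback=False, default='')
```

With the fallback enabled, uid is found under its bare name; with the fallback disabled, it is reported as the default value instead.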
- bulk(traces, dest, split=1000, stream='raw', fname=<function safe_name>, write=<function to_file>, on_error=None, side_effect=None, jobs=16)
Performs a bulk export of traces to dest.

Note
bulk is internally parallelized by default, requiring that the argument to write is thread-safe. fname (safe_name by default), on_error and side_effect are all called from the calling thread after the export of a particular trace in traces was processed.
As on_error is not called from the except clause that catches the Exception instance, logging the exception with its traceback requires special care to pass the exc_info keyword to either logging or logbook. Leaving on_error as None will raise a ValueError on the thread calling bulk with the error that is processed first as its cause.
This also means that the order of traces with which write, on_error and side_effect are called need not be the same order as that of traces. To turn this parallelism off, pass jobs=False.
- Parameters:
traces – collection of traces to export
dest – path to export traces to
split – max number of files per directory (when set to None, all files will be saved to the same directory)
stream – stream name to read from the traces, optionally supplied as a callable returning the stream name (trace will be omitted from export if the return value is falsy); passed kwargs: trace
fname – callable to generate a file name for a trace; passed kwargs: trace, num, split, stream (defaults to safe_name, resulting in a file name that uses both trace.image_id and trace.id, ensuring the name is unique within a project)
write – thread-safe callable to write a trace to a file name; passed kwargs: trace, output, stream (defaults to to_file)
on_error – callable to report an error thrown during write(); passed kwargs: num, trace, stream, output, exception
side_effect – callable to perform a side effect for each exported trace; passed kwargs: trace, stream, num, split, dest, folder, file, output
jobs – maximum number of data exports to run in parallel (an int), or False to turn parallel processing of traces off
- Raises:
ValueError – on the first error result when on_error is not supplied (the error is set as the cause)
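The threading notes above (write runs in parallel, errors are reported from the calling thread, and a missing on_error turns the first error into a ValueError) can be sketched with the standard library. This is an illustration of the pattern only, assuming a thread pool; export_all, write and log_error are hypothetical names and the callbacks take fewer arguments than bulk passes:

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger('export-sketch')


def export_all(items, write, on_error=None, jobs=4):
    """Run write(item) on worker threads, report failures from the calling thread."""
    exported = []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {pool.submit(write, item): item for item in items}
        # results arrive in completion order, not necessarily the order of items
        for future in as_completed(futures):
            item = futures[future]
            exception = future.exception()
            if exception is None:
                exported.append(item)
            elif on_error is None:
                raise ValueError(f'export of {item!r} failed') from exception
            else:
                on_error(item=item, exception=exception)
    return exported


def write(item):
    # stand-in for a thread-safe writer, failing for one particular item
    if item == 'broken':
        raise OSError('disk full')


def log_error(item, exception):
    # not inside an except clause here, so pass the exception explicitly to get a traceback
    logger.error('failed to export %s', item, exc_info=exception)


exported = export_all(['a', 'broken', 'b'], write, on_error=log_error)
```

Because log_error runs outside the except clause that caught the error, a bare logger.exception call would have no active exception to report; passing the exception instance as exc_info gives logging what it needs to include the traceback.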
- to_file(trace, output, stream='raw', offset=0, size=None, key=<auto-fetch>, bufsize=1048576)
Writes a data stream of a trace to a file.
- Parameters:
trace – trace to write
output – name of the file to write to
stream – named stream to read from the trace
offset – byte offset to start the stream on
size – the number of bytes to make available
key – key for the image of trace (default is to fetch the key automatically, if it’s available)
bufsize – buffer size to be used during the read/write loop
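How offset, size and bufsize interact in a read/write loop can be shown with in-memory streams. The sketch below is an assumption about the general shape of such a loop, not a copy of to_file; copy_stream is a hypothetical name and no image key or Hansken connection is involved:

```python
import io


def copy_stream(source, target, offset=0, size=None, bufsize=1048576):
    """Copy size bytes (or everything when size is None) from source to target, starting at offset."""
    source.seek(offset)
    remaining = size
    written = 0
    while remaining is None or remaining > 0:
        # never read more than one buffer, or more than what is still requested
        chunk = source.read(bufsize if remaining is None else min(bufsize, remaining))
        if not chunk:
            break  # end of stream reached
        target.write(chunk)
        written += len(chunk)
        if remaining is not None:
            remaining -= len(chunk)
    return written


target = io.BytesIO()
written = copy_stream(io.BytesIO(b'0123456789'), target, offset=2, size=5, bufsize=2)
```

A small bufsize is used here only to force several iterations; the default of 1 MiB keeps memory use bounded while limiting the number of reads for large streams.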
- safe_name(trace, num=None, split=1000, stream='raw', template='{trace.image_id}_{trace.id}_{stream}_{trace.name}')
Generate a file name for a trace. Resulting file name can contain unicode characters, but no slashes, backslashes or line endings.
- Parameters:
trace – trace to name
num – the number of this trace within the set being exported
split – the max number of files in a directory
stream – the named stream to be exported
template – format string used as the file name, slashes and newlines are replaced by underscores in the result; passed kwargs: trace, num, split, stream
- Returns:
generated file name
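The template mechanism described above is ordinary str.format with named arguments, followed by a cleanup of path-unsafe characters. A minimal sketch, assuming a regular expression for the cleanup and using a SimpleNamespace with made-up values in place of a Trace (make_name is a hypothetical name):

```python
import re
from types import SimpleNamespace


def make_name(trace, num=None, split=1000, stream='raw',
              template='{trace.image_id}_{trace.id}_{stream}_{trace.name}'):
    """Format template and replace slashes, backslashes and line endings by underscores."""
    name = template.format(trace=trace, num=num, split=split, stream=stream)
    return re.sub(r'[/\\\r\n]', '_', name)


# SimpleNamespace standing in for a Trace, values are made up
trace = SimpleNamespace(image_id='img', id='0-1-2', name='docs/notes\\draft\n.txt')
file_name = make_name(trace)
numbered = make_name(trace, num=7, template='{num:05d}_{trace.name}')
```

A custom template can use any of the passed names, for example a zero-padded num to keep exported files sorted in export order.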
hansken.recipes.report: Generate reports from Hansken
The report recipe is split into two parts: a set of templates with macros and a number of utility functions to render templates into content or write them to files. Templates mentioned here use the Jinja2 templating language and accompanying Python modules. A basic template in Jinja2 looks something like this:
{% extends 'hansken/base.html' %}
{% block extra_styles %}
<style type="text/css">
p.special {
color: red;
}
</style>
{% endblock %}
{% block content %}
<p class="special">
Lorem ipsum dolor sit amet. <br />
{{ template_variables }} are included as such. <br />
The result of macros can be included easily: {{ hansken.some_macro(argument) }}. <br />
</p>
{% endblock %}
{% block postamble %}
So, in conclusion, it turns out that this templating stuff isn't hard.
{% endblock %}
The example above extends a ‘base template’ provided with hansken.py (covered in more detail below), which contains an HTML document skeleton and provides a number of ‘blocks’ to be filled by extensions of the template. The named blocks the skeleton defines (like extra_styles and content) and their roles are listed below. Template variables (also called arguments) are printed by surrounding them with double curly brackets. Likewise, macros are called like functions inside double curly brackets.
Jinja2 can do a lot more than the simple example above. For additional information, see the Jinja2 documentation. In particular, the “Template Designer Documentation” section covers the templating side of Jinja2.
Utility functions that create a PDF version of a report use HTML content to render the PDF document with WeasyPrint. See the WeasyPrint documentation for notes on supported features, caveats and additional information on the use of particular parameters.
Note
Template definitions below are presented as classes and methods (this will likely change in the future).
- hansken/base.html
Base template for reports generated by the report recipe. hansken/base.html provides an HTML document skeleton with a basic style sheet and defines named blocks available for overrides in extensions. See the “Template Inheritance” section in the Jinja2 docs for a more detailed explanation on how this is used. As this template is intended as a base for other templates, it imports the hansken/macros.html template by default, exposing its macros on a namespace hansken. This allows any extension of hansken/base.html to call the macros defined in hansken/macros.html as hansken.macro_name().
hansken/base.html defines the following blocks:
- title
A block inside the <title> element, providing a document title. Defaults to {{ title }}, allowing a document title to be provided as a template variable as well.
- extra_styles
An empty block inside the <head> element, intended for user-provided <style> elements. This block is itself located in a block named styles, which contains a provided style sheet. Override the styles block to discard the style sheet provided by hansken.py.
- extra_scripts
An empty block inside the <head> element, intended for user-provided <script> elements. This block is itself located in a block named scripts.
- preamble
An empty block intended as a place to put introductory content. Defaults to {{ preamble }}, allowing a preamble to be provided as a template variable as well. Note that
- content
An empty block intended as the main content of a document. Defaults to {{ content }}, allowing content to be provided as a template variable as well.
- postamble
An empty block intended as a place to put concluding content. Defaults to {{ postamble }}, allowing a postamble to be provided as a template variable as well.
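The way a block can either be overridden by an extending template or filled through a template variable can be tried out with Jinja2 alone. The sketch below assumes the jinja2 package is installed and uses a tiny stand-in base template held in memory; it is not the bundled hansken/base.html, and the template names and variables are made up:

```python
from jinja2 import DictLoader, Environment

templates = {
    # stand-in base template with two blocks that default to template variables
    'base.html': (
        '<html><head><title>{% block title %}{{ title }}{% endblock %}</title></head>'
        '<body>{% block content %}{{ content }}{% endblock %}</body></html>'
    ),
    # extension that overrides the content block and leaves title to the variable
    'report.html': (
        "{% extends 'base.html' %}"
        '{% block content %}<p>{{ message }}</p>{% endblock %}'
    ),
}
environment = Environment(loader=DictLoader(templates), autoescape=True)
rendered = environment.get_template('report.html').render(title='Demo', message='a < b')
```

The title block falls back to the title variable, while the content block comes from the extending template. With auto-escaping enabled, the value of message is HTML-escaped on output.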
- hansken/macros.html
A template containing only macros. hansken/base.html imports these macros by default as the hansken namespace.
- traces_table(traces, fields)
- Parameters:
traces – a collection of Trace objects, typically a SearchResult
fields – sequence of table columns filled with values from the Trace objects passed to traces
- hansken/table.html
A convenience template extending from hansken/base.html, overriding the content block with a call to the trace_table macro. Parameters to this template are identical to the trace_table macro.
- default_environment
Default jinja2.Environment used by the render_* functions in this recipe. This environment is able to load the templates listed above as hansken/template.html.
- template_path
Path to the template directory bundled with hansken.py.
- environment_with(searchpath=None, loader=None, **kwargs)
Create an Environment loaded with hansken.py's provided templates, while adding the provided search path or loader as a template source that precedes the provided templates. The resulting Environment is set to auto-escaping unless explicitly set to False in kwargs.
- Parameters:
searchpath – str or sequence of str, paths containing templates
loader – a Jinja2 template loader to use in conjunction with hansken.py's default_loader
kwargs – arguments passed to Environment, see the Jinja2 documentation
- Returns:
an Environment, loading templates from both searchpath and template_path
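What "a template source that precedes the provided templates" means in Jinja2 terms can be sketched with a ChoiceLoader, which tries its loaders in order. Whether environment_with is built this way internally is an assumption; the sketch requires the jinja2 package, and the in-memory DictLoaders and their template contents are made up:

```python
from jinja2 import ChoiceLoader, DictLoader, Environment

# DictLoaders stand in for a user search path and for the bundled template directory
user_loader = DictLoader({'hansken/table.html': 'custom table for {{ name }}'})
bundled_loader = DictLoader({'hansken/table.html': 'bundled table',
                             'hansken/base.html': 'bundled base'})

# loaders are tried in order, so user templates shadow bundled ones with the same name
environment = Environment(loader=ChoiceLoader([user_loader, bundled_loader]), autoescape=True)

table = environment.get_template('hansken/table.html').render(name='<case>')
base = environment.get_template('hansken/base.html').render()
```

A template that exists only in the second loader is still found, so a custom search path can override selected templates without copying all of them.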
- render_template(template_name, environment=<hansken.py default environment>, **kwargs)
Render a named template.
- Parameters:
template_name – the name of the template to be rendered (e.g. 'hansken/table.html')
environment – the Environment to be used, defaults to the template environment defined by hansken.py
kwargs – named arguments to pass to the template
- Returns:
a str containing the rendered template
- render_string(string, environment=<hansken.py default environment>, **kwargs)
Render an anonymous template, provided as a str.
- Parameters:
string – the template content to be rendered
environment – the Environment to be used, defaults to the template environment defined by hansken.py
kwargs – named arguments to pass to the template
- Returns:
a str containing the rendered template
- to_html_table(output, traces, fields)
Render traces into an HTML table and save to output.
- Parameters:
output – name of the file to write to
traces – collection of Trace objects to render (typically a SearchResult)
fields – fields to retrieve values for, a sequence of property names (str)
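The shape of the resulting table (one column per field, one row per trace) can be sketched with the standard library only. This is not how the recipe renders its output, which goes through the Jinja2 templates described above; html_table is a hypothetical helper and plain dicts stand in for Trace objects:

```python
from html import escape


def html_table(rows, fields):
    """Render rows (dict-like objects) as an HTML table with one column per field."""
    header = ''.join(f'<th>{escape(field)}</th>' for field in fields)
    body = ''.join(
        '<tr>' + ''.join(f'<td>{escape(str(row.get(field, "")))}</td>' for field in fields) + '</tr>'
        for row in rows
    )
    return f'<table><tr>{header}</tr>{body}</table>'


# plain dict standing in for a trace, values are made up
rows = [{'uid': 'a:1', 'name': '<script>.js'}]
table_html = html_table(rows, ['uid', 'name'])
```

Escaping the values matters because trace metadata such as file names can contain characters that are meaningful in HTML.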
- to_pdf(output, content, base_url='.', **pdf_options)
Save HTML content as PDF to output.
- Parameters:
output – name of the file to write to
content – HTML content to write to PDF
base_url – base URL for resolving linked resources in the template (typically only useful when providing custom style sheets, see WeasyPrint documentation)
pdf_options – keyword arguments passed verbatim to HTML.write_pdf (see WeasyPrint documentation)
- to_pdf_table(output, traces, fields, base_url='.', **pdf_options)
Render traces into a table and save as PDF to output.
- Parameters:
output – name of the file to write to
traces – collection of Trace objects to render (typically a SearchResult)
fields – fields to retrieve values for, a sequence of property names (str)
base_url – base URL for resolving linked resources in the template (typically only useful when providing custom style sheets, see WeasyPrint documentation)
pdf_options – keyword arguments passed verbatim to HTML.write_pdf (see WeasyPrint documentation)