hansken.trace — Interact with traces / search results

Todo

  • Introduction to goal of module, default use

  • Attribute access through trace model (trace.file.path, trace.data.raw.sha1)

  • Binary data (previews) is Base64-encoded, see b64decode

  • Trace implementation created at runtime from data model, not intended for user instantiation

Note

Adding information such as a note or a tag to a Trace allows the caller to force a refresh. Refreshing a project index is a potentially expensive operation; use this only if the added content is needed immediately. Trace itself defines no programmatic way to remove tags, but ProjectContext does:

# the ProjectContext that created trace (see Trace.context)
context = trace.context
for tag in trace.tags:
    context.delete_tag(trace.uid, tag)
# note that this loop only deletes the tags in the backend;
# trace itself is left untouched

image_from_uid(trace_uid)

Splits trace_uid into its two parts, image and id, returning the first. Note that a Trace object will provide these as properties image_id and id.

Parameters:

trace_uid (str) – a trace uid

Returns:

the image UUID from trace_uid

Return type:

str

image_from_trace(trace)

Attempts to get an image id from trace, whether trace is a Trace object or dict-like.

Parameters:

trace – a Trace or dict-like trace

Returns:

the image UUID from trace

Return type:

str

class Trace

Bases: AbstractTrace

Base class for traces. Defines convenience methods to navigate or manipulate a trace. Trace data may be accessed using open.

ID_SEP = '-'
property context

The ProjectContext instance that created this Trace.

property image_name

The name / description of this Trace’s image_id, or None.

property parent

This Trace’s parent Trace, or None if not applicable.

note(note, refresh=None)

Add a note to this Trace.

Parameters:
  • note (str) – the note itself

  • refresh – if True, force a full project refresh, making this note immediately searchable

property notes

The notes attached to this Trace. Note that this does not include notes added through note on this instance.

tag(tag, refresh=None)

Tag this trace.

Parameters:
  • tag (str) – the tag to set

  • refresh – if True, force a full project refresh, making this tag immediately searchable
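Because a refresh is potentially expensive, a caller adding several tags in a row might request it only once, on the last call. A minimal sketch, assuming that a single full project refresh also makes the tags added by earlier, non-refreshing calls searchable; tag_all is a hypothetical helper, only Trace.tag comes from hansken.py:

```python
def tag_all(trace, tags, refresh=False):
    """Tag trace with every tag in tags, refreshing the project index at most once.

    tag_all is a hypothetical helper; only Trace.tag is part of hansken.py.
    """
    tags = list(tags)
    for index, tag in enumerate(tags):
        is_last = index == len(tags) - 1
        # assumption: refreshing on the final call also makes the earlier tags searchable
        trace.tag(tag, refresh=True if (refresh and is_last) else None)
```

Leaving refresh at its default relies on the automatic index refresh instead.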

property tags

The tags attached to this Trace. Note that this does not include tags added through tag on this instance.

property privileged

The privileged state of this trace: either None or one of Privileged. Setting the privileged attribute requires authorization, and None is not a valid value to set it to.

property creator

The tool that created this Trace, or None if unknown. Includes the version of that tool, e.g.: toolname 1.2.3.

Note

This value is formatted by hansken.py and is not suitable for use with queries (such as finding other traces created by the same tool).

property tool_versions

The tools and versions that are responsible for this Trace’s metadata, as a dict mapping the names of tools to their respective versions. Tool versions typically include the versions of critical software libraries used by those tools.

property audits

An audit log of user-initiated changes to this Trace in the form of a sequence of dicts, ordered by each audit’s creation timestamp. The audit log can be empty, but is never None.

tracelets(tracelet_type, query=None, sort=None)

Provides or retrieves tracelets of type tracelet_type.

The exact return type of a call to tracelets depends on the tracelet type being requested. If the remote defines tracelet_type to be ‘few’, the result will be a list of Tracelet objects. If the remote defines tracelet_type to be ‘many’, the result will be a SearchResult of Tracelet objects. Note that query can only be used with the latter.

Parameters:
  • tracelet_type – the tracelet type to request

  • query – query to match tracelets to

  • sort – ordering of tracelets

Returns:

a sized iterable of Tracelets (iterable only once)
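Since the result of tracelets can be iterated only once, code that needs several passes over it can copy it into a list first. A sketch of that pattern; materialize_tracelets is a hypothetical helper, and any tracelet type name passed to it (for example 'prediction') is a placeholder rather than a type known to exist on a given Hansken instance:

```python
def materialize_tracelets(trace, tracelet_type, query=None):
    """Collect the tracelets of a trace into a list so they can be iterated more than once.

    materialize_tracelets is a hypothetical helper. query is only passed on when given,
    as it is only supported for tracelet types the remote defines as 'many'.
    """
    if query is None:
        result = trace.tracelets(tracelet_type)
    else:
        result = trace.tracelets(tracelet_type, query=query)
    # the result is iterable only once: copy it into a list before iterating repeatedly
    return list(result)
```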

property children

A SearchResult instance containing the child traces of this Trace, if any.

property data_types

A set of data type names available for this Trace. These names can be used with calls to open or for attribute access, for example:

if 'raw' in trace.data_types:
    # trace has a raw data stream, so attribute access to data.raw.size is safe
    print('raw data size:', trace.data.raw.size)

for data_type in trace.data_types:
    # name the output file after the trace, using the data type name as the extension
    # (e.g. "some-file.raw" or "another-file.text")
    out_name = '{}.{}'.format(trace.name, data_type)
    print('writing first 64 bytes to', out_name)
    # the built-in open() creates the output file in binary write mode
    with open(out_name, 'wb') as out_file, trace.open(data_type, size=64) as stream:
        # copy (at most) the first 64 bytes of the data_type stream to the file
        out_file.write(stream.read())

Returns:

data type names available for this Trace (possibly empty, but never None)

Return type:

set

open(stream='raw', offset=0, size=None, key=<auto-fetch>)

Open a named data stream (default raw) for this Trace.

Note

Multiple calls to read(num_bytes) on the stream returned by this call work fine in Python 3, but will fail in Python 2.

Parameters:
  • stream – stream to read

  • offset – byte offset to start the stream on

  • size – the number of bytes to make available

  • key – key for the image of this trace (default is to fetch the key automatically, if it’s available)

Returns:

a file-like object to read bytes from the named stream

Return type:

io.BufferedReader
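The file-like object returned by open makes it possible to copy a large stream in fixed-size chunks instead of reading it into memory at once. A minimal sketch relying on the Python 3 behaviour noted above; copy_stream and its chunk_size default are hypothetical:

```python
def copy_stream(trace, out, stream='raw', chunk_size=1024 * 1024):
    """Copy a trace's named data stream to the writable binary file-like object out.

    copy_stream is a hypothetical helper built on Trace.open;
    it returns the number of bytes copied.
    """
    copied = 0
    with trace.open(stream) as data:
        while True:
            # repeated read(num_bytes) calls are fine on Python 3
            chunk = data.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            copied += len(chunk)
    return copied
```

shutil.copyfileobj from the standard library performs the same loop; it is spelled out here to show the repeated read calls.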

descriptor(stream='raw', key=<auto-fetch>)

Retrieve the data descriptor of a named stream (default raw) of this Trace.

Parameters:
  • stream – stream to get the descriptor for

  • key – key for the image of this trace (default is to fetch the key automatically, if it’s available)

Returns:

the stream’s data descriptor (as-is)

property preview_types

A set of preview type names (mime types) available for this Trace. These names can be used with calls to preview.

Returns:

preview type names available for this Trace (possibly empty, but never None)

Return type:

set

preview(mime_type)

Gets a preview of a particular mime type, e.g. ‘text/plain’ or ‘image/png’.

Parameters:

mime_type – the preview type to get

Returns:

bytes or None
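To pick a preview without knowing in advance which MIME types a trace offers, preview_types can be checked before calling preview. A sketch; first_preview and its preference order are hypothetical, and the returned bytes are passed on unchanged (no Base64 decoding is attempted):

```python
def first_preview(trace, preferred=('image/png', 'text/plain')):
    """Return (mime_type, preview) for the first preferred preview type this trace offers.

    first_preview is a hypothetical helper; it returns (None, None)
    when none of the preferred types is available.
    """
    available = trace.preview_types
    for mime_type in preferred:
        if mime_type in available:
            return mime_type, trace.preview(mime_type)
    return None, None
```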

snippets(query, num=100, before=200, after=200)

Generate snippets surrounding term hits from query in any of the data streams of this trace.

Parameters:
  • query – the query to generate snippets for (should contain term queries, or no snippets will be generated)

  • num – maximum number of snippets to return

  • before – number of bytes to include before the term hits

  • after – number of bytes to include after the term hits

Returns:

list of Snippet instances

update(key_or_updates=None, value=None, data=None, overwrite=False)

Requests the remote to update or add metadata properties for this Trace.

Note

Calls to update do not modify the Trace instance they are called on. To get a Trace instance that includes the changes made by a successful call to update, use trace.context.trace(trace.uid) to request a new instance of the trace with this Trace’s identifier.

Please note that, for performance reasons, all changes are buffered and do not take effect immediately in subsequent search, update and import requests. As a consequence, successive changes to a single trace might be ignored; instead, bundle all changes to an individual trace in a single update or import request. The project index is refreshed automatically (by default every 30 seconds), so changes will become visible eventually.

Parameters:
  • key_or_updates – either a str (the metadata property to be updated) or a mapping supplying both keys and values to be updated (or None if only data is supplied)

  • value – the value to update metadata property key to (used only when key_or_updates is a str)

  • data – a dict mapping data type / stream name to bytes to be imported

  • overwrite – whether properties to be imported should be overwritten if already present

Returns:

processing information from remote
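Following the advice to bundle changes, several mappings of changes can be merged locally and sent in one update call, after which a new instance is requested through the trace’s context. A sketch; update_once is a hypothetical helper and the property names given to it are placeholders (which properties may be updated is determined by the trace model):

```python
def update_once(trace, *changes):
    """Merge several mappings of metadata changes and send them in a single update call.

    update_once is a hypothetical helper; later mappings win on conflicting keys.
    """
    updates = {}
    for change in changes:
        updates.update(change)
    # one bundled request, as successive updates to a single trace may be ignored
    trace.update(updates)
    # trace itself is not modified; request a new instance (it may only reflect
    # the changes once the project index has been refreshed)
    return trace.context.trace(trace.uid)
```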

child_builder(name=None)

Create a TraceBuilder to build a trace to be saved as a child of this Trace. Note that name is a mandatory property for a trace, even though it is optional here. A name can be added later using the TraceBuilder.update method. Furthermore, a new trace will only be added to the index once explicitly saved (e.g. through TraceBuilder.build).

Parameters:

name – the name for the trace being built

Returns:

a TraceBuilder set up to create a child trace of this Trace

class Privileged

Bases: Enum

Possible privileged states of a Trace. Values that correspond to ‘not privileged’ (None or rejected) are falsy, making them suitable to check whether a trace is privileged.

suspected = 'suspected'

trace is suspected of being privileged

confirmed = 'confirmed'

trace is confirmed to be privileged

rejected = 'rejected'

trace is confirmed to be not privileged
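The falsy behaviour of rejected can be reproduced with an Enum that defines __bool__. The stand-in below only mirrors the documented values to illustrate the idea; PrivilegedSketch and is_privileged are hypothetical and not hansken.py’s implementation:

```python
from enum import Enum


class PrivilegedSketch(Enum):
    """Stand-in mirroring the documented values of Privileged."""
    suspected = 'suspected'
    confirmed = 'confirmed'
    rejected = 'rejected'

    def __bool__(self):
        # only 'rejected' means "not privileged", so only that member is falsy
        return self is not PrivilegedSketch.rejected


def is_privileged(state):
    """True for suspected or confirmed, False for rejected or None."""
    return bool(state)
```

With the real enum, the same check reads if trace.privileged:, which is false for both None and Privileged.rejected.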

class TraceModel

Bases: DictView

Utility to deal with intricacies surrounding the trace / data model used by Hansken. Used by hansken.py to translate and validate user-specified metadata properties to their corresponding place in the data structure for a trace in Hansken.

property intrinsics

The intrinsic properties (properties that any trace can have, regardless of its type(s)) defined by the trace model.

is_intrinsic(steps)

Checks whether the property defined by steps is an intrinsic property.

Parameters:

steps – steps through a Trace’s data structure

Returns:

whether the property defined by steps is an intrinsic property

property origins

The origins defined by the trace model, typically system and user.

property categories

The categories of types and properties defined by the trace model, e.g. extracted or annotated.

property types

The trace types defined by the trace model, e.g. file or classification.

property data_types

The named data types defined by the trace model for the “data” trace type.

expand(name)

Expands a trace property name to ‘steps’ through a nested data structure.

Inserts the property’s category if it is not specified; the result does not include an origin.

Parameters:

name – the property name to expand, excluding an origin

Returns:

a tuple of ‘steps’

Raises:

ValueError – when a provided name is not defined by the trace model or is missing required parts
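The expansion described above can be illustrated with a toy model. This sketch assumes steps are ordered as category, type, property, following the nesting in the TraceBuilder.update example on this page ({'extracted': {'file': {'name': ...}}}); expand_sketch and its dictionary model are hypothetical stand-ins for TraceModel.expand and Hansken’s actual trace model:

```python
def expand_sketch(name, model):
    """Expand a dotted property name into (category, type, property) steps.

    model is a hypothetical stand-in for the trace model, mapping
    type name -> property name -> category, e.g. {'file': {'name': 'extracted'}}.
    Only two-level names ('type.property') are handled in this sketch.
    """
    steps = name.split('.')
    categories = {category
                  for properties in model.values()
                  for category in properties.values()}
    # a leading category given by the caller is kept aside and checked later
    given = steps.pop(0) if len(steps) == 3 and steps[0] in categories else None
    if len(steps) != 2:
        raise ValueError('expected <type>.<property>, got {!r}'.format(name))
    type_name, prop = steps
    try:
        category = model[type_name][prop]
    except KeyError:
        raise ValueError('{!r} is not defined by the trace model'.format(name)) from None
    if given is not None and given != category:
        raise ValueError('{!r} does not belong to category {!r}'.format(name, given))
    # insert the category looked up in the model; no origin is included
    return (category, type_name, prop)
```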

get_serializer(steps)

class TraceBuilder

Bases: DictView

Utility class to aid in creating user-defined traces or updating existing ones. A TraceBuilder is a trace model aware view on a nested mapping, using the trace model both to validate requested updates and to find the correct place for values in the nested mapping.

This class is not intended for direct user instantiation, see

update(key_or_updates, value=None)

Add new metadata properties to this builder, or overwrite existing ones.

key_or_updates can mix dotted properties and nested structures; all keys and values are merged before applying updates. A TraceModel is used to find the proper fully qualified property names if needed, allowing, for example, both update('file.name', 'File Name') and update({'extracted': {'file': {'name': 'file name'}}}).

Parameters:
  • key_or_updates – either a str (the metadata property to be updated) or a mapping supplying both keys and values to be updated

  • value – the value to update metadata property key to (used only when key_or_updates is a str)

Returns:

this TraceBuilder
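Merging dotted keys and nested structures into one mapping can be sketched with a small recursive helper. merge_updates is hypothetical: unlike TraceBuilder.update it consults no TraceModel, so it neither validates names nor inserts a category such as 'extracted':

```python
def merge_updates(key_or_updates, value=None):
    """Turn a dotted key/value pair or a (possibly mixed) mapping into one nested dict.

    Hypothetical helper; no trace model is consulted.
    """
    if isinstance(key_or_updates, str):
        key_or_updates = {key_or_updates: value}
    result = {}
    for key, val in key_or_updates.items():
        _merge_into(result, key.split('.'), val)
    return result


def _merge_into(target, steps, value):
    # walk (and create where needed) nested dicts for all but the last step
    for step in steps[:-1]:
        target = target.setdefault(step, {})
    last = steps[-1]
    if isinstance(value, dict):
        # nested structures are merged key by key and may contain dotted keys themselves
        node = target.setdefault(last, {})
        for key, val in value.items():
            _merge_into(node, key.split('.'), val)
    else:
        target[last] = value
```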

add_data(stream, data)

Add data to this trace as a named stream.

Parameters:
  • stream – name of the data stream to be added

  • data – data to be attached

Returns:

this TraceBuilder

property updates

A collection of updates tracked by this TraceBuilder.

property context

The ProjectContext instance that created this TraceBuilder.

property target

The combination of (project id, parent trace uid) this TraceBuilder applies to.

child_builder(name=None)

Creates a new TraceBuilder to build a child of the trace represented by this builder.

Note

Parent TraceBuilders should be built (using build()) before their child builders, as the unique trace identifier (uid) of the parent is needed to build a child trace.

Parameters:

name – name of the new child trace

Returns:

a TraceBuilder set up to save a new trace as the child trace of this builder

build()

Save the trace being built by this builder to remote.

Note

If this TraceBuilder was put in debug mode, the trace is not sent to remote but is instead logged at warning level.

Returns:

the new trace’s uid (or None in debug mode)
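The ordering constraint for parent and child builders can be shown with the methods documented on this page. A sketch; add_child_with_grandchild is a hypothetical helper, and storing the data as the 'raw' stream is an illustrative choice:

```python
def add_child_with_grandchild(trace, child_name, grandchild_name, data=b''):
    """Create a child of trace and a grandchild below it, building parents first.

    Hypothetical helper composed from Trace.child_builder, TraceBuilder.child_builder,
    TraceBuilder.add_data and TraceBuilder.build; returns the uids of both new traces.
    """
    child = trace.child_builder(child_name)
    # build the parent first: its uid is needed before a child of it can be built
    child_uid = child.build()

    grandchild = child.child_builder(grandchild_name)
    grandchild.add_data('raw', data)
    grandchild_uid = grandchild.build()
    return child_uid, grandchild_uid
```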

class Snippet(source, separator='.')

Bases: DictView

Snippet result, enabling rendering of a highlighted snippet of text content. Usable as a dictionary where key 'content' contains a snippet of text and key 'highlights' contains a list of dictionaries encoding highlighted terms in the content.

render(start='[[', end=']]')

Render this snippet by surrounding highlights with start and end marker strings, e.g.:

>>> my_snippet.render()
'A [[snippet]] with the term "[[snippet]]" highlighted.'
>>> my_snippet.render(start='<em>', end='</em>')
'A <em>snippet</em> with the term "<em>snippet</em>" highlighted.'

Parameters:
  • start – start marker around highlights

  • end – end marker around highlights

Returns:

this Snippet, highlighted as a str
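Rendering boils down to walking the highlights in order and wrapping each highlighted range in the markers. A minimal sketch, assuming each highlight is a dict with character offsets under hypothetical 'start' and 'end' keys; the highlight dictionaries produced by Hansken may use a different structure:

```python
def render_sketch(content, highlights, start='[[', end=']]'):
    """Surround highlighted character ranges in content with start and end markers.

    Stand-in for Snippet.render; the 'start'/'end' offset keys are hypothetical.
    """
    parts = []
    position = 0
    for highlight in sorted(highlights, key=lambda h: h['start']):
        # copy the unhighlighted text up to this highlight, then the marked term
        parts.append(content[position:highlight['start']])
        parts.append(start + content[highlight['start']:highlight['end']] + end)
        position = highlight['end']
    parts.append(content[position:])
    return ''.join(parts)
```

Overlapping highlights are not handled by this sketch.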