hansken.remote — Communication with Hansken

Todo

  • Introduction to goal of module, default use

  • Mention error handling, http errors are propagated

class ProjectContext[source]

Bases: object

Utility class adding a particular project id to each REST call that requires one. Provided with a url to a Hansken gatekeeper and the id of a project, methods of this class will make REST requests to the gatekeeper, wrapping results with a Trace class or iterator where applicable.

ProjectContext instances are usable as context managers, which opens the context for use. Calls to methods requiring an initialized connection will automatically open the context if it wasn’t yet opened.
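
A minimal sketch of typical use (the gatekeeper URL and project id below are placeholders):

from hansken.remote import ProjectContext

# placeholder endpoint and project id, substitute those of an actual deployment
with ProjectContext('https://hansken.nl/gatekeeper', 'example-project-id') as context:
    # list the images linked to the project
    for image in context.images():
        print(image)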

__init__(base_url_or_connection, project_id, keystore_url=None, preference_url=None, auth=None, auto_activate=True, connection_pool_size=None, verify=True)[source]

Creates a new context object facilitating communication with a Hansken gatekeeper. Uses a Connection to track session state; an existing Connection instance may also be supplied. The provided project id is used for all calls.

Parameters:
  • base_url_or_connection – HTTP endpoint to a Hansken gatekeeper (e.g. https://hansken.nl/gatekeeper) or a Connection instance

  • project_id – project id to associate with

  • keystore_url – HTTP endpoint to a Hansken keystore (e.g. https://hansken.nl/keystore)

  • preference_url – HTTP endpoint to a Hansken preference service

  • auth – HanskenAuthBase instance to handle authentication, or None

  • auto_activate – whether the project should automatically be activated if it is currently deactivated

  • connection_pool_size – maximum size of HTTP(S) connection pool

  • verify – how to check SSL/TLS certificates: True to verify certificates (the default), False to not verify certificates, or a path to a certificate file to verify certificates against a specific certificate bundle

open()[source]
close()[source]
image(image_id)[source]
images()[source]
image_name(image_id)[source]

Retrieves the name (falling back to description if that isn’t available) of an image, identified by its id.

Note

Results of this call are cached inside the ProjectContext, making repeated calls to this method cheap. See Python’s documentation on functools.lru_cache. This cache is cleared when the ProjectContext is closed (this includes it being used as a context manager).

Parameters:

image_id – the id of the image to name

Returns:

image_id’s name

key(image_id)[source]

Retrieve the key for image identified by image_id.

Warning

hansken.py cannot distinguish between an image not needing a key and a user that is not authorized to retrieve data from an image. In both cases, the result of this method is None.

Note

Results of this call are cached inside the ProjectContext, making repeated calls to this method cheap. See Python’s documentation on functools.lru_cache. This cache is cleared when the ProjectContext is closed (this includes it being used as a context manager).

Parameters:

image_id – the id of the image to retrieve the key for

Returns:

image_id’s key, or None

roots()[source]
trace(trace_uid)[source]
descriptor(trace_uid, stream='raw', key=<auto-fetch>)[source]

Retrieve the data descriptor for a named stream (default raw) for a particular trace.

Parameters:
  • trace_uid – the trace to retrieve the data descriptor for

  • stream – stream to get the descriptor for

  • key – key for the image of this trace (default is to fetch the key automatically, if it’s available)

Returns:

the stream’s data descriptor (as-is)

data(trace_uid, stream='raw', offset=0, size=None, key=<auto-fetch>)[source]
snippets(*snippets, keys=<auto-fetch>, highlights=None)[source]

Retrieves textual snippets of data, optionally highlighting term hits within the resulting text.

Parameters:
  • snippets – snippet request dicts

  • keys – keys required for data access, either fetch to automatically attach the corresponding keys to each snippet, a dict mapping image ids to (binary) key data or None

  • highlights – collection of terms to be highlighted in all snippet results (highlights provided here are added to all requests in snippets)

Returns:

a list of snippet response dicts, index-matched to snippets

children(trace_uid, query=None, start=0, count=None, sort=None, facets=None, snippets=None, timeout=None)[source]

Searches the children of a trace. See search and SearchResult.

Parameters:
  • trace_uid – id of the trace to retrieve children for

  • query – the query to submit

  • start – the start offset of the retrieved result

  • count – max number of traces to retrieve

  • sort – sort clause(s) of the form score-, some.field

  • facets – facet(s) to be used for search (str, Facet or sequence of either)

  • snippets – maximum number of snippets to return per trace

  • timeout – the maximum amount of time to wait for a search response, in seconds or as a datetime.timedelta (defaults to letting the remote decide), set to 0 to disable timeout

Returns:

a trace stream (iterable once)
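
For example, iterating some of a trace’s children could look like this (the trace uid and query string are hypothetical):

# retrieve up to 25 children of a trace matching a query
for child in context.children('example-trace-uid', query='type:picture', count=25):
    print(child.uid, child.name)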

note(trace_uid, note, refresh=None)[source]
tag(trace_uid, tag, refresh=None)[source]
delete_tag(trace_uid, tag, refresh=None)[source]
mark_privileged(trace_uid, status, refresh=None)[source]
child_builder(parent_uid)[source]

Create a TraceBuilder to build a trace to be saved as a child of the trace identified by parent_uid. Note that a new trace will only be added to the index once explicitly saved (e.g. through TraceBuilder.build).

Parameters:

parent_uid – a trace identifier to create a child builder for

Returns:

a TraceBuilder set up to save a new trace as the child trace of parent_uid
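
A sketch of adding a child trace this way, assuming TraceBuilder exposes an update method alongside build (the parent uid and property values are hypothetical):

# nothing is saved to the index until build() is called
builder = context.child_builder('example-parent-uid')
builder.update({'name': 'carved-file.bin'})
new_trace_id = builder.build()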

update_trace(trace, updates=None, data=None, key=<auto-fetch>, overwrite=False)[source]

Updates metadata of trace to reflect requested updates in the user origin. Does not edit trace in-place.

Please note that, for performance reasons, all changes are buffered and not directly effective in subsequent search, update and import requests. As a consequence, successive changes to a single trace might be ignored. Instead, all changes to an individual trace should be bundled in a single update or import request. The project index is refreshed automatically (by default every 30 seconds), so changes will become visible eventually.

Parameters:
  • trace – Trace to be updated

  • updates – metadata properties to be added or updated, mapped to new values in a (possibly nested) dict

  • data – a dict mapping data type / stream name to bytes to be imported

  • key – the key data for the image trace belongs to, must be fetch, None or binary (bytes)

  • overwrite – whether properties to be imported should be overwritten if already present

Returns:

processing information from remote
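
Bundling all changes to a trace in a single call could look as follows (the trace uid and property are hypothetical):

# retrieve a trace and apply all metadata changes in one update request
trace = context.trace('example-trace-uid')
context.update_trace(trace, updates={'name': 'renamed.txt'})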

search(query=None, start=0, count=None, sort='uid', facets=None, snippets=None, select=<all>, incomplete_results=None, deduplicate_field=None, timeout=None)[source]

Performs a search request, wrapping the result in a sequence that allows iteration, indexing and slicing, automatically fetching batches of results when needed. The returned sequence keeps state and is not threadsafe.

Note

Supplying a value for count is required when using a start offset. The maximum size of such a batch is dictated by the Hansken REST API and will usually be 200; requesting a batch larger than that maximum will raise an error.

Additionally, neither start nor count influences the result of the len builtin on the result set, only the number of traces that iterating the result will yield; see SearchResult.

Parameters:
  • query – the query to submit

  • start – the start offset of the retrieved result

  • count – max number of traces to retrieve

  • sort – sort clause(s) of the form score-, some.field, defaults to sorting on uid

  • facets – facet(s) to be used for search (str, Facet or sequence of either)

  • snippets – maximum number of snippets to return per trace

  • select – property selector(s), defaults to all properties

  • incomplete_results – whether to allow results from an incomplete index (defaults to letting the remote decide, which will typically not allow results from an incomplete index)

  • deduplicate_field – which single value field to deduplicate on (defaults to None: the results are not deduplicated)

  • timeout – the maximum amount of time to wait for a search response, in seconds or as a datetime.timedelta (defaults to letting the remote decide), set to 0 to disable timeout

Returns:

a Trace stream (iterable once)

Return type:

SearchResult
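
A sketch of a batched search (the query string is a hypothetical example):

# fetch a single batch of 50 traces, starting at offset 100
batch = context.search('type:email', start=100, count=50)
for trace in batch:
    print(trace.uid)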

queue_cleaning(query=None, clean_priority='low')[source]

Performs a search request built from the parameters, and puts the results on the cleaner queue with the given priority.

Parameters:
  • query – the query to submit

  • clean_priority – the priority to use for cleaning the traces

Returns:

True on success (failure will likely result in an HTTPError)

delete_cleaned_data(*, trace_uid=None, image_id=None, data_type=None)[source]

Deletes cleaned data from the cache. This can be used to force a re-clean of the trace data: the next time the cleaned data is requested for one of these traces, the cleaner will run again.

This can be done at 4 different levels:

  • data_type: delete cleaned data for a specific data type belonging to a specific trace (supply both trace_uid and data_type).

  • trace: delete all cleaned data from the datastore (cache) for a trace (supply only trace_uid).

  • image: delete all cleaned data from the cache for all traces in the given image (supply only image_id).

  • project: delete all cleaned data for all images of the project (supply no arguments).

Parameters:
  • trace_uid – uid of the trace of which the cleaned data will be deleted

  • image_id – id of the image of which the cleaned data will be deleted

  • data_type – type of data of the trace to clean (raw, encrypted, ...)
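
A sketch of two of those levels (the ids are hypothetical; the project-level variant passes no arguments, as the context already knows the project):

# re-clean a single data type of one trace
context.delete_cleaned_data(trace_uid='example-trace-uid', data_type='raw')
# drop all cleaned data for the entire project
context.delete_cleaned_data()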

search_tracelets(tracelet_type, query=None, start=0, count=None, sort='id', select=<all>, incomplete_results=None, timeout=None)[source]

Performs a search request for tracelets, wrapping the result in an iterator of DictView objects.

Parameters:
  • tracelet_type – the type of tracelet to search for

  • query – query to match tracelets

  • start – the start offset of the retrieved result

  • count – max number of tracelets to retrieve

  • sort – sort clause(s) of the form score-, some.field, defaults to sorting on id

  • select – property selector(s), defaults to all properties

  • incomplete_results – whether to allow results from an incomplete index (defaults to letting the remote decide, which will typically not allow results from an incomplete index)

  • timeout – the maximum amount of time to wait for a search response, in seconds or as a datetime.timedelta (defaults to letting the remote decide), set to 0 to disable timeout

Returns:

a tracelet stream containing DictView instances (iterable once)

Return type:

TraceletSearchResult

unique_values(select, query=None, after=None, count=None, incomplete_results=None, timeout=None)[source]

Retrieves unique values for the selected property or properties with their corresponding counts. The set of traces or tracelets from which these values are taken can be controlled with the query argument (though also take note of the particulars of the query argument below). To retrieve unique addressees for deleted email traces, the following could be used:

from hansken.query import Term

# define a query to select only traces that are marked as deleted
query = Term('type', 'deleted')
# retrieve unique values for the "email.to" property
for result in context.unique_values('email.to', query):
    print(result['count'], result['value'])

Note

Use trace:{query} to apply the query to the trace itself if the property in select is part of a tracelet (e.g. entity.value). To filter by traces of type email, for example, add trace:{type:email}.

Parameters:
  • select – the property or properties to select values for

  • query – query to match traces or tracelets to take values from

  • after – starting point for the result stream

  • count – max number of values to retrieve

  • incomplete_results – whether to allow results from an incomplete index (defaults to letting the remote decide, which will typically not allow results from an incomplete index)

  • timeout – the maximum amount of time to wait for a search response, in seconds or as a datetime.timedelta (defaults to letting the remote decide), set to 0 to disable timeout

Returns:

a stream of values with counters (iterable once), sorted by the natural order of the values

Return type:

ValueSearchResult

suggest(text, query_property='text', count=100)[source]

Expand a search term from terms in the index.

Use key 'text' for a plain version of the expanded term, key 'highlighted' for a highlighted version:

# get the plain expansion of terms starting with "example"
texts = [suggestion['text'] for suggestion in context.suggest('example')]

# get highlighted expansions for file names starting with "example"
highlighted = [suggestion['highlighted']
               for suggestion in context.suggest('example',
                                                 query_property='file.name')]
# values will contain square brackets to show what was expanded, e.g.
# 'example[[.exe]]' or 'example[[file.dat]]'
Parameters:
  • text – the text / search term to be expanded

  • query_property – the search property to expand terms for (e.g. 'file.name' or 'text')

  • count – the maximum number of suggestions to return

Returns:

list of suggestions

task(task_id)[source]
tasks(state='open', start=None, end=None)[source]

Request a listing of tasks for this project.

Parameters:
  • state – the state of tasks to be listed, can be either 'open' or 'closed'

  • start – an optional datetime.date, datetime.datetime or ISO 8601-formatted str to limit the tasks to be listed to those having their relevant moment after start

  • end – an optional datetime.date, datetime.datetime or ISO 8601-formatted str to limit the tasks to be listed to those having their relevant moment before end

Returns:

a list of tasks
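
For example (the date is arbitrary):

from datetime import date

# list closed tasks for this project since January 1st
for task in context.tasks(state='closed', start=date(2024, 1, 1)):
    print(task)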

singlefile()[source]
extract_singlefile(tools=None, configuration=None, query=None)[source]

Schedules the extraction for the image of the singlefile.

singlefile_traces()[source]

Get all traces within a singlefile.

Returns:

a Trace stream (iterable once)

Return type:

SearchResult

singlefile_tracelets(tracelet_type)[source]

Get tracelets of a certain type within a singlefile, wrapping the result in an iterator of DictView objects.

Parameters:

tracelet_type – the type of tracelet to search for

Returns:

a tracelet stream containing DictView instances (iterable once)

Return type:

TraceletSearchResult

add_trace_data(trace_uid, data_type, data, key=<auto-fetch>)[source]

Uploads a single data stream for a specific trace.

Parameters:
  • trace_uid – uid of the trace

  • data_type – the name of the data stream to upload data to. Allowed data types can be found in the trace model

  • data – the data to be uploaded, either bytes, a file-like object or an iterable providing bytes

  • key – the key data for the image trace belongs to, must be fetch, None or binary (bytes)

Returns:

True on success (failure will likely result in an HTTPError)
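
A sketch of uploading a data stream (the trace uid and data type are hypothetical examples; consult the trace model for the allowed data types):

# attach a text stream to an existing trace
with open('transcription.txt', 'rb') as stream:
    context.add_trace_data('example-trace-uid', 'text', stream)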

class SearchResult[source]

Bases: Iterable

Stream of traces generated from a remote JSON-encoded stream. Note that this result itself can be iterated only once, but Trace instances obtained from it do not rely on the result they’ve been obtained from.

See ProjectContext.search.

Getting results from a SearchResult can be done in one of three ways:

Treating the result as an iterable:

for trace in result:
    print(trace.name)

Calling take to process one or more batches of traces:

first_100 = result.take(100)
process_batch(first_100)

Calling takeone to get a single trace:

first = result.takeone()
second = result.takeone()

print(first.name, second.name)

If indices of traces within a result are needed, iteration can be combined with enumerate:

for idx, trace in enumerate(result):
    print(idx, trace.name)

Additional result info can be included using including:

# note that score will only have a value if the search was sorted on it
for trace, score in result.including('score'):
    print(score, trace.name)

Note

The underlying response from the REST API for a SearchResult is a stream. This means that hansken.py keeps the connection used for the result open for as long as needed. As a side-effect, the underlying stream can time out if not consumed quickly enough. Consuming at least 100 traces a minute from a SearchResult should keep timeouts at bay, as a rule of thumb.

Depending on the arguments to ProjectContext.search, a SearchResult might be able to auto-resume iteration of results in the event of errors (such as timeouts).

To explicitly release resources used by a SearchResult, use it as a context manager or manually call close.
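
For example (the query string is hypothetical):

# release the underlying stream as soon as a single trace is retrieved
with context.search('type:file') as result:
    first = result.takeone()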

Warning

The truth value of a SearchResult is based on the expectation of additional traces that can be read from its stream. This is only known for sure when the end of that stream is reached. Avoid code that requires a SearchResult to be truthy before retrieving its results; use iteration wherever possible.

Additionally, the num_results property provides the total number of results for the search call that produced this SearchResult; this does not necessarily align with the number of result objects that can be retrieved from it (e.g. when the count parameter of the search call was used).

Depending on the request and index, the num_results property may not provide an exact number. To help understand this value, you may use the num_results_quality property. This property indicates if num_results is exact (equalTo), a lower bound (equalToOrGreaterThan) or an estimate (estimate). Older Hansken versions may not set this field.

property facets

The facets returned with the search result, a list of FacetResult instances.

takeone(include=None)[source]

Takes the next trace from this result. This method may be called repeatedly.

Parameters:

include – name (str) or names to be included in the result, see including

Returns:

the next trace in the result, or None if it’s exhausted

Return type:

Trace, None or a namedtuple if include was non-empty

take(num, include=None)[source]

Takes num traces from this result. Subsequent calls will produce batches of traces as if using slicing.

Parameters:
  • num – amount of traces to take

  • include – name (str) or names to be included in the result, see including

Returns:

a list of Trace or namedtuple instances, of at most num size (or empty if the result is exhausted)

Return type:

list

including(*names)[source]

Iterates this result, yielding the requested named attributes accompanying a trace along with the trace. The trace is always the leftmost value in the tuples yielded by this method. Other attributes are included in parameter order and can be None.

# the score for each search result is only available when sorted on score
# iteration yields a 2-tuple, the trace with 1 additional attribute
for trace, score in context.search(sort='score-').including('score'):
    print(score, trace.uid)
Parameters:

names – attributes to include in the yielded tuple

Returns:

a generator, yielding namedtuple instances

close()[source]

Closes the file descriptor used by the underlying JSON stream. After calling this method, no more traces can be obtained from this SearchResult (previously retrieved traces will remain usable).


In some cases, results of multiple search requests might need to be combined. As SearchResult is an iterable object, multiple instances can be chained together using utilities from Python’s itertools:

# chain 'stitches' together multiple iterable objects into one
from itertools import chain

# do two queries, both with their own results
result1 = context.search('query1')
result2 = context.search('query2')

# chain both results together
combined_result = chain(result1, result2)

# the combined result can now be treated as a new iterable object yielding traces
for trace in combined_result:
    print(trace.id, trace.name)

A construction like this comes in handy when results based on a (large) set of arguments need to be combined. Generating a list of file names matching hashes listed in a file (assuming that list is too long to be used with a single Or query) could be done as follows:

from itertools import chain

from hansken.query import Term

# define a function to read hashes from file
def sha1s_from(fname):
    with open(fname) as f:
        # read hashes as lines of text in f, strip leading and trailing
        # whitespace (avoid empty lines in the input file, dealing with
        # that is left as an exercise to the reader :))
        return [line.strip() for line in f]

# create a chain from a generator expression (which is iterable)
# a generator expression is lazy, making sure the call to search() is done
# at the time it's needed
results = chain.from_iterable(context.search(Term('data.raw.hash.sha1', sha1))
                              for sha1 in sha1s_from('sha1s.list'))

# results is iterable like before
for trace in results:
    print(trace.id, trace.name)

Note that after chaining, only iteration of the result remains possible; the chain won’t have methods like take or takeone. It could, however, be used in conjunction with something like to_csv, passing the chained result as the traces argument.


Should network or JSON parsing overhead ever become an issue, Hansken makes it possible to select which properties are returned per trace. Use the select keyword in search calls for this:

traces = context.search('query', select=['name', 'uid'])

for trace in traces:
    print(trace.name, trace.uid)

The above snippet has no need for a trace’s processing information or any extracted properties. Note that a trace’s types are still available and attribute access like trace.email.subject won’t cause errors, but will produce None values unless selected with the search call.

A number of properties will always be returned and can’t be ‘unselected’. The Hansken remote controls this set of properties.


class TraceletSearchResult[source]

Bases: SearchResult

Stream of tracelets generated from a remote JSON-encoded stream. Note that this result itself can be iterated only once, but elements obtained from it do not rely on the result they’ve been obtained from.

See ProjectContext.search_tracelets.

Note

The same notes and caveats expressed for SearchResult apply to the TraceletSearchResult.

class ValueSearchResult[source]

Bases: SearchResult

Stream of values generated from a remote JSON-encoded stream. Note that this result itself can be iterated only once, but elements obtained from it do not rely on the result they’ve been obtained from.

The elements obtained from this result act like dicts, with at least the keys value and count, representing a value for the selected property and the number of times it occurs, respectively. Both of these depend on the query that was supplied to the call that created this result, potentially omitting values or lowering occurrence counts when compared to that same call without a query.

Note

The same notes and caveats expressed for SearchResult apply to the ValueSearchResult.

class FacetResult[source]

Bases: Mapping

Ordered mapping containing facet results. Iteration yields counter labels; values are named tuples with the following attributes:

  • label: a provided label for a bucket

  • value: the value for a bucket (the actual value of the start of a bucket)

  • count: the number of hits for the labeled bucket within a search

  • count_error: if count_error > 0, count is a minimum, and the true count is between [count, count + count_error]

Use as such:

# search a project for all files and get a facet for the file extensions
results = context.search(query=Term('type', 'file'),
                         facets=Facet('file.extension'))
# in case you're only interested in the facet, also supply count=0 to the search() call
# this avoids making Hansken's REST API retrieve all the traces for that search result

# the facet will be available on the result
facet = results.facets[0]

# different ways to use a FacetResult
for label in facet:
    print(label, facet[label].count)

for label, counter in facet.items():
    print(label, counter.count)

for counter in facet.values():
    print(counter.label, f'[{counter.count}-{counter.count + counter.count_error}]')
Counter

alias of FacetCounter

class Connection[source]

Bases: object

Base remote connection establishing a session with a remote gatekeeper. Exposes many calls from the REST API as methods that perform the associated HTTP requests. Calls returning JSON-encoded content will return the decoded Python equivalent (e.g. dict, list).
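
A minimal sketch of direct Connection use (the gatekeeper URL is a placeholder; open and close are documented below):

from hansken.remote import Connection

connection = Connection('https://hansken.nl/gatekeeper')
connection.open()
try:
    # request the remote's version information
    print(connection.version())
finally:
    connection.close()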

__init__(base_url, keystore_url=None, preference_url=None, auth=None, connection_pool_size=None, verify=True)[source]

Creates a new connection object facilitating communication with a Hansken gatekeeper, tracking session information provided by the remote end. Note that the username and password arguments are stored on the Connection object for future reuse; the value for password is wrapped in a lambda if it is supplied as a plain value. For production use, getpass.getpass should be supplied here, causing a non-echoing password prompt whenever a password is needed. An authenticated session will likely be kept alive if requests are made periodically.

Parameters:
  • base_url – HTTP endpoint to a Hansken gatekeeper (e.g. https://hansken.nl/gatekeeper)

  • keystore_url – HTTP endpoint to a Hansken keystore (e.g. https://hansken.nl/keystore)

  • preference_url – HTTP endpoint to a Hansken preference service (e.g. https://hansken.nl/preference)

  • auth – HanskenAuthBase instance to handle authentication, or None

  • connection_pool_size – maximum size of HTTP(S) connection pool

  • verify – how to check SSL/TLS certificates: True to verify certificates (the default), False to not verify certificates, or a path to a certificate file to verify certificates against a specific certificate bundle

open()[source]

Establishes a session with the remote(s). Authentication is assumed to be handled by the auth within the session.

Returns:

self

close(check_response=True)[source]

Explicitly ends the session established with the remote(s).

Parameters:

check_response – whether to check the response(s) for the expected status code, raising errors on unsuccessful logout(s)

Returns:

self

url(*path)[source]

Glues parts of a url path to the base url for this connection using /s, dropping any steps that are None.

Parameters:

path – steps to join to the base url

Returns:

a full url

key_url(*path)[source]

Glues parts of a url path to the keystore url for this connection using /s, dropping any steps that are None.

Parameters:

path – steps to join to the keystore url

Returns:

a full url

preferences_url(*path)[source]

Glues parts of a url path to the preference url for this connection using /s, dropping any steps that are None.

Parameters:

path – steps to join to the preference url

Returns:

a full url

single_preference_url(key, visibility)[source]

Creates the url to retrieve, update or delete a single specific preference.

Parameters:
  • visibility – visibility of the preference

  • key – key indicating the preference

Returns:

a full url

version()[source]

Retrieves the version info reported by the remote.

Returns:

a dict with keys build and timestamp

current_user()[source]

Retrieves information on the current user from the remote.

Returns:

a dict with information and identifiers for the current user

property identity

The current user’s identity for the current session.

Returns:

the currently available user identity of the form <user>@<domain>

identity_uid_at(service_url)[source]
identity_uid()[source]

Retrieves the current user’s identity as seen by the remote gatekeeper.

Returns:

uid of the form <user>@<domain>

match_identity(service_url)[source]
key(image_id, identity=None, raise_on_not_found=True)[source]

Retrieves the key for an image with the provided id, using the provided identity or that of the current user.

Warning

hansken.py cannot distinguish between an image not needing a key and a user that is not authorized to retrieve data from an image. In both cases, the remote key service would respond with an error that is propagated to the caller by default. Internal uses of this method will typically set raise_on_not_found to False, potentially delaying errors to the point of (unauthorized) data access.

See also fetch.

Parameters:
  • image_id – the image to retrieve the key for

  • identity – the identity to retrieve the key for, defaults to the identity of the current user

  • raise_on_not_found – if False, return None when no key is available for (image_id, identity), otherwise raise an HTTPError like any other request

Returns:

a key for image image_id

Return type:

bytes
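
A sketch of the non-raising variant (the image id is hypothetical):

# return None instead of raising when no key is available
key = connection.key('example-image-id', raise_on_not_found=False)
if key is None:
    print('no key stored (or not authorized to retrieve it)')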

store_key(image_id, key, identity=None)[source]

Stores the key for an image, using the provided or current user’s identity.

Parameters:
  • image_id – the image to store the key for

  • key – the (binary) key content

  • identity – the identity to store the key for, defaults to the identity of the current user

Returns:

True on success (failure will likely result in an HTTPError)

delete_key(image_id, identity=None)[source]

Deletes the key for an image, using the provided or current user’s identity.

Parameters:
  • image_id – the image to delete the key for

  • identity – the identity to delete the key for, defaults to the identity of the current user

Returns:

True on success (failure will likely result in an HTTPError)

preference(key, visibility='preferred', project_id=None)[source]

Retrieves a preference. Requires a project id if the visibility is set to project.

Parameters:
  • key – key to identify the preference

  • visibility – visibility of the preference; one of public, private, project or preferred

  • project_id – id of the project if the preference has project wide visibility

Returns:

the preference

preferences(visibility='preferred', project_id=None)[source]

Retrieves the list of preferences with the given visibility. Requires a project id if the visibility is set to project.

Parameters:
  • visibility – visibility of the preferences; one of public, private, project or preferred

  • project_id – id of the project; required when visibility is set to project

Returns:

list of preferences

create_preference(key, visibility, value, project_id=None)[source]

Creates a new preference. Requires a project id if the visibility is set to project.

Parameters:
  • key – key to identify the preference

  • visibility – visibility of the preference; one of public, private or project

  • value – value of the preference

  • project_id – id of the project if the preference has project wide visibility

Returns:

True on success (failure will likely result in an HTTPError)

edit_preference(key, visibility, value, project_id=None)[source]

Edits an existing preference by updating its value. Requires a project id if the visibility is set to project.

Parameters:
  • key – key to identify the preference

  • visibility – visibility of the preference; one of public, private or project

  • value – new value of the preference

  • project_id – id of the project if the preference has project wide visibility

Returns:

True on success (failure will likely result in an HTTPError)

delete_preference(key, visibility, project_id=None)[source]

Deletes an existing preference. Requires a project id if the visibility is set to project.

Parameters:
  • key – key to identify the preference

  • visibility – visibility of the preference; one of public, private or project

  • project_id – id of the project if the preference has project wide visibility

Returns:

True on success (failure will likely result in an HTTPError)
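
A sketch of the full preference round trip (the key and values are hypothetical):

# create, read, update and delete a private preference
connection.create_preference('ui.theme', 'private', 'dark')
print(connection.preference('ui.theme', visibility='private'))
connection.edit_preference('ui.theme', 'private', 'light')
connection.delete_preference('ui.theme', 'private')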

projects()[source]
project(project_id)[source]
activate_project(project_id)[source]
deactivate_project(project_id)[source]
project_images(project_id)[source]
create_project(**kwargs)[source]

Creates a new meta project. The new project’s properties are taken from kwargs, and translated to Hansken property names (camelCased). Specifying the new project’s id is allowed.

Parameters:

kwargs – metadata properties for the new project

Returns:

the new project’s identifier (a str(UUID))
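
For example (the property names and values are hypothetical):

# kwargs are translated to camelCased Hansken property names
project_id = connection.create_project(name='example case',
                                       description='created from hansken.py')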

edit_project(project_id, **kwargs)[source]
wait_for_task(task_id, poll_interval_sec=1, not_found_means_completed=False, progress_callback=None)[source]

Wait for a Hansken task to be done.

Parameters:
  • task_id – task to wait for

  • poll_interval_sec – the interval in seconds between polling

  • not_found_means_completed – whether to consider a task not being found a valid (completed) state. This is relevant for project deletion tasks, after which the task itself will be gone as well.

  • progress_callback – method to be called every time the status is polled; it should take two arguments (poll_count: int, progress: double in [0..1]). If not provided, the callback is skipped.

Returns:

the status of the task when done (cancelled, failed, completed)
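
A sketch of polling with a progress callback (the task id is hypothetical):

def report(poll_count, progress):
    # called on every poll with the current progress fraction
    print(f'poll {poll_count}: {progress:.0%} done')

status = connection.wait_for_task('example-task-id',
                                  poll_interval_sec=2,
                                  progress_callback=report)
print('task ended in state', status)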

delete_project(project_id, delete_images=False, delete_preferences=False)[source]

Deletes a meta project. Note that this will also remove a project’s search index and preferences.

Parameters:
  • project_id – project to be deleted

  • delete_images – whether to also delete the images linked to the project

  • delete_preferences – whether to also delete the preferences linked to the project (the evidence container, for example)

Returns:

a success-value, whether the deletion of the project and all images (if requested) has succeeded (see log output for errors)

reindex_project(project_id, shards)[source]

Re-indexes a project to a specified number of shards.

Parameters:
  • project_id – project to be re-indexed

  • shards – the desired number of shards for the new index

Returns:

the task_id of the re-index task

clone_project(source_id, target_id, filter=None, exclude=None)[source]

Clones traces of a project identified by source_id to a project target_id. When filter (a query) is provided, only traces matching the filter are copied.

Parameters:
  • source_id – project to copy traces from

  • target_id – project to copy traces to

  • filter – query to define what traces to copy

  • exclude – properties to be excluded from the clone (not all properties are supported for this, supplying an unsupported property here will cause errors)

Returns:

the task_id of the clone task
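
For example, cloning only email traces and waiting for the task to complete (the project ids and query string are hypothetical):

task_id = connection.clone_project('source-project-id', 'target-project-id',
                                   filter='type:email')
connection.wait_for_task(task_id)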

image(image_id, project_id=None)[source]
create_image(**kwargs)[source]

Creates a new meta image. The new image’s properties are taken from kwargs, and translated to Hansken property names (camelCased). Specifying the new image’s id is allowed. Property user defaults to the current session’s identity; overrides are accepted.

Parameters:

kwargs – metadata properties for the new image

Returns:

the new image’s identifier (a str(UUID))

edit_image(image_id, **kwargs)[source]
delete_image(image_id)[source]
images(**kwargs)[source]
trace_model(project_id=None)[source]
trace(project_id, trace_uid)[source]
descriptor(project_id, trace_uid, stream='raw', key=<auto-fetch>)[source]
data(project_id, trace_uid, stream='raw', offset=0, size=None, key=<auto-fetch>, bufsize=8192)[source]

Opens a streaming read request that provides the requested data stream of a particular trace. Note that this provides a stream that should be closed by the user after use.

Parameters:
  • project_id – the project to associate the request with

  • trace_uid – the trace identifier to read from

  • stream – the stream to read (e.g. raw, plain, html)

  • offset – the offset within the data stream

  • size – the amount of data to provide

  • key – the key data to be used to decrypt data, must be either fetch, None or binary (bytes)

  • bufsize – the network buffer size

Returns:

a closable file-like object

Return type:

io.BufferedReader
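
A sketch of streaming trace data (the ids are hypothetical):

# read the first 4 KiB of the raw stream, closing the stream after use
with connection.data('example-project-id', 'example-trace-uid',
                     offset=0, size=4096) as stream:
    chunk = stream.read()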

create_resource(*, data, **kwargs)[source]

Creates a Hansken resource in two stages, registering the resource metadata before uploading the corresponding data.

Note

Required parameters for a resource are not validated by hansken.py. Missing parameters like group, name or version will likely result in HTTPError exceptions.

Parameters:
  • data – data blob (either bytes or a file-like object) to be stored (the actual resource)

  • kwargs – the resource meta data associated with data

Returns:

the id of the newly created resource
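
For example (the file and metadata values are hypothetical; see the note above on required metadata):

with open('wordlist.txt', 'rb') as data:
    resource_id = connection.create_resource(data=data,
                                             group='nl.nfi.example',
                                             name='wordlist',
                                             version='1.0.0')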

edit_resource(resource_id, *, visibility)[source]
delete_resource(resource_id)[source]
resources(**kwargs)[source]

List available resources, optionally filtering by provided properties supplied as keyword arguments.

Parameters:

kwargs – properties to filter for (e.g. group='nl.nfi.example')

Returns:

list of resources matching kwargs

snippets(project_id, *snippets)[source]
roots(project_id)[source]
children(project_id, trace_uid, query=None, start=0, count=None, sort=None, facets=None, snippets=None, select=<all>, incomplete_results=None, timeout=None)[source]

Searches the children of a trace.

Parameters:
  • project_id – id of the project to search in

  • trace_uid – id of the trace to retrieve children for

  • query – query to apply on the children of trace_uid (default: retrieve all children)

  • start – the start offset to be included

  • count – max number of children to be retrieved

  • sort – sort clause(s) of the form score-, some.field

  • facets – facet(s) to be used for search (str, Facet or sequence of either)

  • snippets – maximum number of snippets to return per trace

  • select – property selector(s), defaults to all properties

  • incomplete_results – whether to allow results from an incomplete index (defaults to letting the remote decide)

  • timeout – the maximum amount of time to wait for a search response, in seconds or as a datetime.timedelta (defaults to letting the remote decide), set to 0 to disable timeout

Returns:

a file-like object, streaming the raw JSON response from remote

note(project_id, trace_uid, note, refresh=None)[source]
tag(project_id, trace_uid, tag, refresh=None)[source]
delete_tag(project_id, trace_uid, tag, refresh=None)[source]
mark_privileged(project_id, trace_uid, status, refresh=None)[source]
import_trace(project_id, trace, data=None, key=<auto-fetch>, method='heuristic', properties=None, overwrite=False)[source]

Imports the requested properties on trace into a trace in project.

Please note that, for performance reasons, all changes are buffered and not directly effective in subsequent search, update and import requests. As a consequence, successive changes to a single trace might be ignored. Instead, all changes to an individual trace should be bundled in a single update or import request. The project index is refreshed automatically (by default every 30 seconds), so changes will become visible eventually.

Parameters:
  • project_id – the project to import trace into

  • trace – the trace to be imported

  • data – a dict mapping data type / stream name to bytes to be imported

  • key – the key data for the image trace belongs to, must be fetch, None or binary (bytes)

  • method – a method to match trace to an existing trace in project, either 'strict' or 'heuristic'

  • properties – the properties to be imported

  • overwrite – whether properties to be imported should be overwritten if already present

Returns:

a response object encoding the import result, detailing what trace the imported trace was matched to and what properties were imported

add_trace_data(project_id, trace_uid, data_type, data, key=<auto-fetch>)[source]

Uploads a single data stream for a specific trace.

Parameters:
  • project_id – the project to import data into

  • trace_uid – uid of the trace

  • data_type – the name of the data stream to upload data to. Allowed data types can be found in the trace model

  • data – the data to be uploaded, either bytes, a file-like object or an iterable providing bytes

  • key – the key data for the image trace belongs to, must be fetch, None or binary (bytes)

Returns:

True on success (failure will likely result in an HTTPError)

create_trace(project_id, parent_uid, child, data=None, key=<auto-fetch>)[source]

Requests a new trace to be indexed as a child trace of an existing trace.

Parameters:
  • project_id – id of the project to index the trace into

  • parent_uid – the trace uid of the trace to attach the new child to

  • child – the new trace to be indexed

  • data – a dict mapping data type / stream name to bytes to be attached to the new trace

  • key – the key data for the image parent_uid and thus child belong to

Returns:

the id of the newly created trace

search(project_id, query=None, start=0, count=None, sort=None, facets=None, snippets=None, select=<all>, incomplete_results=None, deduplicate_field=None, timeout=None)[source]

Performs a search request built from the parameters.

Parameters:
  • project_id – id of the project to search in

  • query – the query to submit

  • start – the start offset to be included

  • count – max number of traces to be retrieved

  • sort – sort clause(s) of the form score-, some.field

  • facets – facet(s) to be used for search (str, Facet or sequence of either)

  • snippets – maximum number of snippets to return per trace

  • select – property selector(s), defaults to all properties

  • incomplete_results – whether to allow results from an incomplete index (defaults to letting the remote decide, which will typically not allow results from an incomplete index)

  • deduplicate_field – which single value field to deduplicate on (defaults to None: the results are not deduplicated)

  • timeout – the maximum amount of time to wait for a search response, in seconds or as a datetime.timedelta (defaults to letting the remote decide), set to 0 to disable timeout

Returns:

a file-like object, streaming the raw JSON response from remote

queue_cleaning(project_id, query=None, clean_priority='low')[source]

Performs a search request built from the parameters, and puts the results on the cleaner queue with the given priority.

Parameters:
  • project_id – id of the project to search in

  • query – the query to submit

  • clean_priority – the priority to use for cleaning the traces

Returns:

True on success (failure will likely result in an HTTPError)

delete_traces(project_id, query=None)[source]

Deletes traces from a project. This deletes traces from both the index and the datastore, and will also delete the traces’ children. Note that this is only possible when the image is of type VNFI and the image from which the traces will be deleted is linked to only one project. When requirements and permissions are met, a task is scheduled to delete the traces.

Parameters:
  • project_id – id of the project to search in

  • query – the query to submit

Returns:

the task_id of the delete task

delete_cleaned_data_image(project_id, image_id)[source]

Deletes the cleaned data from the cache for all traces in the given image. This can be used to force a re-clean of the trace data, because the next time the cleaned data is requested for one of these traces the cleaner will run again.

Parameters:
  • project_id – id of the project

  • image_id – the image

delete_cleaned_data_project(project_id)[source]

Delete cleaned data for all images of a project.

Parameters:

project_id – id of the project

delete_cleaned_data_trace(project_id, trace_uid, data_type=None)[source]

Delete cleaned data from the datastore (cache) for a trace. This can be used to force a re-clean of the trace data, because the next time the cleaned data is requested for this trace the cleaner will run again.

Parameters:
  • project_id – id of the project the trace belongs to

  • trace_uid – the trace the data belongs to

  • data_type – the type of the cleaned data stream to delete (e.g. raw, encrypted). If not provided, all cleaned data is deleted

search_tracelets(project_id, tracelet_type, query=None, start=0, count=None, sort='id', select=<all>, incomplete_results=None, timeout=None)[source]

Performs a search request for tracelets built from the parameters.

Parameters:
  • project_id – id of the project to search in

  • tracelet_type – the type of tracelet to search for

  • query – the query to submit

  • start – the start offset to be included

  • count – max number of traces to be retrieved

  • sort – sort clause(s) of the form score-, some.field, defaults to sorting on id

  • select – property selector(s), defaults to all properties

  • incomplete_results – whether to allow results from an incomplete index (defaults to letting the remote decide, which will typically not allow results from an incomplete index)

  • timeout – the maximum amount of time to wait for a search response, in seconds or as a datetime.timedelta (defaults to letting the remote decide), set to 0 to disable timeout

Returns:

a file-like object, streaming the raw JSON response from remote

unique_values(project_id, select, query=None, after=None, count=None, incomplete_results=None, timeout=None)[source]
create_search_request(query=None, *, start=0, count=None, sort=None, facets=None, snippets=None, tracelet_type=None, project_ids=None, select=<all>, incomplete_results=None, deduplicate_field=None, timeout=None)[source]

Creates a search request from the parameters, transforming parameters when needed.

Parameters:
  • query – the query to submit

  • start – the start offset to be included

  • count – max number of traces to be included

  • sort – sort clause(s) of the form score-, some.field or instance(s) of Sort

  • facets – facet(s) to be used for search (str, Facet or sequence of either)

  • snippets – maximum number of snippets to return per trace

  • tracelet_type – the type of tracelet to search for (only applicable to tracelet searches)

  • project_ids – a collection of project ids to pass this search request to, or None

  • select – property selector(s), defaults to all properties

  • incomplete_results – whether to allow results from an incomplete index (defaults to letting the remote decide)

  • deduplicate_field – which single value field to deduplicate on (defaults to None: the results are not deduplicated)

  • timeout – the maximum amount of time to wait for a search response, in seconds or as a datetime.timedelta (defaults to letting the remote decide), set to 0 to disable timeout

suggest(project_id, text, query_property='text', count=100)[source]

Expand a search term from terms in the index.

Parameters:
  • project_id – the project id to suggest terms from

  • text – the text / search term to be expanded

  • query_property – the search property to expand terms for (e.g. 'file.name' or 'text')

  • count – the maximum number of suggestions to return

Returns:

list of suggestions

task(task_id)[source]
cancel_task(task_id)[source]
tasks(state='open', project_id=None, start=None, end=None)[source]

Request a listing of tasks.

Parameters:
  • state – the state of tasks to be listed, can be either 'open' or 'closed'

  • project_id – an optional project id to list tasks for

  • start – an optional datetime.date, datetime.datetime or ISO 8601-formatted str to limit the tasks to be listed to those having their relevant moment after start

  • end – an optional datetime.date, datetime.datetime or ISO 8601-formatted str to limit the tasks to be listed to those having their relevant moment before end

Returns:

a list of tasks

task_export_key(task_id, *, image_key=<auto-fetch>, image_id=None)[source]

Retrieve the export key for a specific task.

Note that this requires the key for the image this task is performed on, supplied either directly or by looking it up through a supplied image id.

Parameters:
  • task_id – the task for which to retrieve the export key

  • image_key – key for the task’s image (mutually exclusive with image_id)

  • image_id – image id for the task (mutually exclusive with image_key)

Returns:

the task’s export key

health()[source]
log_messages(project_id, image_id, message_type='log', task_id=None)[source]

Retrieves log messages for the extraction of image image_id within the project project_id.

Parameters:
  • project_id – the project for which the extraction was run

  • image_id – the image id for which to retrieve the log messages

  • message_type – the type of message to retrieve, either 'log' or 'failedTrace'

  • task_id – retrieve only messages related to a particular task id

Returns:

a list of log messages: dicts with keys "date" and "message"

tools(project_id=None)[source]

Get the tools available for extraction.

Parameters:

project_id – get the tools available for a specific project (leave None for all tools)

Returns:

a dict, mapping the name of a tool to its version and human-friendly description

extract(project_id, image_id, type='index', key=<auto-fetch>, tools=None, configuration=None, query=None, pre_clean_priority=None, engine=None)[source]

Extract traces from an image in the context of a project.

Parameters:
  • project_id – the project the extraction should be part of

  • image_id – the image to be extracted

  • type – the type of extraction to start

  • key – the key data to be used to decrypt data, must be either fetch, None or binary (bytes)

  • tools – the tools to be used; either a sequence of tool names or None, indicating to use the default tools (see also tools)

  • configuration – configuration overrides for this extraction, as a dict (e.g. {'timeout': 0})

  • query – a query to use for extraction types other than ‘index’

  • pre_clean_priority – the pre-clean priority used for this extraction; must be 'low', 'medium' or 'high', or None to use the default. To disable pre-cleaning, use 'disabled'

  • engine – the extraction engine to use, if unset the default engine is used

Returns:

the job id of the extraction that is started
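
A sketch of starting a default extraction and waiting for it, assuming the returned job id can be passed to wait_for_task (the ids are hypothetical):

task_id = connection.extract('example-project-id', 'example-image-id')
connection.wait_for_task(task_id)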

extract_singlefile(singlefile_id, tools=None, configuration=None)[source]

Schedules the extraction for the singlefile identified by singlefile_id. This is an asynchronous call that only initiates the extraction. To monitor the progress of the extraction, use the singlefile call to retrieve the state of the singlefile image object.

Parameters:
  • singlefile_id – the id of the single file

  • tools – the tools to be used; either a sequence of tool names or None, indicating to use the default tools (see also tools)

  • configuration – configuration overrides for this extraction, as a dict (e.g. {'timeout': 0})

Returns:

the job id of the extraction that is initiated

backup_project(project_id, user_backup_key, image_keys=<auto-fetch>)[source]

Creates a backup of a project.

Parameters:
  • project_id – the project to make a backup of

  • user_backup_key – the key data to be used to encrypt the backup data, must be binary (bytes)

  • image_keys – the key data to be used to decrypt data, must be a dict whose entries have an image id as key and have a binary (bytes) value. If no dict is provided, the image keys are fetched from the keystore.

Returns:

the job id of the backup task that is started

download_backup(task_id, file_path)[source]
export_project(project_id, user_export_key, image_keys=<auto-fetch>, query=None, include_priviliged=False, include_notes=False, include_tags=False, include_entities=False, include_image_data=False, image_id=None)[source]

Initiates an export of a project.

Parameters:
  • project_id – the project to make an export of

  • user_export_key – the key data to be used to encrypt the export data, must be binary (bytes)

  • image_keys – the key data to be used to decrypt data, must be a dict whose entries have an image id as key and have a binary (bytes) value. If no dict is provided, the image keys are fetched from the keystore.

  • query – the query used to select all, or a subset, of traces from a project to be exported

  • include_priviliged – whether privileged traces should be exported

  • include_notes – whether notes should be exported

  • include_tags – whether tags should be exported

  • include_entities – whether entities should be exported

  • include_image_data – whether a new sliced image should be generated; if true, image_id should be set, too

  • image_id – the UUID of the original image to generate a new sliced image from

Returns:

the job id of the export task that is started

download_export(task_id, file_path)[source]
prepare_project_import(project_id, user_export_key, export_file, image_keys=<auto-fetch>)[source]

Importing an export into Hansken is a two-stage process. In this first stage, the export is uploaded and validated to see if the import can go ahead.

Parameters:
  • project_id – the project to which the exported traces need to be added

  • user_export_key – the key data to be used to decrypt the exported data, must be binary (bytes)

  • export_file – the on-disk export file that is to be imported

  • image_keys – the key data to be used to decrypt data, must be a dict whose entries have an image id as key and have a binary (bytes) value. If no dict is provided, the image keys are fetched from the keystore.

Returns:

the job id of the import task that is started

apply_project_import(project_id, user_export_key, task_id, image_keys=<auto-fetch>, project_metadata_import_strategy=ImportStrategy.UPDATE, images_metadata_import_strategy=ImportStrategy.UPDATE)[source]

In the second stage of the import process (the first stage is handled by prepare_project_import), traces from the previously uploaded export file are added to the specified project.

Parameters:
  • project_id – the project to add the exported traces to

  • user_export_key – the key data to be used to decrypt the exported data, must be binary (bytes)

  • task_id – the task id of the import task we want to apply now

  • image_keys – the key data to be used to decrypt data, must be a dict whose entries have an image id as key and have a binary (bytes) value. If no dict is provided, the image keys are fetched from the keystore.

  • project_metadata_import_strategy – import strategy for the project data in the export

  • images_metadata_import_strategy – import strategy for the image data in the export

Returns:

the job id of the import apply task that is started
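
A sketch of both import stages together (the project id, key and file name are hypothetical placeholders):

export_key = b'...'  # placeholder for the actual (binary) export key
task_id = connection.prepare_project_import('example-project-id', export_key,
                                            'project-export.bin')
connection.wait_for_task(task_id)
connection.apply_project_import('example-project-id', export_key, task_id)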

upload_image(image_id, data, extension=None, offset=None)[source]

Uploads image data. Note that data in the NFI format requires two files (.nfi (data) and .nfi.idx (index)) and thus two upload calls.

Parameters:
  • image_id – the image id of the data to be uploaded

  • data – the image data to be uploaded, either bytes, a file-like object or an iterable providing bytes (see documentation on requests’ upload support)

  • extension – file extension of the upload, either '.nfi', '.nfi.idx' or None

  • offset – byte offset of data within the complete file to be uploaded

Returns:

the image id of the uploaded data
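
A sketch of uploading an NFI-format image, which takes two calls (the file names and image id are hypothetical):

with open('evidence.nfi', 'rb') as data:
    connection.upload_image('example-image-id', data, extension='.nfi')
with open('evidence.nfi.idx', 'rb') as index:
    connection.upload_image('example-image-id', index, extension='.nfi.idx')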

upload_singlefile(data, name)[source]

A singlefile is a temporary single data source for quick extraction of traces. Uploading a singlefile performs a number of steps:

  • create a project with property hidden set to true

  • create an image with property hidden set to true

  • link the image to the project

  • upload the singlefile data

Parameters:
  • data – the image data to be uploaded, either bytes, a file-like object or an iterable providing bytes (see documentation on requests’ upload support)

  • name – the name for the singlefile project and image

Returns:

the singlefile’s identifier (a str(UUID))
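
For example (the file name is hypothetical):

with open('document.pdf', 'rb') as data:
    singlefile_id = connection.upload_singlefile(data, 'document.pdf')
# schedule extraction of the uploaded singlefile
connection.extract_singlefile(singlefile_id)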

singlefiles()[source]
singlefile(singlefile_id)[source]
singlefile_traces(singlefile_id)[source]

Get all traces within a singlefile.

Parameters:

singlefile_id – unique id of the singlefile, UUID

Returns:

a file-like object, streaming the raw JSON response from remote

singlefile_tracelets(singlefile_id, tracelet_type)[source]

Get tracelets of a certain type within a singlefile.

Parameters:
  • singlefile_id – id of the singlefile

  • tracelet_type – the type of tracelet to search for

Returns:

a file-like object, streaming the raw JSON response from remote

delete_singlefile(singlefile_id)[source]

Delete the singlefile identified by singlefile_id.

Parameters:

singlefile_id – singlefile to be deleted

Returns:

a success-value, whether the deletion of the singlefile has succeeded