hansken.remote
— Communication with Hansken
Todo
Introduction to goal of module, default use
Mention error handling, http errors are propagated
- class ProjectContext[source]
Bases:
object
Utility class adding a particular project id to each rest call that requires one. Provided with a url to a Hansken gatekeeper and the id of a project, methods of this class will make REST requests to the gatekeeper, wrapping results with a Trace class or iterator where applicable.
ProjectContext instances are usable as context managers, which opens the context for use. Calls to methods requiring an initialized connection will automatically open the context if it wasn’t yet opened.
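A minimal sketch of that context manager behavior; the helper function and its arguments are illustrative, not part of hansken.py:

```python
def image_names(context, image_ids):
    """Resolve the names of a set of images through a ProjectContext.

    The with-block opens the context on entry and closes it on exit,
    which also clears internal caches such as the image name cache.
    """
    with context:
        return [context.image_name(image_id) for image_id in image_ids]
```

Because methods that need a connection open the context automatically, the with-block is optional, but using one makes the moment of cleanup explicit.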
- __init__(base_url_or_connection, project_id, keystore_url=None, preference_url=None, auth=None, auto_activate=True, connection_pool_size=None, verify=True)[source]
Creates a new context object facilitating communication to a Hansken gatekeeper. Uses a Connection to track session state. The provided project id is used for all calls. Can be used with an existing Connection instance.
- Parameters:
base_url_or_connection – HTTP endpoint to a Hansken gatekeeper (e.g. https://hansken.nl/gatekeeper) or a Connection instance
project_id – project id to associate with
keystore_url – HTTP endpoint to a Hansken keystore (e.g. https://hansken.nl/keystore)
preference_url – HTTP endpoint to a Hansken preference service
auth – HanskenAuthBase instance to handle authentication, or None
auto_activate – whether the project should automatically be activated if it is currently deactivated
connection_pool_size – maximum size of the HTTP(S) connection pool
verify – how to check SSL/TLS certificates: True: verify certificates (default); False: do not verify certificates; a path to a certificate file: verify certificates against a specific certificate bundle
- image_name(image_id)[source]
Retrieves the name (falling back to description if that isn’t available) of an image, identified by its id.
Note
Results of this call are cached inside the ProjectContext, making repeated calls to this method cheap. See Python’s documentation on functools.lru_cache. This cache is cleared when the ProjectContext is closed (this includes it being used as a context manager).
- Parameters:
image_id – the id of the image to name
- Returns:
image_id’s name
- key(image_id)[source]
Retrieve the key for image identified by image_id.
Warning
hansken.py cannot distinguish between an image not needing a key and a user that is not authorized to retrieve data from an image. In both cases, the result of this method is None.
Note
Results of this call are cached inside the ProjectContext, making repeated calls to this method cheap. See Python’s documentation on functools.lru_cache. This cache is cleared when the ProjectContext is closed (this includes it being used as a context manager).
- Parameters:
image_id – the id of the image to retrieve the key for
- Returns:
image_id’s key, or None
- descriptor(trace_uid, stream='raw', key=<auto-fetch>)[source]
Retrieve the data descriptor for a named stream (default raw) for a particular trace.
- Parameters:
trace_uid – the trace to retrieve the data descriptor for
stream – stream to get the descriptor for
key – key for the image of this trace (default is to fetch the key automatically, if it’s available)
- Returns:
the stream’s data descriptor (as-is)
- snippets(*snippets, keys=<auto-fetch>, highlights=None)[source]
Retrieves textual snippets of data, optionally highlighting term hits within the resulting text.
- Parameters:
snippets – snippet request dicts
keys – keys required for data access, either fetch to automatically attach the corresponding keys to each snippet, a dict mapping image ids to (binary) key data, or None
highlights – collection of terms to be highlighted in all snippet results (highlights provided here are added to all requests in snippets)
- Returns:
a list of snippet response dicts, index-matched to snippets
- children(trace_uid, query=None, start=0, count=None, sort=None, facets=None, snippets=None, timeout=None)[source]
Searches the children of a trace. See search and SearchResult.
- Parameters:
trace_uid – id of the trace to retrieve children for
query – the query to submit
start – the start offset of the retrieved result
count – max number of traces to retrieve
sort – sort clause(s) of the form score-, some.field
facets – facet(s) to be used for search (str, Facet or sequence of either)
snippets – maximum number of snippets to return per trace
timeout – the maximum amount of time to wait for a search response, in seconds or as a datetime.timedelta (defaults to letting the remote decide), set to 0 to disable the timeout
- Returns:
a trace stream (iterable once)
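Since children() yields a trace stream that is iterable once, results have to be materialized if they are needed more than once. A small illustrative helper (not part of hansken.py):

```python
def child_names(context, trace_uid, query=None):
    """Collect the names of a trace's children into a list.

    The stream returned by children() can be iterated only once, so
    the names are gathered in a single pass.
    """
    return [trace.name for trace in context.children(trace_uid, query=query)]
```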
- child_builder(parent_uid)[source]
Create a TraceBuilder to build a trace to be saved as a child of the trace identified by parent_uid. Note that a new trace will only be added to the index once explicitly saved (e.g. through TraceBuilder.build).
- Parameters:
parent_uid – a trace identifier to create a child builder for
- Returns:
a TraceBuilder set up to save a new trace as a child trace of parent_uid
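A sketch of the create-then-save flow; the populate callable stands in for whatever mutations the TraceBuilder API offers (that API is outside this excerpt), and the helper itself is illustrative, not part of hansken.py:

```python
def save_child(context, parent_uid, populate):
    """Create a child trace under parent_uid and persist it.

    populate is a caller-supplied callable that fills the builder;
    the new trace is only added to the index once build() is called.
    """
    builder = context.child_builder(parent_uid)
    populate(builder)
    return builder.build()
```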
- update_trace(trace, updates=None, data=None, key=<auto-fetch>, overwrite=False)[source]
Updates metadata of trace to reflect requested updates in the user origin. Does not edit trace in-place.
Please note that, for performance reasons, all changes are buffered and not directly effective in subsequent search, update and import requests. As a consequence, successive changes to a single trace might be ignored. Instead, all changes to an individual trace should be bundled in a single update or import request. The project index is refreshed automatically (by default every 30 seconds), so changes will become visible eventually.
- Parameters:
trace – Trace to be updated
updates – metadata properties to be added or updated, mapped to new values in a (possibly nested) dict
data – a dict mapping data type / stream name to bytes to be imported
key – the key data for the image the trace belongs to, must be fetch, None or binary (bytes)
overwrite – whether properties to be imported should be overwritten if already present
- Returns:
processing information from remote
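Given the buffering caveat above, it pays to collect all pending changes for a trace before calling update_trace. A simple illustrative helper (not part of hansken.py) that merges several partial updates into one request:

```python
def merged_update(context, trace, *partial_updates):
    """Bundle several pending changes to one trace into a single
    update request; successive separate updates to the same trace
    might otherwise be ignored due to buffering.

    Note: this is a shallow merge; nested property dicts that should
    be combined key-by-key would need a deep merge instead.
    """
    updates = {}
    for partial in partial_updates:
        updates.update(partial)
    return context.update_trace(trace, updates=updates)
```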
- search(query=None, start=0, count=None, sort='uid', facets=None, snippets=None, select=<all>, incomplete_results=None, deduplicate_field=None, timeout=None)[source]
Performs a search request, wrapping the result in a sequence that allows iteration, indexing and slicing, automatically fetching batches of results when needed. The returned sequence keeps state and is not threadsafe.
Note
Supplying a value for count is required when using a start offset. The maximum size of such a batch is dictated by the Hansken REST API and will usually be 200. Requesting a batch larger than the maximum amount of traces will raise an error.
Additionally, neither start nor count influence the result of the len builtin on the result set, just the amount of traces iteration of the result will yield; see SearchResult.
- Parameters:
query – the query to submit
start – the start offset of the retrieved result
count – max number of traces to retrieve
sort – sort clause(s) of the form score-, some.field, defaults to sorting on uid
facets – facet(s) to be used for search (str, Facet or sequence of either)
snippets – maximum number of snippets to return per trace
select – property selector(s), defaults to all properties
incomplete_results – whether to allow results from an incomplete index (defaults to letting the remote decide, which will typically not allow results from an incomplete index)
deduplicate_field – which single value field to deduplicate on (defaults to None: the results are not deduplicated)
timeout – the maximum amount of time to wait for a search response, in seconds or as a datetime.timedelta (defaults to letting the remote decide), set to 0 to disable the timeout
- Returns:
a Trace stream (iterable once)
- Return type:
SearchResult
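The start/count rules above (a count is required with a start offset, and the remote caps the batch size, usually at 200) can be combined into a paging loop. This generator is an illustrative sketch, not part of hansken.py:

```python
def search_in_batches(context, query, batch_size=200):
    """Page through search results using start/count.

    Yields traces batch by batch; stops when a batch comes back
    smaller than the requested size.
    """
    start = 0
    while True:
        batch = list(context.search(query, start=start, count=batch_size))
        yield from batch
        if len(batch) < batch_size:
            return
        start += batch_size
```

In practice a plain iteration over one search() call is usually simpler; explicit paging is mainly useful when each batch needs separate processing.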
- queue_cleaning(query=None, clean_priority='low')[source]
Performs a search request built from the parameters, and puts the results on the cleaner queue with the given priority.
- Parameters:
query – the query to submit
clean_priority – the priority to use for cleaning the traces
- Returns:
True on success (failure will likely result in an HTTPError)
- delete_cleaned_data(*, trace_uid=None, image_id=None, data_type=None)[source]
Deletes the cleaned data from the cache. This can be used to force a re-clean of the trace data, because the next time the cleaned data is requested for one of these traces the cleaner will run again.
This can be done in 4 different ways:
data_type: delete cleaned data for a specific data type belonging to a specific trace.
trace: delete all cleaned data from the datastore (cache) for a trace.
image: delete all the cleaned data from the cache for all traces in the given image.
project: delete all the cleaned data for all images of a project.
- Parameters:
trace_uid – uid of the trace of which the cleaned data will be deleted
image_id – id of the image of which the cleaned data will be deleted
data_type – type of data of the trace to clean (raw, encrypted, ...)
- search_tracelets(tracelet_type, query=None, start=0, count=None, sort='id', select=<all>, incomplete_results=None, timeout=None)[source]
Performs a search request for tracelets, wrapping the result in an iterator of DictView objects.
- Parameters:
tracelet_type – the type of tracelet to search for
query – query to match tracelets
start – the start offset of the retrieved result
count – max number of tracelets to retrieve
sort – sort clause(s) of the form score-, some.field, defaults to sorting on id
select – property selector(s), defaults to all properties
incomplete_results – whether to allow results from an incomplete index (defaults to letting the remote decide, which will typically not allow results from an incomplete index)
timeout – the maximum amount of time to wait for a search response, in seconds or as a datetime.timedelta (defaults to letting the remote decide), set to 0 to disable the timeout
- Returns:
a tracelet stream containing DictView instances (iterable once)
- Return type:
TraceletSearchResult
- unique_values(select, query=None, after=None, count=None, incomplete_results=None, timeout=None)[source]
Retrieves unique values for the selected property or properties with their corresponding counts. The set of traces or tracelets from which these values are taken can be controlled with the query argument (though also take note of the particulars of the query argument below). To retrieve unique addressees for deleted email traces, the following could be used:
# define a query to select only traces that are marked as deleted
query = Term('type', 'deleted')

# retrieve unique values for the "email.to" property
for result in context.unique_values('email.to', query):
    print(result['count'], result['value'])
Note
Use trace:{query} to apply the query to the trace itself, if the property in select is part of a tracelet (e.g. entity.value). To filter by traces of type email, for example, add trace:{type:email}.
- Parameters:
select – the property or properties to select values for
query – query to match traces or tracelets to take values from
after – starting point for the result stream
count – max number of values to retrieve
incomplete_results – whether to allow results from an incomplete index (defaults to letting the remote decide, which will typically not allow results from an incomplete index)
timeout – the maximum amount of time to wait for a search response, in seconds or as a datetime.timedelta (defaults to letting the remote decide), set to 0 to disable the timeout
- Returns:
a stream of values with counters (iterable once), sorted by the natural order of the values
- Return type:
ValueSearchResult
- suggest(text, query_property='text', count=100)[source]
Expand a search term from terms in the index.
Use key 'text' for a plain version of the expanded term, key 'highlighted' for a highlighted version:
# get the plain expansion of terms starting with "example"
texts = [suggestion['text'] for suggestion in context.suggest('example')]

# get highlighted expansions for file names starting with "example"
highlighted = [suggestion['highlighted']
               for suggestion in context.suggest('example', query_property='file.name')]
# values will contain square brackets to show what was expanded, e.g.
# 'example[[.exe]]' or 'example[[file.dat]]'
- Parameters:
text – the text / search term to be expanded
query_property – the search property to expand terms for (e.g. 'file.name' or 'text')
count – the maximum number of suggestions to return
- Returns:
a list of suggestions
- tasks(state='open', start=None, end=None)[source]
Request a listing of tasks for this project.
- Parameters:
state – the state of tasks to be listed, can be either 'open' or 'closed'
start – an optional datetime.date, datetime.datetime or ISO 8601-formatted str to limit the tasks listed to those having their relevant moment after start
end – an optional datetime.date, datetime.datetime or ISO 8601-formatted str to limit the tasks listed to those having their relevant moment before end
- Returns:
a list of tasks
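Since tasks() accepts datetime.date objects for its range arguments, a relative window is easy to express. An illustrative helper (not part of hansken.py):

```python
import datetime

def tasks_last_week(context, state='closed'):
    """List tasks from the past 7 days by passing datetime.date
    objects as the start and end arguments of tasks()."""
    end = datetime.date.today()
    start = end - datetime.timedelta(days=7)
    return context.tasks(state=state, start=start, end=end)
```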
- extract_singlefile(tools=None, configuration=None, query=None)[source]
Schedules the extraction for the image of the singlefile.
- singlefile_traces()[source]
Get all traces within a singlefile.
- Returns:
a Trace stream (iterable once)
- Return type:
SearchResult
- singlefile_tracelets(tracelet_type)[source]
Get tracelets of a certain type within a singlefile, wrapping the result in an iterator of DictView objects.
- Parameters:
tracelet_type – the type of tracelet to search for
- Returns:
a tracelet stream containing DictView instances (iterable once)
- Return type:
TraceletSearchResult
- add_trace_data(trace_uid, data_type, data, key=<auto-fetch>)[source]
Uploads a single data stream for a specific trace.
- Parameters:
trace_uid – uid of the trace
data_type – the name of the data stream to upload data to; allowed data types can be found in the trace model
data – the data to be uploaded, either bytes, a file-like object or an iterable providing bytes
key – the key data for the image the trace belongs to, must be fetch, None or binary (bytes)
- Returns:
True on success (failure will likely result in an HTTPError)
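Since add_trace_data accepts raw bytes, uploading text only requires encoding it first. An illustrative helper (not part of hansken.py; the stream name 'plain' is an example, allowed names come from the trace model):

```python
def upload_text(context, trace_uid, text, stream='plain'):
    """Upload a text payload as a named data stream of a trace.

    add_trace_data accepts bytes, a file-like object or an iterable
    of bytes; here the text is UTF-8 encoded to bytes first.
    """
    return context.add_trace_data(trace_uid, stream, text.encode('utf-8'))
```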
- class SearchResult[source]
Bases:
Iterable
Stream of traces generated from a remote JSON-encoded stream. Note that this result itself can be iterated only once, but Trace instances obtained from it do not rely on the result they’ve been obtained from.
Getting results from a SearchResult can be done in one of three ways:
Treating the result as an iterable:
for trace in result:
    print(trace.name)
Calling take to process one or more batches of traces:
first_100 = result.take(100)
process_batch(first_100)
Calling takeone to get a single trace:
first = result.takeone()
second = result.takeone()
print(first.name, second.name)
If indices of traces within a result are needed, iteration can be combined with enumerate:
for idx, trace in enumerate(result):
    print(idx, trace.name)
Additional result info can be included using including:
# note that score will only have a value if the search was sorted on it
for trace, score in result.including('score'):
    print(score, trace.name)
Note
The underlying response from the REST API for a SearchResult is a stream. This means that hansken.py keeps the connection used for the result open for as long as needed. As a side-effect, the underlying stream can time out if not consumed quickly enough. As a rule of thumb, consuming at least 100 traces a minute from a SearchResult should keep timeouts at bay.
Depending on the arguments to ProjectContext.search, a SearchResult might be able to auto-resume iteration of results in the event of errors (such as timeouts).
To explicitly release resources used by a SearchResult, use it as a context manager or manually call close.
Warning
The truth value of a SearchResult is based on the expectation of additional traces that can be read from its stream. This is only known for sure when the end of that stream is reached. Avoid code that requires a SearchResult to be truthy before retrieving its results; use iteration wherever possible.
Additionally, the num_results property provides the total number of results for the search call that produced this SearchResult; this does not necessarily align with the number of result objects that can be retrieved from it (e.g. when the count parameter of the search call was used).
Depending on the request and index, the num_results property may not provide an exact number. To help interpret this value, you may use the num_results_quality property, which indicates whether num_results is exact (equalTo), a lower bound (equalToOrGreaterThan) or an estimate (estimate). Older Hansken versions may not set this field.
- property facets
The facets returned with the search result, a list of FacetResult instances.
- takeone(include=None)[source]
Takes the next trace from this result. This method may be called repeatedly.
- take(num, include=None)[source]
Takes num traces from this result. Subsequent calls will produce batches of traces as if using slicing.
- including(*names)[source]
Iterates this result, yielding the requested named attributes accompanying a trace along with the trace. The trace is always the leftmost value in the tuples yielded by this method. Other attributes are included in parameter order and can be None.
# the score for each search result is only available when sorted on score
# iteration yields a 2-tuple, the trace with 1 additional attribute
for trace, score in context.search(sort='score-').including('score'):
    print(score, trace.uid)
- Parameters:
names – attributes to include in the yielded tuple
- Returns:
a generator, yielding namedtuple instances
- close()[source]
Closes the file descriptor used by the underlying JSON stream. After calling this method, no more traces can be obtained from this SearchResult (previously retrieved traces will remain usable).
In some cases, results of multiple search requests might need to be combined.
As SearchResult is an iterable object, multiple instances can be chained together using utilities from Python’s itertools:
# chain 'stitches' together multiple iterable objects into one
from itertools import chain
# do two queries, both with their own results
result1 = context.search('query1')
result2 = context.search('query2')
# chain both results together
combined_result = chain(result1, result2)
# the combined result can now be treated as a new iterable object yielding traces
for trace in combined_result:
print(trace.id, trace.name)
A construction like this comes in handy when we need to combine results based on a (large) set of arguments.
Generating a list of file names matching hashes listed in a file (assuming that list is too long to be used with an Or query) could be done as follows:
from itertools import chain
# define a function to read hashes from file
def sha1s_from(fname):
with open(fname) as f:
# read hashes as lines of text in f, strip leading and trailing
# whitespace (avoid empty lines in the input file, dealing with
# that is left as an exercise to the reader :))
return [line.strip() for line in f]
# create a chain from a generator expression (which is iterable)
# a generator expression is lazy, making sure the call to search() is done
# at the time it's needed
results = chain.from_iterable(context.search(Term('data.raw.hash.sha1', sha1))
for sha1 in sha1s_from('sha1s.list'))
# results is iterable like before
for trace in results:
print(trace.id, trace.name)
Note that after chaining, only iteration is available; the chain won’t have methods like take or takeone.
It could, however, be used in conjunction with something like to_csv, passing a chained result as the traces argument.
Should network or JSON parsing performance overhead ever be an issue, Hansken makes it possible to make a selection in the properties that are returned per trace.
Use the select keyword in search calls for this:
traces = context.search('query', select=['name', 'uid'])
for trace in traces:
print(trace.name, trace.uid)
The above snippet has no need for a trace’s processing information or any extracted properties.
Note that a trace’s types are still available and attribute access like trace.email.subject won’t cause errors, but will produce None values unless selected with the search call.
A number of properties will always be returned and can’t be ‘unselected’. The Hansken remote controls this set of properties.
- class TraceletSearchResult[source]
Bases:
SearchResult
Stream of tracelets generated from a remote JSON-encoded stream. Note that this result itself can be iterated only once, but elements obtained from it do not rely on the result they’ve been obtained from.
See ProjectContext.search_tracelets.
Note
The same notes and caveats expressed for SearchResult apply to the TraceletSearchResult.
- class ValueSearchResult[source]
Bases:
SearchResult
Stream of values generated from a remote JSON-encoded stream. Note that this result itself can be iterated only once, but elements obtained from it do not rely on the result they’ve been obtained from.
The elements obtained from this result act like dicts, with at least the keys value and count, representing a value for the selected property and the number of times it occurs, respectively. Both of these depend on the query that was supplied to the call that created this result, potentially omitting values or lowering occurrence counts when compared to that same call without a query.
Note
The same notes and caveats expressed for SearchResult apply to the ValueSearchResult.
- class FacetResult[source]
Bases:
Mapping
Ordered mapping containing facet results. Iteration yields counter labels, values for which are a named tuple with attributes:
label: a provided label for a bucket
value: the value for a bucket (actual value of the start of a bucket)
count: the number of hits for the labeled bucket within a search
count_error: if count_error > 0, count is a minimum, and the true count is between [count, count + count_error]
Use as such:
# search a project for all files and get a facet for the file extensions
results = context.search(query=Term('type', 'file'), facets=Facet('file.extension'))
# in case you're only interested in the facet, also supply count=0 to the search() call
# this avoids making Hansken's REST API retrieve all the traces for that search result
# the facet will be available on the result
facet = results.facets[0]

# different ways to use a FacetResult
for label in facet:
    print(label, facet[label].count)

for label, counter in facet.items():
    print(label, counter.count)

for counter in facet.values():
    print(counter.label, f'[{counter.count}-{counter.count + counter.count_error}]')
- Counter
alias of FacetCounter
- class Connection[source]
Bases:
object
Base remote connection establishing a session with a remote gatekeeper. Exposes many calls from the REST API as methods that perform the associated HTTP requests. Calls returning JSON-encoded content will return the decoded Python equivalent (e.g. dict, list).
- __init__(base_url, keystore_url=None, preference_url=None, auth=None, connection_pool_size=None, verify=True)[source]
Creates a new connection object facilitating communication to a Hansken gatekeeper, tracking session information provided by the remote end. Note that the username and password arguments are stored on the Connection object for future reuse. The value for password is wrapped in a lambda if it is supplied as a plain value. For production use, getpass.getpass should be supplied here, causing a non-echoing password prompt whenever a password is needed. Note that an authenticated session will likely be kept alive if requests are made periodically.
- Parameters:
base_url – HTTP endpoint to a Hansken gatekeeper (e.g. https://hansken.nl/gatekeeper)
keystore_url – HTTP endpoint to a Hansken keystore (e.g. https://hansken.nl/keystore)
preference_url – HTTP endpoint to a Hansken preference service (e.g. https://hansken.nl/preference)
auth – HanskenAuthBase instance to handle authentication, or None
connection_pool_size – maximum size of the HTTP(S) connection pool
verify – how to check SSL/TLS certificates: True: verify certificates (default); False: do not verify certificates; a path to a certificate file: verify certificates against a specific certificate bundle
- open()[source]
Establishes a session with the remote(s). Authentication is assumed to be handled by the auth within the session.
- Returns:
self
- close(check_response=True)[source]
Explicitly ends the session established with the remote(s).
- Parameters:
check_response – whether to check the response(s) for the expected status code, raising errors on unsuccessful logout(s)
- Returns:
self
- url(*path)[source]
Glues parts of a url path to the base url for this connection using /s, dropping any steps that are None.
- Parameters:
path – steps to join to the base url
- Returns:
a full url
- key_url(*path)[source]
Glues parts of a url path to the keystore url for this connection using /s, dropping any steps that are None.
- Parameters:
path – steps to join to the keystore url
- Returns:
a full url
- preferences_url(*path)[source]
Glues parts of a url path to the preference url for this connection using /s, dropping any steps that are None.
- Parameters:
path – steps to join to the preference url
- Returns:
a full url
- single_preference_url(key, visibility)[source]
Creates a url to retrieve, update or delete a single specific preference.
- Parameters:
visibility – visibility of the preference
key – key indicating the preference
- Returns:
a full url
- version()[source]
Retrieves the version info reported by the remote.
- Returns:
a dict with keys build and timestamp
- current_user()[source]
Retrieves information on the current user from the remote.
- Returns:
a dict with information and identifiers for the current user
- property identity
The current user’s identity for the current session.
- Returns:
the currently available user identity of the form <user>@<domain>
- identity_uid()[source]
Retrieves the current user’s identity as seen by the remote gatekeeper.
- Returns:
uid of the form <user>@<domain>
- key(image_id, identity=None, raise_on_not_found=True)[source]
Retrieves the key for an image with the provided id, using the provided identity or that of the current user.
Warning
hansken.py cannot distinguish between an image not needing a key and a user that is not authorized to retrieve data from an image. In both cases, the remote key service would respond with an error that is propagated to the caller by default. Internal uses of this method will typically set raise_on_not_found to False, potentially delaying errors to the point of (unauthorized) data access.
See also fetch.
- Parameters:
image_id – the image to retrieve the key for
identity – the identity to retrieve the key for, defaults to the identity of the current user
raise_on_not_found – if False, return None when no key is available for (image_id, identity), otherwise raise an HTTPError like any other request
- Returns:
a key for image image_id
- Return type:
bytes
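With raise_on_not_found=False, keys for a whole set of images can be collected without request errors for absent keys. An illustrative helper (not part of hansken.py):

```python
def keys_for(connection, image_ids):
    """Map image ids to their key bytes, using None for images
    without an available key rather than raising an HTTPError
    (raise_on_not_found=False)."""
    return {image_id: connection.key(image_id, raise_on_not_found=False)
            for image_id in image_ids}
```

Recall the warning above: a None here can mean either "no key needed" or "not authorized".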
- store_key(image_id, key, identity=None)[source]
Stores the key for an image, using the provided or current user’s identity.
- Parameters:
image_id – the image to store the key for
key – the (binary) key content
identity – the identity to store the key for, defaults to the identity of the current user
- Returns:
True on success (failure will likely result in an HTTPError)
- delete_key(image_id, identity=None)[source]
Deletes the key for an image, using the provided or current user’s identity.
- Parameters:
image_id – the image to delete the key for
identity – the identity to delete the key for, defaults to the identity of the current user
- Returns:
True on success (failure will likely result in an HTTPError)
- preference(key, visibility='preferred', project_id=None)[source]
Retrieves a preference. Requires a project id if the visibility is set to project
- Parameters:
key – key to identify the preference
visibility – visibility of the preference. Can be either one of public, private, project or preferred
project_id – id of the project if the preference has project wide visibility
- Returns:
the preference
- preferences(visibility='preferred', project_id=None)[source]
Retrieves list of preferences with the given visibility. Requires a project id if the visibility is set to project
- Parameters:
visibility – visibility of the preference. Can be either one of public, private, project or preferred
project_id – id of the project. Required when visibility is set to project
- Returns:
list of preferences
- create_preference(key, visibility, value, project_id=None)[source]
Creates a new preference. Requires a project id if the visibility is set to project
- Parameters:
key – key to identify the preference
visibility – visibility of the preference. Can be either one of public, private or project
value – value of the preference
project_id – id of the project if the preference has project wide visibility
- Returns:
True on success (failure will likely result in an HTTPError)
- edit_preference(key, visibility, value, project_id=None)[source]
Edits an existing preference by updating its value. Requires a project id if the visibility is set to project
- Parameters:
key – key to identify the preference
visibility – visibility of the preference. Can be either one of public, private or project
value – new value of the preference
project_id – id of the project if the preference has project wide visibility
- Returns:
True on success (failure will likely result in an HTTPError)
- delete_preference(key, visibility, project_id=None)[source]
Deletes an existing preference. Requires a project id if the visibility is set to project
- Parameters:
key – key to identify the preference
visibility – visibility of the preference. Can be either one of public, private or project
project_id – id of the project if the preference has project wide visibility
- Returns:
True on success (failure will likely result in an HTTPError)
- create_project(**kwargs)[source]
Creates a new meta project. The new project’s properties are taken from kwargs, and translated to Hansken property names (camelCased). Specifying the new project’s id is allowed.
- Parameters:
kwargs – metadata properties for the new project
- Returns:
the new project’s identifier (a str(UUID))
- wait_for_task(task_id, poll_interval_sec=1, not_found_means_completed=False, progress_callback=None)[source]
Wait for a Hansken task to be done.
- Parameters:
task_id – task to wait for
poll_interval_sec – the interval in seconds between polling
not_found_means_completed – whether to consider task not found a valid (completed) state. This is relevant for project deletion tasks after which the task itself will be gone as well.
progress_callback – method to be called every time the status is polled; it should take two arguments (poll_count: int, progress: float in [0, 1]). If not provided, the callback is skipped.
- Returns:
the status of the task when done (cancelled, failed, completed)
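A sketch of the two-argument callback shape; both helpers are illustrative, not part of hansken.py:

```python
def format_progress(poll_count, progress):
    """Render a progress line; matches the (poll_count, progress)
    callback signature, with progress a fraction between 0 and 1."""
    return f'poll {poll_count}: {progress:.0%}'

def wait_verbose(connection, task_id):
    """Wait for a task, printing a progress line on every poll."""
    return connection.wait_for_task(
        task_id,
        progress_callback=lambda count, progress: print(format_progress(count, progress)))
```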
- delete_project(project_id, delete_images=False, delete_preferences=False)[source]
Deletes a meta project. Note that this will also remove a project’s search index and preferences.
- Parameters:
project_id – project to be deleted
delete_images – whether to also delete the images linked to the project
delete_preferences – whether to also delete the preferences linked to the project, for example the evidence container
- Returns:
a success-value, whether the deletion of the project and all images (if requested) has succeeded (see log output for errors)
- reindex_project(project_id, shards)[source]
Re-indexes a project to a specified number of shards.
- Parameters:
project_id – project to be re-indexed
shards – the desired number of shards for the new index
- Returns:
the task_id of the re-index task
- clone_project(source_id, target_id, filter=None, exclude=None)[source]
Clones traces of a project identified by source_id to a project target_id. When filter (a query) is provided, only traces matching the filter are copied.
- Parameters:
source_id – project to copy traces from
target_id – project to copy traces to
filter – query to define what traces to copy
exclude – properties to be excluded from the clone (not all properties are supported for this, supplying an unsupported property here will cause errors)
- Returns:
the task_id of the clone task
- create_image(**kwargs)[source]
Creates a new meta image. The new image’s properties are taken from kwargs, and translated to Hansken property names (camelCased). Specifying the new image’s id is allowed. The user property defaults to the current session’s identity; overrides are accepted.
- Parameters:
kwargs – metadata properties for the new image
- Returns:
the new image’s identifier (a str(UUID))
- data(project_id, trace_uid, stream='raw', offset=0, size=None, key=<auto-fetch>, bufsize=8192)[source]
Opens a streaming read request that provides the requested data stream of a particular trace. Note that this provides a stream that should be closed by the user after use.
- Parameters:
project_id – the project to associate the request with
trace_uid – the trace identifier to read from
stream – the stream to read (e.g. raw, plain, html)
offset – the offset within the data stream
size – the amount of data to provide
key – the key data to be used to decrypt data, must be either fetch, None or binary (bytes)
bufsize – the network buffer size
- Returns:
a closable file-like object
- Return type:
io.BufferedReader
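A sketch of reading the head of a trace’s raw data stream. Since the returned stream must be closed after use, a with statement is the natural fit (io.BufferedReader supports the context manager protocol); the stand-in context and the ids below are illustrative:

```python
import io

def read_head(context, project_id, trace_uid, size=16):
    # the data() result must be closed after use; `with` handles that
    with context.data(project_id, trace_uid, stream='raw', size=size) as stream:
        return stream.read(size)

# stand-in for a ProjectContext serving an in-memory stream
class FakeContext:
    def data(self, project_id, trace_uid, stream='raw', offset=0, size=None):
        return io.BytesIO(b'%PDF-1.7 example')

head = read_head(FakeContext(), 'project-1', 'image-1:0')
```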
- create_resource(*, data, **kwargs)[source]
Creates a Hansken resource in two stages, registering the resource metadata before uploading the corresponding data.
Note
Required parameters for a resource are not validated by hansken.py. Missing parameters like group, name or version will likely result in HTTPError exceptions.
- Parameters:
data – data blob (either bytes or a file-like object) to be stored (the actual resource)
kwargs – the resource meta data associated with data
- Returns:
the id of the newly created resource
- resources(**kwargs)[source]
List available resources, optionally filtering by provided properties supplied as keyword arguments.
- Parameters:
kwargs – properties to filter for (e.g. group='nl.nfi.example')
- Returns:
a list of resources matching kwargs
- children(project_id, trace_uid, query=None, start=0, count=None, sort=None, facets=None, snippets=None, select=<all>, incomplete_results=None, timeout=None)[source]
Searches the children of a trace.
- Parameters:
project_id – id of the project to search in
trace_uid – id of the trace to retrieve children for
query – query to apply on the children of trace_uid (default: retrieve all children)
start – the start offset to be included
count – max number of children to be retrieved
sort – sort clause(s) of the form score-, some.field
facets – facet(s) to be used for search (str, Facet or sequence of either)
snippets – maximum number of snippets to return per trace
select – property selector(s), defaults to all properties
incomplete_results – whether to allow results from an incomplete index (defaults to letting the remote decide)
timeout – the maximum amount of time to wait for a search response, in seconds or as a datetime.timedelta (defaults to letting the remote decide), set to 0 to disable timeout
- Returns:
a file-like object, streaming the raw json response from remote
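Since the response is a file-like object streaming raw JSON, it can be handed to json.load directly. A sketch; the {'traces': [...]} layout in the stand-in below is an assumption made for illustration, consult the gatekeeper’s REST documentation for the actual response shape:

```python
import io
import json

def child_names(context, project_id, trace_uid):
    # the response streams raw JSON, so json.load can consume it directly
    with context.children(project_id, trace_uid, query='type:file') as response:
        body = json.load(response)
    return [trace.get('name') for trace in body.get('traces', [])]

# stand-in context; the response layout here is illustrative
class FakeContext:
    def children(self, project_id, trace_uid, query=None, **kwargs):
        payload = {'traces': [{'name': 'report.pdf'}, {'name': 'notes.txt'}]}
        return io.BytesIO(json.dumps(payload).encode('utf-8'))

names = child_names(FakeContext(), 'project-1', 'image-1:0')
```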
- import_trace(project_id, trace, data=None, key=<auto-fetch>, method='heuristic', properties=None, overwrite=False)[source]
Imports the requested properties on trace into a trace in project.
Please note that, for performance reasons, all changes are buffered and not directly effective in subsequent search, update and import requests. As a consequence, successive changes to a single trace might be ignored. Instead, all changes to an individual trace should be bundled in a single update or import request. The project index is refreshed automatically (by default every 30 seconds), so changes will become visible eventually.
- Parameters:
project_id – the project to import trace into
trace – the trace to be imported
data – a dict mapping data type / stream name to bytes to be imported
key – the key data for the image trace belongs to, must be fetch, None or binary (bytes)
method – a method to match trace to an existing trace in project, either 'strict' or 'heuristic'
properties – the properties to be imported
overwrite – whether properties to be imported should be overwritten if already present
- Returns:
a response object encoding the import result, detailing what trace the imported trace was matched to and what properties were imported
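Per the buffering note above, all changes to a single trace should be bundled in one request rather than spread over successive calls. A sketch; the property names and the stand-in context are illustrative:

```python
def annotate_trace(context, project_id, trace):
    # bundle every property change for this trace in one import request,
    # per the buffering note above (successive calls might be ignored)
    return context.import_trace(
        project_id,
        trace,
        method='strict',
        properties=['annotated.notes', 'annotated.tags'],  # illustrative names
        overwrite=False,
    )

# stand-in recording the call, to show the shape without a live gatekeeper
class FakeContext:
    def import_trace(self, project_id, trace, data=None, key=None,
                     method='heuristic', properties=None, overwrite=False):
        return {'matched': trace, 'imported': properties}

result = annotate_trace(FakeContext(), 'project-1', {'uid': 'image-1:0'})
```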
- add_trace_data(project_id, trace_uid, data_type, data, key=<auto-fetch>)[source]
Uploads a single data stream for a specific trace.
- Parameters:
project_id – the project to import data into
trace_uid – uid of the trace
data_type – the name of the data stream to upload data to. Allowed data types can be found in the trace model
data – the data to be uploaded, either bytes, a file-like object or an iterable providing bytes
key – the key data for the image trace belongs to, must be fetch, None or binary (bytes)
- Returns:
True on success (failure will likely result in an HTTPError)
- create_trace(project_id, parent_uid, child, data=None, key=<auto-fetch>)[source]
Requests a new trace to be indexed as a child trace of an existing trace.
- Parameters:
project_id – id of the project to index the trace into
parent_uid – the trace uid of the trace to attach the new child to
child – the new trace to be indexed
data – a dict mapping data type / stream name to bytes to be attached to the new trace
key – the key data for the image parent_uid and thus child belong to
- Returns:
the id of the newly created trace
- search(project_id, query=None, start=0, count=None, sort=None, facets=None, snippets=None, select=<all>, incomplete_results=None, deduplicate_field=None, timeout=None)[source]
Performs a search request built from the parameters.
- Parameters:
project_id – id of the project to search in
query – the query to submit
start – the start offset to be included
count – max number of traces to be retrieved
sort – sort clause(s) of the form score-, some.field
facets – facet(s) to be used for search (str, Facet or sequence of either)
snippets – maximum number of snippets to return per trace
select – property selector(s), defaults to all properties
incomplete_results – whether to allow results from an incomplete index (defaults to letting the remote decide, which will typically not allow results from an incomplete index)
deduplicate_field – which single value field to deduplicate on (defaults to None: the results are not deduplicated)
timeout – the maximum amount of time to wait for a search response, in seconds or as a datetime.timedelta (defaults to letting the remote decide), set to 0 to disable timeout
- Returns:
a file-like object, streaming the raw json response from remote
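A sketch of a paged search with sorting and a facet; score- sorts on descending relevance. The field names, the project id and the echoing stand-in context are illustrative:

```python
def find_documents(context, project_id):
    # descending relevance first, then file name; field names are illustrative
    return context.search(
        project_id,
        query='type:document',
        start=0,
        count=25,
        sort=['score-', 'file.name'],
        facets='file.extension',
    )

# stand-in echoing the request, to show the call shape
class FakeContext:
    def search(self, project_id, query=None, **kwargs):
        return {'project': project_id, 'query': query, **kwargs}

request = find_documents(FakeContext(), 'project-1')
```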
- queue_cleaning(project_id, query=None, clean_priority='low')[source]
Performs a search request built from the parameters, and puts the results on the cleaner queue with the given priority.
- Parameters:
project_id – id of the project to search in
query – the query to submit
clean_priority – the priority to use for cleaning the traces
- Returns:
True on success (failure will likely result in an HTTPError)
- delete_traces(project_id, query=None)[source]
Deletes traces from a project. This deletes traces from the index and the datastore, including the traces’ children. Note that this is only possible when the image is of type VNFI and the image from which the traces will be deleted is linked to only one project. When requirements and permissions are met, a task is scheduled to delete the traces.
- Parameters:
project_id – id of the project to search in
query – the query to submit
- Returns:
the task_id of the delete task
- delete_cleaned_data_image(project_id, image_id)[source]
Deletes the cleaned data from the cache for all traces in the given image. This can be used to force a re-clean of the trace data, because the next time the cleaned data is requested for one of these traces the cleaner will run again.
- Parameters:
project_id – id of the project
image_id – the image
- delete_cleaned_data_project(project_id)[source]
Delete cleaned data for all images of a project.
- Parameters:
project_id – id of the project
- delete_cleaned_data_trace(project_id, trace_uid, data_type=None)[source]
Delete cleaned data from the datastore (cache) for a trace. This can be used to force a re-clean of the trace data, because the next time the cleaned data is requested for this trace the cleaner will run again.
- Parameters:
project_id – id of the project the trace belongs to
trace_uid – the trace the data belongs to
data_type – the type of the cleaned data stream to delete (e.g. raw, encrypted); if not provided, all cleaned data is deleted
- search_tracelets(project_id, tracelet_type, query=None, start=0, count=None, sort='id', select=<all>, incomplete_results=None, timeout=None)[source]
Performs a search request for tracelets built from the parameters.
- Parameters:
project_id – id of the project to search in
tracelet_type – the type of tracelet to search for
query – the query to submit
start – the start offset to be included
count – max number of traces to be retrieved
sort – sort clause(s) of the form score-, some.field, defaults to sorting on id
select – property selector(s), defaults to all properties
incomplete_results – whether to allow results from an incomplete index (defaults to letting the remote decide, which will typically not allow results from an incomplete index)
timeout – the maximum amount of time to wait for a search response, in seconds or as a datetime.timedelta (defaults to letting the remote decide), set to 0 to disable timeout
- Returns:
a file-like object, streaming the raw json response from remote
- unique_values(project_id, select, query=None, after=None, count=None, incomplete_results=None, timeout=None)[source]
Performs a search request for the unique values of the selected property, optionally filtered by a query.
- create_search_request(query=None, *, start=0, count=None, sort=None, facets=None, snippets=None, tracelet_type=None, project_ids=None, select=<all>, incomplete_results=None, deduplicate_field=None, timeout=None)[source]
Creates a search request from the parameters, transforming parameters when needed.
- Parameters:
query – the query to submit
start – the start offset to be included
count – max number of traces to be included
sort – sort clause(s) of the form score-, some.field or instance(s) of Sort
facets – facet(s) to be used for search (str, Facet or sequence of either)
snippets – maximum number of snippets to return per trace
tracelet_type – the type of tracelet to search for (only applicable to tracelet searches)
project_ids – a collection of project ids to pass this search request to, or None
select – property selector(s), defaults to all properties
incomplete_results – whether to allow results from an incomplete index (defaults to letting the remote decide)
deduplicate_field – which single value field to deduplicate on (defaults to None: the results are not deduplicated)
timeout – the maximum amount of time to wait for a search response, in seconds or as a datetime.timedelta (defaults to letting the remote decide), set to 0 to disable timeout
- suggest(project_id, text, query_property='text', count=100)[source]
Expand a search term from terms in the index.
- Parameters:
project_id – the project id to suggest terms from
text – the text / search term to be expanded
query_property – the search property to expand terms for (e.g. 'file.name' or 'text')
count – the maximum number of suggestions to return
- Returns:
a list of suggestions
- tasks(state='open', project_id=None, start=None, end=None)[source]
Request a listing of tasks.
- Parameters:
state – the state of tasks to be listed, can be either 'open' or 'closed'
project_id – an optional project id to list tasks for
start – an optional datetime.date, datetime.datetime or ISO 8601-formatted str; only tasks whose relevant moment falls after start are listed
end – an optional datetime.date, datetime.datetime or ISO 8601-formatted str; only tasks whose relevant moment falls before end are listed
- Returns:
a list of tasks
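A sketch of a date-bounded task listing; start/end accept date, datetime or ISO 8601 str. The fixed dates, project id and stand-in context are illustrative:

```python
import datetime

def recent_closed_tasks(context, project_id):
    # list closed tasks from the week before an (illustrative) reference date
    end = datetime.date(2024, 1, 8)
    start = end - datetime.timedelta(days=7)
    return context.tasks(state='closed', project_id=project_id,
                         start=start, end=end)

# stand-in returning the bounds it was called with
class FakeContext:
    def tasks(self, state='open', project_id=None, start=None, end=None):
        return [{'state': state, 'start': start.isoformat(), 'end': end.isoformat()}]

listing = recent_closed_tasks(FakeContext(), 'project-1')
```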
- task_export_key(task_id, *, image_key=<auto-fetch>, image_id=None)[source]
Retrieve the export key for a specific task.
Note that this requires the key for the image this task is performed on, supplied either directly or looked up through a supplied image id.
- Parameters:
task_id – the task for which to retrieve the export key
image_key – key for the task’s image (mutually exclusive with image_id)
image_id – image id for the task (mutually exclusive with image_key)
- Returns:
the task’s export key
- log_messages(project_id, image_id, message_type='log', task_id=None)[source]
Retrieves log messages for the extraction of image image_id within the project project_id.
- Parameters:
project_id – the project for which the extraction was run
image_id – the image id for which to retrieve the log messages
message_type – the type of message to retrieve, either 'log' or 'failedTrace'
task_id – retrieve only messages related to a particular task id
- Returns:
a list of log messages: dicts with keys "date" and "message"
- tools(project_id=None)[source]
Get the tools available for extraction.
- Parameters:
project_id – get the tools available for a specific project (leave None for all tools)
- Returns:
a dict, mapping the name of a tool to its version and human-friendly description
- extract(project_id, image_id, type='index', key=<auto-fetch>, tools=None, configuration=None, query=None, pre_clean_priority=None, engine=None)[source]
Extract traces from an image in the context of a project.
- Parameters:
project_id – the project the extraction should be part of
image_id – the image to be extracted
type – the type of extraction to start
key – the key data to be used to decrypt data, must be either fetch, None or binary (bytes)
tools – the tools to be used; either a sequence of tool names or None, indicating to use the default tools (see also tools)
configuration – configuration overrides for this extraction, as a dict (e.g. {'timeout': 0})
query – a query to use for extraction types other than 'index'
pre_clean_priority – the pre-clean priority used for this extraction, must be either 'low', 'medium', 'high' or None to use the default; to disable pre-cleaning, use 'disabled'
engine – the extraction engine to use, if unset the default engine is used
- Returns:
the job id of the extraction that is started
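A sketch of starting an index extraction; the configuration override, priority and stand-in context are illustrative (with tools=None the default tool selection would be used, see tools() above):

```python
def start_indexing(context, project_id, image_id):
    # the configuration override and pre-clean priority here are illustrative
    return context.extract(
        project_id,
        image_id,
        type='index',
        configuration={'timeout': 0},
        pre_clean_priority='low',
    )

# stand-in handing back a job id, to show the call shape
class FakeContext:
    def extract(self, project_id, image_id, type='index', key=None, tools=None,
                configuration=None, query=None, pre_clean_priority=None,
                engine=None):
        assert pre_clean_priority in (None, 'low', 'medium', 'high', 'disabled')
        return 'job-001'

job_id = start_indexing(FakeContext(), 'project-1', 'image-1')
```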
- extract_singlefile(singlefile_id, tools=None, configuration=None)[source]
Schedules the extraction of the singlefile identified by singlefile_id. This is an asynchronous call that only initiates the extraction; to monitor its progress, use the 'get_singlefile' call to retrieve the state of the singlefile image object.
- Parameters:
singlefile_id – the id of the single file
tools – the tools to be used; either a sequence of tool names or None, indicating to use the default tools (see also tools)
configuration – configuration overrides for this extraction, as a dict (e.g. {'timeout': 0})
- Returns:
the job id of the extraction that is initiated
- backup_project(project_id, user_backup_key, image_keys=<auto-fetch>)[source]
Creates a backup of a project.
- Parameters:
project_id – the project to make a backup of
user_backup_key – the key data to be used to encrypt the backup data, must be binary (bytes)
image_keys – the key data to be used to decrypt data, must be a dict mapping image ids to binary (bytes) key values. If no dict is provided, the image keys are fetched from the keystore.
- Returns:
the job id of the backup task that is started
- export_project(project_id, user_export_key, image_keys=<auto-fetch>, query=None, include_priviliged=False, include_notes=False, include_tags=False, include_entities=False, include_image_data=False, image_id=None)[source]
Initiates an export of a project.
- Parameters:
project_id – the project to make an export of
user_export_key – the key data to be used to encrypt the export data, must be binary (bytes)
image_keys – the key data to be used to decrypt data, must be a dict mapping image ids to binary (bytes) key values. If no dict is provided, the image keys are fetched from the keystore.
query – the query used to select all, or a subset, of traces from a project to be exported
include_priviliged – whether privileged traces should be exported
include_notes – whether notes should be exported
include_tags – whether tags should be exported
include_entities – whether entities should be exported
include_image_data – whether a new sliced image should be generated; if true, image_id should be set, too
image_id – the UUID of the original image to generate a new sliced image from
- Returns:
the job id of the export task that is started
- prepare_project_import(project_id, user_export_key, export_file, image_keys=<auto-fetch>)[source]
Importing an export into Hansken is a two-stage process. In this first stage, the export is uploaded and validated to see if the import can go ahead.
- Parameters:
project_id – the project to which the exported traces need to be added
user_export_key – the key data to be used to decrypt the exported data, must be binary (bytes)
export_file – the on-disk export file to be imported
image_keys – the key data to be used to decrypt data, must be a dict mapping image ids to binary (bytes) key values. If no dict is provided, the image keys are fetched from the keystore.
- Returns:
the job id of the import task that is started
- apply_project_import(project_id, user_export_key, task_id, image_keys=<auto-fetch>, project_metadata_import_strategy=ImportStrategy.UPDATE, images_metadata_import_strategy=ImportStrategy.UPDATE)[source]
In the second stage of the import process (the first stage is handled by prepare_project_import), traces from the previously uploaded export file are added to the specified project.
- Parameters:
project_id – the project to add the exported traces to
user_export_key – the key data to be used to decrypt the exported data, must be binary (bytes)
task_id – the task id of the import task to apply now
image_keys – the key data to be used to decrypt data, must be a dict mapping image ids to binary (bytes) key values. If no dict is provided, the image keys are fetched from the keystore.
project_metadata_import_strategy – import strategy for the project data in the export
images_metadata_import_strategy – import strategy for the image data in the export
- Returns:
the job id of the import apply task that is started
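The two-stage flow can be sketched end to end: prepare (upload and validate), wait for the prepare task, then apply. The file name, key and stand-in context below are illustrative:

```python
def import_export(context, project_id, export_key, export_file):
    # stage 1: upload and validate the export
    task_id = context.prepare_project_import(project_id, export_key, export_file)
    status = context.wait_for_task(task_id)
    if status != 'completed':
        raise RuntimeError(f'prepare failed with status {status}')
    # stage 2: apply the validated import to the project
    return context.apply_project_import(project_id, export_key, task_id)

# stand-in walking through both stages, to show the call shapes
class FakeContext:
    def prepare_project_import(self, project_id, key, export_file, **kwargs):
        return 'task-prepare'

    def wait_for_task(self, task_id, **kwargs):
        return 'completed'

    def apply_project_import(self, project_id, key, task_id, **kwargs):
        return 'task-apply'

apply_task = import_export(FakeContext(), 'project-1', b'key', 'export.bin')
```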
- upload_image(image_id, data, extension=None, offset=None)[source]
Uploads image data. Note that data in the NFI format requires two files (.nfi (data) and .nfi.idx (index)) and thus two upload calls.
- Parameters:
image_id – the image id of the data to be uploaded
data – the image data to be uploaded, either bytes, a file-like object or an iterable providing bytes (see documentation on requests’ upload support)
extension – file extension of the upload, either '.nfi', '.nfi.idx' or None
offset – byte offset of data within the complete file to be uploaded
- Returns:
the image id of the uploaded data
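An NFI-format upload takes two calls, one per file. A sketch; the ids, byte payloads and recording stand-in context are illustrative:

```python
def upload_nfi(context, image_id, data, index):
    # NFI-format uploads take two calls: the data file and its index
    context.upload_image(image_id, data, extension='.nfi')
    context.upload_image(image_id, index, extension='.nfi.idx')
    return image_id

# stand-in recording which extensions were uploaded
class FakeContext:
    def __init__(self):
        self.uploads = []

    def upload_image(self, image_id, data, extension=None, offset=None):
        self.uploads.append(extension)
        return image_id

context = FakeContext()
upload_nfi(context, 'image-1', b'data bytes', b'index bytes')
```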
- upload_singlefile(data, name)[source]
A singlefile is a temporary single data source for quick extraction of traces. Uploading a singlefile performs a number of steps:
create a project with property hidden set to true
create an image with property hidden set to true
link the image to the project
upload the singlefile data
- Parameters:
data – the image data to be uploaded, either bytes, a file-like object or an iterable providing bytes (see documentation on requests’ upload support)
name – the name for the singlefile project and image
- Returns:
the singlefile’s identifier (a str(UUID))
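The typical singlefile round trip is upload, extract, poll, then stream the resulting traces. A sketch; the stand-in context and the response layout it returns are assumptions made for illustration:

```python
import io
import json

def analyse_singlefile(context, data, name):
    # upload creates the hidden project + image and stores the data ...
    singlefile_id = context.upload_singlefile(data, name)
    # ... after which extraction is kicked off asynchronously; in real use,
    # poll the singlefile's state before reading traces (see extract_singlefile)
    context.extract_singlefile(singlefile_id)
    with context.singlefile_traces(singlefile_id) as response:
        return json.load(response)

# stand-in sketching the round trip without a live gatekeeper
class FakeContext:
    def upload_singlefile(self, data, name):
        return 'singlefile-1'

    def extract_singlefile(self, singlefile_id, tools=None, configuration=None):
        return 'job-1'

    def singlefile_traces(self, singlefile_id):
        return io.BytesIO(json.dumps({'traces': []}).encode('utf-8'))

traces = analyse_singlefile(FakeContext(), b'hello', 'example')
```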
- singlefile_traces(singlefile_id)[source]
Get all traces within a singlefile.
- Parameters:
singlefile_id – unique id of the singlefile, UUID
- Returns:
a file-like object, streaming the raw json response from remote