mgkit.io.utils module
Various utilities to help read and process files
exception mgkit.io.utils.UnsupportedFormat
Bases: OSError
Raised if a file can’t be opened with the correct module
mgkit.io.utils.compressed_handle(file_handle)
New in version 0.1.13.
Tries to wrap a file handle in the appropriate compressed file class.
- Parameters
file_handle (str) – file handle
- Returns
the same file handle if no suitable compressed file class is found; otherwise a new file handle that supports the compression
- Return type
file
- Raises
UnsupportedFormat – if the module to open the file is not available
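The wrapping behaviour described above can be sketched with the standard library alone. This is a minimal illustration and not mgkit's implementation: the function name `wrap_if_compressed`, the `WRAPPERS` table and the choice to inspect the handle's `name` attribute are all assumptions, and the sketch does not raise UnsupportedFormat.

```python
import bz2
import gzip
import lzma

# Hypothetical extension table; it is not taken from mgkit.
WRAPPERS = {
    ".gz": lambda handle: gzip.GzipFile(fileobj=handle, mode="rb"),
    ".bz2": lambda handle: bz2.BZ2File(handle, mode="rb"),
    ".xz": lambda handle: lzma.LZMAFile(handle, mode="rb"),
}


def wrap_if_compressed(file_handle):
    """Return a decompressing wrapper, or the handle itself if none applies."""
    # Assumption: the compression type is guessed from the handle's file name.
    name = getattr(file_handle, "name", "")
    if not isinstance(name, str):
        return file_handle
    for extension, wrapper in WRAPPERS.items():
        if name.endswith(extension):
            return wrapper(file_handle)
    # No suitable wrapper found: hand back the same object unchanged.
    return file_handle
```

A handle opened with `open("reads.fa.gz", "rb")` would come back wrapped in `gzip.GzipFile`, while an in-memory buffer without a name would be returned as-is.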
mgkit.io.utils.group_tuples_by_key(iterator, key_func=None, skip_elements=0)
New in version 0.3.1.
Groups the elements of an iterator by a key and yields the grouped elements. The elements yielded by the iterator are assumed to be lists or tuples, with the default key (when key_func is None) being the first of the objects inside each element. This behaviour can be customised by passing to key_func a function that accepts an element and returns the key to be used.
Note
The elements of the iterable are assumed to be already sorted by their keys
- Parameters
iterator (iterable) – iterator to be grouped
key_func (func) – function that accepts an element and returns its associated key
skip_elements (int) – number of elements to skip at the start
- Yields
list – a list of the grouped elements by key
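The grouping described above can be approximated with `itertools.groupby`. This is a sketch and not mgkit's code: `group_sorted_records` is a hypothetical stand-in, and it assumes that skip_elements drops that many leading elements of the iterator (for example a header row).

```python
from itertools import groupby, islice
from operator import itemgetter


def group_sorted_records(records, key_func=None, skip_elements=0):
    """Yield lists of consecutive records that share the same key."""
    if key_func is None:
        key_func = itemgetter(0)  # default key: first item of each record
    # Assumption: skip_elements drops leading records, such as a header.
    remaining = islice(records, skip_elements, None)
    for _, group in groupby(remaining, key=key_func):
        yield list(group)


rows = [("gene", "hit"), ("g1", "a"), ("g1", "b"), ("g2", "c")]
grouped = list(group_sorted_records(rows, skip_elements=1))
# grouped == [[("g1", "a"), ("g1", "b")], [("g2", "c")]]
```

Because only consecutive elements are compared, unsorted input splits a key into several groups, which is why the note about pre-sorted input matters.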
mgkit.io.utils.open_file(file_name, mode='r')
New in version 0.1.12.
Changed in version 0.3.4: using io.open, always in binary mode
Changed in version 0.4.2: when a file handle is detected, it is passed to compressed_handle() to detect if the handle is a compressed file
Opens a file using the extension as a guide to which module to use.
Note
Unicode makes for a slower .translate method in Python2, so it’s best to use the open builtin.
- Parameters
- Returns
file handle
- Return type
file
- Raises
UnsupportedFormat – if the module to open the file is not available
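Choosing an opener from the file extension might look like the following sketch. It is not mgkit's implementation: `open_by_extension` and the `OPENERS` table are hypothetical, the recognised extensions are an assumption, and the handling of file handles and of UnsupportedFormat is left out.

```python
import bz2
import gzip
import lzma

# Hypothetical extension table; the extensions mgkit recognises may differ.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_by_extension(file_name, mode="r"):
    """Open file_name in binary mode with the opener matching its extension."""
    # Force binary mode, mirroring the "always in binary mode" note above.
    binary_mode = mode.replace("t", "")
    if "b" not in binary_mode:
        binary_mode += "b"
    for extension, opener in OPENERS.items():
        if file_name.endswith(extension):
            return opener(file_name, binary_mode)
    # Unknown extension: fall back to the builtin open.
    return open(file_name, binary_mode)
```

Since the handle is always binary, callers read and write `bytes` and decode explicitly when text is needed.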
mgkit.io.utils.split_write(records, name_mask, write_func, num_files=2)
New in version 0.1.13.
Splits the writing of a number of records across a series of files. The name_mask is used as a template for the file names. A string like “split-files-{0}” can be specified and the function applies format to it with the index of each piece.
- Parameters
records (iterable) – an iterable that yields the objects to be saved
name_mask (str) – a string used as template for the output file names, on which the function applies string.format()
write_func (func) – a function that is called to write to the files. It needs to accept a file handle as first argument and the record returned by records as the second argument
num_files (int) – the number of files to split the records into
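How the parameters fit together can be shown with a small stand-in. This sketch is not mgkit's code: `split_records` and `write_line` are hypothetical names, the round-robin assignment of records to files is an assumption the documentation does not confirm, and returning the file names is added only for illustration.

```python
def split_records(records, name_mask, write_func, num_files=2):
    """Write records to num_files files, cycling through them in turn."""
    # The mask is expanded with the index of each output file.
    names = [name_mask.format(index) for index in range(num_files)]
    handles = [open(name, "w") for name in names]
    try:
        for index, record in enumerate(records):
            # Assumption: records are assigned to files round-robin.
            write_func(handles[index % num_files], record)
    finally:
        for handle in handles:
            handle.close()
    return names


def write_line(handle, record):
    """Example write_func: file handle first, record second."""
    handle.write("{}\n".format(record))
```

Calling `split_records(range(5), "split-files-{0}", write_line)` would create `split-files-0` and `split-files-1`, with the records alternating between them under the round-robin assumption.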