kodiak package

Submodules

kodiak.args_dict_builder module

class kodiak.args_dict_builder.ArgsDictBuilder(parser=None, transform=None, new_col_combiner=None)[source]

Bases: object

build(new_col)[source]

kodiak.args_parser module

class kodiak.args_parser.ArgsParser(pattern=None, separator=', ')[source]

Bases: object

parse(string)[source]

class kodiak.args_parser.Match(original, label=None, value=None, payload=None)[source]

Bases: object

An object generated after the process of matching and passed to the colbuilder

original

the unmodified matched string

Type: str

value

a possible derived string from original

Type: str

label

used as the name or title of the Match

Type: str

payload

a dict with extra information, such as default_colbuilder, that the colbuilder can use in kodiak_dataframe.gencol

Type: dict

kodiak.colbuilders module

Helper functions to use as colbuilders with gencol and mutcol

kodiak.colbuilders.as_attribute(x, y)[source]

interprets Match value y as an attribute of x

kodiak.colbuilders.as_method(x, y)[source]

interprets Match value y as an instance method of x
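The two descriptions above read like thin wrappers around Python's built-in getattr. The sketch below illustrates that reading only; it is an assumption, not kodiak's source. The `_sketch` names are hypothetical, and y is treated as a plain string rather than a Match.

```python
from datetime import date


def as_attribute_sketch(x, y):
    # Hypothetical stand-in: read the attribute named y from x.
    return getattr(x, y)


def as_method_sketch(x, y):
    # Hypothetical stand-in: look up the method named y on x and call it
    # with no arguments.
    return getattr(x, y)()


born = date(1980, 12, 24)
month = as_attribute_sketch(born, "month")      # same as born.month
weekday = as_method_sketch(born, "isoweekday")  # same as born.isoweekday()
```

The only difference between the two is the trailing call, which is why an ambiguous name (one that could be either an attribute or a method) needs to be resolved before a default colbuilder is chosen.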

kodiak.colbuilders.splitter(pattern=None)[source]

A builder function that returns a colbuilder

Parameters:pattern – a string pattern used to split a string

Example

>>> from kodiak.kodiak_dataframe import KodiakDataFrame
>>> from kodiak.colbuilders import splitter
>>> df = KodiakDataFrame({'name': ['Groucho Marx', 'Harpo Marx']})
>>> df.gencol('{first,last}_name', 'name', splitter(" "))

Will return the following data frame:

>>> #            name  first_name  last_name
>>> # 0  Groucho Marx    Groucho      Marx
>>> # 1    Harpo Marx      Harpo      Marx

Returns: A function used as a colbuilder

kodiak.config module

kodiak.config.base_config(parser=None, match_transform=None, new_col_combiner=None, unpack=None, drop=None, col_pair_combiner=None)[source]

Default config used by gencol and mutcol

Parameters:
  • parser – the parser used for newcols. By default Kodiak uses ArgsParser.
  • match_transform – transforms applied to the data before it is passed to the colbuilder. By default the default_transform pipeline is used; you can replace it with an array of Transform objects.
  • new_col_combiner – controls how the argument groups found in the newcols template are combined before they are passed to the colbuilder. For example, "foo_{a,b}_{c,d}" has two groups: ['a', 'b'] and ['c', 'd']. By default zip is used; you can replace it with a function that has the same signature.
  • unpack (bool) – True by default. The arguments passed to the colbuilder are of type Match; on certain occasions you can pass strings instead.
  • drop (bool) – False by default. Set to True to drop the column col in gencol after the new columns are created.
  • col_pair_combiner – once the arguments have been extracted from the newcols template string, they are combined with the data extracted from col. This option controls how these two elements are combined. Currently product from itertools is used; any replacement must have the same signature.
Returns:

dict with base config options
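To make the two combiner options concrete, the following sketch uses only the standard library to show what zip and itertools.product yield for the groups in the example above. How kodiak invokes these combiners internally is not shown in this reference, so the call shapes (`zip(*groups)`, `product(values, ...)`) and the sample values are assumptions.

```python
from itertools import product

groups = [["a", "b"], ["c", "d"]]  # groups from the template "foo_{a,b}_{c,d}"

# new_col_combiner: zip pairs the groups position by position.
zipped = list(zip(*groups))

# Swapping in product would instead yield every combination across groups.
crossed = list(product(*groups))

# col_pair_combiner: product pairs every column value with every argument tuple.
values = ["x0", "x1"]  # made-up column data, for illustration only
pairs = list(product(values, zipped))
```

With zip the template yields two argument tuples, ('a', 'c') and ('b', 'd'); with product it would yield four.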

kodiak.config.cfg(parser=None, match_transform=None, new_col_combiner=None, unpack=None, drop=None, col_pair_combiner=None)

Default config used by gencol and mutcol

Parameters:
  • parser – the parser used for newcols. By default Kodiak uses ArgsParser.
  • match_transform – transforms applied to the data before it is passed to the colbuilder. By default the default_transform pipeline is used; you can replace it with an array of Transform objects.
  • new_col_combiner – controls how the argument groups found in the newcols template are combined before they are passed to the colbuilder. For example, "foo_{a,b}_{c,d}" has two groups: ['a', 'b'] and ['c', 'd']. By default zip is used; you can replace it with a function that has the same signature.
  • unpack (bool) – True by default. The arguments passed to the colbuilder are of type Match; on certain occasions you can pass strings instead.
  • drop (bool) – False by default. Set to True to drop the column col in gencol after the new columns are created.
  • col_pair_combiner – once the arguments have been extracted from the newcols template string, they are combined with the data extracted from col. This option controls how these two elements are combined. Currently product from itertools is used; any replacement must have the same signature.
Returns:

dict with base config options

kodiak.config.restore_default_config(*keys)[source]

Restore original configuration on all or specific properties

If no key is given, the whole configuration is restored; if keys are given, only those options are restored.

Parameters:keys – a list of strings that correspond to options
Returns:Nothing
Raises:KeyError if key is not a valid option

kodiak.kodiak_dataframe module

class kodiak.kodiak_dataframe.KodiakDataFrame(*args, **kwargs)[source]

Bases: pandas.core.frame.DataFrame

A KodiakDataFrame is a pandas.DataFrame that has new capabilities to ease your workflow: gencol and mutcol

Example

>>> from kodiak import KodiakDataFrame
>>> kdf = KodiakDataFrame({'country': ['ar','br','cl','co']})

gencol(newcols, col, colbuilder=None, drop=None, enum=False, config=None)[source]

Generate new columns following the newcols pattern based on col

Parameters:
  • newcols (str) – new column/s template string
  • col (str) – column name from where data is taken
  • colbuilder

    a function that builds the new columns. It can be omitted if it can be deduced from newcols. The function usually takes two arguments, x and y: x is the data extracted from col and y is the argument extracted from the newcols template.

    Example

    If newcols is "born_{month,day,year}", col is born, and one value of born is the date 1980-12-24, then the successive (x, y) pairs are ('1980-12-24', 'month'), ('1980-12-24', 'day') and ('1980-12-24', 'year').

  • drop (bool) – True if you want to drop the column col
  • enum (bool) – False by default. If True, the colbuilder is expected to take three arguments: index, x and y
  • config – custom configuration built with base_config
Raises:

ValueError
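
The (x, y) pairing described for gencol can be traced without kodiak or pandas. The sketch below is a rough illustration under assumptions: `expand_template_sketch` is a hypothetical helper that handles a single {...} group, a list stands in for the column, and getattr stands in for the colbuilder.

```python
import re
from datetime import date
from itertools import product


def expand_template_sketch(template):
    # Hypothetical helper: split "born_{month,day,year}" into its arguments
    # and the column names they produce. Handles a single {...} group only.
    found = re.search(r"\{([^{}]*)\}", template)
    args = [a.strip() for a in found.group(1).split(",")]
    names = [template[:found.start()] + a + template[found.end():] for a in args]
    return args, names


args, names = expand_template_sketch("born_{month,day,year}")
column = [date(1980, 12, 24)]  # data taken from col

# Each (x, y) pair corresponds to one colbuilder call.
calls = list(product(column, args))

# getattr plays the role of the colbuilder: one new column per argument.
new_cols = {name: [getattr(x, y) for x in column] for name, y in zip(names, args)}
```

Here each argument in the braces yields one new column, named by substituting the argument back into the template.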

mutcol(col, colbuilder=None, config=None)[source]

Mutates the column col. Similar to calling gencol with newcols and col both equal to col

kodiak.kodiak_dataframe.default_colbuilder(x, y)[source]

Uses the Match payload attribute to extract a default colbuilder

kodiak.transforms module

class kodiak.transforms.ComposerTransform(transforms)[source]

Bases: object

transform(match)[source]

class kodiak.transforms.IntTransform[source]

Bases: object

transform(match)[source]

class kodiak.transforms.MethodTransform[source]

Bases: object

transform(match)[source]

Adds to the Match object payload the default_colbuilder: colbuilders.as_method

Parameters:match (Match) – The Match object that is going to be enriched.
Returns:The enriched Match object with a default_colbuilder key in the payload
Return type:Match
Raises:ValueError – in case the Match value attribute is ambiguous.

class kodiak.transforms.PropertyTransform[source]

Bases: object

transform(match)[source]

Adds to the Match object payload the default_colbuilder: colbuilders.as_attribute

Parameters:match (Match) – The Match object that is going to be enriched.
Returns:The enriched Match object with a default_colbuilder key in the payload
Return type:Match
Raises:ValueError – in case the Match value attribute is ambiguous.
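
The transform classes above share a transform(match) method that returns an enriched Match, and ComposerTransform accepts a list of transforms. One plausible reading is a sequential pipeline, sketched below. Every class here is a hypothetical stand-in (including `StripTransformSketch`, which has no documented counterpart), and the behaviour is an assumption rather than kodiak's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class MatchSketch:
    # Hypothetical stand-in for kodiak.args_parser.Match.
    original: str
    label: str = None
    value: str = None
    payload: dict = field(default_factory=dict)


class StripTransformSketch:
    def transform(self, match):
        # Derive value from original by trimming whitespace.
        match.value = match.original.strip()
        return match


class AttributeTransformSketch:
    def transform(self, match):
        # Record a default colbuilder in the payload; getattr plays the
        # role that colbuilders.as_attribute has in the reference above.
        match.payload["default_colbuilder"] = getattr
        return match


class ComposerSketch:
    def __init__(self, transforms):
        self.transforms = list(transforms)

    def transform(self, match):
        # Apply each transform in order, feeding the result to the next.
        for t in self.transforms:
            match = t.transform(match)
        return match


pipeline = ComposerSketch([StripTransformSketch(), AttributeTransformSketch()])
enriched = pipeline.transform(MatchSketch(original=" month "))
```

Because each step returns the Match, a composer built this way can itself be used wherever a single transform is expected.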
kodiak.transforms.is_number(s)[source]

Module contents

Top-level package for kodiak.