Filter class

class querier.Filter[source]

Dict subclass to compose search criteria for extracting data from a database.

Filters are used in querier.Connection methods to select data from the database. A filter is composed of simple conditions. By default, an empty filter matches all the entries in the database. A condition is added by calling any condition method on the filter object.

For example, the following code:

import querier as qr
f = qr.Filter()
f.greater_than('x', 0)
f.less_or_equals('x', 100)

This creates a filter that matches all entries with a field named ‘x’ whose value is in the range (0, 100]. As shown in the example, each call to a condition method performs an AND operation with the conditions already in the filter. (To perform an OR operation, see Filter.or_filter().)
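
As a rough illustration of what this filter selects, the sketch below applies the same two conditions to plain Python dictionaries. It does not use querier; the function matches_range and the sample entries are hypothetical, and treating an entry without ‘x’ as a non-match is an assumption.

```python
# Plain-Python stand-in for the filter above: both conditions must hold (AND).
def matches_range(entry):
    """Return True when the entry has a field 'x' with 0 < x <= 100."""
    if "x" not in entry:
        return False  # assumption: entries without the field do not match
    return 0 < entry["x"] <= 100


# Hypothetical sample entries, covering both boundaries of the range.
entries = [{"x": 0}, {"x": 1}, {"x": 100}, {"x": 101}, {"y": 5}]
matching = [e for e in entries if matches_range(e)]
```

The lower bound is excluded and the upper bound is included, which is what the interval (0, 100] expresses.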

Note

Entries are collections of field_id-value pairs. They are returned as Python dictionaries, and values can be of any type: string, numeric, list, dictionary, etc. Excerpt of an example entry from the twitter database:

{
    "lang": "es",
    "place": {
        "coordinates": [[[-109.479171, -56.557358],
                        [-109.479171, -17.497384],
                        [-66.15203, -17.497384],
                        [-66.15203, -56.557358]]],
        "country_code": "ES",
    },
    "favorite_count": 124,
}

To filter by nested fields the dot (‘.’) notation should be used. For example, the following code:

f = qr.Filter()
f.any_of('place.country_code', ['ES', 'FR', 'PT'])

will match entries from Spain (ES), France (FR) or Portugal (PT).
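
To make the dot notation concrete, here is a minimal sketch of how a dotted field_id can be resolved against an entry returned as a Python dictionary. The helper get_nested is hypothetical and only mimics the lookup on the client side; it is not part of querier.

```python
def get_nested(entry, field_id):
    """Resolve a dotted field_id such as 'place.country_code' in a nested dict.

    Returns None when any step of the path is missing.
    """
    current = entry
    for key in field_id.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Shortened version of the example entry shown above.
entry = {"lang": "es", "place": {"country_code": "ES"}, "favorite_count": 124}

country = get_nested(entry, "place.country_code")
is_match = country in ["ES", "FR", "PT"]  # same test as any_of above
```

With this entry, the lookup follows "place" and then "country_code", so the entry would pass the any_of condition from the example.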

and_filter(other)[source]

Perform an AND operation with a second filter, modifying this filter in place.

Parameters:

other (Filter | dict) – Another Filter object or dict to perform the operation with.

Return type:

Filter

Returns:

self

Raises:

InvalidFilter – if one of the filters is empty or both filters are the same object.

Examples

Create a Filter f1 that matches tweets with a number of retweets between 500 and 1000:

import querier as qr

f1 = qr.Filter()
f1.greater_or_equals('retweet_count', 500)

f2 = qr.Filter()
f2.less_or_equals('retweet_count', 1000)

f1.and_filter(f2)

Alternative notation using the & operator, not modifying f1 in place but getting the resulting Filter as a new instance f3:

f3 = f1 & f2
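
The difference between the in-place method and the operator can be sketched with a toy dict subclass. ToyFilter below is a hypothetical stand-in, not querier's implementation: the MongoDB-style "$and", "$gte" and "$lte" keys are an assumption about how conditions might be stored, and ValueError stands in for InvalidFilter.

```python
import copy


class ToyFilter(dict):
    """Hypothetical stand-in for querier.Filter; NOT the library's code."""

    def and_filter(self, other):
        # Mirror the documented errors: an empty operand or the same object.
        if not self or not other or other is self:
            raise ValueError("invalid filter")  # the real class raises InvalidFilter
        combined = {"$and": [copy.deepcopy(dict(self)), copy.deepcopy(dict(other))]}
        self.clear()
        self.update(combined)
        return self  # in place: the caller gets the same object back

    def __and__(self, other):
        # Operator form: combine on a copy so the left operand is untouched.
        return ToyFilter(copy.deepcopy(dict(self))).and_filter(other)


f1 = ToyFilter({"retweet_count": {"$gte": 500}})
f2 = ToyFilter({"retweet_count": {"$lte": 1000}})

f3 = f1 & f2                # new instance, f1 is unchanged at this point
result = f1.and_filter(f2)  # same object as f1, which is now modified
```

In this sketch the operator builds its result on a copy, which is why f1 still holds only its original condition until and_filter is called on it.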

any_of(field_id, values)[source]

Add a condition to the filter that matches when the field is equal to any of the values in the list.

Python equivalent: field_id in values

Return type:

Filter

copy()[source]

Return a copy of the filter.

Return type:

Filter

equals(field_id, value)[source]

Add a condition to the filter that matches when the field is equal to a value.

Python equivalent: field_id == value

Return type:

Filter

exists(field_id)[source]

Add a condition to the filter that matches when the field exists.

Python equivalent: field_id is not None

Return type:

Filter

geo_intersects(field_id, geo, geo_type='Polygon', invert=False)[source]

Add a condition to the filter that matches when the field’s geometry intersects with a geometry geo.

Parameters:
  • field_id (str) – Name of the field supposed to contain geometries.

  • geo (GeoFilterType) – Geometry that has to be intersected to pass the filter. Can be given as a shapely geometry or as nested lists giving the coordinates of the geometry. For geo_type="bbox", should be [minx, miny, maxx, maxy].

  • geo_type (str) – Type of the geometry: “Polygon”, “MultiPolygon” or “bbox”. Does not need to be specified when geo is a shapely geometry.

  • invert (bool) – If True, filters out geometries that intersect geo; if False (default), keeps only those that intersect geo.

Return type:

Filter

Warning

Geometries in MongoDB use EPSG:4326 as the default coordinate reference system (CRS). Beware also that the geometries in the field field_id should be valid, so for instance polygons should be closed. This does not hold for place.bounding_box in tweet collections; in places collections it holds only for valid_bounding_box.
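
To illustrate what a closed polygon means here, the sketch below builds a closed ring from a [minx, miny, maxx, maxy] box and checks whether a ring is closed (at least four positions, with the first and last identical, as in GeoJSON). The helpers bbox_to_closed_ring and is_closed are hypothetical and are not part of querier; the coordinates are taken from the example entry above.

```python
def bbox_to_closed_ring(bbox):
    """Turn [minx, miny, maxx, maxy] into a closed ring of [x, y] positions."""
    minx, miny, maxx, maxy = bbox
    ring = [[minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy]]
    ring.append(list(ring[0]))  # repeat the first position to close the ring
    return ring


def is_closed(ring):
    """A ring is closed when it has at least 4 positions and first == last."""
    return len(ring) >= 4 and ring[0] == ring[-1]


# Four corners without the repeated first position: not a closed ring.
open_ring = [[-109.479171, -56.557358], [-109.479171, -17.497384],
             [-66.15203, -17.497384], [-66.15203, -56.557358]]

closed_ring = bbox_to_closed_ring([-109.479171, -56.557358, -66.15203, -17.497384])
```

A ring like open_ring, which lists the four corners only once, is the kind of geometry the warning refers to.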

geo_within(field_id, geo, geo_type='Polygon', invert=False)[source]

Add a condition to the filter that matches when the field’s geometry is fully contained within a geometry geo.

Parameters:
  • field_id (str) – Name of the field supposed to contain geometries.

  • geo (GeoFilterType) – Geometry within which they have to be to pass the filter. Can be given as a shapely geometry or as nested lists giving the coordinates of the geometry. For geo_type="bbox", should be [minx, miny, maxx, maxy].

  • geo_type (str) – Type of the geometry: “Polygon”, “MultiPolygon” or “bbox”. Does not need to be specified when geo is a shapely geometry.

  • invert (bool) – If True, filters out geometries that are within geo; if False (default), keeps only those that are within geo.

Return type:

Filter

Warning

Geometries in MongoDB use EPSG:4326 as the default coordinate reference system (CRS). Beware also that the geometries in the field field_id should be valid, so for instance polygons should be closed. This does not hold for place.bounding_box in tweet collections; in places collections it holds only for valid_bounding_box.
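
The difference between geo_intersects and geo_within can be sketched with axis-aligned boxes in the [minx, miny, maxx, maxy] form. This is a planar simplification with hypothetical helpers (bbox_intersects, bbox_within); the database works on spherical geometries and may treat boundaries differently.

```python
def bbox_intersects(a, b):
    """True when boxes a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def bbox_within(a, b):
    """True when box a lies entirely inside box b."""
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]


query = [0, 0, 10, 10]           # plays the role of the geo argument
inside = [2, 2, 4, 4]            # fully contained in query
overlapping = [8, 8, 12, 12]     # crosses the border of query
outside = [20, 20, 30, 30]       # no point in common with query

intersecting = [b for b in (inside, overlapping, outside) if bbox_intersects(b, query)]
contained = [b for b in (inside, overlapping, outside) if bbox_within(b, query)]
```

Every box that is within the query box also intersects it, but not the other way round: the overlapping box passes the intersection test only.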

greater_or_equals(field_id, value)[source]

Add a condition to the filter that matches when the field is greater than or equal to a value.

Python equivalent: field_id >= value

Return type:

Filter

greater_than(field_id, value)[source]

Add a condition to the filter that matches when the field is greater than a value.

Python equivalent: field_id > value

Return type:

Filter

is_empty()[source]

Return True if the filter is empty (has no conditions), False otherwise.

Return type:

bool

less_or_equals(field_id, value)[source]

Add a condition to the filter that matches when the field is less than or equal to a value.

Python equivalent: field_id <= value

Return type:

Filter

less_than(field_id, value)[source]

Add a condition to the filter that matches when the field is less than a value.

Python equivalent: field_id < value

Return type:

Filter

none_of(field_id, values)[source]

Add a condition to the filter that matches when the field is not equal to any of the values in the list.

Python equivalent: field_id not in values

Return type:

Filter

not_equals(field_id, value)[source]

Add a condition to the filter that matches when the field is not equal to a value.

Python equivalent: field_id != value

Return type:

Filter

not_exists(field_id)[source]

Add a condition to the filter that matches when the field does not exist.

Python equivalent: field_id is None

Return type:

Filter

or_filter(other)[source]

Perform an OR operation with a second filter, modifying this filter in place.

Parameters:

other (Filter | dict) – Another Filter object or dict to perform the operation with.

Return type:

Filter

Returns:

self

Raises:

InvalidFilter – if one of the filters is empty or both filters are the same object.

Examples

Create a Filter f1 that matches tweets with at least 1000 retweets OR at least 500 favorites:

import querier as qr

f1 = qr.Filter()
f1.greater_or_equals('retweet_count', 1000)

f2 = qr.Filter()
f2.greater_or_equals('favorite_count', 500)

f1.or_filter(f2)

Alternative notation using the | operator, not modifying f1 in place but getting the resulting Filter as a new instance f3:

f3 = f1 | f2
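
For comparison with the AND case, the sketch below applies the same two conditions to plain Python dictionaries, once combined with OR and once with AND. It does not use querier; the predicate functions and sample entries are hypothetical, and treating a missing counter as 0 is an assumption.

```python
# Hypothetical sample entries.
entries = [
    {"retweet_count": 1500, "favorite_count": 10},
    {"retweet_count": 20, "favorite_count": 700},
    {"retweet_count": 20, "favorite_count": 10},
]


def popular_by_retweets(entry):
    """Condition held by f1: retweet_count >= 1000."""
    return entry.get("retweet_count", 0) >= 1000


def popular_by_favorites(entry):
    """Condition held by f2: favorite_count >= 500."""
    return entry.get("favorite_count", 0) >= 500


# OR keeps an entry when either condition holds; AND needs both.
or_matches = [e for e in entries
              if popular_by_retweets(e) or popular_by_favorites(e)]
and_matches = [e for e in entries
               if popular_by_retweets(e) and popular_by_favorites(e)]
```

On this sample the OR combination keeps the first two entries, while the AND combination keeps none.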

regex(field_id, pattern)[source]

Add a condition to the filter that matches when the field matches the regular expression pattern.

Python equivalent: pattern.match(field_id) is not None

Return type:

Filter
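
The Python equivalent quoted for regex can be tried directly with the standard re module. Note that re.match only matches at the start of the string, while re.search matches anywhere; whether the database-side match is anchored in the same way is not confirmed here, so the sketch shows both. The pattern and sample texts are hypothetical.

```python
import re

# Hypothetical pattern: a hashtag made of word characters.
pattern = re.compile(r"#\w+")
texts = ["#python is fun", "I like #python"]

# As in the stated equivalent: anchored at the start of the string.
match_results = [pattern.match(t) is not None for t in texts]

# For contrast: the pattern may appear anywhere in the string.
search_results = [pattern.search(t) is not None for t in texts]
```

If the intended match position matters, anchoring the pattern explicitly (for example with ^) avoids relying on either behavior.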