Filter class
- class querier.Filter
A dict subclass for composing search criteria used to extract data from a database.
Filters are used in querier.Connection methods to filter data from the database. A filter is composed of simple conditions. By default, an empty filter matches all the entries in the database. A condition is added by calling any condition method on the filter object. For example, the following code:
    import querier as qr
    f = qr.Filter()
    f.greater_than('x', 0)
    f.less_or_equals('x', 100)
creates a filter that matches all entries with a field named ‘x’ whose value is in the range (0, 100]. As shown in the example, each call to a condition method combines the new condition with the existing ones through an AND operation. (To perform an OR operation, see Filter.or_filter().)
Note
Entries are collections of field_id-value pairs. They are returned as Python dictionaries, and values can be of any type: string, numeric, list, dictionary, and so on. An excerpt of an example entry from the Twitter database:
    {
        "lang": "es",
        "place": {
            "coordinates": [[[-109.479171, -56.557358], [-109.479171, -17.497384], [-66.15203, -17.497384], [-66.15203, -56.557358]]],
            "country_code": "ES"
        },
        "favorite_count": 124
    }
To filter by nested fields, use the dot (‘.’) notation. For example, the following code:
    f = qr.Filter()
    f.any_of('place.country_code', ['ES', 'FR', 'PT'])
will match entries from Spain (ES), France (FR) or Portugal (PT).
- and_filter(other)
Perform an AND operation with a second filter, modifying this filter in place.
- Parameters:
other (Filter | dict) – Another Filter object or dict to perform the operation with.
- Return type:
Filter
- Returns:
self
- Raises:
InvalidFilter – if one of the filters is empty or both filters are the same object
Examples
Create a Filter f1 that matches tweets with a number of retweets between 500 and 1000:
    import querier as qr
    f1 = qr.Filter()
    f1.greater_or_equals('retweet_count', 500)
    f2 = qr.Filter()
    f2.less_or_equals('retweet_count', 1000)
    f1.and_filter(f2)
Alternative notation using the & operator, which does not modify f1 in place but returns the resulting Filter as a new instance f3:
    f3 = f1 & f2
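The difference between and_filter (modifies the filter in place and returns self) and the & operator (returns a new instance) can be sketched with a toy class. `ToyFilter` and the `InvalidFilter` stand-in below are hypothetical. querier.Filter is a dict subclass whose internal layout this page does not describe, and the page does not say whether & raises the same errors as and_filter.

```python
from typing import Callable, List

Predicate = Callable[[dict], bool]


class InvalidFilter(Exception):
    """Stand-in for the InvalidFilter exception named in the documentation."""


class ToyFilter:
    """Hypothetical filter: a list of predicates combined with AND."""

    def __init__(self) -> None:
        self.conditions: List[Predicate] = []

    def is_empty(self) -> bool:
        return not self.conditions

    def add(self, predicate: Predicate) -> "ToyFilter":
        self.conditions.append(predicate)
        return self

    def matches(self, entry: dict) -> bool:
        # all() over no conditions is True, so an empty filter matches everything.
        return all(cond(entry) for cond in self.conditions)

    def and_filter(self, other: "ToyFilter") -> "ToyFilter":
        # Same error cases as documented: an empty operand, or the same object twice.
        if self.is_empty() or other.is_empty() or self is other:
            raise InvalidFilter("operands must be non-empty, distinct filters")
        self.conditions.extend(other.conditions)  # modify self in place
        return self

    def __and__(self, other: "ToyFilter") -> "ToyFilter":
        # Work on a copy so that neither operand changes.
        result = ToyFilter()
        result.conditions = list(self.conditions)
        return result.and_filter(other)


f1 = ToyFilter().add(lambda e: e.get("retweet_count", 0) >= 500)
f2 = ToyFilter().add(lambda e: e.get("retweet_count", 0) <= 1000)

f3 = f1 & f2                                    # new instance
print(len(f1.conditions), len(f3.conditions))  # 1 2
print(f3.matches({"retweet_count": 750}))      # True
print(f3.matches({"retweet_count": 1500}))     # False

f1.and_filter(f2)                               # in place, returns f1
print(len(f1.conditions))                       # 2
```

Copying the condition list in `__and__` is what keeps f1 unchanged. A later in-place call on f1 also leaves f3 unaffected.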
- any_of(field_id, values)
Add a condition to the filter that matches when the field is equal to any of the values in the list.
Python equivalent: field_id in values
- equals(field_id, value)
Add a condition to the filter that matches when the field is equal to a value.
Python equivalent: field_id == value
- exists(field_id)
Add a condition to the filter that matches when the field exists.
Python equivalent: field_id is not None
- geo_intersects(field_id, geo, geo_type='Polygon', invert=False)
Add a condition to the filter that matches when the field’s geometry intersects with a geometry geo.
- Parameters:
field_id (str) – Name of the field supposed to contain geometries.
geo (GeoFilterType) – Geometry that has to be intersected to pass the filter. Can be given as a shapely geometry or as nested lists giving the coordinates of the geometry. For geo_type="bbox", it should be [minx, miny, maxx, maxy].
geo_type (str) – Type of the geometry: “Polygon”, “MultiPolygon” or “bbox”. Does not need to be specified when geo is a shapely geometry.
invert (bool) – If True, filters out geometries that intersect geo, if False (default), keeps only those that intersect geo.
Warning
Geometries in MongoDB use EPSG:4326 as the default coordinate reference system (CRS). Also beware that the geometries in the field field_id should be valid; for instance, polygons should be closed. This is not the case for place.bounding_box in tweet collections; in places collections, only valid_bounding_box meets this requirement.
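The closed-polygon requirement in the warning can be met by construction. A [minx, miny, maxx, maxy] box can be turned into a closed ring, and an open ring (such as the four-vertex one in the example entry) can be closed by repeating its first vertex. The two helpers below are hypothetical plain-Python sketches that call neither querier nor shapely. In EPSG:4326 GeoJSON coordinates, x is longitude and y is latitude.

```python
from typing import List, Sequence

Ring = List[List[float]]


def bbox_to_polygon(bbox: Sequence[float]) -> List[Ring]:
    """Turn [minx, miny, maxx, maxy] into GeoJSON-style Polygon coordinates."""
    minx, miny, maxx, maxy = bbox
    ring = [
        [minx, miny],
        [minx, maxy],
        [maxx, maxy],
        [maxx, miny],
        [minx, miny],  # repeat the first vertex so that the ring is closed
    ]
    return [ring]


def close_ring(ring: Ring) -> Ring:
    """Return a copy of ring, appending its first vertex if the ring is open."""
    if ring and ring[0] != ring[-1]:
        return ring + [list(ring[0])]
    return list(ring)


# The four-vertex ring from the example entry is open (first vertex != last).
open_ring = [[-109.479171, -56.557358], [-109.479171, -17.497384],
             [-66.15203, -17.497384], [-66.15203, -56.557358]]
closed = close_ring(open_ring)
print(len(open_ring), len(closed))  # 4 5
print(closed[0] == closed[-1])      # True
```

If shapely is available, shapely.geometry.box(minx, miny, maxx, maxy) also builds a closed polygon from a box, possibly with a different vertex order.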
- geo_within(field_id, geo, geo_type='Polygon', invert=False)
Add a condition to the filter that matches when the field’s geometry is fully contained within a geometry geo.
- Parameters:
field_id (str) – Name of the field supposed to contain geometries.
geo (GeoFilterType) – Geometry within which the field's geometries must lie to pass the filter. Can be given as a shapely geometry or as nested lists giving the coordinates of the geometry. For geo_type="bbox", it should be [minx, miny, maxx, maxy].
geo_type (str) – Type of the geometry: “Polygon”, “MultiPolygon” or “bbox”. Does not need to be specified when geo is a shapely geometry.
invert (bool) – If True, filters out geometries that are within geo, if False (default), keeps only those that are within geo.
Warning
Geometries in MongoDB use EPSG:4326 as the default coordinate reference system (CRS). Also beware that the geometries in the field field_id should be valid; for instance, polygons should be closed. This is not the case for place.bounding_box in tweet collections; in places collections, only valid_bounding_box meets this requirement.
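For intuition about "fully contained within" versus "intersects", the special case where both geometries are axis-aligned boxes reduces to coordinate comparisons. This is a hypothetical planar simplification. It ignores MongoDB's spherical geometry and the antimeridian, and it is not how the database evaluates geo_within or geo_intersects. The `keep` helper only mirrors the documented meaning of the invert flag.

```python
from typing import Sequence

BBox = Sequence[float]  # [minx, miny, maxx, maxy]


def bbox_within(inner: BBox, outer: BBox) -> bool:
    """Planar test: inner lies entirely inside outer (shared edges allowed)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """Planar test: the two boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def keep(entry_box: BBox, geo: BBox, invert: bool = False) -> bool:
    """Mimics the invert flag of geo_within: True means the entry is kept."""
    return bbox_within(entry_box, geo) != invert


area = [0, 0, 10, 10]
print(bbox_within([1, 1, 2, 2], area))          # True
print(bbox_within([9, 9, 12, 12], area))        # False
print(bbox_intersects([9, 9, 12, 12], area))    # True
print(keep([9, 9, 12, 12], area, invert=True))  # True
```

A box that straddles the boundary of geo therefore passes geo_intersects but not geo_within. With invert=True, the outcome of each test is flipped.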
- greater_or_equals(field_id, value)
Add a condition to the filter that matches when the field is greater than or equal to a value.
Python equivalent: field_id >= value
- greater_than(field_id, value)
Add a condition to the filter that matches when the field is greater than a value.
Python equivalent: field_id > value
- is_empty()
Return True if the filter is empty (has no conditions), False otherwise.
- Return type:
bool
- less_or_equals(field_id, value)
Add a condition to the filter that matches when the field is less than or equal to a value.
Python equivalent: field_id <= value
- less_than(field_id, value)
Add a condition to the filter that matches when the field is less than a value.
Python equivalent: field_id < value
- none_of(field_id, values)
Add a condition to the filter that matches when the field is not equal to any of the values in the list.
Python equivalent: field_id not in values
- not_equals(field_id, value)
Add a condition to the filter that matches when the field is not equal to a value.
Python equivalent: field_id != value
- not_exists(field_id)
Add a condition to the filter that matches when the field doesn’t exist.
Python equivalent: field_id is None
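The python equivalents listed for the condition methods can be collected into a small reference evaluator. This is useful for checking by hand which entries a filter should return. The sketch is hypothetical: `CONDITIONS`, `ORDERED` and `evaluate` are not part of querier, fields are looked up at the top level only, and a missing field is read as None. That follows the documented `is not None` / `is None` equivalents for exists and not_exists. Treating ordering comparisons on a missing field as non-matching is an assumption of this sketch.

```python
import operator
from typing import Any, Callable, Dict, Iterable, Tuple

# Each documented condition method mapped to its python equivalent,
# as a function of (field_value, argument). Unary checks ignore the argument.
CONDITIONS: Dict[str, Callable[[Any, Any], bool]] = {
    "equals": operator.eq,
    "not_equals": operator.ne,
    "greater_than": operator.gt,
    "greater_or_equals": operator.ge,
    "less_than": operator.lt,
    "less_or_equals": operator.le,
    "any_of": lambda value, values: value in values,
    "none_of": lambda value, values: value not in values,
    "exists": lambda value, _: value is not None,
    "not_exists": lambda value, _: value is None,
}

ORDERED = {"greater_than", "greater_or_equals", "less_than", "less_or_equals"}


def evaluate(entry: Dict[str, Any], conditions: Iterable[Tuple[str, str, Any]]) -> bool:
    """AND together (method, field_id, argument) triples against one entry."""
    for method, field_id, argument in conditions:
        value = entry.get(field_id)  # a missing field reads as None
        if method in ORDERED and value is None:
            return False  # assumption: ordering tests never match a missing field
        if not CONDITIONS[method](value, argument):
            return False
    return True


tweet = {"lang": "es", "retweet_count": 750}
conds = [
    ("greater_or_equals", "retweet_count", 500),
    ("less_or_equals", "retweet_count", 1000),
    ("any_of", "lang", ["es", "pt"]),
    ("not_exists", "place", None),
]
print(evaluate(tweet, conds))           # True
print(evaluate({"lang": "en"}, conds))  # False
```

An empty list of conditions makes `evaluate` return True. This matches the statement that an empty filter matches all entries.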
- or_filter(other)
Perform an OR operation with a second filter, modifying this filter in place.
- Parameters:
other (Filter | dict) – Another Filter object or dict to perform the operation with.
- Return type:
Filter
- Returns:
self
- Raises:
InvalidFilter – if one of the filters is empty or both filters are the same object.
Examples
Create a Filter f1 that matches tweets with at least 1000 retweets OR at least 500 favorites:
    import querier as qr
    f1 = qr.Filter()
    f1.greater_or_equals('retweet_count', 1000)
    f2 = qr.Filter()
    f2.greater_or_equals('favorite_count', 500)
    f1.or_filter(f2)
Alternative notation using the | operator, which does not modify f1 in place but returns the resulting Filter as a new instance f3:
    f3 = f1 | f2
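To make the OR semantics concrete, the sketch below assumes, purely for illustration, that each filter is a plain MongoDB query document. $or and $gte are standard MongoDB query operators, but this page does not say whether querier.Filter stores its conditions this way. `or_queries` and `matches` are hypothetical helpers, and ValueError stands in for querier's InvalidFilter.

```python
import copy
from typing import Any, Dict

Query = Dict[str, Any]


def or_queries(a: Query, b: Query) -> Query:
    """Return a new MongoDB-style {'$or': [...]} query; a and b are left untouched."""
    if not a or not b or a is b:
        raise ValueError("both queries must be non-empty and distinct objects")
    return {"$or": [copy.deepcopy(a), copy.deepcopy(b)]}


def matches(entry: Dict[str, Any], query: Query) -> bool:
    """Tiny evaluator for the subset used here: '$or' and '$gte' on top-level fields."""
    if "$or" in query:
        return any(matches(entry, sub) for sub in query["$or"])
    for field, cond in query.items():
        value = entry.get(field)
        if value is None or value < cond["$gte"]:
            return False
    return True


q1 = {"retweet_count": {"$gte": 1000}}
q2 = {"favorite_count": {"$gte": 500}}
q3 = or_queries(q1, q2)
print(q3)
# {'$or': [{'retweet_count': {'$gte': 1000}}, {'favorite_count': {'$gte': 500}}]}
print(matches({"retweet_count": 10, "favorite_count": 600}, q3))  # True
print(matches({"retweet_count": 10, "favorite_count": 100}, q3))  # False
```

Returning a fresh dict mirrors the | operator, while q1 and q2 stay unchanged. A query document of this shape is what pymongo's Collection.find accepts as its filter argument.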