Tagged: filterable

Type Class – Filterable

Mapping is about preserving structure while changing values. Folding is about destroying structure. But what about filtering, what does it mean to filter?

Like folding, filtering destroys the original structure but it rebuilds the structure after removing undesired elements. It seems then, that filtering might be a special case of folding. For example:

int main()
{
    std::vector<int> numbers{0,1,2,3,4,5,6,7,8,9,10};

    // filter using lfold
    auto append_even = [](std::vector<int> ns, int n)
    {
        if (n % 2 == 0) ns.push_back(n);
        return ns;
    };

    for (int n : lfold(append_even,std::vector<int>{},numbers))
        std::cout << n << " ";

    return 0;
}

Output:

0 2 4 6 8 10

Another way to look at filtering is to think of each element as a single element list. Filtering then becomes a matter of transforming the lists into empty ones then using fold to recombine the lists:

int main()
{
    std::vector<int> numbers{0,1,2,3,4,5,6,7,8,9,10};

    // filter using map and fold
    auto to_even_vector = [](int n){
        if (n % 2 == 0)
            return std::vector<int>{n};

        return std::vector<int>{};
    };

    for (int n : fold(map(to_even_vector,numbers)))
        std::cout << n << " ";
}

Note that this fold uses monoid to get the aggregator and seed value for a vector. We’re not going to require that filterable inherit from monoid, since that would exclude things like the upcoming std::optional but it’s an recurring idea. In haskell, mfilter actually treats Maybe, the equivalent of optional, as a monoid.

In addition to being foldable, a filterable needs to be structuraly simple enough that it can be rebuilt by appending new values; filtering a binary tree wouldn’t make much sense. This can be seen in the haskell type class ListLike, which provides filtering and inherits from both monoid and a version of foldable.

We define filterable like this:

template<class T>
struct filterable // : foldable
{
    // Type value_type
    // filterable<A> filter(bool(A),filterable<A>)
    static constexpr bool is_instance = false;
};

// Convenience function
template<class F, class T>
auto filter(F&& f, const T& in)
{
    return filterable<T>::filter(std::forward<F>(f),in);
}

Defining the stl container instances:

// container instances
template<class T>
struct default_container_filterable : public foldable<T>
{
    // Type value_type
    using value_type = typename container_traits<T>::value_type;

    // filterable<A> filter(bool(A),filterable<A>)
    template<class F>
    static auto filter(F&& f, const T& in) -> T
    {
        T out;

        for (auto& a : in)
            if (eval(f,a))
                container_traits<T>::add_element(out,a);

        return out;
    }

    static constexpr bool is_instance = false;
};

#define FC_DEFAULT_CONTAINER_FILTERABLE(T)\
    template<class... Args>\
    struct filterable<T<Args...>> : public default_container_filterable<T<Args...>>\
    {};

FC_DEFAULT_CONTAINER_FILTERABLE(std::deque);
FC_DEFAULT_CONTAINER_FILTERABLE(std::list);
FC_DEFAULT_CONTAINER_FILTERABLE(std::multiset);
FC_DEFAULT_CONTAINER_FILTERABLE(std::set);
FC_DEFAULT_CONTAINER_FILTERABLE(std::basic_string);
FC_DEFAULT_CONTAINER_FILTERABLE(std::unordered_multiset);
FC_DEFAULT_CONTAINER_FILTERABLE(std::unordered_set);
FC_DEFAULT_CONTAINER_FILTERABLE(std::vector);

For the sake of efficiency, we don’t actually use fold in the implementation of filter.

Despite std::optional being a good candidate for filterable, we avoid making std::shared_ptr filterable. If the pointed value passes the filter, should filter return a pointer to the same value or a pointer to a copied value? There are compelling reasons for both and this seems like an area ripe for unexpected results, so let’s just go ahead and stay out of it.

Now that we have map, filter and fold, we can combine these to do all sorts of interesting operations on data. For example, this code will find the mean income of people in their 20s:

struct Person
{
    std::string name;
    int age;
    double income;
};

double mean_20s_income(const std::vector<Person>& people)
{
    auto in_20s = [](const Person& p){return p.age < 30 && p.age >= 20;};

    auto get_mean = [n=0](double last_mean, double income) mutable
    {
        ++n;
        return last_mean*(n-1)/n + (double)income/n;
    };

    return lfold(get_mean, 0.0, map(&Person::income, filter(in_20s, people)));
}

Of course, this could be done by storing the filtered list of incomes, but whatever! The possibilities are endless!

Complete code is available here. I would like to thank the friendly people at r/haskell for all their help.

Advertisements