Re: [Haskell-cafe] Loading a csv file with ~200 columns into Haskell Record

2 Oct 2017


      ...
> Having to not have something which I can quickly start off on
What do you mean by that? And what precisely is the discomfort between
Haskell and Python for your use case?

On 02-Oct-2017 7:29 AM, "Guru Devanla" wrote:
...
Thank you all for your helpful suggestions. When I wrote the original
question, I too was trying to decide between using records to represent
each row and defining a vector for each column, with each vector becoming
an attribute of the record. I was leaning towards the latter given the
performance needs.
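The two layouts being weighed here can be sketched as follows. This is only an illustration under stated assumptions: the `Trade` type and its three columns are hypothetical stand-ins for the real 100+ columns, and plain lists are used so the sketch needs nothing beyond `base` (the `vector` package would be the more likely choice when performance matters).

```haskell
import Data.List (foldl')

-- Row-oriented layout: one record per CSV line (hypothetical columns).
data Trade = Trade
  { symbol :: String
  , price  :: Double
  , volume :: Int
  } deriving (Show, Eq)

-- Column-oriented layout: one container per column, held in one record.
-- Lists keep the sketch dependency-free; vectors would replace them in
-- a performance-sensitive version.
data TradeColumns = TradeColumns
  { symbols :: [String]
  , prices  :: [Double]
  , volumes :: [Int]
  } deriving (Show, Eq)

-- Transpose a list of rows into columns.
toColumns :: [Trade] -> TradeColumns
toColumns rows = TradeColumns
  { symbols = map symbol rows
  , prices  = map price rows
  , volumes = map volume rows
  }

-- Rebuild rows from columns (truncates to the shortest column).
toRows :: TradeColumns -> [Trade]
toRows cols = zipWith3 Trade (symbols cols) (prices cols) (volumes cols)

-- A column-wise aggregate reads only the one column it needs.
totalVolume :: TradeColumns -> Int
totalVolume = foldl' (+) 0 . volumes

-- Made-up sample data for experimenting in GHCi.
sampleTrades :: [Trade]
sampleTrades = [Trade "AAA" 10.5 100, Trade "BBB" 20.0 250]
```

The row layout is convenient for per-record logic, while the column layout suits aggregates and filters over a few columns of a wide table.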
Since the file is currently available as a CSV, adding Persistent or any
ORM library would be an added dependency.
I was trying to solve this problem without too many dependencies on other
libraries and without having to learn new DSLs. It's a tempting time
killer, as everyone here would understand.
@Anthony Thank you for your answer as well. I have explored the Frames
library in the past as I looked for Pandas-like features in Haskell.
The library is useful and I have played around with it, but I was never
confident in adopting it for a serious project. Part of my reluctance is
the learning curve, plus I also need to familiarize myself with `lens`.
But it looks like the project I have in hand is good motivation to do
both. I will try to use Frames and then report back. Also, apologies for
not being able to share the data I am working on.
With the original question, what I was trying to get at is how these
kinds of problems are solved in real-world projects, such as when Haskell
is used in data mining or in financial applications. I believe these
applications deal with this kind of data, where the tables are wide. Having
to not have something which I can quickly start off on troubles me and
makes me wonder if the reason is my lack of understanding or just the pain
of using static typing.
Regards
On Sun, Oct 1, 2017 at 1:58 PM, Anthony Cowley wrote:
...
...
On Sep 30, 2017, at 9:30 PM, Guru Devanla wrote:
Hello All,
I am in the process of replicating some Python code in Haskell.
In Python, I load a couple of CSV files, each with more than 100
columns, into a Pandas data frame. A Pandas data frame, in short, is a
tabular structure which lets me perform a bunch of joins and filter out
data. I generate different shapes of reports using these operations. Of
course, I would love some type checking to help me with these merge and
join operations as I create different reports.
>
> I am not looking to replicate the Pandas data-frame functionality in
Haskell. The first thing I want to do is reach for the 'record' data
structure. Here are some ideas I have:
>
> 1.  I need to declare all these 100+ columns in multiple record
structures.
> 2.  Some of the columns can have NULL/NaN values. Therefore, some of
the attributes of the record structure would be 'Maybe' values. Now, I
could drop some columns during load and cut down the number of attributes I
create per record structure.
> 3.  Create a dictionary of each record structure which will help me
index into them.
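A minimal sketch of the three points above, assuming a hypothetical three-column slice (`accountId`, `owner`, `balance`) of the much wider file. The comma splitter is deliberately naive and does not handle quoted fields; a real CSV library such as cassava would take its place. `Data.Map.Strict` comes from the `containers` package that ships with GHC, so this adds no dependency beyond a stock install.

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Point 1: a record for a (hypothetical) slice of the columns.
-- Point 2: a nullable column becomes a Maybe field.
data Account = Account
  { accountId :: Int
  , owner     :: String
  , balance   :: Maybe Double   -- Nothing when the cell is empty or "NaN"
  } deriving (Show, Eq)

-- Naive splitter: ignores quoting, so it is only good for simple files.
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (cell, [])       -> [cell]
  (cell, _ : rest) -> cell : splitOnComma rest

-- Empty, "NaN", or unparsable cells become Nothing.
-- "NaN" is matched explicitly because read accepts it as a Double.
parseCell :: String -> Maybe Double
parseCell "NaN" = Nothing
parseCell cell  = readMaybe cell

-- Parse one line; Nothing means the row itself is malformed.
parseAccount :: String -> Maybe Account
parseAccount line = case splitOnComma line of
  [i, o, b] -> do
    n <- readMaybe i
    Just (Account n o (parseCell b))
  _ -> Nothing

-- Point 3: a dictionary keyed by id for lookups.
indexById :: [Account] -> Map.Map Int Account
indexById accounts = Map.fromList [(accountId a, a) | a <- accounts]

-- Load all well-formed rows from header-less CSV text.
loadAccounts :: String -> Map.Map Int Account
loadAccounts = indexById . mapMaybe parseAccount . lines
```

Malformed rows are silently skipped here for brevity; returning an `Either` with the offending line would be a reasonable alternative when bad rows need to be reported.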
>
> I would like some feedback on the first 2 points. Seems like there is a
lot of boilerplate code I have to generate for creating 100s of record
attributes. Is this the only sane way to do this? What other patterns
should I consider while solving such a problem?
>
> Also, I do not want to add too many dependencies into the project, but
I am open to suggestions.
>
> Any input/advice on this would be very helpful.
>
> Thank you for the time!
> Guru
The Frames package generates a vinyl record based on your data (like
hlist; with a functor parameter that can be Maybe to support missing data),
storing each column in a vector for very good runtime performance. As you
get past 100 columns, you may encounter compile-time performance issues. If
you have a sample data file you can make available, I can help diagnose
performance troubles.
Anthony
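The "functor parameter" idea mentioned above can be illustrated without Frames or vinyl. The sketch below is not the Frames or vinyl API; it is a hand-written record, with hypothetical `Quote` columns, whose fields all share one wrapper `f`, so the same declaration serves both rows with possibly missing cells (`f = Maybe`) and fully populated rows (`f = Identity`).

```haskell
import Data.Functor.Identity (Identity (..))

-- A record whose fields are all wrapped in the same functor f.
-- Hypothetical columns, for illustration only.
data Quote f = Quote
  { ticker :: f String
  , bid    :: f Double
  , ask    :: f Double
  }

-- Row as read from the file: any cell may be missing.
type RawQuote = Quote Maybe

-- Row after validation: every cell is present.
type FullQuote = Quote Identity

-- Succeeds only if no field is missing.
complete :: RawQuote -> Maybe FullQuote
complete (Quote t b a) =
  Quote <$> fmap Identity t <*> fmap Identity b <*> fmap Identity a

-- Bid/ask spread, defined only on complete rows.
spread :: FullQuote -> Double
spread q = runIdentity (ask q) - runIdentity (bid q)
```

With this shape, code that needs every value can demand a `FullQuote`, and the type checker rejects attempts to pass a row that might still contain gaps.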
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

Re: [Haskell-cafe] Loading a csv file with ~200 columns into Haskell Record

Saurabh Nanda