Posts

Showing posts from 2020

How to compute Row Size in Complex Spark DataFrame?

Definitely there may be a better way of doing this. I was running into an error where data load from lake to CosmosDB is failing due to a record size exceeding 2MB limit. Hence, converting it to JSON grouped by primary key and computing the size seemed appropriate and easy. Comment below if this can be done better.