Skip to content

Crossbar CSV generation#

First, the issue faced was when JSON objects were loaded with differing keys, the resulting CSV would have the same keys end up in different cells. For instance, if the first chunk of JObjs have keys a, c, and e, the header row would be "a","c","e". When the next chunk of JObjs are loaded, there might be a key b present, causing the row to be "a-value","b-value","c-value","e-value". So instead of c-value being in column 2 as the first chunk was, it is offset by one cell to the right.

A subsequent chunk that only has the a, c, and e keys would then re-align back to the proper cell locations.

This makes processing CSVs for billing difficult and annoying for end users - not good!

Possible solutions discussed#

  1. Maintain a view that lists all known keys for each pvt_type of document. When creating the CSV, get the list of keys (probably via the (re)reduce mechanism). This solution was rejected as the index would be over every document's keys resulting in M*N entries in the index.

  2. Maintain a list of keys in the kzd accessor modules for each pvt_type of CSV-enabled fields. This would essentially hard-code the JSON paths to be used for CSV generation. This solution was rejected as the ability to know, a priori, the list of keys is not possible. For instance, custom SIP headers and custom application vars cannot be enumerated on the CDR documents.

  3. Use CouchDB's ability to return the results as a CSV instead of JSON objects. While very attractive, this ultimately was rejected because queries frequently are over the span of 2+ MODBs. More investigation into the possibility is encouraged at some point though.

Current solution#

The accepted solution is to chunk the JSON objects from the DB onto the local disk in a CSV file, while maintaining the list of "seen" key paths. As eash JSON object is processed, check its paths against the known paths and append new paths to the list. Then write the CSV row using that adjusted list and return the list for the next JSON object.

Once all JSON objects are processed, use the final list of paths to prepend the CSV header row.

It is assumed that clients reading the CSV will be able to adjust to variable row lengths.

Pseudocode#

We'll assume objects of the shape of the example above. Objects will have keys a, b, c, and/or e.

  1. Flatten the first JSON object and use the flattened object's keys as the seed for the cell ordering:
    JObj = {"a": 1, "c": 3, "e": 5}
    CellOrdering = kz_json:get_keys(kz_json:flatten(JObj)) % [["a"], ["c"], ["e"]]
    
  2. Create a temporary file for the CSV. The file and the cell ordering become the "accumulator" for the chunking: {File, CellOrdering}
  3. Fold over the JSON objects. For each JSON object:
  4. Flatten the JSON object and get the keys (all available paths)
  5. Compare keys to CellOrdering, appending unseen keys to the end of CellOrdering
    CellOrdering = [["a"], ["c"], ["e"]]
    FlatKeys = [["a"], ["b"]],
    NewCellOrdering = [["a"], ["c"], ["e"], ["b"]]
    
    Here you can see that all subsequent rows will put the value for ["b"] in column 4; and preceding JSON objects will not have a cell for ["b"] (or any other keys that didn't exist at the time the JSON object was converted to a CSV row.
  6. Convert the JSON object to a CSV row, using CellOrdering to get a list of values from the JSON object, using the ?ZILCH macro when the value doesn't exist in the JSON object (resulting in "" for the cell).
  7. Once the fold has completed, the return is {File, CellOrdering} with the latest and greatest list of JSON paths in CellOrdering.
  8. Subsequent calls will include the accumulator, updating CellOrdering as necessary, and ensuring JSON values are written to the same column for each row.
  9. Once all JSON objects are complete, use CellOrdering to write a CSV header row and prepend it to the CSV file.
  10. Use sendfile to stream the file to the client.

Example run#

| JSON | CellOrderingBefore | CellOrderingAfter | Row | | {"a":1,"c":3,"e":5} | [] | [["a"],["c"],["e"]] | 1,3,5 | | {"a":1,"b":2,"e":5} | [["a"],["c"],["e"]] | [["a"],["c"],["e"],["b"]] | 1,"",5,2 | | {"a":1,"c":3,"e":5} | [["a"],["c"],["e"],["b"]] | [["a"],["c"],["e"],["b"]] | 1,3,5,"" | | {"b":2} | [["a"],["c"],["e"],["b"]] | [["a"],["c"],["e"],["b"]] | "","","",2 | | Header Row | [["a"],["c"],["e"],["b"]] | | "a","c","e","b" |

Resulting CSV:

"a","c","e","b"
1,3,5
1,"",5,2
1,3,5,""
"","","",2